{"title": "Seeing through water", "book": "Advances in Neural Information Processing Systems", "page_first": 393, "page_last": 400, "abstract": null, "full_text": " Seeing through water\n\n\n\n Alexei A. Efros\n School of Computer Science\n Carnegie Mellon University\n Pittsburgh, PA 15213, U.S.A.\n efros@cs.cmu.edu\n\n Volkan Isler, Jianbo Shi and Mirko Visontai\n Dept. of Computer and Information Science\n University of Pennsylvania\n Philadelphia, PA 19104\n {isleri,jshi,mirko}@cis.upenn.edu\n\n\n\n Abstract\n\n We consider the problem of recovering an underwater image distorted by\n surface waves. A large amount of video data of the distorted image is\n acquired. The problem is posed in terms of finding an undistorted image\n patch at each spatial location. This challenging reconstruction task\n can be formulated as a manifold learning problem, such that the center\n of the manifold is the image of the undistorted patch. To compute the\n center, we present a new technique to estimate global distances on the\n manifold. Our technique achieves robustness through convex flow\n computations and solves the \"leakage\" problem inherent in recent manifold\n embedding techniques.\n\n\n\n1 Introduction\n\nConsider the following problem. A pool of water is observed by a stationary video camera\nmounted above the pool and looking straight down. There are waves on the surface of the\nwater and all the camera sees is a series of distorted images of the bottom of the pool,\ne.g. Figure 1. The aim is to use these images to recover the undistorted image of the pool\nfloor as if the water were perfectly still. Besides obvious applications in ocean optics and\nunderwater imaging [1], variants of this problem also arise in several other fields, including\nastronomy (overcoming atmospheric distortions) and structure-from-motion (learning the\nappearance of a deforming object). Most approaches to solving this problem try to model the\ndistortions explicitly. 
In order to do this, it is critical not only to have a good parametric\nmodel of the distortion process, but also to be able to reliably extract features from the data\nto fit the parameters. As such, this approach is only feasible in well understood, highly\ncontrolled domains. On the opposite side of the spectrum is a very simple method used in\nunderwater imaging: simply average the data temporally. Although this method performs\nsurprisingly well in many situations, it fails when the structure of the target image is too\nfine with respect to the amplitude of the wave (Figure 2).\n\nIn this paper we propose to look at this difficult problem from a more statistical angle. We\nwill exploit a very simple observation: if we watch a particular spot on the image plane,\nmost of the time the picture projected there will be distorted. But once in a while, when\nthe water just happens to be locally flat at that point, we will be looking straight down\nand seeing exactly the right spot on the ground. If we can recognize when this happens\nand snap the right picture at each spatial location, then recovering the desired ground truth\nimage would be simply a matter of stitching these correct observations together. In other\nwords, the question that we will be exploring in this paper is not where to look, but when!\n\n Authors in alphabetical order.\n\nFigure 1: Fifteen consecutive frames from the video. The experimental setup involved a transparent\nbucket of water and the cover of a vision textbook, \"Computer Vision/A Modern Approach\".\n\nFigure 2: Ground truth image and reconstruction results using mean and median.\n\n\n2 Problem setup\n\nLet us first examine the physical setup of our problem. There is a \"ground truth\" image G\non the bottom of the pool. Overhead, a stationary camera pointing downwards is recording\na video stream V. 
In the absence of any distortion, V(x, y, t) = G(x, y) at any time t.\nHowever, the water surface refracts light in accordance with Snell's Law. Let us consider what\nthe camera is seeing at a particular point x on the CCD array, as shown in Figure 3(a)\n(assume 1D for simplicity). If the normal to the water surface directly underneath x is\npointing straight up, there is no refraction and V(x) = G(x). However, if the normal is\ntilted by angle θ1, light will bend by the amount θ2 = θ1 - arcsin(sin(θ1) / 1.33), so the\ncamera point V(x) will see the light projected from G(x + dx) on the ground plane. It\nis easy to see that the relationship between the tilt of the normal to the surface θ1 and the\ndisplacement dx is approximately linear (dx ≈ 0.25 θ1 h using the small angle approximation,\nwhere h is the height of the water). This means that, in 2D, what the camera will be seeing\nover time at point V(x, y, t) are points on the ground plane sampled from a disk centered at\nG(x, y) and with radius related to the height of the water and the overall roughness of the\nwater surface. A similar relationship holds in the inverse direction as well: a point G(x, y)\nwill be imaged on a disk centered around V(x, y).\n\nWhat about the distribution of these sample points? According to the Cox-Munk law [2], the\nsurface normals of rough water are distributed approximately as a Gaussian centered around\nthe vertical, assuming a large surface area and stationary waves. Our own experiments,\nconducted by hand-tracking (Figure 3(b)), confirm that the distribution, though not exactly\nGaussian, is definitely unimodal and smooth.\n\nUp to now, we have only concerned ourselves with infinitesimally small points on the image\nor the ground plane. However, in practice, we must have something that we can compute\nwith. Therefore, we will make the assumption that the surface of the water can be locally\napproximated by a planar patch. 
This means that everything that was true for points is now\ntrue for local image patches (up to a small affine distortion).\n\n\n3 Tracking via embedding\n\nFrom the description outlined above, one possible solution emerges. If the distribution of a\nparticular ground point on the image plane is unimodal, then one could track feature points\nin the video sequence over time. Computing their mean positions over the entire video will\ngive an estimate of their true positions on the ground plane. Unfortunately, tracking over\nlong periods of time is difficult even under favorable conditions, whereas our data changes so fast\n(it is undersampled) and is so noisy that reliable tracking is out of the question (Figure 3(c)).\n\nHowever, since we have a lot of data, we can substitute smoothness in time with smoothness\nin similarity: for a given patch we are more likely to find a patch similar to it somewhere\nin time, and will have a better chance to track the transition between them. An alternative\nto tracking the patches directly (which amounts to holding the ground patch G(x, y) fixed\nand centering the image patches V(x + dx_t, y + dy_t) on top of it in each frame) is to fix the\nimage patch V(x, y) in space and observe the patches from G(x + dx_t, y + dy_t) appearing\nin this window. We know that this set of patches comes from a disk on the ground plane\ncentered around patch G(x, y), which is our goal. If the disk were small enough compared to the\nsize of the patch, we could just cluster the patches together, e.g. by using translational\nEM [3]. Unfortunately, the disk can be rather large, containing patches with no overlap\nat all, thus making only local similarity comparisons possible. However, notice that\nour set of patches lies on a low-dimensional manifold; in fact we know precisely which\nmanifold: it is the disk on the ground plane centered at G(x, y)! 
So, if we could use the\nlocal patch similarities to find an embedding of the patches in V(x, y, t) on this manifold,\nthe center of the embedding would hold our desired patch G(x, y).\n\nThe problem of embedding the patches based on local similarity is related to the recent\nwork in manifold learning [4, 5]. The basic ingredients of the embedding algorithms are\ndefining a distance measure between points and finding an energy function that optimally places\nthem in the embedding space. The distance can be defined as an all-pairs distance matrix, or\nas the distance from a particular reference node. In both cases, we want the distance function\nto satisfy some constraints to model the underlying physical problem.\n\nThe local similarity measure for our problem turned out to be particularly unreliable, so\nnone of the previous manifold learning techniques were adequate for our purposes. In the\nfollowing section we will describe our own robust method for computing a global distance\nfunction and finding the right embedding and, eventually, its center.\n\n[Figure 3 panels: (a) refraction diagram showing the water surface, its normal N, the angles θ1 and θ2,\nthe water height h, and the ground points G(x) and G(x + dx); (b); (c)]\n\nFigure 3: (a) Snell's Law. (b)-(c) Tracking points on the bottom of the pool: (b) the tracked position\nforms a distribution close to a Gaussian; (c) a vertical line of the image shown at different time\ninstances (horizontal axis). The discontinuity caused by rapid changes makes the tracking infeasible.\n\n\n4 What is the right distance function?\n\nLet I = {I1, . . . , In} be the set of patches, where It = V(x, y, t) and x =\n[xmin, xmax], y = [ymin, ymax] are the patch pixel coordinates. Our goal is to find a\ncenter patch to represent the set I. 
To achieve this goal, we need a distance function\nd : I × I → ℝ such that d(Ii, Ij) < d(Ii, Ik) implies that Ij is more similar to Ii than Ik is.\nOnce we have such a measure, the center can be found by computing:\n\n I* = arg min_{Ii ∈ I} Σ_{Ij ∈ I} d(Ii, Ij) (1)\n\nUnfortunately, the measurable distance functions, such as Normalized Cross Correlation\n(NCC), are only local. A common approach is to design a global distance function using\nthe measurable local distances and transitivity [6, 4]. This is equivalent to designing a\nglobal distance function of the form:\n\n d(Ii, Ij) = { dlocal(Ii, Ij), if dlocal(Ii, Ij) ≤ τ; dtransitive(Ii, Ij), otherwise } (2)\n\nwhere dlocal is a local distance function, τ is a user-specified threshold and dtransitive\nis a global, transitive distance function which utilizes dlocal. The underlying assumption\nhere is that the members of I lie on a constraint space (or manifold) S. Hence, a local\nsimilarity function such as NCC can be used to measure local distances on the manifold.\nAn important research question in machine learning is how to extend the local measurements\ninto global ones, i.e. to design dtransitive above.\n\nOne method for designing such a transitive distance function is to build a graph G = (V, E)\nwhose vertices correspond to the members of I. The local distance measure is used to place\nedges which connect only very similar members of I. Afterwards, the lengths of pairwise\nshortest paths are used to estimate the true distances on the manifold S. For example, this\nmethod forms the basis of the well-known Isomap method [4].\n\nUnfortunately, estimating the distance dtransitive(·, ·) using shortest path computations is\nnot robust to errors in the local distances, which are very common. Consider a patch that\ncontains the letter A and another one that contains the letter B. Since they are different\nletters, we expect that these patches would be quite distant on the manifold S. 
However,\namong the A patches there will inevitably be a very blurry A that would look quite similar\nto a very blurry B, producing an erroneous local distance measurement. When the transitive\nglobal distances are computed using shortest paths, a single erroneous edge will single-handedly\ncause all the A patches to be much closer to all the B patches, short-circuiting\nthe graph and completely distorting all the distances.\n\nSuch errors lead to the leakage problem in estimating the global distances of patches. This\nproblem is illustrated in Figure 4. In this example, our underlying manifold S is a triangle.\nSuppose our local distance function erroneously estimates an edge between the corners of\nthe triangle as shown in the figure. After the erroneous edge is inserted, the shortest paths\nfrom the top of the triangle leak through this edge. Therefore, the shortest path distances\nwill fail to reflect the true distance on the manifold.\n\n\n5 Solving the leakage problem\n\nRecall that our goal is to find the center of our data set as defined in Equation 1. Note that,\nin order to compute the center, we do not need all pairwise distances. All we need is the\nquantity dI(Ii) = Σ_{Ij ∈ I} d(Ii, Ij) for all Ii.\n\nThe leakage problem occurs when we compute the values dI(Ii) using the shortest path\nmetric. In this case, even a single erroneous edge may reduce the shortest paths from many\ndifferent patches to Ii, changing the value of dI(Ii) drastically. Intuitively, in order to\nprevent the leakage problem we must prevent edges from getting involved in many shortest\npath computations to the same node (i.e. leaking edges). We can formalize this notion by\ncasting the computation as a network flow problem.\n\nLet G = (V, E) be our graph representation such that for each patch Ii ∈ I, there is a\nvertex vi ∈ V. The edge set E is built as follows: there is an edge (vi, vj) if dlocal(Ii, Ij)\nis less than a threshold. 
The weight of the edge (vi, vj) is equal to dlocal(Ii, Ij).\n\nTo compute the value dI(Ii), we build a flow network whose vertex set is also V. All\nvertices in V - {vi} are sources, pushing unit flow into the network. The vertex vi is a sink\nwith infinite capacity. The arcs of the flow network are chosen using the edge set E. For\neach edge (vj, vk) ∈ E we add the arcs vj → vk and vk → vj. Both arcs have infinite\ncapacity and the cost of pushing one unit of flow on either arc is equal to the weight of\n(vj, vk), as shown in Figure 4 left (top and bottom). It can easily be seen that the cost of the minimum\ncost flow in this network is equal to dI(Ii). We call this network, which is used to\ncompute dI(Ii), NW(Ii).\n\nThe crucial factor in designing such a flow network is choosing the right cost and capacity.\nComputing the minimum cost flow on NW(Ii) not only gives us dI(Ii) but also allows us\nto compute how many times an edge is involved in the distance computation: the amount of\nflow through an edge is exactly the number of times that edge is used for the shortest path\ncomputations. This is illustrated in Figure 4 (box A), where d1 units of cost are charged for\neach unit of flow through the edge (u, w). Therefore, if we prevent too much flow from going\nthrough an edge, we can prevent the leakage problem.\n\n[Figure 4 panels: left, triangle graphs with vertices u, v, w and the erroneous edge marked \"Error\";\nA: Shortest Path; B: Convex Flow; C: Shortest Path with Capacity; arc labels such as d1/c1 and d2/c2.]\n\nFigure 4: The leakage problem. Left: Equivalence of shortest path leakage and uncapacitated flow\nleakage problem. Bottom-middle: After the erroneous edge is inserted, the shortest paths from the\ntop of the triangle to vertex v go through this edge. Boxes A-C: Alternatives for charging a unit of\nflow between nodes u and w. The horizontal axis of the plots is the amount of flow and the vertical\naxis is the cost. Box A: Linear flow. 
The cost of a unit of flow is d1. Box B: Convex flow. Multiple\nedges are introduced between two nodes, with fixed capacity, and convexly increasing costs. The cost\nof a unit of flow increases from d1 to d2 and then to d3 as the amount of flow from u to w increases.\nBox C: Linear flow with capacity. The cost is d1 until a capacity of c1 is reached and becomes\ninfinite afterwards.\n\n\n\nOne might think that the leakage problem can simply be avoided by imposing capacity\nconstraints on the arcs of the flow network (Figure 4, box C). Unfortunately, this is not\neasy. Observe that in the minimum cost flow solution of the network NW(Ii), the\namount of flow on the arcs will increase as the arcs get closer to Ii. Therefore, when we are\nsetting up the network NW(Ii), we must adaptively increase the capacities of arcs \"closer\"\nto the sink vi; otherwise, there will be no feasible solution. As the structure of the graph\nG gets complicated, specifying this notion of closeness becomes a subtle issue. Further,\nthe structure of the underlying space S could be such that some arcs in G must indeed\ncarry a lot of flow. Therefore, imposing capacities on the arcs requires understanding the\nunderlying structure of the graph G as well as the space S, which is in fact the problem\nwe are trying to solve!\n\nOur proposed solution to the leakage problem uses the notion of a convex flow. We do not\nimpose a capacity on the arcs. Instead, we impose a convex cost function on the arcs such\nthat the cost of pushing unit flow on arc a increases as the total amount of flow through a\nincreases. See Figure 4, box B.\n\nThis can be achieved by transforming the network NW(Ii) to a new network NW'(Ii).\nThe transformation is achieved by applying the following operation on each arc in\nNW(Ii): Let a be an arc from u to w in NW(Ii). In NW'(Ii), we replace a by k\narcs a1, . . . , ak. 
The costs of these arcs are chosen to be strictly increasing so that\ncost(a1) < cost(a2) < . . . < cost(ak). The capacity of arc ak is infinite. The weights\nand capacities of the other arcs are chosen to reflect the steepness of the desired convexity\n(Figure 4, box B). The network shown in the figure yields the following function for the\ncost of pushing x units of flow through the arc, where d1, d2, d3 are the unit costs of the\nthree arcs and c1, c2 are the capacities of the first two:\n\n cost(x) = d1 x, if 0 ≤ x ≤ c1\n cost(x) = d1 c1 + d2 (x - c1), if c1 ≤ x ≤ c1 + c2 (3)\n cost(x) = d1 c1 + d2 c2 + d3 (x - c1 - c2), if c1 + c2 ≤ x\n\nThe advantage of this convex flow computation is twofold. First, it does not require putting\nthresholds on the arcs a priori: it is always feasible to have as much flow on a single arc as\nrequired. Second, the minimum cost flow will avoid the leakage problem because it will\nbe costly to use an erroneous edge to carry the flow from many different patches.\n\n\n5.1 Fixing the leakage in Isomap\n\nAs noted earlier, the Isomap method [4] uses the shortest path measurements to estimate\na distance matrix M. Afterwards, M is used to find an embedding of the manifold S via\nmultidimensional scaling (MDS).\n\nAs expected, this method also suffers from the leakage problem, as demonstrated in Figure 5.\nThe top-left image in Figure 5 shows our ground truth. In the middle row, we\npresent an embedding of these graphs computed using Isomap, which uses the shortest path\nlength as the global distance measure. As illustrated in these figures, even though Isomap\ndoes a good job in embedding the ground truth when there are no errors, the embedding\n(or manifold) collapses after we insert the erroneous edges. In contrast, when we use the\nconvex-flow based technique to estimate the distances, we recover the true embedding\neven in the presence of erroneous edges (Figure 5 bottom row).\n\n\n6 Results\n\nIn our experiments we used 800 image frames to reconstruct the ground truth image. 
We\nfixed patches of size 30 × 30 in each frame at the same location (see top of Figure 7 for two\nsets of examples), and for every location we found the center. The middle row of Figure\n7 shows embeddings of the patches computed using the distance derived from the convex\nflow. The transition path and the morphing from selected patches (A, B, C) to the center\npatch (F) are shown at the bottom.\n\nThe embedding plot on the left is considered an easier case, with a Gaussian-like embedding\n(the graph is denser close to the center) and smooth transitions between the patches in\na transition path. The plot to the right shows a more difficult example, where the embedding\nno longer has a Gaussian shape, but rather a triangular one. Also note that the transitions\ncan have jumps connecting non-similar patches which are distant in the embedding space.\nThe two extremes of the triangle represent the blurry patches, which are so numerous and\n\n[Figure 5: 3 x 3 grid of 2D embedding plots with labeled points A, B and C; rows from top to bottom:\nGround Truth, Isomap [4], Convex flow; axes span roughly -0.6 to 0.6.]\n\nFigure 5: Top row: Ground truth. After sampling points from a triangular disk, a kNN graph is\nconstructed to provide a local measure for the embedding (left). Additional erroneous edges AC\nand CB are added to perturb the local measure (middle, right). Middle row: Isomap embedding.\nIsomap recovers the manifold for the error-free cases (left). 
However, all-pairs shortest paths can\n\"leak\" through AC and CB, resulting in a significant change in the embedding. Bottom row: Convex\nflow embedding. Convex flow penalizes too many paths going through the same edge, correcting\nthe leakage problem. The resulting embedding is more resistant to perturbations in the kNN graph.\n\n\n\nvery similar to each other that they are no longer treated as noise or outliers. This\nresults in `folding in' the embedding and thus moving the estimated center towards the\nblurry patches. To solve this problem, we introduced two additional centers, which ideally\nwould represent the blurry patches, allowing the third center to move to the ground truth.\n\nOnce we had found the centers for all patches, we stitched them together to form the\ncomplete reconstructed image. In the case of three centers, we use overlapping patches and\ndynamic programming to determine the best stitching. Figure 6 shows the reconstruction\nresult of our algorithm compared to simple methods of taking the mean/median of the\npatches and finding the closest patch to them. The bottom row shows our result for a single\nand for three center patches. The better performance of the latter suggests that the two new\ncenters relieve the correct center from the blurry patches.\n\nFigure 6: Comparison of reconstruction results of different methods using the first 800 frames. Top:\npatches stitched together which are closest to mean (left) and median (right). Bottom: our results\nusing a single (left) and three (right) centers.\n\nFor a graph with n vertices and m edges, the minimum cost flow computation takes\nO(m log n (m + n log n)) time. Since one flow problem is solved per candidate patch, finding\nthe center of one set of patches can be done in O(mn log n (m + n log n)) time. Our flow\ncomputation is based on the min-cost max-flow implementation by Goldberg [7]. 
The convex function used in our experiments\nwas as described in Equation 3 with parameters d1 = 1, c1 = 1, d2 = 5, c2 = 9, d3 = 50.\n\n[Figure 7 panels: top, sample patches; middle, two embedding plots with labeled patches A, B, C and\ncenter F (left) and A1, A2, B1, B2, C1, C2 with centers FA, FB, FC and F (right); bottom, morphing\nsequences between the labeled patches and their centers.]\n\nFigure 7: Top row: sample patches (two different locations) from 800 frames. Middle row:\nConvex flow embedding, showing the transition paths. Bottom row: corresponding patches (A, B,\nC, A1, A2, B1, B2, C1, C2) and the morphing of them to the centers F, FA, FB, FC, respectively.\n\n\n7 Conclusion\n\nIn this paper, we studied the problem of recovering an underwater image from a video\nsequence. Because of the surface waves, the sequence consists of distorted versions of\nthe image to be recovered. The novelty of our work is in the formulation of the reconstruction\nproblem as a manifold embedding problem. Our contribution also includes a new\ntechnique, based on convex flows, to recover global distances on the manifold in a robust\nfashion. This technique solves the leakage problem inherent in recent embedding methods.\n\n\nReferences\n\n[1] Lev S. Dolin, Alexander G. Luchinin, and Dmitry G. Turlaev. Correction of an underwater object\n image distorted by surface waves. In International Conference on Current Problems in Optics of\n Natural Waters, pages 24-34, St. Petersburg, Russia, 2003.\n\n[2] Charles Cox and Walter H. Munk. Slopes of the sea surface deduced from photographs of sun\n glitter. Scripps Inst. of Oceanogr. Bull., 6(9):401-479, 1956.\n\n[3] Brendan Frey and Nebojsa Jojic. Learning mixture models of images and inferring spatial\n transformations using the EM algorithm. In IEEE Conference on Computer Vision and Pattern\n Recognition, pages 416-422, Fort Collins, June 1999.\n\n[4] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for\n nonlinear dimensionality reduction. Science, 290(5500):2319-2323, Dec 22 2000.\n\n[5] Sam Roweis and Lawrence Saul. 
Nonlinear dimensionality reduction by locally linear embedding.\n Science, 290(5500):2323-2326, Dec 22 2000.\n\n[6] Bernd Fischer, Volker Roth, and Joachim M. Buhmann. Clustering with the connectivity kernel.\n In Advances in Neural Information Processing Systems 16. MIT Press, 2004.\n\n[7] Andrew V. Goldberg. An efficient implementation of a scaling minimum-cost flow algorithm.\n Journal of Algorithms, 22:1-29, 1997.\n\n\n", "award": [], "sourceid": 2618, "authors": [{"given_name": "Alexei", "family_name": "Efros", "institution": null}, {"given_name": "Volkan", "family_name": "Isler", "institution": null}, {"given_name": "Jianbo", "family_name": "Shi", "institution": null}, {"given_name": "Mirk\u00f3", "family_name": "Visontai", "institution": null}]}