{"title": "Affine Structure From Sound", "book": "Advances in Neural Information Processing Systems", "page_first": 1353, "page_last": 1360, "abstract": null, "full_text": "Af\ufb01ne Structure From Sound\n\nSebastian Thrun\nStanford AI Lab\n\nStanford University, Stanford, CA 94305\n\nEmail: thrun@stanford.edu\n\nAbstract\n\nWe consider the problem of localizing a set of microphones together\nwith a set of external acoustic events (e.g., hand claps), emitted at un-\nknown times and unknown locations. We propose a solution that ap-\nproximates this problem under a far \ufb01eld approximation de\ufb01ned in the\ncalculus of af\ufb01ne geometry, and that relies on singular value decompo-\nsition (SVD) to recover the af\ufb01ne structure of the problem. We then\nde\ufb01ne low-dimensional optimization techniques for embedding the solu-\ntion into Euclidean geometry, and further techniques for recovering the\nlocations and emission times of the acoustic events. The approach is use-\nful for the calibration of ad-hoc microphone arrays and sensor networks.\n\n1\n\nIntroduction\n\nConsider a set of acoustic sensors (microphones) for detecting acoustic events in the envi-\nronment (e.g., a hand clap). The structure from sound (SFS) problem addresses the prob-\nlem of simultaneously localizing a set of N sensors and a set of M external acoustic events,\nwhose locations and emission times are unknown.\n\nThe SFS problem is relevant to the spatial calibration problem for microphone arrays.\nClassically, microphone arrays are mounted on \ufb01xed brackets of known dimensions; hence\nthere is no spatial calibration problem. Ad-hoc microphone arrays, however, involve a per-\nson placing microphones at arbitrary locations with limited knowledge as to where they\nare. Today\u2019s best practice requires a person to measure the distance between the micro-\nphones by hand, and to apply algorithms such as multi-dimensional scaling (MDS) [1] for\nrecovering their locations. 
When sensor networks are deployed from the air [4], manual calibration may not be an option. Some techniques rely on GPS receivers [8]. Others require a capability to emit and sense wireless radio signals [5] or sounds [9, 10], which are then used to estimate relative distances between microphones (directly or indirectly, as in [9]). Unfortunately, wireless signal strength is a poor estimator of range, and active acoustic and GPS localization techniques are uneconomical in that they consume energy and require additional hardware. In contrast, SFS relies on environmental acoustic events such as hand claps, which are not generated by the sensor network. The general SFS problem was previously treated in [2] under the name passive localization. A related paper [3] describes a technique for incrementally localizing a microphone relative to a well-calibrated microphone array through external sound events.

In this paper, the structure from sound (SFS) problem is defined as the simultaneous localization of N sound sensors and M acoustic events in the environment detected by these sensors. Each event occurs at an unknown time and an unknown location. The sensors are able to measure the detection times of the events. We assume that the clocks of the sensors are synchronized (see [6]); that events are spaced sufficiently far apart in time to make the association between different sensors unambiguous; and that there is no sound reverberation. For ease of presentation, the paper assumes a 2D world, although the technique is easily generalized to 3D.

Under the assumption of independent and identically distributed (iid) Gaussian noise, the SFS problem can be formulated as a least squares problem over three types of variables: the locations of the microphones, the locations of the acoustic events, and the emission times of those events. 
However, this least squares problem is plagued by local minima, and the number of constraints is quite large.

The gist of this paper is to transform this optimization problem into a sequence of simpler problems, some of which can be solved optimally, without the danger of getting stuck in local minima. The key transformation involves a far field approximation, which presupposes that the sound sources are relatively far away from the sensors. This approximation reformulates the problem as one of recovering the incident angle of the acoustic signal, which is the same for all sensors for any fixed acoustic event. The resulting optimization problem is still non-linear; however, by relaxing the laws of Euclidean geometry into the more general calculus of affine geometry, it can be solved by singular value decomposition (SVD). The resulting solution is mapped back into Euclidean space by optimizing a matrix of size 2 × 2, which is easily carried out using gradient descent. A subsequent non-linear optimization step overcomes the far field approximation and enables the algorithm to recover the locations and emission times of the defining acoustic events. Experimental results illustrate that our approach reliably solves hard SFS problems where gradient-based techniques consistently fail.

Our approach is similar in spirit to the affine solution to the structure from motion (SFM) problem proposed in a seminal paper by Tomasi and Kanade [11], which was later extended to the non-orthographic case [7]. Like us, these authors expressed the structure finding problem using affine geometry, and applied SVD to solve it. SFM is of course defined for cameras, not for microphone arrays: cameras measure angles, whereas microphones measure range. 
This paper establishes an affine solution to the structure from sound problem that tends to work well in practice.

2 Problem Definition

2.1 Setup

We are given N sensors (microphones) located in a 2D plane. We shall denote the location of the i-th sensor by (x_i, y_i), which defines the following sensor location matrix of size N × 2:

X = \begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \\ \vdots & \vdots \\ x_N & y_N \end{pmatrix}    (1)

We assume that the sensor array detects M acoustic events. Each event has an unknown coordinate and an unknown emission time. The coordinate of the j-th event shall be denoted (a_j, b_j), providing us with the event location matrix A of size M × 2. The emission time of the j-th acoustic event is denoted t_j, resulting in the vector T of length M:

A = \begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \\ \vdots & \vdots \\ a_M & b_M \end{pmatrix}, \qquad T = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_M \end{pmatrix}    (2)

X, A, and T comprise the set of unknown variables. In problems such as sensor calibration, only X is of interest. In general SFS applications, A and T might also be of interest.

2.2 Measurement Data

In SFS, the variables X, A, and T are recovered from data. The data establish the detection times of the acoustic events at the individual sensors. Specifically, the data matrix is of the form:

D = \begin{pmatrix} d_{1,1} & d_{1,2} & \cdots & d_{1,M} \\ d_{2,1} & d_{2,2} & \cdots & d_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ d_{N,1} & d_{N,2} & \cdots & d_{N,M} \end{pmatrix}    (3)

Here each d_{i,j} denotes the detection time of acoustic event j by sensor i. Notice that we assume that there is no data association problem. Even if all acoustic events sound alike, the correspondence between different detections is easily established as long as there exist sufficiently long time gaps between any two sound events.

The matrix D is a random field induced by the laws of sound propagation (without reverberation). 
In the absence of measurement noise, each d_{i,j} is the sum of the corresponding emission time t_j plus the time it takes for sound to travel from (a_j, b_j) to (x_i, y_i):

d_{i,j} = t_j + c^{-1} \left| \begin{pmatrix} x_i \\ y_i \end{pmatrix} - \begin{pmatrix} a_j \\ b_j \end{pmatrix} \right|    (4)

Here | · | denotes the L2 norm (Euclidean distance), and c denotes the speed of sound.

2.3 Relative Formulation

Obviously, we cannot recover the global coordinates of the sensors. Hence, without loss of generality, we define the first sensor's location as x_1 = y_1 = 0. This gives us the relative location matrix for the sensors:

\bar{X} = \begin{pmatrix} x_2 - x_1 & y_2 - y_1 \\ x_3 - x_1 & y_3 - y_1 \\ \vdots & \vdots \\ x_N - x_1 & y_N - y_1 \end{pmatrix}    (5)

This relative sensor location matrix is of dimension (N-1) × 2.

It shall prove convenient to subtract from the arrival time d_{i,j} the arrival time d_{1,j} measured by the first sensor i = 1. This relative arrival time is defined as \Delta_{i,j} := d_{i,j} - d_{1,j}. In the relative arrival times, the absolute emission times t_j cancel out (recall that sensor 1 sits at the origin):

\Delta_{i,j} = t_j + c^{-1} \left| \begin{pmatrix} x_i \\ y_i \end{pmatrix} - \begin{pmatrix} a_j \\ b_j \end{pmatrix} \right| - t_j - c^{-1} \left| \begin{pmatrix} a_j \\ b_j \end{pmatrix} \right| = c^{-1} \left\{ \left| \begin{pmatrix} x_i \\ y_i \end{pmatrix} - \begin{pmatrix} a_j \\ b_j \end{pmatrix} \right| - \left| \begin{pmatrix} a_j \\ b_j \end{pmatrix} \right| \right\}    (6)

We now define the matrix of relative arrival times:

\Delta = \begin{pmatrix} d_{2,1} - d_{1,1} & d_{2,2} - d_{1,2} & \cdots & d_{2,M} - d_{1,M} \\ d_{3,1} - d_{1,1} & d_{3,2} - d_{1,2} & \cdots & d_{3,M} - d_{1,M} \\ \vdots & \vdots & \ddots & \vdots \\ d_{N,1} - d_{1,1} & d_{N,2} - d_{1,2} & \cdots & d_{N,M} - d_{1,M} \end{pmatrix}    (7)

This matrix \Delta is of dimension (N-1) × M.

2.4 Least Squares Formulation

The relative sensor locations X and the corresponding locations of the acoustic events A can now be recovered through the following least squares problem, which seeks to identify X and A so as to minimize the quadratic difference between the predicted relative measurements and the actual measurements:

\langle A^*, X^* \rangle = \operatorname{argmin}_{X, A} \sum_{i=2}^{N} \sum_{j=1}^{M} \left\{ \left| \begin{pmatrix} x_i \\ y_i \end{pmatrix} - \begin{pmatrix} a_j \\ b_j \end{pmatrix} \right| - \left| \begin{pmatrix} a_j \\ b_j \end{pmatrix} \right| - c \, \Delta_{i,j} \right\}^2    (8)

The minimum of this expression is a maximum likelihood solution for the SFS problem under the assumption of iid Gaussian measurement noise.

If the emission times are of interest, they are now easily recovered by averaging over the sensors:

T_j^* = \frac{1}{N} \sum_{i=1}^{N} \left( d_{i,j} - c^{-1} \left| \begin{pmatrix} x_i \\ y_i \end{pmatrix} - \begin{pmatrix} a_j \\ b_j \end{pmatrix} \right| \right)    (9)

The minimum of Eq. 8 is not unique. This is because any solution can be rotated around the origin of the coordinate system, and mirrored through any axis intersecting the origin. This shall not concern us, as we shall be content with any solution of Eq. 8; the others are then easily generated.

What is of concern, however, is the fact that minimizing Eq. 8 is difficult. 
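As an illustration, the measurement model of Eqs. 3-7 is easy to simulate. The sketch below is not part of the paper's algorithm; the speed of sound (343 m/s), the random geometry, and all variable names are illustrative assumptions. It builds the data matrix D and verifies that the emission times cancel in the relative arrival times:

```python
import numpy as np

rng = np.random.default_rng(0)
c = 343.0                                  # assumed speed of sound in m/s

N, M = 9, 9                                # number of sensors and events
X = rng.uniform(0.0, 1.0, size=(N, 2))     # sensor locations (x_i, y_i)
A = 5.0 * rng.standard_normal((M, 2))      # event locations (a_j, b_j)
T = rng.uniform(0.0, 60.0, size=M)         # unknown emission times t_j

# Eq. 4: detection time = emission time + sound travel time.
dist = np.linalg.norm(X[:, None, :] - A[None, :, :], axis=2)   # N x M distances
D = T[None, :] + dist / c                  # data matrix of Eq. 3

# Eq. 7: subtracting the first sensor's row cancels the t_j,
# leaving the (N-1) x M matrix of relative arrival times.
Delta = D[1:, :] - D[0, :]

assert Delta.shape == (N - 1, M)
assert np.allclose(Delta, (dist[1:, :] - dist[0, :]) / c)
```

The final assertion checks exactly the cancellation derived in Eq. 6: the relative arrival times depend only on geometry, not on the unknown emission times.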
A straw man algorithm involves starting with random guesses for X and A and then adjusting them in the direction of the negative gradient until convergence. As we shall show experimentally, such gradient algorithms work poorly in practice because of the large number of local minima.

3 The Far Field Approximation

The essence of our approximation pertains to the fact that for far range acoustic events, i.e., events that are (infinitely) far away from the sensor array, the incoming sound wave hits each sensor at the same incident angle. Put differently, the rays connecting the location of an acoustic event (a_j, b_j) with each of the perceiving sensors (x_i, y_i) are approximately parallel for all i (but not for all j!). Under the far field approximation, these rays are exactly parallel. Thus, all that matters are the incident angles of the acoustic events.

To derive an equation for this case, it shall prove convenient to write the Euclidean distance between a sensor and an acoustic event as a function of the incident angle \alpha. This angle is given by the four-quadrant extension of the arctan function:

\alpha_{i,j} = \operatorname{arctan2}(b_j - y_i,\; a_j - x_i)    (10)

The Euclidean distance between (a_j, b_j) and (x_i, y_i) can now be written as

\left| \begin{pmatrix} x_i \\ y_i \end{pmatrix} - \begin{pmatrix} a_j \\ b_j \end{pmatrix} \right| = (\cos \alpha_{i,j} \;\; \sin \alpha_{i,j}) \begin{pmatrix} a_j - x_i \\ b_j - y_i \end{pmatrix}    (11)

For far-away points (a_j, b_j), we can safely assume that all incident angles for the j-th acoustic event are identical:

\alpha_j := \alpha_{1,j} = \alpha_{2,j} = \cdots = \alpha_{N,j}    (12)

Hence we substitute \alpha_j for \alpha_{i,j} in Eq. 11. Plugging this back into Eq. 6 gives us the following expression for \Delta_{i,j}:

\Delta_{i,j} = c^{-1} \left\{ \left| \begin{pmatrix} x_i \\ y_i \end{pmatrix} - \begin{pmatrix} a_j \\ b_j \end{pmatrix} \right| - \left| \begin{pmatrix} a_j \\ b_j \end{pmatrix} \right| \right\} \approx c^{-1} (\cos \alpha_j \;\; \sin \alpha_j) \left[ \begin{pmatrix} a_j - x_i \\ b_j - y_i \end{pmatrix} - \begin{pmatrix} a_j \\ b_j \end{pmatrix} \right] = -c^{-1} (\cos \alpha_j \;\; \sin \alpha_j) \begin{pmatrix} x_i \\ y_i \end{pmatrix}    (13)

The minus sign can be absorbed into the incident angle (by replacing \alpha_j with \alpha_j + \pi), so we drop it below. This leads to the following non-linear least squares problem for the desired sensor locations, where for notational convenience the known constant c^{-1} is also absorbed into X:

\langle X^*, \alpha_1^*, \ldots, \alpha_M^* \rangle = \operatorname{argmin}_{X, \alpha_1, \ldots, \alpha_M} \left| X \begin{pmatrix} \cos \alpha_1 & \cos \alpha_2 & \cdots & \cos \alpha_M \\ \sin \alpha_1 & \sin \alpha_2 & \cdots & \sin \alpha_M \end{pmatrix} - \Delta \right|^2    (14)

The reader may notice that in this formulation of the SFS problem, the locations of the sound events (a_j, b_j) have been replaced by \alpha_j, the incident angles of the sound waves. One might think of this as the "ortho-acoustic" model of sound propagation (in analogy to the orthographic camera model in computer vision). The ortho-acoustic projection reduces the number of variables in the optimization. However, the argument of the quadratic expression is still non-linear, due to the trigonometric functions involved.

4 Affine Solution for the Sensor Locations

Eq. 14 is trivially solvable in the space of affine geometry. 
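To get a feeling for the quality of the far field approximation underlying this relaxation, one can compare the exact relative arrival times of Eq. 6 with the linear model of Eq. 13 as the event moves away from the array. The sketch below is illustrative only; the sensor region, the event ranges, and the 343 m/s speed of sound are assumptions, and the incident direction is expressed as a unit vector pointing from the event toward the array (i.e., with the sign of Eq. 13 absorbed):

```python
import numpy as np

rng = np.random.default_rng(1)
c = 343.0                                    # assumed speed of sound in m/s
X = rng.uniform(0.0, 1.0, size=(9, 2))       # sensors in a 1 m x 1 m region
X[0] = 0.0                                   # sensor 1 at the origin

theta = rng.uniform(0.0, 2.0 * np.pi)        # one fixed event direction
errors = []
for r in (2.0, 10.0, 100.0):                 # event range in meters
    a = r * np.array([np.cos(theta), np.sin(theta)])   # event location
    # Eq. 6: exact relative arrival times (sensor 1 is the reference).
    exact = (np.linalg.norm(X - a, axis=1) - np.linalg.norm(a)) / c
    # Eq. 13 with the sign absorbed: unit vector from event toward the array.
    u = -a / np.linalg.norm(a)
    approx = (X @ u) / c
    errors.append(np.abs(exact - approx).max())

# The worst-case approximation error shrinks as the event moves away.
assert errors[0] > errors[1] > errors[2]
```

The error of the far field model falls off roughly with the inverse of the event range, which is consistent with the robustness experiments reported in Section 6.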
Following [11], in affine geometry projections can be arbitrary linear functions, not just rotations and translations. Specifically, let us replace the specialized matrix

\begin{pmatrix} \cos \alpha_1 & \cos \alpha_2 & \cdots & \cos \alpha_M \\ \sin \alpha_1 & \sin \alpha_2 & \cdots & \sin \alpha_M \end{pmatrix}    (15)

by a general 2 × M matrix of the form

\Gamma = \begin{pmatrix} \gamma_{1,1} & \gamma_{1,2} & \cdots & \gamma_{1,M} \\ \gamma_{2,1} & \gamma_{2,2} & \cdots & \gamma_{2,M} \end{pmatrix}    (16)

This leads to the least squares problem

\langle X^*, \Gamma^* \rangle = \operatorname{argmin}_{X, \Gamma} \left| X \Gamma - \Delta \right|^2    (17)

In the noise-free case, we know that there must exist an X and a \Gamma for which X\Gamma = \Delta. This suggests that the rank of \Delta should be 2, since it is the product of a matrix of size (N-1) × 2 and a matrix of size 2 × M.

Further, we can recover both X and \Gamma via singular value decomposition (SVD). Specifically, we know that the matrix \Delta can be decomposed into three matrices U, V, and W:

U V W^T = \operatorname{svd}(\Delta)    (18)

where U is a matrix of size (N-1) × 2, V a diagonal matrix of singular values of size 2 × 2, and W a matrix of size M × 2. In practice, \Delta might be of higher rank because of noise or because of violations of the far field assumption, but it suffices to restrict the consideration to the first two singular values.

The decomposition in Eq. 18 leads to the optimal affine solution of the SFS problem:

X = U V \qquad \text{and} \qquad \Gamma = W^T    (19)

However, this solution is not yet Euclidean, since \Gamma might not be of the form of Eq. 15. Specifically, Eq. 15 is a function of angles, and each column j of Eq. 15 satisfies \cos^2 \alpha_j + \sin^2 \alpha_j = 1. 
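The rank-2 factorization of Eqs. 18-19 can be sketched on noise-free synthetic far field data; the setup below (random relative sensor locations and incident angles, all names illustrative) is an assumption for the demonstration, not the paper's experimental configuration:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 10, 12
Xbar = rng.uniform(-0.5, 0.5, size=(N - 1, 2))   # relative sensor locations
alpha = rng.uniform(0.0, 2.0 * np.pi, size=M)    # incident angles
G = np.vstack([np.cos(alpha), np.sin(alpha)])    # 2 x M matrix of Eq. 15
Delta = Xbar @ G                                 # noise-free data, as in Eq. 14

# Eq. 18: singular value decomposition Delta = U V W^T.
U, s, Wt = np.linalg.svd(Delta, full_matrices=False)

# Noise-free far field data has rank 2: all but two singular values vanish.
assert s[2] < 1e-10 * s[0]

# Eq. 19: the optimal affine solution, keeping the top two singular values.
X_aff = U[:, :2] * s[:2]       # X = U V
Gamma = Wt[:2, :]              # Gamma = W^T
assert np.allclose(X_aff @ Gamma, Delta)
```

The affine pair (X_aff, Gamma) reproduces the data exactly, but it is Euclidean only up to an invertible 2 × 2 transformation; pinning that transformation down is the subject of the next step.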
Clearly, this constraint is not enforced in the SVD.

However, there is an easy "trick" for recovering an X and a \Gamma for which this constraint is at least approximately met. The key insight is that for any invertible 2 × 2 matrix C,

X' = U V C^{-1} \qquad \text{and} \qquad \Gamma' = C W^T    (20)

is equally a solution to the factorization problem in Eq. 18. This is because

X' \Gamma' = U V C^{-1} C W^T = U V W^T = X \Gamma    (21)

The remaining search problem, thus, is that of finding an appropriate matrix C for which \Gamma' is of the form of Eq. 15. This is a non-linear optimization problem, but it is of much lower dimension than the original SFS problem (it only involves 4 parameters!).

Specifically, we seek a C for which \Gamma' = C W^T minimizes

C^* = \operatorname{argmin}_C \Big| \underbrace{(1 \;\; 1) \, (\Gamma' \odot \Gamma')}_{(*)} - (1 \;\; 1 \;\; \cdots \;\; 1) \Big|^2    (22)

Here "\odot" denotes the element-wise product. The expression labeled (*) evaluates to a vector of expressions of the form

(\gamma_{1,1}^2 + \gamma_{2,1}^2 \;\;\; \gamma_{1,2}^2 + \gamma_{2,2}^2 \;\;\; \cdots \;\;\; \gamma_{1,M}^2 + \gamma_{2,M}^2)    (23)

Figure 1: (a) Error and (b) log-error for three different algorithms: gradient descent (red), SVD (blue), and SVD followed by gradient descent (green). Performance is shown for different values of N and M, with N = M. The plot also shows 95% confidence bars.

Figure 2: Typical SFS results for a simulated array of nine microphones spaced in a regular grid, surrounded by nine sounds arranged on a circle. (a) Ground truth; (b) result of plain gradient descent after convergence; the dashed lines visualize the residual error; (c) result of the SVD with sound directions as indicated; and (d) result of gradient descent initialized with our SVD result.

The minimization in Eq. 22 is carried out through standard gradient descent. It involves only 4 variables (C is of size 2 × 2), and each iteration is linear in O(N + M) (instead of the O(NM) constraints that define Eq. 8). In (tens of thousands of) experiments with synthetic noise-free data, we find empirically that gradient descent reliably converges to the globally optimal solution.

5 Recovering the Acoustic Event Locations and Emission Times

With regard to the acoustic events, the optimization for the far field case only yields the incident angles. In the near field setting, in which the incident angles tend to differ between sensors, it may be desirable to recover the locations A of the acoustic events and the corresponding emission times T.

To determine these variables, we use the estimate X^* from the far field case as a mere starting point in a subsequent gradient search. 
The event location matrix A is initialized by selecting points sufficiently far away along the estimated incident angles for the far field approximation to be sound:

A = k \, (\Gamma'^*)^T    (24)

Here \Gamma'^* = C^* W^T with C^* defined in Eq. 22, and k is a multiple of the diameter of the sensor configuration X. With this initial guess for A, we apply gradient descent to optimize Eq. 8, and finally use Eq. 9 to recover T.

6 Experimental Results

We ran a series of simulation experiments to characterize the quality of our algorithm, especially in comparison with the obvious nonlinear least squares formulation (Eq. 8) from which it is derived. Fig. 1 graphs the residual error as a function of the number of sensors N and acoustic events M (here N = M). Panel (a) plots the regular error along with 95% confidence intervals, and panel (b) the corresponding log-error. Clearly, as N and M increase, plain gradient descent tends to diverge, whereas our approach converges. Each data point in these graphs was obtained by averaging 1,000 random configurations, in which sensors were sampled uniformly within a 1 m × 1 m square; sounds were placed at varying ranges, from 2 m to 10 m. An example outcome (for a non-random configuration!) is shown in Fig. 2. This figure plots (a) a simulated sensor array consisting of nine sensors with nine sound sources arranged in a circle; and (b)-(d) the resulting reconstructions of our three methods. For the SVD result shown in (c), only the directions of the incoming sounds are shown.

An interesting question pertains to the effect of the far field approximation in cases where it is clearly violated. To examine the robustness of our approach, we ran a series of experiments in which we varied the diameter of the region containing the acoustic events relative to the diameter of the sensor array. If this parameter is 1, the acoustic events are emitted in the same region as the microphones; for values such as 10, the events are far away.

Figure 3: (a) Error and (b) log-error for three different algorithms (gradient descent in red, SVD in blue, and SVD followed by gradient descent in green), graphed here for varying distances of the sound events to the sensor array. An error above 2 means the reconstruction has entirely failed. All diagrams also show the 95% confidence intervals, and we set N = M = 10.

Figure 4: Results using our Crossbow sensor motes as the sensor array, and an additional mote to generate sound events. (a) A mote; (b) the globally optimal solution (big circles) compared to the hand-measured locations (small circles); (c) a typical result of vanilla gradient descent; and (d) the result of our approach, all compared to the optimal solution given the (noisy) data.

Fig. 3 graphs the residual errors and log-errors. The further away the acoustic events, the better our results. 
However, even for nearby events, for which the far field assumption is clearly invalid, our approach generates results that are no worse than those of the plain gradient descent technique.

We also implemented our approach using a physical sensor array. Fig. 4 plots empirical results using a microphone array comprised of seven Crossbow sensor motes, one of which is shown in Panel (a). Panels (b)-(d) compare the recovered structure with the one that globally minimizes the LMS error, which we obtain by running gradient descent using the hand-measured locations as the starting point. Panel (b) in Fig. 4 shows the manually measured locations; the relatively high deviation from the LMS optimum is the result of measurement error, which is amplified by the fact that our motes are only spaced a few tens of centimeters apart from each other (the standard deviation of the timing error corresponds to a distance of 6.99 cm, and the motes are placed between 14 cm and 125 cm apart). Panel (c) in Fig. 4 shows the solution of plain gradient descent applied to Eq. 8 and compares it to the optimal reconstruction; and Panel (d) illustrates our solution. In all plots the lines indicate residual error. This result shows that our method may work well on real-world data that is noisy and that does not adhere to the far field assumption.

7 Discussion

This paper considered the structure from sound problem and presented an algorithm for solving it. Our approach makes it possible to simultaneously recover the locations of a collection of microphones, the locations of external acoustic events detected by these microphones, and the emission times of these events. By resorting to affine geometry, our approach overcomes the problem of local minima in the structure from sound problem.

There remain a number of open research issues. We believe the extension to 3-D is mathematically straightforward but requires empirical validation. 
The current approach also fails to address reverberation problems that are common in confined spaces. It shall further be interesting to investigate data association problems in the SFS framework, and to develop parallel algorithms that can be implemented on sensor networks with limited communication resources. Finally, of great interest should be the incomplete data case, in which individual sensors may fail to detect acoustic events, a problem studied in [2].

Acknowledgement

The motes data was made available by Rahul Biswas, which is gratefully acknowledged. We also acknowledge invaluable suggestions by three anonymous reviewers.

References

[1] S.T. Birchfield and A. Subramanya. Microphone array position calibration by basis-point classical multidimensional scaling. IEEE Trans. Speech and Audio Processing, forthcoming.
[2] R. Biswas and S. Thrun. A passive approach to sensor network localization. IROS-04.
[3] J.C. Chen, R.E. Hudson, and K. Yao. Maximum likelihood source localization and unknown sensor location estimation for wideband signals in the near-field. IEEE Trans. Signal Processing, 50, 2002.
[4] P. Corke, S. Hrabar, R. Peterson, D. Rus, S. Saripalli, and G. Sukhatme. Deployment and connectivity repair of a sensor net with a flying robot. ISER-04.
[5] E. Elnahrawy, X. Li, and R. Martin. The limits of localization using signal strength: A comparative study. SECON-04.
[6] J. Elson and K. Romer. Wireless sensor networks: A new regime for time synchronization. HotNets-02.
[7] S. Mahamud and M. Hebert. Iterative projective reconstruction from multiple views. CVPR-00.
[8] D. Niculescu and B. Nath. Ad hoc positioning system (APS). GLOBECOM-01.
[9] V.C. Raykar, I.V. Kozintsev, and R. Lienhart. Position calibration of microphones and loudspeakers in distributed computing platforms. IEEE Trans. Speech and Audio Processing, 13(1), 2005.
[10] J. Sallai, G. Balogh, M. Maroti, and A. Ledeczi. Acoustic ranging in resource-constrained sensor networks. eCOTS-04.
[11] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2), 1992.
[12] T.L. Tung, K. Yao, D. Chen, R.E. Hudson, and C.W. Reed. Source localization and spatial filtering using wideband MUSIC and maximum power beamforming for multimedia applications. SIPS-99.
", "award": [], "sourceid": 2770, "authors": [{"given_name": "Sebastian", "family_name": "Thrun", "institution": null}]}