{"title": "Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection", "book": "Advances in Neural Information Processing Systems", "page_first": 37, "page_last": 45, "abstract": "We present a system which constructs a topological map of an environment given a sequence of images. This system includes a novel image similarity score which uses dynamic programming to match images using both the appearance and relative positions of local features simultaneously. Additionally an MRF is constructed to model the probability of loop-closures. A locally optimal labeling is found using Loopy-BP. Finally we outline a method to generate a topological map from loop closure data. Results are presented on four urban sequences and one indoor sequence.", "full_text": "Constructing Topological Maps using Markov\nRandom Fields and Loop-Closure Detection\n\nRoy Anati Kostas Daniilidis\n\nGRASP Laboratory\n\nDepartment of Computer and Information Science\n\nUniversity of Pennsylvania\n\nPhiladelphia, PA 19104\n\n{royanati,kostas}@cis.upenn.edu\n\nAbstract\n\nWe present a system which constructs a topological map of an environment given\na sequence of images. This system includes a novel image similarity score which\nuses dynamic programming to match images using both the appearance and rel-\native positions of local features simultaneously. Additionally, an MRF is con-\nstructed to model the probability of loop-closures. A locally optimal labeling is\nfound using Loopy-BP. Finally we outline a method to generate a topological map\nfrom loop closure data. Results, presented on four urban sequences and one indoor\nsequence, outperform the state of the art.\n\n1 Introduction\n\nThe task of generating a topological map from video data has gained prominence in recent years.\nTopological representations of routes spanning multiple kilometers are more robust than metric ones and\ncognitively more plausible for use by humans. 
They are used for path planning, providing\nwaypoints, and de\ufb01ning reachability of places. Topological maps can correct for the drift in visual\nodometry systems and can be part of hybrid representations where the environment is represented\nmetrically locally but topologically globally.\nWe identify two challenges in constructing a topological map from video: how can we say whether\ntwo images have been taken from the same place; and how can we reduce the original set of thou-\nsands of video frames to a small representative set of keyframes for path planning. We take\nadvantage of the fact that our input is video as opposed to an unorganized set of pictures. Video guaran-\ntees that keyframes will be reachable from each other, and it also provides temporal ordering constraints\non deciding about loop closures. The paper has three innovations. First, we de\ufb01ne a novel image similarity\nscore which uses dynamic programming to match images using both the appearance and the layout\nof the features in the environment. Second, graphical models are used to detect loop-closures which\nare locally consistent with neighboring images. Finally, we show how the temporal assumption can\nbe used to generate compact topological maps using minimum dominating sets.\nWe formally de\ufb01ne a topological map T as a graph T = (K, ET ), where K is a set of keyframes\nand ET is a set of edges describing connectivity between keyframes. We will see later that keyframes are\nrepresentatives of locations. 
We desire the following properties of T :\n\nLoop closure For any two locations i, j \u2208 K, ET contains the edge (i, j) if and only if it is possible\nto reach location j from location i without passing through any other location k \u2208 K.\n\nCompactness Two images taken at the \u201csame location\u201d should be represented by the same\nkeyframe.\n\nSpatial distinctiveness Two images from \u201cdifferent locations\u201d cannot be represented by the same\nkeyframe.\n\nNote that spatial distinctiveness requires that we distinguish between separate locations, whereas\ncompactness encourages agglomeration of geographically similar images. This distinction is im-\nportant, as lack of compactness does not lead to errors in either path planning or visual odometry\nwhile breaking spatial distinctiveness does. Our approach to building topological maps is divided\ninto three modules: calculating image similarity, detecting loop closures, and map construction. As\nde\ufb01ned, it is possible to implement each module independently, providing great \ufb02exibility in the\nalgorithm selection. We now de\ufb01ne the interfaces between each pair of modules.\nStarting with I, a sequence of n images, the result of calculating image similarity scores is a matrix\nMn\u00d7n where Mij represents a relative similarity between images i and j. In section 2 we describe\nhow we use local image features to compute the matrix M. To detect loop-closures we have to\ndiscretize M into a binary decision matrix Dn\u00d7n where Dij = 1 indicates that images i and j\nare geographically equivalent and form a loop closure. Section 3 describes the construction of\nD by de\ufb01ning a Markov Random Field (MRF) on M and performing approximate inference using\nLoopy Belief Propagation (Loopy-BP). In the \ufb01nal step, the topological map T is generated from\nD. 
We calculate the set of keyframes K and their associated connectivity ET using the minimum\ndominating set of the graph represented by D (Section 4).\n\nRelated Work The state of the art in topological mapping of images is the FAB-MAP [8] algo-\nrithm. FAB-MAP uses a bag of words to model locations using a generative appearance approach\nthat models dependencies and correlations between visual words, rendering FAB-MAP extremely\nsuccessful in dealing with the challenge of perceptual aliasing (different locations sharing common\nvisual characteristics). Its implementation outperforms any other in speed, averaging less than 1ms per\ninter-image comparison. Bayesian inference is also used in [1] where bags of words on local\nimage descriptors model locations whose consistency is validated with epipolar geometry. Ran-\nganathan et al. [14] incorporate both odometry and appearance and maintain several hypotheses of\ntopological maps. Older approaches like ATLAS [5] and Tomatis et al. [17] de\ufb01ne maps on two\nlevels, creating global (topological) maps by matching independent local (metric) data and com-\nbining loop-closure detection with visual SLAM (Simultaneous Localization and Mapping). The ATLAS\nframework [5] matches local maps through the geometric structures de\ufb01ned by their 2D schematics\nwhose correspondences de\ufb01ne loop-closures. Tomatis et al. [17] detect loop closures by examining\nthe modality of the robot position\u2019s probability density function (PDF). A PDF with two modes traveling in sync\nis the result of a missed loop-closure, which is identi\ufb01ed and merged through backtracking.\nApproaches like [3], [19], [18] and [9] represent the environment using only an image similarity\nmatrix. Booij et al. [3] use the similarity matrix to de\ufb01ne a weighted graph for robot navigation.\nNavigation is conducted on a node by node basis, using new observations and epipolar geometry\nto estimate the direction of the next node. 
Valgren et al. [19] avoid exhaustively computing the\nsimilarity matrix by searching for and sampling cells which are more likely to describe existing\nloop-closures. In [18], they employ exhaustive search, but use spectral clustering to reduce the\nsearch space incrementally when new images are processed. Fraundorfer et al. [9] use hierarchical\nvocabulary trees [13] to quickly compute image similarity scores. They show improved results by\nusing feature distances to weight the similarity score. In [15] a novel image feature is constructed\nfrom patches centered around vertical lines from the scene (radial lines in the image). These are\nthen used to track the bearing of landmarks and localize the robot in the environment. Goedem\u00e9 et al.\n[10] propose \u2018invariant column segments\u2019 combined with color information to compare images.\nThis is followed by agglomerative clustering of images into locations. Potential loop-closures are\nidenti\ufb01ed within clusters and con\ufb01rmed using Dempster-Shafer probabilities.\nOur approach advances the state of the art by using a powerful image alignment score without em-\nploying full epipolar geometry, and more robust loop closure detection by applying MRF inference\non the similarity matrix. It is, together with [4], the only video-based approach that provides a greatly\nreduced set of nodes for the \ufb01nal topological representation, thus making path planning tractable.\n\n2 Image similarity score\n\nFor any two images i and j, we calculate the similarity score Mij in three steps: generate image fea-\ntures, sort image features into sequences, and calculate the optimal alignment between both sequences. To\ndetect and generate image features we use the Scale Invariant Feature Transform (SIFT) [12]. 
SIFT was\nselected as it is invariant to rotation and scale, and partially immune to other af\ufb01ne transformations.\n\nFeature sequences Simply matching the SIFT features by value [12] yields satisfactory results\n(see \ufb01gure 2). However, to mitigate perceptual aliasing, we take advantage of the fact that\nfeatures represent real world structures with \ufb01xed spatial arrangements and therefore the similarity\nscore should take their relative positions into account. A popular approach, employed in [16], is to\nenforce scene rigidity by validating the epipolar geometry between two images. This process, al-\nthough extremely accurate, is expensive and very time-consuming. Instead, we make the assumption\nthat the gravity vector is known so that we can split image position into bearing and elevation and\nwe take into account only the bearing of each feature. Sorting the features by their bearing results\nin ordered sequences of SIFT features. We then search for an optimal alignment between pairs of\nsequences, incorporating both the value and ordering of SIFT features into our similarity score.\n\nSequence alignment To solve for the optimal alignment between two ordered sequences of fea-\ntures we employ dynamic programming. Here a match between two features, fa and fb, occurs if\nthe L1 norm of their difference is below a threshold: Score(a, b) = 1 if |fa\u2212fb|1 < tmatch. A key aspect of dynamic\nprogramming is the enforcement of the ordering constraint. This ensures that the relative order of\nfeatures matched is consistent in both sequences, exactly the property desired to ensure consistency\nbetween two scene appearances. Since bearing is not given with respect to an absolute orientation,\nordering is meant only cyclically, which can be handled easily in dynamic programming by repli-\ncating one of the input sequences. 
Modifying the \ufb01rst and last rows of the score matrix to allow for\narbitrary start and end locations yields the optimal cyclical alignment in most cases. This comes at\nthe cost of allowing one-to-many matches, which can result in incorrect alignment scores. The score\nof the optimal alignment between both sequences of features provides the basis for the similarity\nscore between two images and the entries of the matrix M. We calculate the values of Mij for all\ni < j \u2212 w. Here w represents a window used to ignore images immediately before/after our query.\n\n3 Loop-closure detection using MRF\n\nGiven the image similarity matrix M, we use Markov Random Fields to detect loop-\nclosures. We de\ufb01ne a lattice H of n \u00d7 n binary nodes, where a node vi,j represents\nthe probability of images i and j forming a loop-closure. The matrix M provides an initial esti-\nmate of this value. We de\ufb01ne the factor \u03c6i,j over the node vi,j as follows: \u03c6i,j(1) = Mij/F and\n\u03c6i,j(0) = 1 \u2212 \u03c6i,j(1) where F = max(M) is used to normalize the values in M to the range [0, 1].\nLoop closures in the score matrix M appear as one of three possible shapes. In an intersection the\nscore matrix contains an ellipse. A parallel traversal, when a vehicle repeats part of its trajectory,\nis seen as a diagonal band. An inverse traversal, when a vehicle repeats a part of its trajectory in\nthe opposite direction, is an inverted diagonal band. The length and thickness of these shapes vary\nwith the speed of the vehicle (see \ufb01gure 1 for examples of these shapes). Therefore we de\ufb01ne lat-\ntice H with eight-way connectivity, as it better captures the structure of possible loop closures. As\nadjacent nodes in H represent sequential images in the sequence, we expect signi\ufb01cant overlap in\ntheir content. 
So two neighboring nodes (in any orientation) are expected to have similar scores.\nSudden changes occur when either a loop is just closed (sudden increase) or when a loop closure\nis complete (sudden decrease) or due to noise caused by a sudden occlusion in one of the scenes.\nBy imposing smoothness on the labeling we capture loop closures while discarding noise. Edge\npotentials are therefore de\ufb01ned as Gaussians of differences in M. Letting $G(x, y) = e^{-(x-y)^2/\\sigma^2}$,\n$k \\in \\{i-1, i, i+1\\}$ and $l \\in \\{j-1, j, j+1\\}$, then\n\n$\\phi_{i,j,k,l}(0, 0) = \\phi_{i,j,k,l}(1, 1) = \\alpha \\cdot G(M_{ij}, M_{kl})$\n$\\phi_{i,j,k,l}(0, 1) = \\phi_{i,j,k,l}(1, 0) = 1$,\n\nwhere 1 \u2264 \u03b1 (we ignore the case when both k = i and j = l).\n\n(a) Intersection\n\n(b) Parallel Traversal\n\n(c) Inverse Traversal\n\nFigure 1: A small ellipse resulting from an intersection (a) and two diagonal bands from a parallel\n(b) and inverse (c) traversals. All extracted from a score matrix M.\n\nOverall, H models a probability\ndistribution over a labeling $v \\in \\{0, 1\\}^{n \\times n}$ where:\n\n$P(v) = \\frac{1}{Z} \\prod_{i,j \\in [1,n]} \\phi_{i,j}(v_{i,j}) \\prod_{i,j \\in [1,n]} \\prod_{k=i-1}^{i+1} \\prod_{l=j-1}^{j+1} \\phi_{i,j,k,l}(v_{i,j}, v_{k,l})$\n\nIn order to solve for the MAP labeling of H, v\u2217 = arg maxv P (v), the lattice must \ufb01rst be trans-\nformed into a cluster graph C. This transformation allows us to model the beliefs of all factors in the\ngraph and the messages being passed during inference. We model every node and every edge in H as\na node in the cluster graph C. An edge exists between two nodes in the cluster graph if the relevant\nfactors share variables. 
In addition, this construction presents a two-step update schedule, alternating\nbetween \u2018node\u2019 clusters and \u2018edge\u2019 clusters as each class only connects to instances of the other.\nOnce de\ufb01ned, a straightforward implementation of the generalized max-product belief propagation\nalgorithm (described in both [2] and [11]) serves to approximate the \ufb01nal labeling. We initialize the\ncluster graph directly from the lattice H with \u03c8i,j = \u03c6i,j for nodes and \u03c8i,j,k,l = \u03c6i,j,k,l for edges.\nThe MAP labeling found here de\ufb01nes our matrix D determining whether two images i and j close\na loop. Note that the above MAP labeling is guaranteed to be locally optimal, but is not necessarily\nconsistent across the entire lattice. Generally, \ufb01nding the globally consistent optimal assignment is\nNP-hard [11]. Instead, we rely on our de\ufb01nition of D, which speci\ufb01es which pairs of images are\nequivalent, and our construction in section 4 to generate consistent results.\n\n4 Constructing the topological map\n\nFinally, the decision matrix D is used to de\ufb01ne keyframes K and determine the map connectivity\nET . D can be viewed as an adjacency matrix of an undirected graph. Since there is no guarantee\nthat D found through belief propagation is symmetric, we initially treat D as an adjacency matrix\nfor a directed graph, and then remove the direction from all the edges, resulting in a symmetric graph\nD0 = D \u2228 DT. It is possible to use the graph de\ufb01ned by D0 as a topological map. However, this\nrepresentation is practically useless because multiple nodes represent the same location. To achieve\ncompactness, D0 needs to be pruned while remaining faithful to the overall structure of the environ-\nment. Booij et al. [4] achieve this by approximating the minimum connected dominating set. By using\nthe temporal assumption we can remove the connectedness requirement and use the minimum dominat-\ning set to prune D0. 
We \ufb01nd the keyframes K by \ufb01nding the minimum dominating set of D0. Finding\nthe optimal solution is NP-complete; however, Algorithm 1 provides a greedy approximation. This\napproximation has a guaranteed bound of H(dmax) (harmonic function of the maximal degree in\nthe graph dmax) [6].\nThe dominating set itself serves as our keyframes K. Each dominating node k \u2208 K is also associated\nwith the set of nodes it dominates Nk. Each set Nk represents images which have the \u201csame location\u201d.\nThe sets {Nk : k \u2208 K} in conjunction with our underlying temporal assumption are used to connect\nthe map T . An edge (k, l) is added if Nk and Nl contain two consecutive images from our sequence,\ni.e. (k, l) \u2208 ET if \u2203i such that i \u2208 Nk and i + 1 \u2208 Nl. This yields our \ufb01nal topological map T .\n\nAlgorithm 1: Approximate Minimum Dominating Set\n\nInput: Adjacency matrix D0\nOutput: K, {Nk : k \u2208 K}\nK \u2190 \u2205\nwhile D0 is not empty do\n    k \u2190 node with largest degree\n    K \u2190 K \u222a {k}\n    Nk \u2190 {k} \u222a Nb(k)\n    Remove all nodes Nk from matrix D0\nend\n\n5 Experiments\n\nThe system was applied to \ufb01ve image sequences. Results are shown for the system as described, as\nwell as for FAB-MAP ([8]) and for different methods of calculating image similarity scores.\n\nImage sets Three image sequences, indoors, Philadelphia and Pittsburgh1, were captured with a\nPoint Grey Research Ladybug camera. The Ladybug is composed of \ufb01ve wide-angle lens cameras\narranged in a circle around the base and one camera on top facing upwards. The resulting output is\na sequence of frames each containing a set of images captured by the six cameras. For the outdoor\nsequences the camera was mounted on top of a vehicle which was driven around an urban setting, in\nthis case the cities of Philadelphia and Pittsburgh. 
In the indoor sequence, the camera was mounted\non a tripod set on a cart and moved inside the building covering the ground and 1st \ufb02oors. Ladybug\nimages were processed independently for each camera using the SIFT detector and extractor pro-\nvided in the VLFeat toolbox [20]. The resulting features for every camera were merged into a single\nset and sorted by their spherical coordinates. The two remaining sequences, City Centre and New\nCollege, were captured in an outdoor setting by Cummins [7] from a limited \ufb01eld of view camera\nmounted on a mobile robot. Table 1 summarizes some basic properties of the sequences we use.\nAll the outdoor sequences were provided with GPS location of the vehicle / robot. For Philadelphia\nand Pittsburgh, these were used to generate ground truth decision matrices using a threshold of 10\nmeters. Ground truth matrices were provided for New College and City Centre. For the indoor\nsequence the position of the camera was manually determined using building schematics at an arbi-\ntrary scale. A ground truth decision matrix was generated using a manually determined threshold.\nThe entire system was implemented in Matlab with the exception of the SIFT detector and extractor\nimplemented by [20].\n\nData Set | Length | No. of frames | Camera Type | Format\nIndoors | Not available | 852 | spherical | raw Ladybug stream \ufb01le\nPhiladelphia[16] | 2.5km | 1,266 | spherical | raw Ladybug stream \ufb01le\nPittsburgh | 12.5km | 1,256 | spherical | recti\ufb01ed images\nNew College[7] | 1.9km | 1,237 | limited \ufb01eld of view | standard images\nCity Centre[7] | 2km | 1,073 | limited \ufb01eld of view | standard images\n\nTable 1: Summary of image sequences processed.\n\nParameters Both the image similarity scores and the MRF contain a number of parameters that\nneed to be set. When calculating the image similarity score, there are \ufb01ve parameters. 
The \ufb01rst,\ntmatch, is the threshold on the L1 norm at which two SIFT features are considered matched. In addi-\ntion, dynamic programming requires three parameters to de\ufb01ne the score of an optimal alignment:\nsmatch, sgap, smiss. smatch is the value by which the score of an alignment is improved by including\ncorrectly matched pairs of features. sgap is the cost of ignoring a feature in the optimal alignment\n(insertion and deletion), and smiss is the cost of including incorrectly matched pairs (substitution).\nWe use tmatch = 1000, smatch = 1, sgap = \u22120.1 and smiss = 0. Finally we use w = 30 as our\nwindow size, to avoid calculating similarity scores for images taken within a very short time of each\nother. Constructing the MRF requires three parameters, F , \u03c3 and \u03b1. The normalization factor, F ,\nhas already been de\ufb01ned as max(M). The \u03c3 used in de\ufb01ning edge potentials is \u03c3 = 0.05F where\nF is again used to rescale the data in the interval [0, 1]. Finally we set \u03b1 = 2 to rescale the Gaussian\nto favor edges between similarly valued nodes. Inference using loopy belief propagation features\ntwo parameters, a damping factor \u03bb = 0.5 used to mitigate the effect of cyclical inferencing and\nn = 20, the number of iterations over which to perform inference.\n\n1The Pittsburgh dataset has been provided by Google for research purposes\n\nMeasure | Indoors | Philadelphia | Pittsburgh | City Centre | New College\nPrecision | 91.67% | 91.72% | 63.85% | 91.57% | 97.42%\nRecall | 79.31% | 51.46% | 54.60% | 84.35% | 40.04%\n\nTable 2: Precision and recall after performing inference.\n\nResults In addition to the image similarity score de\ufb01ned above, we also processed the image\nsequences using alternative similarity measures. We show results for $M^{SIFT}_{ij}$ = number of SIFT\nmatches, $M^{REC}_{ij}$ = number of reciprocal SIFT matches (the intersection of matches from image i\nto image j and from j to i). 
We also show results using FAB-MAP [8]. To process spherical images\nusing FAB-MAP we limited ourselves to using images captured by camera 0 (directly forwards\n/ backwards). Figure 2 shows precision-recall curves for all sequences and similarity measures.\nThe curves were generated by thresholding the similarity scores. Our method outperforms the state of\nthe art in terms of precision and recall in all sequences. The gain from using our system is most\npronounced in the Philadelphia sequence, where FAB-MAP yields extremely low recall rates. Table\n2 shows the results of performing inference on the image similarity matrices. Finally, \ufb01gure 3 shows\nthe topological map resulting from running dominating sets on the decision matrices D. We use\nthe ground truth GPS positions for display purposes only. The blue dots represent the locations\nof the keyframes K with the edges ET drawn in blue. Red dots mark keyframes which are also\nloop-closures. For reference, \ufb01gure 4 provides ground truth maps and loop-closures.\n\n6 Outlook\n\nWe presented a system that constructs purely topological maps from video sequences captured from\nmoving vehicles. Our main assumption is that the images are presented in a temporally consistent\nmanner. A highly accurate image similarity score is found by a cyclical alignment of sorted feature\nsequences. This score is then re\ufb01ned via loopy belief propagation to detect loop-closures. Finally\nwe constructed a topological map for the sequence in question. This map can be used for either path\nplanning or for bundle adjustment in visual SLAM systems. The bottleneck of the system is comput-\ning the image similarity score. In some instances it takes over 166 hours to process a single sequence,\nwhile FAB-MAP [8] accomplishes the same task in 20 minutes. 
In addition to implementing score\ncalculation with a parallel algorithm (either on a multicore machine or using graphics hardware), we\nplan to construct approximations to our image similarity score. These include using visual bags of\nwords in a hierarchical fashion [13] and building the score matrix M incrementally [19, 18].\n\nAcknowledgments\n\nFinancial support by the grants NSF-IIS-0713260, NSF-IIP-0742304, NSF-IIP-0835714, and\nARL/CTA DAAD19-01-2-0012 is gratefully acknowledged.\n\n(a) Indoors\n\n(b) Philadelphia\n\n(c) Pittsburgh\n\n(d) City Centre\n\n(e) New College\n\nFigure 2: Precision-recall curves for different thresholds on image similarity scores.\n[Each panel plots Precision against Recall; legend: Dynamic Programming, FAB-MAP, No. SIFT, Symmetric SIFT.]\n\n(a) Indoors\n\n(b) Philadelphia\n\n(c) Pittsburgh\n\n(d) City Centre\n\n(e) New College\n\nFigure 3: Loop-closures generated using minimum dominating set approximation. Blue dots rep-\nresent positions of keyframes K with edges ET drawn in blue. Red dots mark keyframes with\nloop-closures.\n\n(a) Indoors\n\n(b) Philadelphia\n\n(c) Pittsburgh\n\n(d) City Centre\n\n(e) New College\n\nFigure 4: Ground truth maps and loop-closures. Blue dots represent positions of keyframes K with\nedges ET drawn in blue. 
Red dots mark keyframes with\nloop-closures.\n\nReferences\n\n[1] A. Angeli, D. Filliat, S. Doncieux, and J.-A. Meyer. Fast and incremental method for loop-\nclosure detection using bags of visual words. IEEE Transactions on Robotics, 24(5):1027\u2013\n1037, Oct. 2008.\n\n[2] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and\nStatistics). Springer, August 2006.\n\n[3] O. Booij, B. Terwijn, Z. Zivkovic, and B. Krose. Navigation using an appearance based topo-\nlogical map. In 2007 IEEE International Conference on Robotics and Automation, pages\n3927\u20133932, 2007.\n\n[4] O. Booij, Z. Zivkovic, and B. Krose. Pruning the image set for appearance based robot local-\nization. In Proceedings of the Annual Conference of the Advanced School for Computing\nand Imaging, 2005.\n\n[5] M. Bosse, P. Newman, J. Leonard, M. Soika, W. Feiten, and S. Teller. An atlas framework\nfor scalable mapping. In IEEE International Conference on Robotics and Automation, 2003.\nProceedings. ICRA\u201903, volume 2, 2003.\n\n[6] V. Chvatal. A greedy heuristic for the set-covering problem. Mathematics of Operations\nResearch, 4(3):233\u2013235, 1979.\n\n[7] M. Cummins and P. Newman. Accelerated appearance-only SLAM. In Proc. IEEE Interna-\ntional Conference on Robotics and Automation (ICRA\u201908), Pasadena, California, April 2008.\n\n[8] M. Cummins and P. Newman. FAB-MAP: Probabilistic Localization and Mapping in the Space\nof Appearance. The International Journal of Robotics Research, 27(6):647\u2013665, 2008.\n\n[9] F. Fraundorfer, C. Wu, J.-M. Frahm, and M. Pollefeys. Visual word based location recognition\nin 3d models using distance augmented weighting. In Fourth International Symposium on 3D\nData Processing, Visualization and Transmission, 2008.\n\n[10] T. Goedem\u00e9, M. Nuttin, T. Tuytelaars, and L. Van Gool. Omnidirectional vision based topo-\nlogical navigation. Int. J. Comput. Vision, 74(3):219\u2013236, 2007.\n\n[11] D. 
Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT\nPress, 2009.\n\n[12] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of\nComputer Vision, 60:91\u2013110, 2004.\n\n[13] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. volume 2, pages\n2161\u20132168, 2006.\n\n[14] A. Ranganathan, E. Menegatti, and F. Dellaert. Bayesian inference in the space of topological\nmaps. IEEE Transactions on Robotics, 22(1):92\u2013107, 2006.\n\n[15] D. Scaramuzza, N. Criblez, A. Martinelli, and R. Siegwart. Robust feature extraction and\nmatching for omnidirectional images. Springer Tracts in Advanced Robotics, Field and Service\nRobotics, 2008.\n\n[16] J.-P. Tardif, Y. Pavlidis, and K. Daniilidis. Monocular visual odometry in urban environments\nusing an omnidirectional camera. pages 2531\u20132538, Sept. 2008.\n\n[17] N. Tomatis, I. Nourbakhsh, and R. Siegwart. Hybrid simultaneous localization and map\nbuilding: a natural integration of topological and metric. Robotics and Autonomous Systems,\n44(1):3\u201314, 2003.\n\n[18] C. Valgren, T. Duckett, and A. J. Lilienthal. Incremental spectral clustering and its application\nto topological mapping. In Proc. IEEE Int. Conf. on Robotics and Automation, pages 4283\u2013\n4288, 2007.\n\n[19] C. Valgren, A. J. Lilienthal, and T. Duckett. Incremental topological mapping using omnidirec-\ntional vision. In Proc. IEEE Int. Conf. on Intelligent Robots and Systems, pages 3441\u20133447,\n2006.\n\n[20] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algo-\nrithms. http://www.vlfeat.org/, 2008.\n", "award": [], "sourceid": 647, "authors": [{"given_name": "Roy", "family_name": "Anati", "institution": null}, {"given_name": "Kostas", "family_name": "Daniilidis", "institution": null}]}