{"title": "Locally Uniform Comparison Image Descriptor", "book": "Advances in Neural Information Processing Systems", "page_first": 1, "page_last": 9, "abstract": "Keypoint matching between pairs of images using popular descriptors like SIFT or a faster variant called SURF is at the heart of many computer vision algorithms including recognition, mosaicing, and structure from motion. For real-time mobile applications, very fast but less accurate descriptors like BRIEF and related methods use a random sampling of pairwise comparisons of pixel intensities in an image patch. Here, we introduce Locally Uniform Comparison Image Descriptor (LUCID), a simple description method based on permutation distances between the ordering of intensities of RGB values between two patches. LUCID is computable in linear time with respect to patch size and does not require floating point computation. An analysis reveals an underlying issue that limits the potential of BRIEF and related approaches compared to LUCID. Experiments demonstrate that LUCID is faster than BRIEF, and its accuracy is directly comparable to SURF while being more than an order of magnitude faster.", "full_text": "Locally Uniform Comparison Image Descriptor\n\nAndrew Ziegler\u2217 Eric Christiansen David Kriegman Serge Belongie\n\nDepartment of Computer Science and Engineering, University of California, San Diego\namz@gatech.edu, {echristiansen, kriegman, sjb}@cs.ucsd.edu\n\nAbstract\n\nKeypoint matching between pairs of images using popular descriptors like SIFT\nor a faster variant called SURF is at the heart of many computer vision algorithms\nincluding recognition, mosaicing, and structure from motion. However, SIFT and\nSURF do not perform well for real-time or mobile applications. As an alternative\nvery fast binary descriptors like BRIEF and related methods use pairwise compar-\nisons of pixel intensities in an image patch. 
We present an analysis of BRIEF and\nrelated approaches revealing that they are hashing schemes on the ordinal correla-\ntion metric Kendall\u2019s tau. Here, we introduce Locally Uniform Comparison Image\nDescriptor (LUCID), a simple description method based on linear time permuta-\ntion distances between the ordering of RGB values of two image patches. LUCID\nis computable in linear time with respect to the number of pixels and does not\nrequire \ufb02oating point computation.\n\n1 Introduction\n\nLocal image descriptors have long been explored in the context of machine learning and computer\nvision. There are countless applications that rely on local feature descriptors, such as visual regis-\ntration, reconstruction and object recognition. One of the most widely used local feature descriptors\nis SIFT which uses automatic scale selection, orientation normalization, and histograms of oriented\ngradients to achieve partial af\ufb01ne invariance [15]. SIFT is known for its versatility and reliable\nrecognition performance, but these characteristics come at a high computational cost.\nRecently, mobile devices and affordable reliable imaging sensors have become ubiquitous. The\nwide adoption of these devices has made new real-time mobile applications of computer vision\nand machine learning feasible. Examples of such applications include visual search, augmented\nreality, perceptual interfaces, and wearable computing. Despite this, these devices have less com-\nputational power than typical computers and perform poorly for \ufb02oating point heavy applications.\nThese factors have provided an impetus for new ef\ufb01cient discrete approaches to feature descrip-\ntion and matching. In this work we explore current trends in feature description and provide a new\nview of BRIEF and its related methods. We also present a novel feature description method that is\nsurprisingly simple and effective.\n\n1.1 Background\n\nBay et al. 
proposed SURF as an approximation to SIFT, a notable shift toward real-time feature description [1]. SURF obtains a large speed up over SIFT while retaining most of its desirable properties and comparable recognition rates. However, SURF is not generally suited to real-time applications without acceleration via a powerful GPU [21].\nIn [3] Bosch et al. proposed Ferns as a classification based approach to key point recognition. Ferns uses sparse binary intensity comparisons between pixels in an image patch for descriptive power. This simple scheme provides real-time performance in exchange for expensive off-line learning.\nIn response to the success of Ferns, Calonder et al. presented a novel binary feature descriptor they named BRIEF [4]. Rather than training off-line, BRIEF makes use of random pixel intensity comparisons to create a binary descriptor quickly. These descriptors can be matched an order of magnitude faster than SIFT with the Hamming distance, even on mobile processors. As a result, BRIEF has come into widespread use and has inspired several variants based on the approach [12, 14, 19]. However, little explanation as to why or how these types of descriptors work is given. There is a fuzzy notion that pairwise intensity comparisons are an approximation to signed intensity gradients. This is not the whole story, and in fact these methods are sampling in an ad hoc manner from a rich source of discriminative information.\n\n\u2217This work was completed while the author was at UCSD.\n\n1.2 Related work\n\nIn this work we diverge from the current paradigm for fast feature description and explore a deterministic approach based on permutations. The study of distances between permutations began near the inception of group theory and has continued unabated since [5, 7, 8, 9, 11, 10, 16].\nA notable early use of permutation based methods in the realm of visual feature description was presented by Bhat and Nayar in [2]. 
They investigated the use of rank permutations of pixel intensities for the purpose of dense stereo, the motivation being to find a robust alternative to the \u21132 norm. Permutations on pixel intensities offer a transformed representation of the data which is naturally less sensitive to noise and invariant to monotonic photometric transformations. Bhat and Nayar present a similarity measure between two rank permutations that is based on the Kolmogorov-Smirnov test. Their measure was designed to be robust to impulse noise, sometimes called salt and pepper noise, which can greatly corrupt a rank permutation. In [20] Scherer et al. reported that though Bhat and Nayar\u2019s method was useful, it suffered from poor discrimination.\nIn [18] Mittal and Ramesh proposed an improved version of the method presented by Bhat and Nayar. Their improvement was in a similar vein to [20], based on a modification to Kendall\u2019s tau [11]. The key observation made was that both Kendall\u2019s tau metric and Bhat and Nayar\u2019s metric are highly sensitive to Gaussian noise. To become robust to Gaussian noise Mittal and Ramesh account for actual intensity differences while only considering uncorrelated order changes. We choose to explore the Hamming and Cayley distances, in part because they are naturally robust to Gaussian noise, impulse noise is not a major issue for modern imaging devices, and they are computable in linear time as opposed to quadratic time.\nRecently there has been more research on the application of ordinal correlation methods to sparse visual feature description. In [22] and [13] ordinal methods were applied to SIFT descriptors. In contrast to [2] and [20] the elements of the SIFT descriptor are sorted, rather than sorting pixel intensities themselves. 
Though these methods do improve the recognition performance of SIFT they\nadd computational cost, rather than reducing it.\n\n1.3 Our contributions\n\nIn this paper, we introduce LUCID, a novel approach to real-time feature description based on order\npermutations. We contrast LUCID with BRIEF, and provide a theoretical basis for understanding\nthese two methods. We prove that BRIEF is effectively a locality sensitive hashing (LSH) scheme on\nKendall\u2019s tau. It follows from this that other descriptors based on binary intensity comparisons are\ndimensionality reduction schemes on Kendall\u2019s tau. We then explore alternative distances based on\nthe observation that image patch matching can be viewed as a near duplicate recognition problem.\nIn the next section we describe LUCID, provide a background on permutation distances and discuss\noptimizations for an ef\ufb01cient implementation. Section 3 provides an analysis of BRIEF and com-\npares it to LUCID. Section 4 reports on experiments that evaluate LUCID\u2019s accuracy and run time\nrelative to SURF and BRIEF.\n\n2\n\n\f2 LUCID\n\nHere we present a new method of feature description that is surprisingly simple and effective. We\ncall our method Locally Uniform Comparison Image Descriptor or LUCID. Our descriptors implic-\nitly encapsulate all possible intensity comparisons in a local area of an image. They are extremely\nef\ufb01cient to compute and are related through the generalized Hamming distance for ef\ufb01cient match-\ning [10].\n\n[~, desc1] = sort(p1(:));\n[~, desc2] = sort(p2(:));\ndistance = sum(desc1 ~= desc2);\n\n2.1 Constructing a descriptor\nLet p1 and p2 be n \u00d7 n image patches with\nc color channels. We can compute descrip-\ntors for both patches and the Hamming dis-\ntance between them in three lines of Mat-\nlab as shown in Figure 1. Here desc1 and\ndesc2 are the order permutation represen-\ntations for p1 and p2 respectively. 
Let m = cn^2; then clearly this depiction has an O(m log m) running time. However, our native implementation makes use of a stable comparison-free linear time sort and thus takes O(m) time and space. Descriptor construction is depicted in Figure 1.\n\nFigure 1: Top: LUCID feature construction and matching method in 3 lines of Matlab. Note: ~ is used to ignore the first return value of sort; the second value is the order permutation. Bottom: An illustration of an image patch split into its RGB color channels, vectorized and then sorted, inducing a permutation on the indices.\n\n2.2 Permutation distances\n\nA more detailed discussion of the following is given in [16]. Recall the definition of a permutation: a bijective mapping of a finite set onto itself. This mapping \u03c0 is a member of the symmetric group Sn formed by function composition on the set of all permutations of n labelled objects. We write \u03c0(i) = j to denote the action of \u03c0 with i, j \u2208 {1, 2, ..., n}. The permutation product for \u03c01, \u03c02 \u2208 Sn is defined as function composition \u03c01\u03c02 = \u03c01\u25e6\u03c02, the permutation that results from first applying \u03c02 then \u03c01. Every permutation \u03c0 \u2208 Sn can be written as a product of disjoint cycles \u03c31, \u03c32, ..., \u03c3\u2113. Cycles are permutations such that \u03c3^k(i) = i for some k \u2264 n, where \u03c3^k = \u220f_{j=1}^{k} \u03c3. We will use the notation #cycles(\u03c0) = \u2113 to denote the number of cycles in \u03c0.\nA convenient representation for a permutation \u03c0 \u2208 Sn is the n dimensional vector with the ith coordinate equal to \u03c0(i); this is the permutation vector. The convex hull of the permutation vectors Sn \u2282 R^n is the permutation polytope of Sn. This is an n \u2212 1 dimensional polytope with |Sn| = n! vertices. 
The vertices are equidistant from the centroid and lie on the surface of a circumscribed n\u22121 dimensional sphere. The vertices corresponding to two permutations \u03c01, \u03c02 \u2208 Sn are connected by an edge if they are related by a pairwise adjacent transposition. This is analogous to Kendall\u2019s tau, defined to be the minimum number of pairwise adjacent transpositions between two vectors, more precisely Kd(\u03c01, \u03c02) = |{(i, j)|\u03c01(i) < \u03c01(j), \u03c02(i) > \u03c02(j), 1 \u2264 i, j \u2264 n}|.\nThere are at least two classes of distances that can be defined between permutations [16]. Spatial distances can be viewed as measuring the distance travelled along some path between two vertices of the permutation polytope. Examples of spatial distances are Kendall\u2019s tau which steps along the edges of the polytope, the Euclidean distance which takes the straight line path, and Spearman\u2019s footrule which takes unit steps on the circumscribed sphere of the polytope. A disorder distance measures the disorder between two permutations and ignores the spatial structure of the polytope. Examples of disorder distances are the generalized Hamming distance Hd(\u03c01, \u03c02) = |{i|\u03c01(i) \u2260 \u03c02(i)}| which is the number of elements that differ between two permutation vectors and the Cayley distance Cd(\u03c01, \u03c02) = n \u2212 #cycles(\u03c02 \u03c01^{\u22121}) which is the minimum number of unrestricted transpositions between \u03c01 and \u03c02. We choose the generalized Hamming distance to relate our descriptors because it is much simpler than the Cayley distance to compute. Hamming also lends itself to SIMD parallel processing unlike Cayley which is inherently serial. 
However, if time is not a constraint\nexperimental results show that the Cayley distance should be preferred for accuracy.\nDisorder distances are not sensitive to Gaussian noise, but are highly sensitive to impulse noise.\nIn contrast, Kendall\u2019s tau is confused by Gaussian noise, but is more resilient to impulse noise\n[2, 20, 18]. Impulse noise can severely corrupt these permutations since it can cause pixels in a\npatch to become maximal or minimal elements changing each element in the permutation vector.\nIn the presence of moderate impulse noise the Cayley and Hamming distances will likely become\nmaximal while Kendall\u2019s tau would be at O(1/n) its maximal distance. Generally, modern imaging\ndevices do not suffer from severe impulse noise, but there are other sources of impulse noise such\nas occlusions and partial shadows. LUCID is used with sparse interest points and only individual\nimage patches would be affected by impulse noise. Since impulse noise would cause the distance to\nbecome maximal these bad matches can be reliably identi\ufb01ed via threshold.\nKendall\u2019s tau is normally used in situations where multiple independent judges are ranking sets or\nsubsets of objects, such as top-k lists, movie preferences or surveys. In these scenarios multiple\njudges are asked to rank preferences and the permutation polytope can be used as a discrete analog\nto histograms to gain valuable insight into the distribution of the judges\u2019 preferences. In the context\nof sparse image patch matching, the imaging sensor ideally acts as a single consistent judge; thus a\nsingle image patch will correspond to one vertex on the permutation polytope. Ideally, for a pair of\ncorresponding patches in different images the permutations should be identical. Thus in our scenario\nthe image sensor can be viewed as one judge comparing nearly identical objects. 
The structure of the permutation polytope becomes less important in this context.\nSince the Cayley and Hamming distances are computed in linear time rather than quadratic time like Kendall\u2019s tau, they may be better suited for fast image patch matching. In section 3 we present a proof demonstrating that BRIEF is a locality sensitive hashing scheme on Kendall\u2019s tau metric between vectors of pixel intensities.\n\n2.3 An efficient implementation\n\nOur choice to use the Hamming distance is inspired by the new Streaming SIMD Extensions (SSE) instructions. SSE is a simple way to add parallelism to native programs through vector operations. In our implementation we use a 128-bit packed comparison which gives LUCID 16x matching parallelism for grayscale image patches up to 16x16, and 8x parallelism for RGB image patches up to 147x147. Many mobile processors have these types of instructions, but even when they are not available it is still possible to gain parallelism. One additional bit per descriptor element can be reserved allowing the use of binary addition and bit masks to produce a packed Hamming distance. For descriptor lengths less than 2^15, 16 bits per element are needed. This strategy supports RGB image patches up to 105x105 pixels and yields 4x parallelism on 64-bit processors. It is also possible to randomly sample a small subset of pixels before sorting to achieve greater speed. This operation can be interpreted as randomly projecting the descriptors into a lower dimension.\n\nDescriptor      Dimension  Construction  Matching\nLUCID-8-Gray    64         20            240\nLUCID-16-Gray   256        30            880\nBRIEF           256        40            2130\nLUCID-24-RGB    1728       50            4120\nSURF            64         450           420\n\nTable 1: Time in milliseconds to construct 10,000 descriptors and to exhaustively match 5000x5000 descriptors.\n\nOrder permutations are fast to construct and access memory in sequential order. 
Since pixel inten-\nsities are represented with small positive integers they are ideal candidates for stable linear time\nsorting methods like counting and radix sort. These sorting algorithms access memory in linear\norder and thus with the fewest number of possible cache misses. BRIEF accesses larger portions of\nmemory than LUCID in a non-linear fashion and should incur more time consuming cache misses.\nTherefore LUCID offers a modest improvement in terms of descriptor construction time as shown\nin Table 1.\n\n4\n\n\fWe investigate three versions of LUCID since they are the \ufb01rst three multiples of eight: LUCID-24-\nRGB, LUCID-16-Gray, and LUCID-8-Gray which respectively are LUCID on image patches that\nare 24x24 in RGB color, 16x16 grayscale and 8x8 grayscale. Before construction a 5x5 averaging\nblur is applied to the entire image to remove noise that may perturb the order permutation. BRIEF\nalso performs pre-smoothing; Calonder et al. reported that they found a 9x9 blurring kernel to be\n\u201cnecessary and suf\ufb01cient\u201d [4].\nWe compare the running time of LUCID to the OpenCV implementations of SURF and BRIEF with\ndefault parameters on a 2.66GHz Intel\u00ae Core\u00ae i7.1 In Table 1 timing results for SURF, BRIEF\nand the variants of LUCID are shown. BRIEF uses 48x48 image patches and produces a descriptor\nwith 256 dimensions which is equal to the dimension of LUCID-16-Gray. Surprisingly, LUCID-16-\nGray is faster to match than BRIEF; this was not expected since BRIEF has the same complexity\nas LUCID to match. This might indicate that there are further optimizations that can be made for\nOpenCV\u2019s implementation.\n\n3 Understanding BRIEF and related methods\n\nIn [4] Calonder et al. propose BRIEF, an ef\ufb01cient binary descriptor. BRIEF is intended to be simple\nto compute and match based solely on sparse intensity comparisons. These comparisons provide for\nthe ef\ufb01cient construction of a compact descriptor. 
Here we discuss their method as presented in [4].\nDefine a test \u03c4\n\n\u03c4(p; x, y) := 1 if p(x) < p(y), 0 otherwise    (1)\n\nwhere p is a square image patch and p(x) is the smoothed value of the pixel with the local coordinates x = (u, v)^T. This test will represent one bit in the final descriptor. To construct a BRIEF descriptor a set of pre-defined pixel comparisons are performed. This pattern is a set of nd pixel coordinate pairs (x, y) that should be compared in each image patch. A descriptor is then defined to be the nd dimensional bitstring f_{nd}(p) := \u2211_{1\u2264i\u2264nd} 2^{i\u22121} \u03c4(p; xi, yi). Calonder et al. suggest that intuitively these pairwise intensity comparisons capture the signs of intensity gradients. However, this is not precise and in the next section we prove that the reason BRIEF works is that it inadvertently approximates Kendall\u2019s tau.\n\n3.1 BRIEF is LSH on Kendall\u2019s Tau\n\nConsider a version of BRIEF where the pixel sampling pattern consists of all (m choose 2) pairs of pixels. Then the Hamming distance between two of these BRIEF descriptors is equivalent to the Kendall\u2019s tau distance between the pixel intensities of the vectorized image patches. The original formulation of BRIEF is LSH on the normalized Kendall\u2019s tau correlation metric.\nProof. Let p1, p2 be m dimensional vectorized image patches. Define Bk(i, j) := I(pk(i) < pk(j)) where I is the indicator function. For image patches containing m pixels, BRIEF chooses a pattern of pairs P \u2286 {(i, j)|1 \u2264 i < j \u2264 m}, and for two vectorized image patches p1, p2, it returns the score \u2211_{(i,j)\u2208P} I(B1(i, j) \u2260 B2(i, j)). When P = {(i, j)|1 \u2264 i < j \u2264 m}, this is precisely Kd(p1, p2). 
It can be shown that BRIEF satisfies the definition of LSH as defined in [6]. Consider a random pair (i, j) with i < j. Then\n\nP[B1(i, j) \u2260 B2(i, j)] = (1 / (m choose 2)) \u2211_{i\u2032<j\u2032} I(B1(i\u2032, j\u2032) \u2260 B2(i\u2032, j\u2032)) = KdN(p1, p2).\n\n1 We used a stable release of OpenCV, version 2.4.3. OpenCV is open source and all versions are publicly available at http://opencv.willowgarage.com.\n\n3.2 The DAG of Possible Comparisons\n\nThe motivation behind BRIEF was to create a compact descriptor that could take advantage of SSE. This was in part inspired by hashing schemes that produced binary descriptors related by the Hamming distance [4]. However, these schemes require first constructing a large descriptor and then sampling from it. BRIEF is more efficient than these methods because it skips the step of constructing the large descriptor. BRIEF is essentially a short cut and instead it immediately performs LSH. To our knowledge, the fact that BRIEF itself is an LSH scheme has not been previously discussed in the literature.\nIn this instance the large descriptor would be the set of all possible pairwise pixel intensity comparisons in a patch, which has an impractical (m choose 2) = O(m^2) dimension. This set of comparisons can be modelled as a directed acyclic graph (DAG) with m nodes, one for each pixel in the vectorized image patch, and (m choose 2) edges. In this model, there exists a directed edge (i, j) connecting the node that corresponds to the pixel with index i to the one with index j in the vectorized image patch if p(i) < p(j) where p is the m dimensional vectorized image patch and i \u2260 j.\nThe topological sort of this DAG produces a unique Hamiltonian path from the sole source node to the sole sink node. 
The order in which the nodes are visited on this path is equivalent to the order\npermutation produced by a stable sort of the pixel intensities. Since this path is unique the order\npermutation implicitly captures all O(m2) possible comparisons in O(m) space. This is possible\nbecause of the transitive property of the binary comparison and the stable order in which pixels are\nsorted. This is how LUCID captures all the comparative information in a patch.\nIn [4] Calonder et al. explored several different types of pixel sampling patterns and concluded that\nrandom sampling works the best in practice. This makes sense since BRIEF can be interpreted as\nrandomly sampling edges from the DAG. Random sampling will eventually converge to a complete\nrepresentation of the DAG through the transitive property. BRIEF can alternatively be viewed as a\nrandom projection of the adjacency matrix of the DAG.\nSeveral variants and extensions of BRIEF have been proposed where different patterns as well as\nrotation and scale normalization are considered [12, 14, 19]. It follows from the proof in section 3.1\nand the DAG model that these methods are dimensionality reduction schemes on Kendall\u2019s tau.\n\n4 Experiments\n\nTable 2: Recognition Rates. The FAST (FKD) and SURF (SKD) keypoint detectors are used to\n\ufb01nd the top 500 of 1500 keypoints in the \ufb01rst image for each pair. Ground truth homographies are\nused to warp the keypoints into the other images. The ratio of correct matches for each descriptor to\nthe total number of ground truth matches is de\ufb01ned to be the recognition rate. 
For each image pair\nand keypoint detector the highest and second highest recognition rates are bolded with the second\nhighest rate pre\ufb01xed by an asterisk.\n\nImage Pair LUCID-24-RGB LUCID-16-Gray LUCID-8-Gray\n\nBRIEF\n\nSURF\n\n\u2014\n\nBikes 1|2\nBikes 1|4\nBikes 1|6\nWall 1|2\nWall 1|4\nWall 1|6\nLight 1|2\nLight 1|4\nLight 1|6\nTrees 1|2\nTrees 1|4\nTrees 1|6\nJpeg 1|2\nJpeg 1|4\nJpeg 1|6\n\nFKD\n0.94\n\u22170.65\n\u22170.19\n\u22170.54\n\u22170.16\n\u22170.04\n0.86\n\u22170.71\n0.56\n\u22170.44\n\u22170.20\n\u22170.09\n0.95\n\u22170.88\n\u22170.37\n\nSKD\n0.92\n\u22170.61\n0.22\n0.47\n0.15\n0.03\n\u22170.89\n0.75\n0.57\n\u22170.37\n0.10\n0.06\n0.99\n0.89\n0.39\n\nFKD\n\u22170.90\n0.50\n0.13\n0.38\n0.12\n0.03\n\u22170.90\n0.82\n0.61\n0.34\n0.11\n0.05\n\u22170.97\n0.94\n0.37\n\nSKD\n0.83\n0.46\n0.11\n0.32\n0.10\n0.02\n0.91\n\u22170.76\n0.58\n0.25\n0.03\n0.03\n0.99\n0.93\n0.35\n\nFKD SKD\n0.54\n0.79\n0.26\n0.22\n0.06\n0.07\n0.12\n0.21\n0.06\n0.08\n0.02\n0.02\n0.87\n0.73\n0.55\n0.62\n0.36\n0.44\n0.14\n0.17\n0.02\n0.05\n0.03\n0.02\n0.99\n0.94\n0.71\n0.86\n0.24\n0.14\n\nFKD\n\u22170.90\n0.81\n0.73\n0.87\n0.64\n0.17\n0.60\n0.60\n\u22170.59\n0.79\n0.67\n0.63\n0.80\n0.80\n0.79\n\nSKD\n\u22170.90\n0.84\n0.75\n0.82\n0.64\n0.17\n0.83\n0.79\n0.78\n0.69\n0.42\n0.42\n\u22170.92\n\u22170.92\n0.90\n\nFKD\n0.23\n0.04\n0.01\n0.17\n0.11\n0.03\n0.48\n0.41\n0.32\n0.10\n0.00\n0.00\n0.77\n0.48\n0.10\n\nSKD\n0.75\n0.59\n\u22170.39\n\u22170.56\n\u22170.32\n\u22170.09\n0.81\n0.71\n\u22170.65\n0.36\n\u22170.16\n\u22170.07\n\u22170.95\n0.89\n\u22170.61\n\nWe use a subset of the commonly used benchmarking dataset used in [17].2 Our subset consists\nof the image pairs that do not undergo extreme af\ufb01ne warping since neither BRIEF nor LUCID\n\n2The dataset is available for download at http://www.robots.ox.ac.uk/\u02dcvgg/research/\n\naffine/\n\n6\n\n\f(a) bikes\n\n(b) light\n\nFigure 3: Recognition rates for LUCID-*-RGB on the bikes 1|4 and light 1|4 image pairs. 
The rates are plotted as a function of the width of the descriptor patch and of the blur kernel applied. The best 100 of 300 FAST keypoints were detected in the first image of each pair. We found a blur width of 5 rarely hurts performance and often helps. Performance increases monotonically with patch size with diminishing returns after 30x30.\n\naccount for these transformations. These image pairs are denoted by name 1|k, where k represents the second image used in the pair, e.g. bikes 1|5 indicates the pair consisting of the first and fifth images of the bikes set. In each experiment we detect a large number of keypoints in the first image of a set and select the top N keypoints sorted by response. For each pair of images the keypoints are warped into the second image using a ground truth homography. Points that are warped out of bounds are culled before describing the points with each descriptor. Exhaustive nearest neighbor search is used to bring the points into correspondence. The recognition rate is then recorded as the ratio of correct matches to the number of ground truth matches.\nIn Table 2 we summarize the result of our comparison to BRIEF and SURF. BRIEF and LUCID perform well in most instances, though BRIEF degrades more slowly with respect to image transformations. This robustness can be attributed to the fact that BRIEF sparsely samples pixels. Most of the images are taken parallel to the horizon so orientation estimation does not help and in fact degrades SURF\u2019s performance relative to BRIEF and LUCID. LUCID performs the best on the light set, which undergoes exposure changes. This makes sense since the order permutation is invariant to monotonic intensity transformations and unlike BRIEF captures all the comparative information.\n\n4.1 Parameter selection\n\nLUCID has three parameters: blur kernel width, image patch size, and the option to use color or grayscale images. 
Figure 3 gives plots of recognition rate as a function of blur kernel width and patch size for the medium difficulty warps of two different image sets. These plots indicate that LUCID performs well with a 5x5 averaging blur kernel, and that larger patches help with diminishing returns. Though not shown here, we find that using color improves recognition performance with an expected slow down.\n\n4.2 Distance distributions\n\nFigure 2: ROC curves for descriptors on image pair bikes 1|4 for 200 keypoints.\n\nFigure 4: Histograms of distances for correct matches and impostors for the bikes 1|4 image pair. The plots show (a) the Cayley distance and (b) the Hamming distance on the order permutations, and (c) Kendall\u2019s tau on pixel intensities. These plots present a good separation of correct matches and impostors. Kendall\u2019s tau requires O(m^2) time to compute while the Cayley and Hamming distances run in O(m) time making them efficient alternatives. The Hamming distance is embarrassingly parallel and lends itself well to existing SSE instructions making it the most efficient distance.\n\nHere we examine the discriminative capability of three distances, the Cayley distance, the generalized Hamming distance and Kendall\u2019s tau on pixel intensities. The Hamming distance represents LUCID which approximates the Cayley distance. In Figure 4 we plot the distance distributions for correct matches and impostors, focusing on the medium difficulty warp of the bikes images. We chose this image set because bikes is a natural man-made scene and its distributions are representative for the other image sets. An ROC curve is shown in Figure 2 to visualize these results in a different way as well as for SURF and BRIEF. BRIEF does particularly well on this image pair because the only transformation that occurs is blur. 
Interestingly, BRIEF outperforms Kendall\u2019s Tau and the other methods that use all the pixels.\nBRIEF is in essence random projection dimensionality reduction for Kendall\u2019s tau. This indicates\nthat random projections may improve the performance of the Cayley and Hamming distances as\nwell. It is important to note that Kendall\u2019s tau is inef\ufb01cient to compute with quadratic running time\ncontrasted with the linear running time of the Cayley and generalized Hamming distances.\n\n5 Conclusions and future work\n\nIn this work we have presented an analysis of BRIEF and related methods providing a theoretical\nbasis as to how and why they work. We introduced a new simple and effective image descriptor that\nperforms comparably to SURF and BRIEF. For our comparison and simplicity we made use of every\npixel in an image patch. However, given BRIEF\u2019s superior performance to Kendall\u2019s tau we plan to\nexplore sampling patterns of pixels and other dimensionality reduction techniques. In addition, we\nplan to incorporate scale and rotation normalization as in [12] and [19]. This will allow an in depth\ncomparison of our method to descriptors like SIFT and SURF.\nLUCID offers a new simpli\ufb01ed approach for ef\ufb01cient feature construction and matching. We plan\nto investigate approximate nearest neighbor approaches like LSH and metric trees to improve the\nspeed of matching. It would also be useful to \ufb01nd a binary representation of LUCID to allow for a\nmore compact descriptor and use of existing LSH schemes. It is already possible to obtain such a\nrepresentation for LUCID through a method like WTAHash [23]. 
WTAHash produces an embedding\nfor ordinal feature spaces such that transformed feature vectors are in binary form and the Hamming\ndistance between them closely approximates the original metric.\nFinally, we hope that this new understanding of BRIEF and other binary descriptors will allow for\nthe creation of new efficient visual feature descriptors. Spending less time processing visual features\nleaves more CPU time for core functionality and application complexity, enabling new real-time\napplications.\n\n6 Acknowledgements\n\nWe would like to acknowledge Brian McFee for his helpful conversations. This work was supported\nby ONR MURI Grant #N00014-08-1-0638.\n\nReferences\n\n[1] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-up robust features (SURF). Comput. Vis. Image Underst., 110(3):346\u2013359, June 2008.\n\n[2] D.N. Bhat and S.K. Nayar. Ordinal measures for image correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(4):415\u2013423, Apr 1998.\n\n[3] A. Bosch, A. Zisserman, and X. Mu\u00f1oz. Image classification using random forests and ferns. In International Conference on Computer Vision (ICCV), pages 1\u20138, Oct. 2007.\n\n[4] Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. BRIEF: Binary robust independent elementary features. In Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV\u201910, pages 778\u2013792, Berlin, Heidelberg, 2010. Springer-Verlag.\n\n[5] A. Cayley. LXXVII. Note on the theory of permutations. Philosophical Magazine Series 3, 34(232):527\u2013529, 1849.\n\n[6] Moses S. Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, STOC \u201902, pages 380\u2013388, New York, NY, USA, 2002.\n\n[7] Michel Deza and Tayuan Huang. 
Metrics on permutations, a survey. Journal of Combinatorics, Information and System Sciences, 1998.\n\n[8] Persi Diaconis and R. L. Graham. Spearman\u2019s footrule as a measure of disarray. Journal of the Royal Statistical Society. Series B (Methodological), 39(2):262\u2013268, 1977.\n\n[9] M. A. Fligner and J. S. Verducci. Distance based ranking models. Journal of the Royal Statistical Society. Series B (Methodological), 48(3):359\u2013369, 1986.\n\n[10] R. W. Hamming. Error detecting and error correcting codes. Bell System Technical Journal, 29(2):147\u2013160, 1950.\n\n[11] M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81\u201393, 1938.\n\n[12] S. Leutenegger, M. Chli, and R. Siegwart. BRISK: Binary robust invariant scalable keypoints. In Proc. of the IEEE International Conference on Computer Vision (ICCV), 2011.\n\n[13] Bing Li, Rong Xiao, Zhiwei Li, Rui Cai, Bao-Liang Lu, and Lei Zhang. Rank-SIFT: Learning to rank repeatable local interest points. In Computer Vision and Pattern Recognition (CVPR), pages 1737\u20131744, June 2011.\n\n[14] Jie Liu and Xiaohui Liang. I-BRIEF: A fast feature point descriptor with more robust features. In Signal-Image Technology and Internet-Based Systems (SITIS), pages 322\u2013328, Dec. 2011.\n\n[15] D.G. Lowe. Object recognition from local scale-invariant features. In International Conference on Computer Vision, volume 2, pages 1150\u20131157, 1999.\n\n[16] John I. Marden. Analyzing and Modeling Rank Data. Chapman & Hall, 1995.\n\n[17] Krystian Mikolajczyk and Cordelia Schmid. A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell., 27(10):1615\u20131630, October 2005.\n\n[18] A. Mittal and V. Ramesh. An intensity-augmented ordinal measure for visual correspondence. In Computer Vision and Pattern Recognition, volume 1, pages 849\u2013856, June 2006.\n\n[19] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 
ORB: An efficient alternative to SIFT or SURF. International Conference on Computer Vision, 95(1):2564\u20132571, 2011.\n\n[20] S. Scherer, P. Werth, and A. Pinz. The discriminatory power of ordinal measures \u2013 towards a new coefficient. 1, 1999.\n\n[21] Timothy B. Terriberry, Lindley M. French, and John Helmsen. GPU accelerating speeded-up robust features. In Proceedings of the 4th International Symposium on 3D Data Processing, Visualization and Transmission, 3DPVT \u201908, pages 355\u2013362, Atlanta, GA, USA, 2008.\n\n[22] M. Toews and W. Wells. SIFT-Rank: Ordinal description for invariant feature correspondence. In Computer Vision and Pattern Recognition, pages 172\u2013177, June 2009.\n\n[23] Jay Yagnik, Dennis Strelow, David A. Ross, and Ruei-Sung Lin. The power of comparative reasoning. In ICCV, pages 2431\u20132438, 2011.\n", "award": [], "sourceid": 12, "authors": [{"given_name": "Andrew", "family_name": "Ziegler", "institution": null}, {"given_name": "Eric", "family_name": "Christiansen", "institution": null}, {"given_name": "David", "family_name": "Kriegman", "institution": null}, {"given_name": "Serge", "family_name": "Belongie", "institution": null}]}