{"title": "Multiresolution analysis on the symmetric group", "book": "Advances in Neural Information Processing Systems", "page_first": 1637, "page_last": 1645, "abstract": "There is no generally accepted way to define wavelets on permutations. We address this issue by introducing the notion of coset based multiresolution analysis (CMRA) on the symmetric group; find the corresponding wavelet functions; and describe a fast wavelet transform of O(n^p) complexity with small p for sparse signals (in contrast to the O(n^q n!) complexity typical of FFTs). We discuss potential applications in ranking, sparse approximation, and multi-object tracking.", "full_text": "Multiresolution analysis on the symmetric group\n\nRisi Kondor and Walter Dempsey\n\nDepartment of Statistics and Department of Computer Science\n\nThe University of Chicago\n\n{risi,wdempsey}@uchicago.edu\n\nAbstract\n\nThere is no generally accepted way to define wavelets on permutations. We address this issue by introducing the notion of coset based multiresolution analysis (CMRA) on the symmetric group, find the corresponding wavelet functions, and describe a fast wavelet transform for sparse signals. We discuss potential applications in ranking, sparse approximation, and multi-object tracking.\n\n1 Introduction\n\nA variety of problems in machine learning, from ranking to multi-object tracking, involve inference over permutations. Invariably, the bottleneck in such problems is that the number of permutations grows with $n!$, ruling out the possibility of representing generic functions or distributions over permutations explicitly as soon as $n$ exceeds about ten or twelve.\n\nRecently, a number of authors have advocated approximations based on a type of generalized Fourier transform [1][2][3][4][5][6]. 
On the group $S_n$ of permutations of $n$ objects, this takes the form\n\n$$\\hat f(\\lambda)=\\sum_{\\sigma\\in S_n} f(\\sigma)\\,\\rho_\\lambda(\\sigma), \\tag{1}$$\n\nwhere $\\lambda$ plays the role of frequency, while the matrix-valued functions $\\rho_\\lambda$, called irreducible representations, are similar to the $e^{-i2\\pi kx/N}$ factors in ordinary Fourier analysis. It is possible to show that, just as in classical Fourier analysis, the Fourier matrices $\\hat f(\\lambda)$ correspond to components of $f$ at different levels of smoothness with respect to the underlying permutation topology [2][7]. Ordering the $\\lambda$'s from smooth to rough as $\\lambda_1\\preceq\\lambda_2\\preceq\\ldots$, one is thus led to “band-limited” approximations of $f$ via the nested sequence of spaces\n\n$$V_\\mu=\\{\\, f\\in\\mathbb{R}^{S_n} \\mid \\hat f(\\lambda)=0 \\text{ for all } \\lambda\\succ\\mu \\,\\}.$$\n\nWhile this framework is attractive mathematically, it suffers from the same disease as classical Fourier approximations, namely its inability to handle discontinuities gracefully. In applications such as multi-object tracking this is a particularly serious issue, because each observation of the form “object $i$ is at track $j$” introduces a new discontinuity into the assignment distribution, and the resulting Gibbs phenomenon makes it difficult to ensure even that $f(\\sigma)$ remains positive.\n\nThe time-honored solution is to use wavelets. However, in the absence of a natural dilation operator, defining wavelets on a discrete space is not trivial. Recently, Gavish et al. defined an analog of Haar wavelets on trees [8], while Coifman and Maggioni [9] and Hammond et al. [10] managed to define wavelets on general graphs. In this paper we attempt to do the same on the much more structured domain of permutations by introducing an altogether new notion of multiresolution analysis, which we call coset-based multiresolution analysis (CMRA).\n\n[Figure 1: Multiresolution. Diagram of the cascade in which $V_0$ splits into $V_{-1}$ and $W_{-1}$, $V_{-1}$ into $V_{-2}$ and $W_{-2}$, $V_{-2}$ into $V_{-3}$ and $W_{-3}$, and so on.]\n\n2 Multiresolution analysis and the multiscale structure of $S_n$\n\nThe notion of multiresolution analysis on the real line was first formalized by Mallat [11]: a nested sequence of function spaces\n\n$$\\ldots\\subset V_{-1}\\subset V_0\\subset V_1\\subset V_2\\subset\\ldots$$\n\nis said to constitute a multiresolution analysis (MRA) for $L^2(\\mathbb{R})$ if it satisfies the following axioms:\n\nMRA1. $\\bigcap_k V_k=\\{0\\}$,\nMRA2. $\\overline{\\bigcup_k V_k}=L^2(\\mathbb{R})$,\nMRA3. for any $f\\in V_k$ and any $m\\in\\mathbb{Z}$, the function $f'(x)=f(x-m\\,2^{-k})$ is also in $V_k$,\nMRA4. for any $f\\in V_k$, the function $f'(x)=f(2x)$ is in $V_{k+1}$.\n\nSetting $V_{k+1}=V_k\\oplus W_k$ and starting with, say, $V_\\ell$, the process of moving down the chain of spaces can be thought of as splitting $V_\\ell$ into a smoother part $V_{\\ell-1}$ (called the scaling space) and a rougher part $W_{\\ell-1}$ (called the wavelet space), and then repeating this process recursively for $V_{\\ell-1}$, $V_{\\ell-2}$, and so on (Figure 1).\n\nTo get an actual wavelet transform, one needs to define appropriate bases for the $\\{V_i\\}$ and $\\{W_i\\}$ spaces. In the simplest case, a single function $\\phi$, called the scaling function, is sufficient to generate an orthonormal basis for $V_0$, and a single function $\\psi$, called the mother wavelet, generates an orthonormal basis for $W_0$. In this case, defining $\\phi_{k,m}(x)=2^{k/2}\\phi(2^k x-m)$ and $\\psi_{k,m}(x)=2^{k/2}\\psi(2^k x-m)$, we find that $\\{\\phi_{k,m}\\}_{m\\in\\mathbb{Z}}$ and $\\{\\psi_{k,m}\\}_{m\\in\\mathbb{Z}}$ are orthonormal bases for $V_k$ and $W_k$, respectively. Moreover, $\\{\\psi_{k,m}\\}_{k,m\\in\\mathbb{Z}}$ is an orthonormal basis for the whole of $L^2(\\mathbb{R})$. 
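The splitting of V_ℓ into V_(ℓ−1) ⊕ W_(ℓ−1) described above is easiest to see for the classical Haar system on the real line, where φ is the indicator function of [0, 1). What follows is a minimal sketch of that classical case only; it is not part of the CMRA construction. A vector of 2^J samples stands in for a function in V_J that is piecewise constant on dyadic intervals. Each step maps neighbouring pairs to orthonormal averages (scaling coefficients) and differences (wavelet coefficients), and recursing on the averages walks down the cascade of Figure 1. The function names are illustrative assumptions.

```python
import math


def haar_step(values):
    # One level of the orthonormal Haar transform: V_l -> V_(l-1) + W_(l-1).
    # Each pair of neighbouring samples becomes a scaled average and a scaled difference.
    if len(values) % 2:
        raise ValueError('length must be even')
    s = 1.0 / math.sqrt(2.0)
    evens, odds = values[0::2], values[1::2]
    smooth = [s * (a + b) for a, b in zip(evens, odds)]
    detail = [s * (a - b) for a, b in zip(evens, odds)]
    return smooth, detail


def haar_inverse_step(smooth, detail):
    # Exact inverse of haar_step (the step is an orthogonal transformation).
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(smooth, detail):
        out.extend([s * (a + d), s * (a - d)])
    return out


def haar_transform(values):
    # Recurse on the smooth part until one scaling coefficient is left.
    # The input length must be a power of two. Details are returned finest level first.
    smooth, details = list(values), []
    while len(smooth) > 1:
        smooth, detail = haar_step(smooth)
        details.append(detail)
    return smooth, details


if __name__ == '__main__':
    x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
    coarse, details = haar_transform(x)
    print(coarse, details)
```

Because every step is orthogonal, the squared coefficients sum to the squared norm of the input, and haar_inverse_step undoes haar_step exactly. In the CMRA of Section 3, the cosets of S_(n−k) play the role of the dyadic intervals.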
By the wavelet transform of $f$ we mean its expansion in this basis.\n\nThe difficulty in defining multiresolution analysis on discrete spaces is that there is no natural analog of dilation, as required by Mallat's fourth axiom. However, in the specific case of the symmetric group, we do at least have a natural multiscale structure on our domain. Our goal in this paper is to find an analog of Mallat's axioms that can take advantage of this structure.\n\n2.1 Two decompositions of $\\mathbb{R}^{S_n}$\n\nA permutation of $n$ objects is a bijective mapping $\\{1,2,\\ldots,n\\}\\to\\{1,2,\\ldots,n\\}$. With respect to the natural notion of multiplication $(\\sigma_2\\sigma_1)(i)=\\sigma_2(\\sigma_1(i))$, the $n!$ different permutations of $\\{1,\\ldots,n\\}$ form a group, called the symmetric group of degree $n$, which we denote $S_n$.\n\nOur MRA on $S_n$ is born of the tension between two different ways of carving up $\\mathbb{R}^{S_n}$ into orthogonal sums of subspaces: one corresponding to subdivision in “time”, the other in “frequency”. The first of these is easier to describe, since it is based on recursively partitioning $S_n$ according to the hierarchy of sets\n\n$$S_{i_1}=\\{\\,\\sigma\\in S_n \\mid \\sigma(n)=i_1\\,\\}, \\qquad i_1\\in\\{1,\\ldots,n\\},$$\n$$S_{i_1,i_2}=\\{\\,\\sigma\\in S_n \\mid \\sigma(n)=i_1,\\ \\sigma(n-1)=i_2\\,\\}, \\qquad i_1\\neq i_2,\\ \\ i_1,i_2\\in\\{1,\\ldots,n\\},$$\n\nand so on, down to sets of the form $S_{i_1\\ldots i_{n-1}}$, which only have a single element. Intuitively, this tree of nested sets captures the way in which we zoom in on a particular permutation $\\sigma$ by first fixing $\\sigma(n)$, then $\\sigma(n-1)$, etc. (see Figure 2 in Appendix B in the Supplement). From the algebraic point of view, $S_{i_1,\\ldots,i_k}$ is a so-called (left) $S_{n-k}$-coset\n\n$$\\mu_{i_1\\ldots i_k}S_{n-k}:=\\{\\,\\mu_{i_1\\ldots i_k}\\tau \\mid \\tau\\in S_{n-k}\\,\\}, \\tag{2}$$\n\nwhere $\\mu_{i_1\\ldots i_k}$ is a permutation mapping $n\\mapsto i_1,\\ \\ldots,\\ n-k+1\\mapsto i_k$. This emphasizes that, in some sense, each $S_{i_1,\\ldots,i_k}$ is just a “copy” of $S_{n-k}$ inside $S_n$. The first important system of subspaces of $\\mathbb{R}^{S_n}$ for our purposes are the window spaces\n\n$$\\mathbb{S}_{i_1\\ldots i_k}=\\{\\, f \\mid \\mathrm{supp}(f)\\subseteq S_{i_1\\ldots i_k}\\,\\}, \\qquad 0\\le k\\le n-1,\\quad \\{i_1,\\ldots,i_k\\}\\subseteq\\{1,\\ldots,n\\}.$$\n\nClearly, for any given $k$, $\\mathbb{R}^{S_n}=\\bigoplus_{i_1,\\ldots,i_k}\\mathbb{S}_{i_1\\ldots i_k}$.\n\nThe second system of spaces is related to the behavior of functions under translation. In fact, there are two distinct ways in which a given $f\\in\\mathbb{R}^{S_n}$ can be translated by some $\\tau\\in S_n$: left-translation, $f\\mapsto T_\\tau f$, where $(T_\\tau f)(\\sigma)=f(\\tau^{-1}\\sigma)$, and right-translation, $f\\mapsto T^R_\\tau f$, where $(T^R_\\tau f)(\\sigma)=f(\\sigma\\tau^{-1})$. For now we focus on the former.\n\nWe say that a space $V\\subseteq\\mathbb{R}^{S_n}$ is a left $S_n$-module if it is invariant to left-translation in the sense that for any $f\\in V$ and $\\tau\\in S_n$, $T_\\tau f\\in V$. A fundamental result in representation theory tells us that if $V$ is reducible, in the sense that it has a proper nonzero subspace $V_1$ that is fixed by left-translation, then $V=V_1\\oplus V_2$, where $V_1$ and $V_2$ are both (left $S_n$-)modules. In particular, $\\mathbb{R}^{S_n}$ is a (left $S_n$-)invariant space, therefore\n\n$$\\mathbb{R}^{S_n}=\\bigoplus_{t\\in\\mathcal{T}_n} M_t \\tag{3}$$\n\nfor some set $\\{M_t\\}$ of irreducible modules. This is our second important system of spaces.\n\nTo understand the interplay between modules and window spaces, observe that each coset $\\mu_{i_1\\ldots i_k}S_{n-k}$ has an internal notion of left-translation\n\n$$(T^{i_1\\ldots i_k}_\\tau f)(\\sigma)=f(\\mu_{i_1\\ldots i_k}\\tau^{-1}\\mu^{-1}_{i_1\\ldots i_k}\\sigma), \\qquad \\tau\\in S_{n-k}, \\tag{4}$$\n\nwhich fixes $\\mathbb{S}_{i_1\\ldots i_k}$. Therefore, $\\mathbb{S}_{i_1\\ldots i_k}$ must be decomposable into a sum of irreducible $S_{n-k}$-modules,\n\n$$\\mathbb{S}_{i_1\\ldots i_k}=\\bigoplus_{t\\in\\mathcal{T}_{n-k}} M^{i_1\\ldots i_k}_t. \\tag{5}$$\n\nFurthermore, the modules of different window spaces can be defined in such a way that $M^{i'_1\\ldots i'_k}_t=\\mu_{i'_1\\ldots i'_k}\\,\\mu^{-1}_{i_1\\ldots i_k}\\,M^{i_1\\ldots i_k}_t$. (Note that each $M^{i_1\\ldots i_k}_t$ is an $S_{n-k}$-module in the sense of being invariant to the internal translation action (4), and this action depends on $i_1\\ldots i_k$.) Now, for any fixed $t$, the space $U=\\bigoplus_{i_1,\\ldots,i_k}M^{i_1\\ldots i_k}_t$ is fully $S_n$-invariant, and therefore we must also have $U=\\bigoplus_{\\alpha\\in A}M_\\alpha$, where the $M_\\alpha$ are now irreducible $S_n$-modules. Whenever a relationship of this type holds between two sets of irreducible $S_n$- resp. $S_{n-k}$-modules, we say that the $\\{M_\\alpha\\}$ modules are induced by $\\{M^{i_1\\ldots i_k}_t\\}$.\n\nThe situation is complicated by the fact that decompositions like (3) and (5) are not unique. In particular, there is no guarantee that the induced modules $\\{M_\\alpha\\}$ will be amongst the modules featured in (3). However, there is a unique, so-called adapted system of modules, for which this issue does not arise. Specifically, if, as is usually done, we let the indexing set $\\mathcal{T}_m$ be the set of Standard Young Tableaux (SYT) of size $m$ (see Appendix A in the supplementary materials for the exact definition), such as the tableau\n\n    1 3 5 6 7\n    2 4\n    8\n\nin $\\mathcal{T}_8$, then the adapted modules at different levels of the coset tree are connected via\n\n$$\\bigoplus_{i_1\\ldots i_k}M^{i_1\\ldots i_k}_t=\\bigoplus_{t'\\in t\\uparrow^n}M_{t'} \\qquad \\forall\\, t\\in\\mathcal{T}_{n-k}, \\tag{6}$$\n\nwhere $t\\uparrow^n:=\\{\\, t'\\in\\mathcal{T}_n \\mid t'\\downarrow_{n-k}=t \\,\\}$, and $t'\\downarrow_{n-k}$ is the tableau that we get by removing the boxes containing $n-k+1,\\ldots,n$ from $t'$. We also extend these relationships to sets in the obvious way: $\\mu\\downarrow_{n-k}:=\\{\\, t'\\downarrow_{n-k} \\mid t'\\in\\mu \\,\\}$ and $\\nu\\uparrow^n:=\\bigcup_{t\\in\\nu}t\\uparrow^n$. We will give an explicit description of the adapted modules in Section 4. For now, abstract relationships of the type (6) will suffice.\n\n3 Coset based multiresolution analysis on $S_n$\n\nOur guiding principle in defining an analog of Mallat's axioms for permutations is that the resulting multiresolution analysis should reflect the multiscale structure of the tree of cosets. At the same time, we also want the $\\{V_k\\}$ spaces to be invariant to translation. Letting $P_{i_1\\ldots i_k}$ be the projection operator\n\n$$(P_{i_1\\ldots i_k}f)(\\sigma):=\\begin{cases} f(\\sigma) & \\text{if } \\sigma\\in\\mu_{i_1\\ldots i_k}S_{n-k},\\\\\\\\ 0 & \\text{otherwise,}\\end{cases} \\tag{7}$$\n\nwe propose the following definition.\n\nDefinition 1 We say that a sequence of spaces $V_0\\subseteq V_1\\subseteq\\ldots\\subseteq V_{n-1}=\\mathbb{R}^{S_n}$ forms a left-invariant coset based multiresolution analysis (L-CMRA) for $S_n$ if\nL1. for any $f\\in V_k$ and any $\\tau\\in S_n$, we have $T_\\tau f\\in V_k$,\nL2. if $f\\in V_k$, then $P_{i_1\\ldots i_{k+1}}f\\in V_{k+1}$, for any $i_1,\\ldots,i_{k+1}$, and\nL3. if $g\\in V_{k+1}$, then for any $i_1,\\ldots,i_{k+1}$ there is an $f\\in V_k$ such that $P_{i_1\\ldots i_{k+1}}f=g$.\n\nGiven any left-translation invariant space $V_k$, the unique $V_{k+1}$ that satisfies axioms L1–L3 is $V_{k+1}:=\\bigoplus_{i_1\\ldots i_{k+1}}P_{i_1\\ldots i_{k+1}}V_k$. Applying this formula recursively, we find that\n\n$$V_k=\\bigoplus_{i_1\\ldots i_k}P_{i_1\\ldots i_k}V_0, \\tag{8}$$\n\nso $V_0$ determines the entire sequence of spaces $V_0,V_1,\\ldots,V_{n-1}$. In contrast to most classical MRAs, however, this relationship is not bidirectional: $V_k$ does not determine $V_0,\\ldots,V_{k-1}$.\n\nTo gain a better understanding of L-CMRA, we exploit that (by axiom L1) each $V_k$ is $S_n$-invariant, and is therefore a sum of irreducible $S_n$-modules. By the following proposition, if $V_0$ is a sum of adapted modules, then $V_1,\\ldots,V_{n-1}$ are easy to describe.\n\nProposition 1 If $\\{M_t\\}_{t\\in\\mathcal{T}_n}$ are the adapted left $S_n$-modules of $\\mathbb{R}^{S_n}$, and $V_0=\\bigoplus_{t\\in\\nu_0}M_t$ for some $\\nu_0\\subseteq\\mathcal{T}_n$, then\n\n$$V_k=\\bigoplus_{t\\in\\nu_k}M_t, \\qquad W_k=\\bigoplus_{t\\in\\nu_{k+1}\\setminus\\nu_k}M_t, \\qquad \\text{where}\\quad \\nu_k=\\nu_0\\downarrow_{n-k}\\uparrow^n, \\tag{9}$$\n\nfor any $k\\in\\{0,1,\\ldots,n-1\\}$.\n\nProof. By (6), $P_{i_1\\ldots i_k}\\big[\\bigoplus_{t'\\in t\\uparrow^n}M_{t'}\\big]=M^{i_1\\ldots i_k}_t$. Therefore, for any $t'\\in(t\\uparrow^n\\cap\\nu_0)$ there must be some $f\\in M_{t'}\\subseteq V_0$ such that for some $i_1\\ldots i_k$, $P_{i_1\\ldots i_k}f\\in M^{i_1\\ldots i_k}_t$ (and $P_{i_1\\ldots i_k}f$ is nonzero). By Lemmas 1 and 2 in Appendix D, this implies that $M^{i_1\\ldots i_k}_t\\subseteq V_k$ for all $i_1\\ldots i_k$. On the other hand, from (6) it is also clear that if $t'\\notin\\nu_0$, then $M^{i_1\\ldots i_k}_t\\cap V_k=\\{0\\}$. Therefore,\n\n$$V_k=\\bigoplus_{t\\in\\nu_0\\downarrow_{n-k}}\\ \\bigoplus_{i_1\\ldots i_k}M^{i_1\\ldots i_k}_t=\\bigoplus_{t''\\in\\nu_0\\downarrow_{n-k}\\uparrow^n}M_{t''}.$$\n\nThe expression for $W_k$ follows from the general formula $V_{k+1}=V_k\\oplus W_k$. ∎\n\nExample 1 The simplest case of L-CMRA is when $\\nu_0=\\{[1\\,2\\,\\cdots\\,n]\\}$, the single-row tableau. In this case, setting $m=n-k$, we find that $\\nu_0\\downarrow_m=\\{[1\\,2\\,\\cdots\\,m]\\}$, and $\\nu_k=\\nu_0\\downarrow_m\\uparrow^n$ is the set of all standard Young tableaux whose first row starts with the numbers $1,2,\\ldots,m$. It so happens that $M^{i_1\\ldots i_k}_{[1\\,2\\,\\cdots\\,m]}$ is just the trivial invariant subspace of constant functions on $\\mu_{i_1\\ldots i_k}S_{n-k}$. Therefore, this instance of L-CMRA is an exact analog of Haar wavelets: $V_k$ will consist of all functions that are constant on each left $S_{n-k}$-coset. Some more interesting examples of adapted L-CMRAs are described in Appendix C.\n\nWhen $V_0$ cannot be written as a direct sum of adapted modules, the analysis becomes significantly more complicated. Due to space limitations, we leave the discussion of this case to the Appendix.\n\n3.1 Bi-invariant multiresolution analysis\n\nThe left-invariant multiresolution of Definition 1 is appropriate for problems like ranking, where we have a natural permutation invariance with respect to relabeling the objects to be ranked, but not the ranks themselves. In contrast, in problems like multi-object tracking, we want our $V_0\\subset\\ldots\\subset V_{n-1}$ hierarchy to be invariant on both the left and the right. This leads to the following definition.\n\nDefinition 2 We say that a sequence of spaces $V_0\\subseteq V_1\\subseteq\\ldots\\subseteq V_{n-1}=\\mathbb{R}^{S_n}$ forms a bi-invariant coset based multiresolution analysis (Bi-CMRA) for $S_n$ if\nBi1. for any $f\\in V_k$ and any $\\tau\\in S_n$, we have $T_\\tau f\\in V_k$ and $T^R_\\tau f\\in V_k$;\nBi2. 
if $f\\in V_{k-1}$, then $P_{i_1\\ldots i_k}f\\in V_k$, for any $i_1,\\ldots,i_k$; and\nBi3. $V_k$ is the smallest subspace of $\\mathbb{R}^{S_n}$ satisfying Bi1 and Bi2.\n\nNote that the third axiom had to be modified somewhat compared to Definition 1, but essentially it serves the same purpose as L3.\n\nA subspace $U$ that is invariant to both left- and right-translation (i.e., for any $f\\in U$ and any $\\sigma,\\tau\\in S_n$ both $T_\\sigma f\\in U$ and $T^R_\\tau f\\in U$) is called a two-sided module. The main reason that Bi-CMRA is easier to describe than L-CMRA is that the irreducible two-sided modules in $\\mathbb{R}^{S_n}$, called isotypic subspaces, are unique. In particular, the isotypics turn out to be\n\n$$U_\\lambda=\\bigoplus_{t\\in\\mathcal{T}_n:\\ \\lambda(t)=\\lambda}M_t, \\qquad \\lambda\\in\\Lambda_n,$$\n\nwhere $\\lambda(t)$ is the vector $(\\lambda_1,\\ldots,\\lambda_p)$ in which $\\lambda_i$ is the number of boxes in row $i$ of $t$. For $t$ to be a valid SYT, we must have $\\lambda_1\\ge\\lambda_2\\ge\\ldots\\ge\\lambda_p\\ge 1$ and $\\sum_{i=1}^p\\lambda_i=n$. We use $\\Lambda_n$ to denote the set of all such $p$-tuples, called integer partitions of $n$.\n\nBi-CMRA is a much more constrained framework than L-CMRA because (by axiom Bi1) each $V_k$ space must be of the form $V_k=\\bigoplus_{\\lambda\\in\\nu_k}U_\\lambda$. It should come as no surprise that the way that $\\nu_0$ determines $\\nu_1,\\ldots,\\nu_{n-1}$ is related to restriction and extension relationships between partitions. We write $\\lambda'\\le\\lambda$ if $\\lambda'_i\\le\\lambda_i$ for all $i$ (padding the shorter of the two partitions with zeros), and for $m\\le n$, we define $\\lambda\\downarrow_m:=\\{\\,\\lambda'\\in\\Lambda_m \\mid \\lambda'\\le\\lambda\\,\\}$ and $\\lambda'\\uparrow^n:=\\{\\,\\lambda\\in\\Lambda_n \\mid \\lambda\\ge\\lambda'\\,\\}$. Again, these operators are extended to sets of partitions by $\\mu\\downarrow_m:=\\bigcup_{\\lambda\\in\\mu}\\lambda\\downarrow_m$ and $\\nu\\uparrow^n:=\\bigcup_{\\lambda\\in\\nu}\\lambda\\uparrow^n$. (See Figure 3 in Appendix B.)\n\nProposition 2 Given a set of partitions $\\nu_0\\subseteq\\Lambda_n$, the corresponding Bi-CMRA comprises the spaces\n\n$$V_k=\\bigoplus_{\\lambda\\in\\nu_k}U_\\lambda, \\qquad W_k=\\bigoplus_{\\lambda\\in\\nu_{k+1}\\setminus\\nu_k}U_\\lambda, \\qquad \\text{where}\\quad \\nu_k=\\nu_0\\downarrow_{n-k}\\uparrow^n. \\tag{10}$$\n\nMoreover, any system of spaces satisfying Definition 2 is of this form for some $\\nu_0\\subseteq\\Lambda_n$.\n\nExample 2 The simplest case of Bi-CMRA corresponds to taking $\\nu_0=\\{(n)\\}$. In this case $\\nu_0\\downarrow_{n-k}=\\{(n-k)\\}$, and $\\nu_k=\\{\\,\\lambda\\in\\Lambda_n \\mid \\lambda_1\\ge n-k\\,\\}$. In Section 6 we discuss that $V_k=\\bigoplus_{\\lambda\\in\\nu_k}U_\\lambda$ has a clear interpretation as the subspace of $\\mathbb{R}^{S_n}$ determined by up to $k$'th order interactions between elements of the set $\\{1,\\ldots,n\\}$.\n\n4 Wavelets\n\nAs mentioned in Section 2, to go from multiresolution analysis to orthogonal wavelets, one needs to define appropriate bases for the spaces $V_0,W_0,W_1,\\ldots,W_{n-2}$. This can be done via the close connection between irreducible modules and the irreducible representations (irreps) $\\{\\rho_\\lambda\\}$ that we encountered in the context of the Fourier transform (1). As explained in Appendix A, each integer partition $\\lambda\\in\\Lambda_n$ has a corresponding irrep $\\rho_\\lambda:S_n\\to\\mathbb{R}^{d_\\lambda\\times d_\\lambda}$; the rows and columns of the $\\rho_\\lambda(\\sigma)$ matrices are labeled by the set $\\mathcal{T}_\\lambda$ of standard Young tableaux of shape $\\lambda$; and if the $\\rho_\\lambda$ are defined according to Young's Orthogonal Representation (YOR), then for any $t\\in\\mathcal{T}_n$ and $t'\\in\\mathcal{T}_{\\lambda(t)}$, the functions $\\varphi_{t'}(\\sigma)=[\\rho_{\\lambda(t)}(\\sigma)]_{t',t}$ form a basis for the adapted module $M_t$. Thus, the orthonormal system of functions\n\n$$\\phi_{t,t'}(\\sigma)=\\sqrt{d_\\lambda/n!}\\,[\\rho_\\lambda(\\sigma)]_{t',t}, \\qquad t\\in\\nu_0,\\quad t'\\in\\mathcal{T}_\\lambda,\\quad \\lambda=\\lambda(t), \\tag{11}$$\n$$\\psi^k_{t,t'}(\\sigma)=\\sqrt{d_\\lambda/n!}\\,[\\rho_\\lambda(\\sigma)]_{t',t}, \\qquad t\\in\\nu_{k+1}\\setminus\\nu_k,\\quad t'\\in\\mathcal{T}_\\lambda,\\quad \\lambda=\\lambda(t), \\tag{12}$$\n\nseems to be a natural choice of scaling resp. wavelet functions for the L-CMRA of Proposition 1. Similarly, we can take\n\n$$\\phi_{t,t'}(\\sigma)=\\sqrt{d_\\lambda/n!}\\,[\\rho_\\lambda(\\sigma)]_{t',t}, \\qquad \\lambda\\in\\nu_0,\\quad t,t'\\in\\mathcal{T}_\\lambda, \\tag{13}$$\n$$\\psi^k_{t,t'}(\\sigma)=\\sqrt{d_\\lambda/n!}\\,[\\rho_\\lambda(\\sigma)]_{t',t}, \\qquad \\lambda\\in\\nu_{k+1}\\setminus\\nu_k,\\quad t,t'\\in\\mathcal{T}_\\lambda, \\tag{14}$$\n\nas a basis for the Bi-CMRA of Proposition 2. Comparing with (1), we find that if we use these bases to compute the wavelet transform of a function, then the wavelet coefficients will just be rescaled versions of specific columns of the Fourier transform. From the computational point of view, this is encouraging, because there are well-known and practical fast Fourier transforms (FFTs) available for $S_n$ [12][13]. On the other hand, it is also somewhat of a letdown, since it suggests that all that we have gained so far is a way to reinterpret parts of the Fourier transform as wavelet coefficients. An even more serious concern is that the $\\psi^k_{t,t'}$ functions are not at all localized in the spatial domain, largely contradicting the very idea of wavelets. 
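The bases (11)–(14) only need the YOR matrices ρ_λ(σ). The sketch below is our own rough illustration, not the paper's implementation (the paper defers YOR to Appendix A). It builds ρ_λ(σ) for small n from the standard YOR action of an adjacent transposition (i, i+1). Let r be the content (column minus row) of the box holding i+1 minus the content of the box holding i. The diagonal entry is then 1/r, and the entry linking t to the tableau with i and i+1 swapped is sqrt(1 − 1/r²). General permutations are handled by factoring them into adjacent transpositions. The tableau and permutation encodings and all function names are assumptions made for this sketch.

```python
import itertools
import math


def partitions(n, max_part=None):
    # Integer partitions of n as non-increasing tuples, e.g. (3, 1) for n = 4.
    max_part = n if max_part is None else max_part
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def standard_tableaux(shape):
    # A tableau is a tuple pos with pos[k] = (row, col) of the box holding k + 1.
    shape = tuple(x for x in shape if x > 0)
    if not shape:
        return [()]
    result = []
    for r, length in enumerate(shape):
        below = shape[r + 1] if r + 1 < len(shape) else 0
        if length > below:  # the last box of row r is a removable corner
            smaller = list(shape)
            smaller[r] -= 1
            for t in standard_tableaux(smaller):
                result.append(t + ((r, length - 1),))
    return result


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def yor_adjacent(tabs, index, i):
    # Matrix of the adjacent transposition (i, i+1), with 1-based i, in YOR.
    d = len(tabs)
    m = [[0.0] * d for _ in range(d)]
    for a, t in enumerate(tabs):
        (r1, c1), (r2, c2) = t[i - 1], t[i]
        dist = (c2 - r2) - (c1 - r1)  # content of i+1 minus content of i; never 0
        m[a][a] = 1.0 / dist
        if abs(dist) > 1:  # swapping i and i+1 yields another standard tableau
            swapped = list(t)
            swapped[i - 1], swapped[i] = t[i], t[i - 1]
            m[index[tuple(swapped)]][a] = math.sqrt(1.0 - 1.0 / dist ** 2)
    return m


def adjacent_word(perm):
    # Bubble sort: afterwards perm * s_w[0] * s_w[1] * ... equals the identity.
    p, word, changed = list(perm), [], True
    while changed:
        changed = False
        for j in range(len(p) - 1):
            if p[j] > p[j + 1]:
                p[j], p[j + 1] = p[j + 1], p[j]
                word.append(j + 1)
                changed = True
    return word


def yor(shape, perm):
    # rho_shape(perm); perm is 0-based one-line notation, perm[i] = sigma(i+1) - 1.
    tabs = standard_tableaux(shape)
    index = {t: a for a, t in enumerate(tabs)}
    result = [[float(i == j) for j in range(len(tabs))] for i in range(len(tabs))]
    for i in adjacent_word(perm):  # sigma = s_w[-1] ... s_w[0]
        result = matmul(yor_adjacent(tabs, index, i), result)
    return result


def compose(a, b):
    # (a b)(i) = a(b(i)), matching the multiplication rule of Section 2.1.
    return tuple(a[b[i]] for i in range(len(b)))


def fourier_basis(n):
    # The functions sqrt(d / n!) [rho_lambda(sigma)]_{t', t} of (11)-(14),
    # keyed by (lambda, t', t) and tabulated over all sigma in S_n.
    perms = list(itertools.permutations(range(n)))
    basis = {}
    for lam in partitions(n):
        mats = [yor(lam, p) for p in perms]
        d = len(mats[0])
        scale = math.sqrt(d / math.factorial(n))
        for tp in range(d):
            for t in range(d):
                basis[(lam, tp, t)] = [scale * m[tp][t] for m in mats]
    return basis
```

For n = 4, fourier_basis returns 1 + 9 + 4 + 9 + 1 = 24 functions. Their pairwise inner products over S_4 form the identity matrix, which is the orthonormality claimed for (11)–(14). Tabulating over all of S_n costs n! work per function, which is exactly the expense the local wavelets (17) are meant to avoid. This sketch is therefore only a checking tool for tiny n.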
A solution to this dilemma emerges when we consider that, since\n\n$$\\nu_{k+1}\\setminus\\nu_k=(\\nu_0\\downarrow_{n-k-1}\\uparrow^n)\\setminus(\\nu_0\\downarrow_{n-k}\\uparrow^n)=\\big((\\nu_0\\downarrow_{n-k-1}\\uparrow^{n-k})\\setminus(\\nu_0\\downarrow_{n-k})\\big)\\uparrow^n,$$\n\neach of the $W_k$ wavelet spaces of Proposition 1 can be rewritten as\n\n$$W_k=\\bigoplus_{i_1\\ldots i_k}\\ \\bigoplus_{t\\in\\omega_k}M^{i_1\\ldots i_k}_t, \\qquad \\omega_k=(\\nu_0\\downarrow_{n-k-1}\\uparrow^{n-k})\\setminus(\\nu_0\\downarrow_{n-k}), \\tag{15}$$\n\nand similarly, the wavelet spaces of Proposition 2 can be rewritten as\n\n$$W_k=\\bigoplus_{i_1\\ldots i_k}\\ \\bigoplus_{\\lambda\\in\\omega_k}U^{i_1\\ldots i_k}_\\lambda, \\qquad \\omega_k=(\\nu_0\\downarrow_{n-k-1}\\uparrow^{n-k})\\setminus(\\nu_0\\downarrow_{n-k}), \\tag{16}$$\n\nwhere the $U^{i_1\\ldots i_k}_\\lambda:=\\bigoplus_{t\\in\\mathcal{T}_\\lambda}M^{i_1\\ldots i_k}_t$ are now “local isotypics”. An orthonormal basis for the $M^{i_1\\ldots i_k}_t$ spaces is provided by the local Fourier basis functions\n\n$$\\psi^{i_1\\ldots i_k}_{t,t'}(\\sigma):=\\begin{cases}\\sqrt{d_{\\lambda(t)}/(n-k)!}\\,[\\rho_{\\lambda(t)}(\\mu^{-1}_{i_1\\ldots i_k}\\sigma)]_{t',t} & \\text{if } \\sigma\\in\\mu_{i_1\\ldots i_k}S_{n-k},\\\\\\\\ 0 & \\text{otherwise,}\\end{cases} \\tag{17}$$\n\nwhich are localized both in “frequency” and in “space”. This basis also affirms the multiscale nature of our wavelet spaces: projecting onto the wavelet functions $\\psi^{i_1\\ldots i_k}_{t_1,t'_1}$ of a specific shape, say, $\\lambda_1=(n-k-2,2)$ captures very similar information about functions in $\\mathbb{S}_{i_1\\ldots i_k}$ as projecting onto the analogous $\\psi^{j_1\\ldots j_{k'}}_{t_2,t'_2}$ does for functions in $\\mathbb{S}_{j_1\\ldots j_{k'}}$, if $t_2$ and $t'_2$ are of shape $\\lambda_2=(n-k'-2,2)$.\n\nTaking (17) as our wavelet functions, we define the L-CMRA wavelet transform of a function $f:S_n\\to\\mathbb{R}$ as the collection of column vectors\n\n$$w^*_f(t):=\\big(\\langle f,\\phi_{t,t'}\\rangle\\big)^{\\top}_{t'\\in\\mathcal{T}_{\\lambda(t)}}, \\qquad t\\in\\nu_0, \\tag{18}$$\n$$w_f(t;i_1,\\ldots,i_k):=\\big(\\langle f,\\psi^{i_1\\ldots i_k}_{t,t'}\\rangle\\big)^{\\top}_{t'\\in\\mathcal{T}_{\\lambda(t)}}, \\qquad t\\in\\omega_k,\\quad \\{i_1,\\ldots,i_k\\}\\subset\\{1,\\ldots,n\\}, \\tag{19}$$\n\nwhere $0\\le k\\le n-2$, and $\\omega_k$ is as in (15). Similarly, we define the Bi-CMRA wavelet transform of $f$ as the collection of matrices\n\n$$w^*_f(\\lambda):=\\big(\\langle f,\\phi_{t,t'}\\rangle\\big)_{t,t'\\in\\mathcal{T}_\\lambda}, \\qquad \\lambda\\in\\nu_0, \\tag{20}$$\n$$w_f(\\lambda;i_1,\\ldots,i_k):=\\big(\\langle f,\\psi^{i_1\\ldots i_k}_{t,t'}\\rangle\\big)_{t,t'\\in\\mathcal{T}_\\lambda}, \\qquad \\lambda\\in\\omega_k,\\quad \\{i_1,\\ldots,i_k\\}\\subset\\{1,\\ldots,n\\}, \\tag{21}$$\n\nwhere $0\\le k\\le n-2$, and $\\omega_k$ is as in (16).\n\n4.1 Overcomplete wavelet bases\n\nWhile the wavelet spaces $W_0,\\ldots,W_{k-1}$ of Bi-CMRA are left- and right-invariant, the wavelets (17) still carry the mark of the coset tree. The coset tree is not a right-invariant object, since it branches in the specific order $n,n-1,n-2,\\ldots$. In contexts where wavelets are used as a means of promoting sparsity, this will bias us towards sparsity patterns that match the particular cosets featured in the coset tree. The only way to avoid this phenomenon is to span $W_0,\\ldots,W_{k-1}$ with the overcomplete system of wavelets\n\n$$\\psi^{i_1\\ldots i_k}_{j_1\\ldots j_k,t,t'}(\\sigma):=\\begin{cases}\\sqrt{d_{\\lambda(t)}/(n-k)!}\\,[\\rho_{\\lambda(t)}(\\mu^{-1}_{i_1\\ldots i_k}\\sigma\\,\\mu_{j_1\\ldots j_k})]_{t',t} & \\text{if } \\sigma\\in\\mu_{i_1\\ldots i_k}S_{n-k}\\,\\mu^{-1}_{j_1\\ldots j_k},\\\\\\\\ 0 & \\text{otherwise,}\\end{cases}$$\n\nwhere now both $\\{i_1,\\ldots,i_k\\}$ and $\\{j_1,\\ldots,j_k\\}$ are allowed to run over all $k$-element subsets of $\\{1,\\ldots,n\\}$. While sacrificing orthogonality, such a basis is extremely well suited for sparse modeling in various applications.\n\n5 Fast wavelet transforms\n\nIn the absence of fast wavelet transforms, multiresolution analysis would only be of theoretical interest. Fortunately, our wavelet transforms naturally lend themselves to efficient recursive computation along branches of the coset tree. This is especially attractive when dealing with functions that are sparse, since subtrees that only have zeros at their leaves can be eliminated from the transform altogether.\n\n    function FastLCWT(f, ν, (i_1 … i_k)) {\n      if k = n−1 then\n        return(Scaling_ν(v(f)))\n      end if\n      v ← 0\n      for each i_{k+1} ∉ {i_1 … i_k} do\n        if P_{i_1…i_{k+1}} f ≠ 0 then\n          v ← v + Φ_{i_k}(FastLCWT(f↓_{i_1…i_{k+1}}, ν↓_{n−k−1}, (i_1 … i_{k+1})))\n        end if\n      end for\n      output Wavelet_{ν↓_{n−k−1}↑^{n−k} ∖ ν}(v)\n      return Scaling_ν(v) }\n\nAlgorithm 1: A high level description of a recursive algorithm that computes the wavelet transform (18)–(19). The function is called as FastLCWT(f, ν_0, ()). The symbol $v$ stands for the collection of coefficient vectors $\\{w_f(t;i_1\\ldots i_k)\\}_{t\\in\\nu\\downarrow_{n-k-1}\\uparrow^{n-k}}$. The function Scaling selects the subset of these vectors that are scaling coefficients, whereas Wavelet selects the wavelet coefficients. $f\\downarrow_{i_1\\ldots i_k}:S_{n-k}\\to\\mathbb{R}$ is the restriction of $f$ to $\\mu_{i_1\\ldots i_k}S_{n-k}$, i.e., $f\\downarrow_{i_1\\ldots i_k}(\\tau)=f(\\mu_{i_1\\ldots i_k}\\tau)$.\n\nA very high level sketch of the resulting algorithm is given in Algorithm 1, while a more detailed description in terms of actual coefficient matrices is in Appendix E. Bi-CMRA would lead to a similar algorithm, which we omit for brevity. 
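For the Haar-type L-CMRA of Example 1, V_k consists of the functions that are constant on each coset μ_(i1…ik) S_(n−k). In that case the recursion becomes very simple, because each node of the coset tree only needs the sum of f over its coset. The sketch below is a hypothetical specialization written for illustration; it is not Algorithm 1 itself, which works with YOR coefficient vectors. A sparse f is a dict from permutations (σ(1), …, σ(n)) to values, and the tree is descended by fixing σ(n), then σ(n−1), and so on. Only cosets that contain a point of supp(f) are visited, so at most 1 + (n−1)q nodes are touched when |supp(f)| = q. The projection onto V_k is the coset average, and the W_k component is the difference of two consecutive projections. All function names are assumptions.

```python
import itertools
import math


def coset_sums(f, n):
    # f maps a permutation, written as (sigma(1), ..., sigma(n)), to a nonzero value.
    # Returns {(i1, ..., ik): sum of f over the coset with sigma(n) = i1, ...,
    # sigma(n - k + 1) = ik}, for the nonempty cosets only (0 <= k <= n - 1).
    sums = {}

    def visit(prefix, items):
        sums[prefix] = sum(v for _, v in items)
        k = len(prefix)
        if k == n - 1:  # leaves of the coset tree are single permutations
            return
        children = {}
        for sigma, v in items:  # branch on the value of sigma(n - k)
            children.setdefault(sigma[n - k - 1], []).append((sigma, v))
        for i_next, sub in children.items():  # zero subtrees are never entered
            visit(prefix + (i_next,), sub)

    visit((), list(f.items()))
    return sums


def node_of(sigma, k, n):
    # The depth-k coset-tree node containing sigma: (sigma(n), ..., sigma(n - k + 1)).
    return tuple(sigma[n - 1 - j] for j in range(k))


def haar_projection(sums, sigma, k, n):
    # Projection of f onto V_k of Example 1, evaluated at sigma: the average of f
    # over the left S_(n-k) coset containing sigma.
    return sums.get(node_of(sigma, k, n), 0.0) / math.factorial(n - k)


def haar_detail(sums, sigma, k, n):
    # Component of f in the wavelet space W_k, evaluated at sigma.
    return haar_projection(sums, sigma, k + 1, n) - haar_projection(sums, sigma, k, n)


def dense_haar_projection(f, n, k):
    # Brute-force reference over all n! permutations, only for checking small n.
    out = {}
    for sigma in itertools.permutations(range(1, n + 1)):
        key = node_of(sigma, k, n)
        total = sum(v for s, v in f.items() if node_of(s, k, n) == key)
        out[sigma] = total / math.factorial(n - k)
    return out
```

dense_haar_projection enumerates all n! permutations and exists only to check the sparse version on small examples. The sparse routine itself never touches a coset that misses supp(f).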
A key component of these algorithms is the function $\\Phi_{i_k}$. It converts the coefficient vectors representing any $g\\in\\mathbb{S}_{i_1\\ldots i_{k+1}}$ in terms of the basis $\\{\\psi^{i_1\\ldots i_{k+1}}_{t,t'}\\}_{t,t'}$ to the coefficient vectors representing the same $g$ in terms of $\\{\\psi^{i_1\\ldots i_k}_{t,t'}\\}_{t,t'}$. While in general this can be a complicated and expensive linear transformation, due to the special properties of Young's orthogonal representation, in our case it reduces to\n\n$$w_g(t;i_1\\ldots i_k)=\\sqrt{\\frac{d_\\lambda}{d_{\\lambda'}\\,(n-k)}}\\;\\rho_\\lambda(\\llbracket i_{k+1},n-k\\rrbracket)\\,\\big(w_g(t';i_1\\ldots i_{k+1})\\uparrow_t\\big), \\tag{22}$$\n\nwhere $t'=t\\downarrow_{n-k-1}$; $\\lambda=\\lambda(t)$; $\\lambda'=\\lambda(t')$; $\\llbracket i_{k+1},n-k\\rrbracket$ is a special permutation, called a contiguous cycle, that maps $n-k$ to $i_{k+1}$; and $\\uparrow_t$ is a copy operation that promotes its argument to a $d_\\lambda$-dimensional vector by\n\n$$\\big[w_g(t';\\ldots)\\uparrow_t\\big]_{t''}=\\begin{cases}[w_g(t';\\ldots)]_{t''\\downarrow_{n-k-1}} & \\text{if } t''\\downarrow_{n-k-1}\\in\\mathcal{T}_{\\lambda'},\\\\\\\\ 0 & \\text{otherwise.}\\end{cases}$$\n\nClausen's FFT [12] uses essentially the same elementary transformations to compute (1). However, whereas the FFT runs in $O(n^3 n!)$ operations, Algorithm 1 works with the local wavelet functions (17) as opposed to (12) and (14), and therefore needs only polynomial time if $f$ is sparse.\n\nProposition 3 Given $f:S_n\\to\\mathbb{R}$ such that $|\\mathrm{supp}(f)|\\le q$, and $\\nu_0\\subseteq\\mathcal{T}_n$, Algorithm 1 can compute the L-CMRA wavelet coefficients (18)–(19) in $n^2Nq$ scalar operations, where $N=\\sum_{t\\in\\nu_1}d_{\\lambda(t)}$. The analogous Bi-CMRA transform runs in $n^2Mq$ time, where $M=\\sum_{\\lambda\\in\\nu_1}d_\\lambda^2$.\n\nTo estimate the $N$ and $M$ constants in this result, note that for partitions with $\\lambda_1\\gg\\lambda_2,\\lambda_3,\\ldots$, $d_\\lambda=O(n^{n-\\lambda_1})$. For example, $d_{(n-1,1)}=n-1$, $d_{(n-2,2)}=n(n-3)/2$, etc. 
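The constants in Proposition 3 depend only on the dimensions d_λ, which the hook length formula gives directly. The sketch below is an illustration with hypothetical function names. It computes d_λ and implements the partition operations λ↓m and λ′↑n of Section 3.1 to form ν_k for a Bi-CMRA. It then evaluates M = Σ over λ in ν_1 of d_λ² for Example 2, and N = Σ over t in ν_1 of d_λ(t) for Example 1. For Example 1 it relies on the fact that the tableaux in ν_1 arise from the single-row tableau 1 2 … n−1 by placing n at an addable corner, one tableau per corner, so N reduces to a sum over shapes.

```python
import math


def partitions(n, max_part=None):
    # Integer partitions of n as non-increasing tuples.
    max_part = n if max_part is None else max_part
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def hook_dim(shape):
    # Hook length formula: d_lambda = n! / (product of all hook lengths).
    conj = [sum(1 for row in shape if row > c) for c in range(shape[0])]
    prod = 1
    for r, row in enumerate(shape):
        for c in range(row):
            arm, leg = row - c - 1, conj[c] - r - 1
            prod *= arm + leg + 1
    return math.factorial(sum(shape)) // prod


def leq(a, b):
    # a <= b entrywise, padding the shorter partition with zeros (Section 3.1).
    size = max(len(a), len(b))
    a = tuple(a) + (0,) * (size - len(a))
    b = tuple(b) + (0,) * (size - len(b))
    return all(x <= y for x, y in zip(a, b))


def down(lam, m):
    return {mu for mu in partitions(m) if leq(mu, lam)}


def up(mu, n):
    return {lam for lam in partitions(n) if leq(mu, lam)}


def nu(nu0, n, k):
    # nu_k = nu_0 restricted to n - k and then extended back to n, as in (10).
    restricted = set().union(*(down(lam, n - k) for lam in nu0))
    return set().union(*(up(mu, n) for mu in restricted))


def addable(shape):
    # Shapes obtained by adding one box at an addable corner of the given shape.
    out = []
    for r in range(len(shape) + 1):
        row = shape[r] if r < len(shape) else 0
        if r == 0 or row < shape[r - 1]:
            new = list(shape) if r < len(shape) else list(shape) + [0]
            new[r] += 1
            out.append(tuple(new))
    return out


if __name__ == '__main__':
    n = 10
    nu1 = nu({(n,)}, n, 1)  # Example 2: the Bi-CMRA generated by {(n)}
    print('M =', sum(hook_dim(lam) ** 2 for lam in nu1))
    # Example 1: nu_1 holds one tableau per addable corner of the row (n - 1).
    print('N =', sum(hook_dim(s) for s in addable((n - 1,))))
```

For n = 10 it prints M = 82 and N = 10. In general M = 1 + (n−1)² and N = n for these two simplest examples, which is what keeps the n²Nq and n²Mq bounds polynomial.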
The inverse wavelet transforms essentially follow the same computations in reverse and have similar complexity bounds.\n\n6 Applications\n\nThere is a range of applied problems involving permutations that could benefit from the wavelets defined in this paper. In this section we mention just two potential applications.\n\n6.1 Spectral analysis of ranking data\n\nGiven a distribution $p$ over permutations, the matrix $M_k$ of $k$'th order marginals is\n\n$$[M_k]_{j_1\\ldots j_k;\\,i_1\\ldots i_k}=p\\big(\\sigma(i_1)=j_1,\\ldots,\\sigma(i_k)=j_k\\big)=\\sum_{\\sigma\\in S^{j_1\\ldots j_k}_{i_1\\ldots i_k}}p(\\sigma),$$\n\nwhere $S^{j_1\\ldots j_k}_{i_1\\ldots i_k}$ is the two-sided coset $\\mu_{j_1\\ldots j_k}S_{n-k}\\,\\mu^{-1}_{i_1\\ldots i_k}:=\\{\\,\\mu_{j_1\\ldots j_k}\\tau\\,\\mu^{-1}_{i_1\\ldots i_k} \\mid \\tau\\in S_{n-k}\\,\\}$. Clearly, these matrices satisfy a number of linear equations, and therefore are redundant. However, it can be shown that for some appropriate basis transformation matrix $T_k$,\n\n$$M_k=T_k^{\\top}\\Big[\\bigoplus_{\\lambda\\in\\Lambda_n:\\ \\lambda_1\\ge n-k}\\hat p(\\lambda)\\Big]T_k,$$\n\ni.e., the Fourier matrices $\\{\\hat p(\\lambda)\\}_{\\lambda:\\ \\lambda_1=n-k}$ capture exactly the “pure $k$'th order effects” in the distribution $p$. In the spectral analysis of rankings, as advocated, e.g., in [7], there is a lot of emphasis on projecting data to this space, $\\mathrm{Marg}_k$, but using an FFT this takes around $O(n^2 n!)$ time. On the other hand, $\\mathrm{Marg}_k$ is exactly the wavelet space $W_{k-1}$ of the Bi-CMRA generated by $\\nu_0=\\{(n)\\}$ of Example 2. Therefore, when $p$ is $q$-sparse, noting that $d_{(n-1,1)}=n-1$, we can use the methods of the previous section to find its projection to each of these spaces in just $O(n^4q)$ time.\n\n6.2 Multi-object tracking\n\nIn multi-object tracking, as mentioned in the Introduction, the first few Fourier coefficients $\\{\\hat p(\\lambda)\\}_{\\lambda\\in\\xi}$ (w.r.t. the majorizing order on partitions) provide an optimal approximation to the assignment distribution $p$ between targets and tracks in the face of a random noise process [2][1]. However, observing target $i$ at track $j$ will zero out $p$ everywhere outside the coset $\\mu_jS_{n-1}\\mu^{-1}_i$, which is difficult for the Fourier approach to handle. In fact, by analogy with (7), denoting the operator that projects to the space of functions supported on this coset by $P^i_j$, the new distribution will just be $P^i_jp$ (up to normalization). Thus, if we set $\\nu_0=\\xi$, after any single observation, our distribution will lie in $V_1$ of the corresponding Bi-CMRA.\n\nUnfortunately, after a second observation, $p$ will fall in $V_2$, etc., leading to a combinatorial explosion in the size of the space needed to represent $p$. However, while each observation makes $p$ less smooth, it also makes it more concentrated, suggesting that this problem is ideally suited to a sparse representation in terms of the overcomplete basis functions of Section 4.1. The important departure from the fast wavelet transforms of Section 5 is that now, to find the optimally sparse representation of $p$, we must allow branching to two-sided cosets of the form $\\mu_{j_1\\ldots j_k}S_{n-k}\\,\\mu^{-1}_{i_1\\ldots i_k}$, which are no longer mutually disjoint.\n\n7 Conclusions\n\nStarting from the self-similar structure of the $S_{n-k}$ coset tree, we developed a framework for wavelet analysis on the symmetric group. Our framework resembles Mallat's multiresolution analysis in its axiomatic foundations, yet is closer to continuous wavelet transforms in its invariance properties. It also has strong ties to the “separation of variables” technique of non-commutative FFTs [14]. In a 
In a\ncertain special case we recover the analog of Haar wavelets on the coset tree, In general, wavelets\ncan circumvent the rigidity of the Fourier approach when dealing with functions that are sparse\nand/or have discontinuities, and, in contrast to the O(n2n!) complexity of the best FFTs, for sparse\nfunctions and a reasonable choice of \u03bd0, our fast wavelet transform runs in O(np) time for some\nsmall p. Importantly, wavelets also provide a natural basis for sparse approximations, which have\nhithero not been explored much in the context of permutations. Finally, much of our framework is\napplicable not just to the symmetric group, but to other \ufb01nite groups, as well.\n\n8\n\n\fReferences\n[1] J. Huang, C. Guestrin, and L. Guibas. Fourier Theoretic Probabilistic Inference over Permuta-\n\ntions. Journal of Machine Learning Research, 10:997\u20131070, 2009.\n\n[2] R. Kondor, A. Howard, and T. Jebara. Multi-object tracking with representations of the sym-\n\nmetric group. In Arti\ufb01cial Intelligence and Statistics (AISTATS), 2007.\n\n[3] S. Jagabathula and D. Shah. Inferring Rankings under Constrained Sensing. In In Advances in\n\nNeural Information Processing Systems (NIPS), 2008.\n\n[4] J. Huang, C. Guestrin, X. Jiang, and L. Guibas. Exploiting Probabilistic Independence for\n\nPermutations. In Arti\ufb01cial Intelligence and Statistics (AISTATS), 2009.\n\n[5] X. Jiang, J. Huang, and L. Guibas. Fourier-information duality in the identity management\nproblem. In In Proceedings of the European Conference on Machine Learning and Principles\nand Practice of Knowledge Discovery in Databases (ECML PKDD), Athens, Greece, Septem-\nber 2011.\n\n[6] D. Rockmore, P. Kostelec, W. Hordijk, and P. F. Stadler. Fast Fourier Transforms for Fitness\n\nLandscapes. Applied and Computational Harmonic Analysis, 12(1):57\u201376, 2002.\n\n[7] P. Diaconis. Group representations in probability and statistics.\n\nStatistics, 1988.\n\nInstitute of Mathematical\n\n[8] M. 
Gavish, B. Nadler, and R. R. Coifman. Multiscale Wavelets on Trees, Graphs and High Dimensional Data: Theory and Applications to Semi Supervised Learning. In International Conference on Machine Learning (ICML), 2010.\n\n[9] R. R. Coifman and M. Maggioni. Diffusion wavelets. Applied and Computational Harmonic Analysis, 21, 2006.\n\n[10] D. K. Hammond, P. Vandergheynst, and R. Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30:129–150, 2011.\n\n[11] S. G. Mallat. A Theory for Multiresolution Signal Decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11:674–693, 1989.\n\n[12] M. Clausen. Fast generalized Fourier transforms. Theor. Comput. Sci., 67(1):55–63, 1989.\n\n[13] D. Maslen and D. Rockmore. Generalized FFTs – a survey of some recent results. In Groups and Computation II, volume 28 of DIMACS Ser. Discrete Math. Theor. Comput. Sci., pages 183–287. AMS, Providence, RI, 1997.\n\n[14] D. K. Maslen and D. N. Rockmore. Separation of Variables and the Computation of Fourier Transforms on Finite Groups, I. Journal of the American Mathematical Society, 10:169–214, 1997.", "award": [], "sourceid": 773, "authors": [{"given_name": "Risi", "family_name": "Kondor", "institution": null}, {"given_name": "Walter", "family_name": "Dempsey", "institution": null}]}