{"title": "Infinite Factorial Dynamical Model", "book": "Advances in Neural Information Processing Systems", "page_first": 1666, "page_last": 1674, "abstract": "We propose the infinite factorial dynamic model (iFDM), a general Bayesian nonparametric model for source separation. Our model builds on the Markov Indian buffet process to consider a potentially unbounded number of hidden Markov chains (sources) that evolve independently according to some dynamics, in which the state space can be either discrete or continuous. For posterior inference, we develop an algorithm based on particle Gibbs with ancestor sampling that can be efficiently applied to a wide range of source separation problems. We evaluate the performance of our iFDM on four well-known applications: multitarget tracking, cocktail party, power disaggregation, and multiuser detection. Our experimental results show that our approach for source separation does not only outperform previous approaches, but it can also handle problems that were computationally intractable for existing approaches.", "full_text": "In\ufb01nite Factorial Dynamical Model\n\nIsabel Valera\u2217\n\nMax Planck Institute for\n\nSoftware Systems\n\nFrancisco J. R. Ruiz\u2217\n\nDepartment of Computer Science\n\nColumbia University\n\nivalera@mpi-sws.org\n\nf.ruiz@columbia.edu\n\nLennart Svensson\n\nDepartment of Signals and Systems\nChalmers University of Technology\n\nlennart.svensson@chalmers.se\n\nFernando Perez-Cruz\n\nUniversidad Carlos III de Madrid, and\n\nBell Labs, Alcatel-Lucent\nfernandop@ieee.org\n\nAbstract\n\nWe propose the in\ufb01nite factorial dynamic model (iFDM), a general Bayesian non-\nparametric model for source separation. Our model builds on the Markov In-\ndian buffet process to consider a potentially unbounded number of hidden Markov\nchains (sources) that evolve independently according to some dynamics, in which\nthe state space can be either discrete or continuous. 
For posterior inference, we develop an algorithm based on particle Gibbs with ancestor sampling that can be efficiently applied to a wide range of source separation problems. We evaluate the performance of our iFDM on four well-known applications: multitarget tracking, cocktail party, power disaggregation, and multiuser detection. Our experimental results show that our approach for source separation not only outperforms previous approaches, but also handles problems that were computationally intractable for existing approaches.

1 Introduction

The central idea behind Bayesian nonparametrics (BNPs) is the replacement of classical finite-dimensional prior distributions with general stochastic processes, allowing for an open-ended number of degrees of freedom in a model [8]. BNPs constitute an approach to model selection and adaptation in which the model complexity is allowed to grow with the data size [17]. In the literature, BNP priors have been applied to time series modeling. For example, the infinite hidden Markov model [2, 20] considers a potentially infinite cardinality of the state space; and the BNP construction of switching linear dynamical systems (LDS) [4] considers an unbounded number of dynamical systems with transitions among them occurring at any time during the observation period.
In the context of signal processing, the source separation problem has captured the attention of the research community for decades due to its wide range of applications [12, 23, 7, 24]. The BNP literature for source separation includes [10], in which the authors introduce the nonparametric counterpart of independent component analysis (ICA), referred to as infinite ICA (iICA); and [23], where the authors present the Markov Indian buffet process (mIBP), which places a prior over an infinite number of parallel Markov chains and is used to build the infinite factorial hidden Markov model (iFHMM) and the ICA iFHMM.
These approaches can effectively adapt the number of hidden sources to fit the available data. However, they suffer from several limitations: i) the iFHMM is restricted to binary on/off hidden states, which may lead to hidden chains that do not match the actual hidden causes, and it is not able to deal with continuous-valued states; and ii) both the iICA and the ICA iFHMM make independence assumptions between consecutive values of active hidden states, which significantly restricts their ability to capture the underlying dynamical models. As a result, we find that existing approaches are not applicable to many well-known source separation problems, such as multitarget tracking [12], in which each target can be modeled as a Markov chain with continuous-valued states describing the target trajectory; or multiuser detection [24], in which the high cardinality of the hidden states makes this problem computationally intractable for the non-binary extension of the iFHMM. Hence, there is a lack of both a general BNP model for source separation and an efficient inference algorithm to address these limitations.

∗ Both authors contributed equally.

In this paper, we provide a general BNP framework for source separation that can handle a wide range of dynamics and likelihood models. We assume a potentially infinite number of sources that are modeled as Markov chains that evolve according to some dynamical system model. We assume that only the active sources contribute to the observations, and that the states of the Markov chains are not restricted to be discrete but can also be continuous-valued. Moreover, we let the observations depend on both the current state of the hidden sequences and on some previous states.
This system memory is needed when dealing with applications in which the individual source signals propagate through the air and may thus suffer from phenomena such as reverberation, echo, or multipath propagation. Our approach results in a general and flexible dynamic model that we refer to as the infinite factorial dynamical model (iFDM), and that can be particularized to recover other models previously proposed in the literature, e.g., the binary iFHMM.
As with most BNP models, one of the main challenges of our iFDM is posterior inference. In discrete time series models, including the iFHMM, an approximate inference algorithm based on forward-filtering backward-sampling (FFBS) sweeps is typically used [23, 5]. However, the exact FFBS algorithm has exponential computational complexity with respect to the memory length. The FFBS algorithm also becomes computationally intractable when dealing with on/off hidden states that are continuous-valued when active. In order to overcome these limitations, we develop a suitable inference algorithm for our iFDM by building a Markov chain Monte Carlo (MCMC) kernel using particle Gibbs with ancestor sampling (PGAS) [13]. This algorithm has quadratic complexity with respect to the memory length and can easily handle a broad range of dynamical models.
The versatility and efficiency of our approach are shown through a comprehensive experimental validation in which we tackle four well-known source separation problems: multitarget tracking [12], cocktail party [23], power disaggregation [7], and multiuser detection [24].1 Our results show that our iFDM provides meaningful estimations of the number of sources and their corresponding individual signal traces even in applications that previous approaches cannot handle.
It also outperforms, in terms of accuracy, the iFHMM (extended to account for the actual state space cardinality) combined with FFBS-based inference in the cocktail party and power disaggregation problems.

2 Infinite Factorial Dynamical Model

In this section, we detail our proposed iFDM. We assume that there is a potentially infinite number of sources contributing to the observed sequence {yt}_{t=1}^{T}, and each source is modeled by an underlying dynamic system model in which the state of the m-th source at time t, denoted by xtm ∈ X, evolves over time as a first-order Markov chain. Here, the state space X can be either discrete or continuous. In addition, we introduce the auxiliary binary variables stm ∈ {0, 1} to indicate whether the m-th source is active at time t, such that the observations only depend on the active sources. We assume that the variables stm follow a first-order Markov chain and let the states xtm evolve according to p(xtm | stm, x(t−1)m, s(t−1)m), i.e., the dynamic system model may depend on whether the source is active or inactive. We assume dummy states stm = 0 for t ≤ 0. As an example, in the cocktail party problem, yt denotes a sample of the recorded audio signal, which depends on the individual voice signals of the active speakers. The latent states xtm in this example are real-valued, and the transition model p(xtm | stm = 1, x(t−1)m, s(t−1)m) describes the dynamics of the voice signal.
In many real applications, the individual signals propagate through the air until they are mixed and gathered by the receiver.
In such propagation, different phenomena (e.g., refraction or reflection of the signal in the walls) may occur, leading to multipath propagation of the signals and, therefore, to different delayed copies of the individual signals at the receiver. In order to account for this "memory" effect, we consider that the state of the m-th source at time t, xtm, influences not only the observation yt, but also the future L − 1 observations, yt+1, . . . , yt+L−1. Therefore, the likelihood of yt depends on the last L states of all the Markov chains, yielding

p(yt | X, S) = p(yt | {xtm, stm, x(t−1)m, s(t−1)m, . . . , x(t−L+1)m, s(t−L+1)m}_{m=1}^{M}),   (1)

where X and S are T × M matrices containing all the states xtm and stm, respectively. We remark that the likelihood of yt cannot depend on any hidden state xτm if sτm = 0.

1 Code for these applications can be found at https://github.com/franrruiz/iFDM

Figure 1: (a) Graphical representation of the iFDM with memory length L = 2. The dashed lines represent the memory. (b) Equivalent representation using extended states.

In order to be able to deal with an infinite number of sources, we place a BNP prior over the binary matrix S that contains all the variables stm. In particular, we assume that S ∼ mIBP(α, β0, β1), i.e., S is distributed as an mIBP [23] with parameters α, β0 and β1. The mIBP places a prior distribution over binary matrices with a finite number of rows T and an infinite number of columns M, in which each row represents a time instant and each column represents a Markov chain. The mIBP ensures that, for any finite value of T, only a finite number of columns M+ in S are active almost surely, whereas the rest of them remain in the all-zero state and do not influence the observations.
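For intuition, a draw of S under the mIBP can be simulated through its stick-breaking representation (detailed below), in which each chain m carries an activation probability am and a self-transition probability bm. The following is a minimal sketch with a finite truncation to M chains; the function names are ours:

```python
import numpy as np

def sample_mibp_transitions(M, alpha=1.0, beta0=1.0, beta1=1.0, rng=None):
    """Sample per-chain transition probabilities (a_m, b_m) under the
    mIBP stick-breaking construction, truncated to M chains.

    a_m = p(s_tm = 1 | s_(t-1)m = 0): activation probability
    b_m = p(s_tm = 1 | s_(t-1)m = 1): self-transition probability
    """
    rng = np.random.default_rng(rng)
    a = np.empty(M)
    # a_1 ~ Beta(alpha, 1); p(a_m | a_{m-1}) is proportional to
    # a_m^{alpha-1} on [0, a_{m-1}], i.e. a_{m-1} times a fresh
    # Beta(alpha, 1) draw, so the sequence a_m is strictly decreasing.
    a[0] = rng.beta(alpha, 1)
    for m in range(1, M):
        a[m] = a[m - 1] * rng.beta(alpha, 1)
    b = rng.beta(beta0, beta1, size=M)
    return a, b

def sample_activity_matrix(T, a, b, rng=None):
    """Simulate the binary activity matrix S (T x M) given (a_m, b_m),
    starting from the dummy state s_0m = 0."""
    rng = np.random.default_rng(rng)
    M = len(a)
    S = np.zeros((T, M), dtype=int)
    prev = np.zeros(M, dtype=int)
    for t in range(T):
        p_on = np.where(prev == 1, b, a)     # b_m if active, a_m if not
        S[t] = (rng.random(M) < p_on).astype(int)
        prev = S[t]
    return S
```

Because each am multiplies a fresh Beta(α, 1) draw by am−1, the activation probabilities are strictly decreasing in m, matching the ordering a1 > a2 > a3 > . . . of the stick-breaking representation.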
We make use of the stick-breaking construction of the mIBP, which is particularly useful to develop many practical inference algorithms [19, 23]. Under the stick-breaking construction, two hidden variables are introduced for each Markov chain, representing the transition probabilities between the active and inactive states. In particular, we define am = p(stm = 1 | s(t−1)m = 0) as the transition probability from inactive to active, and bm = p(stm = 1 | s(t−1)m = 1) as the self-transition probability of the active state of the m-th chain. In the stick-breaking representation, the columns of S are ordered according to their values of am, such that a1 > a2 > a3 > . . ., and the probability distribution over the variables am is given by a1 ∼ Beta(α, 1) and p(am | am−1) ∝ (am)^{α−1} I(0 ≤ am ≤ am−1), where I(·) is the indicator function [19]. Finally, we place a beta distribution over the transition probabilities bm of the form bm ∼ Beta(β0, β1).
The resulting iFDM model, particularized for L = 2, is shown in Figure 1a. Note that this model can be equivalently represented as shown in Figure 1b, using the extended states s(e)_tm, with

s(e)_tm = [ xtm, stm, x(t−1)m, s(t−1)m, . . . , x(t−L+1)m, s(t−L+1)m ].   (2)

This extended representation allows for an FFBS-based inference algorithm. However, the exponential complexity of the FFBS with the memory parameter L and with continuous-valued hidden states xtm makes the algorithm intractable in many real scenarios. Hence, we maintain the representation in Figure 1a because it allows us to derive an efficient inference algorithm.
The proposed iFDM in Figure 1a can be particularized to resemble some other models that have been proposed in the literature.
In particular, we recover: i) the iFHMM in [23] by choosing the state space X = {0, 1}, xtm = stm and L = 1; ii) the ICA iFHMM in [23] if we set X = R, L = 1 and assume that p(xtm | stm = 1, x(t−1)m, s(t−1)m) = p(xtm | stm = 1) is a Gaussian distribution; and iii) a BNP counterpart of the LDS [9] with on/off states by assuming L = 1 and X = R, and letting the variables xtm be Gaussian distributed with linear relationships among them.

3 Inference Algorithm

We develop an inference algorithm for the proposed iFDM that can handle different dynamic and likelihood models. Our approach relies on a blocked Gibbs sampling algorithm that alternates between sampling the number of considered chains and the global variables conditioned on the current value of the matrices S and X, and sampling the matrices S and X conditioned on the current value of the remaining variables. In particular, the algorithm proceeds iteratively as follows:

• Step 1: Add Mnew new inactive chains using an auxiliary slice variable and a slice sampling method. In this step, the number of considered chains is increased from its initial value M+ to M‡ = M+ + Mnew (M+ is not updated because stm = 0 for all t for the new chains).
• Step 2: Jointly sample the states xtm and stm of all the considered chains. Compact the representation by removing those chains that remain inactive in the entire observation period, consequently updating M+.
• Step 3: Sample the global variables in the model, which include the transition probabilities and the emission parameters, from their posterior distribution.

Figure 2: Particle Gibbs with ancestor sampling. (a) Example of the connection of particles in PGAS. We represent P = 3 particles x^i_τ for τ = {t − 1, t, t + 1}. The index a^i_τ denotes the ancestor particle of x^i_τ. It can be seen that, e.g., the trajectories x^1_{1:t+1} and x^2_{1:t+1} only differ at time instant t + 1. (b) PGAS algorithm.

In Step 1, we follow the slice sampling scheme for inference in BNP models based on the Indian buffet process (IBP) [19, 23], which effectively transforms the model into a finite factorial model with M‡ = M+ + Mnew parallel chains. Step 2 consists in sampling the elements of the matrices S and X given the current value of the global variables. Here, we propose to use PGAS, an algorithm recently developed for inference in state-space models and non-Markovian latent variable models [13]. Each iteration of this algorithm has quadratic complexity with respect to the memory length L, avoiding the exponential complexity of the standard FFBS algorithm when applied over the equivalent model with extended states in Figure 1b. Details on the PGAS approach are given in Section 3.1. After running PGAS, we remove those chains that remain inactive in the whole observation period. In Step 3, we sample the transition probabilities am and bm, as well as other model-dependent variables, such as the observation variables needed to evaluate the likelihood p(yt | X, S). Further details on the inference algorithm can be found in the Supplementary Material.

3.1 Particle Gibbs with ancestor sampling

PGAS [13] is a method within the framework of particle MCMC [1] that combines the main ideas, as well as the strengths, of sequential Monte Carlo and MCMC techniques. In contrast to other particle Gibbs with backward simulation methods [25, 14], this algorithm can also be conveniently applied to non-Markovian latent variable models, i.e., models that are not expressed in state-space form.
The PGAS algorithm is an MCMC kernel, and thus generates a new sample of the hidden state matrices (X, S) given an initial sample (X′, S′), which is the output of the previous iteration of the PGAS (extended to account for the Mnew new inactive chains). The machinery inside the PGAS algorithm resembles an ordinary particle filter, with two main differences: one of the particles is deterministically set to the reference input sample, and the ancestor of each particle is randomly chosen and stored during the algorithm execution. We briefly describe the PGAS approach below, but we refer to [13] for a rigorous analysis of the algorithm properties.
In the proposed PGAS, we assume a set of P particles for each time instant, each representing the states {xtm, stm}_{m=1}^{M‡}. We denote by x^i_t the state of the i-th particle at time t. We also introduce the ancestor indexes a^i_t ∈ {1, . . . , P} in order to denote the particle that precedes the i-th particle at time t. That is, a^i_t corresponds to the index of the ancestor particle of x^i_t. Let also x^i_{1:t} be the ancestral path of particle x^i_t, i.e., the particle trajectory that is recursively defined as x^i_{1:t} = (x^{a^i_t}_{1:t−1}, x^i_t). Figure 2a shows an example to clarify the notation.

Algorithm 1: Particle Gibbs with ancestor sampling
Input: Reference particles x′_t for t = 1, . . . , T, and global variables.
Output: Sample x^out_{1:T} from the PGAS Markov kernel.
1. Draw x^i_1 ∼ r_1(x_1) for i = 1, . . . , P − 1.
2. Set x^P_1 = x′_1.
3. Compute the weights w^i_1 = W_1(x^i_1) for i = 1, . . . , P.
4. for t = 2, . . . , T do
5.   (Resampling) Draw a^i_t ∼ Categorical(w^1_{t−1}, . . . , w^P_{t−1}) for i = 1, . . . , P − 1.
6.   (Ancestor sampling) Compute w̃^i_{t−1|T} for i = 1, . . . , P, and draw a^P_t ∼ Categorical(w̃^1_{t−1|T}, . . . , w̃^P_{t−1|T}).
7.   (Particle propagation) Draw x^i_t ∼ r_t(x_t | x^{a^i_t}_{1:t−1}) for i = 1, . . . , P − 1, and set x^P_t = x′_t.
8.   Set x^i_{1:t} = (x^{a^i_t}_{1:t−1}, x^i_t) for i = 1, . . . , P.
9.   (Weighting) Compute the weights w^i_t = W_t(x^i_{1:t}) for i = 1, . . . , P.
10. Draw k ∼ Categorical(w^1_T, . . . , w^P_T).
11. return x^out_{1:T} = x^k_{1:T}.

The algorithm is summarized in Figure 2b. For each time instant t, we first generate the ancestor indexes for the first P − 1 particles according to the importance weights w^i_{t−1}. Given these ancestors, the particles are then propagated across time according to a distribution r_t(x_t | x^{a_t}_{1:t−1}).
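The resampling, ancestor-sampling, propagation and weighting steps can be sketched in code for the simplest, memoryless case (L = 1) on a generic scalar state-space model. This is an illustrative sketch only, not the iFDM-specific kernel (which operates on the factorial states (X, S) and the mIBP globals); every helper name here is ours:

```python
import numpy as np

def pgas_kernel(y, x_ref, propagate, loglik, logtrans, P=100, rng=None):
    """One sweep of particle Gibbs with ancestor sampling (L = 1)
    for a generic scalar state-space model.

    propagate(x_prev, rng) -> new states (bootstrap proposal)
    loglik(y_t, x_t)       -> log p(y_t | x_t), vectorized over particles
    logtrans(x_prev, x_t)  -> log p(x_t | x_prev), vectorized over x_prev
    """
    rng = np.random.default_rng(rng)
    T = len(y)
    X = np.empty((T, P))                 # particle states
    A = np.zeros((T, P), dtype=int)      # ancestor indexes
    # t = 0: initialize (here by propagating a zero state, for simplicity)
    # and pin the last particle to the reference trajectory
    X[0] = propagate(np.zeros(P), rng)
    X[0, -1] = x_ref[0]
    logw = loglik(y[0], X[0])
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        # resample ancestors for the first P-1 particles
        A[t, :-1] = rng.choice(P, size=P - 1, p=w)
        # ancestor sampling for the reference particle:
        # w-tilde^i proportional to w^i * p(x'_t | x^i_{t-1})
        logw_as = np.log(w) + logtrans(X[t - 1], x_ref[t])
        wa = np.exp(logw_as - logw_as.max()); wa /= wa.sum()
        A[t, -1] = rng.choice(P, p=wa)
        # propagate, then pin the reference particle again
        X[t] = propagate(X[t - 1, A[t]], rng)
        X[t, -1] = x_ref[t]
        logw = loglik(y[t], X[t])
    # sample one particle at time T and trace its ancestral path back
    w = np.exp(logw - logw.max()); w /= w.sum()
    k = rng.choice(P, p=w)
    out = np.empty(T)
    for t in range(T - 1, -1, -1):
        out[t] = X[t, k]
        k = A[t, k]
    return out
```

The last particle slot plays the role of the reference trajectory x′ at every step, and the `logw_as` line implements the memoryless ancestor weights w̃^i ∝ w^i p(x′_t | x^i_{t−1}); for L > 1 this line would additionally include the product of the L − 1 look-ahead likelihood terms.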
For simplicity, and dropping the global variables from the notation for conciseness, we assume that

r_t(x_t | x^{a_t}_{1:t−1}) = p(x_t | x^{a_t}_{t−1}) = ∏_{m=1}^{M‡} p(xtm | stm, x^{a_t}_{(t−1)m}, s^{a_t}_{(t−1)m}) p(stm | s^{a_t}_{(t−1)m}),   (3)

i.e., particles are propagated as in Figure 1a using a simple bootstrap proposal kernel, p(xtm, stm | s(t−1)m, x(t−1)m). The P-th particle is instead deterministically set to the reference particle, x^P_t = x′_t, whereas the ancestor indexes a^P_t are sampled according to some weights w̃^i_{t−1|T}. Indeed, this is a crucial step that vastly improves the mixing properties of the MCMC kernel.
We now focus on the computation of the importance weights w^i_t and the ancestor weights w̃^i_{t−1|T}. For the former, the particles are weighted according to w^i_t = W_t(x^i_{1:t}), where

W_t(x_{1:t}) = p(x_{1:t} | y_{1:t}) / [ p(x_{1:t−1} | y_{1:t−1}) r_t(x_t | x_{1:t−1}) ] ∝ p(yt | x_{t−L+1:t}),   (4)

where y_{τ1:τ2} denotes the set of observations {yt}_{t=τ1}^{τ2}. Eq. 4 implies that, in order to obtain the importance weights, it suffices to evaluate the likelihood at time t. The ancestor weights w̃^i_{t−1|T} are given by

w̃^i_{t−1|T} = w^i_{t−1} p(x^i_{1:t−1}, x′_{t:T} | y_{1:T}) / p(x^i_{1:t−1} | y_{1:t−1}) ∝ w^i_{t−1} p(x′_t | x^i_{t−1}) ∏_{τ=t}^{t+L−2} p(yτ | x^i_{1:t−1}, x′_{t:T}).   (5)

Note that, for memoryless models (i.e., L = 1), Eq. 5 can be simplified, since the product in the last term is not present and, therefore, w̃^i_{t−1|T} ∝ w^i_{t−1} p(x′_t | x^i_{t−1}). For L > 1, the computation of the weights w̃^i_{t−1|T} in (5) for i = 1, . . . , P has computational time complexity scaling as O(P M‡ L²). Since this computation needs to be performed for each time instant (and this is the most expensive calculation), the resulting algorithm complexity scales as O(P T M‡ L²).

4 Experiments

We now evaluate the proposed model and inference algorithm on four different applications, which are detailed below and summarized in Table 1. For the PGAS kernel, we use P = 3,000 particles in all our experiments. Additional details on the experiments are given in the Supplementary Material.
Multitarget Tracking. In the multitarget tracking problem, we aim at locating the positions of several moving targets based on noisy observations. Under a general setup, a varying number of indistinguishable targets move around in a region, appearing at random in space and time. Multitarget tracking plays an important role in many areas of engineering, such as surveillance, computer vision and signal processing [18, 16, 21, 6, 12]. Here, we focus on a simple synthetic example to show that our proposed iFDM can handle time-dependent continuous-valued hidden states. We place three moving targets within a region of 800 × 800 metres, where 25 sensors are located on a square grid. The state xtm = [x(1)tm, x(2)tm, v(1)tm, v(2)tm]⊤ of each target consists of its position and velocity in a two-dimensional plane, and we assume a linear Gaussian dynamic model such that, while active, xtm evolves according to

xtm = G_x x(t−1)m + G_u u_t,   (6)

where G_x = [1 0 Ts 0; 0 1 0 Ts; 0 0 1 0; 0 0 0 1], G_u = [Ts²/2 0; 0 Ts²/2; Ts 0; 0 Ts], Ts = 0.5 is the sampling period, and u_t ∼ N(0, I) is a vector that models the acceleration noise. For each considered target, we sample the initial position uniformly in the sensor network space, and assume that the initial velocity is Gaussian distributed with zero mean and covariance 0.01 I.
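The constant-velocity dynamics of Eq. (6) are easy to simulate directly. The following sketch (function name and initial state are ours, for illustration) generates one active target's trajectory:

```python
import numpy as np

Ts = 0.5  # sampling period, as in the experiment

# G_x and G_u from Eq. (6): constant-velocity model with acceleration noise
Gx = np.array([[1, 0, Ts, 0],
               [0, 1, 0, Ts],
               [0, 0, 1, 0],
               [0, 0, 0, 1]], dtype=float)
Gu = np.array([[Ts**2 / 2, 0],
               [0, Ts**2 / 2],
               [Ts, 0],
               [0, Ts]], dtype=float)

def simulate_target(T, x0, rng=None):
    """Simulate one target's states x_t = Gx x_{t-1} + Gu u_t with
    u_t ~ N(0, I). State layout: [pos_x, pos_y, vel_x, vel_y]."""
    rng = np.random.default_rng(rng)
    X = np.empty((T, 4))
    X[0] = x0
    for t in range(1, T):
        X[t] = Gx @ X[t - 1] + Gu @ rng.normal(size=2)
    return X
```

Note that Gx advances each position by Ts times the corresponding velocity, while Gu injects the acceleration noise into both position (through Ts²/2) and velocity (through Ts).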
Following [21, 12], we generate T = 300 observations based on the received signal strength (RSS): the measurement of sensor j at time t is given by

y_tj = ∑_{m : stm = 1} P0 · (d0 / d_mjt)^γ + n_tj,

where n_tj ∼ N(0, 2) is the noise term, d_mjt is the distance between target m and sensor j at time t, P0 = 10 is the transmitted power, and d0 = 100 metres and γ = 2 are, respectively, the reference distance and the path loss exponent, which account for the radio propagation model. In our inference algorithm, we sample the noise variance by placing an InvGamma(1, 1) distribution as its prior. Here, we compare the performance of the iFDM with a 'genie-aided' finite factorial model with perfect knowledge of the number of targets and noise variance.

Table 1: Applications of the iFDM.
Application | Model | X | p(xtm | stm = 1, x(t−1)m, s(t−1)m = 1) | L
Multitarget Tracking | Infinite factorial LDS | R⁴ | N(xtm | G_x x(t−1)m, G_u G_u⊤) | 1
Cocktail Party | ICA iFHMM | R | N(xtm | 0, σx²) | 1
Power Disaggregation | Non-binary iFHMM | {0, 1, . . . , Q − 1} | a^m_{jk} = p(xtm = k | x(t−1)m = j) | 1
Multiuser Detection | − | A ∪ {0} | U(A) | ∈ N

Figure 3: Results for the multitarget tracking problem. (a) Target trajectories. (b) Position error. (c) Average position error (in metres):
 | iFDM | Genie-aided model
Target 1 | 7.0 | 4.8
Target 2 | 5.9 | 6.0
Target 3 | 6.3 | 5.4
Average | 6.4 | 5.9

In Figures 3a and 3b, we show the true and inferred trajectories of the targets, and the temporal evolution of the position error of the iFDM. Additionally, Figure 3c shows the average position error (in absolute value) for our iFDM and the genie-aided method.
In these figures, we observe that the proposed model and algorithm are able to detect the three targets and their trajectories, providing similar performance to the genie-aided method. In particular, both approaches provide average position errors of around 6 metres, which is three times the noise variance.
Cocktail Party. We now address a blind speech separation task, also known as the cocktail party problem. Given the recorded audio signals from a set of microphones, the goal is to separate out the individual speech signals of multiple people who are speaking simultaneously. Speakers may start speaking or become silent at any time. Similarly to [23], we collect data from several speakers from the PASCAL 'CHiME' Speech Separation and Recognition Challenge website.2 The voice signal for each speaker consists of 4 sentences, which we append with random pauses in between each sentence. We artificially mix the data 10 times (corresponding to 10 microphones) with mixing weights sampled from Uniform(0, 1), such that each microphone receives a linear combination of all the considered signals, corrupted by Gaussian noise with standard deviation 0.3. We consider two scenarios, with 5 and 15 speakers, and subsample the data so that we learn from T = 1,354 and T = 1,087 datapoints, respectively. Following [23], our model assumes p(xtm | stm = 1, x(t−1)m, s(t−1)m) = N(xtm | 0, 2), and xtm = 0 whenever stm = 0. We also model yt as a linear combination of all the voice signals under Gaussian noise, i.e., yt = ∑_{m=1}^{M+} w_m xtm + n_t, where n_t ∼ N(0, σy² I) is the noise term, w_m ∼ N(0, I) is the 10-dimensional weighting vector associated to the m-th speaker, and σy² ∼ InvGamma(1, 1).
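A synthetic data generator for this cocktail-party setup can be sketched as follows. The observation model (xtm ∼ N(0, 2) while active, zero otherwise, linear mixing with Gaussian noise) follows the description above; the on/off transition probabilities inside the sketch are illustrative placeholders, not values from the paper, and the function name is ours:

```python
import numpy as np

def simulate_cocktail(T, M, D=10, sigma_x=np.sqrt(2.0), sigma_y=0.3, rng=None):
    """Generate synthetic cocktail-party data: active speakers emit
    x_tm ~ N(0, sigma_x^2), x_tm = 0 when inactive, and each of D
    microphones records y_t = sum_m w_m x_tm + noise."""
    rng = np.random.default_rng(rng)
    # simple Markov on/off activity per speaker (0.05 / 0.95 are
    # illustrative placeholder transition probabilities)
    S = np.zeros((T, M), dtype=int)
    for t in range(1, T):
        p_on = np.where(S[t - 1] == 1, 0.95, 0.05)
        S[t] = (rng.random(M) < p_on).astype(int)
    X = S * rng.normal(0, sigma_x, size=(T, M))   # x_tm = 0 if s_tm = 0
    W = rng.normal(0, 1, size=(M, D))             # mixing weights w_m
    Y = X @ W + rng.normal(0, sigma_y, size=(T, D))
    return S, X, Y
```

The variance 2 of the active-state emissions and the noise standard deviation 0.3 match the experimental description above; data generated this way can serve as a sanity check for any blind source separation routine.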
We compare our iFDM with the ICA iFHMM in [23] using FFBS sweeps for inference, with (i) $p(x_{tm} \mid s_{tm}=1) = \mathcal{N}(x_{tm} \mid 0, 2)$ (denoted as FFBS-G), and (ii) $p(x_{tm} \mid s_{tm}=1) = \text{Laplace}(x_{tm} \mid 0, 2)$ (denoted as FFBS-L).

For the scenario with 5 speakers, we show the true and the inferred (after iteration 10,000) number of speakers in Figures 4a, 4b, 4c and 4d, along with their activities during the observation period. In order to quantitatively evaluate the performance of the different algorithms, we show in Figure 4e (top) the activity detection error rate (ADER), which is computed as the probability of detecting activity (inactivity) of a speaker while that speaker is actually inactive (active). As the algorithms are unsupervised, we sort the estimated chains so that the ADER is minimized. If the inferred number of speakers M+ is smaller (larger) than the true number of speakers, we consider some extra inferred inactive chains (additional speakers). The PGAS-based approach outperforms the two FFBS-based methods because it can jointly sample the states of all chains (speakers) for each time instant, whereas the FFBS requires sampling each chain conditioned on the current states of the other chains, leading to poor mixing, as discussed in [22]. As a consequence, the FFBS tends to overestimate the number of speakers, as shown in Figure 4e (bottom).

² http://spandh.dcs.shef.ac.uk/projects/chime/PCC/datasets.html

[Figure: multitarget tracking results — true and inferred positions of Targets 1–3 across sensors over time (top), and position error (m) per target (bottom).]

(a) Ground truth. (b) PGAS. (c) FFBS-G. (d) FFBS-L.

(e) ADER / Inferred M+:

                    # of Speakers
        Method        5       15
ADER    PGAS         0.08    0.08
        FFBS-G       0.25    0.14
        FFBS-L       0.14    0.12
M+      PGAS         5       15
        FFBS-G       7       15
        FFBS-L       8       15

Figure 4: Results for the cocktail party problem.

Algorithm   H. 1   H. 2   H. 3   H. 4   H. 5
PGAS        0.68   0.79   0.60   0.58   0.55
FFBS        0.59   0.78   0.56   0.53   0.43
(a) REDD ('H' stands for 'House').

Algorithm   Day 1   Day 2
PGAS        0.76    0.82
FFBS        0.67    0.72
(b) AMP.

Table 2: Accuracy for the power disaggregation problem.

Power Disaggregation. Given the aggregate whole-home power consumption signal, the power disaggregation problem consists in estimating both the number of active devices in the house and the power draw of each individual device [11, 7]. We validate the performance of the iFDM on two different real databases: the Reference Energy Disaggregation Data Set (REDD) [11], and the Almanac of Minutely Power Dataset (AMP) [15]. For the AMP database, we consider two 24-hour segments and 8 devices. For the REDD database, we consider a 24-hour segment across 5 houses and 6 devices. Our model assumes that each device can take $Q = 4$ different states (one inactive state and three active states with different power consumption), i.e., $x_{tm} \in \{0, 1, \ldots, Q-1\}$, with $x_{tm} = 0$ if $s_{tm} = 0$. We place a symmetric Dirichlet prior over the transition probability vectors of the form $a^m_j \sim \text{Dirichlet}(1)$, where each element $a^m_{jk} = p(x_{tm} = k \mid s_{tm} = 1, x_{(t-1)m} = j, s_{(t-1)m})$. When $x_{tm} = 0$, the power consumption of device $m$ at time $t$ is zero ($P^m_0 = 0$), and when $x_{tm} \in \{1, \ldots, Q-1\}$ its average power consumption is given by $P^m_{x_{tm}}$. Thus, the total power consumption is given by $y_t = \sum_{m=1}^{M_+} P^m_{x_{tm}} + n_t$, where $n_t \sim \mathcal{N}(0, 0.5)$ represents the additive Gaussian noise. For $q \in \{1, \ldots, Q-1\}$, we assume a prior power consumption $P^m_q \sim \mathcal{N}(15, 10)$. In this case, the proposed model for the iFDM resembles a non-binary iFHMM and, therefore, we can also apply the FFBS algorithm to infer the power consumption draws of each device.

In order to evaluate the performance of the different algorithms, we compute the mean accuracy of the estimated consumption of each device (higher is better), i.e.,

$\text{acc} = 1 - \frac{\sum_{t=1}^{T}\sum_{m=1}^{M} |x_t^{(m)} - \hat{x}_t^{(m)}|}{2\sum_{t=1}^{T}\sum_{m=1}^{M} x_t^{(m)}}$,

where $x_t^{(m)} = P^m_{x_{tm}}$ and $\hat{x}_t^{(m)}$ are, respectively, the true and the estimated power consumption by device $m$ at time $t$. In order to compute the accuracy, we assign each estimated chain to a device so that the accuracy is maximized. If the inferred number of devices M+ is smaller than the true number of devices, we use $\hat{x}_t^{(m)} = 0$ for the undetected devices. If M+ is larger than the true number of devices, we group all the extra chains as an "unknown" device and use $x_t^{(\text{unk})} = 0$. In Table 2 we show the results provided by both algorithms. The PGAS approach outperforms the FFBS algorithm in the five houses of the REDD database and the two selected days of the AMP database. This occurs because the PGAS can simultaneously sample the hidden states of all devices for each time instant, whereas the FFBS requires conditioning on the current states of all but one device.

Multiuser Detection. We now consider a digital communication system in which users are allowed to enter or leave the system at any time, and several receivers cooperate to estimate the number of users, the (digital) symbols they transmit, and the propagation channels they face. Multipath propagation affects the radio signal, thus causing inter-symbol interference. To capture this phenomenon in our model, we use $L \geq 1$ in this application.
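The disaggregation evaluation protocol described above — the accuracy formula together with the accuracy-maximizing chain-to-device assignment, zero-padding for undetected devices, and the zero-power "unknown" device for extra chains — can be sketched as follows. This is a minimal, stdlib-only illustration of the protocol, not the authors' code; the brute-force search over permutations is our simplification and only suits a handful of devices (a Hungarian-algorithm solver would replace it at scale).

```python
from itertools import permutations

def disaggregation_accuracy(true, est):
    """Mean accuracy acc = 1 - sum_{t,m} |x_t^(m) - xhat_t^(m)| / (2 sum_{t,m} x_t^(m)),
    maximized over assignments of inferred chains to devices.

    `true[m][t]` and `est[m][t]` hold the true and estimated power draws.
    """
    T = len(true[0])
    chains = [list(c) for c in est]
    # pad with all-zero chains when fewer chains than devices were inferred
    while len(chains) < len(true):
        chains.append([0.0] * T)
    den = 2.0 * sum(x for xs in true for x in xs)
    best = float("-inf")
    for perm in permutations(range(len(chains)), len(true)):
        err = sum(abs(x - xh)
                  for m, i in enumerate(perm)
                  for x, xh in zip(true[m], chains[i]))
        # extra chains are grouped as an "unknown" device with zero true power
        err += sum(abs(xh)
                   for i in set(range(len(chains))) - set(perm)
                   for xh in chains[i])
        best = max(best, 1.0 - err / den)
    return best
```

With a perfect recovery (possibly with the chains in a different order than the devices) the accuracy is 1; every watt of misassigned or missed power reduces it.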
We consider a multiuser Wi-Fi communication system, and we use a ray tracing algorithm (WISE software [3]) to design a realistic indoor wireless system in an office located at Bell Labs Crawford Hill. We place 12 receivers and 6 transmitters across the office, in the positions respectively marked with circles and crosses in Figure 5 (all transmitters and receivers are placed at a height of 2 metres). Transmitted symbols belong to a quadrature phase-shift keying (QPSK) constellation, $\mathcal{A} = \{(\pm 1 \pm \sqrt{-1})/\sqrt{2}\}$, such that, while active, the transmitted symbols are independent and uniformly distributed in $\mathcal{A}$, i.e., $p(x_{tm} \mid s_{tm} = 1, x_{(t-1)m}, s_{(t-1)m}) = \mathcal{U}(\mathcal{A})$.

Figure 5: Plane of the office building at Bell Labs Crawford Hill.

Model     L = 1   L = 2   L = 3   L = 4   L = 5
iFDM      6/6     6/6     6/6     6/6     6/6
iFHMM     3/11    3/11    3/8     1/10    −
(a) # Recovered transmitters / Inferred M+.

Model     L = 1   L = 2   L = 3   L = 4   L = 5
iFDM      2.58    2.51    0.80    0.30    0.16
iFHMM     2.79    1.38    5.53    1.90    −
(b) MSE of the channel coefficients (×10⁻⁶).

Table 3: Results for the multiuser detection problem.

The observations of all the receivers are weighted replicas of the transmitted symbols under noise, $y_t = \sum_{m=1}^{M_+}\sum_{\ell=1}^{L} h^m_\ell x_{(t-\ell+1)m} + n_t$, where $x_{tm} = 0$ for the inactive states, and the channel coefficients $h^m_\ell$ and noise variance $\sigma^2_y$ are provided by WISE software.
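The observation model just described can be simulated directly. The following NumPy sketch is an illustrative forward model only: the decaying per-tap variances are hypothetical stand-ins for the WISE ray-tracing output, and we assume all users remain active throughout (no mIBP on/off pattern).

```python
import numpy as np

# QPSK alphabet A = {(±1 ± sqrt(-1)) / sqrt(2)}
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def simulate_observations(M, L, T, n_rx, noise_std, rng):
    """Draw symbols and channels, then form
    y_t = sum_m sum_l h_l^m x_{(t-l+1)m} + n_t at each of n_rx receivers."""
    # per-tap channel variances decaying with the memory index l (hypothetical)
    var = 0.01 * np.exp(-0.5 * np.arange(L))
    # h[l, m] is an n_rx-dim circularly symmetric complex Gaussian vector
    h = ((rng.standard_normal((L, M, n_rx)) +
          1j * rng.standard_normal((L, M, n_rx)))
         * np.sqrt(var / 2.0)[:, None, None])
    # i.i.d. symbols, uniform over the QPSK constellation (all users active)
    x = rng.choice(QPSK, size=(T, M))
    y = np.zeros((T, n_rx), dtype=complex)
    for t in range(T):
        for l in range(L):               # l = 0 is the first (current) tap
            if t - l >= 0:
                y[t] += x[t - l] @ h[l]  # sum over users m
        y[t] += noise_std * (rng.standard_normal(n_rx)
                             + 1j * rng.standard_normal(n_rx)) / np.sqrt(2.0)
    return x, h, y
```

With noise_std = 0 and L = 1 the observations reduce to a memoryless mixture of the current symbols; larger L introduces the inter-symbol interference that motivates inferring L channel taps per user.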
For inference, we assume Rayleigh-fading channels and, therefore, we place a circularly symmetric complex Gaussian prior distribution over the channel coefficients, $h^m_\ell \sim \mathcal{CN}(0, \sigma^2_\ell I, 0)$, and over the noise term, $n_t \sim \mathcal{CN}(0, \sigma^2_y I, 0)$. We place an inverse gamma prior over $\sigma^2_\ell$ with mean and standard deviation $0.01e^{-0.5(\ell-1)}$. The choice of this particular prior is based on the assumption that the channel coefficients $h^m_\ell$ are a priori expected to decay with the memory index $\ell$, since the radio signal suffers more attenuation as it propagates through the walls or bounces off them. We use an observation period T = 2,000, and vary L from 1 to 5. Five channel taps correspond to the radio signal travelling a distance of 750 m, which should be enough given the dimensions of this office space. We compare our iFDM with a non-binary iFHMM model with state space cardinality $|\mathcal{X}| = 5^L$ using FFBS sweeps for inference (we do not run the FFBS algorithm for L = 5 due to its computational complexity).

We show in Table 3a the number of recovered transmitters (i.e., the number of transmitters for which we recover all the transmitted symbols with no error) found after running the inference algorithms, together with the inferred value of M+. We see that the iFHMM tends to overestimate the number of transmitters, which deteriorates the overall symbol estimates and, as a consequence, not all the transmitted symbols are recovered. We additionally report in Table 3b the MSE of the first channel tap, i.e., $\frac{1}{6\times 12}\sum_m \|h^m_1 - \hat{h}^m_1\|^2$, where $\hat{h}^m_\ell$ denotes the inferred channel coefficients. We sort the transmitters so that the MSE is minimized, and ignore the extra inferred transmitters. In general, the iFDM outperforms the iFHMM approach, as discussed above.
Under our iFDM, the MSE decreases as we consider a larger value of L, since the model better fits the actual radio propagation model.

5 Conclusions

We have proposed a general BNP approach to solve source separation problems in which the number of sources is unknown. Our model builds on the mIBP to consider a potentially unbounded number of hidden Markov chains that evolve independently according to some dynamics, in which the state space can be either discrete or continuous. For posterior inference, we have developed an algorithm based on PGAS that avoids the intractable computational complexity that the FFBS presents in many scenarios, enabling the application of our iFDM to problems such as multitarget tracking or multiuser detection. In addition, we have shown empirically that our PGAS approach outperforms the FFBS-based algorithm (in terms of accuracy) in the cocktail party and power disaggregation problems, since the FFBS gets more easily trapped in local modes of the posterior in which several Markov chains correspond to a single hidden source.

Acknowledgments

I. Valera is currently supported by the Humboldt research fellowship for postdoctoral researchers program and acknowledges the support of Plan Regional-Programas I+D of Comunidad de Madrid (AGES-CM S2010/BMD-2422). F. J. R. Ruiz is supported by an FPU fellowship from the Spanish Ministry of Education (AP2010-5333). This work is also partially supported by Ministerio de Economía of Spain (projects COMPREHENSION, id. TEC2012-38883-C02-01, and ALCIT, id. TEC2012-38800-C03-01), by Comunidad de Madrid (project CASI-CAM-CM, id. S2013/ICE-2845), by the Office of Naval Research (ONR N00014-11-1-0651), and by the European Union 7th Framework Programme through the Marie Curie Initial Training Network 'Machine Learning for Personalized Medicine' (MLPM2012, Grant No. 316861).

References

[1] C. Andrieu, A. Doucet, and R. Holenstein. 
Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Series B, 72(3):269–342, 2010.

[2] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen. The infinite hidden Markov model. In Advances in Neural Information Processing Systems, volume 14, 2002.

[3] S. J. Fortune, D. M. Gay, B. W. Kernighan, O. Landron, R. A. Valenzuela, and M. H. Wright. WISE design of indoor wireless systems: Practical computation and optimization. IEEE Computing in Science & Engineering, 2(1):58–68, March 1995.

[4] E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. Bayesian nonparametric methods for learning Markov switching processes. IEEE Signal Processing Magazine, 27(6):43–54, 2010.

[5] E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. A sticky HDP-HMM with application to speaker diarization. Annals of Applied Statistics, 5(2A):1020–1056, 2011.

[6] L. Jiang, S. S. Singh, and S. Yıldırım. Bayesian tracking and parameter learning for non-linear multiple target tracking models. arXiv preprint arXiv:1410.2046, 2014.

[7] M. J. Johnson and A. S. Willsky. Bayesian nonparametric hidden semi-Markov models. Journal of Machine Learning Research, 14:673–701, February 2013.

[8] M. I. Jordan. Hierarchical models, nested models and completely random measures. Springer, New York (NY), 2010.

[9] R. E. Kalman. A new approach to linear filtering and prediction problems. ASME Journal of Basic Engineering, 82(Series D):35–45, 1960.

[10] D. Knowles and Z. Ghahramani. Nonparametric Bayesian sparse factor models with application to gene expression modeling. The Annals of Applied Statistics, 5(2B):1534–1552, June 2011.

[11] J. Z. Kolter and T. Jaakkola. 
Approximate inference in additive factorial HMMs with application to energy disaggregation. In International Conference on Artificial Intelligence and Statistics, pages 1472–1482, 2012.

[12] J. Lim and U. Chong. Multitarget tracking by particle filtering based on RSS measurement in wireless sensor networks. International Journal of Distributed Sensor Networks, March 2015.

[13] F. Lindsten, M. I. Jordan, and T. B. Schön. Particle Gibbs with ancestor sampling. Journal of Machine Learning Research, 15(1):2145–2184, 2014.

[14] F. Lindsten and T. B. Schön. Backward simulation methods for Monte Carlo statistical inference. Foundations and Trends in Machine Learning, 6(1):1–143, 2013.

[15] S. Makonin, F. Popowich, L. Bartram, B. Gill, and I. V. Bajic. AMPds: A public dataset for load disaggregation and eco-feedback research. In Proceedings of the 2013 IEEE Electrical Power and Energy Conference (EPEC), 2013.

[16] S. Oh, S. Russell, and S. Sastry. Markov chain Monte Carlo data association for general multiple-target tracking problems. In IEEE Conference on Decision and Control, volume 1, pages 735–742, December 2004.

[17] P. Orbanz and Y. W. Teh. Bayesian nonparametric models. In Encyclopedia of Machine Learning. Springer, 2010.

[18] S. Särkkä, A. Vehtari, and J. Lampinen. Rao-Blackwellized particle filter for multiple target tracking. Information Fusion, 8(1):2–15, 2007.

[19] Y. W. Teh, D. Görür, and Z. Ghahramani. Stick-breaking construction for the Indian buffet process. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 11, 2007.

[20] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.

[21] F. Thouin, S. Nannuru, and M. Coates. 
Multi-target tracking for measurement models with additive contributions. In Proceedings of the 14th International Conference on Information Fusion (FUSION), pages 1–8, July 2011.

[22] M. K. Titsias and C. Yau. Hamming ball auxiliary sampling for factorial hidden Markov models. In Advances in Neural Information Processing Systems 27, 2014.

[23] J. Van Gael, Y. W. Teh, and Z. Ghahramani. The infinite factorial hidden Markov model. In Advances in Neural Information Processing Systems, volume 21, 2009.

[24] M. A. Vázquez and J. Míguez. User activity tracking in DS-CDMA systems. IEEE Transactions on Vehicular Technology, 62(7):3188–3203, 2013.

[25] N. Whiteley, C. Andrieu, and A. Doucet. Efficient Bayesian inference for switching state-space models using particle Markov chain Monte Carlo methods. Technical report, Bristol Statistics Research Report 10:04, 2010.