{"title": "Signal Aggregate Constraints in Additive Factorial HMMs, with Application to Energy Disaggregation", "book": "Advances in Neural Information Processing Systems", "page_first": 3590, "page_last": 3598, "abstract": "Blind source separation problems are difficult because they are inherently unidentifiable, yet the entire goal is to identify meaningful sources. We introduce a way of incorporating domain knowledge into this problem, called signal aggregate constraints (SACs). SACs encourage the total signal for each of the unknown sources to be close to a specified value. This is based on the observation that the total signal often varies widely across the unknown sources, and we often have a good idea of what total values to expect. We incorporate SACs into an additive factorial hidden Markov model (AFHMM) to formulate the energy disaggregation problem, in which only one mixture signal is observed. A convex quadratic program for approximate inference is employed to recover the source signals. On a real-world energy disaggregation data set, we show that the use of SACs dramatically improves the original AFHMM, and significantly improves over a recent state-of-the-art approach.", "full_text": "Signal Aggregate Constraints in Additive Factorial\nHMMs, with Application to Energy Disaggregation\n\nMingjun Zhong, Nigel Goddard, Charles Sutton\n\n{mzhong,nigel.goddard,csutton}@inf.ed.ac.uk\n\nSchool of Informatics\nUniversity of Edinburgh\n\nUnited Kingdom\n\nAbstract\n\nBlind source separation problems are dif\ufb01cult because they are inherently uniden-\nti\ufb01able, yet the entire goal is to identify meaningful sources. We introduce a way\nof incorporating domain knowledge into this problem, called signal aggregate\nconstraints (SACs). SACs encourage the total signal for each of the unknown\nsources to be close to a speci\ufb01ed value. 
This is based on the observation that the\ntotal signal often varies widely across the unknown sources, and we often have a\ngood idea of what total values to expect. We incorporate SACs into an additive\nfactorial hidden Markov model (AFHMM) to formulate the energy disaggregation\nproblems where only one mixture signal is assumed to be observed. A convex\nquadratic program for approximate inference is employed for recovering those\nsource signals. On a real-world energy disaggregation data set, we show that the\nuse of SACs dramatically improves the original AFHMM, and signi\ufb01cantly im-\nproves over a recent state-of-the-art approach.\n\n1\n\nIntroduction\n\nMany learning tasks require separating a time series into a linear combination of a larger number of\n\u201csource\u201d signals. This general problem of blind source separation (BSS) arises in many application\ndomains, including audio processing [17, 2], computational biology [1], and modelling electricity\nusage [8, 12]. This problem is dif\ufb01cult because it is inherently underdetermined and unidenti\ufb01able,\nas there are many more sources than dimensions in the original time series. The unidenti\ufb01ability\nproblem is especially serious because often the main goal of interest is for people to interpret the\nresulting source signals.\nFor example, consider the application of energy disaggregation. In this application, the goal is to\nhelp people understand what appliances in their home use the most energy; the time at which the\nappliance is used is of less importance. To place an electricity monitor on every appliance in a\nhousehold is expensive and intrusive, so instead researchers have proposed performing BSS on the\ntotal household electricity usage [8, 22, 15]. 
If this is to be effective, we must deal with the issue of identifiability: it will not engender confidence to show the householder a "franken-appliance" whose electricity usage looks like a toaster from 8am to 10am, a hot water heater until 12pm, and a television until midnight.

To address this problem, we need to incorporate domain knowledge regarding what sorts of sources we are hoping to find. Recently a number of general frameworks have been proposed for incorporating prior constraints into general-purpose probabilistic models. These include posterior regularization [4], the generalized expectation criterion [14], and measurement-based learning [13]. However, all of these approaches leave open the question of what types of domain knowledge we should include. This paper considers precisely that research issue, namely, how to identify classes of constraints for which we often have prior knowledge, which are general across a wide variety of domains, and for which we can perform efficient computation.

In this paper we observe that in many applications of BSS, the total signal often varies widely across the different unknown sources, and we often have a good idea of what total values to expect. We introduce signal aggregate constraints (SACs) that encourage the aggregate values, such as the sums, of the source signals to be close to some specified values. For example, in the energy disaggregation problem, we know in advance that a toaster might use 50 Wh in a day and is very unlikely to use as much as 1000 Wh. We incorporate these constraints into an additive factorial hidden Markov model (AFHMM), a commonly used model for BSS [17].

SACs raise difficult inference issues, because each constraint is a function of the entire state sequence of one chain of the AFHMM, and does not decompose according to the Markov structure of the model. 
We instead solve a relaxed problem and transform the optimization problem into a convex quadratic program which is computationally efficient.

On real-world data from the electricity disaggregation domain (Section 7.2.2), we show that the use of SACs significantly improves performance, resulting in a 45% decrease in normalized disaggregation error compared to the original AFHMM, and a significant improvement (29%) in performance compared to a recent state-of-the-art approach to the disaggregation problem [12].

To summarize, the contributions of this paper are: (a) introducing signal aggregate constraints for blind source separation problems (Section 4), (b) a convex quadratic program for the relaxed AFHMM with SACs (Section 5), and (c) an evaluation (Section 7) of the use of SACs on a real-world problem in energy disaggregation.

2 Related Work

The problem of energy disaggregation, also called non-intrusive load monitoring, was introduced by [8] and has since been the subject of intense research interest. Reviews of energy disaggregation can be found in [22] and [24].

Various approaches have been proposed to improve the basic AFHMM by constraining the states of the HMMs. The additive factorial approximate maximum a posteriori (AFAMAP) algorithm in [12] introduces the constraint that at most one chain can change state at any one time point. Another approach [21] proposed non-homogeneous HMMs combined with the constraint that at most one chain changes at a time. Alternatively, semi-Markov models represent duration distributions on the hidden states and are another way to constrain the hidden states; these have been applied to disaggregation problems by [11] and [10]. Both [12] and [16] employ other kinds of additional information to improve the AFHMM. Other approaches could also be applicable for constraining the AFHMM, e.g., the k-segment constraints introduced for HMMs [19]. 
Some work in probabilistic databases has considered aggregate constraints [20], but that work considers only models with very simple graphical structure, namely, independent discrete variables.

3 Problem Setting

Suppose we have observed a time series of sensor readings, for example the energy measured in watt hours by an electricity meter, denoted by Y = (Y_1, Y_2, \dots, Y_T) where Y_t \in \mathbb{R}^+. It is assumed that this signal was aggregated from some component signals, for example the energy consumption of individual appliances used by the household. Suppose there were I components, and for each component the signal is represented as X_i = (x_{i1}, x_{i2}, \dots, x_{iT}) where x_{it} \in \mathbb{R}^+. The observed signal can then be represented as the sum of the component signals:

Y_t = \sum_{i=1}^{I} x_{it} + \epsilon_t,   (1)

where \epsilon_t is assumed to be Gaussian noise with zero mean and variance \sigma_t^2. The disaggregation problem is then to recover the unknown time series X_i given only the observed data Y. This is essentially the BSS problem [3] where only one mixture signal is observed. As discussed earlier, there is no unique solution for this model, due to the identifiability problem: component signals are exchangeable.

4 Models

Our models in this paper will assume that the component signals X_i can be modelled by a hidden Markov chain, in common with much work in BSS. For simplicity, each Markov chain is assumed to have a finite set of states, such that for chain i, x_{it} \approx \mu_{it} for some \mu_{it} \in \{\mu_{i1}, \dots, \mu_{iK_i}\}, where K_i denotes the number of states in chain i. The idea of the SAC is fairly general, however, and could easily be incorporated into other models of the hidden sources.

4.1 The Additive Factorial HMM

Our baseline model will be the AFHMM. 
The AFHMM is a natural model for the generation of an aggregated signal Y in which each component signal X_i is assumed to be a hidden Markov chain with states Z_{it} \in \{1, 2, \dots, K_i\} over time t. In the AFHMM, and variants such as AFAMAP, the model parameters, denoted by \theta, are unknown. These parameters are the \mu_{ik}; the initial probabilities \pi_i = (\pi_{i1}, \dots, \pi_{iK_i})^T for each chain, where \pi_{ik} = P(Z_{i1} = k); and the transition probabilities p^{(i)}_{jk} = P(Z_{it} = j \mid Z_{i,t-1} = k). These parameters can be estimated by using approximation methods such as the structured variational approximation [5].

In this paper we focus on inferring the sequence over time of hidden states Z_{it} for each hidden Markov chain; \theta is assumed known. We are interested in maximum a posteriori (MAP) inference, and the posterior distribution has the following form:

P(Z \mid Y) \propto \prod_{i=1}^{I} P(Z_{i1}) \prod_{t=1}^{T} p(Y_t \mid Z_t) \prod_{i=1}^{I} \prod_{t=2}^{T} P(Z_{it} \mid Z_{i,t-1})   (2)

where p(Y_t \mid Z_t) = N(\sum_{i=1}^{I} \mu_{i,Z_{it}}, \sigma_t^2) is a Gaussian distribution. An alternative way to represent the posterior distribution uses a binary vector S_{it} = (S_{it1}, S_{it2}, \dots, S_{itK_i})^T for the discrete variable Z_{it}, such that S_{itk} = 1 when Z_{it} = k and S_{itj} = 0 for all j \neq k. The logarithm of the posterior distribution over S then has the form

\log P(S \mid Y) \propto \sum_{i=1}^{I} S_{i1}^T \log \pi_i + \sum_{i=1}^{I} \sum_{t=2}^{T} S_{it}^T (\log P^{(i)}) S_{i,t-1} - \frac{1}{2} \sum_{t=1}^{T} \frac{1}{\sigma_t^2} \Big( Y_t - \sum_{i=1}^{I} S_{it}^T \mu_i \Big)^2   (3)

where P^{(i)} = (p^{(i)}_{jk}) is the transition probability matrix and \mu_i = (\mu_{i1}, \mu_{i2}, \dots, \mu_{iK_i})^T. Exact inference is not tractable as the numbers of chains and states increase. 
A MAP value can be conveniently found by using the chainwise Viterbi algorithm [18], which optimizes jointly over each chain S_{i1} \dots S_{iT} in sequence, holding the other chains constant. However, the chainwise Viterbi algorithm can get stuck in local optima. Instead, in this paper we solve a convex quadratic program for a relaxed version of the MAP problem (see Section 5). Even so, the recovered sources are not guaranteed to be the true ones, due to the identifiability problem. Many efforts have been made to provide tractable solutions to this problem by constraining the states of the hidden Markov chains. In the next section we introduce signal aggregate constraints, which help to address this problem.

4.2 The Additive Factorial HMM with Signal Aggregate Constraints

We now add signal aggregate constraints to the AFHMM, yielding a new model, AFHMM+SAC. The AFHMM+SAC assumes that the aggregate value of each component signal i over the entire sequence is expected to be a certain value \mu_{i0}, which is known in advance. In other words, the SAC assumes \sum_{t=1}^{T} x_{it} \approx \mu_{i0}. The constraint values \mu_{i0} (i = 1, 2, \dots, I) could be obtained from expert knowledge or by experiment. For example, in the energy disaggregation domain, extensive research has been undertaken to estimate the average national consumption of different appliances [23].

Incorporating this constraint into the AFHMM, using the formulation from (3), results in the following optimization problem for MAP inference:

maximize_S \log P(S \mid Y)
subject to \Big( \sum_{t=1}^{T} \mu_i^T S_{it} - \mu_{i0} \Big)^2 \leq \delta_i, \quad i = 1, 2, \dots, I,   (4)

where the \mu_{i0} (i = 1, 2, \dots, I) are assumed known, and \delta_i \geq 0 is a tuning parameter which plays a role similar to the tuning parameters used in ridge regression and the LASSO [9]. 
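To make the constrained quantity concrete, the sketch below evaluates the left-hand side of constraint (4) for a single chain. The state means, the state sequence, and the target aggregate \mu_{i0} are made-up illustrative values, not values from the paper.

```python
import numpy as np

mu_i = np.array([0.0, 24.0, 280.0])   # state means for chain i (illustrative)
Z = np.array([0, 2, 2, 1, 0, 0])      # a hypothetical state sequence over T = 6 steps
S = np.eye(len(mu_i))[Z]              # T x K one-hot encoding of the S_it vectors

mu_i0 = 500.0                         # prior expected aggregate (illustrative)
aggregate = (S @ mu_i).sum()          # sum_t mu_i^T S_it  -> 584.0
lhs = (aggregate - mu_i0) ** 2        # left-hand side of constraint (4) -> 7056.0
```

The SAC simply requires this squared deviation to stay below the tolerance \delta_i.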
Instead of solving this optimization problem directly, we equivalently solve the penalized objective function

maximize_S L(S) = \log P(S \mid Y) - \sum_{i=1}^{I} \lambda_i \Big( \sum_{t=1}^{T} \mu_i^T S_{it} - \mu_{i0} \Big)^2,   (5)

where \lambda_i \geq 0 is a complexity parameter which has a one-to-one correspondence with the tuning parameter \delta_i. From a Bayesian point of view, the constraint terms can be viewed as the logarithm of a prior distribution over the states S, so the objective can be viewed as a log posterior distribution over S. The Viterbi algorithm is not directly applicable, since at any time t the state S_{it} depends on the states at all time steps, because the regularization terms are inherently non-Markovian. Therefore, in the following section we transform the optimization problem (5) into a convex quadratic program which can be efficiently solved.

Note that the constraints in equation (4) could be generalised. Rather than making only one constraint on each chain in the time period [0, T] (as described above), a series of constraints could be made. We could define J constraints such that, for j = 1, 2, \dots, J, the jth constraint for chain i is \Big( \sum_{\tau = t^a_{ij}}^{t^b_{ij}} \mu_i^T S_{i\tau} - \mu^{(j)}_{i0} \Big)^2 \leq \delta_{ij}, where [t^a_{ij}, t^b_{ij}] denotes the time period for the constraint. This could be reasonable particularly in household energy data, to represent the fact that some appliances are commonly used during the daytime and are unlikely to be used between 2am and 5am. This is a straightforward extension that does not complicate the algorithms, so for presentational simplicity we use only a single constraint per chain, as shown in (4), in the rest of this paper.

5 Convex Quadratic Programming for AFHMM+SAC

In this section we derive a convex quadratic program (CQP) for a relaxed version of (5). The problem (5) is not convex even if the constraint S_{itk} \in \{0, 1\} is relaxed, because \log P(S \mid Y) is not concave in S. By adding an additional set of variables, we obtain a convex problem.

Similarly to [12], we define a new K_i \times K_i variable matrix H^{it} = (h^{it}_{jk}) such that h^{it}_{jk} = 1 when S_{i,t-1,k} = 1 and S_{itj} = 1, and h^{it}_{jk} = 0 otherwise. In order to present a CQP, we define the following notation. Let 1_T denote a column vector of size T \times 1 with all elements equal to 1. Let \mu^*_i = 1_T \otimes \mu_i, of size TK_i \times 1, where \otimes is the Kronecker product; then \Lambda_i = \lambda_i \mu^*_i \mu^{*T}_i and \tilde{\mu}_i = 2 \lambda_i \mu_{i0} \mu^*_i. Let e_T denote a T \times 1 vector with the first element equal to 1 and all the others zero, and let \tilde{\pi}_i = e_T \otimes \log \pi_i, of size TK_i \times 1. We write \vec{\mu} = (\mu_1^T, \mu_2^T, \dots, \mu_I^T)^T, of size \sum_i K_i \times 1, and define V_t = \sigma_t^{-2} \vec{\mu} \vec{\mu}^T and u_t = \sigma_t^{-2} Y_t \vec{\mu}. We also write S_i = (S_{i1}^T, \dots, S_{iT}^T)^T, of size TK_i \times 1, and S_t = (S_{1t}^T, \dots, S_{It}^T)^T, of size \sum_i K_i \times 1. Finally, let H^{it}_{.l} and H^{it}_{l.} denote the column and row vectors of the matrix H^{it}, respectively.

The objective function in equation (5) can then be equivalently represented as

L(S, H) = \sum_{i=1}^{I} S_i^T \tilde{\pi}_i + \sum_{i,t,k,j} h^{it}_{jk} \log p^{(i)}_{jk} - \frac{1}{2} \sum_{t=1}^{T} \big( S_t^T V_t S_t - 2 u_t^T S_t \big) - \sum_{i=1}^{I} \big( S_i^T \Lambda_i S_i - S_i^T \tilde{\mu}_i \big) + C

where C is a constant. Our aim is to optimize the problem

maximize_{S,H} L(S, H)
subject to \sum_{k=1}^{K_i} S_{itk} = 1, \quad S_{itk} \in \{0, 1\}, \quad i = 1, 2, \dots, I; \; t = 1, 2, \dots, T,   (6)
\sum_{l=1}^{K_i} H^{it}_{l.} = S^T_{i,t-1}, \quad \sum_{l=1}^{K_i} H^{it}_{.l} = S_{it}, \quad h^{it}_{jk} \in \{0, 1\}.

This problem is equivalent to the problem in equation (5). It should be noted that the matrices \Lambda_i and V_t are positive semidefinite (PSD). The problem (6) is an integer quadratic program (IQP), which is hard to solve. Instead we solve the relaxed problem where S_{itk} \in [0, 1] and h^{it}_{jk} \in [0, 1]; the relaxed problem is thus a CQP. To solve it we used CVX, a package for specifying and solving convex programs [7, 6]. Note that a relaxed problem for the plain AFHMM could also be obtained by setting \lambda_i = 0, which is again a CQP. Concerning the computational complexity, the CQP for AFHMM+SAC is polynomial in the number of time steps times the total number of states of the HMMs. In practice, our implementations of AFHMM, AFAMAP, and AFHMM+SAC scale similarly (see Section 7.2).

6 Relation to Posterior Regularization

In this section we show that the objective function in (5) can also be derived from the posterior regularization framework [4]. 
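(Before doing so, a brief aside on the CQP of Section 5: the claim that \Lambda_i and V_t are positive semidefinite can be checked numerically, since both are scaled rank-one outer products. The state means and parameter values below are illustrative stand-ins, not values from the paper.)

```python
import numpy as np

T, lam, sigma2 = 4, 1.0, 0.01
mu_i = np.array([0.0, 24.0, 280.0])          # state means of one chain (illustrative)
mu_star = np.kron(np.ones(T), mu_i)          # mu*_i = 1_T (x) mu_i
Lambda_i = lam * np.outer(mu_star, mu_star)  # Lambda_i = lambda_i mu*_i mu*_i^T

mu_vec = np.concatenate([mu_i, [0.0, 300.0, 500.0]])  # stacked means of two chains
V_t = np.outer(mu_vec, mu_vec) / sigma2      # V_t = sigma_t^{-2} mu mu^T

# Rank-one outer products have eigenvalues {||.||^2, 0, ..., 0}, hence PSD
# (up to floating-point tolerance relative to the matrix scale).
tol = 1e-9 * (np.trace(Lambda_i) + np.trace(V_t))
assert np.linalg.eigvalsh(Lambda_i).min() >= -tol
assert np.linalg.eigvalsh(V_t).min() >= -tol
```

PSD quadratic terms are what make the relaxed problem a convex QP.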
The posterior regularization framework guides the model toward desired behavior by constraining the space of the model posteriors. The distribution defined in (3) is the model posterior distribution for the AFHMM. The desired distribution \tilde{P}, however, lies in the constrained space \{\tilde{P} : E_{\tilde{P}}[\phi_i(S, Y)] \leq \delta_i\}, where \phi_i(S, Y) = \big( \sum_{t=1}^{T} \mu_i^T S_{it} - \mu_{i0} \big)^2. To find \tilde{P} we optimize

minimize_{\tilde{P}} KL(\tilde{P}(S) \| P(S \mid Y))
subject to E_{\tilde{P}}[\phi_i(S, Y)] \leq \delta_i, \quad i = 1, 2, \dots, I,   (7)

where KL(\cdot\|\cdot) denotes the KL-divergence. According to [4], the unique optimal solution for the desired distribution is \tilde{P}^*(S) = \frac{1}{Z} P(S \mid Y) \exp\big\{ -\sum_{i=1}^{I} \lambda_i \phi_i(S, Y) \big\}. This is exactly the distribution whose logarithm is the objective in equation (5).

7 Results

In this section, the AFHMM+SAC is evaluated by applying it to disaggregation problems on a toy data set and on energy data, and comparing its performance with that of AFHMM and AFAMAP.

7.1 Toy Data

In this section the AFHMM+SAC was applied to a toy data set to evaluate the robustness of the method. Two chains were generated with state values \mu_1 = (0, 24, 280)^T and \mu_2 = (0, 300, 500)^T. The initial and transition probabilities were randomly generated. Suppose the generated chains were x_i = (x_{i1}, x_{i2}, \dots, x_{iT}) (i = 1, 2), with T = 100. The aggregated data were generated by the equation Y_t = x_{1t} + x_{2t} + \epsilon_t, where \epsilon_t follows a Gaussian distribution with zero mean and variance \sigma^2 = 0.01. 
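A minimal NumPy sketch of the toy-data generation just described. The paper does not specify how the initial and transition probabilities were "randomly generated"; the Dirichlet draws below are one arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
state_means = [np.array([0.0, 24.0, 280.0]),   # chain 1: mu_1
               np.array([0.0, 300.0, 500.0])]  # chain 2: mu_2

def sample_chain(mu, T, rng):
    """Sample one hidden Markov chain and return its emitted signal mu[z_t]."""
    K = len(mu)
    pi = rng.dirichlet(np.ones(K))         # random initial distribution (assumption)
    P = rng.dirichlet(np.ones(K), size=K)  # random transition matrix, rows sum to 1
    z = np.empty(T, dtype=int)
    z[0] = rng.choice(K, p=pi)
    for t in range(1, T):
        z[t] = rng.choice(K, p=P[z[t - 1]])
    return mu[z]

x = np.stack([sample_chain(mu, T, rng) for mu in state_means])  # component signals x_it
Y = x.sum(axis=0) + rng.normal(0.0, np.sqrt(0.01), size=T)      # aggregate, sigma^2 = 0.01
```

Only the aggregate `Y` is handed to the disaggregation algorithm; `x` is kept as ground truth for scoring.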
The AFHMM+SAC was applied to this data to disaggregate Y into component signals. Note that we simply set \lambda_i = 1 for all the experiments, including the energy data, though in practice these hyper-parameters could be tuned using cross-validation. Denote by \hat{x}_i the estimated signal for x_i. The disaggregation performance was evaluated by the normalized disaggregation error (NDE)

NDE = \frac{\sum_{i,t} (\hat{x}_{it} - x_{it})^2}{\sum_{i,t} x_{it}^2}.   (8)

For the energy data we are also particularly interested in recovering the total energy used by each appliance [16, 10]. Therefore, another objective of the disaggregation is to estimate the total energy consumed by each appliance over a period of time. To measure this, we employ the following signal aggregate error (SAE)

SAE = \frac{1}{I} \sum_{i=1}^{I} \frac{\big| \sum_{t=1}^{T} \hat{x}_{it} - \sum_{t'=1}^{T} x_{it'} \big|}{\sum_{t=1}^{T} Y_t}.   (9)

In order to assess how the SAC regularizer affects the results, various values for \mu_0 = (\mu_{10}, \mu_{20})^T were used for the AFHMM+SAC algorithm. Figure 1 shows the NDE and SAE results. It shows that as the Euclidean distance between the input vector \mu_0 and the true signal aggregate vector \big( \sum_{t=1}^{T} x_{1t}, \sum_{t=1}^{T} x_{2t} \big) increases, both the NDE and SAE increase. This shows how the SACs affect the performance of AFHMM+SAC.

Figure 1: Normalized disaggregation error and signal aggregate error computed by AFHMM+SAC using various input vectors \mu_{i0}. The x-axis shows the Euclidean distance between the input vector (\mu_{10}, \mu_{20})^T and the true signal aggregate vector \big( \sum_{t=1}^{T} x_{1t}, \sum_{t=1}^{T} x_{2t} \big)^T.

7.2 Energy Disaggregation

In this section, the AFHMM, AFAMAP, and AFHMM+SAC were applied to electrical energy disaggregation problems. We use the Household Electricity Survey (HES) data. 
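The two metrics in (8) and (9) translate directly into NumPy. A minimal sketch (not the implementation used in the paper); rows index chains, columns index time, and the tiny arrays at the end are made-up values for illustration:

```python
import numpy as np

def nde(x_hat, x):
    """Normalized disaggregation error, equation (8)."""
    return ((x_hat - x) ** 2).sum() / (x ** 2).sum()

def sae(x_hat, x, Y):
    """Signal aggregate error, equation (9): per-chain absolute error in the
    total consumption, normalized by the total observed signal, then averaged."""
    return (np.abs(x_hat.sum(axis=1) - x.sum(axis=1)) / Y.sum()).mean()

# toy example: I = 2 chains, T = 2 time steps
x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x_hat = x + 1.0          # a deliberately biased estimate
Y = x.sum(axis=0)        # noise-free aggregate, for illustration
```

Here `nde(x_hat, x)` is 4/30 and `sae(x_hat, x, Y)` is 0.2; a perfect estimate gives zero for both.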
HES was a recent study commissioned by the UK Department for Environment, Food and Rural Affairs, which monitored a total of 251 owner-occupied households across England from May 2010 to July 2011 [23]. The study monitored 26 households for an entire year, while the remaining 225 were monitored for one month during the year, with periods selected to be representative of the different seasons. Individual appliances as well as the overall electricity consumption were monitored. The households were carefully selected to be representative of the overall population. The data were recorded every 2 or 10 minutes, depending on the household. This ultra-low-frequency data presents a challenge for disaggregation techniques; typically studies rely on much higher data rates, e.g., the REDD data [12]. Data measured both without and with a mains reading were used to compare the models. The model parameters \theta defined in AFHMM, AFAMAP and AFHMM+SAC for every appliance were estimated by using 15-30 days' data for each household. We simply assume 3 states for all the appliances, though we could assume more states at greater computational cost. The \mu_i were estimated by using k-means clustering on each appliance's signals in the training data.

7.2.1 Energy Data without Mains Readings

In the first experiment, we generated the aggregate data by adding up the appliance signals, since no mains reading had been measured for most of the households. One hundred households were studied, and one day's usage was used as test data for each household. The model parameters were

\fTable 1: Normalized disaggregation error (NDE), signal aggregate error (SAE), and computing time obtained by AFHMM, AFAMAP, and AFHMM+SAC on the energy data for 100 houses without mains. Shown are the mean\u00b1std values over days. 
NTC: National total consumption, which was the average consumption of each appliance over the training days; TTC: True total consumption for each appliance for that day and household in the test data.

METHODS            NDE          SAE              TIME (SECONDS)
AFHMM              0.98\u00b10.68    0.144\u00b10.067      206\u00b1114
AFAMAP [12]        0.96\u00b10.42    0.083\u00b10.004      325\u00b1177
AFHMM+SAC (NTC)    0.64\u00b10.37    0.069\u00b10.004      356\u00b1262
AFHMM+SAC (TTC)    0.36\u00b10.28    0.0015\u00b10.0089    260\u00b1108

estimated by using 15-26 days' data as the training data. In future work, it would be straightforward to incorporate the SAC into unsupervised disaggregation approaches [11], by using prior information such as national surveys to estimate \mu_0. The AFHMM, AFAMAP and AFHMM+SAC were applied to the aggregated signal to recover the component appliances. For the AFHMM+SAC, two kinds of total consumption vectors were used as the vector \mu_0. The first, the national total consumption (NTC), was the average consumption of each appliance over the training days across all households in the data set. The second, for comparison, was the true total consumption (TTC) for each appliance for that day and household. Obviously, TTC is the optimal value for the regularizer in AFHMM+SAC, so this gives us an oracle result which indicates the largest possible benefit from including this kind of SAC.

Table 1 shows the NDE and SAE when the three methods were applied to one day's data for 100 households. We see that AFHMM+SAC outperformed the AFHMM in terms of both NDE and SAE. The AFAMAP outperformed the AFHMM in terms of SAE, and otherwise they performed similarly in terms of NDE. Unsurprisingly, the AFHMM+SAC using TTC performs the best among these methods. This shows the difference the constraints made, even though we would never be able to obtain the TTC in reality. 
By looking at the mean values in Table 1, we also conclude that AFHMM+SAC using NTC improved by 33% and 16% over the state-of-the-art AFAMAP in terms of NDE and SAE, respectively. This was verified by a paired t-test, which showed that the mean NDE and SAE obtained by AFHMM+SAC and AFAMAP differ at the 5% significance level. To demonstrate the computational efficiency, the computing time is also shown in Table 1; it indicates that AFHMM, AFAMAP and AFHMM+SAC required similar time for inference.

7.2.2 Energy Data with Mains Readings

We studied 9 houses in which the mains as well as the appliances were measured. In this experiment we applied the models directly to the measured mains signal. This scenario is more difficult than that of the previous section, because the mains power will also include the demand of some appliances which are not included in the training data, but it is also the most realistic. A summary of the 9 houses is shown in Table 2. The training data were used to estimate the model parameters. The number of appliances corresponds to the number of HMMs in the model. The mains readings measured on the test days were input to the models to recover the consumption of those appliances. We computed the NTC by using the training data for the AFHMM+SAC. The NDE and SAE were computed for every house and each method; the results are shown in Figure 2. For each house we also computed the paired t-test for the NDE and SAE obtained by AFAMAP and AFHMM+SAC (NTC), which shows that the mean errors are different at the 5% significance level. This indicates that across all the houses AFHMM+SAC improved over AFAMAP. The overall results for all the test days are shown in Table 3, which shows that AFHMM+SAC improved over both AFHMM and AFAMAP. In terms of computing time, AFHMM+SAC is similar to AFHMM and AFAMAP. 
It should be noted that, comparing Tables 1 and 3, all three methods require more time for the data with mains than for the data without mains. This is because the algorithms take more time to converge on realistic data. These results indicate the value of signal aggregate constraints for this problem.

\fTable 2: Summary of the 9 houses with mains

HOUSE                      1    2    3    4    5    6    7    8    9
NUMBER OF TRAINING DAYS    17   16   15   29   27   28   27   15   30
NUMBER OF TEST DAYS        9    9    10   8    9    9    9    10   10
NUMBER OF APPLIANCES       21   25   24   15   24   22   23   20   25

Table 3: The normalized disaggregation error (NDE), signal aggregate error (SAE), and computing time obtained by AFHMM, AFAMAP, and AFHMM+SAC using mains as the input. Shown are the mean\u00b1std values computed from all the test days of the 9 houses. NTC: National total consumption, which was the average consumption of each appliance over the training days; TTC: True total consumption for each appliance for that day and household in the test data.

METHODS            NDE          SAE             TIME (SECONDS)
AFHMM              1.36\u00b10.75    0.069\u00b10.039     1008\u00b1269
AFAMAP [12]        1.05\u00b10.29    0.043\u00b10.012     1327\u00b1453
AFHMM+SAC (NTC)    0.74\u00b10.34    0.030\u00b10.014     1101\u00b1342
AFHMM+SAC (TTC)    0.57\u00b10.28    0.001\u00b10.0048    1276\u00b1410

Figure 2: Mean and std plots for NDE and SAE computed by AFHMM, AFAMAP and AFHMM+SAC using mains as the input for 9 houses.

8 Conclusions

In this paper, we have proposed an additive factorial HMM with signal aggregate constraints. The regularizer was derived from a prior distribution over the chain states. We also showed that the objective function can be derived in the framework of posterior regularization. We focused on finding the MAP configuration for the posterior distribution with the constraints. 
Since dynamic programming is not directly applicable, we pose the optimization problem as a convex quadratic program and solve the relaxed problem. On simulated data, we showed that the AFHMM+SAC is robust to errors in the specification of the constraint value. On real-world data from the energy disaggregation problem, we showed that the AFHMM+SAC performed better than both a simple AFHMM and a recently published state-of-the-art approach.

Acknowledgments

This work is supported by the Engineering and Physical Sciences Research Council (grant number EP/K002732/1).

\fReferences

[1] H.M.S. Asif and G. Sanguinetti. Large-scale learning of combinatorial transcriptional dynamics from gene expression. Bioinformatics, 27(9):1277-1283, 2011.

[2] F. Bach and M. I. Jordan. Blind one-microphone speech separation: A spectral learning approach. In Neural Information Processing Systems, pages 65-72, 2005.

[3] P. Comon and C. Jutten, editors. Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, first edition, 2010.

[4] K. Ganchev, J. Gra\u00e7a, J. Gillenwater, and B. Taskar. Posterior regularization for structured latent variable models. Journal of Machine Learning Research, 11:2001-2049, 2010.

[5] Z. Ghahramani and M.I. Jordan. Factorial hidden Markov models. Machine Learning, 27:245-273, 1997.

[6] M. Grant and S. Boyd. Graph implementations for nonsmooth convex programs. In V. Blondel, S. Boyd, and H. Kimura, editors, Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pages 95-110. Springer-Verlag Limited, 2008. http://stanford.edu/~boyd/graph_dcp.html.

[7] M. Grant and S. Boyd. 
CVX: Matlab software for disciplined convex programming, version 2.1. http:\n\n//cvxr.com/cvx, March 2014.\n\n[8] G.W. Hart. Nonintrusive appliance load monitoring. Proceedings of the IEEE, 80(12):1870 \u20131891, Dec\n\n1992.\n\n[9] T. Hastie, R. Tibshirani, and J. Friedman, editors. The Elements of Statistical Learning, Second Edition.\n\nSpringer, 2009.\n\n[10] M.J. Johnson and A.S. Willsky. Bayesian nonparametric hidden semi-Markov models. Journal of Ma-\n\nchine Learning Research, 14:673\u2013701, 2013.\n\n[11] H. Kim, M. Marwah, M. Arlitt, G. Lyon, and J. Han. Unsupervised disaggregation of low frequency\n\npower measurements. In Proceedings of the SIAM Conference on Data Mining, pages 747\u2013758, 2011.\n\n[12] J. Z. Kolter and T. Jaakkola. Approximate inference in additive factorial HMMs with application to\nenergy disaggregation. In Proceedings of the Fifteenth International Conference on Arti\ufb01cial Intelligence\nand Statistics (AISTATS-12), volume 22, pages 1472\u20131482, 2012.\n\n[13] P. Liang, M.I. Jordan, and D. Klein. Learning from measurements in exponential families. In The 26th\n\nAnnual International Conference on Machine Learning, pages 641\u2013648, 2009.\n\n[14] G. Mann and A. McCallum. Generalized expectation criteria for semi-supervised learning of conditional\nrandom \ufb01elds. In Proceedings of Association for Computational Linguistics (ACL-08), pages 870\u2013878,\nColumbus, Ohio, June 2008.\n\n[15] O. Parson. Unsupervised Training Methods for Non-intrusive Appliance Load Monitoring from Smart\n\nMeter Data. PhD thesis, University of Southampton, April 2014.\n\n[16] O. Parson, S. Ghosh, M. Weal, and A. Rogers. Non-intrusive load monitoring using prior models of\ngeneral appliance types. In Proceedings of the Twenty-Sixth Conference on Arti\ufb01cial Intelligence (AAAI-\n12), pages 356\u2013362, July 2012.\n\n[17] S. T. Roweis. One microphone source separation. 
In Advances in Neural Information Processing Systems, pages 793-799, 2001.

[18] L.K. Saul and M.I. Jordan. Mixed memory Markov chains: Decomposing complex stochastic processes as mixtures of simpler ones. Machine Learning, 37:75-87, 1999.

[19] M.K. Titsias, C. Yau, and C.C. Holmes. Statistical inference in hidden Markov models using k-segment constraints. arXiv preprint arXiv:1311.1189, 2013.

[20] M. Yang, H. Wang, H. Chen, and W. Ku. Querying uncertain data with aggregate constraints. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD '11, pages 817-828, New York, NY, USA, 2011.

[21] M. Zhong, N. Goddard, and C. Sutton. Interleaved factorial non-homogeneous hidden Markov models for energy disaggregation. In Neural Information Processing Systems, Workshop on Machine Learning for Sustainability, Lake Tahoe, Nevada, USA, 2013.

[22] M. Zeifman and K. Roth. Nonintrusive appliance load monitoring: review and outlook. IEEE Transactions on Consumer Electronics, 57:76-84, 2011.

[23] J.-P. Zimmermann, M. Evans, J. Griggs, N. King, L. Harding, P. Roberts, and C. Evans. Household electricity survey, 2012.

[24] A. Zoha, A. Gluhak, M.A. Imran, and S. Rajasegarar. Non-intrusive load monitoring approaches for disaggregated energy sensing: a survey. Sensors, 12:16838-16866, 2012.
\f", "award": [], "sourceid": 1888, "authors": [{"given_name": "Mingjun", "family_name": "Zhong", "institution": "University of Edinburgh"}, {"given_name": "Nigel", "family_name": "Goddard", "institution": "University of Edinburgh"}, {"given_name": "Charles", "family_name": "Sutton", "institution": "University of Edinburgh"}]}