{"title": "Bayesian Nonparametric Modeling of Suicide Attempts", "book": "Advances in Neural Information Processing Systems", "page_first": 1853, "page_last": 1861, "abstract": "The National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database contains a large amount of information, regarding the way of life, medical conditions, depression, etc., of a representative sample of the U.S. population. In the present paper, we are interested in seeking the hidden causes behind the suicide attempts, for which we propose to model the subjects using a nonparametric latent model based on the Indian Buffet Process (IBP). Due to the nature of the data, we need to adapt the observation model for discrete random variables. We propose a generative model in which the observations are drawn from a multinomial-logit distribution given the IBP matrix. The implementation of an efficient Gibbs sampler is accomplished using the Laplace approximation, which allows us to integrate out the weighting factors of the multinomial-logit likelihood model. Finally, the experiments over the NESARC database show that our model properly captures some of the hidden causes that model suicide attempts.", "full_text": "Bayesian Nonparametric Modeling of Suicide\n\nAttempts\n\nFrancisco J. R. 
Ruiz\n\nDepartment of Signal Processing\n\nand Communications\n\nUniversity Carlos III in Madrid\nfranrruiz@tsc.uc3m.es\n\nCarlos Blanco\n\nColumbia University College of\n\nPhysicians and Surgeons\n\nCblanco@nyspi.columbia.edu\n\nIsabel Valera\n\nDepartment of Signal Processing\n\nand Communications\n\nUniversity Carlos III in Madrid\nivalera@tsc.uc3m.es\n\nFernando Perez-Cruz\n\nDepartment of Signal Processing\n\nand Communications\n\nUniversity Carlos III in Madrid\nfernando@tsc.uc3m.es\n\nAbstract\n\nThe National Epidemiologic Survey on Alcohol and Related Conditions (NE-\nSARC) database contains a large amount of information, regarding the way of\nlife, medical conditions, etc., of a representative sample of the U.S. population. In\nthis paper, we are interested in seeking the hidden causes behind the suicide at-\ntempts, for which we propose to model the subjects using a nonparametric latent\nmodel based on the Indian Buffet Process (IBP). Due to the nature of the data, we\nneed to adapt the observation model for discrete random variables. We propose\na generative model in which the observations are drawn from a multinomial-logit\ndistribution given the IBP matrix. The implementation of an ef\ufb01cient Gibbs sam-\npler is accomplished using the Laplace approximation, which allows integrating\nout the weighting factors of the multinomial-logit likelihood model. Finally, the\nexperiments over the NESARC database show that our model properly captures\nsome of the hidden causes that model suicide attempts.\n\n1\n\nIntroduction\n\nEvery year, more than 34,000 suicides occur and over 370,000 individuals are treated for self-\nin\ufb02icted injuries in emergency rooms in the U.S., where suicide prevention is one of the top public\nhealth priorities [1]. The current strategies for suicide prevention have focused mainly on both the\ndetection and treatment of mental disorders [13], and on the treatment of the suicidal behaviors\nthemselves [4]. 
However, despite prevention efforts, including improvements in the treatment of depression, the lifetime prevalence of suicide attempts in the U.S. has remained unchanged over the past decade [8]. This suggests that there is a need to improve the understanding of the risk factors for suicide attempts beyond psychiatric disorders, particularly in non-clinical populations.\nAccording to the National Strategy for Suicide Prevention, an important first step in a public health approach to suicide prevention is to identify those at increased risk for suicide attempts [1]. Suicide attempts are, by far, the best predictor of completed suicide [12] and are also associated with major morbidity themselves [11]. The estimation of suicide attempt risk is a challenging and complex task, with multiple risk factors linked to increased risk. In the absence of reliable tools for identifying those at risk for suicide attempts, be they clinical or laboratory tests, risk detection still relies mainly on clinical variables. The adequacy of the current predictive models and screening methods has been questioned [12], and it has been suggested that the methods currently used for research on suicide risk factors and prediction models need revamping [9].\nDatabases that model the behavior of human populations typically contain many related questions, and analyzing each question individually, or in small groups, does not lead to conclusive results. For example, the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) samples the U.S. population with nearly 3,000 questions regarding, among others, their way of life, their medical conditions, depression and other mental disorders. 
It contains yes-or-no questions, multiple-choice questions, and questions with ordinal answers.\nIn this paper, we propose to model the subjects in this database using a nonparametric latent model that allows us to seek hidden causes and to compress the immense redundant information into a few features. Our starting point is the Indian Buffet Process (IBP) [5], because it allows us to infer both which latent features influence the observations and how many features there are. We need to adapt the observation model for discrete random variables, as the discrete nature of the database does not allow us to use the standard Gaussian observation model. There are several options for modeling discrete outputs given the hidden latent features, such as a Dirichlet distribution or sampling from the features, but we prefer a generative model in which the observations are drawn from a multinomial-logit distribution, because it is similar to the standard Gaussian observation model, where the observation probability distribution depends on the IBP matrix weighted by some factors. Furthermore, the multinomial-logit model, besides its versatility, allows the implementation of an efficient Gibbs sampler in which the Laplace approximation [10] is used to integrate out the weighting factors, which can be efficiently computed using the Matrix Inversion Lemma.\nThe IBP model combined with discrete observations has already been tackled in several related works. In [17], the authors propose a model that combines properties from both the hierarchical Dirichlet process (HDP) and the IBP, called the IBP compound Dirichlet (ICD) process. They apply the ICD to focused topic modeling, where the instances are documents and the observations are words from a finite vocabulary, and focus on decoupling the prevalence of a topic in a document and its prevalence in all documents. 
Despite the discrete nature of the observations under this model, these assumptions are not appropriate for categorical observations such as the set of possible responses to the questions in the NESARC database. Titsias [14] introduced the infinite gamma-Poisson process as a prior probability distribution over non-negative integer-valued matrices with a potentially infinite number of columns, and applied it to topic modeling of images. In this model, each (discrete) component in the observation vector of an instance depends only on one of the active latent features of that object, randomly drawn from a multinomial distribution. Therefore, different components of the observation vector might be identically distributed. Our model is more flexible in the sense that it allows a different probability distribution for every component in the observation vector, which is accomplished by weighting the latent variables differently.\n\n2 The Indian Buffet Process\n\nIn latent feature modeling, each object can be represented by a vector of latent features, and the observations are generated from a distribution determined by those latent feature values. Typically, we have access to the set of observations, and the main goal of these models is to find the latent variables that represent the data. The most common nonparametric tool for latent feature modeling is the Indian Buffet Process (IBP).\nThe IBP places a prior distribution over binary matrices in which the number of columns (features) K is not bounded, i.e., K \u2192 \u221e. However, given a finite number of data points N, it ensures that the number of non-zero columns K+ is finite with probability one. Let Z be a random N \u00d7 K binary matrix distributed following an IBP, i.e., Z \u223c IBP(\u03b1), where \u03b1 is the concentration parameter of the process. 
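The generative mechanism behind this prior (the so-called culinary metaphor) is easy to simulate: the nth customer takes each previously tasted dish k with probability m_k/n and then tries a Poisson(\u03b1/n) number of new dishes. A minimal sketch of this process, ours rather than the paper's code, assuming only NumPy (the function name `sample_ibp` is our own):

```python
import numpy as np

def sample_ibp(N, alpha, rng):
    """Draw a binary feature matrix Z with N rows from an IBP(alpha) prior."""
    Z = np.zeros((N, 0), dtype=int)
    K = 0  # number of features sampled so far
    for n in range(N):
        # take each already-sampled dish k with probability m_k / (n + 1)
        m = Z[:n].sum(axis=0)
        Z[n, :K] = rng.random(K) < m / (n + 1)
        # then try a Poisson(alpha / (n + 1)) number of brand-new dishes
        k_new = rng.poisson(alpha / (n + 1))
        if k_new > 0:
            Z = np.hstack([Z, np.zeros((N, k_new), dtype=int)])
            Z[n, K:K + k_new] = 1
            K += k_new
    return Z
```

For finite N the number of sampled columns K+ is finite, with expectation \u03b1 \u2211_{n=1}^{N} 1/n, which matches the property stated above.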
The nth row of Z, denoted by zn\u00b7, represents the vector of latent features of the nth data point, and the entry in the nth row and kth column is denoted by znk. Each element znk \u2208 {0, 1} indicates whether the kth feature contributes to the nth data point.\nGiven a binary latent feature matrix Z, we assume that the N \u00d7 D observation matrix X, where the nth row contains a D-dimensional observation vector xn\u00b7, is distributed according to a probability distribution p(X|Z). Additionally, x\u00b7d stands for the dth column of X, and each element of the matrix is denoted by xnd. For instance, in the standard observation model described in [5], p(X|Z) is a Gaussian probability density function.\nMCMC (Markov chain Monte Carlo) methods have been broadly applied to infer the latent structure Z from a given observation matrix X (see, e.g., [5, 17, 15, 14]). In particular, we focus on the use of Gibbs sampling for posterior inference over the latent variables. The algorithm iteratively samples the value of each element znk given the remaining variables, i.e., it samples from\n\np(znk = 1|X, Z\u00acnk) \u221d p(X|Z) p(znk = 1|Z\u00acnk),   (1)\n\nwhere Z\u00acnk denotes all the entries of Z other than znk. The distribution p(znk = 1|Z\u00acnk) can be readily derived from the exchangeable IBP and can be written as p(znk = 1|Z\u00acnk) = m\u2212n,k/N, where m\u2212n,k is the number of data points with feature k active, not including n, i.e., m\u2212n,k = \u2211_{i\u2260n} zik.\n\n3 Observation model\n\nLet us consider that the observations are discrete, i.e., each element xnd \u2208 {1, . . . , Rd}, where this finite set contains the indexes of all the possible values of xnd. 
For simplicity and without loss of generality, we consider that Rd = R, but the following results can be readily extended to a different cardinality per input dimension, as well as to mixing continuous and discrete variables, since given the latent matrix Z the columns of X are assumed to be independent.\nWe introduce matrices Bd of size K \u00d7 R to model the probability distribution over X, such that Bd links the hidden latent variables with the dth column of the observation matrix X. We assume that the probability of xnd taking value r (r = 1, . . . , R), denoted by \u03c0^r_nd, is given by the multiple-logistic function, i.e.,\n\n\u03c0^r_nd = p(xnd = r|zn\u00b7, Bd) = exp(zn\u00b7 bd\u00b7r) / \u2211_{r'=1}^{R} exp(zn\u00b7 bd\u00b7r'),   (2)\n\nwhere bd\u00b7r denotes the rth column of Bd. Note that the matrices Bd are used to weight differently the contribution of every latent feature for every component d, similarly as in the standard Gaussian observation model in [5]. We assume that the mixing vectors bd\u00b7r are Gaussian distributed with zero mean and covariance matrix \u03a3b = \u03c32_B I.\nThe choice of the observation model in Eq. 2, which combines the multiple-logistic function with Gaussian parameters, is based on the fact that it induces dependencies among the probabilities \u03c0^r_nd that cannot be captured with other distributions, such as the Dirichlet distribution [2]. Furthermore, this multinomial-logistic normal distribution has been widely used to define probability distributions over discrete random variables (see, e.g., [16, 2]).\nWe consider that the elements xnd are independent given the latent feature matrix Z and the D matrices Bd. Then, the likelihood for any matrix X can be expressed as\n\np(X|Z, B1, . . . , BD) = \u220f_{n=1}^{N} \u220f_{d=1}^{D} p(xnd|zn\u00b7, Bd) = \u220f_{n=1}^{N} \u220f_{d=1}^{D} \u03c0^{xnd}_nd.   (3)\n\n3.1 Laplace approximation for inference\n\nIn Section 2, the (heuristic) Gibbs sampling algorithm for the posterior inference over the latent variables of the IBP has been reviewed; it is detailed in [5]. To sample from Eq. 1, we need to integrate out Bd in (3), as sequentially sampling from the posterior distribution of Bd is intractable, so an approximation is required. We rely on the Laplace approximation to integrate out the parameters Bd for simplicity and ease of implementation. We first consider the finite form of the proposed model, where K is bounded.\nRecall that our model assumes independence among the observations given the hidden latent variables. Then, the posterior p(B1, . . . , BD|X, Z) factorizes as\n\np(B1, . . . , BD|X, Z) = \u220f_{d=1}^{D} p(Bd|x\u00b7d, Z) = \u220f_{d=1}^{D} [ p(x\u00b7d|Bd, Z) p(Bd) / p(x\u00b7d|Z) ].   (4)\n\nHence, we only need to deal with each term p(Bd|x\u00b7d, Z) individually. Although the prior p(Bd) is Gaussian, due to the non-conjugacy with the likelihood term, the computation of the posterior p(Bd|x\u00b7d, Z) turns out to be intractable. Following a procedure similar to Gaussian processes for multiclass classification [16], we approximate the posterior p(Bd|x\u00b7d, Z) as a Gaussian distribution using Laplace\u2019s method. 
In order to obtain the parameters of the Gaussian distribution, we define \u03c8(Bd) as the un-normalized log-posterior of p(Bd|x\u00b7d, Z), i.e.,\n\n\u03c8(Bd) = log p(x\u00b7d|Bd, Z) + log p(Bd)\n= trace{ (Md)^T Bd } \u2212 \u2211_{n=1}^{N} log( \u2211_{r'=1}^{R} exp(zn\u00b7 bd\u00b7r') ) \u2212 (1/(2\u03c32_B)) trace{ (Bd)^T Bd } \u2212 (RK/2) log(2\u03c0\u03c32_B),   (5)\n\nwhere (Md)kr counts the number of data points for which xnd = r and znk = 1, namely, (Md)kr = \u2211_{n=1}^{N} \u03b4(xnd = r) znk, where \u03b4(\u00b7) is the Kronecker delta function.\nAs we prove below, the function \u03c8(Bd) is a strictly concave function of Bd and therefore it has a unique maximum, which is reached at Bd_MAP, denoted by the subscript \u2018MAP\u2019 because it coincides with the mean of the Gaussian distribution in Laplace\u2019s method (MAP stands for maximum a posteriori). We apply Newton\u2019s method to compute this maximum.\nBy defining (\u03c1d)kr = \u2211_{n=1}^{N} znk \u03c0^r_nd, the gradient of \u03c8(Bd) can be derived as\n\n\u2207\u03c8 = Md \u2212 \u03c1d \u2212 (1/\u03c32_B) Bd.   (6)\n\nTo compute the Hessian, it is easier to define the gradient \u2207\u03c8 as a vector, instead of a matrix, and hence we stack the columns of Bd into \u03b2d, i.e., for avid Matlab users, \u03b2d = Bd(:). The Hessian matrix can now be readily computed by taking the derivatives of the gradient, yielding\n\n\u2207\u2207\u03c8 = \u2212(1/\u03c32_B) IRK + \u2207\u2207 log p(x\u00b7d|\u03b2d, Z) = \u2212(1/\u03c32_B) IRK \u2212 \u2211_{n=1}^{N} ( diag(\u03c0nd) \u2212 (\u03c0nd)^T \u03c0nd ) \u2297 (zn\u00b7^T zn\u00b7),   (7)\n\nwhere \u03c0nd = [ \u03c0^1_nd, \u03c0^2_nd, . . . , \u03c0^R_nd ], and diag(\u03c0nd) is a diagonal matrix with the values of the vector \u03c0nd as its diagonal elements. 
The posterior p(\u03b2d|x\u00b7d, Z) can be approximated as\n\np(\u03b2d|x\u00b7d, Z) \u2248 q(\u03b2d|x\u00b7d, Z) = N(\u03b2d | \u03b2d_MAP, (\u2212\u2207\u2207\u03c8)^{\u22121}|_{\u03b2d_MAP}),   (8)\n\nwhere \u03b2d_MAP contains all the columns of Bd_MAP stacked into a vector. Since p(x\u00b7d|\u03b2d, Z) is a log-concave function of \u03b2d (see [3, p. 87]), \u2212\u2207\u2207\u03c8 is a positive definite matrix, which guarantees that the maximum of \u03c8(\u03b2d) is unique. Once the maximum Bd_MAP has been determined, the marginal likelihood p(x\u00b7d|Z) can be readily approximated by\n\nlog p(x\u00b7d|Z) \u2248 log q(x\u00b7d|Z) = \u2212(1/(2\u03c32_B)) trace{ (Bd_MAP)^T Bd_MAP } \u2212 (1/2) log | IRK + \u03c32_B \u2211_{n=1}^{N} ( diag(\u03c0\u0302nd) \u2212 (\u03c0\u0302nd)^T \u03c0\u0302nd ) \u2297 (zn\u00b7^T zn\u00b7) | + log p(x\u00b7d|Bd_MAP, Z),   (9)\n\nwhere \u03c0\u0302nd is the vector \u03c0nd evaluated at Bd = Bd_MAP.\nSimilarly as in [5], it is straightforward to prove that the limit of Eq. 9 is well-defined if Z has an unbounded number of columns, i.e., as K \u2192 \u221e. The resulting expression for the marginal likelihood p(x\u00b7d|Z) can be readily obtained from Eq. 9 by replacing K by K+, Z by the submatrix containing only the non-zero columns of Z, and Bd_MAP by the submatrix containing the K+ corresponding rows. 
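As a concrete sketch of this Laplace step (our own illustration, assuming NumPy; the function name `laplace_map` is ours), the code below runs Newton's method on \u03c8(Bd) for a single dimension d, using the gradient of Eq. 6 and the Hessian of Eq. 7 in the same column-stacked ordering \u03b2d = Bd(:):

```python
import numpy as np

def laplace_map(Z, x_d, R, sigma2_b, iters=25):
    """Newton ascent on psi(B^d) (Eq. 5): returns the MAP weights B^d_MAP
    for one column x_d of the observation matrix, with x_d in {0, ..., R-1}."""
    N, K = Z.shape
    B = np.zeros((K, R))
    # (M^d)_{kr}: number of data points with x_nd = r and z_nk = 1
    M = np.stack([Z[x_d == r].sum(axis=0) for r in range(R)], axis=1)
    for _ in range(iters):
        logits = Z @ B
        logits -= logits.max(axis=1, keepdims=True)            # numerical safety
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)                    # pi[n, r] = pi^r_nd
        grad = (M - Z.T @ pi - B / sigma2_b).ravel(order="F")  # Eq. (6), column-stacked
        # Hessian of psi (Eq. 7), also in column-stacked ordering
        H = -np.eye(K * R) / sigma2_b
        for n in range(N):
            W = np.diag(pi[n]) - np.outer(pi[n], pi[n])
            H -= np.kron(W, np.outer(Z[n], Z[n]))
        beta = B.ravel(order="F") - np.linalg.solve(H, grad)   # Newton step
        B = beta.reshape(K, R, order="F")
    return B
```

Since \u03c8 is strictly concave, the iteration has a unique fixed point: at convergence the gradient of Eq. 6 vanishes, and \u2212H evaluated at Bd_MAP is the precision matrix of the Gaussian approximation in Eq. 8.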
Through the rest of the paper, let us denote by Z the matrix that contains only the K+ non-zero columns of the full IBP matrix.\n\n3.2 Speeding up the matrix inversion\n\nThe inversion of the Hessian matrix, as well as the computation of its determinant in (9), can be carried out efficiently if we rearrange the inverse of \u2207\u2207\u03c8 as follows:\n\n(\u2212\u2207\u2207\u03c8)^{\u22121} = ( D \u2212 \u2211_{n=1}^{N} vn vn^T )^{\u22121},   (10)\n\nwhere vn = (\u03c0nd)^T \u2297 zn\u00b7^T and D is a block-diagonal matrix, in which each diagonal submatrix is\n\nDr = (1/\u03c32_B) IK+ + Z^T diag(\u03c0r\u00b7d) Z,   (11)\n\nwith \u03c0r\u00b7d = [ \u03c0^r_1d, . . . , \u03c0^r_Nd ]^T. Since vn vn^T is a rank-one matrix, we can apply the Woodbury identity [18] N times to invert the matrix \u2212\u2207\u2207\u03c8, similarly to the RLS (recursive least squares) updates [7]. At each iteration n = 1, . . . , N, we compute\n\n(D(n))^{\u22121} = ( D(n\u22121) \u2212 vn vn^T )^{\u22121} = (D(n\u22121))^{\u22121} + ( (D(n\u22121))^{\u22121} vn vn^T (D(n\u22121))^{\u22121} ) / ( 1 \u2212 vn^T (D(n\u22121))^{\u22121} vn ).   (12)\n\nFor the first iteration, we define D(0) as the block-diagonal matrix D, whose inversion involves computing the R matrix inversions of size K+ \u00d7 K+ of the matrices in (11), which can be efficiently solved by applying the Matrix Inversion Lemma. After N iterations of (12), it turns out that (\u2212\u2207\u2207\u03c8)^{\u22121} = (D(N))^{\u22121}.\nFor the determinant in (9), similar recursions can be applied using the Matrix Determinant Lemma [6], which states that |D + u v^T| = (1 + v^T D^{\u22121} u)|D|, with |D(0)| = \u220f_{r=1}^{R} |Dr|.\n\n4 Experiments\n\n4.1 Inference over synthetic images\n\nWe generate a simple example inspired by the experiment in [5, p. 1205] to show that the proposed model works as it should. 
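The rank-one recursion of Eq. 12 is easy to check numerically. The sketch below is our own illustration, assuming NumPy, with a generic positive-definite stand-in for the block-diagonal D of Eq. 11; it applies the update N times and compares the result against a direct inverse:

```python
import numpy as np

rng = np.random.default_rng(7)
d, N = 6, 4
A = rng.standard_normal((d, d))
D0 = A @ A.T + 10.0 * np.eye(d)        # stand-in for the block-diagonal D of Eq. (11)
V = 0.1 * rng.standard_normal((N, d))  # the rank-one directions v_n

Dinv = np.linalg.inv(D0)               # (D^(0))^{-1}
for v in V:                            # Eq. (12): one rank-one downdate per step
    Dv = Dinv @ v
    Dinv = Dinv + np.outer(Dv, Dv) / (1.0 - v @ Dv)

assert np.allclose(Dinv, np.linalg.inv(D0 - V.T @ V))
```

The same loop, combined with the matrix determinant lemma, tracks |D(n)| for the log-determinant needed in Eq. 9 without recomputing it from scratch.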
We define four base black-and-white images that can be present or absent with probability 0.5 independently of each other (Figure 1a), and that are combined to create a binary composite image. We also multiply each pixel independently by equiprobable binary noise, so each white pixel in the composite image can be turned black 50% of the time, while black pixels always remain black. Several examples can be found in Figure 1c. We generate 200 examples to learn the IBP model. The Gibbs sampler has been initialized with K+ = 2, setting each znk = 1 with probability 1/2, and the hyperparameters have been set to \u03b1 = 0.5 and \u03c32_B = 1.\nAfter 200 iterations, the Gibbs sampler returns four latent features. Each of the four features recovers one of the base images, with a different ordering, which is inconsequential. In Figure 1b, we have plotted the posterior probability of each pixel being white when only one of the components is active. As expected, the black pixels are known to be black (almost zero probability of being white) and the white pixels have about a 50/50 chance of being black or white, due to the multiplicative noise. The Gibbs sampler has used as many as nine hidden features, but after iteration 60, the first four features represent the base images and the others just lock onto a noise pattern, which eventually fades away.\n\n4.2 National Epidemiologic Survey on Alcohol and Related Conditions (NESARC)\n\nThe NESARC was designed to determine the magnitude of alcohol use disorders and their associated disabilities. Two waves of interviews have been fielded for this survey (first wave in 2001-2002 and second wave in 2004-2005). For the following experimental results, we only use the data from the first wave, for which 43,093 people were selected to represent the U.S. population 18 years of age and older. 
Public use data are currently available for this wave of data collection.\nThrough 2,991 entries, the NESARC collects data on the background of participants, alcohol and other drug consumption and abuse, medicine use, medical treatment, mental disorders, phobias, family history, etc.\n\nFigure 1: Experimental results of the infinite binary multinomial-logistic model over the image data set. (a) The four base images used to generate the 200 observations. (b) Probability of each pixel being white, when a single feature is active (ordered to match the images on the left), computed using Bd_MAP. (c) Four data points generated as described in the text. The numbers above each figure indicate which features are present in that image. (d) Probabilities of each pixel being white after 200 iterations of the Gibbs sampler, inferred for the four data points on (c). The numbers above each figure show the inferred value of zn\u00b7 for these data points. (e) The number of latent features K+ and (f) the approximate log of p(X|Z) over the 200 iterations of the Gibbs sampler.\n\nThe survey includes a question about having attempted suicide, as well as other related questions such as \u2018felt like wanted to die\u2019 and \u2018thought a lot about own death\u2019. In the present paper, we use the IBP with discrete observations for a preliminary study in seeking the latent causes which lead to committing suicide. Most of the questions in the survey (over 2,500) are yes-or-no questions, which have four possible outcomes: \u2018blank\u2019 (B), \u2018unknown\u2019 (U), \u2018yes\u2019 (Y) and \u2018no\u2019 (N). If a question is left blank, the question was not asked1. 
If a question is said to be unknown, either it was not answered or the answer was unknown to the respondent.\nIn our ongoing study, we want to find a latent model that describes this database and can be used to infer patterns of behavior and, specifically, to predict suicide attempts. In this paper, we build an unsupervised model with the 20 variables that present the highest mutual information with the suicide attempt question, which are shown in Table 1 together with their code in the questionnaire. We run the Gibbs sampler over 500 randomly chosen subjects out of the 13,670 that have answered affirmatively to having had a period of low mood. In this study, we use another 9,500 as test cases and have left the remaining samples for further validation. We have initialized the sampler with one active feature, i.e., K+ = 1, have set znk = 1 randomly with probability 0.5, and have fixed \u03b1 = 1 and \u03c32_B = 1. After 200 iterations, we obtain seven latent features.\nIn Figure 2, we have plotted the posterior probability for each question when a single feature is active. In these plots, white means 0 and black means 1, and each row sums up to one. Feature 1 is active for modeling the \u2018blank\u2019 and \u2018no\u2019 answers and, fundamentally, those who were not asked Questions 8 and 10. Feature 2 models the \u2018yes\u2019 and \u2018no\u2019 answers and favors affirmative responses to Questions 1, 2, 5, 9, 11, 12, 17 and 18, which indicates depression. Feature 3 models blank answers for most of the questions and negative responses to 1, 2, 5, 8 and 10, which are questions related to suicide. Feature 4 models the affirmative answers to 1, 2, 5, 9 and 11 and also has higher probability for unknowns in Questions 3, 4, 6 and 7. Feature 5 models the \u2018yes\u2019 answer to Questions 3, 4, 6, 7, 8, 10, 17 and 18, being ambivalent in Questions 1 and 2. Feature 6 favors \u2018blank\u2019 and \u2018no\u2019 answers in most questions. Feature 7 models answering affirmatively to Questions 15, 16, 19 and 20, which are related to alcohol abuse.\n\n1 In a questionnaire of this size, some questions are not asked when a previous question was answered in a predetermined way, to reduce the burden of taking the survey. For example, if a person has never had a period of low mood, the suicide attempt question is not asked.\n\nWe show the percentage of respondents that answered positively to the suicide attempt question in Table 2, independently for the 500 samples that were used to learn the IBP and for the 9,500 hold-out samples, together with the total number of respondents. A dash indicates that the feature can be either active or inactive. Table 2 is divided into three parts. The first part deals with each individual feature, and the other two study some cases of interest. Throughout the database, the prevalence of suicide attempts is 7.83%. As expected, Features 2, 4, 5 and 7 increase the suicide attempt risk, although Feature 5 only mildly, and Features 1, 3 and 6 decrease the probability of attempting suicide. From the above description of each feature, it is clear that having Features 4 or 7 active should increase the risk of attempting suicide, while having Features 3 and 1 active should cause the opposite effect.\nFeatures 3 and 4 present the lowest and the highest risk of suicide, respectively, and they are studied together in the second part of Table 2, in which we can see that having Feature 3 and not having Feature 4 reduces this risk by an order of magnitude, and that this combination is present in 70% of the population. 
The other combinations favor an increased rate of suicide attempts, going from doubling (\u201811\u2019) to quadrupling (\u201800\u2019) to a ten-fold increase (\u201801\u2019), and the percentages of the population with these feature values are, respectively, 21%, 6% and 3%.\nIn the final part of Table 2, we show combinations of features that significantly increase the suicide attempt rate for a reduced percentage of the population, as well as combinations of features that significantly decrease the suicide attempt rate for a large portion of the population. These results are interesting, as they can be used to discard significant portions of the population in suicide attempt studies and to focus on the groups that present much higher risk. Hence, our IBP with discrete observations is able to obtain features that describe the hidden structure of the NESARC database and makes it possible to pinpoint the people who have a higher risk of attempting suicide.\n\n#   Source Code   Description\n01  S4AQ4A17   Thought about committing suicide\n02  S4AQ4A18   Felt like wanted to die\n03  S4AQ17A    Stayed overnight in hospital because of depression\n04  S4AQ17B    Went to emergency room for help because of depression\n05  S4AQ4A19   Thought a lot about own death\n06  S4AQ16     Went to counselor/therapist/doctor/other person for help to improve mood\n07  S4AQ18     Doctor prescribed medicine/drug to improve mood/make you feel better\n08  S4CQ15A    Stayed overnight in hospital because of dysthymia\n09  S4AQ4A12   Felt worthless most of the time for 2+ weeks\n10  S4CQ15B    Went to emergency room for help because of dysthymia\n11  S4AQ52     Had arguments/friction with family, friends, people at work, or anyone else\n12  S4AQ55     Spent more time than usual alone because didn\u2019t want to be around people\n13  S4AQ21C    Used medicine/drug on own to improve low mood prior to last 12 months\n14  S4AQ21A    Ever used medicine/drug on own to improve low mood/make self feel better\n15  S4AQ20A    Ever drank alcohol to improve low mood/make self feel better\n16  S4AQ20C    Drank alcohol to improve mood prior to last 12 months\n17  S4AQ56     Couldn\u2019t do things usually did/wanted to do\n18  S4AQ54     Had trouble doing things supposed to do, like working, doing schoolwork, etc.\n19  S4AQ11     Any episode began after drinking heavily/more than usual\n20  S4AQ15IR   Only/any episode prior to last 12 months began after drinking/drug use\n\nTable 1: Enumeration of the 20 selected questions in the experiments, sorted in decreasing order according to their mutual information with the \u2018attempted suicide\u2019 question.\n\n5 Conclusions\n\nIn this paper, we have proposed a new model that combines the IBP with discrete observations using the multinomial-logit distribution. We have used the Laplace approximation to integrate out the weighting factors, which allows us to efficiently run the Gibbs sampler. We have applied our model to the NESARC database to find out the hidden features that characterize the suicide attempt risk. 
We have analyzed how each of the seven inferred features contributes to the suicide attempt probability. We are developing a variational inference algorithm to be able to extend these results to larger fractions (in subjects and questions) of the NESARC database.\n\nHidden features (1-7)   Suicide attempt probability (Train / Hold-out)   Number of cases (Train / Hold-out)\n1 - - - - - -    6.74% / 5.55%      430 / 8072\n- 1 - - - - -   10.56% / 11.16%     322 / 6083\n- - 1 - - - -    3.72% / 4.60%      457 / 8632\n- - - 1 - - -   25.23% / 22.25%     111 / 2355\n- - - - 1 - -    8.64% / 9.69%      301 / 5782\n- - - - - 1 -    6.90% / 7.18%      464 / 8928\n- - - - - - 1   14.29% / 14.18%      91 / 1664\n- - 0 0 - - -   30.77% / 28.55%      26 / 571\n- - 0 1 - - -   82.35% / 61.95%      17 / 297\n- - 1 0 - - -    0.83% / 0.87%      363 / 6574\n- - 1 1 - - -   14.89% / 16.52%      94 / 2058\n0 - 0 1 - - 1  100.00% / 69.41%       4 / 85\n1 - 0 1 - - -   80.00% / 66.10%       5 / 118\n1 - 1 0 - 1 0    0.00% / 0.25%      252 / 4739\n- - 1 0 - - 0    0.33% / 0.63%      299 / 5543\n1 - 1 0 - - -    0.32% / 0.41%      317 / 5807\n\nTable 2: Probabilities of attempting suicide for different values of the latent feature vector, together with the number of subjects possessing those values. The symbol \u2018-\u2019 denotes either 0 or 1. The \u2018Train\u2019 columns contain the results for the 500 data points used to obtain the model, whereas the \u2018Hold-out\u2019 columns contain the results for the remaining subjects.\n\nFigure 2: Probability of answering \u2018blank\u2019 (B), \u2018unknown\u2019 (U), \u2018yes\u2019 (Y) and \u2018no\u2019 (N) to each of the 20 selected questions, sorted as in Table 1, after 200 iterations of the Gibbs sampler. These probabilities have been obtained with the posterior mean weights Bd_MAP, when only one of the seven latent features (sorted from left to right to match the order in Table 2) is active.\n\nAcknowledgments\n\nFrancisco J. R. Ruiz is supported by an FPU fellowship from the Spanish Ministry of Education, Isabel Valera is supported by the Plan Regional-Programas I+D of Comunidad de Madrid (AGES-CM S2010/BMD-2422), and Fernando P\u00e9rez-Cruz has been partially supported by a Salvador de Madariaga grant. The authors also acknowledge the support of Ministerio de Ciencia e Innovaci\u00f3n of Spain (project DEIPRO TEC2009-14504-C02-00 and program Consolider-Ingenio 2010 CSD2008-00010 COMONSENS).\n\nReferences\n[1] Summary of national strategy for suicide prevention: Goals and objectives for action, 2007. Available at: http://www.sprc.org/library/nssp.pdf.\n[2] D. M. Blei and J. D. Lafferty. A correlated topic model of Science. Annals of Applied Statistics, 1(1):17\u201335, August 2007.\n[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, March 2004.\n[4] G. K. Brown, T. Ten Have, G. R. Henriques, S. X. Xie, J. E. Hollander, and A. T. Beck. Cognitive therapy for the prevention of suicide attempts: a randomized controlled trial. Journal of the American Medical Association, 294(5):563\u2013570, 2005.\n[5] T. L. Griffiths and Z. Ghahramani. The Indian Buffet Process: An introduction and review. Journal of Machine Learning Research, 12:1185\u20131224, 2011.\n[6] D. A. Harville. Matrix Algebra From a Statistician\u2019s Perspective. Springer-Verlag, 1997.\n[7] S. Haykin. Adaptive Filter Theory. Prentice Hall, 2002.\n[8] R. C. 
Kessler, P. Berglund, G. Borges, M. Nock, and P. S. Wang. Trends in suicide ideation, plans, gestures, and attempts in the United States, 1990-1992 to 2001-2003. Journal of the American Medical Association, 293(20):2487\u20132495, 2005.\n[9] K. Krysinska and G. Martin. The struggle to prevent and evaluate: application of population attributable risk and preventive fraction to suicide prevention research. Suicide and Life-Threatening Behavior, 39(5):548\u2013557, 2009.\n[10] D. J. C. MacKay. Information Theory, Inference & Learning Algorithms. Cambridge University Press, New York, NY, USA, 2002.\n[11] J. J. Mann, A. Apter, J. Bertolote, A. Beautrais, D. Currier, A. Haas, U. Hegerl, J. Lonnqvist, K. Malone, A. Marusic, L. Mehlum, G. Patton, M. Phillips, W. Rutz, Z. Rihmer, A. Schmidtke, D. Shaffer, M. Silverman, Y. Takahashi, A. Varnik, D. Wasserman, P. Yip, and H. Hendin. Suicide prevention strategies: a systematic review. The Journal of the American Medical Association, 294(16):2064\u20132074, 2005.\n[12] M. A. Oquendo, E. B. Garc\u00eda, J. J. Mann, and J. Giner. Issues for DSM-V: suicidal behavior as a separate diagnosis on a separate axis. The American Journal of Psychiatry, 165(11):1383\u20131384, November 2008.\n[13] K. Szanto, S. Kalmar, H. Hendin, Z. Rihmer, and J. J. Mann. A suicide prevention program in a region with a very high suicide rate. Archives of General Psychiatry, 64(8):914\u2013920, 2007.\n[14] M. Titsias. The infinite gamma-Poisson feature model. Advances in Neural Information Processing Systems (NIPS), 19, 2007.\n[15] J. Van Gael, Y. W. Teh, and Z. Ghahramani. The infinite factorial hidden Markov model. In Advances in Neural Information Processing Systems (NIPS), volume 21, 2009.\n[16] C. K. I. Williams and D. Barber. Bayesian classification with Gaussian Processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20:1342\u20131351, 1998.\n[17] S. Williamson, C. Wang, K. 
A. Heller, and D. M. Blei. The IBP Compound Dirichlet Process and its application to focused topic modeling. 11:1151\u20131158, 2010.\n[18] M. A. Woodbury. The stability of out-input matrices. Mathematical Reviews, 1949.\n", "award": [], "sourceid": 921, "authors": [{"given_name": "Francisco", "family_name": "Ruiz", "institution": null}, {"given_name": "Isabel", "family_name": "Valera", "institution": null}, {"given_name": "Carlos", "family_name": "Blanco", "institution": null}, {"given_name": "Fernando", "family_name": "P\u00e9rez-Cruz", "institution": null}]}