{"title": "Identifying Alzheimer's Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 1431, "page_last": 1439, "abstract": "Diagnosis of Alzheimer's disease (AD) at the early stage of the disease development is of great clinical importance. Current clinical assessment that relies primarily on cognitive measures proves low sensitivity and specificity. The fast growing neuroimaging techniques hold great promise. Research so far has focused on single neuroimaging modalities. However, as different modalities provide complementary measures for the same disease pathology, fusion of multi-modality data may increase the statistical power in identification of disease-related brain regions. This is especially true for early AD, at which stage the disease-related regions are most likely to be weak-effect regions that are difficult to be detected from a single modality alone. We propose a sparse composite linear discriminant analysis model (SCLDA) for identification of disease-related brain regions of early AD from multi-modality data. SCLDA uses a novel formulation that decomposes each LDA parameter into a product of a common parameter shared by all the modalities and a parameter specific to each modality, which enables joint analysis of all the modalities and borrowing strength from one another. We prove that this formulation is equivalent to a penalized likelihood with non-convex regularization, which can be solved by the DC ((difference of convex functions) programming. We show that in using the DC programming, the property of the non-convex regularization in terms of preserving weak-effect features can be nicely revealed. We perform extensive simulations to show that SCLDA outperforms existing competing algorithms on feature selection, especially on the ability for identifying weak-effect features. 
We apply SCLDA to the Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) images of 49 AD patients and 67 normal controls (NC). Our study identifies disease-related brain regions consistent with findings in the AD literature.", "full_text": "

Identifying Alzheimer's Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis

Shuai Huang1, Jing Li1, Jieping Ye2,3, Kewei Chen4, Teresa Wu1, Adam Fleisher4, Eric Reiman4

1Industrial Engineering, 2Computer Science and Engineering, and 3Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University, Tempe, USA
4Banner Alzheimer's Institute and Banner PET Center, Banner Good Samaritan Medical Center, Phoenix, USA

{shuang31, jing.li.8, jieping.ye, teresa.wu}@asu.edu
{kewei.chen, adam.fleisher, eric.reiman}@bannerhealth.com

Abstract

Diagnosis of Alzheimer's disease (AD) at the early stage of the disease development is of great clinical importance. Current clinical assessment that relies primarily on cognitive measures has shown low sensitivity and specificity. The fast-growing neuroimaging techniques hold great promise. Research so far has focused on single neuroimaging modalities. However, as different modalities provide complementary measures for the same disease pathology, fusion of multi-modality data may increase the statistical power in identification of disease-related brain regions. This is especially true for early AD, at which stage the disease-related regions are most likely to be weak-effect regions that are difficult to detect from a single modality alone. We propose a sparse composite linear discriminant analysis model (SCLDA) for identification of disease-related brain regions of early AD from multi-modality data.
SCLDA uses a novel formulation that decomposes each LDA parameter into a product of a common parameter shared by all the modalities and a parameter specific to each modality, which enables joint analysis of all the modalities and borrowing strength from one another. We prove that this formulation is equivalent to a penalized likelihood with non-convex regularization, which can be solved by DC (difference of convex functions) programming. We show that in using DC programming, the property of the non-convex regularization in terms of preserving weak-effect features can be nicely revealed. We perform extensive simulations to show that SCLDA outperforms existing competing algorithms on feature selection, especially in its ability to identify weak-effect features. We apply SCLDA to the Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) images of 49 AD patients and 67 normal controls (NC). Our study identifies disease-related brain regions consistent with findings in the AD literature.

1 Introduction

Alzheimer's disease (AD) is a fatal, neurodegenerative disorder that currently affects over five million people in the U.S. It leads to substantial, progressive neuron damage that is irreversible, which eventually causes death. Early diagnosis of AD is of great clinical importance, because disease-modifying therapies given to patients at the early stage of their disease development will have a much better effect in slowing down the disease progression and helping preserve some cognitive functions of the brain. However, current clinical assessment that relies primarily on cognitive measures has shown low sensitivity and specificity in early diagnosis of AD. This is because these cognitive measures are vulnerable to confounding effects from non-AD-related factors such as patients' mood and the presence of other illnesses or major life events [1].
The confounding effect is especially severe in the diagnosis of early AD, at which time cognitive impairment is not yet apparent. On the other hand, fast-growing neuroimaging techniques, such as Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET), provide great opportunities for improving early diagnosis of AD, due to their ability to overcome the limitations of conventional cognitive measures. There are two major categories of neuroimaging techniques, i.e., functional and structural neuroimaging. MRI is a typical structural neuroimaging technique, which allows for visualization of brain anatomy. PET is a typical functional neuroimaging technique, which measures the cerebral metabolic rate for glucose. Both techniques have been extensively applied to AD studies. For example, studies based on MRI have consistently revealed brain atrophy that involves the hippocampus and entorhinal cortex [2-6]; studies based on PET have revealed functional abnormality that involves the posterior temporal and parietal association cortices [8-10], posterior cingulate, precuneus, and medial temporal cortices [11-14].

There is overlap between the disease-related brain regions detected by MRI and those detected by PET, such as regions in the hippocampus area and the mesial temporal lobe [15-17]. This is not surprising, since MRI and PET are two complementary measures for the same disease pathology, i.e., AD starts mainly in the hippocampus and entorhinal cortex, and subsequently spreads throughout the temporal and orbitofrontal cortex, posterior cingulate, and association cortex [7]. However, most existing studies have only exploited structural and functional alterations in isolation, which ignores the potential interaction between them.
The fusion of MRI and PET imaging modalities will increase the statistical power in identification of disease-related brain regions, especially for early AD, at which stage the disease-related regions are most likely to be weak-effect regions that are difficult to detect from MRI or PET alone. Once a good set of disease-related brain regions is identified, they can be further used to build an effective classifier (i.e., a biomarker from the clinical perspective) to enable AD diagnosis with high sensitivity and specificity.

The idea of multi-modality data fusion in the research of neurodegenerative disorders has been exploited before. For example, a number of models have been proposed to combine electroencephalography (EEG) and functional MRI (fMRI), including parallel EEG-fMRI independent component analysis [18]-[19], EEG-informed fMRI analysis [18] [20], and variational Bayesian methods [18] [21]. The purpose of these studies is different from ours, i.e., they aim to combine EEG, which has high temporal resolution but low spatial resolution, with fMRI, which has low temporal resolution but high spatial resolution, so as to obtain an accurate picture of the whole brain with both high spatial and high temporal resolution [18]-[21]. Also, there have been some studies that include both MRI and PET data for classification [15], [22]-[25]. However, these studies do not make use of the fact that MRI and PET measure the same underlying disease pathology from two complementary perspectives (i.e., the structural and functional perspectives), so that the analysis of one imaging modality can borrow strength from the other.

In this paper, we focus on the problem of identifying disease-related brain regions from multi-modality data. This is actually a variable selection problem.
Because MRI and PET data are high-dimensional, regularization techniques are needed for effective variable selection, such as the L1-regularization technique [25]-[30] and the L2/L1-regularization technique [31]. In particular, L2/L1-regularization has been used for variable selection jointly on multiple related datasets, also known as multitask feature selection [31], which has a similar nature to our problem. Note that both L1- and L2/L1-regularizations are convex regularizations, which has gained them popularity in the literature. On the other hand, there is increasing evidence that these convex regularizations tend to produce overly severely shrunken parameter estimates. Therefore, these convex regularizations could lead to mis-identification of the weak-effect disease-related brain regions, which unfortunately make up a large portion of the disease-related brain regions, especially in early AD. Also, convex regularizations tend to select many irrelevant variables to compensate for the overly severe shrinkage in the parameters of the relevant variables. Considering these limitations of convex regularizations, we study non-convex regularizations [33]-[35] [39], which have the advantage of producing mildly or slightly shrunken parameter estimates, so as to be able to preserve weak-effect disease-related brain regions, and the advantage of avoiding selecting many disease-irrelevant regions.

Specifically, in this paper we propose a sparse composite linear discriminant analysis model, called SCLDA, for identification of disease-related brain regions from multi-modality data. The contributions of our paper include:

• Formulation: We propose a novel formulation that decomposes each LDA parameter into a product of a common parameter shared by all the data sources and a parameter specific to each data source, which enables joint analysis of all the data sources and borrowing strength from one another.
We further prove that this formulation is equivalent to a penalized likelihood with non-convex regularization.

• Algorithm: We show that the proposed non-convex optimization can be solved by DC (difference of convex functions) programming [39]. More importantly, we show that in using DC programming, the property of the non-convex regularization in terms of preserving weak-effect features can be nicely revealed.

• Application: We apply the proposed SCLDA to the PET and MRI data of early AD patients and normal controls (NC). Our study identifies disease-related brain regions that are consistent with the findings in the AD literature. AD vs. NC classification based on these identified regions achieves high accuracy, which makes the proposed method a useful tool for clinical diagnosis of early AD. In contrast, the convex-regularization-based multitask feature selection method [31] identifies more irrelevant brain regions and yields a lower classification accuracy.

2 Review of LDA and its variants

Denote $Z = (Z_1, Z_2, \ldots, Z_p)^T$ as the variables and assume there are $J$ classes. Denote $N_j$ as the sample size of class $j$ and $N = \sum_{j=1}^{J} N_j$ as the total sample size. Let $\mathbf{Z} = (\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_N)^T$ be the $N \times p$ sample matrix, where $\mathbf{z}_i$ is the $i$-th sample and $g_i$ is its associated class index. Let $\boldsymbol{\mu}_j = \frac{1}{N_j} \sum_{i=1, g_i = j}^{N} \mathbf{z}_i$ be the sample mean of class $j$, $\boldsymbol{\mu} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{z}_i$ be the overall sample mean, $\mathbf{T} = \frac{1}{N} \sum_{i=1}^{N} (\mathbf{z}_i - \boldsymbol{\mu})(\mathbf{z}_i - \boldsymbol{\mu})^T$ be the total normalized sum of squares and products (SSQP), $\mathbf{W}_j = \frac{1}{N_j} \sum_{i=1, g_i = j}^{N} (\mathbf{z}_i - \boldsymbol{\mu}_j)(\mathbf{z}_i - \boldsymbol{\mu}_j)^T$ be the normalized class SSQP of class $j$, and $\mathbf{W} = \frac{1}{N} \sum_{j=1}^{J} N_j \mathbf{W}_j$ be the overall normalized class SSQP.

The objective of LDA is to seek a $p \times q$ linear transformation matrix, $\boldsymbol{\Theta}_q$, with which $\boldsymbol{\Theta}_q^T Z$ retains the maximum amount of class discrimination information in $Z$. To achieve this objective, one approach is to seek the $\boldsymbol{\Theta}_q$ that maximizes the between-class variance of $\boldsymbol{\Theta}_q^T Z$, which can be measured by $\mathrm{tr}(\boldsymbol{\Theta}_q^T \mathbf{T} \boldsymbol{\Theta}_q)$, while minimizing the within-class variance of $\boldsymbol{\Theta}_q^T Z$, which can be measured by $\mathrm{tr}(\boldsymbol{\Theta}_q^T \mathbf{W} \boldsymbol{\Theta}_q)$. Here $\mathrm{tr}(\cdot)$ is the matrix trace operator. This is equivalent to solving the following optimization problem:

$$\widehat{\boldsymbol{\Theta}}_q = \arg\max_{\boldsymbol{\Theta}_q} \frac{\mathrm{tr}(\boldsymbol{\Theta}_q^T \mathbf{T} \boldsymbol{\Theta}_q)}{\mathrm{tr}(\boldsymbol{\Theta}_q^T \mathbf{W} \boldsymbol{\Theta}_q)}. \quad (1)$$

Note that $\boldsymbol{\Theta}_q$ corresponds to the right eigenvectors of $\mathbf{W}^{-1}\mathbf{T}$ and $q = J - 1$.

Another approach used for finding the $\boldsymbol{\Theta}_q$ is to use maximum likelihood estimation for Gaussian populations that have different means and a common covariance matrix. Specifically, as in [36], this approach is developed by assuming that the class distributions are Gaussian with a common covariance matrix, and that their mean differences lie in a $q$-dimensional subspace of the $p$-dimensional original variable space. Hastie [37] further generalized this approach by assuming that the class distributions are a mixture of Gaussians, which has more flexibility than LDA.
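As a concrete illustration of the solution of (1), the sketch below computes the LDA directions as the top $q = J - 1$ eigenvectors of $\mathbf{W}^{-1}\mathbf{T}$, using the normalized SSQP matrices defined above. This is an illustrative sketch on synthetic data, not code from the paper; the class means and sample sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
p, J, n_per = 5, 3, 50  # 5 variables, 3 classes, 50 samples per class

# Synthetic Gaussian classes with shifted means (hypothetical values)
means = [np.zeros(p), np.r_[2.0, np.zeros(p - 1)], np.r_[0.0, 2.0, np.zeros(p - 2)]]
Z = np.vstack([rng.normal(m, 1.0, size=(n_per, p)) for m in means])
g = np.repeat(np.arange(J), n_per)
N = Z.shape[0]

# Total normalized SSQP T and overall normalized class SSQP W
mu = Z.mean(axis=0)
T = (Z - mu).T @ (Z - mu) / N
W = sum((Z[g == j] - Z[g == j].mean(axis=0)).T
        @ (Z[g == j] - Z[g == j].mean(axis=0)) / N
        for j in range(J))

# LDA directions: top q = J - 1 right eigenvectors of W^{-1} T
evals, evecs = np.linalg.eig(np.linalg.solve(W, T))
order = np.argsort(-evals.real)
Theta_q = evecs[:, order[:J - 1]].real  # p x (J - 1) transformation matrix
print(Theta_q.shape)
```

In practice the eigenvalue ratio criterion in (1) is equivalent to this generalized eigenproblem; the projection $\boldsymbol{\Theta}_q^T \mathbf{z}$ then carries the class discrimination information.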
However, both approaches assume a common covariance matrix for all the classes, which is too strict in many practical applications, especially in high-dimensional problems where the covariance matrices of different classes tend to be different. Consequently, the linear transformation explored by LDA may not be effective.

In [38], a heterogeneous LDA (HLDA) is developed to relax this assumption. The HLDA seeks a $p \times p$ linear transformation matrix, $\boldsymbol{\Theta}$, in which only the first $q$ columns ($\boldsymbol{\Theta}_q$) contain discrimination information and the remaining $p - q$ columns ($\boldsymbol{\Theta}_{p-q}$) contain no discrimination information. For Gaussian models, assuming lack of discrimination information is equivalent to assuming that the means and the covariance matrices of the class distributions are the same for all classes in the $p - q$ dimensional subspace. Following this, the log-likelihood function of $\boldsymbol{\Theta}$ can be written as below [38]:

$$l(\boldsymbol{\Theta} \mid \mathbf{Z}) = -\frac{N}{2} \log\left|\boldsymbol{\Theta}_{p-q}^T \mathbf{T} \boldsymbol{\Theta}_{p-q}\right| - \sum_{j=1}^{J} \frac{N_j}{2} \log\left|\boldsymbol{\Theta}_q^T \mathbf{W}_j \boldsymbol{\Theta}_q\right| + N \log\left|\boldsymbol{\Theta}\right|. \quad (2)$$

Here $|\mathbf{A}|$ denotes the determinant of matrix $\mathbf{A}$. There is no closed-form solution for $\boldsymbol{\Theta}$; as a result, numerical methods are needed to derive the maximum likelihood estimate for $\boldsymbol{\Theta}$. It is worth mentioning that the LDA in the form of (1) is a special case of the HLDA [38].

3 The proposed SCLDA

Suppose that there are multiple data sources, $\mathbf{Z}^{(1)}, \mathbf{Z}^{(2)}, \ldots, \mathbf{Z}^{(M)}$, with each data source capturing one aspect of the same set of physical variables, e.g., MRI and PET capture the structural and functional aspects of the same brain regions. For each data source $\mathbf{Z}^{(m)}$, there is a linear transformation matrix $\boldsymbol{\Theta}^{(m)}$ which retains the maximum amount of class discrimination information in $\mathbf{Z}^{(m)}$. A naive way of estimating $\boldsymbol{\Theta} = \{\boldsymbol{\Theta}^{(1)}, \boldsymbol{\Theta}^{(2)}, \ldots, \boldsymbol{\Theta}^{(M)}\}$ is to separately estimate each $\boldsymbol{\Theta}^{(m)}$ based on $\mathbf{Z}^{(m)}$. Apparently, this approach does not take advantage of the fact that all the data sources measure the same physical process. Also, when the sample size of each data source is small, this approach may lead to unreliable estimates for the $\boldsymbol{\Theta}^{(m)}$'s.

To tackle these problems, we propose a composite parameterization following the line of [40]. Specifically, let $\theta_{k,l}^{(m)}$ be the element at the $k$-th row and $l$-th column of $\boldsymbol{\Theta}^{(m)}$. We treat $\{\theta_{k,l}^{(1)}, \theta_{k,l}^{(2)}, \ldots, \theta_{k,l}^{(M)}\}$ as an interrelated group and parameterize each $\theta_{k,l}^{(m)}$ as $\theta_{k,l}^{(m)} = \delta_k \gamma_{k,l}^{(m)}$, for $1 \le k \le p$, $1 \le l \le p$ and $1 \le m \le M$. In order to assure identifiability, we restrict each $\delta_k \ge 0$. Here, $\delta_k$ represents the common information shared by all the data sources about variable $k$, while $\gamma_{k,l}^{(m)}$ represents the specific information only captured by the $m$-th data source. For example, for disease-related brain region identification, if $\delta_k = 0$, it means that all the data sources indicate variable $k$ is not a disease-related brain region; otherwise, variable $k$ is a disease-related brain region, and $\gamma_{k,l}^{(m)} \ne 0$ means that the $m$-th data source supports this assertion.

The log-likelihood function of $\boldsymbol{\Theta}$ is:

$$l_2(\boldsymbol{\Theta} \mid \mathbf{Z}^{(1)}, \ldots, \mathbf{Z}^{(M)}) = \sum_{m=1}^{M} \left\{ -\frac{N^{(m)}}{2} \log\left|\boldsymbol{\Theta}_{p-q}^{(m)T} \mathbf{T}^{(m)} \boldsymbol{\Theta}_{p-q}^{(m)}\right| - \sum_{j=1}^{J} \frac{N_j^{(m)}}{2} \log\left|\boldsymbol{\Theta}_q^{(m)T} \mathbf{W}_j^{(m)} \boldsymbol{\Theta}_q^{(m)}\right| + N^{(m)} \log\left|\boldsymbol{\Theta}^{(m)}\right| \right\},$$

which follows the same line as (2). However, our formulation includes the following constraints on $\boldsymbol{\Theta}$:

$$\theta_{k,l}^{(m)} = \delta_k \gamma_{k,l}^{(m)}, \quad \delta_k \ge 0, \quad 1 \le k, l \le p, \quad 1 \le m \le M. \quad (3)$$

Let $\boldsymbol{\Gamma} = \{\gamma_{k,l}^{(m)}, 1 \le k \le p, 1 \le l \le p, 1 \le m \le M\}$ and $\boldsymbol{\Delta} = \{\delta_k, 1 \le k \le p\}$. An intuitive choice for estimation of $\boldsymbol{\Gamma}$ and $\boldsymbol{\Delta}$ is to maximize $l_2(\boldsymbol{\Theta} \mid \mathbf{Z}^{(1)}, \ldots, \mathbf{Z}^{(M)})$ subject to the constraints in (3). However, it can be anticipated that no element in the estimated $\boldsymbol{\Gamma}$ and $\boldsymbol{\Delta}$ will be exactly zero, resulting in a model which is not interpretable, i.e., poor identification of disease-related regions. Thus, we encourage the estimates of $\boldsymbol{\Delta}$ and the first $q$ columns of $\boldsymbol{\Gamma}$ (i.e., the columns containing discrimination information) to be sparse, by imposing the L1-penalty on $\boldsymbol{\Gamma}$ and $\boldsymbol{\Delta}$. By doing so, we obtain the following optimization problem for the proposed SCLDA:

$$\widehat{\boldsymbol{\Theta}} = \arg\min_{\boldsymbol{\Theta}} \; -l_2(\boldsymbol{\Theta} \mid \mathbf{Z}^{(1)}, \ldots, \mathbf{Z}^{(M)}) + \lambda_1 \sum_{k=1}^{p} \delta_k + \lambda_2 \sum_{k,l,m} \left|\gamma_{k,l}^{(m)}\right|, \;\; \text{subject to (3)}. \quad (4)$$

Here, $\lambda_1$ and $\lambda_2$ control the degrees of sparsity of $\boldsymbol{\Delta}$ and $\boldsymbol{\Gamma}$, respectively. Tuning two regularization parameters is difficult. Fortunately, we prove the following theorem, which indicates that formulation (4) is equivalent to a simpler optimization problem involving only one regularization parameter.

Theorem 1: The optimization problem (4) is equivalent to the following optimization problem:

$$\widehat{\boldsymbol{\Theta}} = \arg\min_{\boldsymbol{\Theta}} \; -l_2(\boldsymbol{\Theta} \mid \mathbf{Z}^{(1)}, \ldots, \mathbf{Z}^{(M)}) + \lambda \sum_{k=1}^{p} \sqrt{\sum_{m=1}^{M} \sum_{l=1}^{q} \left|\theta_{k,l}^{(m)}\right|}, \quad (5)$$

with $\lambda = 2\sqrt{\lambda_1 \lambda_2}$, i.e., the two problems yield the same estimates $\widehat{\theta}_{k,l}^{(m)} = \widehat{\delta}_k \widehat{\gamma}_{k,l}^{(m)}$.

The proof can be found in the supplementary document. It can also be found in the supplementary material how this formulation serves the purpose of the composite parameterization, i.e., common information and specific information can be estimated separately and simultaneously. The optimization problem (5) is a non-convex optimization problem that is difficult to solve. We address this problem by using an iterative two-stage procedure known as Difference of Convex functions (DC) programming [39]. A full description of the algorithm can be found in the supplemental material.

4 Simulation studies

In this section, we conduct experiments to compare the performance of the proposed SCLDA with sparse LDA (SLDA) [42] and multitask feature selection [31]. Specifically, as we focus on LDA, we use the multitask feature selection method developed in [31] on LDA, denoted as MSLDA. Both SLDA and MSLDA adopt convex regularizations.
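Before the comparison, the shrinkage behavior that distinguishes convex from non-convex regularization can be made concrete with a toy scalar sketch. This is an illustration with hypothetical values, not the paper's full algorithm: for a square-root-type penalty like the one in (5), each DC iteration linearizes the concave term, giving a weighted soft-thresholding step in which large coefficients receive small weights and are barely shrunken, whereas plain L1 shrinks every surviving coefficient by the full amount.

```python
import numpy as np

def soft(y, t):
    """Soft-thresholding: the prox operator of a (weighted) L1 penalty."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

# Toy observations: one strong effect, one weak effect, one null (hypothetical)
y = np.array([4.0, 0.8, 0.05])
lam, eps = 0.5, 1e-8

# Plain L1: every surviving estimate is shrunken by the full lam
theta_l1 = soft(y, lam)

# DC iterations for the non-convex penalty lam * sqrt(|theta|):
# linearize the concave sqrt term at the current estimate, which yields
# a weighted soft-thresholding with weight lam / (2 * sqrt(|theta| + eps)).
# Large coefficients get tiny weights and are therefore barely shrunken.
theta = y.copy()
for _ in range(50):
    w = lam / (2.0 * np.sqrt(np.abs(theta) + eps))
    theta = soft(y, w)

print(theta_l1)  # strong effect shrunken by the full 0.5
print(theta)     # strong effect nearly unshrunken; null still exactly zero
```

The same mechanism, applied to the grouped penalty in (5), is what lets SCLDA keep weak-effect features that convex L1- or L2/L1-regularization would over-shrink or miss.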
Specifically, SLDA selects features from one single data source with L1-regularization; MSLDA selects features from multiple data sources with L2/L1-regularization.

We evaluate the performances of these three methods across various parameter settings, including the number of variables, $p$, the number of features, $l$, the number of data sources, $M$, the sample size, $n$, and the degree of overlapping of the features across different data sources, $s\%$ (the larger the $s\%$, the more shared features among the datasets). The definition of $s\%$ can be found in the simulation procedure that is included in the supplemental material. For each specification of the parameter settings, $M$ datasets can be generated following the simulation procedure. We apply the proposed SCLDA to the $M$ datasets and identify one feature vector $\boldsymbol{\theta}^{(m)}$ for each dataset, with $\lambda$ and $q$ chosen by the method described in Section 3.3. The result can be described by the number of true positives (TPs) as well as the number of false positives (FPs). Here, true positives are the non-zero elements in the learned feature vector $\boldsymbol{\theta}^{(m)}$ which are also non-zero in $\boldsymbol{\beta}^{(m)}$; false positives are the non-zero elements in $\boldsymbol{\theta}^{(m)}$ which are actually zero in $\boldsymbol{\beta}^{(m)}$. As there are $M$ pairs of TPs and FPs for the $M$ datasets, the average TP over the $M$ datasets and the average FP over the $M$ datasets are used as the performance measures. This procedure (i.e., from data simulation, to SCLDA, to TP and FP generation) can be repeated $B$ times, and $B$ pairs of average TP and average FP are collected for SCLDA. In a similar way, we can obtain $B$ pairs of average TP and average FP for both SLDA and MSLDA.

Figures 1 (a) and (b) show the comparison between SCLDA, SLDA and MSLDA by scattering the average TP against the average FP for each method. Each point corresponds to one of the $B$ repetitions. The comparison is across various parameter settings, including the number of variables ($p = 100, 200, 500$), the number of data sources ($M = 2, 5, 10$), and the degree of overlapping of the features across different data sources ($s\% = 90\%, 70\%$). Additionally, $n/p$ is kept constant, $n/p = 1$. A general observation is that SCLDA is better than SLDA and MSLDA across all the parameter settings. Some specific trends can be summarized as follows: (i) Both SCLDA and MSLDA outperform SLDA in terms of TPs; SCLDA further outperforms MSLDA in terms of FPs. (ii) In Figure 1 (a), rows correspond to different numbers of data sources, i.e., $M = 2, 5, 10$ respectively. It is clear that the advantage of SCLDA over both SLDA and MSLDA is more significant when there are more data sources. Also, MSLDA performs consistently better than SLDA. Similar phenomena are shown in Figure 1 (b). This demonstrates that in analyzing each data source, both SCLDA and MSLDA are able to make use of the information contained in the other data sources. SCLDA can use this information more efficiently, as SCLDA produces less shrunken parameter estimates than MSLDA and thus is able to preserve weak-effect features. (iii) Comparing Figures 1 (a) and (b), it can be seen that the advantage of SCLDA or MSLDA over SLDA is more significant as the data sources have a higher degree of overlapping in their features. Finally, although not presented here, our simulation shows that the three methods perform similarly when $s\% = 40$ or less.

(a) (b)
Figure 1: Average numbers of TPs vs. FPs for SCLDA (green symbols "+"), SLDA (blue symbols "*") and MSLDA (red symbols "o"): (a) $s\% = 90\%$, $n/p = 1$; (b) $s\% = 70\%$, $n/p = 1$.

5 Case study

5.1 Data preprocessing

Our study includes 49 AD patients and 67 age-matched normal controls (NC), with each subject of AD or NC being scanned by both PET and MRI. The PET and MRI images can be downloaded from the database of the Alzheimer's Disease Neuroimaging Initiative. In what follows, we outline the data preprocessing steps.

Each image is spatially normalized to the Montreal Neurological Institute (MNI) template, using the affine transformation and subsequent non-linear warping algorithm [43] implemented in the SPM MATLAB toolbox. This is to ensure that each voxel is located in the same anatomical region for all subjects, so that spatial locations can be reported and interpreted in a consistent manner. Once all the images are in the MNI template space, we further apply the Automated Anatomical Labeling (AAL) technique [43] to segment the whole brain of each subject into 116 brain regions. The 90 regions that belong to the cerebral cortex are selected for the later analysis, as the regions not included in the cerebral cortex are rarely considered related to AD in the literature. The measurement of each region in the PET data is regional cerebral blood flow (rCBF); the measurement of each region in the MRI data is the structural volume of the region.

5.2 Disease-related brain regions

SCLDA is applied to the preprocessed PET and MRI data of AD and NC, with the penalty parameter selected by the AIC method mentioned in Section 3. 26 disease-related brain regions are identified from PET and 21 from MRI (see Table 1 for their names).
The maps of the disease-related brain regions identified from PET and MRI are highlighted in Figure 2 (a) and (b), respectively, with different colors given to neighboring regions in order to distinguish them. Each figure is a set of horizontal cut-away slices of the brain as seen from the top, which aims to provide a full view of the locations of the regions.

One major observation is that the identified disease-related brain regions from MRI are in the hippocampus, parahippocampus, temporal lobe, frontal lobe, and precuneus, which is consistent with the existing literature that reports structural atrophy in these brain areas [3-6, 12-14]. The identified disease-related brain regions from PET are in the temporal, frontal and parietal lobes, which is consistent with many functional neuroimaging studies that report reduced rCBF or reduced cortical glucose metabolism in these areas [8-10, 12-14]. Many of these identified disease-related regions can be explained in terms of the AD pathology. For example, the hippocampus is a region affected by AD the earliest and most severely [6]. Also, as regions in the temporal lobe are essential for memory, damage to these regions by AD can explain the memory loss that is a major clinical symptom of AD. The consistency of our findings with the AD literature supports the effectiveness of the proposed SCLDA.

Another finding is that there is a large overlap between the identified disease-related regions from PET and those from MRI, which implies strong interaction between functional and structural alterations in these regions. Although well-accepted biological mechanisms underlying this interaction are still not very clear, there are several explanations existing in the literature.
The first explanation is that both functional and structural alterations could be the consequence of dendritic arborizations, which results from intracellular accumulation of PHFtau and further leads to neuron death and grey matter loss [14]. The second explanation is that the AD pathology may include a vascular component, which may result in reduced rCBF due to limited blood supply and may ultimately result in structural alterations such as brain atrophy [45].

(a) (b)
Figure 2: Locations of disease-related brain regions identified from (a) MRI; (b) PET

5.3 Classification accuracy

As one of our primary goals is to distinguish AD from NC, the identified disease-related brain regions through SCLDA are further utilized for establishing a classification model. Specifically, for each subject, the rCBF values of the 26 disease-related brain regions identified from PET and the structural volumes of the 21 disease-related brain regions identified from MRI are used, as a joint spatial pattern of both brain physiology and structure. As a result, each subject is associated with a vector of 47 features/variables. A linear SVM (Support Vector Machine) is employed as the classifier. The classification accuracy based on 10-fold cross-validation is 94.3%. For comparison purposes, MSLDA is also applied, which identifies 45 and 38 disease-related brain regions for PET and MRI, respectively. Linear SVM applied to the 45+38 features gives a classification accuracy of only 85.8%. Note that MSLDA identifies a much larger number of disease-related brain regions than SCLDA, but some of the regions identified by MSLDA may indeed be disease-irrelevant, so including them deteriorates the classification.

5.4 Relationship between structural atrophy and abnormal rCBF, and severity of cognitive impairment in AD

In addition to classification, it is also of interest to further verify the relevance of the identified disease-related regions to AD in an alternative way.
One approach is to investigate the degree to which those disease-related regions are relevant to cognitive impairment, which can be measured by the Alzheimer's Disease Assessment Scale – cognitive subscale (ADAS-cog). ADAS measures the severity of the most important symptoms of AD, while its subscale, ADAS-cog, is the most popular cognitive testing instrument used in clinical trials. The ADAS-cog consists of 11 items measuring disturbances of memory, language, praxis, attention, and other cognitive abilities that are often affected by AD. As the total score of these 11 items provides an overall assessment of cognitive impairment, we regress this ADAS-cog total score (the response) against the rCBF or structural volume measurement (the predictor) of each identified brain region, using simple linear regression. The regression results are listed in Table 1.

It is not surprising to find that some regions in the hippocampus area and the temporal lobes are among the best predictors, as these regions are extensively reported in the literature as the most severely affected by AD [3-6]. Also, most of these brain regions are weak-effect predictors, as most of them can explain only a small portion of the variability in the ADAS-cog total score, i.e., many R-square values in Table 1 are less than 10%. However, although the effects are weak, most of them are significant, i.e., most of the p-values in Table 1 are smaller than 0.05. Furthermore, it is worth noting that 70.22% of the variability in ADAS-cog can be explained by taking all 26 brain regions identified from PET as predictors in a multiple regression model; 49.72% of the variability can be explained by taking all 21 brain regions identified from MRI as predictors in a multiple regression model.
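The per-region simple regressions and the joint multiple regression described above can be sketched as follows. This is a minimal illustration on synthetic stand-in data, assuming NumPy/SciPy; the region measures and ADAS-cog scores here are simulated, not the study's data.

```python
# Sketch of the Section 5.4 analysis: simple regression of the ADAS-cog total
# score on each regional measure (R^2 and p-value per region, as in Table 1),
# then one multiple regression on all regions jointly. Data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_regions = 116, 26              # e.g. the 26 PET regions

# Synthetic regional measures and an ADAS-cog score weakly driven by them,
# so each single region explains little but all regions together explain more
regions = rng.normal(size=(n_subjects, n_regions))
adas_cog = regions @ rng.normal(scale=0.3, size=n_regions) \
           + rng.normal(size=n_subjects)

# Simple regression per region (first few shown for brevity)
for j in range(3):
    res = stats.linregress(regions[:, j], adas_cog)
    print(f"region {j}: R2 = {res.rvalue**2:.3f}, p = {res.pvalue:.3g}")

# Multiple regression with all regions as predictors: overall R^2
X = np.column_stack([np.ones(n_subjects), regions])   # intercept + regions
beta, rss, *_ = np.linalg.lstsq(X, adas_cog, rcond=None)
tss = ((adas_cog - adas_cog.mean()) ** 2).sum()
r2_joint = 1 - rss[0] / tss
print(f"joint R2 = {r2_joint:.3f}")
```

Because each single-predictor model is nested in the joint model, the joint R² is at least as large as any individual region's R², which is exactly the weak-individually, strong-jointly pattern the paper reports.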
All these findings imply that the disease-related brain regions are indeed weak-effect features when considered individually, but jointly they can play a strong role in characterizing AD. This verifies the suitability of the proposed SCLDA for AD studies, as SCLDA can preserve weak-effect features.

Table 1: Explanatory power of regional rCBF and structural volume for variability in ADAS-cog ("~" means this region is not identified from PET (or MRI) as a disease-related region by SCLDA). Columns: brain region, PET R² and p-value, MRI R² and p-value. Regions listed include Precentral_L, Precentral_R, Frontal_Sup_L, Frontal_Sup_R, Frontal_Mid_R, Frontal_M_O_L, Frontal_M_O_R, Insula_L, Insula_R, Amygdala_L, Calcarine_L, Lingual_L, Postcentral_L, Parietal_Sup_R, Angular_R, Precuneus_R, Paracentr_Lobu_L, Temporal_P_S_R, Temporal_Inf_R, Pallidum_L, Pallidum_R, Heschl_L, Heschl_R, Cingulum_A_R, Cingulum_Mid_L, Cingulum_Post_L, Hippocampus_L, Hippocampus_R, and ParaHippocamp_L, together with a row for all regions combined, which reports R² = 0.702 (PET) and R² = 0.497 (MRI), both with p < 10⁻⁴.

6 Conclusion
In this paper, we proposed a SCLDA model for
identification of disease-related brain regions of AD from multi-modality data, which is capable of preserving weak-effect disease-related brain regions because it imposes less shrinkage on its parameters. We applied SCLDA to the PET and MRI data of early AD patients and normal controls. As MRI and PET measure two complementary aspects (structural and functional, respectively) of the same AD pathology, fusing these two imaging modalities can make effective use of their interaction and thus improve the statistical power in identifying disease-related brain regions. Our findings were consistent with the literature and also revealed some new aspects that may suggest further investigation in future neuroimaging research.

References
[1] deToledo-Morrell, L., Stoub, T.R., Bulgakova, M. 2004. MRI-derived entorhinal volume is a good predictor of conversion from MCI to AD. Neurobiol. Aging 25, 1197–1203.
[2] Morra, J.H., Tu, Z. 2008. Validation of automated hippocampal segmentation method. NeuroImage 43, 59–68.
[3] Morra, J.H., Tu, Z. 2009a. Automated 3D mapping of hippocampal atrophy. Hum. Brain Map. 30, 2766–2788.
[4] Morra, J.H., Tu, Z. 2009b. Automated mapping of hippocampal atrophy in 1-year repeat MRI data. NeuroImage 45, 213–221.
[5] Schroeter, M.L., Stein, T. 2009. Neural correlates of AD and MCI. NeuroImage 47, 1196–1206.
[6] Braak, H., Braak, E. 1991. Neuropathological stageing of Alzheimer-related changes. Acta Neuropathol. 82, 239–259.
[7] Bradley, K.M., O'Sullivan. 2002. Cerebral perfusion SPET correlated with Braak pathological stage in AD. Brain 125, 1772–1781.
[8] Keilp, J.G., Alexander, G.E. 1996. Inferior parietal perfusion, lateralization, and neuropsychological dysfunction in AD. Brain Cogn. 32, 365–383.
[9] Schroeter, M.L., Stein, T. 2009. Neural correlates of AD and MCI. NeuroImage 47, 1196–1206.
[10] Asllani, I., Habeck, C. 2008. Multivariate and univariate analysis of continuous arterial spin labeling perfusion MRI in AD. J. Cereb. Blood Flow Metab. 28, 725–736.
[11] Du, A.T., Jahng, G.H. 2006. Hypoperfusion in frontotemporal dementia and AD. Neurology 67, 1215–1220.
[12] Ishii, K., Kitagaki, H. 1996. Decreased medial temporal oxygen metabolism in AD. J. Nucl. Med. 37, 1159–1165.
[13] Johnson, N.A., Jahng, G.H. 2005. Pattern of cerebral hypoperfusion in AD. Radiology 234, 851–859.
[14] Wolf, H., Jelic, V. 2003. A critical discussion of the role of neuroimaging in MCI. Acta Neurol. Scand. 107(4), 52–76.
[15] Tosun, D., Mojabi, P. 2010. Joint analysis of structural and perfusion MRI for cognitive assessment and classification of AD and normal aging. NeuroImage 52, 186–197.
[16] Alsop, D., Casement, M. 2008. Hippocampal hyperperfusion in Alzheimer's disease. NeuroImage 42, 1267–1274.
[17] Mosconi, L., Tsui, W.-H. 2005. Reduced hippocampal metabolism in MCI and AD. Neurology 64, 1860–1867.
[18] Mulert, C., Lemieux, L. 2010. EEG-fMRI: Physiological Basis, Technique and Applications. Springer.
[19] Xu, L., Qiu, C., Xu, P. and Yao, D. 2010. A parallel framework for simultaneous EEG/fMRI analysis: methodology and simulation. NeuroImage 52(3), 1123–1134.
[20] Philiastides, M. and Sajda, P. 2007. EEG-informed fMRI reveals spatiotemporal characteristics of perceptual decision making. Journal of Neuroscience 27(48), 13082–13091.
[21] Daunizeau, J., Grova, C. 2007. Symmetrical event-related EEG/fMRI information fusion. NeuroImage 36, 69–87.
[22] Jagust, W. 2006. PET and MRI in the diagnosis and prediction of dementia. Alzheimer's Dement. 2, 36–42.
[23] Kawachi, T., Ishii, K. and Sakamoto, S. 2006. Comparison of the diagnostic performance of FDG-PET and VBM. Eur. J. Nucl. Med. Mol. Imaging 33, 801–809.
[24] Matsunari, I., Samuraki, M. 2007. Comparison of 18F-FDG PET and optimized voxel-based morphometry for detection of AD. J. Nucl. Med. 48, 1961–1970.
[25] Schmidt, M., Fung, G. and Rosales, R. 2007. Fast optimization methods for L1-regularization: a comparative study and 2 new approaches. ECML 2007.
[26] Liu, J., Ji, S. and Ye, J. 2009. SLEP: Sparse Learning with Efficient Projections. Arizona State University.
[27] Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. JRSS, Series B, 58(1), 267–288.
[28] Friedman, J., Hastie, T. and Tibshirani, R. 2007. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 8(1), 1–10.
[29] Zou, H., Hastie, T. and Tibshirani, R. 2006. Sparse PCA. J. of Comp. and Graphical Statistics 15(2), 262–286.
[30] Qiao, Z., Zhou, L. and Huang, J. 2006. Sparse LDA with applications to high dimensional low sample size data. IAENG Applied Mathematics 39(1).
[31] Argyriou, A., Evgeniou, T. and Pontil, M. 2008. Convex multi-task feature learning. Machine Learning 73(3), 243–272.
[32] Huang, S., Li, J., et al. 2010. Learning brain connectivity of AD by sparse inverse covariance estimation. NeuroImage 50, 935–949.
[33] Candes, E., Wakin, M. and Boyd, S. 2008. Enhancing sparsity by reweighted L1 minimization. Journal of Fourier Analysis and Applications 14(5), 877–905.
[34] Mazumder, R., Friedman, J. 2009. SparseNet: coordinate descent with non-convex penalties. Manuscript.
[35] Zhang, T. 2008. Multi-stage convex relaxation for learning with sparse regularization. NIPS 2008.
[36] Campbell, N. 1984. Canonical variate analysis: a general formulation. Australian Journal of Statistics 26, 86–96.
[37] Hastie, T. and Tibshirani, R. 1994. Discriminant analysis by Gaussian mixtures. Technical report, AT&T Bell Labs.
[38] Kumar, N. and Andreou, G. 1998.
Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition. Speech Communication 26(4), 283–297.
[39] Gasso, G., Rakotomamonjy, A. and Canu, S. 2009. Recovering sparse signals with non-convex penalties and DC programming. IEEE Trans. Signal Processing 57(12), 4686–4698.
[40] Guo, J., Levina, E., Michailidis, G. and Zhu, J. 2011. Joint estimation of multiple graphical models. Biometrika 98(1), 1–15.
[41] Bertsekas, D. 1982. Projected Newton methods for optimization problems with simple constraints. SIAM J. Control Optim. 20, 221–246.
[42] Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. 2011. Sparse discriminant analysis. Technometrics (in press).
[43] Friston, K.J., Ashburner, J. 1995. Spatial registration and normalization of images. HBM 2, 89–165.
[44] Tzourio-Mazoyer, N., et al. 2002. Automated anatomical labelling of activations in SPM. NeuroImage 15, 273–289.
[45] Bidzan, L. 2005. Vascular factors in dementia. Psychiatr. Pol. 39, 977–986.