Shigeyuki Oba, Motoaki Kawanabe, Klaus-Robert Müller, Shin Ishii
In bioinformatics it is often desirable to combine data from various measurement sources. The structured feature vectors to be analyzed then consist of blocks with different intrinsic characteristics (e.g., different patterns of missing values, observation noise levels, effective intrinsic dimensionalities). We propose a new machine learning tool, heterogeneous component analysis (HCA), for feature extraction in order to better understand the factors that underlie such complex structured heterogeneous data. HCA is a linear block-wise sparse Bayesian PCA based not only on a probabilistic model with block-wise residual variance terms but also on a Bayesian treatment of a block-wise sparse factor-loading matrix. We study various algorithms that implement our HCA concept, extracting sparse heterogeneous structure by obtaining common components shared across blocks and specific components within each block. Simulations on toy and bioinformatics data underline the usefulness of the proposed structured matrix factorization concept.
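To make the model structure described above concrete, the following is a minimal sketch of how toy data with a block-wise sparse factor-loading matrix and block-wise residual variances might be generated. It is an illustration under stated assumptions, not the inference algorithm studied in the paper: the function name `sample_blockwise_factor_data`, the boolean `mask` layout, the Gaussian factors and loadings, and the chosen block sizes and noise levels are all hypothetical.

```python
import numpy as np

def sample_blockwise_factor_data(n_samples, block_sizes, mask, noise_std, seed=0):
    """Draw toy data from a linear factor model with block-wise sparse loadings.

    Assumed (hypothetical) layout:
      mask[b, k]   is True when factor k loads on block b,
      noise_std[b] is the residual standard deviation of block b.
    """
    rng = np.random.default_rng(seed)
    mask = np.asarray(mask, dtype=bool)
    n_blocks, n_factors = mask.shape
    assert len(block_sizes) == n_blocks == len(noise_std)

    # Expand block-level mask and noise levels to one entry per feature.
    row_mask = np.repeat(mask, block_sizes, axis=0)                 # (n_features, n_factors)
    row_std = np.repeat(np.asarray(noise_std, dtype=float), block_sizes)  # (n_features,)

    # Loadings are zeroed wherever a factor does not act on a block.
    loadings = rng.normal(size=row_mask.shape) * row_mask
    factors = rng.normal(size=(n_samples, n_factors))
    noise = rng.normal(size=(n_samples, row_mask.shape[0])) * row_std
    return factors @ loadings.T + noise, loadings

# Two blocks (4 and 6 features) and three factors:
# factor 0 is common to both blocks, factor 1 is specific to block 0,
# factor 2 is specific to block 1.
mask = [[True, True, False],
        [True, False, True]]
X, W = sample_blockwise_factor_data(200, [4, 6], mask, noise_std=[0.1, 0.5])
```

In this sketch the zero pattern of `W` is fixed in advance, whereas the point of a method like HCA is to infer which blocks each component acts on from the data.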