Reviews: Likelihood-Free Overcomplete ICA and Applications In Causal Discovery

Reviewer 1

Overcomplete ICA (more sources than observed variables) usually becomes tractable only after making a parametric assumption on the source distributions, so that the likelihood can be computed. The authors propose a method that estimates the mixing matrix without computing the likelihood. The method minimizes a distributional distance, the maximum mean discrepancy (MMD), between generated and observed data, where each source is produced by a nonlinear transformation of independent noise. This generation procedure guarantees that the sources are independent. The mixing matrix and the parameters of each source's generator are learned jointly.

Challenges:

Identifiability: The authors propose using a mixture of Gaussians (MoG) as a parametric model for the sources when data are scarce. Such models have been studied extensively in [1] and [2]. It is not clear from the paper under which circumstances the proposed algorithm converges, and whether it converges to the true source distributions and the true mixing matrix. The main advantage of ICA methods over deep generative or inference methods is identifiability. Can you argue under which conditions the method becomes identifiable?

Regarding the sparsity regularizer, sparse coding has been used extensively to approach ICA and overcomplete ICA problems [3, 4]. The use of GAN-based methods to solve the nonlinear ICA problem was studied in [5]. Given this previous work, I think the proposed method lacks sufficient novelty for acceptance.

[1] Choudrey, Rizwan A., and Stephen J. Roberts. "Variational mixture of Bayesian independent component analyzers." Neural Computation 15.1 (2003): 213-252.
[2] Mehrjou, Arash, Reshad Hosseini, and Babak Nadjar Araabi. "Mixture of ICAs model for natural images solved by manifold optimization method." 2015 7th Conference on Information and Knowledge Technology (IKT). IEEE, 2015.
[3] Olshausen, Bruno A., and David J. Field. "Emergence of simple-cell receptive field properties by learning a sparse code for natural images." Nature 381.6583 (1996): 607.
[4] Doi, Eizaburo, and Michael S. Lewicki. "Sparse coding of natural images using an overcomplete set of limited capacity units." Advances in Neural Information Processing Systems. 2005.
[5] "Learning independent features with adversarial nets for non-linear ICA."

Update: Thanks to the authors for providing a detailed answer to my questions. Even though some of my concerns remain unresolved, I would like to increase my score from 4 to 6.
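The generative model summarized in this review (independent noise, a per-source nonlinearity, then an overcomplete linear mixing) can be illustrated with a minimal NumPy sketch. It assumes a mixing matrix `A` of shape (p, d) with d > p, and it uses small randomly initialised one-hidden-layer MLPs as hypothetical stand-ins for the learned per-source generators; in the paper these generators would be trained jointly with `A`, which this sketch does not do. The function name `sample_overcomplete_mixture` and the `hidden` width are illustrative choices, not the authors' code.

```python
import numpy as np

def sample_overcomplete_mixture(A, n_samples, hidden=16, seed=0):
    """Draw observations x = A s, where each source s_i = g_i(z_i) and z_i ~ N(0, 1).

    A has shape (p, d) with d > p (more sources than observed variables).
    Each g_i is a small, randomly initialised one-hidden-layer MLP: a
    hypothetical stand-in for the learned per-source generators.
    """
    rng = np.random.default_rng(seed)
    p, d = A.shape
    # Independent standard Gaussian noise, one column per source.
    z = rng.standard_normal((n_samples, d))
    s = np.empty_like(z)
    for i in range(d):
        W1 = rng.standard_normal((1, hidden))
        b1 = rng.standard_normal(hidden)
        W2 = rng.standard_normal((hidden, 1))
        # Source i depends only on its own noise column, so sources are independent.
        h = np.tanh(z[:, i:i + 1] @ W1 + b1)  # shape (n_samples, hidden)
        s[:, i] = (h @ W2).ravel()
    # Linear mixing: each row of x is A applied to the corresponding row of s.
    x = s @ A.T  # shape (n_samples, p)
    return x, s
```

For example, `A = np.random.default_rng(1).standard_normal((2, 3))` yields three sources mixed into two observed variables, the overcomplete setting discussed above. Independence of the sources follows from the construction alone, with no parametric assumption on their marginal distributions.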

Reviewer 2

Independent component analysis (ICA) is a tool for statistical data analysis and signal processing that decomposes multivariate signals into their underlying source components. A particularly interesting variant of classical ICA assumes more sources than sensors: the overcomplete ICA paradigm. In that case, the sources cannot be uniquely recovered even if the mixing matrix is known. Overcomplete ICA has previously been addressed by assuming parametric probabilistic models for the sources. In this work, a methodology that does not require any parametric assumption on the distribution of the independent sources is proposed. The idea is to learn the mixing matrix using a generator from which samples can be drawn easily. An MLP generator with standard Gaussian input is learned by minimizing the maximum mean discrepancy (MMD). This is very relevant and opens many promising perspectives. The proposed methodology and its application to causal discovery will have an important impact and should be published in the NeurIPS 2019 proceedings.

UPDATE AFTER THE REBUTTAL: After reading all the reviews and the authors' feedback, I maintain my opinion: this contribution is very convincing and deserves to be accepted.
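The MMD objective mentioned in this review can be illustrated with the standard unbiased estimator of the squared MMD under a Gaussian kernel. This is a sketch under assumptions: the reviews do not state which kernel, bandwidth, or estimator the paper uses, so the fixed bandwidth `sigma` and the function names `gaussian_kernel` and `mmd2_unbiased` are hypothetical. In training, `X` would be a batch of observed data and `Y` a batch produced by the generator and mixing matrix, and the estimate would be minimized with respect to their parameters.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for every pair of rows of X and Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of the squared MMD between samples X (m, p) and Y (n, p).

    The fixed bandwidth sigma is an assumption of this sketch.
    """
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    # Drop the i == j terms so the within-sample averages are unbiased.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    term_xy = Kxy.mean()
    return term_xx + term_yy - 2.0 * term_xy
```

Excluding the diagonal terms removes the bias of the plug-in (V-statistic) estimator; as a consequence, the estimate can be slightly negative when both samples come from the same distribution.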

Reviewer 3

The authors present a method for training overcomplete generative ICA models using a GAN approach with no posterior or likelihood calculation. Overall, the method is clearly described and offers a simple way to use GANs to perform OICA. The evaluations are clear, if somewhat limited in scope. The generative model does not have a corresponding latent inference algorithm, which limits its use.

Comparison with other OICA models: RICA will likely find degenerate bases when overcomplete [1], and so may not be a good comparison method. Might score matching [2] be a better OICA baseline? However, score matching ICA models are better described as analysis models and may not be well matched to generative recovery [3].

It would also add to the generality of the method if its performance on a higher-dimensional dataset were explored. Natural image patches are commonly used. Does this method scale up?

Small comments:
Line 89, "some empirical estimator": which estimator are you using?
Table 1: what are the s.e.m.s? How significant are the differences?

[1] Livezey, Jesse A., Alejandro F. Bujan, and Friedrich T. Sommer. "Learning overcomplete, low coherence dictionaries with linear inference." arXiv preprint arXiv:1606.03474 (2016).
[2] Hyvärinen, Aapo. "Estimation of non-normalized statistical models by score matching." Journal of Machine Learning Research 6.Apr (2005): 695-709.
[3] Ophir, Boaz, et al. "Sequential minimal eigenvalues-an approach to analysis dictionary learning." 2011 19th European Signal Processing Conference. IEEE, 2011.

POST RESPONSE UPDATE: The author response addressed my concerns.

Paper ID:	3739