NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 3163
Title: DM2C: Deep Mixed-Modal Clustering

Reviewer 1

The authors tackle the clustering problem in the setting where each sample is characterized by only one of multiple modalities. They present an adversarial deep model with a cycle-consistency constraint that learns translations across modal spaces in order to unify the representations. The model is evaluated on two datasets and shows improvements over the baselines.
Pros:
- The paper is well organized, especially in the way the authors motivate the problem and present the solution.
- The key problem being solved is novel and interesting to me. To the best of my knowledge, this is the first work that deals with totally unpaired ‘multi-modal’ data without any other constraints.
- The empirical results show that the proposed method improves clustering performance. Interpreting the learned cross-modal translations as the optimal transport plan between modal spaces also seems to be a fresh perspective.
Cons:
- The proposed model deserves further discussion to complete my understanding, e.g., of the relationship between the generators.
- Some technical details are not very clear or intuitive.
Typos:
- Line 139: it may be more precise to use ‘parameter sets’ instead of ‘parameters’.
- Line 157: there should be a ‘the’ before ‘discriminators’.

Reviewer 2

This paper proposes a novel and challenging task, clustering data in multiple modalities without any pairing information, together with a simple and reasonably effective approach to solve it. The idea of cycle consistency is not novel, since it has already been shown to be effective in addressing data with multiple modalities in the supervised setting. However, this work is the first attempt to use this idea to solve the unpaired data clustering problem. Moreover, the authors demonstrate a connection between the proposed method and optimal transport, which may inspire future work. In general, this paper is well written and easy to follow. The experimental results seem good, but the experiments do not seem thorough enough to me: some details should be elaborated, and an ablation study should be carried out. As an application paper, its most valuable contribution, in my view, is the task it proposes. As mentioned above, our community could benefit from such a novel task, which could bring forth a variety of innovative work.
Typos:
- In Eq. (1), the summation symbol seems to be missing.
- In line 170, ‘must be transport’ should be ‘must be transported’.
- In line 179, ‘are feed’ should be ‘are fed’.

Reviewer 3

This paper aims to solve the clustering problem on mixed-modal data. Unlike in traditional settings, the modalities involved here come with no pairing information at all, and are therefore very hard to align. It is thus natural to learn the transformation functions between modalities based on the cycle-consistency principle. Moreover, the experimental results indicate that this idea indeed improves the unification of mixed-modal data representations and bridges the semantic gap. Overall, the presentation of this paper is detailed. However, some of the statements are not very clear and should be reorganized. There are also a few typos to be corrected:
1. Line 119: divided -> split.
2. Line 129: set -> sets.