NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
Similar to the univariate case, the ordering in the d-dimensional space should also constrain the multivariate quantile function (MQF), correct? Would the parameterization of the sum-of-squares (SOS) flow enforce this constraint on the MQF? Besides the SOS flow, is there any other generative model that can enforce this constraint? (A sketch of my reading of this condition follows the review.)

To demonstrate the advantage of the multiple gradient descent algorithm (MGDA) in optimizing objective (5), did the authors run an ablation study comparing it against the baseline of training on the reconstruction loss first and then optimizing the negative log-likelihood term? Is there any significant improvement from using MGDA other than not needing to tune lambda?

The performance of the different approaches on the MNIST dataset is quite close. Did the authors run a statistical test to confirm that the performance gap is significant?
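For concreteness, here is my reading of the monotonicity constraint in question (paraphrasing the paper's definition; the notation is mine): the MQF maps the unit cube to the data space via an increasing triangular map, so each component depends only on its "own" coordinate and the preceding ones, and is strictly increasing in its own coordinate.

```latex
% Increasing triangular map: the k-th component of Q depends only on
% u_1, ..., u_k and is strictly increasing in u_k.
\[
Q(u) = \bigl(Q_1(u_1),\; Q_2(u_1, u_2),\; \dots,\; Q_d(u_1, \dots, u_d)\bigr),
\qquad
\frac{\partial Q_k}{\partial u_k} > 0 \quad \text{for } k = 1, \dots, d.
\]
```

My question is whether the SOS polynomial parameterization guarantees the positivity of these partial derivatives by construction, or only encourages it during training.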
Reviewer 2
I appreciated the authors' response to my questions and felt it adequately addressed the core issues. I have increased my score by one point. The proposal could be strengthened by a clear, plausible (or, even better, real-world) example of when the quantile approach is preferable. As a reader, I am still not convinced that I would ever choose the quantile method over the likelihood method, but I believe such an example likely exists.

----------- Original review ----------

This paper proposes a novelty detection framework that synthesizes several components: feature extraction via neural networks, density estimation via flows, and multiple gradient descent for optimization. The framework allows novelty detection via two different methods: the first based on quantiles and the other based on likelihood. While no single component is novel, the synthesis of the components has some novelty. Overall, the paper is reasonably well written.

1. Questionable novelty of the multivariate quantile function (MQF) definition as a core contribution. The paper suggests that its definition of the MQF is novel and/or a core contribution, saying "We extend the ..." in the introduction and "We propose the following multivariate generalization of the quantile function...". Yet the paper itself mentions that this definition was previously given in [9]. The definition is also similar to the "density destructor" of [Inouye and Ravikumar, 2018], except that they define something akin to the CDF rather than the inverse CDF (and require only invertibility rather than a triangular mapping). Overall, the claim that this is a core contribution seems weak. What is novel or particularly interesting about this definition? Why should a multivariate extension of the quantile function require an increasing triangular map? Why not just require invertibility? (Perhaps the definition could be named "Triangular Quantile Function" so that its meaning is obvious and clear, rather than suggesting it is the canonical extension of the univariate quantile function.)

2. Empirically, the likelihood-based results seem to consistently match or beat the quantile-based ones (except perhaps on KDDCUP). One suggestion in the paper is that it may be easier to set the threshold for quantile-based methods (more explanation of why this is actually easier would be good). Are there any other reasons to prefer the quantile method? For example, the likelihood method should do much better on a donut-shaped distribution, because the hole of the donut is an outlier under NLL but not under quantiles (a toy illustration follows the review). Why do you think the quantile method works better on KDDCUP than on the other datasets? Adding a discussion of this simple case and others would be interesting.

[Inouye and Ravikumar, 2018] Inouye, D. and Ravikumar, P. "Deep Density Destructors." ICML, 2018.
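To make the donut example concrete, here is a small toy sketch. It is my own illustration, not the paper's scoring rule: it uses a kernel density estimate for the likelihood score and naive per-coordinate empirical quantiles as a stand-in for a quantile-based score.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Ring ("donut") data: radius ~ 3, angle uniform.
theta = rng.uniform(0, 2 * np.pi, size=2000)
r = 3.0 + 0.2 * rng.standard_normal(2000)
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])

kde = KernelDensity(bandwidth=0.3).fit(X)

hole = np.array([[0.0, 0.0]])   # center of the donut hole
ring = np.array([[3.0, 0.0]])   # a typical inlier on the ring

# Likelihood-based score: the hole has very low density, so its NLL
# is far larger than a ring point's, and it gets flagged as novel.
print("NLL(hole):", -kde.score_samples(hole)[0])
print("NLL(ring):", -kde.score_samples(ring)[0])

# Naive per-coordinate quantile score: fraction of training points
# below the test point in each coordinate.
def marginal_quantiles(x, X):
    return (X <= x).mean(axis=0)

# The hole center sits near the 0.5 quantile in both coordinates,
# so a quantile-based rule sees it as perfectly typical, while the
# ring inlier actually looks more extreme marginally.
print("quantiles(hole):", marginal_quantiles(hole[0], X))
print("quantiles(ring):", marginal_quantiles(ring[0], X))
```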
Reviewer 3
This paper presents a generic method for the practically relevant problem of novelty detection. The authors show that existing methods, including the classical one-class SVM [42] and the recent Latent Space Autoregression [1], can be recovered as special cases of the proposed method. Overall, the paper is well written and the ideas are presented clearly. On the technical front, it exploits the recent work [19] and combines existing techniques into a generic, unifying method for novelty detection. For this, it also introduces a novel multivariate generalization of quantile functions. What I further liked in the paper:

1. The proposed method allows one to define scoring rules for novelty detection based on both quantiles and the estimated density.

2. Instead of trying to find the right trade-off between the reconstruction loss and the negative log-likelihood (by tuning a regularization parameter), it shows that one can employ multi-objective optimization to get better results (see the sketch after this list for the two-objective case).

3. Thorough experiments are conducted, comparing several baseline methods with suitable scoring rules.
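On point 2: for two objectives, MGDA has a simple closed form that makes "no lambda to tune" concrete. Below is a minimal sketch (my own illustration, not the authors' code); `g_rec` and `g_nll` are hypothetical flattened gradients of the reconstruction loss and the negative log-likelihood.

```python
import numpy as np

def mgda_two_task_weight(g1, g2):
    """Closed-form min-norm weighting for two gradients (Desideri's MGDA).

    Returns alpha in [0, 1] such that alpha*g1 + (1 - alpha)*g2 is the
    minimum-norm point in the convex hull of {g1, g2}.
    """
    diff = g1 - g2
    denom = np.dot(diff, diff)
    if denom == 0.0:          # gradients coincide; any weighting works
        return 0.5
    alpha = np.dot(g2 - g1, g2) / denom
    return float(np.clip(alpha, 0.0, 1.0))

# Toy usage: combine gradients of the two loss terms at one step.
g_rec = np.array([1.0, -2.0, 0.5])   # gradient of reconstruction loss
g_nll = np.array([-0.5, 1.0, 1.0])   # gradient of negative log-likelihood
alpha = mgda_two_task_weight(g_rec, g_nll)
g_combined = alpha * g_rec + (1.0 - alpha) * g_nll
print(alpha, g_combined)
```

The weighting is recomputed at every step as the minimum-norm convex combination of the two gradients, rather than being fixed by a hand-tuned regularization parameter.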