Paper ID: | 639 |
---|---|

Title: | Multi-view Anomaly Detection via Robust Probabilistic Latent Variable Models |

This paper provides an approach, based on PPCA and PCCA, to finding outlier observations in multi-view data sets. The model embeds observations in a latent space shared across views and finds projections from that latent space to each view. Observations that do not agree with the shared latent space and projection (common to all observations in a particular view) are recognized as outliers. The approach uses stochastic EM to approximate the random variables of the model conditioned on data.

The paper presents an intriguing approach to outlier detection. Using the probability that the number of clusters is greater than 1 is an interesting innovation that I have not seen before. One limitation I see is that it is unclear whether instances with more than one latent vector should be interpreted as "outliers" or simply as multimodal. The description at the end of Section 5 bears this out: does the fact that a movie has more than one genre make it an outlier? In general, this means that the proposed method will work best when each instance can be represented with a single vector across multiple views. Presentation was rated "sub-standard" mostly because of the very dense blocks of text in some places (e.g., the related work section). It would also be better to add subsections to give the text more structure. The derivations starting at line 99 need more detail on how the marginal distributions were arrived at (perhaps a large chunk of this part can be moved to the appendix?). The experiments covered a large number of data sets, but the only meaningful benchmark was PCCA (and SVM for one experiment). I thought the method of creating anomalous instances was clever and could be useful to others. The paragraph starting at line 219 describes results for multi-view anomaly detection, but it would be helpful to point to specific data sets in the evaluation when doing this. The purpose of the paragraph starting at line 238 was not clear to me; I could not see why it was needed. I noticed a typo on line 89 (missing space).

2-Confident (read it all; understood it all reasonably well)

The authors propose a generative model approach for multi-view anomaly detection. The method is based on estimation of the number of latent vectors that generate each instance with Dirichlet process priors. The estimation itself is performed using a stochastic EM algorithm. The paper consists of the proposed algorithm, with its underlying model and background, and a set of experiments / simulations to compare its performance with existing anomaly detection algorithms.

The paper is relevant to anomaly detection and the method appears novel. The technical contribution is sound. There are no theoretical results; the paper's claim rests solely on empirical results. The empirical part appears carefully done and well reported. The model is reasonable, and the estimation algorithm based on the model also appears reasonable. However, I feel the empirical part tries to cover too much ground. The main goal is anomaly detection, yet the empirical part also discusses missing value imputation and latent dimension reduction, which are outside the immediate scope of the paper. This comes at the expense of anomaly detection. Since we are dealing with a detection algorithm, it would be beneficial to explain when the algorithm produces false positives and false negatives; to explain when the detection problem is easy or hard; and to use simulations, generating data according to the assumed model itself, in order to evaluate whether the algorithm works as expected, even under the assumed model. Since the empirical part is all the paper has to offer in terms of evidence for the merit of the proposed method, I recommend removing the parts that discuss problems other than anomaly detection and expanding the empirical work along the lines above.
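To make the simulation check suggested above concrete, here is a minimal sketch of how data could be generated under a simplified version of the assumed model. It is an illustration, not the authors' generative process: the standard normal latent prior, the single noise standard deviation shared by all views, and the names `simulate_instance` and `simulate_dataset` are assumptions made for this example. A normal instance uses one latent vector for all views; a simulated anomaly draws a separate latent vector for each view.

```python
import random


def simulate_instance(projections, noise_std, anomalous, rng):
    """Draw one multi-view instance from a linear Gaussian latent variable model.

    projections: one matrix per view, each a list of rows with K entries.
    anomalous:   if False, all views share a single latent vector;
                 if True, every view draws its own latent vector.
    """
    latent_dim = len(projections[0][0])

    def draw_latent():
        # Assumption: standard normal prior on the latent vector.
        return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]

    shared = draw_latent()
    views = []
    for matrix in projections:
        z = draw_latent() if anomalous else shared
        # Observation = projection of z plus isotropic Gaussian noise.
        views.append([
            sum(w * zk for w, zk in zip(row, z)) + rng.gauss(0.0, noise_std)
            for row in matrix
        ])
    return views


def simulate_dataset(projections, noise_std, n_instances, anomaly_rate, seed=0):
    """Return (instances, labels); label 1 marks a simulated anomaly."""
    rng = random.Random(seed)
    instances, labels = [], []
    for _ in range(n_instances):
        is_anomaly = rng.random() < anomaly_rate
        instances.append(simulate_instance(projections, noise_std, is_anomaly, rng))
        labels.append(int(is_anomaly))
    return instances, labels
```

Running a detector on such data, where the ground truth is known by construction, would show whether false positives and false negatives arise even when the model assumptions hold, and how the difficulty changes with `noise_std` and `anomaly_rate`.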

1-Less confident (might not have understood significant parts)

The paper presents a simple multi-view linear Gaussian latent variable model, where latent factors for each data point are clustered. Also, a heuristic multi-view anomaly detection criterion is proposed.

I find the connections to PCCA far-fetched and confusing. In the experiments, it is a serious flaw to use the PCCA abbreviation for a model that is not PCCA. The insight from the real application needs to be elaborated; as it stands, it is difficult to grasp the utility of the model and what an anomaly means. I have some doubts regarding the anomaly score. It depends on the parameters of the DP base distribution, and I would expect to see a sensitivity analysis (i.e., how r affects clusterings and anomaly scores). Different cluster assignments may not be sufficient for detecting an anomaly, because the actual realisations may still be similar. Additionally, the number of components K affects the performance, as verified in the experiments. Why was cross-validation or some other method of model selection (for each data collection) not used? I doubt how useful the DP formulation is for a small number of views, since the number of views corresponds to the maximum number of clusters. The assumption of the same noise variance for all views is very limiting in practice; I would recommend exploring performance for alternative choices. How many views were used in the experiments? How is missing value imputation related to anomaly detection? What is the rationale?

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

This paper uses a nonparametric Bayesian model for detecting anomalies in multi-view data. The idea is to determine the number of latent vectors needed to generate the multi-view vectors that are associated with the same entity. Ideally, the number should be one, since the same data point in different views corresponds to the same latent representation. If the number exceeds one with high probability, the data point can be considered an outlier. Overall, the proposed approach is professional and demonstrates good performance, but the writing is a bit vague. In terms of the algorithm, the contribution does not seem strong.

My two concerns are as follows: 1) It is not clear where the major contribution lies. The major components of the method, i.e., generative modeling, the EM algorithm, and Gibbs sampling, are well-developed methods. It is hard to see any innovative aspect in terms of method. I suggest that the authors itemize the contributions in the introduction. 2) The main objective is to detect outliers, but the detection process is rather vaguely described. In fact, there is only one sentence regarding outlier detection, in lines 133-134. “Embedding” the detection process in Gibbs sampling looks quite heuristic, and there seems to be no theory backing it. This part should be explained in detail.
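For concreteness, the detection rule as the reviews describe it (flag an instance when more than one latent vector is needed with high probability) could be estimated from Gibbs samples roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `anomaly_score` and the layout of the samples (one list of per-view latent-vector indices per retained Gibbs sweep) are hypothetical.

```python
def anomaly_score(assignment_samples):
    """Estimate the probability that an instance needs more than one latent vector.

    assignment_samples: one entry per retained Gibbs sample; each entry lists,
    for every view of a single instance, the index of the latent vector that
    the view was assigned to in that sample.
    """
    if not assignment_samples:
        raise ValueError("need at least one sample")
    # Count samples in which the views do not all share one latent vector.
    multi = sum(1 for sample in assignment_samples if len(set(sample)) > 1)
    return multi / len(assignment_samples)


# Hypothetical samples for one instance observed in three views.
samples = [[0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 1]]
score = anomaly_score(samples)  # two of the four samples use two latent vectors
```

Written this way, the score is a Monte Carlo estimate of a posterior probability, which is one way the paper could justify the rule instead of leaving it as a by-product of the sampler.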

2-Confident (read it all; understood it all reasonably well)

The article looks at the problem of anomaly detection. It aims to solve this problem by looking at more than two views at the same time and combining multiple views so as to use all the information available. The article introduces a probabilistic model, on which one would like to do inference. This inference problem is then solved by something that looks like the cavity method, but the reviewer is not sure and stopped understanding what was happening in the article from that point on. Novelty/originality: The reviewer cannot judge. Clarity and presentation: The reviewer cannot judge. Technical quality: The reviewer cannot judge.

If Section 3 on inference uses the cavity method (belief propagation) to solve this inference problem, this should be stated in the article, since these two names will appeal to people coming from different backgrounds. If not, then the reviewer has not understood anything in this article.

1-Less confident (might not have understood significant parts)

This paper presents a probabilistic model general to multi-view data. The model assumes each view of a data instance is generated from a single latent component and that "anomalous" instances are those for which not all views are generated from the same component. A stochastic EM algorithm is given that alternates between Gibbs sampling the component assignments and maximum-likelihood estimation of the projection matrices with all other parameters analytically integrated out. The proposed model is shown to consistently outperform 4 baseline models on 11 anomaly detection datasets (in which the ground-truth anomalies are known) that are artificially made to be multiview datasets by randomly splitting features.
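The benchmark construction mentioned in the summary above, turning a single-view dataset into a multi-view one by randomly splitting its features, can be sketched as follows. The exact splitting procedure used in the paper is not given in this review, so the even partition after a shuffle, the seed handling, and the names `split_features_into_views` and `apply_split` are assumptions for illustration.

```python
import random


def split_features_into_views(n_features, n_views, seed=0):
    """Randomly partition feature indices into n_views disjoint groups."""
    if not 1 <= n_views <= n_features:
        raise ValueError("need 1 <= n_views <= n_features")
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Deal the shuffled indices out round-robin so view sizes differ by at most one.
    return [sorted(indices[v::n_views]) for v in range(n_views)]


def apply_split(row, views):
    """Turn one feature vector into a list of per-view feature vectors."""
    return [[row[j] for j in columns] for columns in views]
```

For example, `apply_split(row, split_features_into_views(len(row), 2))` yields two views of `row`. Because the views come from a random partition of one feature set, an anomaly in the original data need not show up as a disagreement between views, which is worth keeping in mind when reading results on such benchmarks.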

This paper is well-written. The proposed model is elegant in the way it defines multiview anomalies and seems to yield a simple and practical inference algorithm. The authors do a good job of motivating multiview anomaly detection as a worthy task and covering some of the relevant prior work. The experiments are described thoroughly and the proposed model is shown to consistently outperform the four chosen baselines. While I think this paper is well-written and well-executed, I think the model itself does not represent a major conceptual leap from prior work. In the past couple of years, there has been a lot of work on matrix and tensor factorization methods for anomaly detection (see this recent survey [1]). I would have liked to see more coverage of recent anomaly detection methods in the paper. Also, for general multiview data, the Gaussian likelihood is too restrictive, since we might expect binary, count, or positive real data in some of the views. [1] Fanaee-T, Hadi, and João Gama. "Tensor-based anomaly detection: An interdisciplinary survey." Knowledge-Based Systems 98 (2016): 130-147.

2-Confident (read it all; understood it all reasonably well)