Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
This paper proposes a sparse Gaussian MRF approximation for collaborative filtering. The approach builds on Besag's pseudo-likelihood method for estimating an MRF under the auto-normal parameterization. The proposed sparse approximation model and learning algorithm appear to be novel.

The learning algorithm is logically complex, although described reasonably well. Given that complexity, it would be helpful for the authors to provide a high-level summary of the learning algorithm, perhaps via pseudocode.

The experimental results in Tables 1 and 2 are reasonably convincing, particularly the training-time speedups over the best-performing baseline. However, the authors should explain exactly why their approach is orders of magnitude faster than the MULT-VAE baseline, perhaps by comparing the computational complexity of training the two models. It would also be helpful to explain why the proposed sparse MRF, which is a full-rank shallow model, is able to match or significantly exceed the performance of MULT-VAE, which is a deep nonlinear model. Furthermore, is the comparison with a MULT-VAE of 3 hidden layers fair, or could MULT-VAE's predictive quality be further improved by adding more hidden layers?

Given the impressive training-time performance and predictive quality, the proposed sparse MRF model could well form the basis for future work that builds on this approach.
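To make the reference to the auto-normal parameterization concrete: under that parameterization, pseudo-likelihood estimation amounts to regressing each item's column of the user-item matrix on all other items, with a zero-diagonal constraint so an item cannot predict itself. The following is only an illustrative sketch of the dense, full-graph case (the function name, the ridge penalty `lam`, and the closed-form zero-diagonal solution are my own choices, not necessarily the paper's):

```python
import numpy as np

def pseudo_likelihood_fit(X, lam=1.0):
    """Illustrative auto-normal pseudo-likelihood fit: regress each
    item (column of X) on all other items with ridge penalty lam.
    Returns an item-item weight matrix B with zero diagonal."""
    n_items = X.shape[1]
    G = X.T @ X + lam * np.eye(n_items)   # regularized Gram matrix
    P = np.linalg.inv(G)
    # Closed-form solution under the zero-diagonal constraint:
    # B[i, j] = -P[i, j] / P[j, j] for i != j, B[j, j] = 0.
    B = np.eye(n_items) - P / np.diag(P)
    np.fill_diagonal(B, 0.0)
    return B
```

Scoring would then be a single matrix product, `scores = X @ B`, which hints at why training and prediction can be fast for a shallow full-rank model.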
[UPDATE] Given the authors' response (which I find instructive and promising for the final version), I raise my score to a clear accept.

The paper presents a novel method for recommendation with collaborative filtering based on Markov Random Fields (MRFs). Starting from a general approach that regresses over the full graph of items, the paper shows that a valid approximation can be obtained by working with subgraphs that represent Markov blankets of an initial set of items. This approach yields significant computational gains while delivering better recommendation performance than the state of the art, represented here by variational autoencoders.

** Originality **
The paper presents original work on the topic with a new approach that generalises well across datasets. As a general comment, I wonder whether accounting for popularity bias makes sense in this approach, and whether the authors have thought about it.

** Quality **
I find this work to be of good quality overall. The claims are well supported by theoretical analysis. The experimental results are well documented and use well-known, appropriate datasets. Since the paper presents a new algorithm, I hope the code will be submitted if the paper is accepted. It would have been nice to see the code during review, although the description of the algorithm in section 3.2 is clear and detailed.

** Significance **
The results are convincing in terms of both recommendation performance and computing gains. The state of the art used as a benchmark is valid (variational autoencoders). The choice of 1,000 as a parameter (line 141) seems a bit arbitrary; it would be useful to explain how this parameter was chosen and whether it significantly impacts the results.

** Clarity **
The method is presented in a clear and concise way.
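To illustrate the Markov-blanket subgraph idea in code: the rough intuition is that each item's regression can be restricted to a small neighborhood rather than the full item graph. The sketch below is my own approximation, not the paper's algorithm; it uses the k most-correlated items as a stand-in for the Markov blanket, where `k` plays the role of the 1,000-neighbor parameter questioned above:

```python
import numpy as np

def sparse_neighbor_regression(X, k=1000, lam=1.0):
    """Illustrative sketch: for each item j, regress column j of X
    only on its k most-correlated neighbors (a crude proxy for the
    Markov blanket), instead of on all other items."""
    n_items = X.shape[1]
    C = np.corrcoef(X, rowvar=False)      # item-item correlations
    np.fill_diagonal(C, 0.0)
    B = np.zeros((n_items, n_items))
    for j in range(n_items):
        order = np.argsort(-np.abs(C[:, j]))
        nbrs = order[order != j][:min(k, n_items - 1)]
        Xn = X[:, nbrs]
        G = Xn.T @ Xn + lam * np.eye(nbrs.size)
        B[nbrs, j] = np.linalg.solve(G, Xn.T @ X[:, j])
    return B
```

Each solve is over a k-by-k system rather than an n_items-by-n_items one, which is where the computational gains over the full-graph regression would come from.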
The authors acknowledge that there is no proof of convergence, but they illustrate the method in detail on several reference datasets. The paper is well organised and clearly written. For the sake of clarity, it might be worth explaining how X is built early in section 2.2, rather than mentioning it only in section 5. Typo in a reference: "and" appears twice in the list of authors.
The paper is interesting and well written. The methodology, algorithms, and results are clear and easy to follow. My only concern is that the paper lacks a solid theory. Rather than yet another new algorithm that seems to work well, I believe the community is mature enough for deeper theoretical results, which are absent from this paper.