Paper ID: 3637

Title: Learning Nonsymmetric Determinantal Point Processes

The developments presented here for dealing with nonsymmetric DPPs make this class of models more useful when positive correlations among items in the sampled sets need to be modeled, making them more expressive than traditional symmetric DPPs. The model, theory, and experiments are clearly explained. Experimental results show the improvements made possible by nonsymmetric DPPs, which can model positive correlations while retaining the ability to learn negative correlations, as in traditional symmetric DPPs. Experiments on real data further validate the advantage of nonsymmetric DPPs on both MPR and AUC measures for recommended sets.

As mentioned in the previous section, this work proposes a solution to an important and interesting problem, and the empirical results are convincing enough to establish this approach over the symmetric DPP approach for the tasks and datasets described in the paper. The main omission is the absence of baselines other than symmetric DPPs. What about simpler, less computationally intensive models for set-selection tasks? These would establish the necessity of DPPs for the chosen task. More importantly, how does this approach compare to the work on learning signed DPPs (Brunel et al., 2017), which is cited but not compared against? The final method in the paper looks different enough to merit a comparison.

This paper studies determinantal point processes (DPPs) with nonsymmetric kernels. Most of the machine learning literature on DPPs assumes symmetric kernels, and the prior work that studies nonsymmetric kernels has assumed a quite restricted class of them. The novelty of this paper lies in proposing a learning algorithm for a fairly general class of nonsymmetric kernels.

The proposed approach assumes a particular representation of nonsymmetric kernels. This representation follows from two known results in a rather straightforward manner, as I summarize in "1. Conclusion". Once this representation is given, the algorithm follows essentially in the same way as Gartrell et al. (2017). Nevertheless, the proposed representation is new in the particular context of learning DPP kernels.

The proposed approach can fail when the expression inside the logdet() of the first term on the right-hand side of Eq. (12) is singular, which can be the case when D < M (low rank). The proposed representation assumes low rank, but this property is not really exploited by the algorithm. The low-rank representation can obviously reduce the space complexity, as claimed in the paper, but is this relevant in practice? The time complexity seems to be the bottleneck. Given these observations, the low-rank representation only creates a pitfall (singularity in Eq. (12)) without much benefit (the time complexity is not improved).

---

I thank the authors for the clarification regarding the subclass of P0-matrices. I now find that the proposed representation is not obvious, and that this representation itself is the major contribution of the paper. Regarding the time complexity, I see that the low-rank representation reduces the time for matrix multiplication, but this is not the bottleneck; the bottleneck remains O(M^3) time. However, the time complexity is a secondary issue and should not be a reason to reject this paper.
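The singularity concern can be made concrete with a small sketch. Below is a minimal NumPy illustration assuming a low-rank nonsymmetric kernel of the form L = V V^T + B (C - C^T) B^T with D << M; the variable names and this exact decomposition are my own shorthand, not necessarily the paper's parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
M, D = 8, 2  # ground-set size M, embedding dimension D (D < M)

# Illustrative low-rank nonsymmetric kernel: a symmetric PSD part plus
# a skew-symmetric part (the paper's exact parameterization may differ).
V = rng.standard_normal((M, D))
B = rng.standard_normal((M, D))
C = rng.standard_normal((D, D))
L = V @ V.T + B @ (C - C.T) @ B.T  # rank(L) <= 2 * D = 4

# For a subset J with |J| > rank(L), the principal submatrix L_J is
# singular, so the log det(L_J) likelihood term diverges to -inf.
J = np.arange(5)  # |J| = 5 > 4
L_J = L[np.ix_(J, J)]
det_J = np.linalg.det(L_J)
print(det_J)  # numerically ~ 0
```

Any training example of size larger than the rank of L hits this failure mode, which is why the singularity deserves explicit treatment in the paper.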
The only remaining major issue is the possible singularity of the first term of Eq. (12). In the response, the authors suggest a heuristic that deals with the singularity by adding "epsilon I". This additional term is essential and must be stated in the paper; in my opinion, it should be incorporated into Eq. (12), because otherwise the proposed algorithm can fail. Overall, I like the paper despite this deficiency.
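The authors' suggested fix amounts to a few lines; here the helper name and the value of `eps` are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def regularized_logdet(L_J, eps=1e-6):
    """Heuristic from the author response: add eps * I to the (possibly
    singular) submatrix before taking logdet. eps value is assumed."""
    n = L_J.shape[0]
    sign, logdet = np.linalg.slogdet(L_J + eps * np.eye(n))
    return sign, logdet

# A rank-1 (hence singular) example submatrix:
u = np.array([1.0, 2.0, 3.0])
S = np.outer(u, u)
sign, logdet = regularized_logdet(S)
print(sign, logdet)  # sign = 1.0; logdet is finite instead of -inf
```

Since the added term perturbs the objective, stating it explicitly in Eq. (12), as suggested above, seems warranted.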