NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
After reading the author feedback, my review stays the same and in particular, I still think this is a good submission. Thanks to the authors for the thorough response! I realized that there was an additional point of discussion I didn't mention in my original review that could be helpful to discuss: you benchmark against standard Euclidean embeddings, but what about hyperbolic embeddings? Hyperbolic embeddings could encode trees and I suspect that they can handle hierarchical clustering structure like what is analyzed in Theorem 4.6. This could possibly be an interesting direction of future research. * * * * * Original review below * * * * * I found the paper very clear and easy to follow. The key ideas for limiting the number of queries (symmetry in Q(i,j) and Q(j,i), triangle inequality, concentration inequalities) are quite straightforward, and at some level much of the paper reads off as doing diligent bookkeeping with the lower and upper bounds in updating the active sets as to keep the query cost low while maintaining accuracy. My comments are minor: 1. Perhaps some discussion on whether the theory could be extended to recovering k-NN graphs would be helpful, along with what the technical hurdles are in doing so. It seems like we could use a best-k-arm strategy per node. 2. In Algorithm 1's "Require" line, there is a reference to Algorithm 2 that should probably say something like "SETri (Alg. 2)" or "SETri (Algorithm 2)" rather than just "SETri, 2". There's a similar issue in Algorithm 2's "Require" line in referencing Algorithm 1.
Reviewer 2
---------- I read the author response and am satisfied with their promise to fix the minor issues and provide a more elaborate related work. I think the Kliendesener and Luxburg (JMLR 2017) works with noisy triplets to build kNN graph. I do not feel the need to change my review score ---- The algorithm though yields good results is relatively straight-forward involving standard probability concentration bounds and using triangular inequality to the full extent. I believe the method is technically correct and the overall quality of the presentation is good. There are many minor typos and grammatical mistakes (such as equation (5), repetitions on line 181 and 183, etc.) I think the second item (Line 79) is not stated correctly, but it does not affect the algorithm or its subsequent analysis. The notation of the paper can be (and should be) simplified, it is overly technical and notations are introduced on the go within the statements (such as in Lemma 3.1 with =:). My main reservations on the quality and significance of work are its limitation to the metric case, the number of query analysis of the simplified algorithm only, the very large space complexity. In the experimental section, the distance measure defined on the real dataset is not a metric (as acknowledged by the authors), but its effect on the quality of results is not discussed. This is particularly important since triangle inequality is the fundamental building block of the algorithm. The related work section is very poorly written; very closely related work must be elaborated on. The results should be compared with techniques that build nearest neighbor graphs based on triplets (directly) without ordinal embedding. The reproducibility checklist shows that a link to the code is provided, while in the paper the author(s) state(s) that it will be provided upon publication.
Reviewer 3
The paper is clean. As a non theory person I was able to follow the motivation, the problem set-up and the main result. The intuition that not all queries Q(j,k) needs to be issued conditioned on the past queries due to the fact that we're in a metric space. The bounds looks fine, mostly it is a construction of the confidence in the estimation of Q(j,k), again, under the constraint that Q(j,k) is a metric space with metric-ish properties. As a piece of technical achievement this paper is just fine. But it can improve in the following sense of story-telling and organisation: 1) we're doing an optimal query problem, namely, querying a noisey oracle Q(j,k) to construct a nearest-neighbor graph G(x_i,x_j). Given this is the setting, I was hoping to see some kind of quantification over the entropy on all the possible nn-graph G, and how the next-best-query you're querying is maximally reducing that entropy (i.e. with this new query, I gain the most information on distinguishing valid nnGraphs from incompatible graphs) 2) It is unclear to me then, which criteria you are using to select the next query Q(i,j) to make. Are you selecting a query that maximally reduce the entropy on the space of possible nearest-neighbor graphs? Or you're selecting the query that maximally reduce the entropy on the pair-wise distance metric Q(i,j) itself? These two are different objects and you'd expect the querying scheme to be different. No doubt this distinction is somewhere in the paper, but it would be good to have it in english form, stated up-front, so a more "casual" reader, tasked with implementing this algorithm could follow. I was hoping to see a sentence like "We're try to maximally reduce the uncertainty on the space of all possible nn graphs. thus, based on our past K observations Q(j,k)_1 \dots Q(j,k)_K, we compute for all ?? the confidence bound of ?? and query the most ?? pair from the oracle, maximally reducing the uncertainty". 3) it might be good to have some notion of sub-modularity argument. From what I can read this paper uses a bandit-like approach, which is in a sense greedy, picking the most ?? query at each step instead of planning ahead a sequence of K queries that, maybe themselves do not lead to good information gain, but in conjunction leads to huge information gain. Greedy solutions are just fine if your problem is sub-modular, in this case, the entropy gain over the space of nearest-neighbor graphs is sub-modular with respect to the set of queries you are issuing, it might be good to prove this, and then you can easily justify your greedy strategy as compatible to optimal. So rating is as follows: Originality: fair. It appears to me that there has not been a paper previously that only assume general metric and noisy reads in the context of inferring nn-graph Quality: good. Math is clean, theorem is well-written, evaluation is well constructed (although it doesn't really add much given this is a theory paper anyways) Clarity: poor. No idea what the selection criteria is, and no intuition on why this particular selection criteria is optimal (or greedy-optimal) Significance: fair. Pretty clean problem, I'm sure someone would find use of it later down the line.