NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:1837
Title:Optimal Decision Tree with Noisy Outcomes

Reviewer 1

The setup is original and I see high value in the persistent-noise assumption worked out by the authors. I do have one main question to the authors and while I recommend this paper to be accepted based on significance and appearance of correctness, I do expect a very strong answer on this point for the score to remain high after rebuttal phase. The authors state in their experiment: "To ensure every pair of chemicals can be distinguished, we removed the chemicals that are not identifiable from each other." Well, for significance of the present work, we also need to know how the algorithms are going to behave in the worst-case if there are symmetries and this kind of preprocessing step is omitted. Note that the user would be happy with being presented a set of hypotheses and a certificate that no further test is available to distinguish among them. But I do want this certificate, I do want the analysis to be able to give me the expected time to get it, and I do want the experimental results in the paper extended to the case without the preprocessing that reduce solution sets to singletons by altering the real data set (which makes it a synthetic data set optimized to validating the proposal). UPDATE: Author response read. Based on good responses, score of this reviewer to remain as Clear accept.

Reviewer 2

This paper considers challenging settings for ODT problems with noisy outcomes. The authors consider several new settings such as unknown number of hypotheses and persistent noisy models. The authors design new adaptive and non-adaptive algorithms to achieve (nearly) best possible approximation ratio. I have to say that I don't work on approximate algorithms. The results look interesting to me, but I am not at a good position to judge the novelty.

Reviewer 3

I comment here on the long version of the paper, which is actually easier to read than the short version. Title: it oversells the scope of the results, since the only type of "noisy" outcomes * handled thoroughly in the paper are erasures with equal probability to be +/-. Section 3: - the proof of Lemma 3.4 shows that G_E(e*) > S/4, but the statement of of Lemma 3.4 claims that G_E(e*) > S/2. - the proof of Lemma 3.5 states that y \noin \cup T_{b,x}(e) f_{b,x}(E): I guess that f_{b,x}(E) should not be there. Section 4: - The equivalence between both algorithms is interesting, but it basically boils down in showing that the way the * entries are treated is the same for both instances. It would be good to clearly identify the alternative algorithm ODTN_h in Section 4.1 (only ODTN_r is actually introduced in Section 4). Section 5: - in Proposition 2, how do you get that the number of hypotheses in X' is at most 2Cm^\alpha and not 4Cm^\alpha, since |T^{+}(e) \cup T^{-}(e)| \leq 2 \max{ |T^{+}(e)|, |T^{-}(e)| } \leq 2Cm^\alpha ? REVISION: Comment on Section 7 deleted after authors' feedback, which clarified my question, indeed r = 245 and h = 45 are correct. Other comments remain unchanged.

Reviewer 4

The paper is well-written, carefully done, and a good contribution to the area. See answers to question 1. UPDATE: Read author feedback, which did a good job of answering the questions/concerns raised by the other reviewers. I'm still scoring this paper as a clear accept.