NeurIPS 2020

Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier with Application to Real-Time Information Filtering on the Web


Review 1

Summary and Contributions: In this paper, the authors propose a new version of the learned Bloom filter based on the observation that, rather than treating the learned model as a binary classifier, we can use its predicted probabilities to tune the number of hash functions and the memory allocation. This is a clever insight that leads to notable improvements in the FPR vs. memory trade-off.
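A minimal sketch of my reading of the core idea (my illustration, not the authors' implementation): the classifier score selects a region of the score axis, and the region determines how many hash functions an item touches on a shared bit array. The thresholds, hash counts, and `score_fn` interface below are all hypothetical.

```python
import hashlib

# Hypothetical Ada-BF-style sketch: one shared bit array, but the number of
# hash functions k depends on which score region an item falls into.
# Thresholds and k-values are illustrative, not the paper's tuned settings.
THRESHOLDS = [0.3, 0.6, 0.9]   # splits the score axis into 4 regions
K_PER_REGION = [3, 2, 1, 0]    # fewer hashes for higher-scoring items

class AdaBFSketch:
    def __init__(self, m_bits, score_fn):
        self.m = m_bits
        self.bits = bytearray(m_bits)  # one byte per bit, for simplicity
        self.score = score_fn          # learned model: item -> score in [0, 1]

    def _region(self, x):
        return sum(self.score(x) >= t for t in THRESHOLDS)

    def _positions(self, x, k):
        for i in range(k):
            h = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def insert(self, x):
        for pos in self._positions(x, K_PER_REGION[self._region(x)]):
            self.bits[pos] = 1

    def query(self, x):
        k = K_PER_REGION[self._region(x)]
        # With k = 0 (top region) the generator is empty and all() is True,
        # i.e. the classifier alone accepts the item, as in the LBF's
        # above-threshold case.
        return all(self.bits[pos] for pos in self._positions(x, k))
```

Usage would look like `AdaBFSketch(1 << 20, model.predict)`, where `model.predict` stands in for whatever scoring function was trained on the keys (a hypothetical name).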

Strengths: 1. Interesting application of ML to a traditionally algorithmic problem. 2. Valuable insight into how the information provided by an ML prediction can inform algorithmic design.

Weaknesses: 1. The approach for setting the score boundaries, etc., is heuristic. 2. The writing could be improved for better clarity.

Correctness: Yes

Clarity: The writing could be made clearer.

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback: As discussed above, I think the insight is great, but the algorithm that makes use of it is fairly heuristic, and the writing could be improved to give clearer, simpler intuition for the approach. That said, I think this is a clear step forward in our understanding and worth publishing.
Post-rebuttal comments: I already liked the paper, but thanks for the welcome response.


Review 2

Summary and Contributions: The paper proposes a modification of the Learned Bloom filter (LBF) that utilises the output of a score-based membership classifier in a more sophisticated manner by splitting the score axis into several distinct regions, each handled by its own Bloom filter. Under certain conditions, this lowers the false-positive rate compared to the LBF.
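To make the "under certain conditions" concrete, here is a standard back-of-the-envelope argument using the textbook Bloom filter approximation (my gloss, not the paper's exact analysis). If a fraction $p$ of the shared $m$-bit array is set and region $g$ uses $k_g$ hash functions, a non-key landing in region $g$ is a false positive with probability about $p^{k_g}$, so

$$\mathrm{FPR} \approx \sum_g \Pr[\text{non-key falls in region } g] \cdot p^{k_g}.$$

A good classifier sends most non-keys to low-score regions with large $k_g$ (driving $p^{k_g}$ down), while keys concentrate in high-score regions with small $k_g$ (setting fewer bits, so $p$ itself shrinks); both effects can pull the weighted sum below the single-$k$ LBF value.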

Strengths: Ada-BF is a novel model that makes fuller use of the learned membership classifier and may find interesting applications.

Weaknesses: Like the LBF, Ada-BF operates in the static regime of fixed and known key and query distributions. This significantly limits its applicability (or rather, the space of applications where its strengths will be prominent) compared to, e.g., the Neural Bloom filter.

Correctness: I skimmed the proofs of Theorems 1 and 2 and did not spot a mistake. A small comment: at the beginning of the proof of Theorem 1, the authors probably meant the probability of a _non-_key.

Clarity: Overall, the paper is written clearly, and both Ada-BF and its disjoint variant are introduced in an accessible way.
I have a major comment, however, regarding the positioning of the proposed method. I think the paper is missing a clear distinction between the classical Bloom filter (BF), the Learned Bloom filter (LBF), and the Neural Bloom filter (NBF). While these three models have the same interface at query time, the process of constructing the filter rests on very different assumptions. The LBF assumes a static distribution of keys, which motivates the idea of employing a machine-learning model; the NBF makes an assumption on the meta-distribution of key and query sets, which is arguably a more widely applicable setting. While I generally agree with the arguments made by the authors in lines 46-70, it would be helpful to explicitly position the proposed method within these different settings, to highlight not only the positive contributions but also the potentially missing features. If I understand correctly, this work considers the setting of the LBF and hence can only be applied in the presence of a static query distribution; it then seems to me that the "Strong dependency on Generalization" critique is fully applicable to Ada-BF as well.
Minor comments:
Line 88: I don't think h_i(x) = 1 is what the authors intended to write.
Line 108: zero hash function or functionS?
Line 154: may leadS.

Relation to Prior Work: See my comment in the Clarity section regarding the BF, LBF, and NBF. I think the NBF should also be included in the comparison, or its absence should be clearly motivated.

Reproducibility: Yes

Additional Feedback: I wonder if the authors could comment on the idea of using a more sophisticated machine-learning model that can, in some sense, separate keys from non-keys, but not in a one-dimensional way. For example, would it be interesting to consider a high-dimensional feature space, learned for example by a VAE, to define the group division? Or a more structured clustering algorithm?
Post-rebuttal comments: I appreciate the authors' feedback and raise my score, hoping that the differences between Ada-BF and the NBF will be clarified in the final version of the paper. I still struggle to understand, though, the argument made about the smaller model size; if it is to be used in the paper, further clarification would be appreciated.


Review 3

Summary and Contributions: The paper proposes two algorithms, Ada-BF and disjoint Ada-BF, and experiments with two datasets show that these algorithms perform better than the original Bloom filter, the learned Bloom filter (LBF) [Kra+18], and the sandwiched learned Bloom filter (SLBF) [Mit18]. The paper also provides a theoretical analysis and a hyper-parameter tuning method for each algorithm.

Strengths: The main strength of the paper is that the idea is novel. The LBF and SLBF use a learned predictor to reduce the number of keys going into the Bloom filter (keys whose prediction score is greater than a threshold are handled by the predictor alone). The authors' idea is to extend this binary classification to multiple groups based on the prediction scores: Ada-BF adjusts the number of hash functions for each group's Bloom filter, whereas disjoint Ada-BF uses separate bit vectors of different sizes for each group's Bloom filter. I think the key idea of using multiple groups is novel and worth publishing.
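A minimal sketch of the disjoint variant as summarized above (my illustration; the group sizes, thresholds, and names are hypothetical, not the authors' tuned configuration):

```python
import hashlib

# Hypothetical disjoint Ada-BF sketch: one separate, independently sized
# Bloom filter per score group. Sizes are illustrative; the paper tunes
# them, here we hard-code a decreasing allocation toward high scores.
THRESHOLDS = [0.3, 0.6, 0.9]
GROUP_BITS = [1 << 16, 1 << 14, 1 << 12, 1 << 10]  # more bits for low scores
K = 3  # fixed hash count per group in this toy; the sizes carry the adaptivity

class DisjointAdaBFSketch:
    def __init__(self, score_fn):
        self.score = score_fn  # learned model: item -> score in [0, 1]
        self.filters = [bytearray(m) for m in GROUP_BITS]

    def _group(self, x):
        return sum(self.score(x) >= t for t in THRESHOLDS)

    def _positions(self, x, m):
        for i in range(K):
            h = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % m

    def insert(self, x):
        f = self.filters[self._group(x)]
        for pos in self._positions(x, len(f)):
            f[pos] = 1

    def query(self, x):
        f = self.filters[self._group(x)]
        return all(f[pos] for pos in self._positions(x, len(f)))
```

One design note: because each group's array is independent, its size can be matched to the expected load of keys and non-keys in that score range, which is where this variant's memory adaptivity comes from.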

Weaknesses: There are several weaknesses. First, the experimental results show that disjoint Ada-BF performs about the same as Ada-BF on the malicious URL dataset and a bit worse on the virus scan dataset; why would anyone prefer this variant? Second, the datasets used in the experiments are similar, so it is not clear in which situations the approaches fall short. Third, the paper does not evaluate latency or throughput. A main concern for all learned Bloom filter approaches is computational overhead, and the proposed methods have higher overhead than the LBF or SLBF; readers will want to know about the additional cost. Fourth, the paper does not say whether the implementation and datasets will be made publicly accessible so that another group can reproduce the results.

Correctness: See weaknesses 2 and 3 above.

Clarity: The paper is mostly clearly written. There are some small English grammar issues.

Relation to Prior Work: The paper should say more about the comparison with [RBL19], which seems better than the LBF and SLBF both in terms of space savings and in how the classifiers are trained.

Reproducibility: No

Additional Feedback: I suggest the authors evaluate their approaches on datasets with different characteristics, to show under what conditions their approaches perform better or worse than the alternatives.