NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 9259
Title: PIDForest: Anomaly Detection via Partial Identification

This paper provides an outlier measure, along with an algorithm to estimate it, that handles heterogeneous data sets whose attributes differ in nature. The measure rests on an information-theoretic intuition of how difficult it is to identify or characterize a point, which can be expressed in terms of the sparsity of the sub-cube containing it. Compared to previous work (such as Isolation Forest and its successors), this is a richer notion that goes significantly beyond the "low density" regions used for outlier detection, and it also offers benefits for interpretability. The reviewers were unanimous in their vote to accept. The authors are encouraged to revise in response to the reviewer comments.
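
To make the "sparsity of the containing sub-cube" idea concrete, here is a minimal Python sketch. It assumes sparsity means the volume of an axis-aligned box divided by the number of data points inside it; that definition, the function name `cube_sparsity`, and the toy data are illustrative assumptions and are not taken from the paper or its code.

```python
from typing import Sequence

Point = Sequence[float]


def cube_sparsity(points: Sequence[Point], lower: Point, upper: Point) -> float:
    """Assumed definition: volume of the box [lower, upper] divided by
    the number of points that fall inside it (inf if the box is empty)."""
    # Volume of the axis-aligned box.
    volume = 1.0
    for lo, hi in zip(lower, upper):
        volume *= hi - lo

    # Count points lying inside the box on every coordinate.
    inside = sum(
        all(lo <= x <= hi for x, lo, hi in zip(p, lower, upper))
        for p in points
    )
    if inside == 0:
        return float("inf")
    return volume / inside


# Toy data: three clustered points and one isolated point.
data = [(0.1, 0.1), (0.15, 0.12), (0.12, 0.18), (0.9, 0.9)]

# Two boxes of equal volume: one around the cluster, one around the isolated point.
dense = cube_sparsity(data, (0.0, 0.0), (0.5, 0.5))
sparse = cube_sparsity(data, (0.5, 0.5), (1.0, 1.0))
```

Under this assumed definition, the box around the isolated point has higher sparsity than an equally sized box around the cluster, which matches the intuition that an outlier sits in a sparse sub-cube.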