NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper proposes an outlier measure, along with an algorithm to estimate it, that handles heterogeneous data sets whose attributes are of different kinds. The measure rests on the information-theoretic intuition of how difficult it is to identify or characterize a point, which can be expressed in terms of the sparsity of the sub-cube containing it. Compared to previous work (such as Isolation Forest and its successors), this is a richer notion that goes significantly beyond earlier "low density" criteria for outlier detection, and it also offers benefits for interpretability. The reviewers were unanimous in voting to accept. The authors are encouraged to revise the paper in response to the reviewer comments.