Part of Advances in Neural Information Processing Systems 25 (NIPS 2012)
Jayadev Acharya, Hirakendu Das, Alon Orlitsky
The minimax KL-divergence of any distribution from all distributions in a collection P has several practical implications. In compression, it is called redundancy and represents the least additional number of bits over the entropy needed to encode the output of any distribution in P. In online es- timation and learning, it is the lowest expected log-loss regret when guessing a sequence of random values generated by a distribution in P. In hypothesis testing, it upper bounds the largest number of distinguishable distributions in P. Motivated by problems ranging from population estimation to text classiﬁcation and speech recognition, several machine-learning and information-theory researchers have recently considered label-invariant observations and properties induced by i.i.d. distributions. A sufﬁcient statistic for all these properties is the data’s proﬁle, the multiset of the number of times each data element appears. Improving on a sequence of previous works, we show that the redun- dancy of the collection of distributions induced over proﬁles by length-n i.i.d. sequences is between 0.3 · n1/3 and n1/3 log2 n, in particular, establishing its exact growth power.