The paper provides data-dependent guarantees for learning discrete distributions, possibly with infinite support. Some of the steps in the argument appear in previous works (in particular the presence of the 1/2 norm), but the results are novel, and I think there are some other technical tidbits that may be of interest to the community. I recommend acceptance.