This paper received generally positive reviews and is considered novel and of interest. In terms of technical contribution, however, the improvement over previous work ([2]) seems somewhat incremental. Another concern raised is the paper's relevance to the audience: the authors should better explain and justify the connection between their work and current research in ML. It would also help to discuss the relevant ML literature on learning algorithms that operate over losslessly compressed data, and how the aforementioned lower bound relates to existing techniques (see, for example, Paskov et al., "Compressive feature learning," 13, and later works).