The paper addresses an important problem in ML with EMR data: how to make the most of such richly structured data when the labels of interest may be sparse. The proposed approach is elegant and well executed, building on the strengths of EMR data in a way that I believe will inspire much future work. While the experiments are satisfactory, the reviewers suggested several important additional experiments and evaluations, and I trust the authors will include these (some are already present in the authors' response) in the revised version of the paper.