This paper presents a simple alternative to the Gumbel-Softmax based on Gaussians and invertible transformations to the hypersimplex. As one reviewer noted, "the proposed approach is simple, has nice properties, and extensible". Several reviewers criticized the absence of experiments on non-linear models from the main text, and some felt that the clarity of the draft, in particular the motivation, could be improved.

This was a borderline paper; however, I recommend acceptance. I found the paper well written, and the experiments are well considered, careful, and show a clear benefit of the method in many settings. I agree with the reviewers that the results on non-linear models should be moved from the supplementary material into the main text, and that more motivation for the specific choices made would be useful. My recommendation is ultimately compelled by an important point that this paper demonstrates and that is largely ignored in the literature on relaxed gradient estimators: the Gumbel-Softmax is special only because its discrete limit is the Gibbs distribution, but that does not make it the most useful relaxation in practical deep learning applications.

Please address the clarity concerns and incorporate the non-linear model results into the main text.
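To make the central point concrete, the following is a minimal illustrative sketch, not the paper's construction: the standard Gumbel-Softmax relaxation next to a hypothetical Gaussian-perturbed softmax. All function names, the temperature `tau`, and the Gaussian variant itself are assumptions for illustration; both produce differentiable samples on the probability simplex, but only the Gumbel version has the Gibbs distribution as its discrete limit.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = x - x.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gumbel_softmax_sample(logits, tau=0.5):
    # Standard Gumbel-Softmax: perturb logits with Gumbel(0, 1) noise,
    # then apply a tempered softmax. As tau -> 0 the samples approach
    # one-hot draws from the Gibbs (softmax) distribution over the logits.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    return softmax((logits + g) / tau)

def gaussian_softmax_sample(logits, tau=0.5):
    # Hypothetical Gaussian-based relaxation (illustrative only, not the
    # paper's method): perturb logits with standard normal noise instead.
    # Its discrete limit is no longer the Gibbs distribution, which is
    # precisely the property the meta-review argues need not matter.
    n = rng.standard_normal(size=logits.shape)
    return softmax((logits + n) / tau)

logits = np.array([1.0, 2.0, 0.5])
y_gumbel = gumbel_softmax_sample(logits)
y_gauss = gaussian_softmax_sample(logits)
# Both samples are nonnegative, sum to 1, and are differentiable
# functions of the logits, so both admit reparameterized gradients.
```

Either relaxation supports reparameterized gradient estimation; the difference lies only in which discrete distribution is recovered in the zero-temperature limit.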