Reviews: Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with Dirichlet calibration

The paper completes the picture of post-training calibration by proposing Dirichlet calibration as a natural generalization of Beta calibration to the multi-class setting, and by showing its connection to matrix scaling in the context of neural network models. The comprehensive experiments, which compare a variety of post-training calibration techniques on both deep neural nets and non-neural models, are also a strong point of the paper and were appreciated by all reviewers. On the negative side, the results are mixed, with rather small performance differences between the new techniques and other approaches. The authors should incorporate the reviewers' comments (R4 gave very detailed and thoughtful post-rebuttal comments). In particular, the authors should:
- cite older calibration work from statistics (see R4's comments for references);
- experiment with vector scaling for non-neural methods;
- analyze whether the ODIR regularization is so strong that matrix scaling reverts to vector scaling, which would explain the marginal improvements over vector scaling;
- possibly discuss in more detail the use of other exponential families in place of the Dirichlet.
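To make the requested ODIR analysis concrete, the following is a minimal sketch of one way such a check could be run. It is not the authors' code. It assumes the calibration map is a linear layer on log-probabilities followed by a softmax, and that the ODIR penalty is a normalized sum of squared off-diagonal weights plus squared intercepts. The function names, the normalization constants, and the `off_diagonal_ratio` diagnostic are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dirichlet_calibrate(probs, W, b, eps=1e-12):
    # Assumed form of the calibration map: softmax(W log(p) + b).
    # probs has shape (n, k); W has shape (k, k); b has shape (k,).
    log_p = np.log(np.clip(probs, eps, 1.0))
    return softmax(log_p @ W.T + b)

def odir_penalty(W, b, lam, mu):
    # Assumed ODIR form: penalize off-diagonal weights and intercepts only.
    k = W.shape[0]
    off = W - np.diag(np.diag(W))
    return lam * (off ** 2).sum() / (k * (k - 1)) + mu * (b ** 2).sum() / k

def off_diagonal_ratio(W):
    # Hypothetical diagnostic: fraction of the Frobenius norm of W that lies
    # off the diagonal. A value near 0 means the fitted matrix is effectively
    # diagonal, i.e. matrix scaling has collapsed toward vector scaling.
    off = W - np.diag(np.diag(W))
    return np.linalg.norm(off) / np.linalg.norm(W)

# Illustrative values only; these are not results from the paper.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
W_diag = np.diag([1.5, 0.8, 1.2])   # a purely diagonal (vector-scaling-like) W
b_zero = np.zeros(3)

calibrated = dirichlet_calibrate(probs, W_diag, b_zero)
ratio = off_diagonal_ratio(W_diag)
```

Reporting a ratio of this kind for the fitted matrices at the selected regularization strengths would indicate directly whether the off-diagonal terms contribute anything beyond vector scaling.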

Paper ID:	6658
Title:	Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with Dirichlet calibration