__ Summary and Contributions__: The CVaR (Conditional Value at Risk, or Expected Shortfall) of a random variable
is its expected value conditional on the event that the
variable exceeds a given quantile (the quantile itself is the Value at Risk, VaR). CVaR has
gained popularity in quantitative risk management as CVaR is a coherent risk measure (CRM)
while VaR is not. In recent years, machine learning researchers have explored using CVaR as an alternative
to average loss. This motivates minimizing an Empirical CVaR as a proxy for minimizing
CVaR. Justifying this version of Empirical Risk Minimization requires
controlling the fluctuations of the Empirical CVaR around CVaR. The paper develops
PAC-Bayesian bounds for CVaR (Theorem 1): it provides a high probability upper bound
on the CVaR of hypotheses picked according to a posterior distribution
in terms of empirical CVaR of this posterior and of the relative entropy between the posterior
and a fixed prior (this has the form of other PAC-Bayesian bounds). The proof
(which is non-trivial in its current form) relies on several twists. The first one is a
variational formulation of Coherent Risk Measures. Then the authors introduce
a sequence of intermediate random variables that can be compared with empirical CVaR
and that are amenable to stochastic analysis. It is not clear to me whether
this approach can be simplified and made more transparent, but this
manuscript represents a substantial body of work.
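
To fix ideas about the quantity being bounded, here is a minimal sketch (Python/NumPy; the function name and the quantile convention are mine, not the paper's) of the empirical CVaR at level alpha, i.e., the average of the worst alpha-fraction of observed losses:

```python
import numpy as np

def empirical_cvar(losses, alpha):
    """Empirical CVaR at level alpha: the mean of the losses at or above
    the empirical (1 - alpha)-quantile (the empirical VaR)."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, 1.0 - alpha)  # empirical VaR
    tail = losses[losses >= var]            # worst alpha-fraction of losses
    return tail.mean()

# Example: losses 1..10, alpha = 0.3 keeps the top 30% of losses.
sample = np.arange(1, 11)
print(empirical_cvar(sample, 0.3))  # mean of {8, 9, 10} = 9.0
```

The paper's object of study is precisely the gap between this sample quantity and the population CVaR, uniformly over a posterior distribution on hypotheses.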

__ Strengths__: - Explores an interesting risk criterion
- Non-trivial concentration result for the empirical risk
- Elegant moves between variational characterizations

__ Weaknesses__: - Generalization bounds are not as interesting as excess risk bounds
- Few comments on the computational hardness of the CVaR minimization problem
(do we need to consider surrogate risks?)
- More perspective on the proof techniques could help

__ Correctness__: So far so good.

__ Clarity__: The writing could be improved in two respects: build stronger motivation for investigating CVaR, and make the derivation of the PAC-Bayesian bound more transparent. Bayesian (even PAC-Bayesian) arguments are not always easy to digest; make the readers' lives easier.

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper introduces a PAC-Bayesian generalization bound for the Conditional Value at Risk, CVaR, performance measure. On the way, they establish a new concentration inequality for the CVaR.

__ Strengths__: - Solid technical paper focusing on a risk measure that is relevant for real-life scenarios, within the PAC-Bayesian framework.
- A smart approach to obtain the bound, using an auxiliary estimator of the CVaR, properties of Coherent Risk Measures, and the well-known Donsker-Varadhan formula.
- A concentration bound for the CVaR, which has some value on its own.
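
For reference, the Donsker-Varadhan formula invoked in the proof is, in one standard form (notation mine, not necessarily the paper's): for a prior $P$ and a measurable function $f$ on hypotheses,

$$ \log \mathbb{E}_{h \sim P}\left[e^{f(h)}\right] \;=\; \sup_{Q \ll P}\left\{ \mathbb{E}_{h \sim Q}[f(h)] - \mathrm{KL}(Q \,\|\, P)\right\}, $$

which is precisely the bridge that turns an exponential-moment (concentration) bound into a bound holding simultaneously over all posteriors $Q$, at the price of the KL term.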

__ Weaknesses__: Generally speaking, a broader perspective is lacking:
- Starting from the concentration bound of Section 5, wouldn't it be possible to obtain a wider range of generalization bounds, i.e., bounds from frameworks other than PAC-Bayes, such as, e.g., Rademacher complexity bounds? This is not discussed.
- Also, in the same vein: concentration inequalities are essential for deriving generalization bounds, and it is even "straightforward" to obtain PAC-Bayes bounds from concentration inequalities. The paper does not discuss how this route could have been taken: 1) introduce the concentration result, then 2) derive the PAC-Bayes bound from it.
- Coherent risk measures are introduced and briefly discussed, but it is not clear why the results provided here would not carry over to CRMs in general, the CVaR result being a specific case.

__ Correctness__: From a mathematical point of view: everything is correct.
No empirical results are provided: the authors could have given a hint as to how practical the provided results could become.

__ Clarity__: The paper is clearly written. Again, my main feedback is about the perspective: a broader perspective could have implied a totally different write-up with yet more clarity -- and/but that would be a significantly different paper.

__ Relation to Prior Work__: A clear statement is made as how the present work differs from previous work: proof techniques, tightness of the result, possible unboundedness of the risk considered.

__ Reproducibility__: Yes

__ Additional Feedback__: - a(n) hypothesis
- Line 294: should be Lemma 13 instead of Lemma 12.

__ Summary and Contributions__: This paper proposes a general framework for machine learning applications based on the minimization of an empirical estimate of the conditional value at risk (CVaR). The main goal of this article is to provide theoretical guarantees for the excess risk. Although this topic has been extensively studied in the literature for the expected loss, the case of the CVaR is not as well understood. The authors start by recalling existing guarantees for the CVaR. Then they present their PAC-Bayesian generalization bound for the CVaR, which is tight in the parameter alpha, contrary to previous analyses. Finally, they describe the proof, which relies on a reduction to the expected loss for an auxiliary random variable Y.

__ Strengths__: Strong theoretical contribution. May be used for risk-aware methods based on the conditional value at risk.

__ Weaknesses__: Illustrations, both experimental and theoretical, are missing. For instance, what is the optimal classifier in binary classification with 0-1 loss in the CVaR case?

__ Correctness__: Yes. The ERM with CVaR framework is correctly defined, and then analysed.

__ Clarity__: The paper is very well written.

__ Relation to Prior Work__: Yes, related works are discussed and existing results are recalled.

__ Reproducibility__: Yes

__ Additional Feedback__: Question 1: What is the optimal classifier in binary classification with the CVaR of the 0-1 loss? Is it a variant of the Bayes classifier with the auxiliary random variable Y (from the reduction to expectation)? --> answered in the rebuttal: it is the “new” SVM
Question 2: More practically, could this reduction-to-expectation trick be used for optimization purposes, e.g., with the Stochastic Gradient Descent algorithm? --> this reduction is for now a theoretical trick: it is not straightforward to leverage it for optimization purposes
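
As a partial answer to Question 2, note that the classical optimization route for CVaR is not the paper's reduction but the Rockafellar-Uryasev variational formula, CVaR_alpha(L) = min_c { c + E[(L - c)_+] / alpha } (with alpha the tail mass), which turns CVaR minimization into a joint (sub)gradient descent over the model parameters and the auxiliary level c. A minimal sketch on a toy regression problem (Python/NumPy; all names, data, and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2*x + noise, squared loss.
X = rng.normal(size=500)
y = 2.0 * X + rng.normal(scale=0.5, size=500)

alpha = 0.1    # tail mass: target the worst 10% of losses
w, c = 0.0, 0.0
lr = 0.02

for _ in range(1000):
    resid = w * X - y
    loss = resid ** 2
    tail = loss > c                        # samples above the current level c
    # Subgradients of the objective c + mean((loss - c)_+) / alpha
    grad_w = np.mean(2.0 * resid * X * tail) / alpha
    grad_c = 1.0 - np.mean(tail) / alpha
    w -= lr * grad_w
    c -= lr * grad_c

# w should approach 2, and c the empirical (1 - alpha)-quantile of the loss.
```

This is not the reduction discussed in the paper, but it illustrates that CVaR objectives are already optimizable in practice; the open question is whether the paper's auxiliary variable Y offers anything beyond this classical route.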

__ Summary and Contributions__: This paper provides statistical insights into learning with the Conditional Value at Risk (CVaR). It provides concentration inequalities for the CVaR when the loss function is bounded or unbounded (resulting in a sub-exponential random variable). In the bounded case, the authors derive the main result of the paper, which is a sharp PAC-Bayesian generalization bound for learning with CVaR.

__ Strengths__: The contribution of this manuscript is theoretical and definitely relevant for the NeurIPS community. As far as I know, the results are novel (sharper than previous ones) and sound.
The paper is very well written and presents a technical work pedagogically (I appreciate this initiative greatly and I thank the authors for it). This is a valuable work.

__ Weaknesses__: As far as I am concerned, the only flaws of the paper are three typos:
- Line 23: "uniform converges arguments" should presumably read "uniform convergence arguments".
- Line 254: exp is missing in the lhs.
- Line 303: "Appendix E" refers actually to Appendix B (LaTeX compilation bug).

__ Correctness__: Everything seems correct.

__ Clarity__: The paper is very clear.

__ Relation to Prior Work__: Relation to prior work is adequately addressed.

__ Reproducibility__: Yes

__ Additional Feedback__: I thank the authors for addressing my comments in their rebuttal.