__ Summary and Contributions__: In this paper, the authors consider the Generalized Min-Sum-Set-Cover (GMSSC) problem under an online learning setting. Due to the NP-hardness of the problem, the current non-regret methods are computationally inefficient. The main contributions are 1. The authors prove that there exists an algorithm for online GMSSC in polynomial-time meanwhile obtains a low regret. 2. A key idea of the algorithm is to relax the original problem (in the space of permutation) into the space of doubly stochastic matrices. Thus a polynomial-time algorithm exists.

__ Strengths__: 1. The authors consider the GMSSC problem under an online learning setting, which is new and theoretically interesting.
2. The theoretical contributions to this new problem are solid.
3. The proposed method can run in polynomial-time with a low regret guarantee. The idea of relaxing the original problem into the space of doubly stochastic matrices is technically interesting.

__ Weaknesses__: 1. The designed algorithm may not be empirically efficient. At each iteration, the key step of Line 5 of Algorithm 1 needs to solve a convex problem based on the ellipsoid method, which is impractical especially when n goes large. The problem of online learning setting is motivated from the real-world application where each iteration (a small response time) needs to be very efficient. For example, in the application of web search, the parameter n is usually very large or maybe larger than T in some cases?
2. It would be more interesting if there is any simulation study and check how these regrets decrease as the time T goes large.

__ Correctness__: Claims and methods are correct. The empirical methodology is also correct.

__ Clarity__: The presentation is clear. The paper is well written.

__ Relation to Prior Work__: Yes. The authors clearly discussed the related work in the Introduction section.

__ Reproducibility__: No

__ Additional Feedback__: In general, the authors propose new methods for an important problem under the online learning setting. The theoretical contributions are solid. The presentation is clear.
However, my only concern is from a practical point view. I would argue that the designed algorithm may not be empirically efficient. At each iteration, the key step of Line 5 of Algorithm 1 needs to solve a convex problem based on the ellipsoid method, which is impractical especially when n goes large.
Q1. I was wondering if it is possible to reduce the approximation complexity \mathcal{O}(n^5.5)} due to the norm of approximated subgradients. Could the authors discuss a little bit more about this?
One typo:
Typos: Decent -> Descent

__ Summary and Contributions__: The paper presents new algorithms utilizing online projected gradient descent(OPGD) that solves generalized min-set sum cover (GMSSC) problem in polynomial time with approximately 28-regret and solves min-set
sum cover (MSCC) problem in polynomial time with approximately 11-regret. Compared with previous result that focuses on offline problems, this paper focuses on online version and shows even in online set up, low-regret solution can still be obtained in polynomial time.
--- Update based on the Author Response
The author addressed some of my concerns, but my overall assessment of the paper remains the same.

__ Strengths__: The novelty lies on the use of OPGD and how this can be translated into a permutation of items for online requests which yields better solution (in the sense of lower regret) than previous results. The proof of the theoretical result is rigorous as well, ranging from the main result to showing some key steps can be computed in polynomial time.

__ Weaknesses__: Some of the results in the paper can be viewed as minor improvement/modification over the result from ``Preemptive and non-preemptive generalized min sum set cover'' written by Im S, Sviridenko M, Zwaan R (2013) and ``A note on the generalized min-sum set cover problem'' written by Martin Skutella and David P.
Williamson(2011). From this perspective, there is indeed novelty, but the novelty is not significant enough to be viewed as strong submission.

__ Correctness__: Yes.

__ Clarity__: There is much room for improvement in writing. Several typos can be found in the proof, for example in the proof of Lemma 7, there are several summations missing; in the proof of Lemma 8, there is e ∈ S which should be e ∈ R, in the proof of Lemma 9, the last inequality shall follow Lemma 7 instead of Lemma 8.
There are also typos in the statement of Lemma 5 and in other places in the proof. For audience not familiar with this field, this will lead to confusion in reading.
Besides, more explanation can be given instead of purely stating ”as a result” or ”from manipulation of definition”.

__ Relation to Prior Work__: Yes, the paper gives the result of previous literature and show how it differs from previous ones.

__ Reproducibility__: Yes

__ Additional Feedback__: The writing can be improved. Besides, it might be a good idea to discuss the coefficients (5.03 in algorithm 3 and 1.67 in Algorithm 4).
Also, the authors obtained the coefficient in algorithm 4 through optimizing4 z/(1−2e^{-z}). However, for different K(R), there might exist
a different optimal coefficient. A natural question is whether a general principle exists. In addition, I will recommend cleaning up the writing not only to correct for the typos but also to explain the proof in more details.

__ Summary and Contributions__: The authors consider online preference aggregation - wherein a set of preferred items $R_1, ..R_t$ is given in each round, along with the demand for any $k_t$ items within the set $R_t$. The learner maintains a ranking $\pi_t$ of items. The objective is to minimize the rank at which the $k_t$th item from $R_t$ appears in the ranking $\pi_t$. This problem is the online version of the generalized min sum set cover (GMSSC) problem, which as a special case subsumes the well-studied min sum set cover (MSSC) problem. The MSSC is NP-hard and inapproximable beyond a factor of 4. To develop a no-regret online learning algorithm, the authors learn over doubly stochastic matrices (instead of the permutahedron) using the MWU algorithm and consider a relaxation for the GMSSC that approximates the actual objective with a constant factor, thereby obtaining a polynomial time algorithm.

__ Strengths__: The GMSSC problem is interesting in that it goes beyond learning over rankings, and applies broadly to many preference aggregation settings. The minimum linear ordering problem is challenging, and MSSC is a special case of this. Learning over doubly stochastic matrices instead of the convex hull of permutations is quite standard, however, the change of the objective to get a convex relaxation to the GMSSC objective is interesting.
--------
-------
EDIT: I thank the authors for their response.

__ Weaknesses__: 1. In the definition of the fractional access cost, the authors assume a fixed accuracy parameter, \epsilon, and ignore dependence of FAC_R(A) on \epsilon. What is the dependence on \epsilon?
2. Writing in unclear. What is the fractional access cost linear program? Why is it exponential in number of constraints? Is this the only possible formulation? Where does the n^4/\epsilon term come from? Why is there a 5.03/\alpha in the conversion of doubly stochastic matrices to probability distribution over permutations? Why doesn't any conversion that respects the marginals suffice? I suspect claim 1 in the appendix is crucial to understanding why $n^4/\epsilon$ is the right scaling.
3. What do the algorithms presented here give for the special case of $|R_t|=1$, i.e. the setting in Seuhiro et a. [39]?
4. Can the relaxation of the GMSSC objective over the doubly stochastic matrices improve some known approximation bounds for the minimum linear ordering problem or its special cases, i.e. what is the implication on offline optimization?

__ Correctness__: The key claims of the paper look correct indeed, however I am not completely sure about the constants.

__ Clarity__: The linear programs, discussions, notation, rationale for choice of parameters - could have been written much more transparently and clearly.

__ Relation to Prior Work__: Yes, related work is thorough.

__ Reproducibility__: Yes

__ Additional Feedback__: Some more explanations of why MSSC results would not improve on the constants would be good, they are currently mentioned fleetingly in footnotes and do not have enough detail.

__ Summary and Contributions__: The paper studies a variant of online ranking problem. In the offline settings preferred items belong to different groups, and one need to generate a sequence of items so that qualitatively speaking for each group R_t of items at least $k_t$ elements appear high in the list. More formally they formulate the problem as an online version of the known "Generalized Min-Sum Set Cover" (GMSSC) task: In this problem, given a set $U = {1,...,n}$ of $n$ items, in each step
1. The learner selects a permutation $\pi_t$ items in $U$
2. The adversary selects a request $R_t \subset U$ with demand $k_t$.
3. The learner incurs a cost equal to the access cost of $R_t$ with respect to $\pi_t$ which is defined as smallest index $i$, so that at least $k_t$ elements of $\R_t$ appear in the first $i$ elements of $\pi$.
The goal is to minimize the multiplicative regret, that is the ratio of the total cost to the cost of selecting a fixed optimal permutation $\pi*$ in all steps. a strategy has $alpha$-regret, if its cost is at most alpha times the optimum.
The problem has been mostly studied in the offline setting before, and there exists an $O(1)$-approximation algorithm for that. The only previous result for the online settings is given by [1], which shows a $1$-regret (just an additive regret) strategy exists, however, that strategy can not be implemented in a polynomial time. It is largely because the the exponentially large action space of the problem (n! permutations). In general such a result can not be obtained as even in the offline version it is NP-hard to get a better than $4$-approximation algorithm. It is largely because the the exponentially large action space of the problem (n! permutations). They first map the permutation space into the space of doubly stochastic matrices, thus offering a continuous relaxation of the problem. Then, they can achieve a $1$-regret algorithm for the continuous problem using fairly standard techniques. The most technical part of the paper is to project the continuous solution back into the permutation space. They give O(1)-approximation rounding scheme which leads to a O(1)-regret algorithm for the original problem.
1. Dimitris Fotakis, Loukas Kavouras, Grigorios Koumoutsos, Stratis Skoulakis, and Manolis351 Vardas. The online min-sum set cover problem. ICALP 2020

__ Strengths__: I think the ideas are very theoretically interesting; I expect that the proposed relaxation can be exploited for other variants of the ranking problem, and more generally other online algorithms where the actions space is the permutation space.

__ Weaknesses__: My main critisim to the paper is the lack of empirical results. The problem seems very practical especailly in recommendation systems, but the paper does not provide any empirical results for their method. The description of the algorithm is clear and the enough details are provided for reader to implement their method, though.
Edit: This is partially addressed in the authors' feedback.

__ Correctness__: Theoretically I verified the proofs. No empirical results is provided.

__ Clarity__: The results and proofs are well-explained.

__ Relation to Prior Work__: yes.

__ Reproducibility__: Yes

__ Additional Feedback__: