
Submitted by Assigned_Reviewer_41
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Summary
This paper proposes a recommendation algorithm based on reinforcement learning, which pursues both exploration and exploitation in an online setting. Also, the authors introduce a theoretical framework analyzing the proposed algorithm. Basic assumptions in this paper are 1) the users are (clearly) grouped into k typical user types, and 2) likable items are clearly separated. Under these assumptions, this paper proved that the proposed algorithm performs well.
Contribution
As the authors claimed, there has been a lot of online recommendation systems in literature, but there was few theoretical analysis about them. Although the proposed algorithm is quite simple, it is indeed meaningful to have a theoretical frame to anlayze a recommendation system in an online setting.
Some issues
1. It is well known that users (or items) are grouped into several categories. This paper makes use of this fact (or hypothesis) for analyzing the algorithm, not for the algorithm itself. How can we make use of this in recommendation task itself?
2. When users (or items) are clustered, we usually think of a topdown clustering, without allowing for each user or item to be a member of more than one clusters. A recent paper below finds a similar set of users or items in bottomup fashion. How can we apply the proposed framework when clusters can overlap? J. Lee, S. Kim, G. Lebanon, Y. Singer. Local LowRank Matrix Approximation, ICML 2013.
3. In algorithm 1, alpha is defined as some value in (0, 4/7]. How this particular value 4/7 was decided?
4. The definition of \epsilon_R(n) and \epsilon_J(t) make sense as they are, but it seems no evidence that this actually works better than constants for those. Can you prove or show by experiment this? Also, did you try to find the best \alpha? The performance may be affected by the choice of \alpha, so including the experimental result varying \alpha would be useful.
5. (minor) Three lines in Figure 1(b) are not distinguishable when printed by blackandwhite printers. Can you change the shape of each line?
6. (minor) Line 413: [4] study > [4] studies (or studied)
7. (minor) Line 419: Another related work is by [12], who study > Another related work is [12], which studied Q2: Please summarize your review in 12 sentences
This paper proposes a simple online recommendation algorithm and a theoretical analysis for the proposed method. I believe this paper is worth to be published in NIPS. Submitted by Assigned_Reviewer_44
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
** COMMENTS AFTER AUTHOR FEEDBACK **
This is a high quality paper.
Minor comment. Not addressed in the rebuttal is
"1. The definition of neighbourhood is very unclear, because of the use of “jointly”. Line 178 ... contradicts Lemma 8’s proof (Line 576)."
which might be good to clarify in the paper.
 Disclaimer: the reviewer is not an expert in Learning Theory, but in Collaborative Filtering. In assessing the paper’s theoretical significance, he/she will submit to the viewpoint of expert Learning Theorists.
MAIN IDEAS OF THE PAPER:
We know that collaborative filtering works in practice: users who buy A also tend to buy B. The paper takes a bold stab at providing theoretical guarantees on how well it works, albeit in a restricted and slightly hypothetical setting. The hypothetical setting is a simplified sketch of reality, necessary to obtain a theoretically analysable handle on the problem.
The paper analyses a very simple collaborative filtering algorithm, where collaborative filtering relies on looking for a set of neighbouring users with the most similar like/dislike patters (within some cosine distance) and using their extra like/dislike votes to score items to provide next item to a user. The goal is to maximize the expected number of liked items presented to each user within a fixed time window of 1 to T recommendations.
The paper shows that the users are “learnable” after around log(km) = log(number of user types) + log(number of items) rounds of recommendations (up to log^{7/3}(km), depending on a parameter choice). Essentially in time logarithmic in the number of items, roughly optimal performance can be expected. The whole setup relies on some conditions, like having enough users.
The result relies on two additional steps in the collaborative filtering algorithm: (1) random exploration, and (2) structured exploration. Both are necessary, (1) to interrogate potential future items for recommendation, and (2) to have enough of an overlap in like/dislike items to learn about the similarity between users.
The experimental results try to simulate a real scenario by streaming a curated subset of the Netflix prize data to the data set’s users. In the simulation, the average cumulative reward is higher than those of two existing methods (PAF and DM, incidentally neither of which has ever been used in practice, to my knowledge and those of peers).
RELATIONSHIP TO PREVIOUS WORK:
The paper is set apart by using two exploration steps, one of which learns the similarity between users (setting it apart from standard multiarm bandits). Furthermore, no arm can be pulled more than once.
QUALITY and CLARITY:
The paper is very well explained, and beautifully written.
I worked through most of the proofs in the supplementary material, and they seem technically correct.
I would like to comment on some textual issues for the benefit of the authors:
1. The definition of neighbourhood is very unclear, because of the use of “jointly”. Line 178 suggests that *only* useritem indexes that were recommended in the joint exploration step in the algorithm are eligible for defining neighbours (and hence cosine similarity). Why are other overlapping items not considered in defining the neighbourhood? The confusion arises as Line 178 is intentional, but if we read Lemma 8’s proof (Line 576), it contradicts Line 178, as now jointly means the items that two users rated in common, and *not* the subset of that set, which was tagged with “jointly” in your Algorithm (see Line 157).
2. Sec 2, Lines 103 and others. The paper exclaims the point that one should not recommend a consumed item to a user again (especially a disliked one). This is rather obvious in the recommendations community, and I assume you drive home the point to distinguish your work from a standard multiarm bandit setting?
3. Typos. Line 165 should refer to \tilde{p}_{ui} and not index j? Line 524 (Lemma 5’s proof). \bar{Z}_s = 1 / s^\alpha – Z_s would give a zero mean. Line 610 [unnecessary inequality in proof] Line 634 user u has is...
ORIGINALITY and SIGNIFICANCE:
This paper is the first attempt (to my knowledge) to characterize learning rates in a (pseudo) real recommendations setting.
The (new) practical takehome message is that joint exploration is required for the algorithm (and maybe more generally?) to achieve optimality in time logarithmic in the number of items.
PRACTICAL WEAKNESSES TO BE AWARE OF:
To achieve the main results (that is, how long and how much should one explore and exploit before attaining optimal performance) the proofs and algorithm rely on a number of assumptions. These assumptions already give useful insights (like how many users one needs relative to an item catalogue size and tolerance), but from a practitioner’s viewpoint, they break down on many levels.
1. The assumption that users belong to each type with odds 1/k is invalid. In reality, these are typically according some power law.
2. Random (and joint) exploration, whilst theoretically appealing (and necessary) is a dangerous game, as it can hurt user retention in systems like those described in the Introduction. The algorithm doesn’t make provision for a user never returning because of it.
3. From calibration plots in real systems, we know that condition A1 does not hold in practice. There are pairs for which the odds are really, well, coinflip!
These are not criticisms, but differences to be aware of.
Q2: Please summarize your review in 12 sentences
A beautifully written paper that shows that the users are “learnable” after around log(number of user types) + log(number of items) rounds of recommendations, given a simple algorithm. The result proof relies on a number of conditions to be met, which are essentially a “simplification of the real world”, and tells us that structured (joint) exploration of users is a requirement.
Submitted by Assigned_Reviewer_45
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Quality and originality: The focus on the online setting in recommendation is welcome — this is a genuinely challenging issue in domains like movies, books and music — and I find the introductory material to be of high quality. The performance guarantees and proofs seem correct to me, but I’m not at all an expert in that area. The CollaborativeGreedy model itself is less convincing. Though it outperforms two other models, the datasets used are so small that it’s hard to get an intuition as to how it would work for more realistic datasets (Netflix prize, Million Song dataset). The authors show awareness of this pint in observing that a full validation would require an actual interactive online recommendation system. Also, the point of the paper seemed more to be a theoretical exploration of possibilities, and the datasets seem large enough to support the authors’ claims. I find the final claim of the paper to be particularly thoughtful: that two types of exploration are useful for learning mixed distributions in an active learning setting.
Clarity and Significance: This paper was pretty far from my area of research. I found the paper to be “locally clear” (i.e. I think I understood each individual paragraph) but I fear I’m missing some of the bigger picture. Q2: Please summarize your review in 12 sentences
This paper explores the online setting in collaborative filtering, presents a model and learning problem for online recommendation, provides performance guarantees (with proofs) and discusses some experimental results. Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point. We thank the reviewers for their feedback. Their time and effort are much appreciated.
We briefly recall the motivation behind our model and analysis. Our paper focuses on understanding when and why a simple cosinesimilarity collaborative filtering method works in the online setting for recommendation systems. In such systems, items are recommended to users over time, and the preferences of users are revealed as a function of what they were recommended. Thus, an algorithm must learn about the users while simultaneously providing good recommendations to ensure a pleasant user experience. This classic exploreexploit tradeoff is the central theme of research on multiarmed bandit problems, and plays a crucial role in recommendation systems. We add the constraint that an item consumed by a user cannot be recommended to that user again, and show that having users of relatively few types enables collaborative filtering to be useful. Our analysis captures both the exploreexploit and collaboration aspects of the problem and provides a theoretical guarantee for a simple collaborative filtering algorithm. We crucially use two types of exploration, one to explore user similarity, and one to explore the space of items.
Question: Why does the algorithm not explicitly make use of user clusters?
We use a simple model in which users originate from a few different groups. There are existing algorithms that can do clustering. For example, Bayesian clustered tensor factorization estimates both user clusters and item clusters. More simply, one could cluster the users based on their revealed item ratings using cosine distance. Without trying to explicitly solve this clustering problem, however, we find that a cosinesimilarity collaborative filtering method can still work well, provided that the different groups are "separated" enough in terms of cosine similarity (this is our "incoherence" condition A2) and that the ratings are not too noisy (this is our "no ambiguous items" condition A1). Thus, the nearest neighbors found in estimating the probability of a user liking an item are, after enough time steps, mostly from the same cluster as the user. In this sense, the algorithm does have a "clustering" component  it is just local to each user.
General clarification regarding model assumptions:
What happens when our theoretical assumptions A1 and A2 break down? In practice, if two clusters of users are not well separated in cosine similarity (i.e., A2 fails), then it may make sense to treat the two clusters as one cluster. Thus, the incoherence condition can be interpreted as defining the resolution used to differentiate user clusters. Meanwhile, we could have items that have a high amount of noise (i.e., A1 fails). For example, an item could have probability 1/2 of being liked by any user cluster. In this case, we would quickly have the estimate that the item has probability of being liked close to 1/2. We could then modify the algorithm to handle "too noisy" items in a special way, such as exploring instead. Note that without any special treatment of such "too noisy" items, during an exploitation step, since we pick the item with highest estimated probability of being liked for a user, we will avoid picking a "too noisy" item unless there are no other items with higher estimated probability of being liked.
Question: Is it possible to allow users to belong to multiple clusters?
Allowing a user to be a member of multiple clusters would be a nontrivial extension to our current theoretical setup. (Note, however, that the algorithm itself could still work without any changes, since it does not actually explicitly try to find the global clusters that explain all the users.) There are at least two ways for users to belong to multiple clusters: 1) the user’s item preference vector is a convex combination of the item preference vectors of two or more clusters, or 2) the user behaves like one cluster for some of the items and like another cluster for other items. An inelegant solution that handles both cases is to simply explode the number of clusters to account for these mixtures. A more refined solution is to think of clustering in a hierarchical fashion. As time elapses, the algorithm teases apart finer and finergrain clusters up to some resolution limit. The analysis would become quite a bit more involved.
Question: What is the significance of 4/7 in the definition of alpha?
Regarding decay factor alpha, taken in (0, 4/7]. The 4/7 is an artifact of the proof of Lemma 5 (in the supplementary material). It can actually be different from 4/7 (in which case the constant in the decaying exponential would no longer be 1/20 toward the end of the proof of Lemma 5). In the supplementary material A.3, we give a slightly more detailed explanation of the experimental setup in which we did try different values for alpha on training data. The one that worked best with training data (alpha = 0.5) was then applied to test data.
Question: Why do the exploration probabilities change over time?
The exploration probabilities change over time or as a function of number of users. Some of the proofs would be simpler if the exploration probabilities were constant. Making the explorations decay over time makes the algorithm performance robust to changes in model parameters: not too much more exploration than is necessary occurs due to the decay. We suspect that in practice, for very large datasets with massive amounts of items or especially if the number of users and items are growing over time, having a constant probability for each exploration may be desirable as there are always items that are unexplored and more user structure that we need to uncover.
 