Paper ID: | 1932 |
---|---|
Title: | Learning Sample-Specific Models with Low-Rank Personalized Regression |

This paper presents a model for personalized regression, i.e. it fits an individual prediction model for each sample rather than a single model for a group of samples. This is clearly useful in personalized medicine, but could also be applied in other settings, e.g. voter behaviour, as shown in the paper. The method uses a linear model (although it could be extended to GLMs). Estimation is made feasible by reducing the p × n space of parameters to a lower-dimensional subspace via factor analysis. Additionally, regularization between the parameter vectors of different samples is applied via a learned distance metric consisting of a weighted sum of covariate-specific base distances. Prediction for unseen data points is achieved via a nearest-neighbour approach using the learned distance function. The authors test the approach in a simulation study and apply it to three real-world datasets from finance, medicine and voting records.
The method seems original, and I am not aware of other approaches that share all of the characteristics of the one presented here. Regarding quality, the method is well-motivated and robust, and the evaluation is thorough. In particular, I appreciate that the authors compared with a number of different methods, including a mixture regression model and a deep neural network. The simulation study could have been a bit more extensive; in particular, an exploration of the effect of varying n and p is missing.
The work is well-presented and explained, and I had no issues following the details, with one exception: the subgradient approach requires (of course) the gradients of the objective function with respect to the parameters. These have not been included in either the main text or the supplementary material, as far as I can tell. Perhaps they are easy to derive, but including them would aid reproducibility.
In terms of significance, the improvement over the compared methods is certainly impressive on some of the real-world datasets. Overall, I would say that the method presents a number of incremental improvements over existing methods that, taken together, amount to an important contribution to the literature.
Edit after author response: The authors have provided a detailed response which addresses several of my concerns, including expressions for the subgradients and a more detailed simulation study. As a consequence, I have increased my score to an 8.
Minor points:
- For the prediction, it seems that only one nearest neighbour is used to predict the response for an unseen data point. This seems prone to error (what if the model for that nearest neighbour is not estimated well?). Could the authors discuss whether there are ways to make the prediction more robust, e.g. by considering several nearest neighbours?
- For the election dataset, the MR model seems to be doing exceedingly badly (much worse than a mean estimate) if the negative R^2 is to be believed. Why is this?
- Reproducibility checklist: The authors say that they have included "an analysis of the complexity". I could not find this.
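To make the first minor point concrete, here is a minimal sketch of what a k-nearest-neighbour variant of the prediction step might look like. It is not the authors' implementation: the function names, the absolute-difference base distance, and the plain averaging of neighbour parameters are all assumptions made for illustration. With k = 1 it reduces to the single-neighbour rule described in the review.

```python
from typing import List, Sequence


def weighted_distance(u: Sequence[float], v: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted sum of per-covariate base distances.

    Assumption: the base distance for every covariate is the absolute
    difference; the paper may use other covariate-specific distances.
    """
    return sum(w * abs(a - b) for w, a, b in zip(weights, u, v))


def knn_personalized_predict(x_new: Sequence[float],
                             u_new: Sequence[float],
                             train_covariates: List[Sequence[float]],
                             train_params: List[Sequence[float]],
                             weights: Sequence[float],
                             k: int = 3) -> float:
    """Predict with the average of the k nearest samples' parameter vectors.

    Hypothetical helper: neighbours are ranked by the (assumed) learned
    distance on covariates, and their regression parameters are averaged
    before applying the linear model to x_new.
    """
    order = sorted(
        range(len(train_covariates)),
        key=lambda i: weighted_distance(u_new, train_covariates[i], weights),
    )
    nearest = order[:k]
    p = len(x_new)
    avg_params = [
        sum(train_params[i][j] for i in nearest) / len(nearest)
        for j in range(p)
    ]
    return sum(b * x for b, x in zip(avg_params, x_new))
```

Averaging over several neighbours trades some personalization for lower variance, so a single poorly estimated neighbour model has less influence on the prediction.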

This paper presents a novel method to estimate mixture models by matching structure in sample covariates. It is clearly written and easy to understand. The proposed method is quite straightforward, and therefore its originality may not be strong enough. In addition, since the model is very complicated, it might suffer from overfitting on noisy data. It is also hard to scale and apply the proposed method to big data using today's popular cloud computing infrastructure.

Overall, the manuscript is well written and very accessible. The authors introduce a nice idea to personalize prediction models based on individual samples. They avoid over-fitting by constraining the matrix of personalized parameters to be low-rank. Furthermore, they introduce a regularization scheme as a second option, encouraging model parameters to be similar if the covariates are similar. The manuscript could benefit from drawing a connection to unsupervised domain adaptation methods, perhaps at the point where the distance matching approach is introduced.
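The two ingredients summarized above, the low-rank constraint and the distance-matching regularizer, can be sketched as follows. This is an illustrative reading of the description, not the paper's exact formulation: the names `Z` and `Q`, the absolute-difference covariate distance, and the squared-mismatch form of the penalty are assumptions.

```python
import math
from typing import List, Sequence


def personalized_params(Z: List[Sequence[float]],
                        Q: List[Sequence[float]]) -> List[List[float]]:
    """Build one parameter vector per sample from low-rank factors.

    Assumed shapes: Z is n x q (one latent vector per sample) and Q is
    p x q (shared loadings), so beta_i = Q z_i has length p. The model
    then stores n*q + p*q numbers instead of n*p.
    """
    return [
        [sum(Q[j][k] * z[k] for k in range(len(z))) for j in range(len(Q))]
        for z in Z
    ]


def distance_matching_penalty(betas: List[Sequence[float]],
                              covariates: List[Sequence[float]],
                              weights: Sequence[float]) -> float:
    """Penalize mismatch between parameter distance and covariate distance.

    Assumed form: sum over sample pairs of
    (||beta_i - beta_j||_2 - d(u_i, u_j))^2, where d is a weighted sum of
    per-covariate absolute differences. Samples with similar covariates
    are thereby pushed toward similar parameters.
    """
    total = 0.0
    n = len(betas)
    for i in range(n):
        for j in range(i + 1, n):
            param_dist = math.sqrt(
                sum((a - b) ** 2 for a, b in zip(betas[i], betas[j]))
            )
            cov_dist = sum(
                w * abs(a - b)
                for w, a, b in zip(weights, covariates[i], covariates[j])
            )
            total += (param_dist - cov_dist) ** 2
    return total
```

In this sketch the penalty is zero exactly when pairwise parameter distances equal the weighted covariate distances, which is one way to read "model parameters should be similar if the covariates are similar".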