NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal
Paper ID: 2784 Learning to Multitask

### Reviewer 1

This paper proposes a new learning framework to identify an effective multitask model for a given multitask problem. This is done by computing task embeddings using a graph neural network. These embeddings are used to train a function that estimates the relative test error, based on historical multitask experience.

Quality: The technical content of the paper is well explained. The experiment section is very dense (probably due to lack of space) and could be better organized with some subsections.

Clarity: The paper is generally well written and clearly structured. Figures 2 and 3 are far too small and cannot be read in a printed version (again, probably due to lack of space).

Originality: The main idea of this paper is very interesting and the empirical results look encouraging. I am not convinced by the argument given between lines 320 and 325. Even if this experiment is interesting, showing that other functions lead to worse performance does not really "demonstrate the effectiveness of the proposed functions"...

Significance: The model is evaluated on standard multitask benchmarks and compared with many baseline systems. The paper clearly demonstrates the value of this model for identifying a good multitask model for a new multitask problem.

### Reviewer 2

This paper proposes a learning to multitask (L2MT) framework to identify a suitable multitask model, under a unified formulation (1), for a multitask problem. The paper is well written and easy to follow. Unlike traditional model selection approaches (e.g., cross validation) for determining a suitable multitask model for a multitask problem, the proposed L2MT provides a novel learning approach to this issue. The paper is novel in the following aspects:

1) The proposed LGNN is used to learn a dataset embedding for a task. This is totally different from traditional approaches, which rely on manually designed features to represent a dataset. With the LGNN, the training process of L2MT becomes end-to-end, which alleviates the tedious feature engineering process.
2) Based on the unified multitask formulation (1), each multitask model is parameterized by the task covariance matrix $\Omega$. With such a continuous parameterization, the testing process can learn a novel $\Omega$ beyond all the known multitask models presented in Section 2, which makes it possible to learn a better multitask model than any of the existing ones. I think this is one reason why the proposed L2MT can outperform all baseline models in the experiments.
3) The construction of the estimation function is based on the kernel function, which is easy to understand and also facilitates the optimization in the test process. The interesting idea of introducing a link function to transform labels simplifies the construction of the estimation function.
4) The experiments are thorough and present different aspects of the proposed L2MT.

I have some minor questions:

(1) In Theorems 1 and 2, the Schatten a-norm and the squared Schatten a-norm with 1<=a<2 are shown to be instances of formulation (1). What about other cases for a, e.g., 0<a<1 or a>=2? Can these cases be unified into formulation (1)?
(2) The generalization bound seems important to L2MT. It would be better to put the analysis in the main body instead of the supplementary material, though I understand this is due to the page limit.
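
For readers unfamiliar with the regularizer raised in question (1): the Schatten a-norm of a matrix is the $\ell_a$ norm of its singular values, $\|W\|_{S(a)} = (\sum_i \sigma_i(W)^a)^{1/a}$. The sketch below is only an illustration of that standard definition, not code from the paper; the matrix `W` (one column per task) and the helper name `schatten_norm` are hypothetical. It may help to see how the quantity behaves for values of a inside and outside the range 1<=a<2; note that for 0<a<1 the expression is only a quasi-norm, since the triangle inequality no longer holds.

```python
import numpy as np


def schatten_norm(W: np.ndarray, a: float) -> float:
    """Schatten a-norm: the l_a norm of the singular values of W.

    Hypothetical helper for illustration; for 0 < a < 1 this is a quasi-norm.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    # Singular values only; the singular vectors are not needed.
    sigma = np.linalg.svd(W, compute_uv=False)
    return float(np.sum(sigma ** a) ** (1.0 / a))


# Hypothetical parameter matrix: 5 features, 3 tasks (one column per task).
rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3))

# a = 1 gives the nuclear (trace) norm, a = 2 the Frobenius norm.
for a in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"a = {a}: {schatten_norm(W, a):.4f}")
```

The two familiar special cases give a quick sanity check: `schatten_norm(W, 1.0)` should agree with `np.linalg.norm(W, 'nuc')` and `schatten_norm(W, 2.0)` with `np.linalg.norm(W, 'fro')`.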