NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center

### Reviewer 1

In this work, the authors propose a framework for multi-task learning in which a latent task space is assumed, and the learner simultaneously learns the latent task representation and the coefficients for these latent variables, namely the Latent Task Assignment Matrix (LTAM). The authors further impose a block-diagonal structure on the assignment matrix and develop spectral regularizers for it. They then propose a relaxed objective that can be optimized via a scheme similar to block coordinate descent. The authors also provide generalization guarantees as well as an analysis of structure recovery performance. Simulation experiments show that the proposed algorithm can recover the true structure and improve prediction accuracy.

The framework is interesting and the results are meaningful. However, the paper is poorly written and hard to follow. The authors do not give enough explanation and discussion of their results, which makes them hard to digest. I think the paper needs major revision.

Some questions and comments:

- The whole framework is limited to linear models. Can it be easily extended to other hypothesis families?
- About optimization: the overall objective is non-increasing, which is nice. Can we say anything about the convergence rate? Also, since the objective is not jointly convex, what happens if the algorithm converges to a local minimum instead of a global one?
- In the definition of H(L, S, \tilde S, U), is there any restriction on U and \tilde S? I thought they must belong to some sets, as in the definition of Obj, but these are not mentioned in the definition of H.
- I can't find the proof of Theorem 3, which seems to be adapted from another paper. I'd still like to see how the adaptation is made, just to double check. Also, there is no \xi_3 in the generalization bound, which seems strange since it also defines the hypothesis set H.

------

I have read the other reviews and the authors' response, and most of my concerns are addressed. I'll increase the score to 6. However, I still think the paper can benefit a lot from adding more explanation of the motivation and of the theoretical results, e.g. the explanations given in the response.

### Reviewer 2

1. This paper is clearly written. The methodology section is information-dense, but the subtitles before each paragraph make it easy to read.
2. This paper formulates a regularization scheme for the negative transfer problem. The arguments follow rigorous math, which makes the main idea convincing.
3. The figures in this paper are nicely plotted. Fig. 1 demonstrates the structure recovery properties of the proposed regularization term. Fig. 2 shows that the spectral embedding does have strong semantic power, as promised in the paper.
4. A minor issue is that some details of the optimization process are missing. More elaboration should be provided to improve clarity (see also the Improvements part).

### Reviewer 3

1) This paper is well written; the goal and motivation are clearly stated and convincing. Overall it is easy to follow.
2) It is very nice to see that the methodology is rigorously organized around solid theoretical analysis and discussion. Moreover, the empirical studies are sufficient, with some interesting results and visualizations.
3) Last but not least, what attracts me most is the insight this paper offers. It does not merely give a new regularizer for leveraging heterogeneous block-diagonal structure. More importantly, it also offers a fresh perspective by bridging the Laplacian-based structural learning problem with the optimal transport problem. In my opinion, this suggests that leveraging the expected structural sparsity could be regarded as a matching problem between latent tasks and output tasks.
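
As a reading aid for the block-diagonal and Laplacian-based structure discussed in the reviews, here is a minimal sketch of the general spectral fact that such regularizers typically build on: a graph has as many connected components as its Laplacian has zero eigenvalues. This is not the paper's regularizer or algorithm. The function names `bipartite_laplacian` and `count_blocks`, the example matrices, and the choice to treat the assignment matrix as the biadjacency matrix of a latent-task/output-task bipartite graph are all assumptions made for illustration.

```python
import numpy as np

def bipartite_laplacian(S):
    """Laplacian of the bipartite graph whose biadjacency matrix is |S|.

    Rows of S are (hypothetical) latent tasks, columns are output tasks.
    """
    W = np.abs(S)
    k, T = W.shape
    # Symmetric adjacency matrix over the k + T nodes.
    A = np.block([[np.zeros((k, k)), W],
                  [W.T, np.zeros((T, T))]])
    return np.diag(A.sum(axis=1)) - A

def count_blocks(S, tol=1e-8):
    """Count (near-)zero Laplacian eigenvalues, i.e. connected components.

    Note: an all-zero row or column is an isolated node and counts as
    its own component.
    """
    eigvals = np.linalg.eigvalsh(bipartite_laplacian(S))
    return int(np.sum(eigvals < tol))

# Illustrative matrices (made up): 2 latent tasks, 4 output tasks.
S_block = np.array([[1.0, 0.5, 0.0, 0.0],
                    [0.0, 0.0, 2.0, 1.0]])   # two separate groups
S_mixed = np.array([[1.0, 0.5, 0.3, 0.0],
                    [0.0, 0.0, 2.0, 1.0]])   # one shared output task

print(count_blocks(S_block), count_blocks(S_mixed))
```

Under these assumptions, a matrix that is block-diagonal up to row and column permutations yields one zero eigenvalue per block, while a single cross-group entry merges the groups into one component.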