NIPS 2017
Mon Dec 4th through Sat the 9th, 2017 at Long Beach Convention Center
Paper ID: 1492 Multitask Spectral Learning of Weighted Automata

### Reviewer 1

Multitask Spectral Learning of Weighted Automata The paper proposes an approach to estimate a common representation of multiple weighted finite automata (WFA), via multitask spectral learning. To do so, the multitask problem is formulated as novel vector-valued WFAs, from which a joint feature space is extracted using spectra learning. The method is evaluated in synthetic (randomly generated WFAs) and real data (task-related learning using 33 languages), where the multitask approach is shown to outperform single task learning. This paper appears to be a first attempt to the multitask approach for WFAs. One major comment is perhaps a somewhat weaker description of the context and motivation. The approach is immediately presented with a spectral learning (SL) of WFAs, without spending too much time on when, where and why SL-WFA is used, in what context - nor on referencing the advantages and limitations of other competing algorithms in that context. A small related work section is provided, but may not be enough convincing on why the proposed approach is better. The evaluation is also comparing SL-WFAs against (M)SL-WFAs. This is nice to evaluate single vs multi-task, but give no information on why one should use the proposed approach, ML-SL, over other methods. The paper, however, presents a nice methodology for estimating a joint feature maps, with potential applications where finite-state machines are used (language-related tasks are illustrated). The methodology is well supported with upper bound conditions and an analysis on computational complexity.

### Reviewer 2

In this paper, the authors have proposed to estimate m weighted automata based on spectral learning algorithms applied to the associated Hankel tensor. Specifically, they apply rank factorization via SVD to the Hankel Tensor. They also how this factorization can then be used to learn individual functions. I have the following comments: Is the factorization technique proposed here for Hankel tensor a novel contribution of this paper? In Definition 2, the authors introduce a measure of relatedness tau. Is there any specific reason for this specific definition? I wasn't able to relate it to other quantities in the paper; more detail on the purpose of defining this quantity would be very helpful. Would it dictate the choice R - the common rank? I was hoping that the benefit of estimating multiple functions together would be apparent through dependence of error measures on tau in Theorem 5 (maybe in a more general version of it) or computational complexity. Specifically, it seems the computational complexity is worse than when each individual WFAs are estimated separately. Can the relatedness not be leveraged in any way? I certainly feel that it would be very interesting and illuminating to see a more general version of Theorem 5. If the WFAs are minimally related - would the performance be worse than doing it individually? Specially, in simulation what if d_S were 0 or at least less than d_T. It seemed concerning that even when there is commonality - if d_S and d_T are same MT-SL seems to offer no benefit compare to simple SL. In showing the benefit of multiple learning at least for maximally related WFAs, the authors note that the estimation error for enough tasks would be O(T^2) compared to O(T) if done individually - where T is || E ||_F/s_R(H). I couldn't follow why O(T^{2}) would be better that O(T); is T <= 1?