Submitted by
Assigned_Reviewer_4
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This paper extends the popular "multi-task feature
learning" formulation to the infinite-task scenario: 1) Instead of
having a finite number of tasks, the problem has an infinite number of tasks
characterized by a continuous task parameter \theta (e.g., \theta could be
time t). 2) Instead of having each example belong to only one of the
finite tasks, each example can belong to the entire class of infinite
tasks, where its weight in a task is defined by w(\theta), where \theta is
the task parameter. 3) The authors show that it is possible to solve
this infinite-task feature learning problem because, under certain
conditions, the model parameters \beta(\theta) are piecewise linear in the
task parameter \theta.
The Originality, Quality, Clarity and
Significance are reviewed as follows.
Originality: The idea
of having infinite tasks characterized by a task parameter is novel and
interesting. The authors also discuss and study a few real-world
scenarios where the infinite-task formulation looks like a natural solution.
Clarity: Overall the paper is well written. The examples used to
illustrate the applications of the infinite-task formulation are clear and
interesting. The discussion/review of related work can be improved. One
way to view the presented work is that it imposes a strong known task structure
(characterized by a continuous task parameter) on the multi-task feature
learning formulation (in order to handle infinite tasks). In this sense, the
authors should discuss previous work that tries to infer (unknown) task
structure and incorporate it into multi-task feature learning, e.g.,
"Learning Multiple Tasks with a Sparse Matrix-Normal Penalty" at NIPS
2010. Can we combine the strengths of inferring unknown task structure and
handling infinite tasks?
Quality: The paper only extends a
very specific type of multi-task learning method, multi-task feature
learning, to the infinite-task case. The empirical study is
relatively weak, mostly based on simulation or small toy data. I would like to
see the proposed formulation applied to some significant
problem with real-world impact.
Significance: The
infinite-task learning idea is interesting and has potential applications.
Q2: Please summarize your review in 1-2
sentences
This paper extends "multi-task feature learning" to
infinite-task cases, where an infinite number of tasks are characterized
by a continuous task parameter and each example belongs (more or less)
to all tasks. The idea is interesting and has many potential applications.
The discussion of related work can be improved, and the authors may also
discuss extending more multi-task learning methods to infinite-task cases.
Submitted by
Assigned_Reviewer_5
Q1: Comments to author(s).
This paper presents parametric task learning (PTL), whose
formulation is motivated by multi-task feature learning. I do not think the
proposed method is a generalization of multi-task learning, since most
instances of parametric task learning are not multi-task learning problems.
For computational reasons, the type of loss function that can be used in
PTL is very limited, and this prevents its application to many problems.
The authors should present the detailed optimization procedure for
step 1 of Algorithm 2, since many readers may not be familiar with
parametric quadratic programming.
The experiments are not very
satisfactory. Experiments on toy data for non-stationary regression are not enough.
The authors are encouraged to do more experiments on real-world datasets.
In the experiments on cost-sensitive learning, the authors do not compare their
method with benchmark cost-sensitive methods, so I cannot say whether the
proposed method is good in terms of generalization performance. For
the joint quantile regression, the approach described in lines 365 to
372 is different from the method described in Section 4. Why use this
alternative approach? Moreover, the authors do not compare with other
quantile regression methods.
Q2: Please summarize your
review in 1-2 sentences
This paper presents parametric task learning,
motivated by multi-task feature learning. The experiments are not very
satisfactory.
Submitted by
Assigned_Reviewer_6
Q1: Comments to author(s).
The authors consider problems where tasks are related
by a continuous parameter. They show that jointly optimizing over the
range of the parameter can result in more consistent
classifiers/regressors and thus better performance.
The paper is
clearly written, and the main contribution is showing that under certain
assumptions, the entire path of solutions over the range of the parameter
can be computed via a parametric QP.
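As a toy illustration of the piecewise-linear path property mentioned above (this is not the paper's Algorithm 2), consider the one-dimensional parametric QP: minimize 0.5*z^2 - \theta*z subject to 0 <= z <= 1. Its minimizer is the unconstrained solution z = \theta clipped to [0, 1], so the path over \theta has breakpoints at \theta = 0 and \theta = 1. The function name `toy_qp_path` and the grid are assumptions made only for this sketch.

```python
import numpy as np

def toy_qp_path(thetas):
    """Minimizer of 0.5*z**2 - theta*z subject to 0 <= z <= 1, for each theta.

    The unconstrained minimizer is z = theta; the box constraint clips it.
    """
    return np.clip(np.asarray(thetas, dtype=float), 0.0, 1.0)

# Evaluate the path on a uniform grid (step 0.25) that contains both breakpoints.
thetas = np.linspace(-1.0, 2.0, 13)
path = toy_qp_path(thetas)

# Second differences vanish wherever the path is locally linear,
# so the nonzero entries mark the breakpoints of the path.
second_diff = np.diff(path, n=2)
breakpoints = thetas[1:-1][np.abs(second_diff) > 1e-12]
print(breakpoints)
```

Between breakpoints the solution is an affine function of \theta, so a path-following solver only needs to track where the set of active constraints changes.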
The consistency/quality of
the solutions is demonstrated on a non-stationary regression problem,
a cost-sensitive SVM problem, and quantile regression. (Note: for the SVM
benchmarks, it seems like the standard deviations overlap in Table 1. It's
probably honest to point that out.)
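A rough way to flag such overlaps is sketched below. It checks whether the one-standard-deviation intervals of two methods intersect. The function name and the numbers are hypothetical and are not taken from Table 1. Interval overlap is only a heuristic, not a significance test.

```python
def one_sigma_intervals_overlap(mean_a, std_a, mean_b, std_b):
    """Return True if [mean - std, mean + std] of the two methods intersect."""
    lo_a, hi_a = mean_a - std_a, mean_a + std_a
    lo_b, hi_b = mean_b - std_b, mean_b + std_b
    return lo_a <= hi_b and lo_b <= hi_a

# Hypothetical (mean, std) error rates; NOT values from Table 1.
close_pair = one_sigma_intervals_overlap(0.20, 0.03, 0.18, 0.02)
distant_pair = one_sigma_intervals_overlap(0.20, 0.01, 0.15, 0.01)
print(close_pair, distant_pair)
```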
Q2: Please
summarize your review in 1-2 sentences
Useful contribution, clean exposition.
Q1: Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
Dear Reviewers,
Thank you for your helpful
comments. Please find our answers to the questions below.
Reply to Reviewer 4
[4-1] Discussion/review on related work can be
improved.
We agree with the reviewer that an alternative
approach to learning infinitely many related tasks is to impose or learn a
restricted (e.g., parametric) model of task structure. In this view, the
novelty of our work is that a common shared structure for infinitely many
tasks can be identified in a fully nonparametric way with the help of
parametric programming. We will discuss this viewpoint and explain the
high-level difference from the suggested (and possibly other) related
works in the final version.
[4-2] The paper only extends a very
specific type of MTL method
Our high-level idea for handling
infinitely many tasks is the use of parametric programming. Although we
restricted our attention to problems that can be cast as
"piecewise-linear" parametric programs, where the computation is quite
efficient, the same idea can be naturally extended to more general
nonlinear parametric programming situations. In particular, recently
proposed approximate parametric programming (with approximation guarantees)
is useful for such a generalization. In the final version, we will describe
our high-level idea and discuss possible generalizations to other
types of MTL methods.
[4-3] The empirical study part is relatively
weak
Our main goal is to introduce the parametric task learning
(PTL) framework and show that there are many problems which can be
naturally formulated as PTL. That is why we provided three simple examples
rather than focusing on a single particular application in detail. We will
discuss possible real-world applications of our PTL framework in the final
version, including the use of joint quantile regression in financial data
analysis and other fields.
Reply to Reviewer 5
[5-1] Most
instances of parametric task learning are not multi-task learning problems.
We use the term multi-task learning (MTL) in a broad sense to
refer to situations where there are many tasks which are expected to
share some common representation. PTL is a generalization of MTL to
infinitely many tasks. There are many practical problems that can be
naturally cast into the PTL framework (e.g., see [4-3] above).
[5-2] The type of loss functions used in PTL is limited
See our answer to comment [4-2] from Reviewer 4.
[5-3] The
authors should present the detailed optimization procedure
We will
describe the detailed procedure for step 1 in the appendix of the final
version. Thanks for the suggestion.
[5-4] The experiments are not
very satisfactory.
See our answer to comment [4-3] from Reviewer 4.
[5-5] Comparison with benchmark cost-sensitive methods
Sorry for the confusion. In the experiment, we compared our PTL
approach with cost-sensitive SVM ('Ind' indicates cost-sensitive SVM). We
will clarify this point in the final version.
[5-6] QR approach in
lines 365-372 is different from the method described in Section 4
Sorry for the confusion. In practical applications of quantile
regression, it is usually recommended to model the
homoscedastic part (i.e., conditional mean function E[Y|x]) and the
heteroscedastic part separately, because the two parts often need to be
penalized with different magnitudes. In our manuscript, we simplified the
description and avoided application-specific details in Section 4, since our
objective was to provide examples of PTL. We will clarify this issue in
Section 6 of the final version.
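For readers less familiar with quantile regression, the following is a minimal sketch of the standard pinball (quantile) loss that such models minimize. It is not the joint formulation discussed above. The function name `pinball_loss` and the sample data are assumptions made only for illustration.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss at level tau in (0, 1)."""
    residual = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return float(np.mean(np.maximum(tau * residual, (tau - 1.0) * residual)))

# Hypothetical sample; for a constant predictor, the loss at tau = 0.5
# is minimized at the sample median.
y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
candidates = np.linspace(0.0, 10.0, 101)
losses = [pinball_loss(y, c, 0.5) for c in candidates]
best = candidates[int(np.argmin(losses))]
print(best, np.median(y))
```

Varying tau over (0, 1) gives one estimation task per quantile level, which is what makes the quantile level a natural continuous task parameter.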
Reply to Reviewer 6
[6-1] Standard
deviations overlap in Table 1
Thank you for pointing this out. The
differences are sometimes not statistically significant. We will clarify
this fact in the final version.
