|
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
I enjoyed reading this paper. It is thought-provoking and addresses an issue that should be of great concern to the ML community: how to reduce technical debt and hence make ML software systems more successful and sustainable for a broader set of real-life applications. In the long run, this will be essential for ML to fulfill the high expectations of industry and society. Certainly, it is important for researchers in ML to be aware that, from a software engineering perspective, increasing the complexity of ML systems and algorithms for marginal accuracy gains may be valuable academically but may, in the long run, undermine the impact of ML research. Having said that, I have two main concerns with this paper: 1) I don't see that it is within the scope of the NIPS call for papers; overall, it reads more like a systems paper. 2) I find that certain aspects should be dealt with in more depth. The authors should try to provide more empirical evidence for their claims, and the solutions they propose often stay at a very high level. Are there any novel design patterns that could be derived from their analysis? Many of the issues addressed by the authors pertain to pipelines for processing data and extracting features. There is a vast body of computer systems research that deals with data management services; specific references to that work would be appropriate and would help put this paper in context.
Q2: Please summarize your review in 1-2 sentences
The paper addresses a technical issue of great relevance to machine learning systems in a broad sense. However, I don't see that it is really within the scope of the NIPS call for papers; moreover, certain aspects are treated only superficially.
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
(Light Review) This well-written and thought-provoking paper catalogues the myriad real-world engineering risks associated with integrating machine learning into engineering systems. The paper does a fantastic job of outlining the various pitfalls; as a researcher and practitioner I can personally vouch for having encountered nearly all of them. This could be an extremely valuable guide for teams and companies that are new to (or considering) integrating machine learning into their systems, in order to help them plan for the potential risks. The paper doesn't offer quite as much in terms of fixes, though it does offer some best practices (and anti-practices). I think this paper would generate great discussion at the conference and would be often cited (and more often *used*) in both research and industrial settings. I'd definitely support the inclusion of this paper at NIPS.
Q2: Please summarize your review in 1-2 sentences
This well-written and thought-provoking paper does a fantastic job of cataloguing the myriad real-world engineering risks associated with integrating machine learning into engineering systems. While a bit untraditional for NIPS, I think this paper would be of immense value to practitioners in both industry and research. I'd definitely support the inclusion of this paper at NIPS.
Submitted by Assigned_Reviewer_3
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper shines a light on the systems aspects of the design and maintenance of ML systems.
This is not a typical NIPS paper, as it does not propose new methodology.
However, there are barely any papers of its ilk, despite the fact that most major corporations that make decisions based on data employ some version of an ML system.
Technical debt leads to high maintenance overhead, which carries significant monetary costs for companies.
By summarizing the major issues associated with the maintenance of such systems (and thus potentially helping to avoid some of them), this paper will have a *very* significant impact on the development of new ML systems.
I very much enjoyed reading the paper.
The paper resonates very strongly with me: after many years as a university ML researcher, I recently left to join industry, and over the last two years I have observed first-hand the systems issues described in the paper.
The authors did a great job summarizing the main pitfalls.
I could add a few points to the taxonomy, even though it is already quite comprehensive:
- (Intro) Deploying ML systems carries its own overhead as well, manifesting in the running of A/B experiments (and then cleaning up after them) and in maintaining an archive of artefacts for deployed models.
- (ML-System Anti-Patterns) When reading about Glue Code, I cannot help but think of incorporating open source packages into the system, especially if additional custom modifications need to be made. Using these packages leads to several potential (unmentioned) headaches: (1) someone needs to maintain the package internally; (2) whenever there is an updated version of the package, it needs to be re-integrated into the internal code-base; and (3) if internal modifications were made, they often need to be committed back to the open source project (to maintain future compatibility), which may require jumping through administrative hoops.
- (Configuration Debt) It is worth mentioning that interruptions in data logging may lead not only to stale features but also to (potentially cascading) failures in feature builds.
- (Monitoring and Testing) An additional aspect could be changes in available bandwidth, both to extract/build the features and to train the model.
Typos:
- Page 2 (Correction Cascades): $a$ -> $m_a$ (two places)
- Page 3, line 131: the -> them
Q2: Please summarize your review in 1-2 sentences
A very detailed paper on the potential pitfalls of maintaining a machine learning system.
The paper is very timely, as ML is maturing and is being put into production across many industries; in my opinion, it is a very useful read for anyone building or working with an essential ML system.
Submitted by Assigned_Reviewer_4
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
I think this is an interesting position paper. It talks about challenges in designing and maintaining ML systems. It takes a software engineering perspective and analyzes current systems that seem to use ML algorithms as black boxes.
I see that this is an applied paper, but I think it would have been significantly strengthened by actual case studies of real systems that showcase these problems. In its current state it is an amalgamation of different design decisions that one should consider. It would also help to prioritize the different criteria rather than just listing them, specifically with ML systems in mind. This again would require actual case studies, and I think it would provide valuable insight into the design of other similar systems.
Lastly, since this paper doesn't really have any technical/empirical contribution but is, as I mentioned before, a position paper catered more towards software engineering, I am not sure how suitable it is for NIPS. I think more applied venues, such as industry tracks of data mining/ML conferences or conferences at the intersection of software engineering and data analytics, might be more suitable.
Q2: Please summarize your review in 1-2 sentences
This is a position paper which I think would be more suitable for more applied venues. It would also help to have case studies to ground the suggested ideas.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note, however, that reviewers and area chairs are busy and may not read long, vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their careful reviews.
One of the key questions raised by reviewers 1, 3, and 6 was whether this paper is suited to NIPS as opposed to more applied / engineering / systems conferences. Our opinion is that highlighting the systems-level complications and costs hidden in ML is important for the NIPS community in particular, because of this community's large and growing influence on real-world systems and applied work. Indeed, a major portion of even "academic" researchers at NIPS have an industry affiliation of one form or another. Thus, we agree with reviewers 2, 4, and 5 that this topic area would be of high value to the NIPS community.
Reviewer 1 notes that this paper's topic area is not listed in the NIPS call for papers. In our opinion, CFPs should be interpreted as guidelines rather than exhaustive lists; otherwise, new areas of investigation might never surface. The CFP itself states that papers of interest are "not limited to" the listed topic areas. We also note that the inclusion of the word Systems in the NIPS conference title suggests that a discussion of systems-level issues for ML is not inappropriate for this venue. A significant portion of recent advances in ML have been at the systems level, bringing into scope not just algorithms and proofs, but also hardware, networking, and other topics traditionally seen as "engineering". In this view, a paper on ML-specific technical debt may be seen as highly relevant to NIPS.
Overall, we agree with reviewer 3 that case studies would be another excellent way to study this problem, and with reviewer 1 that additional empirical evidence would further strengthen the paper. As the first paper to explore these issues in detail, we felt that breadth was also a priority. Unfortunately, the 8-page limit creates a tension between breadth and depth. If there are specific places where the reviewers feel content could be cut to make room for additional depth in other areas, we would be open to these suggestions. Fortunately (or unfortunately?), there is no lack of material from which to draw.
Reviewer 1 suggests incorporating more references from the traditional data management literature. We will do so, but also note that the key difference between traditional data management and ML data management is that ML data directly impacts system behavior, making this a much more difficult problem area.
Reviewer 6 notes that the paper does not provide many "clean" mechanisms for dealing with ML-related technical debt. We think this is in some sense a fundamental difficulty in this area: there is no magic bullet for resolving complex system-level entanglements. However, as reviewer 2 notes, the paper does provide a number of best practices to emulate and anti-practices to avoid.
On the whole, it appears that the reviewers generally felt this work was valuable, if untraditional. Again, we appreciate their insightful comments.
|