Reviews: PyTorch: An Imperative Style, High-Performance Deep Learning Library

*Rebuttal* I thank the authors for addressing my comments and including the additional performance comparison. I will increase my score to 9.
*Summary* PyTorch is an open-source deep learning library that strives to marry good performance, flexibility and usability. It is specifically designed for researchers, with the goal of enabling easy experimentation with new features. Through seamless integration into the Python ecosystem, it interoperates with other Python libraries, which makes prototyping easy for the user. One may argue whether a systems/software paper presenting the implementation details of a library should be published at NeurIPS, or whether it would be a better fit for USENIX or SysML. However, given the impact of the library on the research community, I strongly support the publication of this paper at NeurIPS. PyTorch is specifically designed for the research community and is thus of high interest to most attendees. The library offers what a researcher needs to validate new ideas without large implementation effort. By being fully open-source and providing flexibility to the user, it enables the exploration of new features, including new models and performance optimizations. Thus, it has high value to the community and helps drive research forward.
[clarity] The paper is well written and easy to follow. It touches upon many challenges faced when implementing such a framework and gives some insights into its power and also its limitations.
*Comments* It is not clear to me from the paper whether PyTorch can run in the absence of GPUs. It says that it supports GPUs, but does it depend on the presence of GPUs? My main criticism is that you claim that PyTorch offers performance comparable to the fastest current libraries for DL. But what are these libraries? In Section 6 you compare to MXNet and TensorFlow. I feel that supporting this claim would require a broader comparison to other libraries, or at least some evidence/reference that TensorFlow and MXNet are currently the fastest out there.
*Minor comments*
- I think it is nice to have a dedicated publication for PyTorch so that it is clearer in the future how to cite it. However, I wonder how attribution for this open-source library is handled.
- The authors could try to tie the paper a bit more closely to the conference. Only a single reference has been published at NeurIPS, and a stronger link could perhaps be made.
- Line 87: remove ‘.’

This paper presents the PyTorch deep learning library, with a major focus on design principles and design choices made in the creation of the software. From what I can tell, the paper presents the first peer-reviewed submission of the full PyTorch software library (a previous submission to a NIPS autodiff workshop, https://openreview.net/forum?id=BJJsrmfCZ, seems to detail only the automatic differentiation component). I applaud the authors for focusing their submission around their key design principles and decisions, rather than presenting a detailed tutorial of the software's use. This distinguishes the submission from a "white paper," i.e., a paper which reports on how to use a piece of software while marketing its merits, but has less scientific value. Instead, the paper reads as a kind of manifesto for what deep learning software should aspire to be, while also detailing how this vision can be achieved through careful design and implementation. Design decisions are carefully mapped out, and justifications are given for why a certain choice was made, always referencing back to the basic design principles (in particular, usability and performance). The paper is well thought out, very clear, and carefully written. There is no one big idea here. Ideas emerging from previous works are clearly stated. Nor is there any new or surprising result here, since PyTorch has been available (and widely used) for several years in the community. However, the paper demonstrates the successful synthesis of several key ideas in the field of deep learning software. There are a lot of smaller ideas in the implementation which, while not flashy, fit together to give the library overall strong performance. The authors detail the consequences and tradeoffs of their design choices carefully, making clear that sometimes this may lead to slightly worse performance, which is acceptable if it makes the software easier to use.
There was no guarantee that PyTorch would be competitive with other deep learning libraries based on static dataflow graphs. It could well have been that PyTorch forced users to make a big tradeoff between performance and ease of programmability. Due to the careful design and implementation of the library, it appears that PyTorch is on par performance-wise with other graph-based libraries, despite efficiency being the not-quite-primary design principle. The developers should be acknowledged for taking this risk. I also appreciate that, while the authors focus the paper on outlining their design principles and decisions, they also provided some evaluations which, with limited space, largely confirm the validity of their (sometimes technical) design choices. These are presented in a variety of ways: benchmarks, comparisons, code profiling, and gauging user adoption. The evaluations were very clear. For instance, I have no experience in GPU memory management, so it took some concentration to follow along Sec. 5.3 carefully; yet the accompanying evaluation in Sec. 6.2 made the consequences of PyTorch's custom memory allocator very clear. Overall, PyTorch is a significant contribution to the field of deep learning software, influencing not just the researchers who use the library, but also designers of other scientific software (and likely to have lasting influence beyond this). As such, this paper deserves recognition as one of the top papers at NeurIPS.

Paper ID:	4399
Title:	PyTorch: An Imperative Style, High-Performance Deep Learning Library
