Are deep ResNets provably better than linear predictors?
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
Direct Runge-Kutta Discretization Achieves Acceleration
Escaping Saddle Points in Constrained Optimization
Online Learning of Dynamic Parameters in Social Networks