Dimitris Tsioutsias, Eric Mjolsness
We investigate the optimization of neural networks governed by general objective functions. Practical formulations of such objectives are notoriously difficult to solve; a common problem is the poor local extrema reached by any of the methods applied. In this paper, a novel framework is introduced for the solution of large-scale optimization problems. It assumes little about the objective function and can be applied to general nonlinear, non-convex functions; objectives in thousands of variables are thus efficiently minimized by a combination of techniques: deterministic annealing, multiscale optimization, attention mechanisms, and trust region optimization methods.