$\mathbb{R}^n$.

$\mathbb{R}$ that is differentiable on some convex subset $S$ of $\mathbb{R}^p$. Given a norm $\|\cdot\|$ on $\mathbb{R}^p$, we would like to measure how "smooth" the function $f$ is on $S$ with respect to $\|\cdot\|$. Towards this end, we define the following:

Definition 1. Given a set $S$ and a norm $\|\cdot\|$, we define the Restricted Smoothness Property (RSP) constant of a function $f : \mathbb{R}^p \to \mathbb{R}$ as

$$L_{\|\cdot\|}(f; S) := \sup_{x,y \in S,\ \alpha \in (0,1]} \frac{f((1-\alpha)x + \alpha y) - f(x) - \langle \nabla f(x), \alpha(y-x) \rangle}{\frac{\alpha^2}{2}\,\|y-x\|^2} \qquad (3)$$

Since $f$ is convex, it is clear that $L_{\|\cdot\|}(f; S) \ge 0$. The larger it is, the more the function $f$ "curves up" on the set $S$.

Remark 1. (Connection to Lipschitz continuity of the gradient) Recall that a function $f : \mathbb{R}^p \to \mathbb{R}$ is said to have $L$-Lipschitz continuous gradients w.r.t. $\|\cdot\|$ if for all $x, y \in \mathbb{R}^p$ we have $\|\nabla f(x) - \nabla f(y)\|_* \le L\,\|x - y\|$, where $\|\cdot\|_*$ is the norm dual to $\|\cdot\|$. Using the mean value theorem, it is easy to see that if $f$ has $L$-Lipschitz continuous gradients w.r.t. $\|\cdot\|$ then $L_{\|\cdot\|}(f; S) \le L$. However, $L_{\|\cdot\|}(f; S)$ can be much smaller, since it only looks at the behavior of $f$ on $S$ and cares less about the global smoothness of $f$.

Remark 2. (Connection to boundedness of the Hessian) If the function $f$ is twice differentiable on $S$, then using a second order Taylor expansion, $L_{\|\cdot\|}(f; S)$ can be bounded as

$$L_{\|\cdot\|}(f; S) \le \sup_{x,y,z \in S} \frac{\langle \nabla^2 f(z)(y-x),\, y-x \rangle}{\|y - x\|^2} \qquad (4)$$

Again, suppose we have global control on $\nabla^2 f(z)$ in the form $\forall z \in \mathbb{R}^p,\ |\!|\!|\nabla^2 f(z)|\!|\!| \le H$, where $|\!|\!|\cdot|\!|\!|$ is the $\|\cdot\| \to \|\cdot\|_*$ operator norm of a matrix $M$, defined as $|\!|\!|M|\!|\!| := \sup_{\|x\| \le 1} \|Mx\|_*$. Then we immediately have $L_{\|\cdot\|}(f; S) \le H$, but this inequality might be loose in general.

In the statement of our results, we will derive convergence rates that depend on this Restricted Smoothness Property (RSP) constant of the loss function $f$ in (2).
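As a concrete illustration of Definition 1, the ratio in (3) can be estimated numerically by sampling pairs from $S$ and step sizes $\alpha$. The sketch below (not from the paper; the function names and the Monte Carlo setup are our own) does this for a quadratic $f(x) = \frac{1}{2}x^\top A x$ on the Euclidean unit ball, where the ratio reduces to a Rayleigh quotient of $A$, consistent with the Hessian bound (4):

```python
import numpy as np

def rsp_estimate(f, grad_f, sample_S, norm, n_pairs=2000, seed=0):
    """Monte Carlo lower bound on the RSP constant L_{||.||}(f; S) of Eq. (3).

    Draws random pairs x, y from S (via the user-supplied sampler sample_S)
    and step sizes alpha in (0, 1], and takes the sup of the ratio in (3)
    over the drawn samples. This is a sketch, not the paper's procedure.
    """
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_pairs):
        x, y = sample_S(rng), sample_S(rng)
        alpha = rng.uniform(1e-3, 1.0)
        denom = 0.5 * alpha**2 * norm(y - x)**2
        if denom < 1e-12:       # skip degenerate pairs with y ~ x
            continue
        num = f((1 - alpha) * x + alpha * y) - f(x) \
              - grad_f(x) @ (alpha * (y - x))
        best = max(best, num / denom)
    return best

# Example: f(x) = 1/2 x^T A x with A positive definite; S = Euclidean unit ball.
A = np.diag([3.0, 1.0, 0.1])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x

def sample_ball(rng, p=3):
    """Uniform sample from the Euclidean unit ball in R^p."""
    v = rng.standard_normal(p)
    return v / np.linalg.norm(v) * rng.uniform(0, 1) ** (1 / p)

L_hat = rsp_estimate(f, grad_f, sample_ball, np.linalg.norm)
```

For this quadratic, the ratio in (3) is exactly $\frac{(y-x)^\top A (y-x)}{\|y-x\|^2}$, so the estimate must lie between the smallest and largest eigenvalues of $A$, matching the bound (4).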
4 Greedy Algorithm and Analysis

In this section, we consider a general greedy scheme to solve the general optimization problem in (2), where $f$ is a convex, smooth function. The idea is to add one atom to our representation at a time, in such a way that the structure of the set of atoms can be exploited to perform the greedy step efficiently. Our greedy method is applicable to any constrained problem where the objective is sufficiently smooth.

Algorithm 1 A general greedy algorithm to minimize a convex function $f$ over the $\kappa$-scaled atomic-norm "ball"
1: $x_0 \leftarrow \kappa a_0$ for an arbitrary atom $a_0 \in \mathcal{A}$
2: for $t = 0, 1, 2, 3, \ldots$ do
3:    $a_t \leftarrow \arg\min_{a \in \mathcal{A}} \langle \nabla f(x_t), a \rangle$
4:    $\alpha_t \leftarrow \arg\min_{\alpha \in [0,1]} f(x_t + \alpha(\kappa a_t - x_t))$
5:    $x_{t+1} \leftarrow x_t + \alpha_t(\kappa a_t - x_t)$
6: end for

Theorem 1. Assume that $f$ is convex and differentiable and let $\|\cdot\|$ be any norm. Then, for any $T \ge 1$, the iterates generated by Algorithm 1 lie in
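The steps of Algorithm 1 can be sketched in code. The example below is our own illustrative instantiation, not the paper's implementation: it takes a finite atomic set (the signed standard basis vectors, whose convex hull is the $\ell_1$ ball) and replaces the exact line search in step 4 with a grid search over $[0,1]$:

```python
import numpy as np

def greedy_atomic(f, grad_f, atoms, kappa, T, n_grid=100):
    """Sketch of Algorithm 1 for a finite atomic set.

    atoms: array of shape (m, p), each row an atom in A.
    Repeatedly picks the atom minimizing <grad f(x_t), a> and moves
    toward kappa * a_t with a (grid-based) line search over [0, 1].
    """
    x = kappa * atoms[0]                      # step 1: x0 <- kappa * a0
    grid = np.linspace(0.0, 1.0, n_grid)
    for _ in range(T):
        g = grad_f(x)
        a = atoms[np.argmin(atoms @ g)]       # step 3: greedy atom selection
        # step 4: exact line search replaced by a grid search (our shortcut);
        # alpha = 0 is in the grid, so f(x_t) is non-increasing in t
        vals = [f(x + al * (kappa * a - x)) for al in grid]
        alpha = grid[int(np.argmin(vals))]
        x = x + alpha * (kappa * a - x)       # step 5: convex-combination update
    return x

# Toy problem: minimize f(x) = ||x - b||^2 over the kappa-scaled l1 "ball",
# whose atoms are the signed standard basis vectors.
p, kappa = 5, 1.0
b = np.array([0.9, -0.3, 0.0, 0.0, 0.0])
atoms = np.vstack([np.eye(p), -np.eye(p)])
f = lambda x: np.sum((x - b) ** 2)
grad_f = lambda x: 2 * (x - b)
x_T = greedy_atomic(f, grad_f, atoms, kappa, T=50)
```

Because each update in step 5 is a convex combination of points of the form $\kappa a$, every iterate stays inside the $\kappa$-scaled atomic-norm ball, which is the content of the containment claim in Theorem 1.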