Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule
Jason Altschuler, Pablo Parrilo
Can we accelerate the convergence of gradient descent without changing the algorithm—just by judiciously choosing stepsizes? Surprisingly, we show that the answer is yes. Our proposed
Silver Stepsize Schedule optimizes strongly convex functions in $\kappa^{\log_\rho 2} \approx \kappa^{0.7864}$ iterations, where $\rho = 1 + \sqrt{2}$ is the silver ratio and $\kappa$ is the condition number. The Silver Stepsize Schedule is constructed recursively in a fully explicit way. It is non-monotonic, fractal-like, and approximately periodic of period $\kappa^{\log_\rho 2}$.
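To make the recursive, fractal-like structure concrete, here is a minimal sketch in Python, assuming the closed form $h_t = 1 + \rho^{\nu(t)-1}$ (with $\nu(t)$ the 2-adic valuation of $t$) known for the smooth convex variant of the schedule; the strongly convex schedule analyzed in this paper additionally depends on the condition number $\kappa$, so this is an illustration of the recursive gluing pattern rather than the exact construction used here.

```python
import numpy as np

# Hedged sketch (not this paper's exact strongly convex schedule, which also
# depends on kappa): the smooth-convex variant of the Silver Stepsize Schedule,
# which already exhibits the recursive, non-monotonic, fractal-like structure
# described in the abstract.

RHO = 1 + np.sqrt(2)  # the silver ratio

def silver_recursive(k):
    """Schedule of length 2**k - 1: glue two copies of the previous schedule
    around a single longer middle step (recursive construction)."""
    if k == 0:
        return []
    prev = silver_recursive(k - 1)
    return prev + [1 + RHO ** (k - 2)] + prev

def silver_closed_form(t):
    """Closed form h_t = 1 + rho**(nu(t) - 1), nu(t) = 2-adic valuation of t."""
    nu = 0
    while t % 2 == 0:
        t //= 2
        nu += 1
    return 1 + RHO ** (nu - 1)

sched = silver_recursive(4)  # length 15
print(np.round(sched, 4))
# The recursion and the closed form agree.
assert np.allclose(sched, [silver_closed_form(t) for t in range(1, 16)])
```

Printing the length-15 schedule shows the non-monotonic, approximately periodic pattern $\sqrt{2},\, 2,\, \sqrt{2},\, 2+\sqrt{2},\, \sqrt{2},\, 2,\, \sqrt{2},\, 4+2\sqrt{2}, \ldots$, with rare but increasingly long steps at dyadic indices.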
The core algorithmic intuition is hedging between individually suboptimal strategies—short steps and long steps—since bad cases for the former are good cases for the latter, and vice versa. Properly combining these stepsizes yields faster convergence due to the misalignment of worst-case functions. The key challenge in proving this speedup is enforcing long-range consistency conditions along the algorithm’s trajectory. We do this by developing a technique that recursively glues constraints from different portions of the trajectory, thus removing a key stumbling block in previous analyses of optimization algorithms. More broadly, we believe that the concepts of hedging and multi-step descent have the potential to be powerful algorithmic paradigms in a variety of contexts in optimization and beyond.
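As a toy numerical illustration of this hedging effect (on quadratics, the special case addressed by Young's 1953 result noted below, with illustrative curvature bounds $m, M$ chosen here rather than taken from the paper): each gradient step on a quadratic with curvature $\lambda$ contracts the error by $|1 - h\lambda|$, and pairing one short step with one long step achieves a smaller worst-case contraction over $\lambda \in [m, M]$ than two equal "safe" steps.

```python
import numpy as np

# Toy hedging illustration on 1-D quadratics f(x) = (lam/2) x^2 with curvature
# lam in [m, M]. A gradient step x <- x - h*lam*x contracts |x| by |1 - h*lam|,
# so a schedule h_1, ..., h_n has worst-case contraction
#     max_{lam in [m, M]}  prod_i |1 - h_i * lam|.
# The constants m, M below are illustrative choices, not taken from the paper.

m, M = 1.0, 10.0
lams = np.linspace(m, M, 100_001)  # dense grid over the spectrum

def worst_case_contraction(steps):
    """Worst-case contraction factor of the schedule over curvatures in [m, M]."""
    factors = np.ones_like(lams)
    for h in steps:
        factors *= np.abs(1.0 - h * lams)
    return factors.max()

# Strategy 1: two identical "safe" steps with the classical stepsize 2/(m+M).
equal = [2.0 / (m + M)] * 2

# Strategy 2: hedge with one short and one long step (the Chebyshev-style pair,
# minimax-optimal for two steps on quadratics, in the spirit of Young 1953).
nodes = [np.cos(np.pi * (2 * i - 1) / 4) for i in (1, 2)]   # Chebyshev nodes
hedged = [2.0 / (M + m + (M - m) * c) for c in nodes]       # short step, long step

print("two equal steps :", worst_case_contraction(equal))
print("short + long    :", worst_case_contraction(hedged))
```

With these illustrative constants, the two equal steps give a worst-case contraction of roughly 0.67, while the short/long pair gives roughly 0.50: the curvatures handled poorly by the short step are exactly those handled well by the long step.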
This paper publishes and extends the first author’s 2018 Master’s Thesis (advised by the second author)—which established for the first time that judiciously choosing stepsizes can enable acceleration in convex optimization. Prior to this thesis, the only such result was for the special case of quadratic optimization, due to Young in 1953.