[CS.DS] Revolutionary: Negative Stepsizes Enable GDA Convergence

Efficient computation of min-max problems is a central question in optimization, learning, games, and control. Gradient-descent-ascent (GDA) is arguably the most natural algorithm. However, since the 1970s, conventional wisdom has held that GDA fails to converge even on simple problems, leading to an extensive literature on modifications such as extragradients, optimism, momentum, and anchoring. In contrast, we show that GDA converges in its original form simply by using a judicious choice of stepsizes. The key innovation is an unconventional family of stepsize schedules (dubbed slingshot stepsize schedules) that are time-varying, asymmetric, and periodically negative. We show that all three properties are necessary for convergence, and that together they enable GDA to converge on the classical counterexamples (e.g., unconstrained convex-concave problems).
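To make the classical failure concrete, here is a minimal sketch, assuming the standard bilinear toy problem f(x, y) = x * y (minimize over x, maximize over y, saddle point at the origin). The problem, the starting point, the stepsize 0.1, and the function names are illustrative choices, not taken from the paper. With a constant positive stepsize, each simultaneous GDA step multiplies the distance to the saddle point by sqrt(1 + eta^2), so the iterates spiral outward.

```python
import math


def gda_step(x, y, eta_x, eta_y):
    """One simultaneous GDA step on the toy problem f(x, y) = x * y."""
    grad_x = y  # df/dx
    grad_y = x  # df/dy
    # Descent in x, ascent in y, both using the old iterate.
    return x - eta_x * grad_x, y + eta_y * grad_y


def run_constant_gda(x0, y0, eta, num_steps):
    """Run GDA with one constant positive stepsize; return distances to (0, 0)."""
    x, y = x0, y0
    norms = [math.hypot(x, y)]
    for _ in range(num_steps):
        x, y = gda_step(x, y, eta, eta)
        norms.append(math.hypot(x, y))
    return norms


norms = run_constant_gda(1.0, 1.0, 0.1, 50)
print(f"start distance: {norms[0]:.4f}, end distance: {norms[-1]:.4f}")
```

The distance to the saddle point grows at every step no matter how small the constant stepsize is, which is the cycling-and-divergence behavior that the modifications listed above were designed to fix.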

The core algorithmic intuition is that although negative stepsizes induce backward progress, they desynchronize the min and max variables, overcoming GDA's cycling issue, and lead to a slingshot phenomenon in which forward progress in subsequent iterations is overwhelmingly larger. This results in fast overall convergence. Geometrically, slingshot dynamics leverage the non-reversibility of gradient flow: positive and negative steps cancel to first order, yielding a second-order net movement in a new direction that leads to convergence, which is otherwise impossible for GDA. We interpret this as a second-order finite-differencing algorithm and show that, intriguingly, it approximately implements consensus optimization, a popular empirical algorithm for min-max problems involving deep neural networks (e.g., training GANs).
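The first-order cancellation can be seen in a small sketch, again assuming the bilinear toy problem f(x, y) = x * y. The two-step schedule below is a hypothetical illustration chosen for this post, not the schedule from the paper: step one uses stepsizes (alpha, -beta) and step two uses (-alpha, beta). Working through the algebra, the first-order terms cancel exactly and the pair of steps maps (x, y) to (1 - alpha * beta) * (x, y). For this toy problem, that net movement of -alpha * beta * (x, y) is a gradient step on half the squared gradient norm, (x^2 + y^2) / 2, which is the kind of term consensus optimization adds.

```python
import math


def gda_step(x, y, eta_x, eta_y):
    """One simultaneous GDA step on f(x, y) = x * y (df/dx = y, df/dy = x)."""
    return x - eta_x * y, y + eta_y * x


def slingshot_pair(x, y, alpha, beta):
    """Two GDA steps with a hypothetical asymmetric, sign-alternating schedule."""
    # Step 1: positive stepsize for x, negative stepsize for y.
    x, y = gda_step(x, y, alpha, -beta)
    # Step 2: signs flipped, so the first-order movement is undone
    # and only the second-order term -alpha * beta * (x, y) remains.
    x, y = gda_step(x, y, -alpha, beta)
    return x, y


def run_slingshot(x0, y0, alpha, beta, num_pairs):
    """Repeat the two-step pair; return the distance to (0, 0) after each pair."""
    x, y = x0, y0
    norms = [math.hypot(x, y)]
    for _ in range(num_pairs):
        x, y = slingshot_pair(x, y, alpha, beta)
        norms.append(math.hypot(x, y))
    return norms


one_pair = slingshot_pair(1.0, 2.0, 0.5, 0.5)
norms = run_slingshot(1.0, 2.0, 0.5, 0.5, 60)
print(f"after one pair: {one_pair}, final distance: {norms[-1]:.2e}")
```

With alpha = beta = 0.5, each pair shrinks the distance to the saddle point by a factor of 0.75, whereas any constant positive stepsize on the same problem moves the iterates away from it. The sketch only shows the cancellation mechanism on one toy problem; it says nothing about the rates or the general schedules analyzed in the paper.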

Blogger's Review: The negative stepsize strategy proposed in this paper offers a fresh perspective on GDA's convergence and challenges the traditional view that plain GDA cannot converge. By introducing time-varying, asymmetric stepsizes, the authors address the cycling issue directly, offering a new approach to min-max optimization with promising applications in deep learning.
