The heavy-ball method: Gradient-dominated and smooth

Problem setup

Consider the unconstrained minimization problem

\[\minimize_{x \in \calH} f(x) \quad \iff \quad \text{find } x \in \calH \;\text{ such that }\; 0 \in \partial f(x) = \set{\nabla f(x)},\]

where \(f : \calH \to \reals\) is \(\mu_{\textup{gd}}\)-gradient dominated and \(L\)-smooth with \(0<\mu_{\textup{gd}}\le L\).

For initial points \(x^{-1}, x^0 \in \calH\), momentum \(\delta \in \reals\), and step size \(\gamma \in \reals_{++}\), the heavy-ball update [Pol64] is

\[(\forall k \in \naturals)\quad x^{k+1} = x^k - \gamma \nabla f(x^k) + \delta(x^k - x^{k-1}).\]

In this example, we search for the smallest contraction factor \(\rho\in[0,1)\) provable using AutoLyap such that

\[f(x^k) - f(x^\star) = O(\rho^k) \quad \textup{ as } \quad k\to\infty,\]

where

\[x^\star \in \Argmin_{x \in \calH} f(x).\]

Model the problem in AutoLyap and search for the smallest rho

\(f\) is modeled as an intersection of GradientDominated and Smooth.
The optimization problem is represented by InclusionProblem.
The update rule is represented by HeavyBallMethod.
Function-value parameters are obtained with IterationIndependent.LinearConvergence.get_parameters_function_value_suboptimality.
The contraction factor is searched with IterationIndependent.LinearConvergence.bisection_search_rho.

Run the iteration-independent analysis

This example uses the MOSEK Fusion backend (backend="mosek_fusion"). Install the optional MOSEK dependency first:

pip install "autolyap[mosek]"

from autolyap import SolverOptions
from autolyap.algorithms import HeavyBallMethod
from autolyap.problemclass import GradientDominated, InclusionProblem, Smooth
from autolyap.iteration_independent import IterationIndependent

mu_gd = 0.5
L = 1.0
gamma = 1.0
delta = 0.5

problem = InclusionProblem(
    [
        [GradientDominated(mu_gd=mu_gd), Smooth(L=L)],  # f in intersection
    ]
)
algorithm = HeavyBallMethod(gamma=gamma, delta=delta)
solver_options = SolverOptions(backend="mosek_fusion")
# License-free option:
# solver_options = SolverOptions(backend="cvxpy", cvxpy_solver="CLARABEL")

# Build V(P, p, k) and R(T, t, k) for function-value suboptimality analysis.
P, p, T, t = IterationIndependent.LinearConvergence.get_parameters_function_value_suboptimality(
    algorithm,
    tau=0,
)

result = IterationIndependent.LinearConvergence.bisection_search_rho(
    problem,
    algorithm,
    P,
    T,
    p=p,
    t=t,
    solver_options=solver_options,
)

if result["status"] != "feasible":
    raise RuntimeError("No feasible Lyapunov certificate in the requested rho interval.")

print(f"rho (AutoLyap): {result['rho']:.8f}")

What to inspect in result:

result["status"]: feasible, infeasible, or not_solved.
result["solve_status"]: raw backend status.
result["rho"]: certified contraction factor when feasible.
result["certificate"]: Lyapunov certificate matrices/scalars.

Sweeping over 100 values of \(\gamma\) on \((0,2.30]\) and 100 values of \(\delta\) on \([-1,1]\) with \(\mu_{\textup{gd}}=0.5\) and \(L=1\) gives the plot below. Each dot is a feasible parameter pair, and the color encodes the certified contraction factor \(\rho\).

Heavy-ball feasible (gamma, delta) pairs for gradient-dominated smooth objectives, with color showing rho.

References

[Pol64]

B. T. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1–17, 1964. doi:10.1016/0041-5553(64)90137-5.