NONLINEAR SYSTEMS IN ROBOTICS
Accelerated Zero-Order SGD under High-Order Smoothness and Overparameterized Regime
G. K. Bychkov^{a,b}, D. M. Dvinskikh^{c,a,d}, A. V. Antsiferova^{e,a}, A. V. Gasnikov^{f,g,c}, A. V. Lobanov^{c,h,a}
a ISP RAS Research Center for Trusted Artificial Intelligence,
ul. Alexandra Solzhenitsyna 25, Moscow, 109004 Russia
b Lomonosov Moscow State University,
ul. Leninskiye Gory 1, Moscow, 119991 Russia
c Moscow Institute of Physics and Technology,
Institutskiy per. 9, Dolgoprudny, 141701 Russia
d National Research University Higher School of Economics,
Pokrovsky bul. 11, Moscow, 109028 Russia
e Institute for Artificial Intelligence, Lomonosov Moscow State University,
Lomonosovsky pr. 27/1, Moscow, 119192 Russia
f Innopolis University,
ul. Universitetskaya 1, Innopolis, 420500, Russia
g Institute for Information Transmission Problems of RAS,
per. Bolshoy Karetny 19, Moscow, 127051 Russia
h Skolkovo Institute of Science and Technology,
Bolshoy Boulevard 30, bld. 1, Moscow, 121205 Russia
Abstract:
We present a novel gradient-free algorithm for solving convex stochastic optimization problems, such as those encountered in medicine, physics, and machine learning (e.g., the adversarial multi-armed bandit problem), where the objective function can only be computed through numerical simulation, either as the result of a real experiment or as feedback returned by an adversary in response to function queries. We therefore assume only black-box access to the objective function values, possibly corrupted by adversarial noise, either deterministic or stochastic. Such a noisy setup arises naturally from modeling randomness within a simulation or from computer discretization, when exact function values are withheld for privacy reasons, or when nonconvex problems are solved as convex ones with an inexact function oracle. By exploiting higher-order smoothness, satisfied, e.g., by logistic regression, we improve on zero-order methods developed under the classical smoothness assumption (a Lipschitz gradient). The proposed algorithm enjoys optimal oracle complexity and is designed for the overparameterized setup, i.e., when the number of model parameters greatly exceeds the size of the training dataset. Overparameterized models fit the training data perfectly while still generalizing well and outperforming underparameterized models on unseen data. We provide convergence guarantees for the proposed algorithm under both types of noise. Moreover, we estimate the maximum permissible adversarial noise level that maintains the desired accuracy in the Euclidean setup, and then extend our results to a non-Euclidean setup. Our theoretical results are verified on the logistic regression problem.
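To make the black-box setup concrete, the following is a minimal sketch (not the authors' algorithm; the names kernel_zo_gradient and zo_sgd are ours for illustration) of the standard kernel-smoothed two-point gradient estimator that such methods build on: the gradient is approximated from two function evaluations along a random direction, with a kernel K(r) chosen to cancel lower-order Taylor terms and thereby exploit smoothness of order beta. Here K(r) = 3r is a valid kernel for beta = 3; higher orders use, e.g., Legendre-polynomial constructions.

    import numpy as np

    def kernel_zo_gradient(f, x, h, rng):
        """Two-point kernel-smoothed zero-order gradient estimate (sketch).

        g = d/(2h) * (f(x + h*r*e) - f(x - h*r*e)) * K(r) * e,
        with e uniform on the unit sphere and r uniform on [-1, 1].
        K(r) = 3r satisfies E[K(r)] = 0, E[r*K(r)] = 1, E[r^2*K(r)] = 0,
        i.e., it is a valid kernel for smoothness order beta = 3.
        """
        d = x.size
        e = rng.standard_normal(d)
        e /= np.linalg.norm(e)        # uniform random direction on the sphere
        r = rng.uniform(-1.0, 1.0)    # scalar smoothing variable
        K = 3.0 * r                   # kernel of order beta = 3
        return d / (2.0 * h) * (f(x + h * r * e) - f(x - h * r * e)) * K * e

    def zo_sgd(f, x0, steps=1000, lr=0.1, h=1e-2, seed=0):
        """Plain zero-order SGD loop using the kernel estimator above."""
        rng = np.random.default_rng(seed)
        x = np.array(x0, dtype=float)
        for _ in range(steps):
            x -= lr * kernel_zo_gradient(f, x, h, rng)
            # an accelerated variant would add a Nesterov momentum sequence here
        return x

The kernel order and the smoothing radius h control the bias-variance trade-off of the estimator; under adversarial noise, h cannot be taken arbitrarily small, which is what bounds the maximum permissible noise level discussed in the abstract.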
Keywords:
zero-order optimization, gradient-free algorithms, high-order smoothness, kernel approximation, overparameterization
Received: 03.11.2024; Accepted: 17.12.2024
Citation:
G. K. Bychkov, D. M. Dvinskikh, A. V. Antsiferova, A. V. Gasnikov, A. V. Lobanov, “Accelerated Zero-Order SGD under High-Order Smoothness and Overparameterized Regime”, Rus. J. Nonlin. Dyn., 20:5 (2024), 759–788
Linking options:
https://www.mathnet.ru/eng/nd922
https://www.mathnet.ru/eng/nd/v20/i5/p759