Uspekhi Matematicheskikh Nauk, 2023, Volume 78, Issue 4(472), Pages 3–52
DOI: https://doi.org/10.4213/rm10081
(Mi rm10081)
 

Averaging and mixing for stochastic perturbations of linear conservative systems

G. Huang^{a,b}, S. B. Kuksin^{c,b}

a School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China
b Peoples' Friendship University of Russia (RUDN University), Moscow, Russia
c Université Paris-Diderot (Paris 7), UFR de Mathématiques, Paris, France
Abstract: We study stochastic perturbations of linear systems of the form
\begin{equation} dv(t)+Av(t)\,dt =\varepsilon P(v(t))\,dt+\sqrt{\varepsilon}\,\mathcal{B}(v(t))\,dW(t), \qquad v\in\mathbb{R}^D, \end{equation} \tag{$*$}
where A is a linear operator with non-zero imaginary spectrum. It is assumed that the vector field P(v) and the matrix function \mathcal{B}(v) are locally Lipschitz with at most polynomial growth at infinity, that the equation is well posed, and that a few first moments of the norms of solutions v(t) are bounded uniformly in \varepsilon. We use Khasminski's approach to stochastic averaging to show that, as \varepsilon\to0, a solution v(t), written in the interaction representation in terms of the operator A, converges in distribution for 0\leqslant t\leqslant\operatorname{Const}\varepsilon^{-1} to a solution of an effective equation. The latter is obtained from ($*$) by means of certain averaging. Assuming that equation ($*$) and/or the effective equation are mixing, we examine this convergence further.
Bibliography: 27 titles.
Keywords: averaging, mixing, stationary measures, effective equations, uniform in time convergence.
Funding: National Natural Science Foundation of China (grant no. 20221300605); Ministry of Science and Higher Education of the Russian Federation (grant no. 075-15-2022-1115).
The first author was supported by National Natural Science Foundation of China (project No. 20221300605). Both authors were supported by the Ministry of Science and Higher Education of the Russian Federation (megagrant No. 075-15-2022-1115).
Received: 25.06.2022
English version:
Russian Mathematical Surveys, 2023, Volume 78, Issue 4, Pages 585–633
DOI: https://doi.org/10.4213/rm10081e
Document type: Article
UDC: 517.928.7+519.216
MSC: 34C29, 34F05, 34F10
Publication language: English

Dedicated to the memory of M. I. Vishik on the occasion of his 100th birthday

1. Introduction

1.1. The setting and problems

The goal of this paper is to present an averaging theory for perturbations of conservative linear differential equations by locally Lipschitz nonlinearities and stochastic terms. Namely, we examine the stochastic equations

\begin{equation} dv(t)+Av(t)\,dt =\varepsilon P(v(t))\,dt+\sqrt{\varepsilon}\,\mathcal{B}(v(t))\,dW(t), \qquad v\in\mathbb{R}^D, \end{equation} \tag{1.1}
where 0<\varepsilon\leqslant1, A is a linear operator with non-zero purely imaginary eigenvalues \{i\lambda_j\} (so that the dimension D is even), P is a locally Lipschitz vector field on \mathbb{R}^D, W(t) is the standard Wiener process in \mathbb{R}^N, and \mathcal{B}(v) is a D\times N matrix. We wish to study for small \varepsilon the behaviour of solutions of equation (1.1) on time intervals of order \varepsilon^{-1}, and under some additional restrictions on the equation we examine the limiting behaviour of solutions as \varepsilon\to0, uniformly in time.
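Equations of type (1.1) are straightforward to explore numerically. The sketch below applies a plain Euler–Maruyama discretization to a hypothetical two-dimensional example; the frequency, the cubic drift P, the dispersion matrix \mathcal{B}, and all step sizes are illustrative choices, not taken from the paper.

```python
import numpy as np

def euler_maruyama(v0, A, P, B, eps, dt, n_steps, rng):
    """Euler-Maruyama scheme for dv + Av dt = eps*P(v) dt + sqrt(eps)*B dW."""
    v = np.array(v0, dtype=float)
    traj = [v.copy()]
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=B.shape[1])
        v = v + (-A @ v + eps * P(v)) * dt + np.sqrt(eps) * B @ dW
        traj.append(v.copy())
    return np.array(traj)

# hypothetical example: D = 2, one frequency lambda_1 = 1, dissipative cubic drift
A = np.array([[0.0, -1.0], [1.0, 0.0]])   # spectrum {+i, -i}
P = lambda v: -np.dot(v, v) * v           # locally Lipschitz, polynomial growth
B = np.eye(2)
rng = np.random.default_rng(0)
traj = euler_maruyama([1.0, 0.0], A, P, B, eps=0.01, dt=1e-3, n_steps=20000, rng=rng)
```

On the simulated interval (of length 20 in the original time t, i.e. of order \varepsilon^{-1}\cdot\varepsilon t) the solution stays of order one, in line with the boundedness assumptions on the moments of the norms.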

1.2. Our results and their deterministic analogues

We have tried to make our work “reader-friendly” and accessible to people with just a limited knowledge of stochastic calculus. To achieve this, in the main part of the paper we restrict ourselves to the case of equations with additive noise \sqrt{\varepsilon}\,\mathcal{B}\,dW(t) and exploit there a technical convenience: we introduce a complex structure in \mathbb{R}^D by rewriting the phase space \mathbb{R}^D as \mathbb{C}^{D/2} (recall that D is even), in such a way that the operator A is diagonal in the corresponding complex coordinates: A=\operatorname{diag}\{i\lambda_j\}.

General equations (1.1) are discussed in Section 8, where they are treated in parallel with the equations with additive noise considered previously.

As is customary in the classical deterministic Krylov–Bogolyubov averaging (see, for example, [4], [1], and [13]), to study solutions v(t)\in\mathbb{C}^{D/2} we write them in the interaction representation, which preserves the norms of the complex components v_j(\tau) but amends their angles (see the substitution (2.8) below). The first principal result of the work is given by Theorem 4.7, where we assume bounds on a few first moments of the norms of solutions, uniform in \varepsilon and in t\leqslant C\varepsilon^{-1}. The theorem states that as \varepsilon\to0, for t\leqslant C\varepsilon^{-1} solutions v(t), written in the interaction representation, converge in distribution to solutions of an effective equation. The latter is obtained from equation (1.1) by means of a certain averaging of the vector field P in terms of the spectrum \{i\lambda_j\}, and in many cases can be written down explicitly. The proof of Theorem 4.7, given in Section 4, is obtained by means of a synthesis of the Krylov–Bogolyubov method (as presented, for example, in [13]) and Khasminski’s approach to stochastic averaging [16]; it can serve as an introduction to the latter. The number of works on stochastic averaging is immense (see Section 1.3 for some references). We were not able to find the result of Theorem 4.7 there, but we do not insist on its novelty (and certainly related statements can be found in the literature).

In Section 5 we suppose that the bounds, mentioned above, on the moments of the norms of solutions are uniform in time, and that equation (1.1) is mixing. Then, as time tends to infinity, its solutions converge in distribution to a unique stationary measure (which is a Borel measure on \mathbb{R}^D=\mathbb{C}^{D/2}). In Theorem 5.5, postulating that the effective equation is mixing too, we prove that, as \varepsilon\to0, the stationary measure of equation (1.1) converges to that of the effective equation. Note that this convergence holds without passing to the interaction representation.

In the short Section 6 we discuss non-resonant systems (1.1) (those where the frequencies \{\lambda_j\} are rationally independent). In particular, we show that then the actions I_j(v(t)) of solutions v(t) (see (1.2) below) converge in distribution, as \varepsilon\to0, to solutions of a system of stochastic equations depending only on the actions. The convergence holds on time intervals 0\leqslant t\leqslant C\varepsilon^{-1}.

In Section 7 we keep the assumption on the norms of solutions from Section 5. Assuming that the effective equation is mixing (but without assuming this for the original equation (1.1)), we prove Theorem 7.4 there. It states that the convergence in Theorem 4.7, our principal result, is uniform for t\geqslant0 (and not only for t\leqslant C\varepsilon^{-1}).

In Proposition 9.4, relying on results in [17], we present a simple sufficient condition on equation (1.1) which ensures that Theorems 4.7, 5.5, and 7.4 apply to it.

In Section 8 we go over to the general equations (1.1), where the dispersion matrix \mathcal{B} depends on v. Assuming the same estimates on solutions as in Section 4, we show that Theorem 4.7 remains valid if either the matrix \mathcal{B}(v) is non-singular or it is a C^2-smooth function of v. Theorems 5.5 and 7.4 also remain true for the general systems (1.1), but we do not discuss this, hoping that the corresponding modifications of the proofs will be clear after reading Section 8.

A deterministic analogue of our results, which deals with equation (1.1) for W=0 and describes the behaviour of its solutions on time intervals of order \varepsilon^{-1} in the interaction representation in comparison with solutions of the corresponding effective equation, is given by Krylov–Bogolyubov averaging; see [4], [1], and [13] (Theorem 4.7 also applies to such equations, but then its assertion becomes unnatural). Theorem 5.5 has no analogue for deterministic systems, but Theorem 7.4 does. Namely, it is known for Krylov–Bogolyubov averaging that if the effective equation has a globally asymptotically stable equilibrium, then the convergence of solutions of equation (1.1)|_{W=0}, written in the interaction representation, to solutions of the effective equation is uniform in time. This result is known in folklore as the second Krylov–Bogolyubov theorem and can be found in [6].

The Krylov–Bogolyubov method and Khasminski’s approach to averaging which we exploit are flexible tools. They are applicable to various stochastic systems in finite and infinite dimension, including stochastic PDEs, and the particular realization of the two methods that we use now is inspired by our previous work on averaging for stochastic PDEs. See [12], [19] for an analogue of Theorem 4.7 for stochastic PDEs, [12] for an analogue of Theorem 5.5, and [11] for an analogue of Theorem 7.4 (also see [7] for more results and references on averaging for stochastic PDEs).

1.3. Relation to classical stochastic averaging

Averaging in stochastic systems is a well-developed topic, usually dealing with fast-slow stochastic systems (see, for example, [16], [10; § 7], [23; § II.3], [21], [25], [18], and the references therein). To explain the relation of that theory to our work, let us write equation (1.1) in the complex form, v(t)\in\mathbb{C}^n, n=D/2 (when the operator A is diagonal), and then pass to the slow time \tau=\varepsilon t and the action-angle coordinates (I,\varphi)=(I_1,\dots,I_n;\varphi_1,\dots,\varphi_n)\in\mathbb{R}_+^n\times\mathbb{T}^n, where \mathbb{R}_+=\{x\in\mathbb{R}:x\geqslant0\}, \mathbb{T}^n=\mathbb{R}^n/(2\pi\mathbb{Z}^n), and

\begin{equation} I_k(v)=\frac{1}{2}|v_k|^2=\frac{1}{2}v_k\bar{v}_k, \qquad \varphi_k(v)=\operatorname{Arg}v_k\in S^1=\mathbb{R}/(2\pi\mathbb{Z}), \qquad k=1,\dots,n \end{equation} \tag{1.2}
(if v_k=0, then we set \varphi_k(v)=0\in S^1). In these coordinates equation (1.1) takes the form
\begin{equation} \begin{cases} dI(\tau)=P^I(I,\varphi)\,d\tau+\Psi^I(I,\varphi)\,d\beta(\tau), \\ d\varphi(\tau)+\varepsilon^{-1}\Lambda\,d\tau =P^\varphi(I,\varphi)\,d\tau+\Psi^\varphi(I,\varphi)\,d\beta(\tau). \end{cases} \end{equation} \tag{1.3}
Here \beta=(\beta_1,\dots,\beta_N), where the \{\beta_l\} are independent standard real Wiener processes, and the coefficients of the system are given by Itô’s formula. This is a fast-slow system with slow variable I and fast variable \varphi. Stochastic averaging treats systems like (1.3), usually adding a non-degenerate stochastic term of order \varepsilon^{-1/2} to the fast part of the \varphi-equation. The (first) goal of the analysis of such a system is usually to prove that on time intervals 0\leqslant\tau\leqslant T the distributions of the I-components of solutions converge, as \varepsilon\to0, to the distributions of solutions of a suitably averaged I-equation. After that other goals can be pursued.
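The action-angle map (1.2) is a direct transcription into code; the sketch below follows the convention \varphi_k=0 when v_k=0.

```python
import numpy as np

def actions(v):
    """I_k(v) = |v_k|^2 / 2, as in (1.2)."""
    return 0.5 * np.abs(v) ** 2

def angles(v):
    """phi_k(v) = Arg v_k, with the convention phi_k(v) = 0 when v_k = 0."""
    phi = np.angle(v)
    phi[v == 0] = 0.0
    return phi
```

For instance, v=(1, 2i, 0) has actions (1/2, 2, 0) and angles (0, \pi/2, 0).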

Unfortunately, stochastic averaging does not apply directly to systems (1.3) coming from equations (1.1), since the coefficients of the \varphi-equation have singularities when some of the I_k vanish, and since the fast \varphi-equation is rather degenerate if the vector \Lambda is resonant. Instead we borrow Khasminski’s method [16] for stochastic averaging from that theory and apply it to equation (1.1) written in the interaction representation, thus arriving at the assertion of Theorem 4.7. Averaging theorems for stationary solutions of equation (1.3) and for the corresponding stationary measures are known in stochastic averaging, but (of course) they control only the limiting behaviour of the I-components of the stationary solutions and measures, while our Theorem 5.5 describes the limit of the whole stationary measure. It seems that no analogue of Theorem 7.4 is known in stochastic averaging.

This paper originates in lecture notes for an online course that SK taught at Shandong University (PRC) in the autumn term of 2020.

Notation

For a Banach space E and R>0 we denote by B_R(E) the open R-ball \{e\in E:|e|_E<R\} and by \overline{B}_R(E) its closure \{e\in E:|e|_E\leqslant R\}; C_b(E) denotes the space of bounded continuous functions on E, and C([0,T];E) is the space of continuous curves [0,T]\to E endowed with the sup-norm. For any 0<\alpha\leqslant1 and u\in C([0,T];E) we set

\begin{equation} \|u\|_\alpha =\sup_{0\leqslant\tau\leqslant T}|u(\tau)|_E +\sup_{0\leqslant s<\tau\leqslant T} \frac{|u(\tau)-u(s)|_E}{|\tau-s|^\alpha}\,. \end{equation} \tag{1.4}

This is a norm in the Hölder space C^\alpha([0,T];E). The standard C^m-norm for C^m-smooth functions on E is denoted by |\cdot|_{C^m(E)}. We use the notation \mathcal{D}(\xi) for the law of the random variable \xi, the symbol \rightharpoonup denotes weak convergence of measures, and \mathcal{P}(M) is the space of Borel measures on the metric space M. For a measurable mapping F\colon M_1\to M_2 and \mu\in \mathcal{P}(M_1) we denote by F\circ\mu\in \mathcal{P}(M_2) the image of \mu under F; that is, F\circ\mu(Q)=\mu(F^{-1}(Q)).
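The norm of the Hölder space C^\alpha([0,T];E) (sup-norm plus \alpha-Hölder seminorm) can be estimated on a sampled curve; a minimal numerical sketch, with the discretization an illustrative choice:

```python
import numpy as np

def holder_norm(u, ts, alpha):
    """Discrete estimate of ||u||_alpha = sup |u| + sup_{s != t} |u(t)-u(s)| / |t-s|^alpha
    for a curve sampled at the points ts."""
    sup = np.max(np.abs(u))
    diffs = np.abs(u[:, None] - u[None, :])      # all pairwise |u(t_i) - u(t_j)|
    gaps = np.abs(ts[:, None] - ts[None, :])     # all pairwise |t_i - t_j|
    mask = gaps > 0
    semi = np.max(diffs[mask] / gaps[mask] ** alpha)
    return sup + semi
```

For the curve u(t)=t on [0,1] and \alpha=1 both terms equal 1, so the estimate is 2.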

If m\geqslant0 and L is \mathbb{R}^n or \mathbb{C}^n, then \operatorname{Lip}_m(L, E) is the set of maps F\colon L \to E such that for any R\geqslant1 we have

\begin{equation} (1+|R|)^{-m} \Bigl(\operatorname{Lip}\bigl(F|_{\overline{B}_R(L)}\bigr)+\sup_{v\in \overline{B}_R(L)}|F(v)|_E\Bigr) =:\mathcal{C}^m(F) <\infty, \end{equation} \tag{1.5}
where \operatorname{Lip}(f) is the Lipschitz constant of the map f (note that, in particular, |F(v)|_E \leqslant \mathcal{C}^m(F) (1+ |v|_L)^m for any v\in L). For a complex matrix A=(A_{ij}), A^*= (A^*_{ji}) denotes its Hermitian conjugate: A^*_{ij}=\bar A_{ji} (so for a real matrix B, B^* is the transposed matrix). For a set Q we denote by \mathbf{1}_Q its indicator function, and by Q^c its complement. Finally, \mathbb{R}_+ (\mathbb{Z}_+) is the set of non-negative real numbers (non-negative integers), and for real numbers a and b, a\vee b and a\wedge b indicate their maximum and minimum.

2. Linear systems and their perturbations

In this section we present the setting of the problem and specify our assumptions on the operator A, vector field P and noise \sqrt\varepsilon\,\mathcal{B}(v)\,dW in equation (1.1). To simplify the presentation and explain better the ideas, in the main part of the text we assume that the noise is additive, that is, \mathcal{B} is a constant (possibly singular) matrix. We discuss the general equations (1.1) in Section 8.

2.1. Assumptions on A and {W}(t)

We assume that the unperturbed linear system

\begin{equation} \frac d{dt}v +Av=0, \qquad v\in\mathbb{R}^D, \end{equation} \tag{2.1}
is such that all of its trajectories are bounded as t\to\pm\infty. Then the eigenvalues of A are purely imaginary, go in pairs \pm i\lambda_j, and A has no Jordan cells. We also assume that A is invertible.

Under these assumptions D=2n, and there exists a basis \{\mathbf{e}_1^+,\mathbf{e}_1^-,\dots, \mathbf{e}_n^+, \mathbf{e}_n^-\} in \mathbb{R}^{2n} in which the linear operator A takes the block-diagonal form:

\begin{equation*} A= \begin{pmatrix} \begin{matrix}0&-\lambda_1\\ \lambda_1&0\end{matrix}&&0\\ &\ddots&\\ 0&&\begin{matrix}0&-\lambda_n\\\lambda_n&0\end{matrix} \end{pmatrix}. \end{equation*} \notag
We denote by (x_1,y_1,\dots, x_n,y_n) the coordinates corresponding to this basis, and for j=1,\dots,n we set z_j=x_j+iy_j. Then \mathbb{R}^{2n} becomes the space of complex vectors (z_1,\dots,z_n), that is, \mathbb{R}^{2n}\simeq\mathbb{C}^n. In the complex coordinates the standard inner product in \mathbb{R}^{2n} reads
\begin{equation} \langle z,z'\rangle =\operatorname{Re}\sum_{j=1}^nz_j\bar{z}_j',\qquad z,z'\in\mathbb{C}^n. \end{equation} \tag{2.2}
Let us denote by
\begin{equation*} \Lambda=(\lambda_1,\dots,\lambda_n)\in(\mathbb{R}\setminus\{0\})^n \end{equation*} \notag
the frequency vector of the linear system (2.1). Then in the complex coordinates z the operator A reads
\begin{equation*} Az=\operatorname{diag}\{i\Lambda\} z, \end{equation*} \notag
where \operatorname{diag}\{i\Lambda\} is the diagonal operator, sending (z_1,\dots,z_n) to (i\lambda_1z_1,\dots,i\lambda_nz_n). Therefore, in \mathbb{R}^{2n} written as the complex space \mathbb{C}^n linear equation (2.1) takes the diagonal form
\begin{equation*} \frac d{dt}v_k+i\lambda_k v_k=0, \qquad 1\leqslant k\leqslant n. \end{equation*} \notag
Below we examine the perturbed equation (1.1) using these complex coordinates.
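The identification \mathbb{R}^{2n}\simeq\mathbb{C}^n can be checked on a single 2\times2 block: the flow of (2.1) restricted to one block is a rotation, and in the complex coordinate z=x+iy it is multiplication by e^{-i\lambda t}. A sketch with hypothetical numerical values:

```python
import numpy as np

lam, t = 0.7, 1.3                  # hypothetical frequency and time
x0, y0 = 0.4, -1.1                 # arbitrary initial point
# real flow of one 2x2 block of (2.1): exp(-At) is the rotation by angle -lam*t
c, s = np.cos(lam * t), np.sin(lam * t)
x_t, y_t = c * x0 + s * y0, -s * x0 + c * y0
# the same flow in the complex coordinate z = x + i y: z(t) = e^{-i lam t} z(0)
z_t = np.exp(-1j * lam * t) * (x0 + 1j * y0)
```

The two computations agree, which is exactly the diagonal form dv_k/dt + i\lambda_k v_k = 0 above.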

Next we discuss the random process W(t) expressed in the complex coordinates. The standard complex Wiener process has the form

\begin{equation} \beta^c(t) =\beta^+(t)+i\beta^-(t)\in\mathbb{C}, \end{equation} \tag{2.3}
where \beta^+(t) and \beta^-(t) are independent standard (real) Wiener processes, defined on some probability space (\Omega,\mathcal{F},\mathsf{P}). Then \bar\beta^c(t)=\beta^+(t)-i\beta^-(t), and any Wiener process W(t)\in\mathbb{C}^n can conveniently be written in the complex form as
\begin{equation} W_k =\sum_{l=1}^{n_1}\Psi_{kl}^1\beta^c_l +\sum_{l=1}^{n_1}\Psi_{kl}^2 \bar\beta^c_l, \qquad k=1,\dots,n, \end{equation} \tag{2.4}
where \Psi^1=(\Psi_{kl}^1) and \Psi^2=(\Psi_{kl}^2) are complex n\times n_1 matrices and \{\beta^c_l\} are independent standard complex Wiener processes. Again, in order to simplify the presentation, we suppose below that the noise in (1.1) is of the form
\begin{equation*} W_k(t) =\sum_{l=1}^{n_1}\Psi_{kl}\beta^c_l(t), \qquad k=1,\dots,n. \end{equation*} \notag
We do not assume that the matrix \Psi is non-singular (in particular, it may be zero). Then the perturbed equation (1.1) in the complex coordinates reads as
\begin{equation} d v_k+i\lambda_kv_k\,dt =\varepsilon P_k(v)\,dt +\sqrt{\varepsilon}\,\sum_{l=1}^{n_1}\Psi_{kl}\,d\beta^c_l(t), \qquad k=1,\dots,n, \end{equation} \tag{2.5}
where v=(v_1,\dots,v_n)\in\mathbb{C}^n and 0<\varepsilon\leqslant1.
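For the standard complex Wiener process (2.3) one has \mathsf{E}|\beta^c(t)|^2=2t (the fact used in Section 2.4 below). A quick Monte Carlo check; the sample size, time, and seed are arbitrary choices:

```python
import numpy as np

def complex_wiener(n_paths, t, rng):
    """Sample beta^c(t) = beta^+(t) + i beta^-(t) at a single time t,
    with beta^+, beta^- independent standard real Wiener processes."""
    re = rng.normal(0.0, np.sqrt(t), n_paths)
    im = rng.normal(0.0, np.sqrt(t), n_paths)
    return re + 1j * im

rng = np.random.default_rng(1)
b = complex_wiener(200_000, t=2.0, rng=rng)
var = np.mean(np.abs(b) ** 2)   # should be close to 2t = 4
```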

The results obtained below for equation (2.5) remain true for the general equations (1.1) at the price of heavier calculation. The corresponding argument is sketched in Section 8.

2.2. Assumptions on P and on the perturbed equation

Our first goal is to study equation (2.5) for 0<\varepsilon\ll 1 on a time interval 0\leqslant t \leqslant \varepsilon^{-1}T, where T>0 is a fixed constant. Introducing the slow time

\begin{equation*} \tau=\varepsilon t \end{equation*} \notag
we write the equation as
\begin{equation} \begin{gathered} \, dv_k(\tau)+i\varepsilon^{-1}\lambda_kv_k\,d\tau =P_k(v)\,d\tau+\sum_{l=1}^{n_1}\Psi_{kl}\,d\tilde\beta^c_l(\tau), \\ \quad k=1,\dots,n, \quad 0\leqslant \tau\leqslant T. \notag \end{gathered} \end{equation} \tag{2.6}
Here \{\tilde\beta^c_l(\tau),\,l=1,\dots,n_1\} is another set of independent standard complex Wiener processes, which we now re-denote back to \{\beta^c_l(\tau),\,l=1,\dots,n_1\}. We stress that the equation above is nothing but the original equation (1.1), whose linear part (2.1) is conservative and non-degenerate in the sense of conditions 1) and 2), written in the complex coordinates and slow time. So all results below concerning equation (2.6) (and equation (2.11)) may be reformulated for equation (1.1) at the price of heavier notation.

Let us formulate the assumptions concerning the well-posedness of equation (2.6) which hold throughout this paper.

Assumption 2.1. (a) The drift P(v)=(P_1(v),\dots,P_n(v)) is a locally Lipschitz vector field, belonging to \operatorname{Lip}_{m_0}(\mathbb{C}^n,\mathbb{C}^n) for some m_0\geqslant0 (see (1.5)).

(b) For any v_0\in\mathbb{C}^n equation (2.6) has a unique strong solution v^\varepsilon(\tau;v_0), \tau\in[0,T], which is equal to v_0 at \tau=0. Moreover, there exists m_0'>(m_0\vee1) such that

\begin{equation} \mathsf{E}\sup_{0\leqslant\tau\leqslant T}|v^\varepsilon(\tau;v_0)|^{2 m'_0} \leqslant C_{m'_0}(|v_0|,T)<\infty \quad \forall\,0<\varepsilon\leqslant1, \end{equation} \tag{2.7}
where C_{m'_0}(\,{\cdot}\,) is a non-negative continuous function on \mathbb{R}_+^2, which is non-decreasing in both arguments.

Our proofs generalize easily to the case when the vector field P is locally Lipschitz and satisfies |P(v)| \leqslant C (1+|v|)^{m_0} for all v and some C>0 and m_0\geqslant0 (see [13] for averaging in deterministic perturbations of equation (2.1) by locally Lipschitz vector fields). In this case the argument remains essentially the same (but becomes a bit longer), and the constants in the estimates depend not only on m_0, but also on the local Lipschitz constant of P, which is a function R\mapsto\operatorname{Lip}(P|_{\overline{B}_R(\mathbb{C}^n)}).

Below T>0 is fixed and the dependence of constants on T is usually not indicated. Solutions of (2.6) are assumed to be strong unless otherwise stated. As usual, strong solutions are understood in the sense of an integral equation. That is, v^\varepsilon(\tau;v_0)= v(\tau), 0\leqslant\tau\leqslant T, is a strong solution, equal to v_0 at \tau=0, if

\begin{equation*} v_k(\tau)+\int_0^\tau\bigl(i\varepsilon^{-1}\lambda_kv_k(s)-P_k(v(s))\bigr)\,ds =v_{0k}+\sum_{l=1}^{n_1}\Psi_{kl}\beta^c_l(\tau), \qquad k=1,\dots,n, \end{equation*} \notag
almost surely for 0\leqslant\tau\leqslant T.

2.3. The interaction representation

Now in (2.6) we go over to the interaction representation, which means that we make the substitution

\begin{equation} v_k(\tau) =e^{-i\tau\varepsilon^{-1}\lambda_k}a_k(\tau), \qquad k=1,\dots,n. \end{equation} \tag{2.8}
Then v_k(0)=a_k(0), and we obtain the following equations for variables a_k(\tau):
\begin{equation} da_k(\tau) =e^{i\tau\varepsilon^{-1}\lambda_k}P_k(v)\,d\tau +e^{i\tau\varepsilon^{-1}\lambda_k}\sum_{l=1}^{n_1}\Psi_{kl}\,d\beta^c_l(\tau), \qquad k=1,\dots,n. \end{equation} \tag{2.9}
The actions I_k=|a_k|^2/2 of solutions of (2.9) are the same as the actions of solutions of (2.6). In comparison with (2.6), in (2.9) we have removed the large term \varepsilon^{-1}\operatorname{diag}(i\Lambda)v from the drift, at the price that the coefficients of the system are now fast oscillating functions of \tau.
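The effect of the substitution (2.8) is easy to see numerically: for the unperturbed part of (2.6) (P=0, \Psi=0) each mode v_k rotates fast, while the interaction representation a_k=e^{i\tau\varepsilon^{-1}\lambda_k}v_k is frozen, and the actions of the two curves coincide. A sketch with hypothetical frequencies:

```python
import numpy as np

eps = 0.01
Lam = np.array([1.0, np.sqrt(2.0)])              # hypothetical frequency vector
tau = np.linspace(0.0, 1.0, 500)
v0 = np.array([1.0 + 0.5j, -0.3 + 0.2j])
# unperturbed part of (2.6): v_k(tau) = e^{-i tau lam_k / eps} v_k(0), fast rotation
v = np.exp(-1j * tau[:, None] * Lam / eps) * v0
# interaction representation (2.8): a_k(tau) = e^{+i tau lam_k / eps} v_k(tau)
a = np.exp(1j * tau[:, None] * Lam / eps) * v
```

Here a(\tau)\equiv v_0 (the fast rotation is removed), and |a_k(\tau)|=|v_k(\tau)| for all \tau.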

To rewrite the above equations conveniently we introduce the rotation operators \Phi_w: for each real vector w=(w_1,\dots,w_n)\in\mathbb{R}^n we set

\begin{equation} \Phi_w\colon \mathbb{C}^n\to\mathbb{C}^n, \quad \Phi_w=\operatorname{diag}\{e^{iw_1},\dots,e^{iw_n}\}. \end{equation} \tag{2.10}
Then
\begin{equation*} (\Phi_w)^{-1}=\Phi_{-w},\quad \Phi_{w_1}\circ\Phi_{w_2}=\Phi_{w_1+w_2},\quad \Phi_0=\operatorname{id}, \end{equation*} \notag
where each \Phi_w is a unitary transformation, so that \Phi_w^*=\Phi_w^{-1}. Moreover,
\begin{equation*} |(\Phi_wz)_j|=|z_j|\quad \forall\,z,w,j. \end{equation*} \notag
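The listed algebraic properties of the rotation operators \Phi_w can be verified numerically on arbitrary test vectors; a minimal sketch:

```python
import numpy as np

def Phi(w):
    """Rotation operator (2.10): diag(e^{i w_1}, ..., e^{i w_n})."""
    return np.diag(np.exp(1j * np.asarray(w)))

w1 = np.array([0.3, -1.2])      # arbitrary real vectors
w2 = np.array([2.0, 0.5])
z = np.array([1.0 - 1.0j, 0.2 + 0.4j])
```

The group law \Phi_{w_1}\circ\Phi_{w_2}=\Phi_{w_1+w_2}, unitarity \Phi_w^*\Phi_w=\operatorname{id}, and the norm preservation |(\Phi_w z)_j|=|z_j| all hold up to rounding.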
In terms of the operators \Phi we write v(\tau) as \Phi_{-\tau\varepsilon^{-1}\Lambda} a(\tau), and we write system (2.9) as
\begin{equation} da(\tau) =\Phi_{\tau \varepsilon^{-1}\Lambda} P(\Phi_{-\tau \varepsilon^{-1}\Lambda}a(\tau))\,d\tau +\Phi_{\tau \varepsilon^{-1}\Lambda}\Psi\,d\beta^c(\tau), \qquad a(\tau)\in\mathbb{C}^n, \end{equation} \tag{2.11}
where \beta^c(\tau)=(\beta^c_1(\tau),\dots,\beta^c_{n_1}(\tau)). This is the equation which we are going to study for small \varepsilon and 0\leqslant\tau\leqslant T, under the initial condition
\begin{equation} a(0)=v(0)=v_0. \end{equation} \tag{2.12}
The solution a^\varepsilon(\tau; v_0)=\Phi_{\tau\varepsilon^{-1}\Lambda} v^\varepsilon(\tau; v_0) of (2.11), (2.12) also satisfies estimate (2.7), for each \varepsilon\in(0,1].

We recall that a C^1-diffeomorphism G of \mathbb{C}^{n} transforms a vector field V into the field G_*V, where (G_*V)(v)=dG(u)(V(u)) for u=G^{-1}v. In particular,

\begin{equation*} \bigl((\Phi_{\tau\varepsilon^{-1}\Lambda })_*P\bigr)(v) =\Phi_{\tau\varepsilon^{-1}\Lambda}\circ P(\Phi_{-\tau\varepsilon^{-1}\Lambda}v). \end{equation*} \notag
So equation (2.11) can be written as
\begin{equation*} da(\tau) =\bigl((\Phi_{\tau\varepsilon^{-1}\Lambda})_*P\bigr)(a(\tau))\,d\tau +\Phi_{\tau\varepsilon^{-1}\Lambda}\Psi\,d\beta^c(\tau). \end{equation*} \notag

2.4. Compactness

For 0<\varepsilon\leqslant1 we denote by a^\varepsilon(\tau;v_0) the solution of equation (2.11) which is equal to v_0 at \tau=0. Then

\begin{equation*} a^\varepsilon(\tau;v_0) =\Phi_{\tau\varepsilon^{-1}\Lambda}v^\varepsilon(\tau;v_0). \end{equation*} \notag
A unique solution v^\varepsilon(\tau;v_0) of (2.6) exists by Assumption 2.1, so the solution a^\varepsilon(\tau;v_0) also exists and is unique. Our goal is to examine its law
\begin{equation*} Q_\varepsilon :=\mathcal{D}(a^\varepsilon(\,\cdot\,;v_0)) \in\mathcal{P}(C([0,T];\mathbb{C}^n)) \end{equation*} \notag
as \varepsilon\to0. When v_0 is fixed, we will usually write a^\varepsilon(\tau;v_0) as a^\varepsilon(\tau).

Lemma 2.2. Under Assumption 2.1 the set of probability measures \{Q_\varepsilon, \,\varepsilon\in(0,1]\} is pre-compact in the weak topology in \mathcal{P}(C([0,T];\mathbb{C}^{n})).

Proof. We denote the random force in (2.11) by d\zeta^\varepsilon(\tau):
\begin{equation*} d\zeta^\varepsilon(\tau) :=\Phi_{\tau\varepsilon^{-1}\Lambda}\Psi\,d\beta^c(\tau), \end{equation*} \notag
where \zeta^\varepsilon(\tau)=(\zeta_k^\varepsilon(\tau),\,k=1,\dots,n). For any k we have
\begin{equation*} \zeta^\varepsilon_k(\tau) =\int_0^\tau d\zeta_k^\varepsilon =\int_0^\tau e^{is\varepsilon^{-1}\lambda_k} \sum_{l=1}^{n_1}\Psi_{kl}\,d\beta^c_l(s). \end{equation*} \notag
So \zeta^\varepsilon(\tau) is a stochastic integral of a non-random vector function. Hence it is a Gaussian random process with zero mean value, and its increments over disjoint time intervals are independent. For each k
\begin{equation*} \mathsf{E}|\zeta_k^\varepsilon(\tau)|^2 =\int_0^\tau2 \sum_{l=1}^{n_1}|\Psi_{kl}|^2\,ds =:2C_k^\zeta\tau, \qquad C_k^\zeta=\sum_{l=1}^{n_1}|\Psi_{kl}|^2\geqslant0, \end{equation*} \notag
and \mathsf{E}\zeta_k^\varepsilon(\tau)\zeta_j^\varepsilon(\tau) =\mathsf{E}\bar\zeta_k^\varepsilon(\tau)\bar\zeta_j^\varepsilon(\tau)=0. Therefore, \zeta_k^\varepsilon(\tau)=\sqrt{C_k^\zeta}\,\beta^c_k(\tau), where by Lévy’s theorem (see [14; p. 157]) \beta^c_k(\tau) is a standard complex Wiener process. However, the processes \zeta^\varepsilon_j and \zeta^\varepsilon_k with j\neq k are not necessarily independent.
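The identity \mathsf{E}|\zeta_k^\varepsilon(\tau)|^2=2C_k^\zeta\tau can be confirmed by a direct Monte Carlo discretization of the stochastic integral; the row of \Psi and all numerical parameters below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
eps, lam, tau = 0.05, 1.0, 1.0
n_steps, n_paths = 100, 20_000
Psi_row = np.array([0.6, -0.3j])           # hypothetical k-th row of Psi, n_1 = 2
ds = tau / n_steps
s = np.arange(n_steps) * ds
phase = np.exp(1j * s * lam / eps)         # e^{i s lambda_k / eps}, |phase| = 1
# increments of n_1 = 2 independent standard complex Wiener processes
db = (rng.normal(0.0, np.sqrt(ds), (n_paths, n_steps, 2))
      + 1j * rng.normal(0.0, np.sqrt(ds), (n_paths, n_steps, 2)))
# zeta_k(tau) = int_0^tau e^{i s lam/eps} sum_l Psi_kl dbeta^c_l(s), per path
zeta = np.einsum('s,l,psl->p', phase, Psi_row, db)
var = np.mean(np.abs(zeta) ** 2)           # theory: 2 * sum_l |Psi_kl|^2 * tau = 0.9
```

Since |phase|=1, the oscillating factor does not change the variance, in agreement with the Itô isometry used above.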

By the basic properties of Wiener process, the curve

\begin{equation*} [0,T]\ni\tau\mapsto\zeta^\varepsilon(\omega,\tau)\in\mathbb{C}^n \end{equation*} \notag
is almost surely Hölder-continuous with exponent 1/3, and since C_k^\zeta does not depend on \varepsilon, we have (abbreviating C^{1/3}([0,T];\mathbb{C}^n) to C^{1/3})
\begin{equation*} \mathsf{P}\bigl(\zeta^\varepsilon(\,{\cdot}\,)\in \overline{B}_R(C^{1/3})\bigr) \to1 \quad \text{as }\ R\to\infty, \end{equation*} \notag
uniformly in \varepsilon. Let us write equation (2.11) as
\begin{equation*} da^\varepsilon(\tau) =V^\varepsilon(\tau)\,d\tau+d\zeta^\varepsilon(\tau). \end{equation*} \notag
By Assumption 2.1 and since |a^\varepsilon(\tau)|\equiv |v^\varepsilon(\tau)|, we have
\begin{equation*} \mathsf{E}\sup_{\tau\in[0,T]}|V^\varepsilon(\tau)| \leqslant\mathcal{C}^{m_0}(P)\, \mathsf{E}\Bigl(1+\sup_{\tau\in[0,T]}|v^\varepsilon(\tau)|\Bigr)^{m_0} \leqslant C(|v_0|)<\infty. \end{equation*} \notag
Therefore, by Chebyshev’s inequality,
\begin{equation*} \mathsf{P}\Bigl(\sup_{\tau\in[0,T]}|V^\varepsilon(\tau)|>R\Bigr) \leqslant C(|v_0|) R^{-1}, \end{equation*} \notag
uniformly in \varepsilon\in(0,1]. Since
\begin{equation*} a^\varepsilon(\tau) =v_0+\int_0^\tau V^\varepsilon(s)\,ds+\zeta^\varepsilon(\tau), \end{equation*} \notag
from the above we get that
\begin{equation} \mathsf{P}\bigl(\|a^\varepsilon(\,{\cdot}\,)\|_{1/3}>R\bigr) \to0 \quad\text{as }\ R\to\infty, \end{equation} \tag{2.13}
uniformly in \varepsilon\in(0,1]. By the Ascoli–Arzelà theorem the sets \overline{B}_R(C^{1/3}) are compact in C([0,T];\mathbb{C}^n), and in view of (2.13), for any \delta>0 there exists R_\delta such that
\begin{equation*} Q_\varepsilon \bigl(\overline{B}_{R_{\delta}}(C^{1/3})\bigr)\geqslant1-\delta \quad \forall\,\varepsilon>0. \end{equation*} \notag
So by Prohorov’s theorem the set of measures \{Q_\varepsilon,0<\varepsilon\leqslant1\} is pre-compact in \mathcal{P}(C([0,T];\mathbb{C}^n)). The lemma is proved.

By this lemma, for any sequence \varepsilon_l\to0 there exist a subsequence \varepsilon_l'\to0 and a measure Q_0\in\mathcal{P}(C([0,T];\mathbb{C}^n)) such that

\begin{equation} Q_{\varepsilon_l'} \rightharpoonup Q_0 \quad\text{as }\ \varepsilon_l'\to0. \end{equation} \tag{2.14}

3. Averaging vector fields with respect to the frequency vector

For a vector field \widetilde{P}\in\operatorname{Lip}_{m_0}(\mathbb{C}^n,\mathbb{C}^n) we denote

\begin{equation*} Y_{\widetilde{P}}(a;t) =\bigl((\Phi_{t\Lambda})_*\widetilde{P}\bigr)(a) =\Phi_{t\Lambda}\circ\widetilde{P}(\Phi_{-t\Lambda }a), \qquad a\in\mathbb{C}^n, \quad t\in\mathbb{R}, \end{equation*} \notag
and for T'>0 we define the partial averaging \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'} of the vector field \widetilde{P} with respect to the frequency vector \Lambda as follows:
\begin{equation} \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(a) =\frac{1}{T'} \int_0^{T'}Y_{\widetilde{P}}(a;t)\,dt =\frac{1}{T'} \int_0^{T'}\Phi_{t\Lambda}\circ\widetilde{P}(\Phi_{-t\Lambda}a)\,dt. \end{equation} \tag{3.1}
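The partial averaging (3.1) is easy to evaluate by quadrature. A sketch for n=1 with illustrative test fields: the resonant field \widetilde{P}(a)=a|a|^2 is its own average for every T', while for \widetilde{P}(a)=a^2 the average over a full period of \Lambda=(1) vanishes.

```python
import numpy as np

def partial_average(P, Lam, a, T, n_quad=2000):
    """Midpoint-rule approximation of the partial averaging (3.1)
    for a diagonal frequency vector Lam acting on a in C^n."""
    t = (np.arange(n_quad) + 0.5) * (T / n_quad)
    rot = np.exp(1j * t[:, None] * Lam)       # Phi_{t Lam}, acting diagonally
    vals = rot * P(np.conj(rot) * a)          # Phi_{t Lam} P(Phi_{-t Lam} a)
    return vals.mean(axis=0)
```

The rotations cancel exactly for the resonant monomial, and average to zero for the non-resonant one, illustrating Lemma 3.1.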

Lemma 3.1. For any T'>0

\begin{equation*} \langle\!\langle {\widetilde{P}} \rangle\!\rangle ^{T'}(a) \in\operatorname{Lip}_{m_0}(\mathbb{C}^n,\mathbb{C}^n) \quad{\rm and}\quad \mathcal{C}^{m_0} ( \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}) \leqslant\mathcal{C}^{m_0}(\widetilde{P}) \end{equation*} \notag
(see (1.5)).

Proof. If a\in \overline{B}_R(\mathbb{C}^n), then \Phi_{-t\Lambda }a\in \overline{B}_R(\mathbb{C}^n) for each t. So
\begin{equation*} |Y_{\widetilde{P}}(a;t)| =|(\Phi_{t\Lambda})_* \widetilde{P}(a)| =|\widetilde{P}(\Phi_{-t\Lambda}a)|, \end{equation*} \notag
and thus
\begin{equation*} | \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(a)| \leqslant\sup_{0\leqslant t\leqslant {T'}}|Y_{\widetilde{P}}(a;t)| \leqslant\mathcal{C}^{m_0}(\widetilde{P})(1+ R)^{m_0}. \end{equation*} \notag
Similarly, for any a_1,a_2\in\overline{B}_R(\mathbb{C}^n),
\begin{equation*} \begin{aligned} \, |Y_{\widetilde{P}}(a_1;t)-Y_{\widetilde{P}}(a_2;t)| & =|\widetilde{P}(\Phi_{-t\Lambda }a_1)-\widetilde{P}(\Phi_{-t\Lambda }a_2)|\\ & \leqslant\mathcal{C}^{m_0}(\widetilde{P})(1+R)^{m_0}|a_2-a_1| \quad \forall\,t\geqslant0, \end{aligned} \end{equation*} \notag
so that
\begin{equation*} \bigl| \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(a_1)- \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(a_2)\bigr| \leqslant\mathcal{C}^{m_0}(\widetilde{P})(1+R)^{m_0}|a_1-a_2|. \end{equation*} \notag
This proves the assertion.

We define the averaging of the vector field \widetilde{P} with respect to the frequency vector \Lambda by

\begin{equation} \langle\!\langle \widetilde{P} \rangle\!\rangle (a) =\lim_{T'\to\infty} \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(a) =\lim_{T'\to\infty}\frac{1}{T'}\int_0^{T'}(\Phi_{t\Lambda})_*\widetilde{P}(a)\,dt. \end{equation} \tag{3.2}

Lemma 3.2. (1) The limit (3.2) exists for any a. Moreover, \langle\!\langle \widetilde{P} \rangle\!\rangle belongs to \operatorname{Lip}_{m_0}(\mathbb{C}^n,\mathbb{C}^n) and \mathcal{C}^{m_0}( \langle\!\langle \widetilde{P} \rangle\!\rangle ) \leqslant \mathcal{C}^{m_0}(\widetilde{P}).

(2) If a\in \overline{B}_R(\mathbb{C}^n), then the rate of convergence in (3.2) does not depend on a, but only depends on R.

This is the main lemma of deterministic averaging for vector fields; see [13; Lemma 3.1] for its proof.
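How the limit (3.2) depends on resonances among the frequencies can be observed numerically. A sketch for the illustrative field \widetilde{P}(a)=(a_2,a_1) in \mathbb{C}^2: for the resonant vector \Lambda=(1,1) the average equals \widetilde{P} itself, while for the non-resonant \Lambda=(1,\sqrt2\,) it tends to zero (all numerical parameters are illustrative).

```python
import numpy as np

def avg_T(Lam, a, T, n=100_000):
    """(1/T) int_0^T Phi_{t Lam} P(Phi_{-t Lam} a) dt for the test field P(a) = (a_2, a_1)."""
    t = (np.arange(n) + 0.5) * (T / n)
    r1, r2 = np.exp(1j * np.outer(t, Lam)).T
    y1 = (r1 / r2) * a[1]        # first component: e^{i t (lam_1 - lam_2)} a_2
    y2 = (r2 / r1) * a[0]        # second component: e^{i t (lam_2 - lam_1)} a_1
    return np.array([y1.mean(), y2.mean()])

a = np.array([1.0 + 0.0j, 0.5 - 0.5j])
res = avg_T(np.array([1.0, 1.0]), a, T=50.0)               # resonant frequencies
non = avg_T(np.array([1.0, np.sqrt(2.0)]), a, T=500.0)     # non-resonant frequencies
```

In the resonant case the integrand is constant, so the limit is (a_2,a_1); in the non-resonant case the oscillations average out at rate O(1/T'), consistent with part (2) of Lemma 3.2.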

The averaged vector field \langle\!\langle \widetilde{P} \rangle\!\rangle is invariant with respect to the transformations \Phi_{\theta\Lambda}.

Lemma 3.3. For all a\in \mathbb{C}^n and \theta \in \mathbb{R},

\begin{equation*} \bigl(\Phi_{\theta\Lambda}\bigr)_* \langle\!\langle \widetilde{P} \rangle\!\rangle (a) \equiv \Phi_{\theta\Lambda}\circ \langle\!\langle \widetilde{P} \rangle\!\rangle \circ\Phi_{-\theta\Lambda}(a) = \langle\!\langle \widetilde{P} \rangle\!\rangle (a). \end{equation*} \notag

Proof. For definiteness let \theta>0. For any {T'}>0 we have
\begin{equation*} \begin{aligned} \, \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(\Phi_{-\theta\Lambda}(a)) & =\frac{1}{T'} \int_0^{T'}\Phi_{t\Lambda }\circ \widetilde{P}(\Phi_{-t\Lambda} \circ\Phi_{-\theta\Lambda}(a))\,dt \\ & =\frac{1}{T'}\int_0^{T'}\Phi_{t\Lambda} \circ\widetilde{P}(\Phi_{-(t+\theta)\Lambda}a)\,dt. \end{aligned} \end{equation*} \notag
Since \Phi_{t\Lambda }=\Phi_{-\theta\Lambda}\circ \Phi_{(t+\theta)\Lambda}, this equals
\begin{equation*} \frac{1}{T'}\, \Phi_{-\theta\Lambda}\biggl(\int_0^{T'}\Phi_{(t+\theta)\Lambda} \circ\widetilde{P}(\Phi_{-(t+\theta)\Lambda}a)\,dt\biggr) = \Phi_{-\theta\Lambda }\circ \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(a) +O\biggl(\frac{1}{T'}\biggr). \end{equation*} \notag
Passing to the limit as T'\to\infty we obtain the assertion.

The statement below asserts that the averaged vector field \langle\!\langle {P} \rangle\!\rangle is at least as smooth as P.

Proposition 3.4. If P\in C^m(\mathbb{C}^n) for some m\in\mathbb{N}, then \langle\!\langle {P} \rangle\!\rangle \in C^m(\mathbb{C}^n) and | \langle\!\langle {P} \rangle\!\rangle |_{C^m(\overline{B}_R)}\leqslant|P|_{C^m(\overline{B}_R)} for all R>0.

Proof. First fix some R>0. Then there exists a sequence of polynomial vector fields \{P_{R,j},\,j\in\mathbb{N}\} (cf. subsection 3.1.3) such that |P_{R,j}-P|_{C^m(\overline{B}_R)}\to0 as j\to\infty. An easy calculation shows that
\begin{equation} | \langle\!\langle P_{R,j} \rangle\!\rangle ^T- \langle\!\langle P_{R,j} \rangle\!\rangle |_{C^m(\overline{B}_R)} \to0 \quad \text{as }\ T\to\infty, \end{equation} \tag{3.3}
for each j. Since the transformations \Phi_{t\Lambda} are unitary, differentiating the integral in (3.1) in a we get that
\begin{equation} | \langle\!\langle \widetilde{P} \rangle\!\rangle ^T|_{C^m(\overline{B}_R)} \leqslant|\widetilde{P}|_{C^m(\overline{B}_R)} \quad \forall\,T>0, \end{equation} \tag{3.4}
for any C^m-smooth vector field \widetilde{P}. Therefore,
\begin{equation} \begin{aligned} \, | \langle\!\langle P_{R,j} \rangle\!\rangle ^T- \langle\!\langle P \rangle\!\rangle ^T|_{C^m(\overline{B}_R)} & \leqslant|P_{R,j}-P|_{C^m(\overline{B}_R)} \notag\\ & =:\kappa_j\to 0 \quad \text{as }\ j\to\infty, \quad \forall\,T>0. \end{aligned} \end{equation} \tag{3.5}
So
\begin{equation*} | \langle\!\langle P_{R,j} \rangle\!\rangle ^T- \langle\!\langle P_{R,k} \rangle\!\rangle ^T|_{C^m(\overline{B}_R)} \leqslant 2\kappa_{j\wedge k} \quad \forall\,T>0. \end{equation*} \notag
From this estimate and (3.3) we find that
\begin{equation*} | \langle\!\langle P_{R,j} \rangle\!\rangle - \langle\!\langle P_{R,k} \rangle\!\rangle |_{C^m(\overline{B}_R)} \leqslant 2\kappa_{j\wedge k}. \end{equation*} \notag
Thus \{ \langle\!\langle P_{R,j} \rangle\!\rangle \} is a Cauchy sequence in C^m(\overline{B}_R). So it C^m-converges to a limiting field \langle\!\langle P_{R,\infty} \rangle\!\rangle . As P_{R,j} converges to P in C^m(\overline{B}_R), using (3.4) again, we find that | \langle\!\langle P_{R,\infty} \rangle\!\rangle |_{C^m(\overline{B}_R)}\leqslant|P|_{C^m(\overline{B}_R)}. But by Lemma 3.2 \langle\!\langle P_{R,\infty} \rangle\!\rangle must be equal to \langle\!\langle P \rangle\!\rangle . Since R>0 is arbitrary, the assertion of the proposition follows.

Finally, we note that if a vector field P is Hamiltonian, then its averaging \langle\!\langle P \rangle\!\rangle is Hamiltonian as well. Looking ahead, we state the corresponding result here, even though the averaging \langle\,\cdot\,\rangle of functions is only defined in subsection 3.2 below.

Proposition 3.5. If a locally Lipschitz vector field P is Hamiltonian, that is,

\begin{equation*} P(z) =i\frac{\partial}{\partial \bar{z}}H(z) \end{equation*} \notag
for some C^1-smooth function H, then \langle\!\langle P \rangle\!\rangle is also Hamiltonian and
\begin{equation*} \langle\!\langle P \rangle\!\rangle =i \frac{\partial}{\partial \bar{z}}\langle H\rangle. \end{equation*} \notag

For a proof see [13; Theorem 5.2].

3.1. Calculating averagings

3.1.1.

The frequency vector \Lambda= (\lambda_1,\dots,\lambda_n) is called completely resonant if its components \lambda_j are proportional to some \lambda>0, that is, if \lambda_j/\lambda \in\mathbb{Z} for all j. In this case all trajectories of the original linear system (2.1) are periodic, the operator \Phi_{t\Lambda} is 2\pi/\lambda-periodic in t and so

\begin{equation} \langle\!\langle \widetilde{P} \rangle\!\rangle (a) = \langle\!\langle \widetilde{P} \rangle\!\rangle ^{2\pi/\lambda}(a) =\frac{\lambda}{2\pi}\int_0^{2\pi/\lambda}(\Phi_{t\Lambda})_*\widetilde{P}(a)\,dt. \end{equation} \tag{3.6}

Completely resonant linear systems (2.1) and their perturbations (1.1) often occur in applications, in particular in non-equilibrium statistical physics. There the dimension D=2n is large, all the \lambda_j are equal, and the Wiener process W(t) in (1.1) can be very degenerate (it may have just two non-zero components). See, for example, [9], where more references can be found.
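As a numerical sanity check of (3.6) (not taken from the paper), one can verify that in the completely resonant case the averaging reduces to a one-period quadrature. Here n=1, \Lambda=(1), so \Phi_{t\Lambda}a=e^{it}a, and the test field P(a)=a+a^2 is an illustrative choice: its resonant monomial a survives and a^2 averages out.

```python
import numpy as np

# Completely resonant case: Phi_{t Lambda} is periodic, so (3.6) is a
# one-period average.  Lambda = (1,) and P(a) = a + a^2 are illustrative.
lam = 1.0
period = 2 * np.pi / lam

def P(a):
    return a + a**2

def pushforward(t, a):
    # (Phi_{t Lambda})_* P (a) = Phi_{t Lambda} P(Phi_{-t Lambda} a)
    return np.exp(1j * lam * t) * P(np.exp(-1j * lam * t) * a)

def averaged(a, nodes=4000):
    # uniform quadrature over one period (endpoint excluded)
    ts = np.linspace(0.0, period, nodes, endpoint=False)
    return np.mean([pushforward(t, a) for t in ts])

a0 = 0.7 + 0.3j
# e^{it}(e^{-it}a + e^{-2it}a^2) = a + e^{-it}a^2; the period average is a.
print(abs(averaged(a0) - a0))
```

The printed discrepancy is at machine-precision level, since the uniform grid sums the oscillating monomial e^{-it} to zero exactly.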

3.1.2.

Consider the case opposite to the above and assume that the frequency vector \Lambda is non-resonant:

\begin{equation} \sum_{j=1}^nm_j\lambda_j\neq0 \quad \forall (m_1,\dots,m_n)\in\mathbb{Z}^n\setminus\{0\} \end{equation} \tag{3.7}
(that is, the real numbers \lambda_j are rationally independent). Then
\begin{equation} \langle\!\langle \widetilde{P} \rangle\!\rangle (a) =\frac{1}{(2\pi)^n}\int_{\mathbb{T}^n} (\Phi_w)_*\widetilde{P}(a)\,dw, \qquad \mathbb{T}^n=\mathbb{R}^n/(2\pi\mathbb{Z}^n). \end{equation} \tag{3.8}
Indeed, if \widetilde{P} is a polynomial vector field, then (3.8) follows easily from (3.2) by direct componentwise calculation. The general case is a consequence of this result since any vector field can be approximated by polynomial fields. Details are left to the reader (cf. Lemma 3.5 in [13], where \widetilde{P}^{{\rm res}} equals the right-hand side of (3.8) if the vector \Lambda is non-resonant).

The right-hand side of (3.8) is obviously invariant with respect to all rotations \Phi_{w'}, so it depends on a only through the corresponding torus

\begin{equation} \{z\in\mathbb{C}^n\colon I_j(z)=I_j(a)\ \forall j\} \end{equation} \tag{3.9}
(see (1.2)) to which a belongs, and
\begin{equation} (\Phi_{w})_* \langle\!\langle \widetilde{P} \rangle\!\rangle (a) \equiv \langle\!\langle \widetilde{P} \rangle\!\rangle (a) \quad \forall\,w\in\mathbb{R}^n \quad\text{if } \Lambda \text{ is non-resonant}. \end{equation} \tag{3.10}
See Section 6 below for a discussion of equations (1.1) with non-resonant vectors \Lambda.
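The torus-average formula (3.8) can also be checked numerically (the frequencies and the field below are invented for the illustration): for a non-resonant \Lambda only monomials with \alpha-\beta=e_j survive in the j-th component, and averaging over a uniform grid on \mathbb{T}^2 reproduces exactly those monomials.

```python
import numpy as np

# Non-resonant case (3.8): torus average of the pushforward.
# Lambda = (1, sqrt(2)) is rationally independent; P is illustrative.
Lam = np.array([1.0, np.sqrt(2.0)])

def P(a):
    a1, a2 = a
    # comp 1: a1 (alpha-beta = e_1, survives) + a2 (dies);
    # comp 2: a1 (dies) + |a2|^2 a2 (alpha-beta = e_2, survives)
    return np.array([a1 + a2, a1 + abs(a2)**2 * a2])

def pushforward(w, a):
    rot = np.exp(1j * w)          # Phi_w a = (e^{i w_j} a_j)
    return rot * P(rot.conj() * a)

def torus_average(a, N=200):
    ws = 2 * np.pi * np.arange(N) / N
    acc = np.zeros(2, dtype=complex)
    for w1 in ws:
        for w2 in ws:
            acc += pushforward(np.array([w1, w2]), a)
    return acc / N**2

a0 = np.array([0.5 + 0.2j, -0.3 + 0.6j])
avg = torus_average(a0)
# Expected: component 1 keeps a1, component 2 keeps |a2|^2 a2.
print(avg - np.array([a0[0], abs(a0[1])**2 * a0[1]]))
```

The grid average of each non-surviving monomial is an exact sum of roots of unity, so the printed residual is at round-off level.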

3.1.3.

If the field \widetilde{P} in (3.2) is polynomial, that is,

\begin{equation} \widetilde{P}_j(a) =\sum_{|\alpha|, |\beta| \leqslant N} C_j^{\alpha, \beta} a^\alpha \bar a^\beta, \qquad j=1,\dots,n, \end{equation} \tag{3.11}
for some N\in\mathbb{N}, where \alpha, \beta \in \mathbb{Z}_+^n, \displaystyle a^\alpha=\prod a_j^{\alpha_j} and \displaystyle |\alpha|=\sum \alpha_j, then \langle\!\langle \widetilde{P} \rangle\!\rangle =\widetilde{P}^{{\rm res}}. Here \widetilde{P}^{{\rm res}} is the polynomial vector field whose jth component \widetilde{P}^{{\rm res}}_j(a) is given by the right-hand side of (3.11) with the sum taken over those \alpha,\beta with |\alpha|,|\beta|\leqslant N that satisfy \Lambda \cdot (\alpha-\beta)=\lambda_j. This easily follows from the explicit calculation of the integral in (3.1) (see [13; Lemma 3.5]).

3.2. Averaging functions

Similarly to definition (3.2), for a locally Lipschitz function f\in \operatorname{Lip}_m (\mathbb{C}^n, \mathbb{C}), m\geqslant0, we define its averaging with respect to a frequency vector \Lambda by

\begin{equation} \langle f\rangle(a) =\lim_{{T'}\to\infty}\frac{1}{T'}\int_0^{T'}f(\Phi_{-t\Lambda }a)\,dt, \qquad a\in\mathbb{C}^n. \end{equation} \tag{3.12}
Then using the same argument as above we obtain the following lemma.

Lemma 3.6. Let f\in \operatorname{Lip}_m (\mathbb{C}^n, \mathbb{C}). Then the following assertions are true.

(1) The limit (3.12) exists for every a, and for a\in \overline{B}_R(\mathbb{C}^n) the rate of convergence in (3.12) is uniform in a and depends only on R.

(2) \langle f\rangle \in \operatorname{Lip}_{m}(\mathbb{C}^n,\mathbb{C}) and \mathcal{C}^m(\langle f\rangle ) \leqslant \mathcal{C}^m(f).

(3) If f is C^m-smooth for some m\in \mathbb{N}, then \langle f\rangle also is, and the C^m-norm of the latter is bounded by the C^m-norm of the former.

(4) The function \langle f\rangle commutes with the operators \Phi_{\theta\Lambda}, \theta\in\mathbb{R}, in the sense that \langle f\circ\Phi_{\theta\Lambda}\rangle=\langle f\rangle\circ\Phi_{\theta\Lambda}=\langle f\rangle.

If the vector \Lambda is non-resonant, then similarly to (3.8) we have

\begin{equation} \langle {f}\rangle(a) =\frac{1}{(2\pi)^n}\int_{\mathbb{T}^n} {f}(\Phi_{-w}a)\,dw. \end{equation} \tag{3.13}
That is, in the non-resonant case the averaging \langle f\rangle of f equals its average over the torus; in particular, it is constant on the tori (3.9).

4. Effective equation and the averaging theorem

In this section we show that the limiting measure Q_0 in (2.14) is independent of the choice of the sequence \varepsilon'_l\to0, so that \mathcal{D}(a^\varepsilon)\rightharpoonup Q_0 as \varepsilon\to0, and we represent Q_0 as the law of a solution of an auxiliary effective equation. The drift in this equation is the averaged drift in (2.6). Now we construct its dispersion.

The diffusion matrix for (2.11) is the complex n\times n matrix

\begin{equation*} \mathcal{A}^\varepsilon(\tau) =(\Phi_{\tau\varepsilon^{-1}\Lambda}\Psi) \cdot(\Phi_{\tau\varepsilon^{-1}\Lambda}\Psi)^*. \end{equation*} \notag
Setting
\begin{equation} \Phi_{\tau\varepsilon^{-1}\Lambda}\Psi = \bigl(e^{i\tau\varepsilon^{-1}\lambda_l}\Psi_{lj}\bigr) =:(\psi^\varepsilon_{lj}(\tau))=\psi^\varepsilon(\tau), \end{equation} \tag{4.1}
we have
\begin{equation*} \mathcal{A}^\varepsilon_{kj}(\tau) =\sum_{l=1}^{n_1}\psi^\varepsilon_{kl}(\tau)\overline{\psi^\varepsilon_{jl}}(\tau) =e^{i \tau\varepsilon^{-1}(\lambda_k-\lambda_j)} \sum_{l=1}^{n_1}\Psi_{kl}\overline{\Psi}_{jl}. \end{equation*} \notag
So for any \tau>0,
\begin{equation*} \frac{1}{\tau}\int_0^{\tau}\mathcal{A}^\varepsilon_{kj}(s)\,ds =\biggl(\sum_{l=1}^{n_1}\Psi_{kl}\overline{\Psi}_{jl}\biggr) \,\frac{1}{\tau} \int_0^\tau e^{is\varepsilon^{-1}(\lambda_k-\lambda_j)}\,ds, \end{equation*} \notag
and we immediately see that
\begin{equation} \frac{1}{\tau}\int_0^{\tau}\mathcal{A}^\varepsilon_{kj}(\tau)\,d\tau \to A_{kj} \quad \text{as }\ \varepsilon\to0, \end{equation} \tag{4.2}
where
\begin{equation} A_{kj} =\begin{cases}\displaystyle \sum_{l=1}^{n_1}\Psi_{kl}\overline{\Psi}_{jl} & \text{if }\lambda_k=\lambda_j,\\ 0 & \text{otherwise}. \end{cases} \end{equation} \tag{4.3}
Clearly, A_{kj}=\bar A_{jk}, so that A is a Hermitian matrix. If \lambda_k\neq\lambda_j for k\neq j, then
\begin{equation} A =\operatorname{diag}\{b_1,\dots,b_n\}, \qquad b_k=\sum_{l=1}^{n_1}|\Psi_{kl}|^2. \end{equation} \tag{4.4}

For any vector \xi\in\mathbb{C}^n, from (4.2) we obtain \langle A\xi,\xi\rangle\geqslant0 since it is obvious that \langle \mathcal{A}^\varepsilon(\tau)\xi,\xi\rangle=|\psi^\varepsilon(\tau)\xi|^2\geqslant0 for each \varepsilon. Therefore, A is a non-negative Hermitian matrix, and there exists another non-negative Hermitian matrix B (called the principal square root of A) such that BB^*=B^2=A. The matrix B is non-singular if \Psi is.

Example 4.1. If \Psi is a diagonal matrix \operatorname{diag}\{\psi_1,\dots,\psi_n\}, \psi_j\in\mathbb{R}, then \mathcal{A}^\varepsilon(\tau)=|\Psi|^2. In this case A=|\Psi|^2 and B=|\Psi|=\operatorname{diag}\{|\psi_1|,\dots,|\psi_n|\}.
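As a small numerical illustration (the matrices \Lambda and \Psi below are invented for the example), the matrix A of (4.3) and its principal square root B can be computed directly, using the spectral theorem for the Hermitian non-negative A:

```python
import numpy as np

# Build A from (4.3): keep (Psi Psi*)_{kj} only where lambda_k = lambda_j,
# then take the principal (non-negative Hermitian) square root B, BB* = A.
# Lambda and Psi are illustrative; lambda_1 = lambda_2 gives a 2x2 block.
Lam = np.array([1.0, 1.0, 2.0])
Psi = np.array([[1.0, 0.5],
                [0.2, 1.0j],
                [0.0, 2.0]])                  # complex n x n_1, n=3, n_1=2

full = Psi @ Psi.conj().T                     # sum_l Psi_{kl} conj(Psi_{jl})
mask = np.isclose(Lam[:, None], Lam[None, :])
A = np.where(mask, full, 0.0)                 # zero entries with lambda_k != lambda_j

# principal square root via eigendecomposition (A is Hermitian >= 0)
eigval, eigvec = np.linalg.eigh(A)
B = eigvec @ np.diag(np.sqrt(np.clip(eigval, 0.0, None))) @ eigvec.conj().T

print(np.max(np.abs(B @ B.conj().T - A)))
```

Since A is block-diagonal with principal submatrices of the non-negative matrix \Psi\Psi^*, it is itself Hermitian and non-negative, so the eigenvalue clipping only guards against round-off.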

In fact, it is not necessary that B be a Hermitian square matrix, and the argument below remains true if as B we take any complex n\times N matrix (for any N\in\mathbb{N}) satisfying the equation

\begin{equation*} BB^*=A. \end{equation*} \notag

Now we define the effective equation for (2.11) as follows:

\begin{equation} da_k- \langle\!\langle P \rangle\!\rangle _k(a)\,d\tau =\sum_{l=1}^nB_{kl}\,d\beta^c_l, \qquad k=1,\dots,n. \end{equation} \tag{4.5}
Here the matrix B is as above and \langle\!\langle P \rangle\!\rangle is the resonant averaging of the vector field P. We will usually consider this equation with the same initial condition as equations (2.6) and (2.11):
\begin{equation} a(0)=v_0. \end{equation} \tag{4.6}
Since the vector field \langle\!\langle P \rangle\!\rangle is locally Lipschitz and the dispersion matrix B is constant, a strong solution of (4.5), (4.6), if it exists, is unique.

Note that the effective dispersion B in (4.5) is a square root of an explicit matrix, and by subsection 3.1.3, if the vector field P(v) is polynomial, then the effective drift \langle\!\langle P \rangle\!\rangle (a) is also given by an explicit formula.
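Since both the drift and the dispersion of (4.5) are explicit, the effective equation can be simulated by a standard Euler–Maruyama scheme. The sketch below is illustrative: the averaged drift is a toy damped cubic field (not from the paper), and the complex Wiener increments use the convention \beta^c=\beta^1+i\beta^2 with independent standard real Brownian motions, which matches the factor 2 in (4.16).

```python
import numpy as np

# Euler-Maruyama sketch for the effective equation (4.5).
# P_avg and B are illustrative stand-ins for <<P>> and the effective
# dispersion; d beta^c = (dW1 + i dW2) with independent real increments.
rng = np.random.default_rng(0)
n, T, steps = 2, 1.0, 2000
dt = T / steps

def P_avg(a):                       # toy averaged drift: damped cubic
    return -a * (1.0 + np.abs(a)**2)

B = np.diag([1.0, 0.5]).astype(complex)

a = np.array([0.8 + 0.1j, -0.2 + 0.4j])
for _ in range(steps):
    dbeta = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) * np.sqrt(dt)
    a = a + P_avg(a) * dt + B @ dbeta

print(np.abs(a))
```

Repeating the loop over many independent paths gives Monte Carlo approximations of the law Q_0 appearing in Theorem 4.7.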

Proposition 4.2. The limiting probability measure Q_0 in (2.14) is a weak solution of effective equation (4.5), (4.6).

We recall that a measure Q \in\mathcal{P}(C([0,T];\mathbb{C}^n)) is a weak solution of equation (4.5), (4.6) if Q=\mathcal{D}(\tilde{a}), where the random process \tilde{a}(\tau), 0\leqslant \tau\leqslant T, is a weak solution of (4.5), (4.6). (Concerning weak solutions of stochastic differential equations see, for example, [14; § 5.3].)

The proof of this result is prefaced by a number of lemmas. Until the end of this section we assume that Assumption 2.1 holds. As in Section 3, we set

\begin{equation} Y(a,\tau\varepsilon^{-1}) := (\Phi_{\tau\varepsilon^{-1}\Lambda})_*P(a). \end{equation} \tag{4.7}
Then equation (2.11) for a^\varepsilon reads as
\begin{equation} da^\varepsilon(\tau)-Y(a^\varepsilon,\tau\varepsilon^{-1})\,d\tau =\Phi_{\tau\varepsilon^{-1}\Lambda}\Psi\,d\beta^c(\tau). \end{equation} \tag{4.8}
Set
\begin{equation} \tilde y(a,\tau\varepsilon^{-1})=Y(a,\tau\varepsilon^{-1})- \langle\!\langle P \rangle\!\rangle (a) =(\Phi_{\tau\varepsilon^{-1}\Lambda})_*P(a)- \langle\!\langle P \rangle\!\rangle (a). \end{equation} \tag{4.9}
The following key lemma shows that integrals of \tilde y(a^\varepsilon,\tau\varepsilon^{-1}) with respect to \tau decay with \varepsilon, uniformly on the interval of integration.

Lemma 4.3. For a solution a^\varepsilon(\tau) of equation (2.11), (2.12) we have

\begin{equation*} \mathsf{E}\max_{0\leqslant\tau\leqslant T} \biggl| \int_0^{\tau}\tilde y(a^\varepsilon(s),s\varepsilon^{-1})\,ds \biggr| \to0 \quad\textit{as }\ \varepsilon\to0. \end{equation*} \notag

This lemma is proved at the end of this section.

Now let us introduce the natural filtered measurable space

\begin{equation} (\widetilde{\Omega},\mathcal{B},\{\mathcal{B}_\tau,0\leqslant\tau\leqslant T\}) \end{equation} \tag{4.10}
for the problem we consider, where \widetilde{\Omega} is the Banach space C([0,T];\mathbb{C}^n)=\{a:=a(\,{\cdot}\,)\}, \mathcal{B} is its Borel \sigma-algebra, and \mathcal{B}_\tau is the \sigma-algebra generated by the random variables \{a(s)\colon 0\leqslant s\leqslant\tau\}. Consider the process on \widetilde{\Omega} defined by the left-hand side of (4.5):
\begin{equation} N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a) =a(\tau)-\int_0^\tau { \langle\!\langle P \rangle\!\rangle }(a(s))\,ds, \qquad a\in\widetilde{\Omega}, \quad \tau\in[0,T]. \end{equation} \tag{4.11}
Note that for any 0\leqslant\tau\leqslant T, N^{ \langle\!\langle P \rangle\!\rangle }(\tau;\cdot) is a \mathcal{B}_\tau-measurable continuous functional on C([0,T];\mathbb{C}^n).

Lemma 4.4. The random process N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a) is a martingale on the space (4.10) with respect to the limiting measure Q_0 in (2.14).

Proof. Fix some \tau\in[0,T] and consider a \mathcal{B}_\tau-measurable function f^\tau\in C_b(\widetilde{\Omega}). We show that
\begin{equation} \mathsf{E}^{Q_0}\bigl(N^{ \langle\!\langle P \rangle\!\rangle }(t;a)f^\tau(a)\bigr) =\mathsf{E}^{Q_0}\bigl(N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)f^\tau(a)\bigr) \quad \text{for any }\ \tau\leqslant t\leqslant T, \end{equation} \tag{4.12}
which implies the assertion. To establish this, first consider the process
\begin{equation*} N^{Y,\varepsilon}(\tau; a^\varepsilon) :=a^\varepsilon(\tau) -\int_0^\tau Y(a^\varepsilon,s\varepsilon^{-1})\,ds, \end{equation*} \notag
which is a martingale in view of (4.8). As
\begin{equation*} N^{Y,\varepsilon}(\tau;a^\varepsilon)-N^{{ \langle\!\langle P \rangle\!\rangle }}(\tau;a^\varepsilon) =\int_0^\tau \bigl[ { \langle\!\langle P \rangle\!\rangle }(a^\varepsilon(s))-Y(a^\varepsilon(s), s\varepsilon^{-1}) \bigr]\,ds, \end{equation*} \notag
by Lemma 4.3 we have
\begin{equation} \max_{0\leqslant\tau\leqslant T} \mathsf{E} \bigl| N^{Y,\varepsilon}(\tau; a^\varepsilon) - N^{{ \langle\!\langle P \rangle\!\rangle }}(\tau; a^\varepsilon) \bigr| =o_\varepsilon(1). \end{equation} \tag{4.13}
Here and throughout this proof o_\varepsilon(1) is a quantity tending to zero with \varepsilon. Since N^{Y,\varepsilon} is a martingale, relation (4.13) implies that
\begin{equation*} \begin{aligned} \, \mathsf{E}\bigl(N^{{ \langle\!\langle P \rangle\!\rangle }}(t;a^\varepsilon)f^\tau(a^\varepsilon)\bigr) +o_\varepsilon(1) & =\mathsf{E}\bigl(N^{Y,\varepsilon}(t;a^\varepsilon)f^\tau(a^\varepsilon)\bigr) \\ & =\mathsf{E}\bigl(N^{Y,\varepsilon}(\tau;a^\varepsilon)f^\tau(a^\varepsilon)\bigr) \\ & =\mathsf{E} \bigl( N^{{ \langle\!\langle P \rangle\!\rangle }}(\tau;a^\varepsilon)f^\tau(a^\varepsilon) \bigr) +o_\varepsilon(1). \end{aligned} \end{equation*} \notag
So
\begin{equation} \mathsf{E}^{ Q_\varepsilon} \bigl[ N^{{ \langle\!\langle P \rangle\!\rangle }}(t;a)f^\tau(a)-N^{{ \langle\!\langle P \rangle\!\rangle }}(\tau;a)f^\tau(a) \bigr] =o_\varepsilon(1). \end{equation} \tag{4.14}
To obtain (4.12), in this relation we take a limit as \varepsilon\to0. To do this, for M>0 consider the function
\begin{equation*} G_M(t) =\begin{cases} t & \text{if }|t|\leqslant M,\\ M\operatorname{sgn}t & \text{otherwise}. \end{cases} \end{equation*} \notag
Since by Assumption 2.1 and Lemma 3.2
\begin{equation*} \mathsf{E}^{Q_\varepsilon} \Bigl(\sup_{\tau\in[0,T]}|N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)|^2\Bigr) \leqslant \mathsf{E}^{Q_\varepsilon} \Bigl[C_P\Bigl(1+\sup_{\tau\in[0,T]} |a(\tau)|^{2(m_0\vee1)}\Bigr)\Bigr] \leqslant C_{P,m_0}(|v_0|), \end{equation*} \notag
for any \varepsilon we have
\begin{equation} \mathsf{E}^{Q_\varepsilon} \bigl|(1-G_M)\circ \bigl( N^{ \langle\!\langle P \rangle\!\rangle }(t;a)f^\tau(a)-N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)f^\tau(a) \bigr) \bigr| \leqslant C M^{-1}. \end{equation} \tag{4.15}
As Q_{\varepsilon'_l}\rightharpoonup Q_0, by Fatou’s lemma this estimate stays true for \varepsilon=0.

Relations (4.14) and (4.15) show that

\begin{equation*} \mathsf{E}^{ Q_\varepsilon} \bigl[ G_M\circ \bigl( N^{ \langle\!\langle P \rangle\!\rangle }(t;a)f^\tau(a)-N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)f^\tau(a) \bigr) \bigr] =o_\varepsilon(1)+o_{M^{-1}}(1). \end{equation*} \notag
From this and the convergence (2.14) we derive the relation
\begin{equation*} \mathsf{E}^{Q_0} \bigl[ G_M\circ \bigl( N^{ \langle\!\langle P \rangle\!\rangle }(t;a)f^\tau(a)-N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)f^\tau(a) \bigr) \bigr] =o_{M^{-1}}(1), \end{equation*} \notag
which in combination with (4.15)_{\varepsilon=0} implies (4.12) when we let M tend to \infty. The lemma is proved.

Definition 4.5. A measure Q on the space (4.10) is called a solution of the martingale problem for effective equation (4.5) with initial condition (4.6) if a(0)=v_0 Q-a.s. and

1) the process \{N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)\in\mathbb{C}^n,\,\tau\in[0,T]\} (see (4.11)) is a vector martingale on the filtered space (4.10) with respect to the measure Q;

2) for any k,j=1,\dots,n the process

\begin{equation} N^{ \langle\!\langle P \rangle\!\rangle }_k(\tau;a)\, \overline{N^{ \langle\!\langle P \rangle\!\rangle }_j}(\tau;a) -2\int_0^\tau(B B^*)_{kj}\,ds, \qquad\tau\in[0,T] \end{equation} \tag{4.16}
(here BB^*=A) is a martingale on the space (4.10) with respect to the measure Q, as is the process N^{ \langle\!\langle P \rangle\!\rangle }_k(\tau;a)\,N^{ \langle\!\langle P \rangle\!\rangle }_j(\tau;a).

This is a classical definition expressed in complex coordinates. See [24] and [14; § 5.4], where we profited from [14; Remark 4.12] and the result of [14; Problem 4.13] since the vector field { \langle\!\langle P \rangle\!\rangle } in (4.5) is locally Lipschitz by Lemma 3.2. Note that condition 2) in Definition 4.5 implies that

\begin{equation*} \bigl\langle N^{ \langle\!\langle P \rangle\!\rangle }_k(\tau;a), \overline{N^{ \langle\!\langle P \rangle\!\rangle }_j}(\tau;a) \bigr\rangle (\tau) =2\int_0^\tau(BB^*)_{kj}\,ds \end{equation*} \notag
and
\begin{equation*} \bigl\langle N^{ \langle\!\langle P \rangle\!\rangle }_k(\tau;a), {N^{ \langle\!\langle P \rangle\!\rangle }_j}(\tau;a) \bigr\rangle (\tau) =0 \end{equation*} \notag
(see Appendix B).

We have the following assertion.

Lemma 4.6. The limiting measure Q_0 in (2.14) is a solution of the martingale problem for effective equation (4.5), (4.6).

Proof. Since condition 1) in Definition 4.5 was verified in Lemma 4.4, it remains to check condition 2). For the second term in (4.16), as \varepsilon\to0, we have
\begin{equation} \int_0^{\tau}\bigl(\psi^\varepsilon(s)(\psi^\varepsilon(s))^*\bigr)_{kj}\,ds =\int_0^\tau e^{i\varepsilon^{-1}(\lambda_k-\lambda_j)s}(\Psi\Psi^*)_{kj}\,ds \to\tau A_{kj}, \end{equation} \tag{4.17}
where the matrix (A_{kj}) is given by (4.3). We turn to the first term. By (2.11) and (4.1) we have
\begin{equation*} N^{Y,\varepsilon}(\tau) =v_0+\int_0^\tau \psi^\varepsilon(s) \,d\beta^c(s), \qquad \psi^\varepsilon_{lj}(s) =e^{i s\varepsilon^{-1}\lambda_l}\Psi_{lj}, \end{equation*} \notag
and therefore, by the complex Itô formula (see Appendix C) and Assumption 2.1, for any k,j\in\{1,\dots,n\} the process
\begin{equation} N_k^{Y,\varepsilon}(\tau) \overline{ N_j^{Y,\varepsilon}}(\tau) -2\int_0^\tau\bigl(\psi^\varepsilon(s)(\psi^\varepsilon(s))^*\bigr)_{kj}\,ds, \end{equation} \tag{4.18}
is a martingale. As in the verification of condition 1), we compare (4.16) with (4.18). To do this consider
\begin{equation*} \begin{aligned} \, & N_k^{ \langle\!\langle P \rangle\!\rangle }(\tau ;a^\varepsilon)\, \overline{N_j^{{ \langle\!\langle P \rangle\!\rangle }}}(\tau;a^\varepsilon) -N_k^{Y,\varepsilon}(\tau;a^\varepsilon)\, \overline{ N_j^{Y,\varepsilon}}(\tau;a^\varepsilon) \\ &\qquad =\biggl(a_k^\varepsilon(\tau)-\int_0^\tau{ \langle\!\langle P \rangle\!\rangle }_k(a^\varepsilon(s))\,ds\biggr) \biggl(\bar a_j^\varepsilon(\tau)-\int_0^\tau{ \langle\!\langle \overline{P} \rangle\!\rangle }_j(a^\varepsilon(s))\,ds\biggr) \\ &\qquad\qquad - \biggl(a_k^\varepsilon(\tau) -\int_0^\tau Y_k(a^\varepsilon(s),s\varepsilon^{-1})\,ds \biggr) \biggl(\bar a_j^\varepsilon(\tau) -\int_0^\tau\overline Y_j(a^\varepsilon(s),s\varepsilon^{-1})\,ds\biggr) \\ &\qquad =:M_{kj}(a^\varepsilon;\tau). \end{aligned} \end{equation*} \notag
Repeating closely the proof of (4.13) we get that
\begin{equation*} \sup_{0\leqslant \tau\leqslant T} \mathsf{E}\big| M_{kj}(a^\varepsilon; \tau)\big| =o_\varepsilon(1) \quad \text{as }\ \varepsilon\to0. \end{equation*} \notag
Since (4.18) is a martingale, this relation and (4.17) imply that (4.16) is a martingale, by the same arguments by which relation (4.13) and the fact that N^{Y,\varepsilon}(\tau;a^\varepsilon) is a martingale imply that N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a) is one too. To pass to the limit as \varepsilon\to0 the proof uses the fact that random variables such as N_k^{Y,\varepsilon}(\tau;a^\varepsilon) \overline{N_j^{Y,\varepsilon}}(\tau;a^\varepsilon) are integrable uniformly in \varepsilon>0 by Assumption 2.1, where m'_0>m_0.

Similarly, for any k and j the process N_k^{ \langle\!\langle P \rangle\!\rangle }(\tau) {N_j^{ \langle\!\langle P \rangle\!\rangle }}(\tau) is also a martingale. The assertion of the lemma is established.

Now we can prove Proposition 4.2.

Proof of Proposition 4.2. It is well known that a solution of the martingale problem for a stochastic differential equation is a weak solution of it. Instead of referring to a corresponding theorem (see [24] or [14; § 5.4]), following [16] again, we give a short direct proof, based on another strong result from stochastic calculus. By Lemma 4.6 and the martingale representation theorem for complex processes (see Appendix B) we know that there exists an extension (\widehat{\Omega},\widehat{\mathcal{B}},\widehat{\mathsf{P}}) of the probability space (\widetilde{\Omega},\mathcal{B},Q_0) and on it there exist standard independent complex Wiener processes \beta^c_1(\tau),\dots,\beta^c_n(\tau) such that
\begin{equation*} da_j(\tau)-{ \langle\!\langle P \rangle\!\rangle }_j(a)\,d\tau =\sum_{l=1}^nB_{jl}\,d\beta^c_l(\tau), \quad j=1,\dots,n, \end{equation*} \notag
where the dispersion B is a non-negative Hermitian matrix satisfying BB^*=A. Therefore, the measure Q_0 is a weak solution of effective equation (4.5). We thus proved the assertion of the proposition.

By Lemma 3.2, in effective equation (4.5) the drift term { \langle\!\langle P \rangle\!\rangle } is locally Lipschitz, so its strong solution (if it exists) is unique. By Proposition 4.2 the measure Q_0 is a weak solution of (4.5). Hence, by the Yamada–Watanabe theorem [14; § 5.3.D], [24; Chap. 8], a strong solution of the effective equation exists, and its weak solution is unique. Therefore, the limit Q_0=\lim_{\varepsilon_l\to0}Q_{\varepsilon_l} does not depend on the sequence \varepsilon_l\to0, so the convergence holds as \varepsilon\to0, and we have established the following theorem.

Theorem 4.7. For any v_0\in\mathbb{C}^n the solution a^\varepsilon(\tau;v_0) of problem (2.11), (2.12) satisfies

\begin{equation} \mathcal{D}(a^\varepsilon(\,{\cdot}\,;v_0))\rightharpoonup Q_0 \quad \textit{in }\ \mathcal{P}(C([0,T];\mathbb{C}^n)) \quad \textit{as }\ \varepsilon\to0, \end{equation} \tag{4.19}
where the measure Q_0 is the law of a unique weak solution a^0(\tau;v_0) of effective equation (4.5), (4.6).

Remark 4.8. (i) A straightforward analysis of the proof of the theorem shows that it goes through without changes if a^\varepsilon(\tau) solves (2.11) with initial data v_{\varepsilon0} converging to v_0 as \varepsilon\to0. So

\begin{equation} \begin{gathered} \, \mathcal{D}(a^\varepsilon(\,{\cdot}\,; v_{\varepsilon 0})) \rightharpoonup Q_0 \quad\text{in }\ \mathcal{P}(C([0,T];\mathbb{C}^n)) \quad\text{as }\ \varepsilon\to0, \\ \text{if }\ v_{\varepsilon 0}\to v_0\ \ \text{when}\ \ \varepsilon\to0. \notag \end{gathered} \end{equation} \tag{4.20}

(ii) Setting \mathcal{D} (a^\varepsilon(\,{\cdot}\,; v_0))=Q_\varepsilon \in \mathcal{P}({C([0,T];\mathbb{C}^n)}) as before, we use Skorokhod’s representation theorem (see [2; § 6]) to find a sequence \varepsilon_j\to0 and processes \xi_j(\tau), 0\leqslant\tau\leqslant T, j=0,1,2,\dots, such that \mathcal{D}(\xi_0)=Q_0, \mathcal{D}(\xi_j) =Q_{\varepsilon_j}, and \xi_j\to\xi_0 in C([0,T];\mathbb{C}^n) almost surely. Then (2.7) and Fatou’s lemma imply that

\begin{equation} \mathsf{E}\| a^0\|^{2m_0'}_{C([0,T]; \mathbb{C}^n)} =\mathsf{E}^{Q_0} \| a\|^{2m_0'}_{C([0,T]; \mathbb{C}^n)} =\mathsf{E}\| \xi_0\|^{2m_0'}_{C([0,T]; \mathbb{C}^n)} \leqslant C_{m'_0}(|v_0|,T). \end{equation} \tag{4.21}

The result of Theorem 4.7 admits an immediate generalization to the case when the initial data v_0 in (2.12) are a random variable.

Amplification 4.9. Let v_0 be a random variable independent of the Wiener process \beta^c(\tau). Then the convergence (4.19) still holds.

Proof. It suffices to establish (4.19) when a^\varepsilon is a weak solution of the problem. Now let (\Omega',\mathcal{F}',\mathsf{P}') be another probability space and \xi_0^{\omega'} be a random variable on \Omega' which is distributed as v_0. Then a^{\varepsilon\omega}(\tau; \xi_0^{\omega'}) is a weak solution of (2.11), (2.12) defined on the probability space \Omega'\times\Omega. Take f to be a bounded continuous function on C([0,T];\mathbb{C}^n). Then by the above theorem, for each \omega'\in\Omega'
\begin{equation*} \lim_{\varepsilon\to0} \mathsf{E}^\Omega f(a^{\varepsilon \omega} (\,{\cdot}\,;\xi_0^{\omega'})) =\mathsf{E}^\Omega f(a^{0 \omega}(\,{\cdot}\,; \xi_0^{\omega'})). \end{equation*} \notag
Since f is bounded, by Lebesgue’s dominated convergence theorem we have
\begin{equation*} \begin{aligned} \, \lim_{\varepsilon\to0} \mathsf{E} f(a^\varepsilon(\,{\cdot}\,; v_0)) & =\lim_{\varepsilon\to0}\mathsf{E}^{\Omega'}\mathsf{E}^\Omega f(a^{\varepsilon\omega}(\,{\cdot}\,;\xi_0^{\omega'})) \\ & =\mathsf{E}^{\Omega'}\mathsf{E}^\Omega f(a^{0\omega}(\,{\cdot}\,;\xi_0^{\omega'})) =\mathsf{E} f(a^{0}(\,{\cdot}\,; v_0)). \end{aligned} \end{equation*} \notag
This implies the required convergence (4.19).

The convergence stated in the last amplification holds uniformly in the class of random initial data v_0 bounded almost surely by a fixed constant. To state the result we have to introduce a distance in the space of measures.

Definition 4.10. Let M be a Polish (that is, complete and separable) metric space. For any two measures \mu_1,\mu_2\in\mathcal{P}(M) we define the dual-Lipschitz distance between them by

\begin{equation*} \|\mu_1-\mu_2\|_{L,M}^* :=\sup_{f\in C_b (M),\, |f|_L\leqslant1} |\langle f ,\mu_1\rangle -\langle f ,\mu_2\rangle| \leqslant2, \end{equation*} \notag
where |f|_L=|f|_{L,M}=\operatorname{Lip}(f)+\|f\|_{C(M)}.

In the definition and below we set

\begin{equation} \langle f ,\mu\rangle :=\int_M f(m)\,\mu(dm). \end{equation} \tag{4.22}
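The supremum in Definition 4.10 is over all test functions with |f|_L\leqslant1, so any single such f gives a lower bound on the dual-Lipschitz distance via (4.22). A Monte Carlo sketch (the two Gaussian laws and the test function are invented for the illustration):

```python
import numpy as np

# Any fixed f with |f|_L = Lip(f) + sup|f| <= 1 yields the lower bound
# |<f, mu1> - <f, mu2>| <= ||mu1 - mu2||_L^*.  Here mu1 = N(0,1) and
# mu2 = N(1,1) on the real line, and <f, mu> is estimated by Monte Carlo.
rng = np.random.default_rng(1)

def f(x):
    # Lip(f) <= 1/2 and sup|f| <= 1/2, hence |f|_L <= 1
    return 0.5 * np.tanh(x)

N = 200_000
sample1 = rng.normal(0.0, 1.0, N)   # draws from mu1
sample2 = rng.normal(1.0, 1.0, N)   # draws from mu2
lower = abs(f(sample1).mean() - f(sample2).mean())
print(lower)
```

By the definition the distance never exceeds 2, so the printed value lies in (0, 2]; sharper estimates require optimizing over many test functions.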

Example 4.11. Consider the Polish spaces C([0,T];\mathbb{C}^n), \mathbb{C}^n and the mappings

\begin{equation*} \Pi_t\colon C([0,T];\mathbb{C}^n) \to \mathbb{C}^n, \quad a(\,{\cdot}\,) \mapsto a(t), \qquad 0\leqslant t\leqslant T. \end{equation*} \notag
Noting that | f\circ \Pi_t|_{L, \mathbb{C}^n} \leqslant |f|_{L, C([0,T];\mathbb{C}^n)} for each t we get that
\begin{equation} \|\Pi_t\circ \mu_1- \Pi_t\circ \mu_2\|_{L, \mathbb{C}^n}^* \leqslant \| \mu_1- \mu_2\|_{L, C([0,T];\mathbb{C}^n)}^* \end{equation} \tag{4.23}
for all \mu_1, \mu_2 \in \mathcal{P}(C([0,T];\mathbb{C}^n)) and all 0\leqslant t\leqslant T (where \Pi_t\circ \mu_j \in\mathcal{P}(\mathbb{C}^n) denotes the image of \mu_j under \Pi_t).

The dual-Lipschitz distance converts \mathcal{P}(M) into a complete metric space and induces on it a topology equivalent to the weak convergence of measures (see, for example, [8; § 11.3] and [5; § 1.7]).

Proposition 4.12. Under the assumptions of Amplification 4.9 let the random variable v_0 be such that |v_0| \leqslant R almost surely for some R>0. Then the rate of convergence in (4.19) with respect to the dual-Lipschitz distance depends only on R.

Proof. The proof of Amplification 4.9 shows that it suffices to verify that for non-random initial data v_0\in\overline{B}_R(\mathbb{C}^n) the rate of convergence in (4.19) depends only on R. Assume the opposite. Then there exist \delta>0, a sequence \varepsilon_j\to0, and vectors v_j\in\overline{B}_R(\mathbb{C}^n) such that
\begin{equation} \|\mathcal{D}(a^{\varepsilon_j}(\,{\cdot}\,; v_j)) -\mathcal{D}(a^0(\,{\cdot}\,; v_j))\|_{L,C([0,T];\mathbb{C}^n)}^* \geqslant \delta. \end{equation} \tag{4.24}
By the same argument as in the proof of Lemma 2.2, the two sets of probability measures \{\mathcal{D}(a^{\varepsilon_j}(\,{\cdot}\,;v_j))\} and \{\mathcal{D}(a^0(\,{\cdot}\,;v_j))\} are pre-compact in \mathcal{P}(C([0,T];\mathbb{C}^n)). Therefore, there exists a sequence k_j\to\infty such that \varepsilon_{k_j}\to0, v_{k_j}\to v_0, and
\begin{equation*} \mathcal{D}(a^{\varepsilon_{k_j}}(\,{\cdot}\,;v_{k_j}))\rightharpoonup\widetilde{Q}_0, \quad \mathcal{D}(a^{0}(\,{\cdot}\,; v_{k_j}))\rightharpoonup Q_0 \quad \text{in }\ \mathcal{P}(C([0,T];\mathbb{C}^n)). \end{equation*} \notag
Then
\begin{equation} \|\widetilde{Q}_0-Q_0\|_{L,C([0,T];\mathbb{C}^n)}^* \geqslant\delta. \end{equation} \tag{4.25}
Since in the well-posed equation (4.5) the drift and dispersion are locally Lipschitz, the law \mathcal{D}(a^0(\,{\cdot}\,;v')) is continuous with respect to the initial condition v' (this is well known and can easily be proved using the estimate in Remark 4.8, (ii)). Therefore, Q_0 is the unique weak solution of effective equation (4.5) with initial condition a^0(0)=v_0. By (4.20) the measure \widetilde{Q}_0 is also a weak solution of problem (4.5), (4.6). Hence Q_0=\widetilde{Q}_0, which contradicts (4.25) and proves the assertion.

We proceed to an obvious application of Theorem 4.7 to solutions v^\varepsilon(\tau;v_0) of the original equation (2.6). Consider the action mapping

\begin{equation*} (z_1,\dots,z_n)\mapsto (I_1,\dots,I_n)=:I \end{equation*} \notag
(see (1.2)). Since the interaction representation (2.8) does not change actions, from the theorem we obtain the following assertion.

Corollary 4.13. For any v_0,

\begin{equation} \mathcal{D}(I(v^\varepsilon(\,\cdot\,;v_0))) \rightharpoonup I\circ\mathcal{D}(a(\,{\cdot}\,;v_0)) \quad \textit{in }\ \mathcal{P}(C([0,T];\mathbb{R}_+^n)) \end{equation} \tag{4.26}
as \varepsilon\to0, where a(\,{\cdot}\,;v_0) is the unique weak solution of effective equation (4.5), (4.6).

Example 4.14. If the drift P in (2.6) is globally Lipschitz, that is, \operatorname{Lip}(P)\leqslant M for some M>0, then it is not difficult to see that Assumption 2.1 holds, so Theorem 4.7 and Corollary 4.13 apply. A more interesting example is discussed in Section 9 below.

Proof of Lemma 4.3. In this proof we denote by \mathcal{H}_k(r;c_1,\dots), k=1,2,\dots, non-negative functions of r>0 which tend to zero with r and depend on parameters c_1,\dots (the dependence of the \mathcal{H}_k on T and P is not indicated). Also, for an event Q we set \mathsf{E}_{Q}f(\xi)=\mathsf{E}\mathbf{1}_{Q}f(\xi).

For any M_1\geqslant1 we set

\begin{equation*} \mathcal{E}^1 =\mathcal{E}^{1\varepsilon}_{M_1} =\Bigl\{\omega\in\Omega\colon \sup_{0\leqslant\tau\leqslant T} |a^\varepsilon(\tau)| \leqslant M_1 \Bigr\}. \end{equation*} \notag
By Assumption 2.1 and Chebyshev’s inequality,
\begin{equation*} \mathsf{P}(\Omega\setminus \mathcal{E}^1 ) \leqslant\mathcal{H}_1(M_1^{-1}; |v_0|). \end{equation*} \notag
Recalling that \tilde y was defined in (4.9), by Lemma 3.2 we have
\begin{equation*} |\tilde y(a^\varepsilon(s),s\varepsilon^{-1})| \leqslant |Y(a^\varepsilon(s),s\varepsilon^{-1})|+|{ \langle\!\langle P \rangle\!\rangle }(a^\varepsilon(s))| \leqslant 2 \mathcal{C}^{m_0}(P) | a^\varepsilon(s)|^{m_0}. \end{equation*} \notag
So, abbreviating \tilde y(a^\varepsilon(s),s\varepsilon^{-1}) to \tilde y(s), in view of (2.7) we have:
\begin{equation*} \begin{aligned} \, \mathsf{E}_{\Omega\setminus {\mathcal{E}^1}} \max_{\tau\in[0,T]} \biggl|\int_0^{ \tau}\tilde y(s)\,ds\biggr| & \leqslant\int_0^{T}\mathsf{E} \big(\mathbf{1}_{\Omega\setminus{\mathcal{E}^1}}|\tilde y(s)|\big)\,ds \\ & \leqslant 2\mathcal{C}^{m_0}(P) (\mathsf{P}(\Omega\setminus{\mathcal{E}^1}))^{1/2} \biggl(\int_0^{T}\mathsf{E}|a^\varepsilon(s)|^{2m_0}\,ds\biggr)^{1/2} \\ & \leqslant 2\mathcal{C}^{m_0}(P) (\mathcal{H}_1(M_1^{-1}; |v_0|))^{1/2} =:\mathcal{H}_2(M_1^{-1}; |v_0|). \end{aligned} \end{equation*} \notag
Now we must estimate \displaystyle\mathsf{E}_{\mathcal{E}^1}\max_{\tau\in[0,T]}\biggl|\int_0^\tau\tilde y(s)\,ds\biggr|. For any M_2\geqslant1 consider the event
\begin{equation*} \mathcal{E}^2 =\mathcal{E}^{2 \varepsilon}_{M_2} =\{\omega\in\Omega\colon\|a^\varepsilon\|_{1/3}\leqslant M_2\} \end{equation*} \notag
(see (1.4)). Then by (2.13)
\begin{equation*} \mathsf{P}(\Omega\setminus {\mathcal{E}^2}) \leqslant\mathcal{H}_3(M_2^{-1};|v_0|). \end{equation*} \notag
Therefore,
\begin{equation*} \begin{aligned} \, \mathsf{E}_{\Omega\setminus\mathcal{E}^2} \max_{\tau\in[0,T]} \biggl|\int_0^{\tau}\tilde y(s)\,ds\biggr| & \leqslant(\mathsf{P}(\Omega\setminus{\mathcal{E}^2}))^{1/2} \biggl(C_P\int_0^{T}\mathsf{E}|a^\varepsilon(s)|^{2m_0}\,ds\biggr)^{1/2} \\ & \leqslant\mathcal{H}_4(M_2^{-1};|v_0|)\,. \end{aligned} \end{equation*} \notag
It remains to bound \displaystyle\mathsf{E}_{\mathcal{E}^1\cap\mathcal{E}^2}\max_{\tau\in[0,T]}\biggl|\int_0^{\tau} \tilde y(s)\,ds\biggr|.

We set

\begin{equation*} N=\biggl[\frac{T}{\sqrt{\varepsilon}}\biggr]+1,\qquad L=\frac TN. \end{equation*} \notag
Then C^{-1}\sqrt{\varepsilon}\leqslant L\leqslant C\sqrt{\varepsilon} and c^{-1}/\sqrt{\varepsilon}\leqslant N\leqslant c/\sqrt{\varepsilon} for some constants C and c. We consider a partition of the interval [0,T] by the points \tau_l=lL, l=0,\dots,N, and set
\begin{equation*} \eta_l=\int_{\tau_l}^{\tau_{l+1}}\tilde y(s)\,ds, \qquad l=0,\dots,N-1. \end{equation*} \notag
For any \tau\in[0,T] we find l=l(\tau) such that \tau\in[\tau_{l},\tau_{l+1}]. Then
\begin{equation*} \biggl|\int_0^{\tau}\tilde y(s)\,ds\biggr| \leqslant |\eta_0|+\dots+|\eta_{l-1}| +\biggl|\int_{\tau_l}^{\tau}\tilde y(s)\,ds\biggr|. \end{equation*} \notag
If \omega\in\mathcal{E}^1, then \displaystyle\biggl|\int_{\tau_l}^{\tau}\tilde y(s)\,ds\biggr|\leqslant 2\mathcal{C}^{m_0}(P) M_1^{m_0}L. Therefore,
\begin{equation} \mathsf{E}_{\mathcal{E}^1\cap\mathcal{E}^2} \max_{0\leqslant\tau\leqslant T} \biggl|\int_0^{\tau}\tilde y(s)\,ds\biggr| \leqslant 2 \mathcal{C}^{m_0}(P)M_1^{m_0}L +\sum_{l=0}^{N-1}\mathsf{E}_{\mathcal{E}^1\cap\mathcal{E}^2}|\eta_l|, \end{equation} \tag{4.27}
and it remains to estimate the integrals \mathsf{E}_{\mathcal{E}^1\cap\mathcal{E}^2}|\eta_l| for l=0,\dots,N-1. Observe that
\begin{equation*} \begin{aligned} \, |\eta_l| & \leqslant\biggl|\int_{\tau_l}^{\tau_{l+1}} \bigl[\tilde y(a^\varepsilon(s),s\varepsilon^{-1})-\tilde y(a^\varepsilon(\tau_l),s\varepsilon^{-1})\bigr]\,ds\biggr| \\ &\qquad +\biggl|\int_{\tau_l}^{\tau_{l+1}}\tilde y(a^\varepsilon(\tau_l),s\varepsilon^{-1})\,ds\biggr| =: |U_l^1|+|U_l^2|. \end{aligned} \end{equation*} \notag
Since \tilde y(a^\varepsilon,\tau\varepsilon^{-1})=(\Phi_{\tau\varepsilon^{-1}\Lambda})_* P(a^\varepsilon)-{ \langle\!\langle P \rangle\!\rangle }(a^\varepsilon) and P,{ \langle\!\langle P \rangle\!\rangle }\in\operatorname{Lip}_{m_0}(\mathbb{C}^n,\mathbb{C}^n), for \omega\in{\mathcal{E}^1}\cap\mathcal{E}^2 the integrand in U_l^1 is bounded by
\begin{equation*} 2\mathcal{C}^{m_0}(P) M_1^{m_0}\sup_{\tau_l\leqslant s\leqslant\tau_{l+1}}|a^\varepsilon(s)-a^\varepsilon(\tau_l)| \leqslant 2\mathcal{C}^{m_0}(P) M_1^{m_0}M_2L^{1/3}. \end{equation*} \notag
So
\begin{equation*} |U_l^1| \leqslant 2\mathcal{C}^{m_0}(P) M_1^{m_0}M_2L^{4/3}. \end{equation*} \notag
Now consider the integral U_l^2. By the definition of \tilde y(a^\varepsilon,\tau\varepsilon^{-1}), we have
\begin{equation*} U_l^2 =\int_{\tau_l}^{\tau_{l+1}}Y(a^\varepsilon(\tau_l),s\varepsilon^{-1})\,ds -L{ \langle\!\langle P \rangle\!\rangle }(a^\varepsilon(\tau_l)) =:Z^1+Z^2. \end{equation*} \notag
For the integral Z^1, making the change of variable s=\tau_l+\varepsilon x for x\in[0,L/\varepsilon] we have
\begin{equation*} Z^1=\varepsilon\int_0^{L/\varepsilon}Y(a^\varepsilon(\tau_l),\tau_l\varepsilon^{-1}+x)\,dx. \end{equation*} \notag
Since
\begin{equation*} Y(a^\varepsilon(\tau_l),\tau_l\varepsilon^{-1}+x) =\Phi_{\tau_l\varepsilon^{-1}\Lambda}\circ\Phi_{x\Lambda} P(\Phi_{-x\Lambda}(\Phi_{-\tau_l\varepsilon^{-1}\Lambda}a^\varepsilon(\tau_l))), \end{equation*} \notag
we have
\begin{equation*} \begin{aligned} \, Z^1 & =L\Phi_{\tau_l\varepsilon^{-1}\Lambda} \biggl( \frac{\varepsilon}{L} \int_0^{L/\varepsilon}\Phi_{x\Lambda} P(\Phi_{-x\Lambda}(\Phi_{-\tau_l\varepsilon^{-1}\Lambda}a^\varepsilon(\tau_l)))\,dx \biggr) \\ & =L\Phi_{\tau_l\varepsilon^{-1}\Lambda} \langle\!\langle P \rangle\!\rangle ^{L/\varepsilon} (\Phi_{-\tau_l\varepsilon^{-1}\Lambda}a^\varepsilon(\tau_l)) \end{aligned} \end{equation*} \notag
(see definition (3.1)). As L/\varepsilon\sim\varepsilon^{-1/2}\gg1 and |\Phi_{-\tau_l\varepsilon^{-1}\Lambda}a^\varepsilon(\tau_l)|=|a^\varepsilon(\tau_l)|\leqslant M_1 for \omega\in{\mathcal{E}^1}\cap\mathcal{E}^2, by Lemma 3.2 the partial averaging \langle\!\langle P \rangle\!\rangle ^{L/\varepsilon} is close to the complete averaging \langle\!\langle P \rangle\!\rangle . Thus,
\begin{equation*} \begin{aligned} \, & \big| Z^1 -L\Phi_{\tau_l\varepsilon^{-1}\Lambda}{ \langle\!\langle P \rangle\!\rangle } \big(\Phi_{-\tau_l\varepsilon^{-1}\Lambda}(a^\varepsilon(\tau_l))\big) \big| =\big|Z^1 -L(\Phi_{\tau_l\varepsilon^{-1}\Lambda})_* { \langle\!\langle P \rangle\!\rangle } (a^\varepsilon(\tau_l)) \big| \\ &\qquad =\big|Z^1 -L { \langle\!\langle P \rangle\!\rangle } (a^\varepsilon(\tau_l)) \big| \leqslant L\mathcal{H}_5(\sqrt{\varepsilon}; M_1, |v_0|), \end{aligned} \end{equation*} \notag
where we have used Lemma 3.3 to obtain the second equality. In view of the equality Z^2=-L{ \langle\!\langle P \rangle\!\rangle }(a^\varepsilon(\tau_l)) we have
\begin{equation*} |U_l^2| =| Z^1+Z^2| \leqslant L\mathcal{H}_5(\sqrt{\varepsilon};M_1,|v_0|). \end{equation*} \notag

Thus, we have obtained

\begin{equation*} \mathsf{E}_{\mathcal{E}^1\cap{\mathcal{E}^2}}|\eta_l| \leqslant 2\mathcal{C}^{m_0}(P) M_1^{m_0}M_2L^{4/3} +L\mathcal{H}_5(\sqrt{\varepsilon};M_1,|v_0|). \end{equation*} \notag
In combination with (4.27), this gives us the inequality
\begin{equation*} \begin{aligned} \, \mathsf{E}_{\mathcal{E}^1\cap\mathcal{E}^2} \max_{0\leqslant\tau\leqslant T} \biggl|\int_0^{\tau}\tilde y(s)\,ds\biggr| & \leqslant 2\mathcal{C}^{m_0}(P) M_1^{m_0}L \\ &\qquad +2\mathcal{C}^{m_0}(P) M_1^{m_0}M_2L^{1/3} +\mathcal{H}_5(\sqrt{\varepsilon}; M_1, |v_0|). \end{aligned} \end{equation*} \notag
Therefore,
\begin{equation*} \begin{aligned} \, \mathsf{E}\max_{0\leqslant\tau\leqslant T} \biggl|\int_0^{\tau}\tilde y(s)\,ds\biggr| & \leqslant \mathcal{H}_2(M_1^{-1}; |v_0|)+\mathcal{H}_4(M_2^{-1}; |v_0|) \\ &\qquad +2\mathcal{C}^{m_0}(P) M_1^{m_0}(M_2+1)\varepsilon^{1/6} +\mathcal{H}_5(\sqrt{\varepsilon}; M_1,|v_0|). \end{aligned} \end{equation*} \notag
Now, for any \delta>0, we perform the following procedure:

1) choose M_1 sufficiently large so that \mathcal{H}_2(M_1^{-1}; |v_0|)\leqslant\delta;

2) choose M_2 sufficiently large so that \mathcal{H}_4(M_2^{-1}; |v_0|)\leqslant \delta;

3) finally, choose \varepsilon_\delta>0 sufficiently small so that

\begin{equation*} 2\mathcal{C}^{m_0}(P) M_1^{m_0}(M_2+1)\varepsilon^{1/6} +\mathcal{H}_5(\sqrt{\varepsilon}; M_1, |v_0|) \leqslant\delta \quad \text{if }\ 0<\varepsilon\leqslant\varepsilon_\delta. \end{equation*} \notag
We have seen that for any \delta>0,
\begin{equation*} \mathsf{E}\max_{0\leqslant\tau\leqslant T} \biggl| \int_0^{\tau}\tilde y(a^\varepsilon(s),s\varepsilon^{-1})\,ds \biggr| \leqslant 3\delta \quad \text{if }\ 0<\varepsilon\leqslant\varepsilon_\delta. \end{equation*} \notag
So
\begin{equation*} \mathsf{E}\max_{0\leqslant\tau\leqslant T} \biggl|\int_0^{\tau}\bigl[Y(a^\varepsilon(s),s\varepsilon^{-1}) -{ \langle\!\langle P \rangle\!\rangle }(a^\varepsilon(s))\bigr]\,ds\biggr| \to0 \quad \text{as }\ \varepsilon\to0, \end{equation*} \notag
which completes the proof of Lemma 4.3.
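The mechanism behind Lemma 4.3 is easy to probe numerically. Below is a toy one-mode sketch (our own example, not from the paper): for \Lambda=(1) and P(v)=v|v|^2+v^2 the interaction representation gives Y(a,t)=e^{it}P(e^{-it}a)=a|a|^2+e^{-it}a^2, so \langle\!\langle P\rangle\!\rangle(a)=a|a|^2 and, for frozen a, the residual integral is of order \varepsilon.

```python
import numpy as np

# Toy one-mode illustration of Lemma 4.3 (not the paper's proof):
# Lambda = (1), P(v) = v|v|^2 + v^2, so in the interaction representation
#   Y(a, t) = e^{it} P(e^{-it} a) = a|a|^2 + e^{-it} a^2,
# <<P>>(a) = a|a|^2, and the residual is ytilde(a, t) = e^{-it} a^2.

def residual_integral(a, eps, T, n_steps=200_000):
    """Trapezoidal approximation of int_0^T e^{-i s/eps} a^2 ds (frozen a)."""
    s = np.linspace(0.0, T, n_steps)
    f = np.exp(-1j * s / eps) * a**2
    ds = s[1] - s[0]
    return ds * (0.5 * f[0] + f[1:-1].sum() + 0.5 * f[-1])

a, T = 1.0 + 0.5j, 1.0
for eps in (1e-1, 1e-2, 1e-3):
    # the exact antiderivative gives |integral| <= 2 * eps * |a|^2
    assert abs(residual_integral(a, eps, T)) <= 2.1 * eps * abs(a)**2
```

Of course, in the lemma a^\varepsilon(s) is not frozen; the partition into intervals of length L\sim\sqrt{\varepsilon} above is exactly what reduces the general case to this frozen one.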

5. Stationary solutions and mixing

In this section we study the relationship between stationary solutions of equation (2.6) and ones of the effective equation. We recall that a solution a(\tau), \tau\geqslant0, of equation (2.6) (or of effective equation (4.5)) is stationary if \mathcal{D}(a(\tau))\equiv\mu for all \tau\geqslant0 and some measure \mu\in\mathcal{P}(\mathbb{C}^n), which is called a stationary measure for the corresponding equation.

Throughout this section we assume that equation (2.6) satisfies the following stronger version of Assumption 2.1.

Assumption 5.1. (a) The drift P(v) is a locally Lipschitz vector field which belongs to \operatorname{Lip}_{m_0}(\mathbb{C}^n,\mathbb{C}^n) for some m_0\in\mathbb{N}.

(b) For any v_0\in\mathbb{C}^n equation (2.6) has a unique strong solution v^\varepsilon(\tau;v_0), \tau\geqslant0, which is equal to v_0 at \tau=0. There exists m_0'>(m_0\vee1) such that

\begin{equation} \mathsf{E} \sup_{T'\leqslant\tau\leqslant {T'}+1} |v^\varepsilon(\tau;v_0)|^{2m'_0} \leqslant C_{m'_0}(|v_0|) \end{equation} \tag{5.1}
for any T'\geqslant0 and \varepsilon\in(0,1], where C_{m_0'}(\,{\cdot}\,) is a continuous non-decreasing function.

(c) Equation (2.6) is mixing, that is, it has a stationary solution v^\varepsilon_{\mathrm{st}}(\tau), \mathcal{D}(v^\varepsilon_{\mathrm{st}}(\tau)) \equiv \mu^\varepsilon\in \mathcal{P}(\mathbb{C}^n), and

\begin{equation} \mathcal{D}(v^\varepsilon(\tau;v_0))\rightharpoonup\mu^\varepsilon \quad \text{in }\ \mathcal{P}(\mathbb{C}^n) \quad \text{as }\ \tau\to+\infty, \end{equation} \tag{5.2}
for every v_0.

Under Assumption 5.1 equation (2.6) defines a mixing Markov process in \mathbb{C}^n with transition probability \Sigma_\tau(v)\in\mathcal{P}(\mathbb{C}^n), \tau \geqslant0, v\in \mathbb{C}^n, where \Sigma_\tau(v)=\mathcal{D} v^\varepsilon(\tau; v); for example, see [14; § 5.4.C]. We denote by X the complete separable metric space X=C([0,\infty);\mathbb{C}^n) with the distance

\begin{equation} \operatorname{dist}(a_1, a_2) =\sum_{N=1}^\infty 2^{-N} \frac{\|a_1-a_2\|_{C([0,N];\mathbb{C}^n)}}{1+\|a_1-a_2\|_{C([0,N];\mathbb{C}^n)}}, \qquad a_1,a_2\in X, \end{equation} \tag{5.3}
and consider the continuous function g(a)=\sup_{0\leqslant t\leqslant 1} |a(t)|^{2m'_0} on X. Setting \mu^\tau_\varepsilon=\mathcal{D}(v^\varepsilon(\tau; 0)), by the Markov property we have
\begin{equation} \mathsf{E}\sup_{T'\leqslant\tau\leqslant T'+1}|v^\varepsilon(\tau;0)|^{2m'_0} =\int_{\mathbb{C}^n}\mathsf{E}g(v^\varepsilon(\,{\cdot}\,;v_0)) \,\mu^{T'}_\varepsilon(dv_0), \end{equation} \tag{5.4}
and
\begin{equation} \mathsf{E} \sup_{0\leqslant\tau\leqslant 1} |v^\varepsilon_{\mathrm{st}}(\tau)|^{2m'_0} =\int_{\mathbb{C}^n}\mathsf{E}g(v^\varepsilon(\,{\cdot}\,; v_0))\,\mu^\varepsilon(dv_0). \end{equation} \tag{5.5}
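For intuition, the distance (5.3) on X is easy to realize on discretized paths. The following sketch is our own illustration (grid sampling replaces the sup over [0,N], and the infinite sum is truncated):

```python
import numpy as np

# Discretized sketch of the distance (5.3) on X = C([0,infty); C^n):
# paths are sampled on a grid and the norm on C([0,N]) is replaced by
# the max over grid points t <= N (an approximation, for illustration).

def dist(a1, a2, t, n_terms=20):
    """sum_N 2^{-N} d_N / (1 + d_N),  d_N = sup_{[0,N]} |a1 - a2|."""
    diff = np.abs(np.asarray(a1) - np.asarray(a2))
    total = 0.0
    for N in range(1, n_terms + 1):
        dN = diff[t <= N].max()
        total += 2.0 ** (-N) * dN / (1.0 + dN)
    return total

t = np.linspace(0.0, 20.0, 2001)
a1 = np.exp(1j * t)        # two explicit paths in C^1
a2 = np.exp(1j * t) + 0.5  # constant offset: d_N = 0.5 for every N
# each term is 2^{-N}/3, so the truncated sum equals (1 - 2^{-20})/3
assert abs(dist(a1, a2, t) - (1 - 2.0**-20) / 3) < 1e-12
```

The 2^{-N} weights make dist(a_1,a_2)<1 for all pairs of paths, so convergence in this metric is exactly uniform convergence on every compact time interval.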
The left-hand side of (5.4) was estimated in (5.1). To estimate (5.5) we take the limit as T'\to\infty on the right-hand side of (5.4), using (5.2). To do this we start with the following lemma.

Lemma 5.2. Let n_1,n_2\in \mathbb{N}, let \mathcal{B}\subset\mathbb{R}^{n_1} be a closed convex set which contains more than one point, and let F\colon \mathcal{B}\to\mathbb{R}^{n_2} be a Lipschitz mapping. Then F can be extended to a map \widetilde{F}\colon \mathbb{R}^{n_1} \to \mathbb{R}^{n_2} in such a way that

(a) \operatorname{Lip}(\widetilde{F})=\operatorname{Lip}(F);

(b) \widetilde{F}(\mathbb{R}^{n_1})=F(\mathcal{B}).

Proof. Let \Pi\colon \mathbb{R}^{n_1}\to\mathcal{B} be the projection taking each point in \mathbb{R}^{n_1} to a nearest point in \mathcal{B}. Then \operatorname{Lip}(\Pi)=1 (see Appendix D) and \widetilde{F}=F \circ \Pi is obviously a required extension of F. The lemma is proved.
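For \mathcal{B} a closed ball the construction in the proof is completely explicit. Here is a numerical sketch (our own illustration of Lemma 5.2, with a sample Lipschitz F chosen by us):

```python
import numpy as np

# Sketch of Lemma 5.2 for B = the closed unit ball in R^2 (our example):
# extend F by Ftilde = F o Pi, where Pi is the nearest-point projection
# onto B. Then Lip(Ftilde) = Lip(F) and Ftilde(R^2) = F(B).

def project_to_ball(x, R=1.0):
    """Nearest-point projection onto the closed ball of radius R."""
    r = np.linalg.norm(x)
    return x if r <= R else (R / r) * x

def extend(F, R=1.0):
    return lambda x: F(project_to_ball(np.asarray(x, dtype=float), R))

F = lambda x: np.array([x[0] + x[1], np.sin(x[0])])  # Lipschitz on the ball
Ft = extend(F)

# outside the ball the extension takes the value F(Pi(x)), a point of F(B)
assert np.allclose(Ft([10.0, 0.0]), F(np.array([1.0, 0.0])))

# Pi is 1-Lipschitz, so composing with it does not increase distances
rng = np.random.default_rng(0)
for _ in range(200):
    x, y = rng.normal(size=2), rng.normal(size=2)
    px, py = project_to_ball(x), project_to_ball(y)
    assert np.linalg.norm(px - py) <= np.linalg.norm(x - y) + 1e-12
```

This is exactly how P^M below is produced from P|_{\overline{B}_M(\mathbb{C}^n)}.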

Since \mathcal{C}^{m_0}(P) =: C_*<\infty, we see that for any M\in\mathbb{N} both the norm of the restriction of P to \overline{B}_M(\mathbb{C}^n) and its Lipschitz constant are bounded by (1+M)^{m_0} C_*. In view of the above lemma we can extend P|_{\overline{B}_M(\mathbb{C}^n)} to a Lipschitz mapping P^M\!\colon \mathbb{C}^n\to \mathbb{C}^n such that

\begin{equation*} \operatorname{Lip}(P^M) \leqslant (1+M)^{m_0} C_*, \qquad \sup| P^M(v)| \leqslant (1+M)^{m_0} C_*. \end{equation*} \notag
Given a solution v(\tau) of equation (2.6), consider the stopping time
\begin{equation*} \tau_M=\inf\{t\geqslant0\colon | v(t)| \geqslant M\} \end{equation*} \notag
and denote by v^{\varepsilon M} the stopped solution:
\begin{equation*} v^{\varepsilon M} (\tau; v_0)=v^\varepsilon(\tau\wedge \tau_M; v_0). \end{equation*} \notag
Note that the process v^{\varepsilon M} does not change if we replace P(v) by P^M(v) in (2.6). So v^{\varepsilon M}(\tau;v_0) is a stopped solution of a stochastic equation with Lipschitz coefficients, and thus for each M\in\mathbb{N} the curve v^{\varepsilon M,\omega}(\,{\cdot}\,; v_0)\in X depends continuously on v_0. Since
\begin{equation*} g(v^{\varepsilon M}) \leqslant g(v^\varepsilon) \quad \text{a. s.}, \end{equation*} \notag
in view of (5.2) and (5.1), for all M and N>0 we have
\begin{equation*} \int \mathsf{E}(N\wedge g)(v^{\varepsilon M} (\,\cdot\,; v)) \, \mu^\varepsilon(dv) =\lim_{{T'}\to\infty} \int \mathsf{E}(N\wedge g)(v^{\varepsilon M} (\,{\cdot}\,; v)) \,\mu^{T'}_\varepsilon(dv) \leqslant C_{m'_0}(0) \end{equation*} \notag
(to obtain the last inequality from (5.1) we have used the Markov property). Passing to the limit as N\to\infty on the left-hand side and using the monotone convergence theorem we see that
\begin{equation} \int\mathsf{E}g(v^{\varepsilon M} (\,{\cdot}\,; v)) \, \mu^\varepsilon(dv) \leqslant C_{m'_0}(0). \end{equation} \tag{5.6}
Since for every v, g(v^{\varepsilon M} (\,{\cdot}\,; v)) \nearrow g(v^\varepsilon (\,{\cdot}\,; v)) \leqslant\infty almost surely as M\to\infty, we can use the last theorem again to derive from (5.6) that
\begin{equation*} \int\mathsf{E}g(v^\varepsilon(\,{\cdot}\,;v))\,\mu^\varepsilon(dv) \leqslant C_{m'_0}(0). \end{equation*} \notag
Recalling (5.5) we obtain the following assertion.

Lemma 5.3. The stationary solution v^\varepsilon_{\mathrm{st}}(\tau) satisfies (5.1) with C_{m_0'}(|v_0|) replaced by C_{m_0'}(0).

Consider the interaction representation for v^\varepsilon_{\mathrm{st}}, v^\varepsilon_{\mathrm{st}}(\tau)=\Phi_{-\tau\varepsilon^{-1}\Lambda}a^\varepsilon(\tau) (note that a^\varepsilon is not a stationary process!). Then a^\varepsilon(\tau) satisfies equation (2.11), so for any N\in\mathbb{N} the system of measures \{\mathcal{D}(a^\varepsilon|_{[0,N]}),\,0<\varepsilon\leqslant1\} is tight in view of (5.1) (for the same reason as in section 2.4). We choose a sequence \varepsilon_l\to0 (depending on N) such that

\begin{equation*} \mathcal{D}(a^{\varepsilon_l}|_{[0,N]})\rightharpoonup Q_0 \quad \text{in }\ \mathcal{P}(C([0,N];\mathbb{C}^n)). \end{equation*} \notag
Applying the diagonal process and replacing \{\varepsilon_l\} by a subsequence, which we still denote by \{\varepsilon_l\}, we achieve that \mathcal{D} a^{\varepsilon_l} \rightharpoonup Q_0 in \mathcal{P}(X) (see (5.3)).

Since a^\varepsilon(0)=v_{\mathrm{st}}^\varepsilon(0), we have

\begin{equation*} \mu^{\varepsilon_l}\rightharpoonup \mu^0 :=Q_0\big|_{\tau=0}. \end{equation*} \notag
Let a^0(\tau) be a process in \mathbb{C}^n such that \mathcal{D}(a^0)=Q_0. Then
\begin{equation} \mathcal{D} (a^{\varepsilon_l}(\tau))\rightharpoonup \mathcal{D}(a^{0}(\tau)) \quad \forall\,0\leqslant \tau <\infty. \end{equation} \tag{5.7}
In particular, \mathcal{D}(a^0(0) )=\mu^0.

Proposition 5.4. (1) The limiting process a^0 is a stationary weak solution of effective equation (4.5), and \mathcal{D}(a^0(\tau))\equiv\mu^0, \tau\in[0,\infty). In particular, limit points as \varepsilon\to0 of the system of stationary measures \{\mu^\varepsilon,\,\varepsilon\in(0,1]\} are stationary measures of the effective equation.

(2) Any limiting measure \mu^0 is invariant under operators \Phi_{\theta\Lambda}, \theta\in\mathbb{R}. So

\begin{equation*} \mathcal{D}(\Phi_{\theta\Lambda}a^0(\tau))=\mu^0 \end{equation*} \notag
for all \theta\in\mathbb{R} and \tau\in[0,\infty).

Proof. (1) Using Lemma 5.3 and repeating the argument in the proof of Proposition 4.2 we obtain that a^0 is a weak solution of the effective equation. It remains to prove that it is stationary.

Take any bounded Lipschitz function f on \mathbb{C}^n and consider

\begin{equation*} \mathsf{E}\int_0^{1}f(v_{\mathrm{st}}^{\varepsilon_l}(\tau))\,d\tau =\mathsf{E}\int_0^1 f(\Phi_{-\tau\varepsilon_l^{-1}\Lambda}a^{\varepsilon_l}(\tau)) \,d\tau. \end{equation*} \notag
Using the same argument as in the proof of Lemma 4.3 (but applying it to the averaging of functions rather than vector fields) we obtain
\begin{equation} \begin{aligned} \, & \mathsf{E}\int_0^{1} f(v_{\mathrm{st}}^{\varepsilon_l}(\tau))\,d\tau -\mathsf{E}\int_0^{1}\langle f\rangle(a^{\varepsilon_l}(\tau))\,d\tau \notag \\ &\qquad =\mathsf{E}\int_0^{1} \bigl[f(\Phi_{-\tau\varepsilon_l^{-1}\Lambda}a^{\varepsilon_l}(\tau)) -\langle f\rangle(a^{\varepsilon_l}(\tau))\bigr]\,d\tau \to0 \quad \text{as }\ \varepsilon_l\to0. \end{aligned} \end{equation} \tag{5.8}
By Lemma 3.6
\begin{equation*} \langle f\rangle(a^{\varepsilon_l}(\tau)) =\langle f\rangle (\Phi_{\tau\varepsilon_l^{-1}\Lambda}v_{\mathrm{st}}^{\varepsilon_l}(\tau)) =\langle f\rangle (v_{\mathrm{st}}^{\varepsilon_l}(\tau)) \end{equation*} \notag
for every \tau. Since the process v_{\mathrm{st}}^{\varepsilon_l}(\tau) is stationary, it follows that
\begin{equation*} \mathsf{E}f (v_{\mathrm{st}}^{\varepsilon_l}(\tau)) =\text{Const} \quad\text{and}\quad \mathsf{E}\langle f\rangle (a^{\varepsilon_l}(\tau)) =\mathsf{E}\langle f\rangle (v_{\mathrm{st}}^{\varepsilon_l}(\tau)) =\text{Const}'. \end{equation*} \notag
So from (5.8) we obtain
\begin{equation} \mathsf{E}f(v_{\mathrm{st}}^{\varepsilon_l}(\tau)) -\mathsf{E}\langle f\rangle (a^{\varepsilon_l}(\tau)) \to0 \quad \text{as }\ \varepsilon_l\to0 \end{equation} \tag{5.9}
for all \tau.

For any \tau consider \tilde f_\tau\! =f\mathrel{\circ}\Phi_{\tau \varepsilon_l^{-1}\Lambda}. Then f(a^{\varepsilon_l}(\tau))=\tilde f_\tau (v^{\varepsilon_l}_{\mathrm{st}}(\tau)). Since \langle f \rangle= \langle \tilde f_\tau \rangle by assertion (4) of Lemma 3.6, applying (5.9) to \tilde f_\tau we obtain

\begin{equation*} \lim_{\varepsilon_l\to0} \mathsf{E}f(a^{\varepsilon_l}(\tau)) =\lim_{\varepsilon_l\to0} \mathsf{E}\tilde f_\tau(v_{\mathrm{st}}^{\varepsilon_l} (\tau)) =\lim_{\varepsilon_l\to0} \mathsf{E}\langle\tilde f_\tau \rangle(a^{\varepsilon_l} (\tau)) =\lim_{\varepsilon_l\to0} \mathsf{E}\langle f\rangle(a^{\varepsilon_l} (\tau)). \end{equation*} \notag
From this relation, (5.9), and (5.7) we find that \displaystyle\mathsf{E}f(a^0(\tau))=\int f(v) \,\mu^0(dv) for each \tau and every f as above. This implies the first assertion of the proposition.

(2) Passing to the limit in (5.9) with the use of (5.7) we have

\begin{equation*} \int f (v)\,\mu^0(dv) =\mathsf{E}\langle f\rangle(a^0(\tau)) \quad \forall\tau. \end{equation*} \notag
Applying this relation first to the function f\circ \Phi_{\theta\Lambda} and then to f itself, we get that
\begin{equation*} \int f\circ \Phi_{\theta\Lambda} (v) \mu^0(dv)=\mathsf{E}\langle f \circ\Phi_{\theta\Lambda} \rangle (a^0(\tau))=\mathsf{E}\langle f \rangle (a^0(\tau))= \int f (v)\,\mu^0(dv), \end{equation*} \notag
for any \theta\in \mathbb{R} and any \tau, for every bounded Lipschitz function f. This implies the second assertion and completes the proof of the proposition.

If effective equation (4.5) is mixing, then it has a unique stationary measure. In this case the measure \mu^0 in Proposition 5.4 does not depend on a choice of the sequence \varepsilon_l\to0, and so \mu^\varepsilon \rightharpoonup\mu^0 as \varepsilon\to0. Therefore, we have the following result.

Theorem 5.5. If, in addition to Assumption 5.1, the effective equation is mixing and \mu^0 is its unique stationary measure, then

\begin{equation*} \mu^\varepsilon\rightharpoonup\mu^0 \quad\textit{in }\ \mathcal{P}(\mathbb{C}^n) \quad\textit{as }\ \varepsilon\to0. \end{equation*} \notag
Moreover, the measure \mu^0 is invariant under all operators \Phi_{\theta\Lambda}, and the law of the stationary solution of equation (2.6), as expressed in the interaction presentation, converges to the law of the stationary solution of effective equation (4.5).

We recall that Theorem 4.7 and Corollary 4.13 only ensure that on finite time intervals \tau\in [0,T] the actions of solutions of (2.6) converge in law, as \varepsilon\to0, to the actions of solutions of the effective equation with the same initial data. By contrast, the entire stationary measure for equation (2.6) converges to the stationary measure for the effective equation as \varepsilon\to0. This important fact was originally observed in [9] for a special class of equations (2.6).

Corollary 5.6. Under the assumption of Theorem 5.5, for any v_0\in\mathbb{C}^n we have

\begin{equation*} \lim_{\varepsilon\to0}\lim_{\tau\to\infty}\mathcal{D}(v^\varepsilon(\tau;v_0))=\mu^0. \end{equation*} \notag

Proof. Since \mathcal{D}(v^\varepsilon(\tau;v_0))\rightharpoonup\mu^\varepsilon as \tau\to\infty by (5.2), the result follows from Theorem 5.5.

Remark 5.7. We decomplexify \mathbb{C}^n to obtain \mathbb{R}^{2n} and write the effective equation in the real coordinates \{x=(x_1,\dots,x_{2n})\}:

\begin{equation*} d x_j(\tau)-{ \langle\!\langle P \rangle\!\rangle }_j(x)\,d\tau =\sum_{l=1}^{2n}\mathcal{B}_{jl}\,dW_l(\tau), \qquad j=1,\dots,2n, \end{equation*} \notag
where the W_l are independent standard real Wiener processes. Then the stationary measure \mu^0\in\mathcal{P}(\mathbb{R}^{2n}) satisfies the stationary Fokker–Planck equation
\begin{equation*} \frac{1}{2}\sum_{l=1}^{2n}\sum_{j=1}^{2n} \frac{\partial^2}{\partial x_l\,\partial x_j}(\mathcal{B}_{lj}\mu^0) =\sum_{l=1}^{2n}\frac{\partial}{\partial x_l}( \langle\!\langle P \rangle\!\rangle _l(x)\mu^0) \end{equation*} \notag
in the sense of distributions. If the dispersion matrix \Psi is non-singular, then so is the diffusion \mathcal{B}, and since the drift \langle\!\langle P \rangle\!\rangle (x) is locally Lipschitz, by the standard theory of the Fokker–Planck equation we have \mu^0=\varphi(x)\,dx, where \varphi\in C^1(\mathbb{R}^{2n}). (For example, first of all, Theorem 1.6.8 from [3] implies that \mu^0=\varphi(x)\,dx, where \varphi is a Hölder function, and then, by the usual elliptic regularity, \varphi\in C^1.)

6. The non-resonant case

Assume that the frequency vector \Lambda=(\lambda_1,\dots,\lambda_n) is non-resonant (see (3.7)). In subsection 3.1.2 we saw that in this case the vector field \langle\!\langle P \rangle\!\rangle can be calculated via the averaging (3.8) and commutes with all rotations \Phi_w, w\in \mathbb{R}^n. For any j\in\{1,\dots,n\} set w^{j,t}:=(0,\dots,0,t,0,\dots,0) (only the jth entry is non-zero). Consider \langle\!\langle P \rangle\!\rangle _1(z) and write it in the form \langle\!\langle P \rangle\!\rangle _1(z)=z_1R_1(z_1,\dots,z_n), for some complex function R_1. Since for w=w^{1,t} we have \Phi_w(z)=(e^{it}z_1,z_2,\dots,z_n), the first component of relation (3.10) now reads

\begin{equation*} e^{-it}z_1R_1(e^{-it}z_1,z_2,\dots,z_n) =e^{-it}z_1R_1(z_1,\dots,z_n), \end{equation*} \notag
for every t. So
\begin{equation*} R_1(e^{-it}z_1,z_2,\dots,z_n) \equiv R_1(z_1,\dots,z_n) \end{equation*} \notag
and R_1(z_1,\dots,z_n) depends on z_1 only through |z_1|. In a similar way we verify that R_1 depends on z_2,\dots,z_n only through |z_2|,\dots,|z_n|. Therefore,
\begin{equation*} \langle\!\langle P \rangle\!\rangle _1(z)=z_1R_1(|z_1|,\dots,|z_n|). \end{equation*} \notag
The same is true for any \langle\!\langle P \rangle\!\rangle _j(z). Then we obtain the following statement.

Proposition 6.1. If Assumption 2.1 holds and the frequency vector \Lambda is non-resonant, then \langle\!\langle P \rangle\!\rangle satisfies (3.8), and

(1) { \langle\!\langle P \rangle\!\rangle }_j(a)=a_j R_j(|a_1|,\dots,|a_n|), j=1,\dots,n;

(2) the effective equation reads

\begin{equation} da_j(\tau)-a_jR_j(|a_1|,\dots,|a_n|)\,d\tau =b_j\,d\beta^c_j(\tau), \qquad j=1,\dots, n, \end{equation} \tag{6.1}
where \displaystyle b_j=\biggl(\,\sum_{l=1}^n|\Psi_{jl}|^2\biggr)^{1/2} (and a\mapsto(a_1R_1,\dots,a_nR_n) is a locally Lipschitz vector field);

(3) if a(\tau) is a solution of (6.1), then the vector of its actions

\begin{equation*} I(\tau)=(I_1,\dots,I_n)(\tau)\in \mathbb{R}_+^n \end{equation*} \notag
is a weak solution of the equation
\begin{equation} \begin{gathered} \, dI_j(\tau) -2I_j (\operatorname{Re}R_j) \big(\sqrt{2 I_1},\dots,\sqrt{2 I_n}\big)\,d\tau-b_j^2 \,d\tau = b_j \sqrt{2 I_j} \,d W_j(\tau), \\ I_j(0)=\frac12 |v_{0j}|^2, \qquad j=1,\dots,n, \notag \end{gathered} \end{equation} \tag{6.2}
where \{W_j\} are independent standard real Wiener processes;

(4) if, in addition, the assumptions of Theorem 5.5 are met and the matrix \Psi is non-singular, then the stationary measure \mu^0 has the form d\mu^0=p(I)\,dI\,d\varphi, where p is a continuous function on \mathbb{R}^n_+ which is C^1-smooth away from the boundary \partial \mathbb{R}^n_+.

Proof. Assertion (1) has been proved above, and (2) follows from it and (4.4).

(3) Writing the diffusion in effective equation (4.5) as in (6.1) and applying Itô’s formula (see Appendix C) to I_j=|a_j|^2/2 we get that

\begin{equation} dI_j(\tau) -\frac12\bigl(\bar a_j \langle\!\langle P \rangle\!\rangle _j+a_j \overline{ \langle\!\langle P \rangle\!\rangle }_j\bigr)\,d\tau - b_j^2 \,d\tau = b_j \langle a_j(\tau), d\beta_j(\tau)\rangle =: b_j |a_j|\,d\xi_j(\tau), \end{equation} \tag{6.3}
where d\xi_j(\tau)= \langle a_j/| a_j|, d\beta_j(\tau)\rangle (see (2.2)) and for a_j=0 we set a_j/| a_j| to be 1. Using (1) we see that the left-hand side of (6.3) is the same as in (6.2). Since \bigl|a_j/| a_j| \bigr|(\tau)\equiv1 for each j, by Lévy’s theorem (for example, see [14; p. 157]) \xi(\tau)=(\xi_1,\dots,\xi_n)(\tau) is a standard n-dimensional Wiener process and (3) follows.
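For completeness, here is the routine Itô computation behind (6.3), written out by us (with the complex Wiener processes normalized so that d\beta^c_j\,d\overline{\beta^c_j}=2\,d\tau, as in (2.2)):

```latex
% Ito's formula for I_j = |a_j|^2/2, where
% da_j = <<P>>_j(a) d\tau + b_j d\beta^c_j :
dI_j
  = \tfrac12\bigl(\bar a_j\,da_j + a_j\,d\bar a_j\bigr)
    + \tfrac12\,da_j\,d\bar a_j
  = \tfrac12\bigl(\bar a_j\,\langle\!\langle P\rangle\!\rangle_j
      + a_j\,\overline{\langle\!\langle P\rangle\!\rangle}_j\bigr)\,d\tau
    + b_j^2\,d\tau + b_j\,\langle a_j, d\beta_j\rangle,
% since d\beta^c_j\, d\overline{\beta^c_j} = 2\,d\tau and
% \tfrac12\bigl(\bar a_j\,d\beta^c_j + a_j\,d\overline{\beta^c_j}\bigr)
%   = \operatorname{Re}\bigl(\bar a_j\,d\beta^c_j\bigr)
%   = \langle a_j, d\beta_j\rangle .
```

Each term matches (6.3): the real part of \bar a_j\langle\!\langle P\rangle\!\rangle_j gives the drift, the quadratic variation gives the Itô term b_j^2\,d\tau, and the martingale part is b_j\langle a_j,d\beta_j\rangle.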

(4) By Theorem 5.5 the stationary measure \mu^0 is invariant under the action of all operators \Phi_{\theta\Lambda}, \theta\in\mathbb{R}. Since the curve \theta\mapsto\theta\Lambda\in\mathbb{T}^n is dense in \mathbb{T}^n, the measure \mu^0 is invariant under all operators \Phi_w, w\in\mathbb{T}^n. As the matrix \Psi is non-singular, we have d\mu^0=\widetilde{p}(z)\,dz by Remark 5.7, where \widetilde{p} is a C^1-smooth function (dz is the volume element in \mathbb{C}^n\simeq \mathbb{R}^{2n}). Let us write z_j=\sqrt{2I_j}\,e^{i\varphi_j}. Then d\mu^0=p(I,\varphi)\,dz. In the coordinates (I,\varphi) the operators \Phi_w have the form (I,\varphi)\mapsto(I,\varphi+w). Since \mu^0 is invariant under all of them, p does not depend on \varphi. So d\mu^0=p(I)\,dz=p(I)\,dI\,d\varphi and (4) holds.

The proposition is proved.
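Assertion (1) is easy to test numerically on a toy drift. The following sketch (our own example P, not from the paper) averages the first component over \mathbb{T}^2 as in (3.8) and recovers the form z_1R_1(|z_1|,|z_2|):

```python
import numpy as np

# Numerical check of assertion (1) of Proposition 6.1 on a toy drift
# (our own example): n = 2,  P_1(z) = z_1 |z_2|^2 + z_2^2.
# The torus average of type (3.8) should kill the non-resonant term z_2^2
# and give <<P>>_1(z) = z_1 R_1(|z_1|, |z_2|) with R_1 = |z_2|^2.

def averaged_P1(z1, z2, K=64):
    """Average e^{i w_1} P_1(e^{-i w_1} z1, e^{-i w_2} z2) over a K x K grid on T^2."""
    w = 2 * np.pi * np.arange(K) / K
    w1, w2 = np.meshgrid(w, w, indexing="ij")
    val = np.exp(1j * w1) * (np.exp(-1j * w1) * z1 * abs(z2)**2
                             + np.exp(-2j * w2) * z2**2)
    return val.mean()

z1, z2 = 0.7 - 0.2j, 1.1 + 0.4j
avg = averaged_P1(z1, z2)
# the oscillating term averages out exactly, leaving z_1 * |z_2|^2
assert abs(avg - z1 * abs(z2)**2) < 1e-10
```

(The sign convention for the rotations \Phi_w does not matter for this toy check: either choice kills the term z_2^2 and keeps z_1|z_2|^2.)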

By assertion (3) of this proposition, in the non-resonant case equation (6.2) describes the asymptotic behaviour, as \varepsilon\to0, of the actions I_j of solutions of (2.6). But how regular is this equation? Let

\begin{equation*} r_j=|a_j|=\sqrt{2I_j}, \qquad 1\leqslant j\leqslant n, \end{equation*} \notag
denote the moduli of components of the vector a\in\mathbb{C}^n, consider the smooth polar coordinate mapping
\begin{equation*} \mathbb{R}_+^n\times \mathbb{T}^n \to \mathbb{C}^n, \qquad (r,\varphi)\mapsto (r_1e^{i\varphi_1},\dots,r_ne^{i\varphi_n}), \end{equation*} \notag
and extend it to a mapping
\begin{equation*} \Phi\colon \mathbb{R}^n \times \mathbb{T}^n \to \mathbb{C}^n, \end{equation*} \notag
defined by the same formula. The jth component of the drift in equation (6.2), written in the form (6.3) without the Itô term b_j^2\,d\tau, is \operatorname{Re}\bigl(a_j \overline{ \langle\!\langle P \rangle\!\rangle _j(a)}\bigr). By (3.8), in the polar coordinates we can express it as
\begin{equation*} \begin{aligned} \, & \frac1{(2\pi)^n} \int_{\mathbb{T}^n} \operatorname{Re}\bigl(r_j e^{i\varphi_j} \overline{ e^{iw_j} P_j(r,\varphi-w)}\bigr)\,dw \\ &\qquad =\frac{1}{(2\pi)^n} \int_{\mathbb{T}^n} \operatorname{Re}\bigl(r_j e^{i\theta_j} \overline P_j(r, \theta) \bigr)\,d\theta =: F_j(r), \qquad r\in\mathbb{R}^n, \end{aligned} \end{equation*} \notag
where F_j(r) is a continuous function, which vanishes with r_j. Since the integrand in the second integral does not change if for some l=1,\dots,n we replace r_l by -r_l and \theta_l by \theta_l+\pi, F_j(r) is even in each variable r_l. So it can be expressed as follows:
\begin{equation*} F_j (r_1,\dots,r_n) =f_j (r_1^2,\dots,r_n^2), \qquad f_j \in C(\mathbb{R}^n_+), \end{equation*} \notag
where f_j(x_1,\dots,x_n) vanishes with x_j. Now assume that the vector field P is C^2-smooth. In this case the integrand in the integral for F_j is C^2-smooth with respect to (r,\theta) \in \mathbb{R}^n\times \mathbb{T}^n, and F_j is a C^2-smooth function of r. Then by a result of Whitney (see Theorem 1 in [27] for s=1 and the remark concluding that paper) f_j extends to \mathbb{R}^n in such a way that f_j(x) is C^1-smooth in each variable x_l and (\partial/\partial x_l) f_j(x) is continuous on \mathbb{R}^n. So f_j is C^1-smooth. Since r_j^2=2I_j, we have established the following result.

Proposition 6.2. If the frequency vector \Lambda is non-resonant and P is C^2-smooth, then equation (6.2) can be written as

\begin{equation} dI_j(\tau)- G_j(I_1,\dots,I_n) \,d\tau-b_j^2 \,d\tau = b_j \sqrt{2 I_j} \,d W_j(\tau), \quad 1\leqslant j\leqslant n, \end{equation} \tag{6.4}
where G is a C^1-smooth vector field such that G_j(I) vanishes with I_j for each j.

We stress that, although due to the square-root singularity in the dispersion the averaged I-equation (6.4) is a stochastic equation without uniqueness of solutions, the limiting law \mathcal{D}(I(\,{\cdot}\,)) for the actions of solutions of (2.6) is still uniquely defined by Corollary 4.13.
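Despite the 1/2-Hölder dispersion, equation (6.4) is easy to probe numerically. Below is a Monte Carlo sketch for a toy case of ours (one mode, G\equiv0), for which \mathsf{E}I(\tau)=I(0)+b^2\tau:

```python
import numpy as np

# Monte Carlo sanity check (a toy of ours, not from the paper) for the
# action equation (6.4) with one mode and G == 0:
#     dI = b^2 dtau + b sqrt(2 I) dW,   hence   E I(tau) = I(0) + b^2 tau.
# Euler-Maruyama with the usual sqrt(max(I, 0)) truncation at the boundary.

rng = np.random.default_rng(1)
b, I0, tau_end = 1.0, 0.5, 1.0
n_paths, n_steps = 20_000, 1000
dt = tau_end / n_steps

I = np.full(n_paths, I0)
for _ in range(n_steps):
    dW = rng.normal(scale=np.sqrt(dt), size=n_paths)
    I = I + b**2 * dt + b * np.sqrt(2.0 * np.maximum(I, 0.0)) * dW

# the drift is constant, so the sample mean should be close to I0 + b^2
assert abs(I.mean() - (I0 + b**2 * tau_end)) < 0.05
```

The first moment behaves classically even though pathwise uniqueness may fail at the boundary I=0; this is consistent with the limiting law of the actions being uniquely defined by Corollary 4.13.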

7. Convergence uniform in time

In this section we investigate the convergence in distribution, uniformly in time, of solutions of (2.11) to those of effective equation (4.5), with respect to the dual-Lipschitz metric (see Definition 4.10). These results are finite-dimensional versions of those in [11] for stochastic PDEs. Throughout this section the following assumption holds.

Assumption 7.1. The first two parts of Assumption 5.1 hold, and the following condition is fulfilled instead of (c).

  • {\rm (c')} Effective equation (4.5) is mixing with stationary measure \mu^0. For any solution a(\tau), \tau\geqslant0, of it such that \mathcal{D}(a(0))=:\mu and \langle|z|^{2m_0'},\mu(dz)\rangle=\mathsf{E}|a(0)|^{2m_0'}\leqslant M' for some M'>0 (recall the notation (4.22)) we have
    \begin{equation} \|\mathcal{D}(a(\tau))-\mu^0\|_{L,\mathbb{C}^n}^*\leqslant g_{M'}(\tau, d) \quad \forall \tau\geqslant0 \quad \text{if }\ \|\mu-\mu^0\|_{L,\mathbb{C}^n}^*\leqslant d\leqslant2. \end{equation} \tag{7.1}
    Here the function g\colon \mathbb{R}_+^3\to\mathbb{R}_+, (\tau, d, M) \mapsto g_M(\tau,d), is continuous, vanishes with d, converges to zero as \tau\to\infty, and for each fixed M\geqslant0 the function (\tau, d)\mapsto g_M(\tau, d) is uniformly continuous in d for (\tau,d)\in [0,\infty)\times[0,2] (so that g_M extends to a continuous function on [0,\infty]\times [0,2] that vanishes for \tau=\infty and for d=0).

We emphasize that now we assume mixing for the effective equation, but not for the original equation (2.6). Since Assumption 7.1 implies Assumption 2.1, the assertions in Section 4 hold for solutions of equations (2.11), which we analyze in this section, for any T>0.

Proposition 7.2. Assume that the first two parts of Assumption 5.1 hold, equation (4.5) is mixing with stationary measure \mu^0, and for each M>0 and any v^1,v^2\in \overline{B}_M(\mathbb{C}^n)

\begin{equation} \|\mathcal{D} a(\tau;v^1)-\mathcal{D} a(\tau;v^2) \|_{L,\mathbb{C}^n}^* \leqslant \mathfrak{g}_M(\tau), \end{equation} \tag{7.2}
where \mathfrak{g} is a continuous function of (M,\tau) that tends to zero as \tau \to\infty and is a non-decreasing function of M. Then condition {\rm (c')} holds for some function g.

The proposition is proved below, at the end of this section.

Note that (7.2) holds (for \mathfrak{g} replaced by 2\mathfrak{g}) if

\begin{equation} \|\mathcal{D} a(\tau;v^1)-\mu^0 \|_{L,\mathbb{C}^n}^* \leqslant \mathfrak{g}_M(\tau) \quad \forall\,v^1\in\overline{B}_M(\mathbb{C}^n). \end{equation} \tag{7.3}
Usually, a proof of mixing for (4.5) actually establishes (7.3). So condition {\rm (c')} is a rather mild restriction.

Example 7.3. If the assumptions of Proposition 9.4 below are fulfilled, then (7.2) is satisfied, since in this case (7.3) holds for \mathfrak{g}_M(\tau)=\overline{V}(M)e^{-c\tau}. Here c>0 is a constant and \overline{V}(M)=\max\{V(x)\colon x\in \overline{B}_M(\mathbb{C}^n)\}, where V(x) is the Lyapunov function as in Proposition 9.3; see, for example, [22; Theorem 2.5] and [20; § 3.3].

Theorem 7.4. Under Assumption 7.1, for any v_0\in\mathbb{C}^n

\begin{equation*} \lim_{\varepsilon\to0}\, \sup_{\tau\geqslant0} \|\mathcal{D}(a^\varepsilon(\tau;v_0))-\mathcal{D}(a^{0}(\tau;v_0))\|_{L,\mathbb{C}^n}^* =0, \end{equation*} \notag
where a^\varepsilon(\tau;v_0) and a^{0}(\tau;v_0) solve (2.11) and (4.5), respectively, for the same initial condition a^\varepsilon(0;v_0)=a^{0}(0;v_0)=v_0.

Proof. Since v_0 is fixed, we abbreviate a^\varepsilon(\tau; v_0) to a^\varepsilon(\tau). By (5.1)
\begin{equation} \mathsf{E}| a^\varepsilon(\tau)|^{2m'_0}\leqslant C_{m'_0}(|v_0|) =:M^* \quad \forall\, \tau\geqslant0. \end{equation} \tag{7.4}
By (7.4) and (4.19) we have
\begin{equation} \mathsf{E}| a^{0}(\tau;v_0)|^{2m'_0} =\langle |a|^{2m'_0}, \mathcal{D} a^0(\tau;v_0)\rangle \leqslant M^* \quad \forall\,\tau\geqslant0. \end{equation} \tag{7.5}
Since \mathcal{D} a^0(\tau;0) \rightharpoonup \mu^0 as {\tau}\to\infty, from the above estimate for v_0=0 we get that
\begin{equation} \langle |a|^{2m'_0}, \mu^0 \rangle \leqslant C_{m'_0} (0) =: C_{m'_0}. \end{equation} \tag{7.6}
For later use we note that, since we have only used parts (a) and (b) of Assumption 5.1 and the fact that equation (4.5) is mixing to derive estimates (7.5) and (7.6), these two estimates hold under the assumptions of Proposition 7.2.

The constants in the estimates below depend on M^*, but this dependence is usually not indicated. For any T\geqslant0 we denote by a_T^0(\tau) a weak solution of effective equation (4.5) such that

\begin{equation*} \mathcal{D} a^0_T(0) =\mathcal{D} a^\varepsilon(T). \end{equation*} \notag
Note that a^0_T(\tau) depends on \varepsilon and that a^0_0(\tau)=a^0(\tau; v_0).

Lemma 7.5. Fix any \delta>0. Then the following assertions are true.

(1) For any T>0 there exists \varepsilon_1 =\varepsilon_1(\delta,T)>0 such that if \varepsilon\leqslant \varepsilon_1, then

\begin{equation} \sup_{\tau\in[0,T]} \|\mathcal{D}(a^\varepsilon(T'+\tau)) -\mathcal{D}(a_{T'}^{0}(\tau))\|_{L,\mathbb{C}^n}^* \leqslant\frac{\delta}{2} \quad \forall\, T'\geqslant0. \end{equation} \tag{7.7}

(2) Choose T^*=T^*(\delta)>0 such that g_{M^*}(T,2)\leqslant\delta/4 for each T\geqslant T^*. Then there exists \varepsilon_2=\varepsilon_2(\delta)>0 such that if \varepsilon\leqslant\varepsilon_2 and \|\mathcal{D}(a^\varepsilon(T'))-\mu^0\|_{L,\mathbb{C}^n}^*\leqslant\delta for some T'\geqslant0, then also

\begin{equation} \|\mathcal{D}(a^\varepsilon(T'+T^*))-\mu^0\|_{L,\mathbb{C}^n}^* \leqslant\delta, \end{equation} \tag{7.8}
and
\begin{equation} \sup_{\tau\in[T',T'+T^*]} \| \mathcal{D}(a^\varepsilon(\tau))-\mu^0\|_{L,\mathbb{C}^n}^* \leqslant\frac{\delta}{2} +\sup_{\tau\geqslant0}g_{M^*}(\tau,\delta). \end{equation} \tag{7.9}

Below we abbreviate \|\cdot\|^*_{L,\mathbb{C}^n} to \|\cdot\|^*_L. Given a measure \nu\in\mathcal{P}(\mathbb{C}^n), denote by a^\varepsilon(\tau;\nu) a weak solution of equation (2.11) such that \mathcal{D} (a^\varepsilon(0)) =\nu, and define a^0(\tau;\nu) similarly. Since equation (2.11) defines a Markov process in \mathbb{C}^n (for example, see [14; § 5.4.C] and [17; § 3.3]), we have

\begin{equation*} \mathcal{D} a^\varepsilon(\tau;\nu) =\int_{\mathbb{C}^n}\mathcal{D}a^\varepsilon(\tau;v)\,\nu(dv), \end{equation*} \notag
and a similar relation holds for \mathcal{D}a^0(\tau;\nu).

Proof of Lemma 7.5. Set \nu^\varepsilon=\mathcal{D} (a^\varepsilon(T')). Then
\begin{equation} \mathcal{D} (a^\varepsilon(T'+\tau)) =\mathcal{D} (a^\varepsilon(\tau; \nu^\varepsilon)), \qquad \mathcal{D} (a^0_{T'}(\tau)) = \mathcal{D} (a^0(\tau; \nu^\varepsilon)). \end{equation} \tag{7.10}
By (7.4), for any \delta>0 there exists K_\delta>0 such that for each \varepsilon, \nu^\varepsilon(\mathbb{C}^n \setminus \overline{B}_{K_\delta} )\leqslant \delta/8, where \overline{B}_{K_\delta}:=\overline{B}_{K_\delta}(\mathbb{C}^n). So
\begin{equation*} \nu^\varepsilon =A^\varepsilon \nu^\varepsilon_\delta +\bar A^\varepsilon \bar\nu^\varepsilon_\delta, \qquad A^\varepsilon=\nu^\varepsilon(\overline{B}_{K_\delta}), \quad \bar A^\varepsilon =\nu^\varepsilon(\mathbb{C}^n\setminus\overline{B}_{K_\delta}), \end{equation*} \notag
where \nu^\varepsilon_\delta and \bar\nu^\varepsilon_\delta are the conditional probabilities \nu^\varepsilon(\,\cdot \mid \overline{B}_{K_\delta}) and \nu^\varepsilon(\,\cdot \mid\mathbb{C}^n\setminus\overline{B}_{K_\delta}). Accordingly,
\begin{equation} \mathcal{D} (a^\kappa (\tau; \nu^\varepsilon) ) = A^\varepsilon \mathcal{D} (a^\kappa (\tau; \nu^\varepsilon_\delta) ) +\bar A^\varepsilon \mathcal{D} (a^\kappa (\tau; \bar\nu^\varepsilon_\delta)), \end{equation} \tag{7.11}
where \kappa=\varepsilon or \kappa=0. Therefore,
\begin{equation*} \begin{aligned} \, \|\mathcal{D}(a^\varepsilon (\tau; \nu^\varepsilon)) -\mathcal{D}(a^0 (\tau; \nu^\varepsilon))\|_L^* & \leqslant A^\varepsilon\|\mathcal{D}(a^\varepsilon(\tau;\nu^\varepsilon_\delta)) -\mathcal{D} (a^0 (\tau; \nu^\varepsilon_\delta))\|_L^* \\ &\qquad +\bar A^\varepsilon\|\mathcal{D}(a^\varepsilon(\tau;\bar\nu^\varepsilon_\delta)) -\mathcal{D}(a^0(\tau;\bar\nu^\varepsilon_\delta))\|_L^*. \end{aligned} \end{equation*} \notag
The second term on the right-hand side is obviously bounded by 2\bar A^\varepsilon\leqslant\delta/4. On the other hand, by Proposition 4.12 and (4.23) there exists \varepsilon_1>0, depending only on K_\delta and T, such that for 0\leqslant \tau\leqslant T and \varepsilon\in(0,\varepsilon_1] the first term on the right-hand side is \leqslant\delta/4. In view of (7.10) this proves the first assertion.

To prove the second assertion we choose \varepsilon_2= \varepsilon_1(\delta/2, T^*(\delta)). Then from (7.7), (7.4), (7.1), and the definition of T^*, for \varepsilon\leqslant\varepsilon_2 we obtain

\begin{equation*} \begin{aligned} \, \|\mathcal{D}(a^\varepsilon(T'+T^*))-\mu^0\|_L^* & \leqslant\|\mathcal{D}(a^\varepsilon(T'+T^*)) -\mathcal{D}(a^{0}_{T'}(T^*))\|_L^* \\ &\qquad +\|\mathcal{D}(a^{0}_{T'}(T^*))-\mu^0\|_L^*\leqslant \delta. \end{aligned} \end{equation*} \notag
This proves (7.8). Next, in view of (7.7) and (7.1), (7.5),
\begin{equation*} \begin{aligned} \, \sup_{\theta\in[0,T^*]}\|\mathcal{D}(a^\varepsilon(T'+\theta))-\mu^0\|_L^* & \leqslant\sup_{\theta\in[0,T^*]} \|\mathcal{D}(a^\varepsilon(T'+\theta)) -\mathcal{D}(a_{T'}^{0}(\theta))\|_L^* \\ &\qquad +\sup_{\theta\in[0,T^*]}\| \mathcal{D}(a_{T'}^{0}(\theta))-\mu^0\|_L^* \\ & \leqslant\frac{\delta}{2}+\max_{\theta \in[0,T^*] }g_{M^*} (\theta, \delta). \end{aligned} \end{equation*} \notag
This implies (7.9). The lemma is proved.

Now we continue the proof of the theorem. Fix an arbitrary \delta>0 and take some \delta_1, 0<\delta_1\leqslant \delta/4. In the proof below the functions \varepsilon_1, \varepsilon_2, and T^* are as in Lemma 7.5.

(i) By the definition of T^*, (7.1), and (7.4),

\begin{equation} \|\mathcal{D}(a^0_{T'}(\tau))-\mu^0\|_L^* \leqslant \delta_1 \quad \forall\, \tau\geqslant T^*(\delta_1), \end{equation} \tag{7.12}
for any T'\geqslant0. We abbreviate T^*(\delta_1) to T^*.

(ii) By (7.7), if \varepsilon\leqslant \varepsilon_1=\varepsilon_1(\delta_1, T^*)>0, then

\begin{equation} \sup_{0 \leqslant \tau \leqslant T^*} \bigl\|\mathcal{D}(a^\varepsilon(\tau)) -\mathcal{D}(a^0(\tau;v_0))\bigr\|_L^* \leqslant \frac{\delta_1}2. \end{equation} \tag{7.13}
In particular, in view of (7.12) for T'=0,
\begin{equation} \|\mathcal{D}(a^\varepsilon(T^*))-\mu^0 \|_L^* < 2\delta_1. \end{equation} \tag{7.14}

(iii) By (7.14) and (7.8) for \delta:=2\delta_1 and T'=nT^*, n=1,2,\dots , we obtain recursively the inequalities

\begin{equation} \|\mathcal{D}(a^\varepsilon(nT^*))-\mu^0\|_L^* \leqslant 2\delta_1 \quad \forall\, n\in\mathbb{N}, \end{equation} \tag{7.15}
for \varepsilon\leqslant\varepsilon_2=\varepsilon_2(2\delta_1).

(iv) Now by (7.15) and (7.9) for \delta:=2\delta_1, for any n\in\mathbb{N} and 0\leqslant \theta\leqslant T^* we have

\begin{equation} \|\mathcal{D}(a^\varepsilon (nT^* +\theta))-\mu^0 \|_L^* \leqslant \delta_1+\sup_{\theta\geqslant0} g_{M^*} (\theta, 2\delta_1) \end{equation} \tag{7.16}
if \varepsilon\leqslant\varepsilon_2(2\delta_1).

(v) Finally, let \varepsilon\leqslant\varepsilon_\# (\delta_1) =\min\{\varepsilon_1(\delta_1, T^*),\varepsilon_2(2\delta_1)\}; then by (7.13) (if \tau\leqslant T^*) and by (7.12)+(7.16) (if \tau\geqslant T^*) we have

\begin{equation*} \|\mathcal{D}(a^\varepsilon(\tau))-\mathcal{D}(a^0 (\tau;v_0))\|_L^* \leqslant 2\delta_1+\sup_{\theta\geqslant0} g_{M^*}(\theta, 2\delta_1) \quad \forall\, \tau\geqslant0. \end{equation*} \notag
By the assumption imposed on the function g_{M} in {\rm (c')}, g_{M}(t, d) is uniformly continuous in d and vanishes at d=0. So there exists \delta^*=\delta^*(\delta), which we may assume to be \leqslant \delta/4, such that if \delta_1 =\delta^*, then g_{M^*}(\theta, 2\delta_1) \leqslant \delta/2 for every \theta\geqslant0. Then by the above estimate
\begin{equation*} \|\mathcal{D}(a^\varepsilon(\tau))-\mathcal{D}(a^0 (\tau;v_0))\|_L^* \leqslant \delta \quad \text{if }\ \varepsilon \leqslant\varepsilon_*(\delta) :=\varepsilon_\# (\delta^*(\delta))>0, \end{equation*} \notag
for every positive \delta. Theorem 7.4 is proved.

Since the interaction representation does not change actions, for the action variables of solutions of the original equations (2.6) we have the following assertion.

Corollary 7.6. Under the assumptions of Theorem 7.4 the actions of a solution v^\varepsilon(\tau; v_0) of (2.6) that equals v_0 at \tau=0 satisfy

\begin{equation*} \lim_{\varepsilon\to0} \sup_{\tau\geqslant0}\|\mathcal{D}(I(v^\varepsilon(\tau;v_0))) -\mathcal{D}(I(a^{0}(\tau;v_0)))\|_{L,\mathbb{C}^n}^* =0. \end{equation*} \notag

In [9; Theorem 2.9] the assertion of this corollary was proved for a class of systems (2.6). The proof in [9] is based on the observation that the mixing rate in the corresponding equation (2.6) is uniform in \varepsilon for 0<\varepsilon\leqslant1. This is a delicate property, which is more difficult to establish than (\mathrm{c}') in Assumption 7.1. We also note that Theorem 7.4 immediately implies that if equations (2.11) are mixing with stationary measures \mu^\varepsilon, then \mu^\varepsilon \rightharpoonup \mu^0. Cf. Theorem 5.5.

Proof of Proposition 7.2. In this proof we write solutions a^0(\tau; v) of effective equation (4.5) as a(\tau;v). We prove the assertion of Proposition 7.2 in four steps.

(i) At this step, for any non-random v^1, v^2 \in \overline{B}_M(\mathbb{C}^n) we use the notation a_j(\tau) := a(\tau; v^j), j=1,2, and examine the distance \|\mathcal{D}(a_1(\tau))-\mathcal{D}(a_2(\tau))\|_L^* as a function of \tau and |v^1-v^2|. Set w(\tau)=a_1(\tau) -a_2(\tau) and assume that |v^1-v^2|\leqslant \bar d for some \bar d\geqslant0. Then

\begin{equation*} \dot w ={ \langle\!\langle P \rangle\!\rangle }(a_1)-{ \langle\!\langle P \rangle\!\rangle }(a_2). \end{equation*} \notag
Since by Lemma 3.2 and Assumption 2.1, (a),
\begin{equation*} |{ \langle\!\langle P \rangle\!\rangle }(a_1(\tau))-{ \langle\!\langle P \rangle\!\rangle }(a_2(\tau))| \leqslant C |w(\tau)|\, X(\tau), \ \text{where } X(\tau)=1+|a_1(\tau)|^{ m_0}\vee |a_2(\tau)|^{m_0}, \end{equation*} \notag
we have (d/d\tau)|w|^2\leqslant 2CX(\tau)|w|^2, where |w(0)|\leqslant \bar d. So
\begin{equation} |w(\tau)| \leqslant\bar d \exp\biggl(C\int_0^\tau X(l)\,dl\biggr). \end{equation} \tag{7.17}
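In more detail, the differential inequality for |w|^2 integrates by Grönwall's lemma:

```latex
\frac{d}{d\tau}\,|w(\tau)|^2 \leqslant 2C X(\tau)\,|w(\tau)|^2
\quad\Longrightarrow\quad
|w(\tau)|^2 \leqslant |w(0)|^2
\exp\biggl(2C\int_0^\tau X(l)\,dl\biggr),
```

and taking square roots and using |w(0)|\leqslant \bar d gives (7.17).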
Set Y(T)=\sup_{0\leqslant \tau \leqslant T } |X(\tau)|. By (5.1) estimate (2.7) holds for C_{m_0'}(|v_0|, T)=C_{m_0'}(M) (T+1). Hence we have
\begin{equation*} \mathsf{E}Y(T) \leqslant (C_{m'_0}(M)+1) (T+1) \end{equation*} \notag
by Remark 4.8, (ii) (since m_0'>(m_0\vee1)).

For K>0 denote the event \{Y(T)\geqslant K\} by \Omega_K(T). Then

\begin{equation*} \mathsf{P}(\Omega_K(T))\leqslant (C_{m'_0}(M)+1)(T+1)K^{-1}, \end{equation*} \notag
and
\begin{equation*} \int_0^\tau X(l)\,dl \leqslant \tau K \end{equation*} \notag
for \omega\notin \Omega_K(T). From this and (7.17) we see that if f is such that |f|\leqslant1 and \operatorname{Lip}(f)\leqslant1, then
\begin{equation} \begin{aligned} \, & \mathsf{E}\bigl(f(a_1(\tau))-f(a_2(\tau))\bigr) \leqslant 2\mathsf{P}(\Omega_K(\tau))+\bar d e^{C\tau K} \notag \\ &\qquad \leqslant 2(C_{m'_0}(M)+1) (\tau +1) K^{-1}+\bar d e^{C\tau K} \quad \forall\,K>0. \end{aligned} \end{equation} \tag{7.18}
Let us denote by g^1_M(\tau,\bar d) the function on the right-hand side for K=\ln\ln(\bar d^{-1}\vee 3). This is a continuous function of (\tau,\bar d,M)\in\mathbb{R}_+^3, which vanishes for \bar d=0. By (7.2) and (7.18),
\begin{equation} \begin{aligned} \, & \|\mathcal{D}(a(\tau;v^1))-\mathcal{D}(a(\tau;v^2)) \|_L^* = \| \mathcal{D}(a_1(\tau))-\mathcal{D}(a_2(\tau)) \|_L^* \notag \\ &\qquad\qquad \leqslant\mathfrak{g}_M(\tau) \wedge g_M^1(\tau,\bar d) \wedge 2 =: g_M^2(\tau,\bar d) \quad \text{if }\ |v^1-v^2|\leqslant \bar d. \end{aligned} \end{equation} \tag{7.19}
The function g_M^2 is continuous in the variables (\tau,\bar d, M), vanishes with \bar d, and tends to zero as \tau \to\infty since \mathfrak g_M(\tau) does.

(ii) At this step we consider a solution a^0(\tau;\mu)=:a(\tau;\mu) of effective equation (4.5) for \mathcal{D}(a(0))=\mu as in Assumption 7.1, (\mathrm{c}') and examine the left-hand side of (7.1) as a function of \tau. For any M>0 consider the conditional probabilities

\begin{equation*} \mu_M=\mathsf{P}(\cdot\mid \overline{B}_M(\mathbb{C}^n)) \quad\text{and}\quad \overline{\mu}_M =\mathsf{P}(\cdot\mid \mathbb{C}^n\setminus \overline{B}_M(\mathbb{C}^n)). \end{equation*} \notag
Then
\begin{equation} \mathcal{D}(a(\tau;\mu)) =A_M\mathcal{D}(a(\tau;\mu_M) ) +\bar A_M\mathcal{D}(a(\tau;\overline{\mu}_M)), \end{equation} \tag{7.20}
where
\begin{equation*} A_M=\mu(\overline{B}_M(\mathbb{C}^n))\quad \text{and}\quad \bar A_M= \mu(\mathbb{C}^n\setminus \overline{B}_M(\mathbb{C}^n)) \end{equation*} \notag
(cf. (7.11)). Since \mathsf{E}|a(0)|^{2m_0'} \leqslant M', Chebyshev's inequality gives \bar A_M=\mathsf{P}\{|a(0)|>M\}\leqslant M'/M^{2m_0'}. Since equation (4.5) is assumed to be mixing, we have
\begin{equation*} \|\mathcal{D} (a(\tau; 0)) -\mu^0\|_L^* \leqslant \overline{g}(\tau), \end{equation*} \notag
where \overline{g}\geqslant0 is a continuous function which tends to 0 as \tau\to\infty. So in view of (7.2),
\begin{equation*} \|\mathcal{D} (a(\tau; v)) -\mu^0\|_L^* \leqslant \mathfrak{g}_M(\tau)+\overline{g}(\tau) =:\widetilde{g}_M(\tau) \quad \forall\,v\in \overline{B}_M(\mathbb{C}^n). \end{equation*} \notag
Thus,
\begin{equation*} \begin{aligned} \, \|\mathcal{D} (a(\tau;\mu_M)) -\mu^0\|_L^* & =\biggl\| \int[\mathcal{D} (a(\tau; v))]\,\mu_M(dv) -\mu^0 \biggr\|_L^* \\ & \leqslant\int\|\mathcal{D} (a(\tau; v)) -\mu^0\|_L^*\,\mu_M(dv) \leqslant \widetilde{g}_M(\tau). \end{aligned} \end{equation*} \notag
Therefore, by (7.20),
\begin{equation*} \begin{aligned} \, \|\mathcal{D}(a(\tau;\mu))-\mu^0\|_L^* & \leqslant A_M \|\mathcal{D}(a(\tau;\mu_M))-\mu^0\|_L^* +\bar A_M \|\mathcal{D}(a(\tau;\overline{\mu}_M))-\mu^0\|_L^* \\ & \leqslant \|\mathcal{D}(a(\tau;\mu_M))-\mu^0\|_L^*+2\bar A_M \leqslant \widetilde{g}_M(\tau) +2\,\frac{M'}{M^{2m'_0}} \end{aligned} \end{equation*} \notag
for any M>0 and \tau\geqslant0. Let M_1(\tau)>0 be a continuous non-decreasing function, growing to infinity with \tau, and such that \widetilde{g}_{M_1(\tau)}(\tau)\to0 as \tau\to\infty (it exists since \widetilde{g}_M(\tau) is a continuous function of (M,\tau), tending to zero as \tau\to\infty for each fixed M). Then
\begin{equation} \|\mathcal{D}(a(\tau;\mu))-\mu^0\|_L^* \leqslant 2\,\frac{M'}{M_1(\tau)^{2m_0'}} +\widetilde{g}_{M_1(\tau)}(\tau) =:\widehat{g}_{M'}(\tau). \end{equation} \tag{7.21}
Clearly, \widehat{g}_{M'}(\tau)\geqslant0 is a continuous function on \mathbb{R}_+^2, which converges to 0 as \tau\to\infty.

(iii) Now we examine the left-hand side of (7.1) as a function of \tau and d. Recall that the Kantorovich distance between two measures \nu_1 and \nu_2 on \mathbb{C}^n is

\begin{equation*} \|\nu_1-\nu_2\|_{\mathrm{K}} =\sup_{\operatorname{Lip}(f)\leqslant1} (\langle f, \nu_1\rangle-\langle f,\nu_2\rangle) \leqslant\infty. \end{equation*} \notag
Obviously \|\nu_1-\nu_2\|_L^* \leqslant\|\nu_1 -\nu_2\|_{\mathrm{K}}. By (7.6) and the assumption on \mu the 2m_0'-moments of \mu and \mu^0 are bounded by M'\vee C_{m_0'}, so that
\begin{equation} \|\mu-\mu^0\|_{\mathrm{K}}\leqslant\widetilde{C} (M'\vee C_{m_0'}) ^{\gamma_1} d^{\gamma_2} :=D, \qquad \gamma_1=\frac{1}{2m'_0}, \quad \gamma_2=\frac{2m'_0-1}{2m_0'} \end{equation} \tag{7.22}
(see [5; § 11.4] and [26; Chap. 7]). Hence by the Kantorovich–Rubinstein theorem (see [26], [5]) there exist random variables \xi and \xi_0, defined on a new probability space (\Omega',\mathcal{F}',\mathsf{P}'), such that \mathcal{D}(\xi)=\mu, \mathcal{D}(\xi_0)=\mu^0, and
\begin{equation} \mathsf{E}\,|\xi -\xi_0| =\|\mu- \mu^0\|_{\mathrm{K}}. \end{equation} \tag{7.23}
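The exponents \gamma_1, \gamma_2 in (7.22) come from the standard truncation argument, which we sketch for the reader's convenience. For f with \operatorname{Lip}(f)\leqslant1 and f(0)=0 set f_R=(f\wedge R)\vee(-R), where R\geqslant1; then

```latex
\langle f,\mu-\mu^0\rangle
\leqslant \langle f_R,\mu-\mu^0\rangle
        +\int_{|x|>R}|x|\,(\mu+\mu^0)(dx)
\leqslant R\,d+2(M'\vee C_{m_0'})\,R^{\,1-2m_0'},
```

since |f_R|\leqslant R, \operatorname{Lip}(f_R)\leqslant1, and |x|\leqslant|x|^{2m_0'}R^{\,1-2m_0'} for |x|>R. Optimizing over R (up to constants, R=((M'\vee C_{m_0'})/d)^{1/(2m_0')}) yields (7.22).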
Then using (7.19) and denoting by a_{\mathrm{st}}(\tau) a stationary solution of equation (4.5) such that \mathcal{D}(a_{\mathrm{st}}(\tau))\equiv\mu^0, we have
\begin{equation*} \begin{aligned} \, \|\mathcal{D} (a(\tau))-\mu^0\|_L^* & =\|\mathcal{D} (a(\tau; a(0)) )- \mathcal{D} (a_{\mathrm{st}}(\tau))\|_L^* \\ & \leqslant \mathsf{E}^{\omega'} \| \mathcal{D}(a(\tau; \xi^{\omega'}) ) - \mathcal{D} (a(\tau; \xi_0^{\omega'} )) \|_L^* \\ & \leqslant \mathsf{E}^{\omega'} g_{\overline{M}}^2(\tau, |\xi^{\omega'} -\xi_0^{\omega'}|), \qquad \overline{M} =\overline{M}^{\omega'} =|\xi^{\omega'}|\vee |\xi_0^{\omega'}|. \end{aligned} \end{equation*} \notag
As \mathsf{E}^{\omega'}\, {\overline{M}}^{2m'_0} \leqslant 2 (M'\vee C_{m_0'}) by (7.5) and the assumption on \mu, setting Q'_K=\{\overline{M}\geqslant K\}\subset\Omega', for any K>0 we have
\begin{equation*} \mathsf{P}^{\omega'} (Q'_K) \leqslant 2K^{-2m_0'} (M'\vee C_{m_0'}). \end{equation*} \notag
Since g_M^2\leqslant 2 and for \omega'\notin Q'_K we have |\xi^{\omega'}|,|\xi_0^{\omega'}|\leqslant K, it follows that
\begin{equation*} \| \mathcal{D} (a(\tau) )- \mu^0\|_L^* \leqslant 4K^{-2m_0'} (M'\vee C_{m_0'}) +\mathsf{E} ^{\omega'} g_K^2(\tau, |\xi^{\omega'} -\xi_0^{\omega'}|). \end{equation*} \notag
Now let \Omega'_r =\{ |\xi^{\omega'} -\xi_0^{\omega'}| \geqslant r\}. Then \mathsf{P}^{\omega'}\Omega'_r\leqslant D r^{-1} by (7.23) and (7.22). Hence
\begin{equation} \|\mathcal{D}(a(\tau))-\mu^0\|_L^* \leqslant 4K^{-2m'_0} (M'\vee C_{m'_0})+2D r^{-1}+g_K^2(\tau,r) \quad \forall\,\tau\geqslant0, \ \ \forall\, K,r>0. \end{equation} \tag{7.24}

(iv) The end of the proof. Let g_0(s) be a positive continuous function on \mathbb{R}_+ such that g_0(s)\to\infty as s\to+\infty and |C_{m_0'}(g_0(s))(\ln\ln s)^{-1/2}|\leqslant 2C_{m_0'}(0) for s\geqslant3. Taking r=D^{1/2} and choosing K=g_0(r^{-1}) on the right-hand side of (7.24) we denote this right-hand side by g_{M'}^3(\tau,r) (so that we have substituted D=r^2 and K=g_0(r^{-1}) into (7.24)). By (7.24) and the definition of g_M^2 (see (7.18) and (7.19)) we have

\begin{equation*} \begin{aligned} \, g_{M'}^3(\tau,r) & \leqslant 4(g_0(r^{-1}))^{-2m'_0}(M'\vee C_{m'_0})+2r \\ &\qquad +2\bigl(C_{m'_0}(g_0(r^{-1}))+1\bigr)(\ln\ln(r^{-1}\vee 3))^{-1} +r\exp(C\tau\ln\ln (r^{-1}\vee3)). \end{aligned} \end{equation*} \notag
By the choice of g_0, as r\to0, the first, second, and fourth terms converge to zero. The third term is \leqslant4(C_{m'_0}(0)+1)(\ln\ln(r^{-1}))^{-1/2} for r\leqslant 1/3, so it also tends to zero with r. Hence g_{M'}^3(\tau,r) defines a continuous function on \mathbb{R}_+^3 which vanishes with r. Using the expression for D in (7.22) we can write r=D^{1/2} as r=R_{M'}(d), where R is a continuous function \mathbb{R}_+^2 \to \mathbb{R}_+, which is non-decreasing in d and vanishes with d. Setting g^4_{M'} (\tau, d)=g^3_{M'} (\tau, R_{M'}(d \wedge 2)), from the above we obtain
\begin{equation*} \| \mathcal{D}(a(\tau))-\mu^0\|_L^* \leqslant g^4_{M'} (\tau, d ) \quad\text{if }\ \| \mu -\mu^0\|_L^* \leqslant d\leqslant2. \end{equation*} \notag
Finally, recalling (7.21) we arrive at (7.1) for g=g^5, where
\begin{equation*} g^5_{M'} (\tau, d) =g^4_{M'} (\tau, d) \wedge \widehat{g}_{M'}(\tau) \wedge 2. \end{equation*} \notag
The function g^5 is continuous, vanishes with d, and converges to zero as \tau\to\infty. For any fixed M'>0 this convergence is uniform in d due to the term \widehat{g}_{M'}(\tau). So for fixed M'>0 the function (\tau,d)\mapsto g_{M'}^5(\tau,d) extends to a continuous function on the compact set [0,\infty]\times[0,2], where it vanishes for \tau=\infty. Thus, g_{M'}^5 is uniformly continuous in d. Proposition 7.2 is proved.

8. Averaging for systems with general noises

In this section we sketch a proof of Theorem 4.7 for equations (1.1) with the general stochastic term \sqrt\varepsilon\,\mathcal{B}(v)\,dW. The proof follows the argument in Section 4, but an extra difficulty appears in the case of equations with non-additive degenerate noises.

Consider the v-equation (2.6) with a general (possibly non-additive) noise and decomplexify it by writing the components v_k(\tau) as (\widetilde{v}_{2k-1}(\tau), \widetilde{v}_{2k}(\tau))\in\mathbb{R}^2, k=1,\dots,n. Now a solution v(\tau) is a vector in \mathbb{R}^{2n}, and the equation reads

\begin{equation} dv(\tau) +\varepsilon^{-1} Av(\tau) \,d\tau =P(v(\tau))\,d\tau+\mathcal{B}(v(\tau)) \,d\beta(\tau), \qquad v(0)=v_0\in\mathbb{R}^{2n}. \end{equation} \tag{8.1}
Here A is a block-diagonal matrix as in Section 2, \mathcal{B}(v) is a real 2n\times n_2 matrix, and \beta(\tau)=(\beta_1(\tau),\dots,\beta_{n_2}(\tau)), where \{\beta_j(\tau)\} are independent standard real Wiener processes. Note that in the real coordinates in \mathbb{R}^{2n} \simeq\mathbb{C}^n, for w\in\mathbb{R}^n the operator \Phi_w in (2.10) is given by the block-diagonal matrix such that its jth diagonal block, j=1,\dots, n, is the 2\times2 matrix of rotation through an angle of w_j.
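Explicitly, under the identification v_j=\widetilde{v}_{2j-1}+i\widetilde{v}_{2j}, the jth diagonal block of \Phi_w is the rotation matrix

```latex
\begin{pmatrix}
\cos w_j & -\sin w_j \\
\sin w_j & \cos w_j
\end{pmatrix},
```

which in complex notation is multiplication of v_j by e^{iw_j} (up to the sign convention adopted in (2.10)).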

In this section we make the following assumption.

Assumption 8.1. The drift P belongs to \operatorname{Lip}_{m_0}(\mathbb{R}^{2n},\mathbb{R}^{2n}), the matrix function \mathcal{B}(v) belongs to \operatorname{Lip}_{m_0} \bigl(\mathbb{R}^{2n},\operatorname{Mat}(2n\times n_2)\bigr), equation (8.1) is well posed, and its solutions satisfy (2.7).

Going over to the interaction representation v(\tau)=\Phi_{\tau \varepsilon^{-1} \Lambda} a(\tau) we rewrite the equation as

\begin{equation} da(\tau) =\Phi_{\tau \varepsilon^{-1} \Lambda} P(v(\tau))\,d\tau +\Phi_{\tau \varepsilon^{-1} \Lambda} \mathcal{B}(v(\tau))\,d\beta(\tau), \qquad a(0)=v_0. \end{equation} \tag{8.2}
As in Section 4 we will see that, as \varepsilon\to0, the asymptotic behaviour of the distributions of solutions of the equation is described by an effective equation. As before, the effective drift is \langle\!\langle P \rangle\!\rangle (a). To calculate the effective dispersion, as in the proof of Lemma 4.6, we consider the martingale
\begin{equation*} N^{Y,\varepsilon} :=a^\varepsilon(\tau)-\int_0^\tau Y(a^\varepsilon(s),s\varepsilon^{-1})\,ds =v_0+\int_0^\tau\mathcal{B}^{\Lambda}(a^\varepsilon(s);s\varepsilon^{-1})\,d\beta(s), \end{equation*} \notag
where Y is defined in (4.7) and \mathcal{B}^{\Lambda}(a;t)=\Phi_{t\Lambda}\mathcal{B}(\Phi_{-t\Lambda}a). By Itô’s formula, for i,j=1,\dots,n the process
\begin{equation*} N_i^{Y,\varepsilon}(\tau) N_j^{Y,\varepsilon}(\tau) -\int_0^\tau \mathcal{A}_{ij}^\Lambda(a^\varepsilon(s);s\varepsilon^{-1})\,ds, \qquad (\mathcal{A}_{ij}^\Lambda(a;t)) =\mathcal{B}^{\Lambda}(a;t)\mathcal{B}^{\Lambda*}(a;t), \end{equation*} \notag
where \mathcal{B}^{\Lambda*} is the transpose of \mathcal{B}^\Lambda, is also a martingale. By straightforward analogy with Lemma 3.2, the limit
\begin{equation*} \mathcal{A}^0(a) :=\lim_{T\to\infty}\frac{1}{T}\int_0^T\mathcal{A}^{\Lambda}(a;t)\,dt \end{equation*} \notag
exists and belongs to \operatorname{Lip}_{2m_0}(\mathbb{R}^{2n},\operatorname{Mat}(2n\times2n)). Then, by analogy with Lemma 4.3,
\begin{equation*} \mathsf{E} \biggl| \int_0^{\tau}\mathcal{A}^{\Lambda}(a^\varepsilon(s);s\varepsilon^{-1})\,ds -\int_0^\tau\mathcal{A}^0(a^\varepsilon(s))\,ds \biggr| \to0 \qquad\text{as }\ \varepsilon\to0, \end{equation*} \notag
for any \tau\geqslant0. From this we conclude, as in Section 4, that the effective diffusion should now be taken to be \mathcal{A}^0(a), which is a non-negative symmetric matrix. Denoting its principal square root by \mathcal{B}^0(a)=\mathcal{A}^0(a)^{1/2}, as in Section 4 we verify that any limit measure Q_0 as in (2.14) is a solution of the martingale problem for the effective equation
\begin{equation} da(\tau)-{ \langle\!\langle P \rangle\!\rangle }(a(\tau))\,d\tau =\mathcal{B}^0(a(\tau))\,d\beta(\tau), \qquad a(0)=v_0, \end{equation} \tag{8.3}
and so it is a weak solution of this equation. If the noise in (8.1) is additive, then \mathcal{B}^0 is a constant matrix, (8.3) has a unique solution, and considering (8.3) as the (modified) effective equation Theorem 4.7 remains true for solutions of equation (8.2). In particular, the theorem applies to equation (2.5) with general additive random forces (2.4) (but then the effective dispersion matrix is given by a more complicated formula than in Section 4).

Similarly, if the diffusion in (8.1) is non-degenerate, namely,

\begin{equation} | \mathcal{B}(v) \mathcal{B}^*(v) \xi|\geqslant \alpha|\xi| \quad \forall v, \ \ \forall\,\xi\in\mathbb{R}^{2n}, \end{equation} \tag{8.4}
for some \alpha>0, then the matrix \mathcal{B}^{\Lambda}(a;\tau) also satisfies (8.4) for all a and \tau, that is, \langle \mathcal{A}^{\Lambda}(a;s\varepsilon^{-1})\xi,\xi\rangle\geqslant \alpha |\xi|^2. Thus \mathcal{A}^0(a)\geqslant\alpha\mathbb{I}, and so \mathcal{B}^0(a)=\mathcal{A}^0(a)^{1/2} is a locally Lipschitz matrix function of a (for example, see [24; Theorem 5.2.2]). So (8.3) has a unique solution again, and Theorem 4.7 remains true for equation (8.1) (and the effective equation of the form (8.3)).
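For the reader's convenience we recall the elementary bound behind this local Lipschitz continuity of the matrix square root: for symmetric matrices A_1, A_2\geqslant\alpha\mathbb{I},

```latex
\|A_1^{1/2}-A_2^{1/2}\|
\leqslant \frac{\|A_1-A_2\|}{2\sqrt{\alpha}}\,.
```

Applied to A_i=\mathcal{A}^0(a_i), it shows that \mathcal{B}^0 inherits the local Lipschitz constant of \mathcal{A}^0 up to the factor (2\sqrt{\alpha}\,)^{-1}.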

To treat equations (8.1) with degenerate non-additive noises we express the matrix \mathcal{A}^0(a) in the form

\begin{equation*} \mathcal{A}^0(a) =\lim_{T\to\infty}\frac{1}{T} \int_0^T \bigl(\Phi_{t\Lambda}\mathcal{B}(\Phi_{-t\Lambda}a)\bigr) \cdot \bigl(\Phi_{t\Lambda}\mathcal{B}(\Phi_{-t\Lambda}a)\bigr)^* \,dt. \end{equation*} \notag
For the same reason as in Proposition 3.4,
\begin{equation*} |\mathcal{A}^0|_{C^2(B_R)} \leqslant C|\mathcal{B}|_{C^2(B_R)}^2 \quad \forall R>0. \end{equation*} \notag
Now using [24; Theorem 5.2.3] we get that
\begin{equation} \operatorname{Lip}\bigl(\mathcal{B}^0(a)|_{\overline{B}_R}\bigr) \leqslant C|\mathcal{A}^0|_{C^2(B_{R+1})}^{1/2} \leqslant C_1|\mathcal{B}|_{C^2(B_{R+1})} \quad \forall R>0. \end{equation} \tag{8.5}
So the matrix function \mathcal{B}^0(a) is locally Lipschitz continuous, (8.3) has a unique solution, and the assertion of Theorem 4.7 remains true for equation (8.1). We have obtained the following result.

Theorem 8.2. Suppose that Assumption 8.1 holds and one of the following three options is true for the matrix function \mathcal{B}(v) in (8.1):

(a) it is v-independent;

(b) it satisfies the non-degeneracy condition (8.4);

(c) it is a C^2-smooth matrix-function of v.

Then for any v_0\in \mathbb{R}^{2n} the solution a^\varepsilon(\tau;v_0) of equation (8.2) satisfies

\begin{equation*} \mathcal{D}(a^\varepsilon(\,{\cdot}\,;v_0))\rightharpoonup Q_0 \quad\textit{in }\ \mathcal{P}(C([0,T];\mathbb{C}^n)) \quad\textit{as }\ \varepsilon\to0, \end{equation*} \notag
where Q_0 is the law of the unique weak solution of effective equation (8.3).

An obvious analogue of Corollary 4.13 holds for solutions of (8.1).

9. A sufficient condition for Assumptions 2.1, 5.1, and 7.1

In this section we derive a condition which implies Assumptions 2.1, 5.1, and 7.1. Thus, when it is met, all theorems in Sections 4, 5, and 7 apply to equation (2.6).

Consider a stochastic differential equation on \mathbb{R}^l:

\begin{equation} dx =b(x)\,d\tau+\sigma(x)\,d\beta(\tau), \qquad x\in\mathbb{R}^l, \quad\tau\geqslant0, \end{equation} \tag{9.1}
where \sigma(x) is an l\times k matrix and \beta(\tau) is a standard Wiener process in \mathbb{R}^k. We assume the following.

Assumption 9.1. The drift b(x) and dispersion \sigma(x) are locally Lipschitz in x, and \mathcal{C}^m(b),\mathcal{C}^m(\sigma)\leqslant C<\infty for some m\geqslant0.

The diffusion a(x)=\sigma(x)\sigma^\top(x) is a non-negative symmetric l\times l matrix. Consider the differential operator

\begin{equation*} \mathscr{L}(v(x)) =\sum_{j=1}^lb_j(x)\frac{\partial v}{\partial x_j} +\frac{1}{2} \sum_{i=1}^l \sum_{j=1}^la_{ij}(x) \frac{\partial^2 v}{\partial x_i\,\partial x_j}. \end{equation*} \notag
We have the following result from [17; Theorem 3.5] concerning the well-posedness of equation (9.1).

Theorem 9.2. Let Assumption 9.1 hold, and suppose that there exists a non-negative function V(x)\in C^2(\mathbb{R}^l) such that for some positive constant c

\begin{equation*} \mathscr{L}(V(x))\leqslant cV(x) \quad \forall\,x\in\mathbb{R}^l, \end{equation*} \notag
and
\begin{equation*} \inf_{|x|>R}V(x)\to\infty \quad\textit{as }\ R\to\infty. \end{equation*} \notag
Then for any x_0 \in\mathbb{R}^l equation (9.1) has a unique strong solution X(\tau) with initial condition X(0)=x_0. Furthermore, the process X(\tau) satisfies
\begin{equation*} \mathsf{E}V(X(\tau))\leqslant e^{c\tau} V(x_0) \quad \forall\,\tau\geqslant0. \end{equation*} \notag
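For instance (a standard illustration, in which the bounds C_0 and k_1 are hypothetical and are not taken from the theorem), consider the quadratic function V(x)=1+|x|^2. Since \partial V/\partial x_j=2x_j and \partial^2V/\partial x_i\,\partial x_j=2\delta_{ij},
\begin{equation*} \mathscr{L}(V(x)) =2\langle b(x),x\rangle+\operatorname{Tr}(a(x)), \end{equation*} \notag
so if \langle b(x),x\rangle\leqslant C_0(1+|x|^2) and \operatorname{Tr}(a(x))\leqslant k_1 for all x, then \mathscr{L}(V(x))\leqslant(2C_0+k_1)V(x), while clearly \inf_{|x|>R}V(x)\to\infty as R\to\infty; thus this V is a Lyapunov function for equation (9.1).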

The function V is called a Lyapunov function for equation (9.1). In terms of this function a sufficient condition for mixing in (9.1) is given by the following statement.

Proposition 9.3. Assume that, in addition to Assumption 9.1,

(1) the drift b satisfies

\begin{equation} \langle b(x), x\rangle\leqslant-{\alpha_1}|x|+{\alpha_2} \quad \forall\,x\in\mathbb{R}^l, \end{equation} \tag{9.2}
for some constants {\alpha_1}>0 and {\alpha_2}\geqslant0, where \langle\cdot,\cdot\rangle is the standard inner product in \mathbb{R}^l;

(2) the diffusion matrix a(x)=\sigma(x)\sigma^\top(x) is uniformly non-degenerate, that is,

\begin{equation} \gamma_2\mathbb{I} \leqslant a(x)\leqslant \gamma_1 \mathbb{I} \quad \forall\,x\in\mathbb{R}^l, \end{equation} \tag{9.3}
for some \gamma_1\geqslant\gamma_2>0.

Then for any c'>0 equation (9.1) has a smooth Lyapunov function V(x) which is equal to e^{c'|x|} for |x|\geqslant1, estimate (5.1) holds true for its solutions for every m\in\mathbb{N}, and the equation is mixing.

In Appendix A we show how one can derive this proposition from the abstract results in [17]. Moreover, it can be proved that under the assumptions of the proposition the equation is exponentially mixing and (7.3) holds (see Example 7.3).

Let us decomplexify \mathbb{C}^n to obtain \mathbb{R}^{2n} and identify equation (2.6) with a real equation (9.1), where l=2n (and x=v). Then

\begin{equation*} b(v) \cong(b_j(v)=-i\varepsilon^{-1}\lambda_jv_j+P_j(v),\:j=1,\dots,n), \end{equation*} \notag
where b_j\in\mathbb{C}\cong \mathbb{R}^2 \subset \mathbb{R}^{2n}. Since in complex terms the real inner product has the form \displaystyle\langle v,w\rangle=\operatorname{Re}\sum v_j\bar w_j, we have
\begin{equation*} \langle b(v),v\rangle =\langle P(v),v\rangle. \end{equation*} \notag
So for equation (2.6) condition (9.2) is equivalent to
\begin{equation} \langle P(v),v\rangle \leqslant-\alpha_1|v|+\alpha_2 \quad \forall\,v\in\mathbb{C}^n \end{equation} \tag{9.4}
for some constants \alpha_1>0 and \alpha_2\geqslant0.
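A simple drift satisfying (9.4) is, for example, the smooth vector field P(v)=-v(1+|v|^2)^{-1/2} (a hypothetical illustration, not one of the examples of the paper). Indeed, since \sqrt{1+|v|^2}\leqslant 1+|v|,
\begin{equation*} \langle P(v),v\rangle =-\frac{|v|^2}{\sqrt{1+|v|^2}} \leqslant-\frac{|v|^2}{1+|v|} \leqslant-|v|+1, \end{equation*} \notag
so (9.4) holds with \alpha_1=\alpha_2=1.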

Now consider effective equation (4.5). Since in (2.11) the drift is

\begin{equation*} Y(a,\tau\varepsilon^{-1}) = (\Phi_{\tau\varepsilon^{-1}\Lambda})_*P(a), \end{equation*} \notag
under the assumption (9.4) we have
\begin{equation*} \langle Y(a,\tau\varepsilon^{-1}),a\rangle = \langle P(\Phi_{-\tau\varepsilon^{-1}\Lambda}a), \Phi_{-\tau\varepsilon^{-1}\Lambda}a\rangle \leqslant -\alpha_1 |\Phi_{-\tau\varepsilon^{-1}\Lambda}a|+\alpha_2 =-\alpha_1|a|+\alpha_2 \end{equation*} \notag
for all \varepsilon. Therefore, \langle\!\langle P \rangle\!\rangle satisfies
\begin{equation*} \langle { \langle\!\langle P \rangle\!\rangle } (a),a\rangle =\lim_{T\to\infty}\frac{1}{T}\int_0^T \langle Y(a,\tau\varepsilon^{-1}),a\rangle\,d\tau \leqslant-\alpha_1|a|+\alpha_2. \end{equation*} \notag
We see that assumption (9.4) implies the validity of condition (9.2) also for the effective equation.

As we have pointed out, if the dispersion matrix \Psi is non-singular, then the dispersion B in the effective equation is also non-degenerate. The corresponding diffusion matrix is non-singular too, and condition (9.3) holds for it. Thus we obtain the following statement.

Proposition 9.4. If the dispersion matrix \Psi in (2.6) is non-singular, the drift satisfies P\in\operatorname{Lip}_{m_0} for some m_0\in\mathbb{N}, and (9.4) holds for some constants {\alpha_1}>0 and {\alpha_2}\geqslant0, then the assumptions of Theorem 5.5 hold, and so do also the assumptions of Theorems 4.7 and 7.4.

Appendix A. Proof of Proposition 9.3

By condition (9.3) the diffusion a is uniformly bounded. So there exist positive constants k_1 and k_2 such that

\begin{equation} \operatorname{Tr}(a(x))\leqslant k_1, \quad \|a(x)\|\leqslant k_2 \quad \forall\,x\in\mathbb{R}^l. \end{equation} \tag{A.1}
Set V(x)=e^{c'f(x)}, where c' is a positive constant and f(x) is a non-negative smooth function which is equal to |x| for |x|\geqslant1 and such that its first and second derivatives are bounded by 3. Then
\begin{equation*} \frac{\partial V(x)}{\partial x_j}={c'} V(x)\,\partial_{x_j}f(x), \quad \frac{\partial^2V(x)}{\partial x_i\,\partial x_j} =c'V(x)\,\partial_{x_ix_j}f(x)+{c'}^2V(x)\,\partial_{x_i}f(x)\,\partial_{x_j}f(x). \end{equation*} \notag
Therefore, we have
\begin{equation*} \mathscr{L}(V(x)) =c'V(x) \mathcal{K}(c',x), \end{equation*} \notag
where
\begin{equation*} \mathcal{K}(c',x) =\sum_{j=1}^lb_j(x)\,\partial_{x_j}f(x) +\frac{1}{2}\sum_{i,j}\!a_{ij}(x)\,\partial_{x_ix_j}f(x) +\frac{1}{2}c'\sum_{i,j}\!a_{ij}(x)\,\partial_{x_i}f(x)\,\partial_{x_j}f(x). \end{equation*} \notag
From (9.2) and (A.1) it is obvious that
\begin{equation} \begin{cases} |\mathcal{K}(c',x)|\leqslant (c'+1)C & \text{if }|x|<1, \\\displaystyle \mathcal{K}(c',x) \leqslant-\alpha_1+\frac{\alpha_2}{|x|}+\frac{C}{|x|}+c'C & \text{if }|x|\geqslant1, \end{cases} \end{equation} \tag{A.2}
where C>0 is a constant depending on k_1, k_2, and \sup_{|x|\leqslant1}|b(x)|. Then we obtain the inequality
\begin{equation*} \mathscr{L}(V(x)) \leqslant cV(x) \quad \forall\,x\in\mathbb{R}^l, \end{equation*} \notag
where c=c'(\alpha_2+(c'+1)C). Clearly, \inf_{|x|>R}V(x)\to\infty as R\to\infty. So V(x) is a Lyapunov function for equation (9.1). Then by Theorem 9.2, for any x_0\in \mathbb{R}^l this equation has a unique solution X(\tau)=X(\tau;x_0) equal to x_0 at \tau=0, which satisfies
\begin{equation*} \mathsf{E}e^{c'f(X(\tau))} \leqslant e^{c\tau} e^{c'f(x_0)} \quad\forall\,\tau\geqslant0. \end{equation*} \notag

Let us apply Itô’s formula to the process F(X(\tau))=e^{\eta'f(X(\tau))}, where 0<\eta'\leqslant c'/2 is a constant to be determined below. Then

\begin{equation*} \begin{aligned} \, dF(X) & =\mathscr{L}(F(X))\,d\tau +\eta' F(X)\langle\nabla f(X),\sigma^\top(X)\,d\beta\rangle\\ & =\eta'F(X)\mathcal{K}(\eta',X)\,d\tau +\eta'F(X)\langle\nabla f(X),\sigma^\top(X)\,d\beta\rangle. \end{aligned} \end{equation*} \notag
By (A.2), choosing \eta'=\min\{\alpha_1/(4C),c'/2\}, we have
\begin{equation*} F(X)\mathcal{K}(\eta',X) \leqslant -\frac{\alpha_1}{2}F(X)+C_0(\alpha_1,\eta',k_1,k_2) \end{equation*} \notag
uniformly in X. Then
\begin{equation} dF(X) \leqslant\biggl(-\frac{\alpha_1}{2}\eta'F(X)+C_0\biggr)\,d\tau +\eta'F(X)\langle\nabla f(X),\sigma^\top(X)\,d\beta\rangle, \end{equation} \tag{A.3}
where the positive constant C_0 depends on k_1, k_2, \alpha_1, \eta', and \alpha_2. Taking expectation and applying Gronwall’s lemma we obtain
\begin{equation} \mathsf{E}e^{\eta'f(X(\tau))} \leqslant e^{-\alpha_1\eta'\tau/2} e^{\eta'f(x_0)}+C_1, \qquad \tau\geqslant0, \end{equation} \tag{A.4}
where C_1>0 depends on the same parameters as C_0.
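In more detail (a routine step, spelled out under the convention that all constants are as above): taking the expectation in (A.3) removes the stochastic integral, so the function y(\tau)=\mathsf{E}F(X(\tau)) satisfies the differential inequality y'\leqslant-(\alpha_1\eta'/2)\,y+C_0, and Gronwall's lemma gives
\begin{equation*} \mathsf{E}F(X(\tau)) \leqslant e^{-\alpha_1\eta'\tau/2}F(x_0) +C_0\int_0^\tau e^{-\alpha_1\eta'(\tau-s)/2}\,ds \leqslant e^{-\alpha_1\eta'\tau/2}e^{\eta'f(x_0)} +\frac{2C_0}{\alpha_1\eta'}\,, \end{equation*} \notag
which is (A.4) with C_1=2C_0/(\alpha_1\eta').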

Now fix some T\geqslant0 and for \tau\in[T,T+1] consider relation (A.3), where F(X) is replaced by \widetilde{F}(X)=e^{\widetilde{\eta} f(X)} for 0<\widetilde{\eta} \leqslant\eta'/2, and integrate it from T to \tau:

\begin{equation} \begin{aligned} \, \widetilde{F}(X(\tau)) & \leqslant \widetilde{F}(X(T))+C_0 +\widetilde{\eta} \int_T^\tau \widetilde{F}(X) \langle \nabla f(X),\sigma^\top(X)\,d\beta\rangle \notag \\ & =:\widetilde{F}(X(T))+C_0+\mathcal{M}(\tau). \end{aligned} \end{equation} \tag{A.5}
In view of (A.4), \mathcal{M}(\tau) is a continuous square-integrable martingale. Therefore, by Doob’s inequality
\begin{equation*} \begin{aligned} \, \mathsf{E}\sup_{T\leqslant \tau\leqslant T+1} |\mathcal{M}(\tau)|^2 & \leqslant 4 \mathsf{E} |\mathcal{M}(T+1)|^2 \leqslant C \int_T^{T+1} \mathsf{E}\widetilde{F}^2 (X(s))\,ds\\ & \leqslant C \int_T^{T+1} \mathsf{E}F(X(s))\,ds \leqslant C', \end{aligned} \end{equation*} \notag
where C' depends on k_1, k_2, \alpha_1, \eta', \alpha_2, and |x_0|. From this and inequalities (A.5) and (A.4) it follows that
\begin{equation*} \mathsf{E} \sup_{T\leqslant \tau\leqslant T+1} e^{\widetilde{\eta} f(X(\tau))} \leqslant C'', \end{equation*} \notag
where C'' depends on the same parameters as C'. This bound implies that the solutions X(\tau) satisfy estimate (5.1) in Assumption 5.1 for every m\geqslant0.

To prove the proposition it remains to show that, under the assumptions imposed, equation (9.1) is mixing. By [17; Theorem 4.3] we just need to verify that there exists an absorbing ball B_R=\{|x|\leqslant R\} such that for any compact set K\subset\mathbb{R}^l\setminus B_{R},

\begin{equation} \sup_{x_0\in K}\mathsf{E}\tau(x_0)<\infty, \end{equation} \tag{A.6}
where \tau(x_0) is the hitting time of B_R by the trajectory X(\tau;x_0). Indeed, let x_0\in K\subset \mathbb{R}^l\setminus B_{R} for some R>0 to be determined later. We set \tau_M:=\min\{\tau(x_0), M\}, M>0. Applying Itô’s formula to the process F(\tau,X(\tau))=e^{\eta'\alpha_1\tau/4}|X(\tau)|^2 and using (A.4) we find that
\begin{equation*} dF(\tau, X(\tau)) =\biggl(\frac{\eta'\alpha_1}{4}F(\tau,X(\tau)) +\mathscr{L}(F(\tau,X(\tau)))\biggr)\,d\tau +d\mathcal{M}(\tau), \end{equation*} \notag
where \mathcal{M}(\tau) is the corresponding stochastic integral. By (A.1), (A.4), and (9.2) we have
\begin{equation*} \begin{aligned} \, \mathsf{E}e^{\eta'\alpha_1\tau_M/4}|X(\tau_M)|^2 +\mathsf{E}\int_0^{\tau_M}e^{\eta'\alpha_1 s/4}&(2\alpha_1|X(s)|-C_3)\,ds \\ &\leqslant |x_0|^2+2e^{\eta'f(x_0)} =:\gamma(x_0), \end{aligned} \end{equation*} \notag
where C_3>0 depends on \alpha_1, \alpha_2, k_1, and k_2. Since |X(s)|\geqslant R for 0\leqslant s\leqslant \tau_M, we get that
\begin{equation*} \mathsf{E}\biggl(C_3 \int_0^{\tau_M}e^{\eta'\alpha_1 s/4}\,ds\biggr) \leqslant \gamma(x_0) \end{equation*} \notag
for R\geqslant {C_3/\alpha_1}. Therefore, \mathsf{E}\tau_M \leqslant\gamma(x_0)/C_3. Letting M\to\infty we verify (A.6) for R\geqslant C_3/\alpha_1. This completes the proof of Proposition 9.3.

Appendix B. Representation of martingales

Let \{M_k(t),\,t\in[0,T]\}, k=1,\dots,d, be continuous square-integrable martingales on a filtered probability space (\Omega,\mathcal{F},\mathsf{P},\{\mathcal{F}_t\}). We recall that their brackets (or cross-variation processes) \langle M_k,M_j\rangle(t), 1\leqslant k,j\leqslant d, form an \{\mathcal{F}_t\}-adapted continuous matrix-valued process of bounded variation, vanishing at t=0 almost surely and such that for all k, j the process M_k(t) M_j(t)-\langle M_k,M_j\rangle(t) is an \{\mathcal{F}_t\}-martingale; see [14; Definition 1.5.5 and Theorem 1.5.13].

Theorem B.1 [14; Theorem 3.4.2]. Let (M_k(t), 1\leqslant k\leqslant d) be a vector of martingales as above. Then there exists an extension (\widetilde{\Omega},\widetilde{\mathcal{F}},\widetilde{\mathsf{P}},\{\widetilde{\mathcal{F}}_t\}) of the probability space on which independent standard Wiener processes W_1(t),\dots,W_d(t) are defined, and there exists a measurable adapted matrix X=(X_{k j}(t))_{k,j=1,\dots,d}, t\in[0,T], such that \mathsf{E}\int_0^T\|X(s)\|^2\,ds<\infty and the following representations hold \widetilde{\mathsf{P}}-almost surely:

\begin{equation*} M_k(t)-M_k(0) =\sum_{j=1}^d\int_0^tX_{kj}(s)\,dW_j(s), \qquad 1\leqslant k\leqslant d, \quad t\in[0,T], \end{equation*} \notag
and
\begin{equation*} \langle M_k,M_j\rangle(t) =\sum_{l=1}^d\int_0^tX_{kl}(s)X_{jl}(s)\,ds, \qquad 1\leqslant k,j\leqslant d, \quad t\in[0,T]. \end{equation*} \notag

Now let (N_1(t),\dots, N_d(t))\in\mathbb{C}^d be a vector of complex continuous square-integrable martingales. Then

\begin{equation*} N_j(t)=N^+_j(t)+i N^-_j(t), \end{equation*} \notag
where \bigl(N^+_1(t),N^-_1(t),\dots, N^+_d(t),N^-_d(t) \bigr)\in\mathbb{R}^{2d} is a vector of real continuous martingales. The brackets \langle N_i, N_j\rangle and \langle N_i, \overline{N}_j\rangle are defined by linearity. For example,
\begin{equation*} \langle N_i, N_j\rangle = \langle N_i^+, N_j^+\rangle - \langle N_i^-, N_j^-\rangle +i \langle N_i^+, N_j^-\rangle +i \langle N_i^-, N_j^+\rangle. \end{equation*} \notag
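For example, if \beta^c(t)=\beta^+(t)+i\beta^-(t) is a standard complex Wiener process, so that \beta^+ and \beta^- are independent standard real Wiener processes, then \langle\beta^+,\beta^+\rangle=\langle\beta^-,\beta^-\rangle=t and \langle\beta^+,\beta^-\rangle=0, whence by the formulas above
\begin{equation*} \langle\beta^c,\beta^c\rangle =t-t+0=0 \quad\text{and}\quad \langle\beta^c,\bar\beta^c\rangle =t+t=2t, \end{equation*} \notag
in agreement with (C.4) in Appendix C.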
(There is no need to define the brackets \langle \overline{N}_i, \overline{N}_j\rangle and \langle \overline{N}_i, N_j\rangle since these are just the processes complex conjugate to \langle N_i, N_j\rangle and \langle N_i, \overline{N}_j\rangle, respectively.) Equivalently, \langle N_i, N_j\rangle can be defined as the unique adapted continuous complex process of bounded variation that vanishes at zero, such that N_i N_j-\langle N_i, N_j\rangle is a martingale. The brackets \langle N_i, \overline{N}_j\rangle can be defined similarly. The above result implies a representation theorem for complex continuous martingales. Below we present a special case of it which is relevant for our work.

Corollary B.2. Suppose that all brackets \langle N_i, N_j\rangle(t) and \langle \overline{N}_i, \overline{N}_j\rangle(t) vanish, while the brackets \langle N_i, \overline{N}_j\rangle(t), 1\leqslant i,j\leqslant d, are almost surely absolutely continuous complex processes. Then there exist an adapted process \Psi(t) taking values in complex d\times d matrices and satisfying \displaystyle\mathsf{E} \int_0^T\|\Psi(t)\|^2 \,dt<\infty and independent standard complex Wiener processes \beta^c_1(t),\dots,\beta^c_d(t), all of which are defined on an extension of the original probability space, such that

\begin{equation*} N_j(t)-N_j(0) =\sum_{k=1}^d\int_0^t\Psi_{jk}(s)\,d\beta^c_k(s) \quad \forall\,0\leqslant t \leqslant T, \quad j=1,\dots,d, \end{equation*} \notag
almost surely. Moreover, \langle N_i, N_j\rangle(t) \equiv 0 and
\begin{equation*} \langle N_i, \overline{N}_j\rangle(t)=2 \int_0^t (\Psi \Psi^*)_{ij} (s) \,ds, \quad 1\leqslant i,j\leqslant d. \end{equation*} \notag

Appendix C. Itô’s formula for complex processes

Consider a complex Itô process v(t)\in\mathbb{C}^n defined on a filtered probability space:

\begin{equation} dv(t) =g(t)\,dt+M^1(t)\,dB(t) +M^2(t) \,d\overline{B}(t). \end{equation} \tag{C.1}
Here v(t) and g(t) are adapted processes in \mathbb{C}^n, M^1 and M^2 are adapted processes in the space of complex n\times N matrices, B(t)=(\beta^c_1(t),\dots,\beta^c_N(t)), and \overline{B}(t)=(\bar\beta^c_1(t),\dots,\bar\beta^c_N(t)), where \{\beta^c_j\} are independent standard complex Wiener processes. We recall that, given a C^1-smooth function f on \mathbb{C}^n, we have
\begin{equation*} \frac{\partial f}{\partial z_j} =\frac12 \biggl(\frac{\partial f}{\partial x_j}-i\,\frac{\partial f}{\partial y_j}\biggr) \quad\text{and}\quad \frac{\partial f}{\partial \bar{z}_j} =\frac12 \biggl(\frac{\partial f}{\partial x_j}+i\,\frac{\partial f}{\partial y_j}\biggr). \end{equation*} \notag
If f is a polynomial in z_j and \bar{z}_j, then \partial f/\partial z_j and \partial f/\partial \bar{z}_j can be calculated as if z_j and \bar{z}_j were independent variables.
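For example, for f(z)=z_j\bar z_j=x_j^2+y_j^2 these formulas give
\begin{equation*} \frac{\partial f}{\partial z_j} =\frac12\,(2x_j-2iy_j)=\bar z_j, \qquad \frac{\partial f}{\partial \bar z_j} =\frac12\,(2x_j+2iy_j)=z_j, \end{equation*} \notag
which is exactly the result of differentiating the polynomial z_j\bar z_j with z_j and \bar z_j treated as independent variables.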

The processes g, M^1, M^2 and the function f(t,v) in the theorem below are assumed to satisfy the usual conditions for the applicability of Itô’s formula (for example, see [14]), which we do not repeat here.

Theorem C.1. Let f(t,v) be a C^2-smooth complex function. Then

\begin{equation} \begin{aligned} \, & df(t,v(t)) =\biggl\{\frac{\partial f}{\partial t} +d_vf(t,v) g+d_{\bar v} f(t,v) \overline{g} \notag\\ &\qquad +\operatorname{Tr}\biggl[ \bigl(M^1(M^2)^\top+M^2(M^1)^\top\bigr) \frac{\partial^2 f}{\partial v\,\partial v} +\bigl(\overline{M}^1(\overline{M}^2)^\top+\overline{M}^2(\overline{M}^1)^\top\bigr) \frac{\partial^2 f}{\partial \bar v\,\partial \bar v} \notag\\ &\qquad\qquad +2\bigl(M^1(\overline{M}^1)^\top+M^2(\overline{M}^2)^\top\bigr) \frac{\partial^2 f}{\partial v\,\partial \bar v} \biggr] \biggr\} \,dt \notag\\ &\qquad + d_vf(M^1\,dB+M^2\,d\overline{B})+ d_{\bar v}f(\overline{M}^1\,d\overline{B}+\overline{M}^2\,dB). \end{aligned} \end{equation} \tag{C.2}
Here \displaystyle d_vf(t,v) g=\sum\frac{\partial f}{\partial v_j}g_j, \displaystyle d_{\bar v}f(t,v) \overline{g} =\sum\frac{\partial f}{\partial \bar v_j}\overline{g}_j, \displaystyle \frac{\partial^2 f}{\partial v\,\partial v} is the matrix with entries \displaystyle \frac{\partial^2 f}{\partial v_j\,\partial v_k}, and so on. If the function f is real valued, then d_{\bar v}f(v)=\overline{d_vf(v)}, and the Itô term, given by the second and third lines of (C.2), reads
\begin{equation*} 2\operatorname{Re} \operatorname{Tr} \biggl\{ \bigl(M^1(M^2)^\top+M^2(M^1)^\top\bigr) \frac{\partial^2 f}{\partial v\,\partial v} +\bigl(M^1(\overline{M}^1)^\top+M^2(\overline{M}^2)^\top\bigr) \frac{\partial^2 f}{\partial v\,\partial \bar v} \biggr\}. \end{equation*} \notag

To prove this result one can express v(t) as an Itô process in \mathbb{R}^{2n} in terms of the real Wiener processes \operatorname{Re}\beta^c_j(t) and \operatorname{Im}\beta^c_j(t), apply the usual Itô formula to f(t,v(t)), and then rewrite the result back in terms of complex Wiener processes. The corresponding straightforward calculation is rather heavy, and it is not easy to carry it out without mistakes. Below we suggest a better way to derive the formula.

Proof. The linear part of formula (C.2), given by its first and fourth lines, follows from the real case by linearity. It remains to prove that the Itô term has the form of the expression in the second and third lines. From the real formula we see that the Itô term is linear in \partial^2 f/\partial v \partial v, \partial^2 f/\partial \bar v \partial \bar v, and \partial^2 f/\partial v \partial \bar v, with coefficients quadratic in the matrices M^1 and M^2. So it can be written as
\begin{equation} \biggl\{ \operatorname{Tr}\biggl(Q^1 \frac{\partial^2 f}{\partial v\,\partial v}\biggr) +\operatorname{Tr}\biggl(Q^2 \frac{\partial^2 f}{\partial \bar v\,\partial \bar v}\biggr) +\operatorname{Tr}\biggl(Q^3 \frac{\partial^2 f}{\partial v\,\partial \bar v}\biggr) \biggr\} \,dt, \end{equation} \tag{C.3}
where the Q^j are complex n\times n matrices quadratic in M^1 and M^2. We must show that they have the form specified in (C.2). To do this we note that, since the processes \beta^c_j are independent and have the form (2.3), for all j and l the brackets of these processes have the following form:
\begin{equation} \langle \beta^c_j, \beta^c_l \rangle=\langle \bar\beta^c_j, \bar\beta^c_l \rangle=0, \qquad \langle \beta^c_j, \bar\beta^c_l \rangle=\langle \bar\beta^c_j, \beta^c_l \rangle=2 \delta_{j,l} t. \end{equation} \tag{C.4}
Now let g=0 and v(0)=0 in (C.1) and let M^1 and M^2 be constant matrices. Then
\begin{equation*} v(t)=M^1 B(t)+M^2 \overline{B}(t). \end{equation*} \notag
Taking f(v)=v_{i_1} v_{i_2} and using (C.4) we see that
\begin{equation*} \begin{aligned} \, f(v(t)) & =\biggl(\sum_j M^1_{i_1 j} B_j(t)+\sum_j M^2_{i_1 j} \overline{B}_j(t)\biggr) \cdot \biggl(\sum_j M^1_{i_2 j} B_j(t)+\sum_j M^2_{i_2 j} \overline{B}_j(t)\biggr) \\ & =\bigl[ \big(M^1(M^2)^\top\big)_{i_1 i_2} +\big(M^2 (M^1)^\top\big)_{i_1 i_2} \bigr] \,2t+\text{(a martingale)}. \end{aligned} \end{equation*} \notag
Since the linear component with respect to t must be equal to (Q^1_{i_1i_2}+Q^1_{i_2 i_1})t by (C.3), we have
\begin{equation*} Q^1 =M^1 (M^2)^\top+M^2 (M^1)^\top. \end{equation*} \notag
In a similar way, considering f(v)=\bar v_{i_1} \bar v_{i_2} we find that
\begin{equation*} Q^2 =\overline{M}^1 (\overline{M}^2)^\top+\overline{M}^2 (\overline{M}^1)^\top, \end{equation*} \notag
while setting f(v)=v_{i_1} \bar v_{i_2} leads to the equality
\begin{equation*} 2\bigl[\bigl(M^1 (\overline{M}^1)^\top\bigr)_{i_1 i_2}+\bigl(M^2(\overline{M}^2)^\top\bigr)_{i_1 i_2}\bigr]=Q^3_{i_1 i_2}, \end{equation*} \notag
so that
\begin{equation*} Q^3 =2\bigl[M^1(\overline{M}^1)^\top+M^2(\overline{M}^2)^\top\bigr]. \end{equation*} \notag
This completes the proof of (C.2). The second assertion of the theorem follows by straightforward calculation. Theorem C.1 is proved.
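As an illustration of (C.2) (a routine check, not part of the proof), take the real-valued function f(v)=|v|^2=\sum_j v_j\bar v_j. Then \partial^2 f/\partial v\,\partial v=0 and \partial^2 f/\partial v\,\partial\bar v=\mathbb{I}, so for a process (C.1) the second assertion of the theorem gives
\begin{equation*} d|v(t)|^2 =\Bigl(2\operatorname{Re}\sum_j \bar v_jg_j +2\|M^1\|_{HS}^2+2\|M^2\|_{HS}^2\Bigr)\,dt +2\operatorname{Re}\sum_j \bar v_j\bigl(M^1\,dB+M^2\,d\overline{B}\bigr)_j, \end{equation*} \notag
where \|M\|_{HS}^2=\operatorname{Tr}\bigl(M\overline{M}^\top\bigr) denotes the squared Hilbert–Schmidt norm.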

Appendix D. Projections onto convex sets

Lemma D.1. Let \mathcal{B} be a closed convex subset of a Hilbert space X of finite or infinite dimension. Assume that \mathcal{B} contains at least two points, and let \Pi\colon X\to \mathcal{B} be a projection sending any point of X to a nearest point in \mathcal{B}. Then \operatorname{Lip}(\Pi)=1.

Proof. Let A,B\in X, and let a=\Pi A and b=\Pi B belong to \mathcal{B}. If A,B\in\mathcal{B}, then a=A and B=b. So \operatorname{Lip}(\Pi)\geqslant 1, and it remains to show that
\begin{equation*} \|a-b\|\leqslant\|A-B\| \quad \forall\, A\ne B. \end{equation*} \notag
If a=b, then the assertion is trivial. Otherwise consider the vectors \xi=b-a, l^a=A-a, and l^b= B-b, and introduce an orthonormal basis (e_1,e_2,\dots) in X such that e_1 =\xi/\|\xi\|. Then \xi=(\xi_1,\xi_2,\dots), where \xi_1 =\|\xi\| and \xi_j=0 for j\geqslant2. Since a is the point of the segment [a,b]\subset \mathcal{B} closest to A, we have l^a_1=l^a\cdot e_1 \leqslant0. Similarly, l^b_1 \geqslant0. Thus,
\begin{equation*} \|B-A\| =\|\xi+l^b-l^a\| \geqslant|\xi_1+l_1^b-l_1^a| \geqslant \xi_1 =\| b-a\|, \end{equation*} \notag
and the assertion is proved.

Note that an analogue of the statement of this lemma for a Banach space X fails in general.
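Here is a simple counterexample (a sketch): in X=\mathbb{R}^2 with the norm \|(x,y)\|=\max(|x|,|y|) let \mathcal{B} be the axis \{(t,0):t\in\mathbb{R}\}. The map \Pi(x,y)=(x+y,0) sends each point to a nearest point of \mathcal{B}, since \|(x,y)-(x+y,0)\|=|y|=\operatorname{dist}\bigl((x,y),\mathcal{B}\bigr); yet \Pi(0,0)=(0,0) and \Pi(\delta,\delta)=(2\delta,0), so that \operatorname{Lip}(\Pi)\geqslant2.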

References

1. V. I. Arnold, V. V. Kozlov, A. I. Neishtadt, Mathematical aspects of classical and celestial mechanics, Encyclopaedia Math. Sci., 3, Dynamical systems. III, 3rd rev. ed., Springer-Verlag, Berlin, 2006, xiv+518 pp.
2. P. Billingsley, Convergence of probability measures, Wiley Ser. Probab. Statist., 2nd ed., John Wiley & Sons, Inc., New York, 1999, x+277 pp.
3. V. I. Bogachev, N. V. Krylov, M. Röckner, S. V. Shaposhnikov, Fokker–Planck–Kolmogorov equations, Math. Surveys Monogr., 207, Amer. Math. Soc., Providence, RI, 2015, xii+479 pp.
4. N. N. Bogoliubov, Yu. A. Mitropolsky, Asymptotic methods in the theory of non-linear oscillations, Int. Monogr. Adv. Math. Phys., Hindustan Publishing Corp., Delhi; Gordon and Breach Science Publishers, Inc., New York, 1961, v+537 pp.
5. A. Boritchev, S. Kuksin, One-dimensional turbulence and the stochastic Burgers equation, Math. Surveys Monogr., 255, Amer. Math. Soc., Providence, RI, 2021, vii+192 pp.
6. V. Sh. Burd, The method of averaging on an infinite interval and some problems of the theory of oscillations, YarGU, Yaroslavl, 2013, 416 pp. (Russian)
7. Jinqiao Duan, Wei Wang, Effective dynamics of stochastic partial differential equations, Elsevier, Amsterdam, 2014, xii+270 pp.
8. R. M. Dudley, Real analysis and probability, Cambridge Stud. Adv. Math., 74, 2nd ed., Cambridge Univ. Press, Cambridge, 2002, x+555 pp.
9. A. Dymov, “Nonequilibrium statistical mechanics of weakly stochastically perturbed system of oscillators”, Ann. Henri Poincaré, 17:7 (2016), 1825–1882
10. M. I. Freidlin, A. D. Wentzell, Random perturbations of dynamical systems, Grundlehren Math. Wiss., 260, 2nd ed., Springer-Verlag, New York, 1998, xii+430 pp.
11. G. Huang, S. Kuksin, “On averaging and mixing for stochastic PDEs”, J. Dynam. Differential Equations, 2022, published online
12. Guan Huang, S. Kuksin, A. Maiocchi, “Time-averaging for weakly nonlinear CGL equations with arbitrary potentials”, Hamiltonian partial differential equations and applications, Fields Inst. Commun., 75, Fields Inst. Res. Math. Sci., Toronto, ON, 2015, 323–349
13. Wenwen Jian, S. B. Kuksin, Yuan Wu, “Krylov–Bogolyubov averaging”, Russian Math. Surveys, 75:3 (2020), 427–444
14. I. Karatzas, S. E. Shreve, Brownian motion and stochastic calculus, Grad. Texts in Math., 113, 2nd ed., Springer-Verlag, New York, 2005, xxiii+470 pp.
15. R. Z. Has'minskii, “On stochastic processes defined by differential equations with a small parameter”, Theory Probab. Appl., 11:2 (1966), 211–228
16. R. Z. Khas'minskii, “On the averaging principle for Itô stochastic differential equations”, Kybernetika (Prague), 4:3 (1968), 260–279 (Russian)
17. R. Khasminskii, Stochastic stability of differential equations, Stoch. Model. Appl. Probab., 66, 2nd ed., Springer, Heidelberg, 2012, xviii+339 pp.
18. Yu. Kifer, Large deviations and adiabatic transitions for dynamical systems and Markov processes in fully coupled averaging, Mem. Amer. Math. Soc., 201, no. 944, Amer. Math. Soc., Providence, RI, 2009, viii+129 pp.
19. S. Kuksin, A. Maiocchi, “Resonant averaging for small-amplitude solutions of stochastic nonlinear Schrödinger equations”, Proc. Roy. Soc. Edinburgh Sect. A, 148:2 (2018), 357–394
20. A. Kulik, Ergodic behavior of Markov processes. With applications to limit theorems, De Gruyter Stud. Math., 67, De Gruyter, Berlin, 2018, x+256 pp.
21. Shu-Jun Liu, M. Krstic, Stochastic averaging and stochastic extremum seeking, Comm. Control Engrg. Ser., Springer, London, 2012, xii+224 pp.
22. J. C. Mattingly, A. M. Stuart, D. J. Higham, “Ergodicity for SDEs and approximations: locally Lipschitz vector fields and degenerate noise”, Stochastic Process. Appl., 101:2 (2002), 185–232
23. A. V. Skorokhod, Asymptotic methods in the theory of stochastic differential equations, Transl. Math. Monogr., 78, Amer. Math. Soc., Providence, RI, 1989, xvi+339 pp.
24. D. W. Stroock, S. R. S. Varadhan, Multidimensional diffusion processes, Grundlehren Math. Wiss., 233, Springer-Verlag, Berlin–New York, 1979, xii+338 pp.
25. A. Yu. Veretennikov, “On the averaging principle for systems of stochastic differential equations”, Math. USSR-Sb., 69:1 (1991), 271–284
26. C. Villani, Optimal transport. Old and new, Grundlehren Math. Wiss., 338, Springer-Verlag, Berlin, 2009, xxii+973 pp.
27. H. Whitney, “Differentiable even functions”, Duke Math. J., 10 (1943), 159–160; Collected papers, vol. 1, Contemp. Mathematicians, Birkhäuser Boston, Inc., Boston, MA, 1992, 309–310

Citation: G. Huang, S. B. Kuksin, “Averaging and mixing for stochastic perturbations of linear conservative systems”, Uspekhi Mat. Nauk, 78:4(472) (2023), 3–52; English transl.: Russian Math. Surveys, 78:4 (2023), 585–633