Sbornik: Mathematics, 2024, Volume 215, Issue 4, Pages 543–571
DOI: https://doi.org/10.4213/sm9958e
(Mi sm9958)
 


Widths and rigidity

Yu. V. Malykhin^{a,b}

a Steklov Mathematical Institute of Russian Academy of Sciences, Moscow, Russia
b Lomonosov Moscow State University, Moscow, Russia
Abstract: We consider the Kolmogorov widths of finite sets of functions. Any orthonormal system of N functions in L_2 is rigid, that is, it cannot be well approximated by linear subspaces of dimension essentially smaller than N. This is not true for weaker metrics: it is known that in every L_p for p<2 the first N Walsh functions can be o(1)-approximated by a linear space of dimension o(N).
We present some sufficient conditions for rigidity. We prove that the independence of functions (in the probabilistic sense) implies rigidity in L_1 and even in L_0, the metric that corresponds to convergence in measure. In the case of L_p for 1<p<2 the condition is weaker: any S_{p'}-system is L_p-rigid.
We also obtain some positive results, for example, that the first N trigonometric functions can be approximated by very low-dimensional spaces in L_0, and by subspaces generated by o(N) harmonics in L_p for p<1.
Bibliography: 34 titles.
Keywords: Kolmogorov width, averaged width, vc-dimension, matrix rigidity.
This research was performed at Lomonosov Moscow State University and supported by the Russian Science Foundation under grant no. 22-11-00129, https://rscf.ru/en/project/22-11-00129/.
Received: 30.05.2023 and 29.12.2023
Document Type: Article
MSC: Primary 41A46; Secondary 46B20, 60A10
Language: English
Original paper language: Russian

§ 1. Introduction

In this paper we consider the Kolmogorov widths of finite sets of functions. Recall that the Kolmogorov width d_n(K,X) is defined as the minimum distance \sup_{x\in K}\rho(x,Q_n)_X from the set K to n-dimensional subspaces Q_n of the space X. This classical notion goes back to the paper [1] by Kolmogorov (1936). Widths are fundamental approximation characteristics of sets, and the behaviour of the sequence (d_n(K,X))_{n=1}^\infty for functional classes and other finite-dimensional sets was actively studied by many authors. The books [2], Chs. 13 and 14, and [3] are standard references on the subject; see also the recent survey [4], § 4.3.

We will informally say that a set is rigid if it cannot be well approximated by low-dimensional subspaces. The starting point for us is the well-known result that any orthonormal system of functions \varphi_1,\dots,\varphi_N is rigid in L_2(0,1) (of course, this holds true in each Euclidean space):

\begin{equation} d_n(\{\varphi_1,\dots,\varphi_N\},L_2)=\sqrt{1-\frac nN}\,, \qquad n\leqslant N. \end{equation} \tag{1}
(This equality for widths was first used by Stechkin; see [5] for details.)
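Formula (1) can be sanity-checked numerically: for the orthonormal basis e_1,\dots,e_N of \ell_2^N and any n-dimensional subspace with orthogonal projector P, the sum of the squared distances \rho(e_i,Q_n)^2 equals N-\operatorname{tr}P = N-n, so the maximal distance is at least \sqrt{1-n/N}. A minimal sketch (our own illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 12, 5

# A random n-dimensional subspace of R^N via QR; P is its orthogonal projector.
Q, _ = np.linalg.qr(rng.standard_normal((N, n)))
P = Q @ Q.T

# Distances from the basis vectors e_1, ..., e_N to the subspace:
# the i-th column of I - P is e_i - P e_i.
dists = np.linalg.norm(np.eye(N) - P, axis=0)

# The sum of squared distances is tr(I - P) = N - n, hence max >= sqrt(1 - n/N).
assert abs((dists ** 2).sum() - (N - n)) < 1e-9
assert dists.max() >= np.sqrt(1 - n / N) - 1e-9
```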

The situation changes if we consider weaker metrics.

Theorem A [6]. Let w_1,w_2,\dots be the Walsh system in the Paley numeration. Then for any p\in[1,2) there exists \delta=\delta(p)>0 such that the following inequality holds for sufficiently large N:

\begin{equation*} d_n(\{w_1,\dots,w_N\},L_p[0,1])\leqslant N^{-\delta} \quad\text{if } n\geqslant N^{1-\delta}. \end{equation*} \notag

This theorem appeared as a corollary of methods introduced in complexity theory for the related problem of matrix rigidity. The rigidity function of a matrix A is defined as the Hamming distance from A to the matrices of prescribed rank:

\begin{equation*} \operatorname{Rig}(A,n) :=\min_{\operatorname{rank} B\leqslant n} \#\{(i,j)\colon A_{i,j}\ne B_{i,j}\}. \end{equation*} \notag
Matrix rigidity was introduced by Valiant in 1977 as a way to obtain lower bounds for linear circuit complexity (for example, the inequality \operatorname{Rig}(A_N,\varepsilon N)\geqslant N^{1+\varepsilon} for some fixed \varepsilon>0 leads to superlinear lower bounds for circuit complexity); see the survey [7] for more details.

It was a surprising result of Alman and Williams [8] that the family of Walsh–Hadamard matrices is not rigid, that is, the rank of these matrices can be reduced significantly by changing a small number of elements. Theorem A is based on their construction.

In this paper we obtain sufficient conditions for systems of functions to be rigid in metrics weaker than L_2.

Average widths and L_1-rigidity

Our lower bounds on widths are based on the approximation properties of random vectors in \mathbb{R}^N. Let \xi_1,\dots,\xi_N be independent random variables such that \mathsf{E}\xi_i=0 and \mathsf{E}|\xi_i| = 1. We prove that for any n-dimensional space Q_n\subset\mathbb{R}^N,

\begin{equation*} \mathsf{E}\rho(\xi,Q_n)_{\ell_1^N} \geqslant c(\varepsilon)N \quad\text{if } n\leqslant N(1-\varepsilon). \end{equation*} \notag
Let us introduce the notation
\begin{equation*} d_n^{\mathrm{avg}}(\xi,X) :=\inf_{\dim Q_n\leqslant n}\mathsf{E}\rho(\xi,Q_n)_{X}; \end{equation*} \notag
we call this quantity the average Kolmogorov width of a random vector \xi. In fact, average widths were originally considered by Voronin and Temirgaliev in [9]. This notion was further studied by Maiorov [10], Creutzig [11] and others. We discuss the notation d_n^\mathrm{avg} and the properties of average widths in § 2.2. There is a simple equality
\begin{equation*} d_n^{\mathrm{avg}}(\xi,\ell_1^N)=Nd_n^{\mathrm{avg}}(\{\xi_1,\dots,\xi_N\},L_1(\Omega)), \end{equation*} \notag
which links the rigidity of finite-dimensional vectors and the rigidity of finite function systems. Note that the averaging on the right-hand side is over a finite set of elements of L_1, that is, we minimize \frac1N\sum_{k=1}^N\rho(\xi_k,Q_n) over the subspaces of L_1.

Thus, we can say that independence implies rigidity.

Theorem 1.1. Let \xi_1,\dots,\xi_N be independent random variables such that \mathsf{E}\xi_i=0 and \mathsf{E}|\xi_i| = 1. Then

\begin{equation*} d_n^{\mathrm{avg}}(\{\xi_1,\dots,\xi_N\},L_1(\Omega))=N^{-1}d_n^{\mathrm{avg}}(\xi,\ell_1^N) \geqslant c(\varepsilon)>0 \quad\textit{if } n\leqslant N(1-\varepsilon). \end{equation*} \notag

This also provides rigidity for the standard widths, because d_n \geqslant d_n^\mathrm{avg}. A classical example of an independent system is the Rademacher system.
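The lower bound of Theorem 1.1 can be probed numerically in the Rademacher case: the \ell_1-distance to a subspace is computable by linear programming, and for a random subspace of dimension n=N/2 the empirical average distance is indeed of order N. A hedged sketch (our own experiment; SciPy's linprog is assumed available):

```python
import numpy as np
from scipy.optimize import linprog

def l1_dist(x, V):
    """Exact ell_1 distance from x to the column span of V via an LP:
    minimize sum(t) subject to -t <= x - V @ c <= t."""
    N, n = V.shape
    obj = np.concatenate([np.zeros(n), np.ones(N)])        # variables [c, t]
    A_ub = np.block([[V, -np.eye(N)], [-V, -np.eye(N)]])   # V c - t <= x, -V c - t <= -x
    b_ub = np.concatenate([x, -x])
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n + [(0, None)] * N)
    return res.fun

rng = np.random.default_rng(1)
N, n, trials = 20, 10, 30
V, _ = np.linalg.qr(rng.standard_normal((N, n)))   # one fixed n-dim subspace

avg = np.mean([l1_dist(rng.choice([-1.0, 1.0], N), V) for _ in range(trials)])
# Theorem 1.1: E rho(xi, Q_n)_{ell_1} >= c(eps) N when n <= N(1 - eps);
# here n/N = 1/2 and the empirical average is of order N.
assert 0.05 * N < avg <= N + 1e-9
```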

The same result holds if we replace independence by unconditionality:

\begin{equation*} \operatorname{Law}(\xi_1,\dots,\xi_N)=\operatorname{Law}(\pm\xi_1,\dots,\pm\xi_N). \end{equation*} \notag

Approximation in the space of measurable functions

Consider the space L_0(\mathcal X,\mu) of all measurable functions on some measure space (\mathcal X,\mu). Convergence in measure is equivalent to convergence in certain metrics, say, \displaystyle\int_{\mathcal X}\frac{|f-g|}{1+|f-g|}\,d\mu or \sup\{\varepsilon\colon \mu(|f-g|\geqslant\varepsilon)\geqslant\varepsilon\}. We will use the latter. Of course, this metric is weaker than the usual L_p-metrics. Approximation in L_0 has been studied far less than approximation in L_p, but we believe that this subject contains interesting problems.

In § 3.2 we prove that independence implies rigidity even in the case of the L_0-metric.

Theorem 1.2. For any \varepsilon\in(0,1) there exists \delta>0 such that if \xi_1,\dots,\xi_N are independent random variables satisfying \inf_{c\in\mathbb{R}}\|\xi_i-c\|_{L_0}\geqslant\varepsilon, i=1,\dots,N, then

\begin{equation*} d_n^{\mathrm{avg}}(\{\xi_1,\dots,\xi_N\},L_0)\geqslant\delta \quad\textit{if } n\leqslant \delta N. \end{equation*} \notag

This is also a corollary to a finite-dimensional result. In fact, there is an exponential bound

\begin{equation*} \mathsf{P}(\rho(\xi,Q_n)_{L_0^N} \leqslant \delta)\leqslant 2\exp(-\delta N), \qquad\dim Q_n\leqslant n. \end{equation*} \notag
Let us mention a recent result due to Kashin [12].

Theorem B. There exist absolute constants 0<\gamma_0<1 and c_1,c_2>0 such that for N=1,2,\dots, for any orthonormal basis \Psi=\{\psi_k\}_{k=1}^N in \mathbb{R}^N and any linear subspace Q\subset\mathbb{R}^N, \dim Q\leqslant c_1 N, the following inequality holds

\begin{equation*} \mathsf{P}\biggl\{\varepsilon\in\{-1,1\}^N\colon \rho\biggl(\sum_{k=1}^N\varepsilon_k\psi_k,Q\biggr)_{L_0^N}\leqslant c_2\biggr\} \leqslant \gamma_0^N. \end{equation*} \notag

It is known that lacunary systems \varphi(k_jx) behave like independent variables; see [13], for example. The rigidity property is not an exception. We will prove that lacunary systems \{\varphi(k_1x),\dots,\varphi(k_Nx)\} are rigid in L_0 if \min k_{j+1}/k_j is sufficiently large (see Statement 3.3).

Another source of rigid systems is random matrices. It is well known that an N\times N square matrix \mathcal E consisting of random signs is very rigid with high probability: \operatorname{Rig}(\mathcal E,\delta N)\geqslant \delta N^2. We observe that this remains true in the weaker L_0-metric:

\begin{equation*} \mathsf{P}\Bigl(\,\min_{\operatorname{rank} B\leqslant\delta N}\|\mathbf{\mathcal E}-B\|_{L_0^{N\times N}}<\delta\Bigr)\leqslant 2\exp(-\delta N^2). \end{equation*} \notag
This gives us an example of an L_0-rigid orthonormal system \{\varphi_1,\dots,\varphi_N\} of functions that are piecewise constant on the intervals ((j-1)/N, j/N).

There is another metric in L_0(\mathcal X,\mu), namely, \mu\{x\colon f(x)\ne g(x)\}. We denote it by L_{\mathrm{H}}; the distance \|f-g\|_{L_{\mathrm{H}}} is an analogue of the Hamming distance. (It is called the ‘indicator distance’ in probability theory.) This metric appears in function theory in the context of ‘correction’ theorems. For example, Luzin’s theorem states that C is dense in L_{\mathrm{H}}, and Menshov’s theorem implies that the set of functions with uniformly convergent Fourier series is dense in L_{\mathrm{H}}. Note that matrix rigidity is a particular case of average Kolmogorov widths in the normalized Hamming distance L_{\mathrm{H}}^N. Let A be an M\times N real matrix and W_A=\{A_1,\dots,A_M\}\subset\mathbb{R}^N be the set of rows of A. Then

\begin{equation*} \operatorname{Rig}(A,n)=MN d_n^{\mathrm{avg}}(W_A,L_{\mathrm{H}}^N). \end{equation*} \notag
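This identity is a regrouping of the entry count by rows: any B of rank at most n has its rows in some n-dimensional subspace Q, and \#\{A\ne B\} is at least the sum over rows of the (unnormalized) Hamming distances from A_i to Q; minimizing over B and Q gives the equality. For small sizes \rho(x,Q)_{L_{\mathrm{H}}^N} can be computed by brute force over coordinate subsets. An illustrative sketch (the helper names are ours, not from the paper):

```python
import itertools
import numpy as np

def hamming_dist_to_subspace(x, W):
    """Normalized Hamming distance min_{y in rowspan(W)} #{i: x_i != y_i} / N.
    Brute force: a coordinate set S is matchable iff c @ W[:, S] = x[S] has a
    solution; the distance is (N - largest matchable |S|) / N."""
    n, N = W.shape
    for size in range(N, 0, -1):
        for S in itertools.combinations(range(N), size):
            idx = list(S)
            c, *_ = np.linalg.lstsq(W[:, idx].T, x[idx], rcond=None)
            if np.linalg.norm(W[:, idx].T @ c - x[idx]) < 1e-8:
                return (N - size) / N
    return 1.0

rng = np.random.default_rng(2)
M, N, n = 4, 6, 2
A = rng.choice([-1.0, 1.0], size=(M, N))
V = rng.standard_normal((n, N))          # rowspan(V) is the subspace Q
B = rng.standard_normal((M, n)) @ V      # an arbitrary matrix with rows in Q

row_sum = sum(hamming_dist_to_subspace(A[i], V) for i in range(M))
# Every candidate B of rank <= n obeys #{A != B} >= N * (sum of row distances);
# Rig(A, n) is the minimum of the left-hand side over all such B.
assert np.sum(A != B) >= N * row_sum - 1e-9
```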

Some of our results can be stated in matrix terms; in some cases the matrix language is even more convenient. However, in this paper we prefer to stick to the functional-theoretic language; see also [14].

The S_{p'}-property

In the case 1<p<2 we establish a less restrictive sufficient condition for rigidity.

Theorem 1.3. Let \varphi_1,\dots,\varphi_N be an orthonormal system in a space L_2(\mathcal X,\mu), \mu(\mathcal X)=1. Let 1<p<2, and suppose that \{\varphi_k\} has the S_{p'}-property, that is,

\begin{equation*} \biggl\|\sum_{k=1}^N a_k\varphi_k \biggr\|_{p'} \leqslant B|a| \quad\forall\, a\in\mathbb{R}^N \end{equation*} \notag
for some B\geqslant1. Then this system is rigid in L_p:
\begin{equation*} d_n^{\mathrm{avg}}(\{\varphi_1,\dots,\varphi_N\},L_p)_p \geqslant c(B,\varepsilon,p)>0 \quad\textit{if } n\leqslant N(1-\varepsilon). \end{equation*} \notag

Note that Bourgain’s theorem [15] states that any uniformly bounded orthonormal system has an S_{p'}-subsystem of size N^{2/p'}; this subsystem is rigid in L_p. Hence an o(1)-approximation of a uniformly bounded orthonormal system in L_p requires a dimension of at least n\geqslant N^{2/p'}/2. In particular, in Theorem A we must indeed have a polynomial dependence between n and N.

The analogue of Theorem 1.1 also holds true for 1<p<2 (we do not prove it here), but it fails for p>2.

Positive approximation results

Let G be a finite Abelian group and \widehat{G} be the dual group of characters \chi\colon G\to\mathbb C^*.

It follows from [16] that characters are not rigid in L_{\mathrm{H}}:

\begin{equation*} d_n(\widehat{G},L_{\mathrm{H}})\leqslant N^{-1+c\varepsilon} \quad\text{if } n\geqslant N\exp(-c(\varepsilon)\log^{c_2}N), \quad N:=|G|. \end{equation*} \notag
Such a small error of approximation, N^{-1+c\varepsilon}, is natural to see in the context of Valiant’s result on circuit complexity. However, we are interested in approximation in different ‘regimes’. We consider systems generated by specific characters, namely, the Walsh and trigonometric systems. First we show that they admit good approximation by very low-dimensional spaces.

Theorem 1.4. The following inequalities hold:

\begin{equation*} d_n(\{w_1,\dots,w_N\},L_{\mathrm{H}}[0,1])\leqslant \exp(-c\log^{1/4}N) \quad\textit{if } n\geqslant\exp(C\log^{2/3}N) \end{equation*} \notag
and
\begin{equation*} d_n(\{e(kx)\}_{k=-N}^N, L_0(\mathbb T))\leqslant \exp(-c\log^{0.2}N) \quad\textit{if } n\geqslant\exp(C\log^{0.9}N). \end{equation*} \notag

It is known that any harmonic can be approximated by other harmonics in L_p, p<1.

Theorem C [17]. For any 0<p<1, uniformly in M and N\geqslant M,

\begin{equation*} \inf\biggl\|\cos Mx-\sum_{|k|\leqslant N,\,k\ne M}c_ke(kx)\biggr\|_p \asymp (N-M+1)^{1-1/p}. \end{equation*} \notag

We bound the trigonometric widths d_n^{\mathrm{T}} of the first N trigonometric functions, that is, we approximate them by trigonometric subspaces Q_n=\bigl\{\sum_{k\in \Lambda}c_ke(kx)\bigr\}, |\Lambda|=n.

Theorem 1.5. For any p\in(0,1) and for sufficiently large N we have

\begin{equation*} d_n^{\mathrm{T}}(\{e(kx)\}_{k=-N}^N,L_p) \leqslant \log^{-c_1(p)}N \quad\textit{if } n\geqslant N\exp(-\log^{c_2}N) \end{equation*} \notag
and
\begin{equation*} d_n^{\mathrm{T}}\biggl(\biggl\{e\biggl(\frac{kx}{N}\biggr)\biggr\}_{k\in \mathbb Z_N}, L_p^N\biggr) \leqslant \log^{-c_1(p)}N \quad\textit{if } n\geqslant N\exp(-\log^{c_2}N). \end{equation*} \notag

Organization of the paper

In § 2 we provide necessary definitions and some needed facts. In §§ 3–5 we prove the results mentioned in § 1, as well as some other results. In § 6 we discuss some remaining questions on the subject.

Acknowledgement

The author thanks Konstantin Sergeevich Ryutin, Sergei Vladimirovich Astashkin and Boris Sergeevich Kashin for fruitful discussions.

§ 2. Preliminaries

2.1. Metrics

Let (\mathcal X,\mu) be a measure space, \mu(\mathcal X)<\infty. As usual, we identify functions (or random variables) that differ on a set of zero measure.

We consider classical L_p-spaces for 0<p<\infty:

\begin{equation*} \|f\|_p :=\|f\|_{L_p} :=\biggl(\int_{\mathcal X}|f|^p\,d\mu\biggr)^{1/p}. \end{equation*} \notag

The space L_0(\mathcal X,\mu) consists of all measurable functions. Convergence in measure in L_0 can be metricised by several metrics. Let us describe the Ky Fan metric that we use. Set

\begin{equation*} \|f\|_{L_0} :=\sup\bigl\{\varepsilon\geqslant0\colon \mu\{x\colon |f(x)|\geqslant\varepsilon\}\geqslant\varepsilon\bigr\}. \end{equation*} \notag
Then \|f-g\|_{L_0} is indeed a metric. We also use the following notation:
\begin{equation*} \|f\|_{L_{\mathrm{H}}} :=\mu \{x\colon f(x)\ne0\}. \end{equation*} \notag
Note that \|f\|_p \geqslant \|f\|_{L_0}^{1+1/p} and \|f\|_{L_{\mathrm{H}}}\geqslant\|f\|_{L_0}.

Let N\in\mathbb N. Then the space \mathbb{R}^N is equipped with the usual norms \ell_p^N corresponding to the counting measure on \{1,\dots,N\}:

\begin{equation*} \|x\|_p :=\|x\|_{\ell_p^N} :=\biggl(\sum_{k=1}^N |x_k|^p\biggr)^{1/p}. \end{equation*} \notag
We denote the unit ball in \ell_p^N by B_p^N. Note that the meaning of \|\cdot\|_p is clear from the context: it is either the norm \|\cdot\|_{L_p}, if the argument is a function, or the norm \|\cdot\|_{\ell_p^N} if the argument is a vector. In the Euclidean case we write |\cdot|.

We let L_p^N denote the L_p-spaces for the normalized (probability) measure on \{1,\dots,N\}. Obviously, \|x\|_{L_p^N} = N^{-1/p}\|x\|_{\ell_p^N}, 0<p\leqslant\infty.

The quantity \|x\|_{L_{\mathrm{H}}^N} is the share of nonzero coordinates of a vector, and the corresponding distance is the normalized Hamming distance. Note that \lim_{p\to+0}\|x\|_p^p = \#\{k\colon x_k\ne0\} = N\|x\|_{L_{\mathrm{H}}^N}.

In the case p=0 we do not work with \ell_0^N, but only with L_0^N:

\begin{equation*} \|x\|_{L_0^N}=\max\bigl\{\varepsilon\colon \#\{i\colon |x_i|\geqslant \varepsilon\} \geqslant \varepsilon N\bigr\}. \end{equation*} \notag
We avoid the notation \|\cdot\|_0.
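On \mathbb{R}^N this quasinorm has a simple closed form: if s_1\geqslant\dots\geqslant s_N are the sorted values |x_i|, then \|x\|_{L_0^N}=\max_k\min(s_k,k/N). A small sketch checking this form against the definition, together with the inequalities of this subsection (our own illustration):

```python
import numpy as np

def l0_norm(x):
    """||x||_{L_0^N} = max{eps: #{i: |x_i| >= eps} >= eps * N}
    = max_k min(s_k, k / N) for the sorted values s_1 >= ... >= s_N."""
    s = np.sort(np.abs(x))[::-1]
    N = len(x)
    return max(min(s[k - 1], k / N) for k in range(1, N + 1))

rng = np.random.default_rng(3)
x = rng.standard_normal(50)
N, p = len(x), 1.5

eps = l0_norm(x)
# eps is feasible in the defining supremum: mu{|x| >= eps} >= eps.
assert np.mean(np.abs(x) >= eps - 1e-12) >= eps - 1e-12

lp = np.mean(np.abs(x) ** p) ** (1 / p)   # ||x||_{L_p^N}, normalized measure
lH = np.count_nonzero(x) / N              # ||x||_{L_H^N}
assert lp >= eps ** (1 + 1 / p) - 1e-12   # ||f||_p >= ||f||_{L_0}^{1 + 1/p}
assert lH >= eps - 1e-12                  # ||f||_{L_H} >= ||f||_{L_0}
# lim_{p -> +0} ||x||_p^p = #{k: x_k != 0} = N * ||x||_{L_H^N}
assert abs(np.sum(np.abs(x) ** 1e-9) - np.count_nonzero(x)) < 1e-6
```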

2.2. Widths

Let X be a linear space equipped with a metric (or a more general distance functional) \rho. Recall that the Kolmogorov width of order n\in\mathbb Z_+ of a set K\subset X is defined by

\begin{equation*} d_n(K,X) :=\inf_{Q_n\subset X} \sup_{x\in K}\rho(x,Q_n)_X, \end{equation*} \notag
where \rho(x,Q)_X :=\inf_{y\in Q} \rho(x,y) and the infimum is taken over the linear subspaces of X of dimension at most n. Usually, X is a normed space and \rho(x,y)=\|x-y\|_X; note that this is not required in Kolmogorov’s original paper, and we will indeed use widths in a more general context.

Average widths

Let X be a linear space with a metric and let \mu be a Borel measure on X. For p\in(0,\infty) we define the p-averaged Kolmogorov width of \mu in X by

\begin{equation*} d_n^{\mathrm{avg}}(\mu,X)_p :=\inf_{Q_n\subset X} \biggl(\int_X \rho(x,Q_n)_X^p \,d\mu(x)\biggr)^{1/p}. \end{equation*} \notag
If p=1, then we write simply d_n^{\mathrm{avg}}(\mu,X).

From now on we assume that \mu is a probability measure. Equivalently, we can consider the widths of a (Borel) random vector \xi with values in X:

\begin{equation*} d^{\mathrm{avg}}_n(\xi, X)_p :=\inf_{Q_n\subset X} (\mathsf{E} \rho(\xi,Q_n)_X^p)^{1/p}. \end{equation*} \notag
If K is a subset of X and it is clear what a ‘random point in K’ means, then the width d_n^{\mathrm{avg}}(K,X)_p is defined as the width of such a random point. For example, for a finite K we have
\begin{equation*} d^{\mathrm{avg}}_n(\{x_1,\dots,x_N\}, X)_p :=\inf_{Q_n\subset X} \biggl(\frac1N \sum_{i=1}^N \rho(x_i,Q_n)_X^p\biggr)^{1/p}. \end{equation*} \notag

Notation

Let us discuss the new notation d_n^{\mathrm{avg}}(\xi,X)_p. Some authors write d_n^{(\mathrm{a})} instead of d_n^{\mathrm{avg}}; this can be confusing because d_n^{\mathrm{a}} denotes absolute widths (see [18]). Moreover, the use of \mathrm{avg} is standard in average-case settings in information-based complexity; see [19]. Some authors write d_n^{(\mathrm{a})}(X,\mu), but when we deal with sets and random variables the notation d_n^{\mathrm{avg}}(\xi,X) seems more intuitive (the object whose width is considered goes first).

Properties

The most basic inequality that we always have in mind is

\begin{equation*} d_n(K,X) \geqslant d_n^{\mathrm{avg}}(\xi,X)_p\quad\text{if $\mathsf{P}(\xi\in K)=1$.} \end{equation*} \notag
We will use the following inequality, which is a simple consequence of the convexity of the L_p-norm:
\begin{equation} d_{n_1+n_2}^{\mathrm{avg}}(\xi_1+\xi_2,X)_p \leqslant d_{n_1}^{\mathrm{avg}}(\xi_1,X)_p + d_{n_2}^{\mathrm{avg}}(\xi_2,X)_p,\quad 1\leqslant p<\infty. \end{equation} \tag{2}

Statement 2.1. Let \xi_1,\dots,\xi_N\colon\Omega\to\mathbb{R} be some random variables such that \mathsf{E}|\xi_i|^p<\infty for all i. Then for p\in[1,\infty) and n\in\mathbb Z_+ we have

\begin{equation} d_n^{\mathrm{avg}}(\{\xi_1,\dots,\xi_N\},L_p(\Omega))_p= N^{-1/p}d_n^{\mathrm{avg}}((\xi_1,\dots,\xi_N),\ell_p^N)_p. \end{equation} \tag{3}

Note that in the width on the left-hand side the averaging is over the finite set of ‘points’ \xi_1,\dots,\xi_N in the space L_p(\Omega); and on the right-hand side we see the averaged width of the random vector \xi=(\xi_1,\dots,\xi_N) in \mathbb{R}^N. In the case of discrete \Omega equality (3) is a simple consequence of the equality of the row-rank and the column-rank of a matrix. Let us give the full proof for completeness.

Proof. Consider the error of an approximation \xi_k\approx \eta_k:
\begin{equation} \sum_{k=1}^N \|\xi_k-\eta_k\|_{L_p(\Omega)}^p=\sum_{k=1}^N \mathsf{E}|\xi_k-\eta_k|^p=\mathsf{E} \|\xi-\eta\|_{\ell_p^N}^p. \end{equation} \tag{4}
Note that the random variables \eta_1,\dots,\eta_N lie in a subspace of dimension at most n of L_p(\Omega) if and only if there is a subspace Q\subset\mathbb{R}^N, \dim Q\leqslant n, such that
\begin{equation*} \eta=(\eta_1,\dots,\eta_N)\in Q \quad \text{almost surely}. \end{equation*} \notag
Therefore,
\begin{equation*} \mathsf{E} \|\xi-\eta\|_{\ell_p^N}^p \geqslant \mathsf{E}\rho(\xi,Q)_{\ell_p^N}^p \geqslant d_n^{\mathrm{avg}}(\xi,\ell_p^N)_p^p. \end{equation*} \notag
This gives us the ‘\geqslant’ inequality in (3).

To prove the reverse inequality we have to construct \eta from an (almost) optimal subspace Q\subset\mathbb{R}^N. Note that for any \varepsilon>0 there is a Borel-measurable (for example, piecewise-constant) map \pi\colon\mathbb{R}^N\to Q, such that

\begin{equation*} \|x-\pi(x)\|_{\ell_p^N} \leqslant \rho(x,Q)_{\ell_p^N}+\varepsilon, \qquad x\in\mathbb{R}^N. \end{equation*} \notag
Take \eta := \pi(\xi); this is a random vector because \pi is Borel. Moreover, \eta\in Q almost surely, so \eta_1,\dots,\eta_N span an at most n-dimensional subspace of L_p(\Omega). It remains to substitute x=\xi into the previous inequality, raise it to the power p, take the expectation and use (4) to finish the proof.
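For p=2 and a finite \Omega, equality (3) can be watched in action: write the values of \xi_1,\dots,\xi_N into an M\times N matrix; by the Eckart–Young theorem the optimal subspaces on both sides are spanned by leading singular vectors, and both sides reduce to the same singular-value tail. A numeric sketch (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, n = 8, 5, 2                 # |Omega| = M points with uniform measure
X = rng.standard_normal((M, N))   # column k holds the values of xi_k
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Left-hand side of (3): the N functions xi_k (columns) are approximated in
# L_2(Omega) by the span of the top-n left singular vectors.
P_left = U[:, :n] @ U[:, :n].T
lhs = np.sqrt(np.mean(np.sum((X - P_left @ X) ** 2, axis=0) / M))

# Right-hand side: the random vector xi (a uniformly chosen row) is
# approximated in ell_2^N by the span of the top-n right singular vectors.
P_right = Vt[:n].T @ Vt[:n]
rhs = np.sqrt(np.mean(np.sum((X - X @ P_right) ** 2, axis=1))) / np.sqrt(N)

# Both sides equal the same singular-value tail sqrt(sum_{k>n} s_k^2 / (M N)).
assert abs(lhs - rhs) < 1e-10
assert abs(lhs - np.sqrt(np.sum(s[n:] ** 2) / (M * N))) < 1e-10
```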

An average width in Hilbert space

We present a formula for the 2-averaged width of a random vector in a Hilbert space. This formula is a generalization of the Eckart–Young theorem on matrix approximation in the Frobenius norm. It is also a reformulation of Ismagilov’s theorem [20]; however, in our terminology the formulation is particularly convenient, so it makes sense to state it here. Our proof repeats Ismagilov’s and is presented for completeness.

Let H be a separable Hilbert space (of finite or infinite dimension) and \xi be a random vector in H such that \mathsf{E}|\xi|^2<\infty. Consider the correlation operator of \xi defined by \langle Au,v\rangle = \mathsf{E}\langle\xi,u\rangle\langle\xi,v\rangle. It is known that A is a compact positive selfadjoint operator (see, for example, [21], Ch. 3, § 2). A sequence (\lambda_n) is the sequence of eigenvalues of A if and only if there is an orthonormal basis \{\varphi_n\} (an eigenbasis of A) such that

\begin{equation} \mathsf{E}\langle\xi,\varphi_k\rangle^2=\lambda_k, \qquad \mathsf{E}\langle\xi,\varphi_k\rangle\langle\xi,\varphi_l\rangle=0 \quad\text{for } k\ne l. \end{equation} \tag{5}
We arrange the eigenvalues in decreasing order: \lambda_1\geqslant\lambda_2\geqslant\dots\geqslant0.

Statement 2.2. We have d_n^\mathrm{avg}(\xi,H)_2 = (\sum_{k>n}\lambda_k)^{1/2}.

Proof. Fix a basis (5) and put \xi_k:=\langle\xi,\varphi_k\rangle. We bound the width from below. Consider some approximating space Q_n with orthonormal basis \{\psi_1,\dots,\psi_n\}. Then
\begin{equation*} \mathsf{E}\rho(\xi,Q_n)^2=\mathsf{E}|\xi|^2 - \mathsf{E}|P_{Q_n}\xi|^2. \end{equation*} \notag
We have \mathsf{E}|\xi|^2=\sum_{k\geqslant1}\mathsf{E}\xi_k^2=\sum_{k\geqslant 1}\lambda_k, so it is sufficient to find an upper bound for the projection:
\begin{equation*} \begin{aligned} \, \mathsf{E}|P_{Q_n}\xi|^2 &=\sum_{j=1}^n \mathsf{E}\langle \xi,\psi_j\rangle^2= \sum_{j=1}^n\mathsf{E} \biggl(\sum_{k\geqslant 1}\xi_k\langle \psi_j,\varphi_k\rangle\biggr)^2 \\ &=\sum_{j=1}^n \sum_{k\geqslant1} \lambda_k \langle \psi_j,\varphi_k\rangle^2=\sum_{k\geqslant 1}\lambda_k \mu_k, \quad\text{where } \mu_k :=\sum_{j=1}^n \langle\psi_j,\varphi_k\rangle^2. \end{aligned} \end{equation*} \notag
Note that the numbers (\mu_k) belong to [0,1] and their sum equals n. By the monotonicity of the \lambda_n the sum \sum_{k\geqslant1}\lambda_k\mu_k is maximum if \mu_k = 1 for k\leqslant n and \mu_k=0 for k>n. We have obtained a lower bound for the width. Equality is attained at the subspace Q_n = \operatorname{span}\{\varphi_1,\dots,\varphi_n\}.

The statement is proved.

It is often convenient to work with singular values \sigma_k:=\lambda_k^{1/2} instead of eigenvalues. One can rewrite (5) in terms of the Schmidt decomposition:

\begin{equation*} \xi=\sum_{k\geqslant 1}\sigma_k \eta_k \varphi_k, \qquad \mathsf{E}\eta_k\eta_l=\delta_{k,l}, \qquad\langle\varphi_k,\varphi_l\rangle=\delta_{k,l}. \end{equation*} \notag
Then d_n^\mathrm{avg}(\xi,H)_2 = (\sum_{k>n}\sigma_k^2)^{1/2}.

We arrive at the Eckart–Young theorem if we consider finite sets. Let the matrix X consist of columns x^1,\dots,x^N\in\mathbb{R}^M; then

\begin{equation*} d_n^{\mathrm{avg}}(\{x^1,\dots,x^N\},\ell_2^M)_2= N^{-1/2}\biggl(\sum_{k>n}\sigma_k(X)^2\biggr)^{1/2}. \end{equation*} \notag
Indeed, the singular value decomposition of X gives us a basis \{\varphi_k\}_{k=1}^M such that the coordinate vectors (\langle x^1,\varphi_k\rangle,\dots,\langle x^N,\varphi_k\rangle) are orthogonal and have length \sigma_k(X). If \xi is a random choice of one of the x^i, then \mathsf{E}\langle\xi,\varphi_k\rangle^2=N^{-1}\sigma_k^2(X) and \mathsf{E}\langle\xi,\varphi_k\rangle\langle\xi,\varphi_l\rangle=0 for k\ne l; hence \sigma_k(\xi) = N^{-1/2}\sigma_k(X).
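This finite version of the Eckart–Young theorem is easy to verify numerically: project the columns onto the span of the top n left singular vectors and compare the averaged residual with the singular-value tail (our own check):

```python
import numpy as np

rng = np.random.default_rng(5)
M, N, n = 7, 10, 3
X = rng.standard_normal((M, N))   # columns x^1, ..., x^N in R^M
U, s, _ = np.linalg.svd(X, full_matrices=False)

P = U[:, :n] @ U[:, :n].T         # projector onto the optimal subspace
width = np.sqrt(np.mean(np.sum((X - P @ X) ** 2, axis=0)))   # (1/N sum rho^2)^{1/2}
formula = np.sqrt(np.sum(s[n:] ** 2)) / np.sqrt(N)

assert abs(width - formula) < 1e-10
```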

Finally, an orthonormal system \varphi_1,\dots,\varphi_N in a Euclidean space E corresponds to the identity matrix X (in appropriate coordinates), so we have (see also (1))

\begin{equation*} d_n(\{\varphi_1,\dots,\varphi_N\},E)= d_n^{\mathrm{avg}}(\{\varphi_1,\dots,\varphi_N\},E)_2=\sqrt{1-\frac nN}. \end{equation*} \notag

The trigonometric width

Let \Phi be a subset of X. If we approximate by subspaces spanned by elements of \Phi, then we obtain the notion of \Phi-width:

\begin{equation*} d_n^\Phi(K,X) :=\inf_{\varphi_1,\dots,\varphi_n\in\Phi}\sup_{x\in K}\rho(x,\operatorname{span}\{\varphi_i\}_{i=1}^n)_X. \end{equation*} \notag
However, we need only the case when \Phi is the trigonometric system \{e(kx)\}_{k\in\mathbb Z} in the continuous case or \{e(kx/N)\}_{k\in\mathbb Z_N} in the discrete case. In this situation the width is called the trigonometric width and is denoted by d_n^{\mathrm{T}}.

Usually, we consider real spaces (and the linear dimension over \mathbb R). When complex-valued functions are approximated, one can also consider linear spaces over \mathbb C. The distinction here is not important for us because in all cases where complex-valued functions are approximated we are interested in the dimension n up to its order of magnitude.

2.3. Miscellaneous

The material in this subsection is contained in [22], §§ 2.2, 2.8 and 8.8.

Probability

We use the probabilistic notation \mathsf{P} for the probability, \mathsf{E} for the expectation and \operatorname{Law} for the distribution. We also use the conditional expectation \mathsf{E}(\xi\mid\eta), but only for \eta with a finite number of values.

We make use of standard tail estimates. Let X_1,\dots,X_k, \mathsf{E} X_i=0, be independent random variables.

Hoeffding’s inequality

Let a_i\leqslant X_i\leqslant b_i almost surely; then

\begin{equation*} \mathsf{P}\biggl(\biggl|\sum_{i=1}^k X_i\biggr| \geqslant \lambda \sigma\biggr) \leqslant2\exp(-2\lambda^2), \qquad \sigma^2 :=\sum_{i=1}^k (b_i-a_i)^2. \end{equation*} \notag
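A quick Monte Carlo sanity check of this bound for Rademacher summands (X_i=\pm1, so b_i-a_i=2; our own illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
k, trials, lam = 40, 20000, 1.5
sigma = np.sqrt(k * 2 ** 2)       # X_i = +-1, so (b_i - a_i)^2 = 4

S = rng.choice([-1, 1], size=(trials, k)).sum(axis=1)
empirical = np.mean(np.abs(S) >= lam * sigma)
bound = 2 * np.exp(-2 * lam ** 2)

# The empirical tail should sit below Hoeffding's bound (about 0.022 here).
assert empirical <= bound
```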

Bernstein’s inequality

Let |X_i|\leqslant M almost surely; then

\begin{equation*} \mathsf{P}\biggl(\biggl|\sum_{i=1}^k X_i\biggr| \geqslant t\biggr) \leqslant 2\exp\biggl(-\frac{t^2/2}{\sigma^2+Mt/3}\biggr), \qquad \sigma^2 :=\sum_{i=1}^k \mathsf{E} X_i^2. \end{equation*} \notag

The \operatorname{VC}-dimension

Let us recall the definition of the combinatorial dimension of a class \mathcal F of functions f\colon I\to\mathbb{R}. We say that a set J\subset I is t-shattered (t>0) by \mathcal F if there is a function h\colon J\to\mathbb{R} such that for any choice of signs \sigma\colon J\to\{-1,1\} one can find a function f_\sigma\in\mathcal F such that

\begin{equation} \min_{x\in J}\sigma(x)(f_\sigma(x)-h(x))\geqslant t. \end{equation} \tag{6}
Then the combinatorial dimension \operatorname{vc}(\mathcal F,t) is the maximum cardinality of a set t-shattered by \mathcal F.

If \mathcal F is convex and centrally symmetric, then one can assume that h(x)\equiv 0. Indeed, we can replace \{f_\sigma\} by \{\frac12(f_\sigma-f_{-\sigma})\}.

We will use the simple (known) fact that

\begin{equation} \text{if } \operatorname{vc}(\mathcal F,t)> n, \quad\text{then } d_n(\mathcal F,\ell_\infty(I))\geqslant t. \end{equation} \tag{7}
Indeed, let J be a t-shattered set, |J|=n+1, and suppose that there is an approximation by an n-dimensional subspace: \mathcal F\ni f\approx g, g\in Q_n, \|f-g\|_\infty < t. Then there exists a nontrivial linear functional \Lambda(f) := \sum_{x\in J}\lambda(x)f(x) such that \Lambda(g)=0 for all g\in Q_n. Assume that \Lambda(h)\geqslant 0, where h is from (6). Consider a function f\in\mathcal F such that \operatorname{sign}(\lambda(x))(f(x)-h(x))\geqslant t for all x\in J with \lambda(x)\ne0. Then for the approximating function g we have \operatorname{sign}(\lambda(x))(g(x)-h(x))>0 for these x, hence \Lambda(g)=\Lambda(g-h)+\Lambda(h)>0. We obtain a contradiction. The case \Lambda(h)\leqslant0 is analogous.

We will also make use of the classical VC-bound for classes \mathcal F\subset\{-1,1\}^k (that is, in fact, families of subsets):

\begin{equation} |\mathcal F|>\binom{k}{0}+\binom{k}{1}+\dots+\binom{k}{d} \quad\Longrightarrow\quad \operatorname{vc}(\mathcal F,1) > d. \end{equation} \tag{8}
Thus, if the left-hand inequality in (8) holds, then there is a set J, |J|>d, such that the projection of the class onto J is the whole cube \{-1,1\}^J.
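Implication (8) (a form of the Sauer–Shelah lemma) can be verified by brute force on a small cube: any family of more than \sum_{i\leqslant d}\binom{k}{i} sign vectors projects onto the full cube \{-1,1\}^J for some J with |J|=d+1. A small search (our own illustration):

```python
import itertools
from math import comb

k, d = 5, 2
threshold = sum(comb(k, i) for i in range(d + 1))   # 1 + 5 + 10 = 16

cube = list(itertools.product([-1, 1], repeat=k))
family = cube[:threshold + 1]     # 17 sign vectors: enough to force shattering

def shatters(F, J):
    """Does the projection of F onto the coordinates J give all 2^|J| patterns?"""
    return len({tuple(f[j] for j in J) for f in F}) == 2 ** len(J)

# (8) guarantees vc(family, 1) > d: some set of d + 1 coordinates is shattered.
assert any(shatters(family, J)
           for J in itertools.combinations(range(k), d + 1))
```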

There is a useful bound on binomial coefficients:

\begin{equation} \binom{k}{0}+\dots+\binom{k}m \leqslant \biggl(\frac{ek}m\biggr)^m, \qquad 0\leqslant m \leqslant k. \end{equation} \tag{9}
Note that the function (ek/x)^x is increasing for x\in(0,k].
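Both the bound (9) and the stated monotonicity of (ek/x)^x are easy to check exhaustively for small parameters (our own sketch):

```python
from math import comb, e

# Bound (9): sum_{i <= m} C(k, i) <= (e k / m)^m for 0 < m <= k.
for k in range(1, 30):
    for m in range(1, k + 1):
        assert sum(comb(k, i) for i in range(m + 1)) <= (e * k / m) ** m

# (e k / x)^x is increasing on (0, k]: check on a grid for k = 20.
k = 20
grid = [0.5 + 0.25 * j for j in range(79)]   # 0.5, 0.75, ..., 20.0
vals = [(e * k / x) ** x for x in grid]
assert all(a <= b + 1e-9 for a, b in zip(vals, vals[1:]))
```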

We will often consider 1-periodic functions, that is, functions on \mathbb T:=\mathbb R/\mathbb Z. We also use the notation e(\theta):=\exp(2\pi i\theta).

We denote positive constants by c,c_1,c_2,C,\dots. If there is dependence on some parameters, then we specify it explicitly: c_p,c(\varepsilon),\dots .

§ 3. Sufficient conditions for rigidity in weak metrics

3.1. Rigidity in L_1

For definiteness we put \operatorname{sign}(x)\!=\!1 for x\!\geqslant\! 0 and {\operatorname{sign}(x)\!=\!-1} for x<0.

Theorem 3.1. Let \xi=(\xi_1,\dots,\xi_N) be a random vector in \mathbb{R}^N such that \mathsf{E}|\xi_i|\geqslant1 for all i. Let \varepsilon\in(0,1) and n\leqslant N(1-\varepsilon). Then

\begin{equation} d_n^{\mathrm{avg}}(\xi,\ell_1^N) \geqslant c(\varepsilon)N - \sum_{i=1}^N\|\mathsf{E}(\xi_i\mid \{\operatorname{sign}\xi_j\}_{j\ne i})\|_{L_1}. \end{equation} \tag{10}

Before proving the theorem let us note that the distance in \ell_1^N of a vector x\in\mathbb{R}^N to a subspace Q_n\subset\mathbb{R}^N can be written in dual terms as

\begin{equation*} \rho(x,Q_n)_1=\sup_{z\in K} \langle x,z\rangle, \qquad K :=B_\infty^N \cap Q_n^\perp. \end{equation*} \notag
So central cross-sections of the cube B_\infty^N play an important role here. We prove the following lemma.

Lemma 3.1. Let K be an m-dimensional central section of the cube B_\infty^N, and let m\geqslant \varepsilon N. Then \operatorname{vc}(K,c_1(\varepsilon)) \geqslant c_2(\varepsilon)N.

The proof of the lemma relies on two useful results.

Theorem D [23]. Any m-dimensional cross-section of a volume-one cube has volume at least 1:

\begin{equation*} \operatorname{Vol}_m\biggl(\frac12 B_\infty^N \cap L_m\biggr)\geqslant 1. \end{equation*} \notag

In fact, a weaker bound

\begin{equation*} \operatorname{Vol}_m(B_\infty^N\cap L_m)\geqslant c^m \end{equation*} \notag
suffices for our purposes.

Theorem E [24]. Let \mathcal F be a class of functions bounded by 1 and defined on a set I. Then for any probability measure \mu on I,

\begin{equation} \log N_t(\mathcal F,L_2(\mu)) \leqslant C \operatorname{vc}(\mathcal F,ct)\log\frac2t, \qquad 0<t<1, \end{equation} \tag{11}
where C and c are positive absolute constants.

Here N_\delta(K,X) is the size of a minimal \delta-net for a set K in the space X.

Proof of Lemma 3.1. Let K=L_m\cap B_\infty^N. From Theorem D we obtain {\operatorname{Vol}_m(K)\!\geqslant\! 2^m}, and we can use the standard volume estimate
\begin{equation*} \begin{aligned} \, N_r(K,\ell_2^N)&=N_r(K,\ell_2^N\cap L_m) \\ &\geqslant \frac{\operatorname{Vol}_m(K)}{\operatorname{Vol}_m(r B_2^m)} \geqslant \frac{2^m}{(cr m^{-1/2})^m}\geqslant 2^m \quad\text{for } r=c_1m^{1/2}. \end{aligned} \end{equation*} \notag
We apply (11) to the set K (note that \|z\|_\infty\leqslant 1 for z\in K by definition) in the space L_2^N for t=N^{-1/2}r\asymp \varepsilon^{1/2}:
\begin{equation*} \operatorname{vc}(K,ct)\geqslant c_1\frac{\log N_t(K,L_2^N)}{\log(2/t)} \geqslant c(\varepsilon)N. \end{equation*} \notag

The lemma is proved.

Now we are ready to prove Theorem 3.1.

Proof of Theorem 3.1. By duality we have
\begin{equation*} \mathsf{E}\rho(\xi,Q_n)_1=\mathsf{E} \max_{z\in K} \langle z,\xi\rangle, \qquad K := B_\infty^N\cap Q_n^\perp. \end{equation*} \notag

Lemma 3.1 produces a set of coordinates \Lambda\subset\{1,\dots,N\}, |\Lambda|\geqslant c(\varepsilon)N, and a number v\geqslant c(\varepsilon) such that for any choice of signs \sigma\colon\Lambda\to\{-1,1\} there is a vector z^\sigma\in K such that

\begin{equation} \min_{i\in\Lambda} \sigma_i z^\sigma_i \geqslant v. \end{equation} \tag{12}
(Recall that, as K is convex and centrally-symmetric, we may assume that z^\sigma oscillates around zero.)

We will derive from (12) that

\begin{equation} \mathsf{E} \max_\sigma \langle \xi,z^\sigma \rangle \geqslant v \sum_{i\in\Lambda} \mathsf{E}|\xi_i| - \sum_{i\notin\Lambda} \|\mathsf{E}(\xi_i\mid \{\operatorname{sign}\xi_j\}_{j\in\Lambda})\|_{L_1}. \end{equation} \tag{13}

Fix \sigma^*\colon\Lambda\to\{-1,1\}, and let \mathsf{E}^* be the conditional expectation \mathsf{E}^*(\,\cdot\,)=\mathsf{E}(\,\cdot\mid\operatorname{sign}\xi_i =\sigma_i^*,\,i\in\Lambda). Then we have

\begin{equation*} \begin{aligned} \, \mathsf{E}^*\max_\sigma\langle\xi,z^\sigma\rangle &\geqslant \mathsf{E}^*\langle\xi,z^{\sigma^*}\rangle \\ &=\mathsf{E}^*\biggl(\sum_{i\in\Lambda}\xi_i z^{\sigma^*}_i+\sum_{i\notin\Lambda}\xi_i z^{\sigma^*}_i\biggr) \geqslant v\sum_{i\in\Lambda}\mathsf{E}^*|\xi_i| - \sum_{i\notin\Lambda}|\mathsf{E}^*\xi_i|. \end{aligned} \end{equation*} \notag
Here we have used that \|z^{\sigma^*}\|_\infty\leqslant 1. We average over \sigma^* and obtain (13).

The theorem follows from (13), because the first sum there is at least v|\Lambda|\geqslant c(\varepsilon) N, and for each term in the second sum we have

\begin{equation*} \|\mathsf{E}(\xi_i\mid\{\operatorname{sign}\xi_j\}_{j\in\Lambda})\|_{L_1} \leqslant \|\mathsf{E}(\xi_i\mid \{\operatorname{sign}\xi_j\}_{j\ne i})\|_{L_1}. \end{equation*} \notag

The theorem is proved.

Let us present some corollaries of Theorem 3.1 for the case when \mathsf{E}(\xi_i\mid\{\operatorname{sign}\xi_j\}_{j\ne i})\equiv0.

Corollary 3.1. Let \xi_1,\dots,\xi_N be independent random variables such that \mathsf{E}\xi_i=0 and \mathsf{E}|\xi_i|\geqslant 1, and let \varepsilon\in(0,1). Then for any random variables \eta_1,\dots,\eta_n, n\leqslant N(1-\varepsilon), and any coefficients \{a_{i,j}\} we have

\begin{equation*} \mathsf{E}\|\xi-A\eta\|_1=\sum_{i=1}^N \mathsf{E}\biggl|\xi_i-\sum_{j=1}^n a_{i,j}\eta_j\biggr| \geqslant c(\varepsilon)N. \end{equation*} \notag
In terms of widths,
\begin{equation*} d_n^{\mathrm{avg}}(\{\xi_1,\dots,\xi_N\},L_1)=N^{-1}d_n^{\mathrm{avg}}(\xi,\ell_1^N) \geqslant c(\varepsilon). \end{equation*} \notag
This also holds true for unconditional vectors \xi, that is, if \operatorname{Law}(\xi_1,\dots,\xi_N)=\operatorname{Law}(\pm\xi_1,\dots,\pm\xi_N).

Example 3.1. Suppose that

\begin{equation*} \int_0^1 f_k(x)\,dx=0\quad\text{and}\quad \int_0^1 |f_k(x)|\,dx=1,\quad k=1,\dots,N. \end{equation*} \notag
Then
\begin{equation*} d_n^{\mathrm{avg}}(\{f_1(x_1),\dots,f_N(x_N)\}, L_1[0,1]^N)\geqslant c(\varepsilon) \quad\text{for } n\leqslant N(1-\varepsilon). \end{equation*} \notag

3.2. Rigidity in L_0

Theorem 3.2. For any \varepsilon\in(0,1) there exists \delta>0 such that if \xi_1,\dots,\xi_N are independent random variables and

\begin{equation*} \inf_{c\in\mathbb{R}}\|\xi_i-c\|_{L_0}\geqslant\varepsilon, \qquad i=1,\dots,N, \end{equation*} \notag
then for any subspace Q_n\subset\mathbb{R}^N of dimension n\leqslant \delta N we have
\begin{equation} \mathsf{P}(\rho(\xi,Q_n)_{L_0^N} \leqslant \delta)\leqslant 2\exp(-\delta N), \end{equation} \tag{14}
where \xi=(\xi_1,\dots,\xi_N).

The condition \|\xi_i-c\|_{L_0}\geqslant\varepsilon is essential here because it forbids the approximation of the vector \xi by a constant vector. This condition allows us to ‘separate’ the values of the \xi_i slightly.

Lemma 3.2. Let \varepsilon,\tau\in(0,1], and let \zeta be a random variable such that \inf_{c\in\mathbb{R}}\|{\zeta-c}\|_{L_0}\geqslant\varepsilon. Then there exist real numbers a<b, |b-a|\geqslant\frac{\tau\varepsilon}2, such that

\begin{equation*} \mathsf{P}(\zeta\leqslant a)\geqslant \frac{\varepsilon}3,\quad \mathsf{P}(\zeta\geqslant b)\geqslant \frac{\varepsilon}3\quad\textit{and}\quad \mathsf{P}(a<\zeta<b)\leqslant\tau. \end{equation*} \notag

Proof. Consider the quantities
\begin{equation*} q_- :=\inf\biggl\{x\colon \mathsf{P}(\zeta\leqslant x)\geqslant \frac\varepsilon3\biggr\}\quad\text{and}\quad q_+ :=\sup\biggl\{x\colon \mathsf{P}(\zeta\geqslant x)\geqslant \frac\varepsilon3\biggr\}. \end{equation*} \notag
Note that \mathsf{P}(\zeta\leqslant q_-)\geqslant \varepsilon/3 and \mathsf{P}(\zeta\geqslant q_+)\geqslant \varepsilon/3. We see that |q_+-q_-|\geqslant\varepsilon, because otherwise \|\zeta-c\|_{L_0}\leqslant\frac23\varepsilon for c=(q_-+q_+)/2. We fix k:=\lceil1/\tau\rceil, divide [q_-,q_+] into k equal subintervals and choose (a,b) to be the one having the least probability \mathsf{P}(a<\zeta<b). Then this probability is at most 1/k\leqslant\tau. Finally, |b-a|\geqslant |q_+-q_-|/k \geqslant \tau\varepsilon/2.

The lemma is proved.
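The selection procedure in the proof is constructive and easy to run. The sketch below (an illustration, not from the paper; `find_gap` is a hypothetical helper) applies it to empirical samples of a uniform variable \zeta with \varepsilon=1/2 and \tau=1/4: it computes the empirical quantiles q_-, q_+, splits [q_-,q_+] into \lceil1/\tau\rceil equal subintervals and returns the least probable one.

```python
import math
import random

def find_gap(samples, eps, tau):
    # Sketch of the selection in Lemma 3.2 on empirical data:
    # pick quantiles q_-, q_+, split [q_-, q_+] into ceil(1/tau) equal
    # subintervals and return the least probable one (pigeonhole).
    xs = sorted(samples)
    n = len(xs)
    q_minus = xs[math.ceil(eps / 3 * n) - 1]   # P(zeta <= q_-) >= eps/3
    q_plus = xs[n - math.ceil(eps / 3 * n)]    # P(zeta >= q_+) >= eps/3
    k = math.ceil(1 / tau)
    best = None
    for j in range(k):
        a = q_minus + (q_plus - q_minus) * j / k
        b = q_minus + (q_plus - q_minus) * (j + 1) / k
        p = sum(a < x < b for x in xs) / n     # empirical P(a < zeta < b)
        if best is None or p < best[2]:
            best = (a, b, p)
    return best

random.seed(0)
samples = [random.uniform(0, 1) for _ in range(100000)]
a, b, p = find_gap(samples, eps=0.5, tau=0.25)
print(a, b, p)
```

For the uniform distribution each subinterval carries roughly equal mass, so the selected probability p stays below 1/k\leqslant\tau, in line with the pigeonhole bound of the lemma.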

Now we prove the theorem.

Proof of Theorem 3.2. We fix some small \delta>0 (to be chosen later) and estimate \mathsf{P}(\rho(\xi,Q_n)_{L_0^N}<\delta), where \dim Q_n\leqslant n\leqslant \delta N. Consider an almost best L_0^N approximation of (\xi_1,\dots,\xi_N) by a (random) vector (y_1,\dots,y_N) in the subspace Q_n. Let
\begin{equation*} \Lambda :=\{k\colon |\xi_k-y_k| \geqslant \delta\} \end{equation*} \notag
be the (random) set of badly approximated coordinates. If \|\xi-y\|_{L_0^N} < \delta, then |\Lambda|<\delta N, so it is sufficient to bound the probability of the event \{|\Lambda| \leqslant \delta N\}.

We apply Lemma 3.2 with \tau := 5\delta/\varepsilon to each \xi_k and find intervals (a_k,b_k) such that

\begin{equation*} \begin{gathered} \, \mathsf{P}(a_k<\xi_k<b_k)\leqslant \tau,\qquad |b_k-a_k|\geqslant\frac{\tau\varepsilon}2 =\frac{5\delta}2, \\ \max\{\mathsf{P}(\xi_k\leqslant a_k),\mathsf{P}(\xi_k\geqslant b_k)\}\leqslant1-\frac{\varepsilon}3. \end{gathered} \end{equation*} \notag
Let
\begin{equation*} \Gamma :=\{k\colon \xi_k\in(a_k,b_k)\} \end{equation*} \notag
be the (random) set of ‘intermediate’ coordinates. We have \mathsf{E}|\Gamma|\leqslant \tau N; by Bernstein's inequality we have \mathsf{P}(|\Gamma|\leqslant 2\tau N)\geqslant 1-2\exp(-c\tau N) and
\begin{equation} \mathsf{P}(|\Lambda|\leqslant \delta N) \leqslant \mathsf{P}(|\Lambda|\leqslant \delta N, \, |\Gamma|\leqslant 2\tau N)+2\exp(-c\tau N). \end{equation} \tag{15}

Fix some sets \Lambda^\circ,\Gamma^\circ\subset\{1,\dots,N\} of size |\Lambda^\circ|\leqslant\delta N and |\Gamma^\circ|\leqslant 2\tau N and consider the event \{\Lambda=\Lambda^\circ,\,\Gamma=\Gamma^\circ\}. Let N'=N-|\Lambda^\circ\cup\Gamma^\circ|; note that N'\geqslant N/2. For x\in\mathbb{R}^N let x'\in\mathbb{R}^{N'} be the vector x without coordinates in \Lambda^\circ\cup\Gamma^\circ. Then \|\xi'-y'\|_{\ell_\infty^{N'}} \leqslant \delta, and we have y'\in Q_n'; therefore,

\begin{equation} d_n(K,\ell_\infty^{N'})\leqslant\delta, \quad\text{where } K:=\bigl\{\xi'(\omega)\colon \Lambda(\omega)=\Lambda^\circ, \, \Gamma(\omega)=\Gamma^\circ\bigr\}. \end{equation} \tag{16}
Consider the random vector \pi with coordinates \pi_k := -1 for \xi_k\leqslant a_k, \pi_k := 0 for a_k<\xi_k<b_k and \pi_k := 1 for \xi_k\geqslant b_k. Note that \pi'\in\{-1,1\}^{N'} by construction. We argue that the set S := \{\pi'(\omega)\colon \Lambda(\omega)=\Lambda^\circ,\,\Gamma(\omega)=\Gamma^\circ\} of all possible values of \pi' contains at most (eN'/n)^{n} elements. Indeed, otherwise (8) and (9) imply that S contains a cube of dimension n+1 and
\begin{equation*} \operatorname{vc}(K,t)\geqslant n+1, \quad\text{where } t:=\frac12\min(b_k-a_k)\geqslant \frac{5\delta}{4}, \end{equation*} \notag
so (7) implies that d_n(K,\ell_\infty^{N'})\geqslant 5\delta/4, which contradicts (16). For any s\in S we have \mathsf{P}(\pi'=s)\leqslant(1-\varepsilon/3)^{N'}, so
\begin{equation*} \mathsf{P}(\Lambda=\Lambda^\circ,\,\Gamma=\Gamma^\circ) \leqslant \biggl(\frac{eN}{n}\biggr)^n\biggl(1-\frac{\varepsilon}3\biggr)^{N/2}. \end{equation*} \notag
Finally, we bound the probability in (15):
\begin{equation*} \begin{aligned} \, \mathsf{P}(|\Lambda|\leqslant \delta N,\,|\Gamma|\leqslant 2\tau N) &=\sum_{|\Lambda^\circ|\leqslant\delta N,\,|\Gamma^\circ|\leqslant 2\tau N}\mathsf{P}(\Lambda=\Lambda^\circ,\,\Gamma=\Gamma^\circ) \\ &\leqslant \biggl(\frac{eN}{\delta N}\biggr)^{\delta N} \biggl(\frac{eN}{2\tau N}\biggr)^{2\tau N} \biggl(\frac{eN}{n}\biggr)^{n} \biggl(1-\frac{\varepsilon}{3}\biggr)^{N/2}. \end{aligned} \end{equation*} \notag
It remains to choose the parameter \delta sufficiently small to ensure that this probability is exponentially small.

Theorem 3.2 is proved.

We can state a corollary concerning the best n-term approximation with respect to a dictionary:

\begin{equation*} \sigma_n(x,\Phi)_X := \inf_{\substack{\varphi_1,\dots,\varphi_n\in\Phi\\c_1,\dots,c_n\in\mathbb{R}}} \biggl\|x-\sum_{k=1}^n c_k\varphi_k \biggr\|_X. \end{equation*} \notag

Corollary 3.2. Under the assumptions of Theorem 3.2, for any set \Phi\subset\mathbb{R}^N, {|\Phi|\leqslant AN} and n<\delta N, we have

\begin{equation*} \mathsf{P}(\sigma_n(\xi,\Phi)_{L_0^N} \leqslant \delta)\leqslant 2\exp(-\delta N). \end{equation*} \notag

Indeed, there are at most exponentially many subspaces generated by n-element subsets of \Phi, so we can choose a smaller \delta=\delta(A,\varepsilon) to provide an exponential bound on the probability.

We need a modification of Statement 2.1.

Lemma 3.3. For any random vector \xi in \mathbb{R}^N

\begin{equation*} c\,d_n^{\mathrm{avg}}(\xi,L_0^N)^4 \leqslant d_n^{\mathrm{avg}}(\{\xi_1,\dots,\xi_N\},L_0) \leqslant C\,d_n^{\mathrm{avg}}(\xi,L_0^N)^{1/4}. \end{equation*} \notag

Proof. Let d_n^\mathrm{avg}(\{\xi_1,\dots,\xi_N\},L_0) \leqslant \varepsilon\leqslant 1. Then for some \eta_1,\dots,\eta_N in an n-dimensional space we have \sum\|\xi_k-\eta_k\|_{L_0}\leqslant N\varepsilon. Consider the random sets \Lambda_t := \{k\colon |{\xi_k-\eta_k}|\geqslant t\}; note that \|\xi-\eta\|_{L_0^N}\geqslant t if and only if |\Lambda_t|\geqslant tN.

We take \delta := \sqrt\varepsilon and obtain

\begin{equation*} \begin{aligned} \, \mathsf{E}|\Lambda_\delta| &=\sum_{k=1}^N \mathsf{P}(|\xi_k-\eta_k|\geqslant\delta) \leqslant \delta N+ \bigl|\{k\colon\mathsf{P}(|\xi_k-\eta_k|\geqslant\delta)\geqslant\delta\}\bigr| \\ &\leqslant \delta N+\bigl|\{k\colon \|\xi_k-\eta_k\|_{L_0}\geqslant\delta\}\bigr| \leqslant \delta N+ \frac{\varepsilon N}\delta=2\delta N. \end{aligned} \end{equation*} \notag
Moreover, for \gamma := \sqrt{2\delta} we have
\begin{equation*} \mathsf{P}(\|\xi-\eta\|_{L_0^N}\geqslant\gamma) \leqslant \mathsf{P}(|\Lambda_\gamma|\geqslant \gamma N) \leqslant \frac{\mathsf{E}|\Lambda_\gamma|}{\gamma N} \leqslant \frac{\mathsf{E}|\Lambda_\delta|}{\gamma N} \leqslant \frac{2\delta N}{\gamma N}=\gamma. \end{equation*} \notag
Hence \mathsf{E}\|\xi-\eta\|_{L_0^N} \leqslant 2\gamma.

The proof of the second inequality is analogous.

The lemma is proved.

Corollary 3.3. Under the assumptions of Theorem 3.2 we have

\begin{equation*} d_n^{\mathrm{avg}}(\{\xi_1,\dots,\xi_N\},L_0)\geqslant\delta\quad\textit{if $n\leqslant \delta N$.} \end{equation*} \notag

3.3. Random sets

Let A be a signum matrix, that is, a matrix with entries \pm1. We define the signum rank \operatorname{rank}_\pm A as the minimum rank of a matrix B such that \operatorname{sign} B_{i,j}=A_{i,j} for all i,j (in particular, B_{i,j}\ne0). There is a bound on the number of signum N\times N matrices with signum rank at most r:

\begin{equation*} |\{A\in\{-1,1\}^{N\times N}\colon \operatorname{rank}_\pm A \leqslant r\}| \leqslant \biggl(\frac{4eN}{r}\biggr)^{2rN} \end{equation*} \notag
(see [25], and also [26] and [6]). For r=cN we obtain the bound 2^{bN^2}, where b=b(c)\to0 as c\to0. Hence, if c is small, then there are very few signum matrices such that \operatorname{rank}_\pm A\leqslant cN. Moreover, almost all signum matrices are quite distant from such matrices in the Hamming metric. Hence almost all signum matrices are also distant from the low-rank matrices with \operatorname{rank} B\leqslant cN in the L_0-distance. Now we state a corollary in terms of random signum matrices (with equiprobable signs).
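Numerically, for r=cN the counting bound reads (4eN/r)^{2rN}=(4e/c)^{2cN^2}=2^{b(c)N^2} with b(c)=2c\log_2(4e/c). The sketch below (an illustration, not from the paper) evaluates this exponent and confirms that it vanishes as c\to0.

```python
import math

def b(c):
    # Exponent in the count of N x N signum matrices of signum rank <= cN:
    # (4eN/(cN))^(2cN*N) = (4e/c)^(2cN^2) = 2^(b(c) * N^2).
    return 2 * c * math.log2(4 * math.e / c)

for c in (0.1, 0.01, 0.001, 1e-6):
    print(f"c = {c:g}: b(c) = {b(c):.6f}")
```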

Statement 3.1. Let \mathbf{\mathcal E} be a random matrix with independent \pm1 entries. Then

\begin{equation*} \mathsf{P}\Bigl(\,\min_{\operatorname{rank} B\leqslant c N}\|\mathbf{\mathcal E}-B\|_{L_0^{N\times N}}\leqslant c\Bigr)\leqslant 2\exp(-c N^2). \end{equation*} \notag

Let D_N(a,b) be the subset of L(a,b) consisting of functions that are piecewise constant on the intervals

\begin{equation*} \biggl(a+\frac{(b-a)(j-1)}{N},\, a+\frac{(b-a)j}{N}\biggr),\qquad j=1,\dots,N. \end{equation*} \notag

Statement 3.2. Let \mathbf{\varphi}_1,\dots,\mathbf{\varphi}_N be a random system in D_N(0,1) such that \varphi_{k,j} :=\varphi_k|_{((j-1)/N,j/N)}=\varepsilon_{k,j}, where the \varepsilon_{k,j} are independent random signs. Then

\begin{equation*} d^{\mathrm{avg}}_n(\{\mathbf{\varphi}_1,\dots,\mathbf{\varphi}_N\},L_0)\geqslant c,\quad\textit{if $n\leqslant cN$} \end{equation*} \notag
with probability at least 1-2\exp(-cN^2).

Note that, although we use independent random variables in the definition of \varphi_k, the system \{\varphi_k\}_1^N is by no means independent.

Proof of Statement 3.2. Let us prove that if \{\varphi_k\} is not rigid in L_0(0,1), then the matrix \Phi := (\varphi_{k,j}) can be well approximated in L_0^{N\times N}; this will contradict Statement 3.1.

Consider an approximation \varphi_k\approx g_k in L_0 by functions in some n-dimensional subspace. There is a technical subtlety: we cannot claim that g_k\in D_N (one cannot average over intervals because such an operator is not defined in L_0). Instead, we use Lemma 3.3: it shows that the rigidity of the system \xi_k := \varphi_k is equivalent to the rigidity of the vector (\xi_1,\dots,\xi_N), hence it is determined by the distribution of this vector. So we can assume that the measure on the space (0,1), where everything is defined, consists of ‘atoms’ ((j-1)/N,j/N).

Finally, if \varphi_k\approx g_k, g_k\in D_N, and the average of \|\varphi_k - g_k\|_{L_0} is less than \delta, then we have \|\varphi_k-g_k\|_{L_0}\leqslant \gamma := \delta^{1/2} for all but at most \gamma N indices. Therefore, \|\Phi-G\|_{L_0^{N\times N}} \leqslant 2\gamma, where G := (g_{k,j}) is the matrix of values of the functions g_k.

The statement is proved.

Corollary 3.4. There exists an orthonormal system \varphi_1,\dots,\varphi_N\in D_N(0,1) that is rigid in L_0:

\begin{equation*} d_n^{\mathrm{avg}}\bigl(\{\varphi_1,\dots,\varphi_N\}, L_0(0,1)\bigr) \geqslant c \quad\textit{if } n\leqslant cN. \end{equation*} \notag

Proof. We construct a system \varphi_1,\dots,\varphi_N\!\in\! D_{2N}(0,2) instead. We define \varphi_1,\dots,\varphi_N on (0,1) using random values \varphi_i|_{((j-1)/N,j/N)}=\pm\delta for some \delta to be specified below. This system will be L_0-rigid with high probability.

Then we extend this system to (0,2) to make it orthonormal. A necessary and sufficient condition for the existence of such an extension is known:

\begin{equation*} \max_{|a|=1}\biggl\|\sum_{k=1}^N a_k\varphi_k\biggr\|_{L_2(0,1)} \leqslant 1 \end{equation*} \notag
(see, for instance, [27], Ch. 8). If \delta is sufficiently small, then this condition holds true with high probability, because the spectral norm of a random \pm1 matrix is O(N^{1/2}); see, for example, [22], Ch. 4.

Moreover, we can put the functions \varphi_k|_{(1,2)} in an arbitrary N-dimensional space (preserving the integrals \displaystyle\int_1^2\varphi_k\varphi_l), for example, in D_N(1,2).

The corollary is proved.

3.4. Lacunary systems

Let \lambda>1. We say that a sequence of positive integers k_1,\dots,k_N is \lambda-lacunary if k_{j+1}/k_j\geqslant\lambda for j=1,\dots,N-1.

Statement 3.3. Let \varphi be a Riemann-integrable Borel function on \mathbb{T} such that \mu\{x\in\mathbb{T}\colon \varphi(x)\ne a\}>0 for each a\in\mathbb{R}. Then the following hold.

1. There exists \lambda=\lambda(\varphi)>1 such that for any \lambda-lacunary sequence k_1,\dots,k_N we have

\begin{equation*} d_n^{\mathrm{avg}}\bigl(\{\varphi(k_1x),\dots,\varphi(k_Nx)\},L_0(\mathbb{T})\bigr)\geqslant c(\varphi) \quad\textit{if } n\leqslant c(\varphi) N. \end{equation*} \notag

2. For any \varepsilon>0 there exists \lambda=\lambda(\varphi,\varepsilon)>1 such that for any \lambda-lacunary sequence k_1,\dots,k_N we have

\begin{equation*} d_n^{\mathrm{avg}}\bigl(\{\varphi(k_1x),\dots,\varphi(k_Nx)\},L_1(\mathbb{T})\bigr)\geqslant c(\varphi,\varepsilon) \quad\textit{if } n\leqslant N(1-\varepsilon). \end{equation*} \notag

We use the following quantitative result.

Theorem F (see [28], Theorem 4.3). Let (k_j) be an increasing sequence of positive integers. Then there exists a sequence (g_j) of measurable functions on (0,1) which are independent and uniformly distributed over (0,1) and such that

\begin{equation*} \mu\biggl\{x\in[0,1]\colon |\{k_j x\}-g_j(x)|>\frac{2k_j}{k_{j+1}}\biggr\}\leqslant \frac{2k_j}{k_{j+1}}, \qquad j=1,2,\dotsc\,. \end{equation*} \notag

Now we can prove our statement.

Proof of Statement 3.3. Let us start with the L_1-case. First of all, we can shift and rescale \varphi to obtain \displaystyle\int_\mathbb{T}\varphi(x)\,dx=0 and \displaystyle\int_\mathbb{T}|\varphi(x)|\,dx=1 (this does not affect rigidity). Given a \lambda-lacunary sequence k_1,\dots,k_N for some \lambda>1, we apply Theorem F to obtain functions g_1,\dots,g_N satisfying \|\{k_j x\}-g_j(x)\|_{L_0} \leqslant 2/\lambda.

Put f_j(x):=\varphi(g_j(x)). Then the functions f_j are independent, \displaystyle\int f_j=0 and \displaystyle\int|f_j|=1. Corollary 3.1 implies the rigidity of \{f_j\}. To prove the rigidity of \{\varphi(k_jx)\} we have to bound \|\varphi(k_jx)-f_j\|_1:

\begin{equation*} \begin{aligned} \, &\int_\mathbb{T}|\varphi(k_jx)-f_j(x)|\,dx \\ &\qquad=\int_A|\varphi(k_jx)-\varphi(g_j(x))|\,dx+ \int_{\mathbb{T}\setminus A}|\varphi(k_jx)-\varphi(g_j(x))|\,dx, \end{aligned} \end{equation*} \notag
where A=\{x\colon |\{k_j x\}-g_j(x)|>2/\lambda\}. The first integral is at most 2\|\varphi\|_\infty\mu(A) \leqslant 4\|\varphi\|_\infty/\lambda. The second integral does not exceed
\begin{equation*} \int_{\mathbb T}\sup_{|x-t|\leqslant 2/\lambda}|\varphi(t)-\varphi(x)|\,dx=: \tau\biggl(\varphi,\frac{4}\lambda\biggr), \end{equation*} \notag
where \tau is the averaged modulus of continuity. It is known that {\lim_{h\to 0}\tau(\varphi,h)=0} for Riemann-integrable functions. Hence, for sufficiently large \lambda the value of \|\varphi(k_jx)-f_j(x)\|_1 is sufficiently small. This proves the L_1-case.

The L_0-case also follows because

\begin{equation*} \|\varphi(k_jx)-f_j(x)\|_{L_0} \leqslant \|\varphi(k_jx)-f_j(x)\|_1^{1/2} \end{equation*} \notag
and we can use Corollary 3.3.
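For completeness, the last inequality is an instance of Chebyshev's inequality: for \zeta:=\varphi(k_jx)-f_j(x) and t:=\|\zeta\|_1^{1/2} we have

\begin{equation*} \mu\{x\colon |\zeta(x)|>t\} \leqslant \frac{\|\zeta\|_1}{t}=t, \end{equation*} \notag
so \|\zeta\|_{L_0}\leqslant t=\|\zeta\|_1^{1/2}.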

The statement is proved.

3.5. Complemented subspaces

In this subsection we outline another method for the proof of rigidity, which is based on the notion of complementability.

Let f_1,\dots,f_N be an orthonormal system in L_2(0,1). Suppose that we aim to prove the rigidity of \{f_j\} in a larger space X\supset L_2. If there exists a bounded linear operator \pi\colon X\to L_2 that preserves the system: \pi f_j=f_j, j=1,\dots,N, then we obtain rigidity in X:

\begin{equation*} \|\pi\|_{X\to L_2} d_n^{\mathrm{avg}}(\{f_1,\dots,f_N\},X)_2 \geqslant d_n^{\mathrm{avg}}(\{f_1,\dots,f_N\},L_2)_2=\biggl(1-\frac nN\biggr)^{1/2}. \end{equation*} \notag

Let us state a corollary.

Statement 3.4. Let \{f_j\}_1^\infty be an orthonormal system in X\supset L_2 such that two conditions are satisfied:

Then

\begin{equation*} d_n^{\mathrm{avg}}(\{f_{i_1},\dots,f_{i_N}\},X)_2\geqslant c\sqrt{1-\frac nN}, \quad\textit{where } c=c(F_X,X)>0, \end{equation*} \notag
for all i_1<i_2<\dots<i_N.

Corollary 3.5. The Rademacher chaos \{r_ir_j\}_{1\leqslant i<j<\infty} is rigid in L\log L: for any set of pairs \Lambda we have

\begin{equation*} d_n^{\mathrm{avg}}(\{r_ir_j\}_{(i,j)\in\Lambda}, L\log L)_2 \geqslant c(\varepsilon)>0 \quad\textit{if } n<|\Lambda|(1-\varepsilon). \end{equation*} \notag

Proof. Let X be a symmetric space. It is known that for the Rademacher chaos property (1) is equivalent to the embedding X\supset H (see [29], Theorem 6.4), where the space H is the closure of L_\infty in the Orlicz space L_{\varphi_1}, \varphi_1(t)=e^t-1. A criterion for complementability is the double embedding H\subset X\subset H' (see [29], Theorem 6.7). Thus, for X=H' both conditions are satisfied. The space H' coincides with the Lorentz space \Lambda(\psi_1) of functions satisfying \displaystyle\int_0^1 f^*(t)\log(2/t)\,dt<\infty (see [29], §§ 2.2 and 2.4). Further, this Lorentz space coincides with the space L\log L of functions satisfying \displaystyle\int_0^1 |f|\log(2+|f|)\,dt<\infty (see, for example, [30], Theorem D).

The corollary is proved.

§ 4. Rigidity in L_p, 1<p<2

4.1. The S_{p'}-property

We start with a finite-dimensional result. Its proof follows the paper [31] by Gluskin.

Theorem 4.1. Let 1<p<2 and N\in\mathbb N. Suppose that \xi=(\xi_1,\dots,\xi_N) is an isotropic random vector in \mathbb{R}^N: \mathsf{E}\xi_i^2=1 and \mathsf{E}\xi_i\xi_j=0, i\ne j, that satisfies two conditions:

\begin{equation} (\mathsf{E}|\xi|^{2+\gamma})^{1/(2+\gamma)} \leqslant AN^{1/2} \quad\textit{for some } \gamma>0, \quad A\geqslant 1, \end{equation} \tag{17}
and
\begin{equation} \max_{|v|=1}(\mathsf{E}|\langle \xi,v\rangle|^{p'})^{1/{p'}} \leqslant B\quad\textit{for some $B\geqslant1$}. \end{equation} \tag{18}
Then for any \varepsilon\in(0,1), any n\leqslant N(1-\varepsilon) and any n-dimensional subspace Q_n\subset\mathbb{R}^N we have
\begin{equation*} \mathsf{P}\bigl\{\rho(\xi,Q_n)_{\ell_p^N}\geqslant c(A,\varepsilon,\gamma)N^{1/p} B^{-1}\bigr\} \geqslant c(A,\varepsilon,\gamma). \end{equation*} \notag

Proof. Let P be the orthogonal projection onto the space Q_n^\perp; then
\begin{equation} \rho(\xi,Q_n)_p=\max_{z\in Q_n^\perp} \frac{\langle \xi,z\rangle}{\|z\|_{p'}} \geqslant \frac{\langle \xi,P\xi\rangle}{\|P\xi\|_{p'}}. \end{equation} \tag{19}

First we analyze \eta := \langle\xi,P\xi\rangle, the numerator of (19):

\begin{equation*} \mathsf{E}\eta=\mathsf{E}\Bigl\langle\sum\xi_i e_i,\sum \xi_i Pe_i\Bigr\rangle= \sum \langle e_i,Pe_i\rangle=\operatorname{tr} P=N -n \geqslant \varepsilon N. \end{equation*} \notag
On the other hand
\begin{equation*} \begin{aligned} \, \mathsf{E}\eta &\leqslant \frac12\, \varepsilon N+\mathsf{E}\eta\, \mathbf{1} \biggl\{\eta\geqslant\frac12\,\varepsilon N\biggr\} \\ &\leqslant \frac12\,\varepsilon N+ (\mathsf{E}|\eta|^{1+\gamma/2})^{2/(2+\gamma)}\mathsf{P} \biggl(\eta\geqslant\frac12\,\varepsilon N\biggr)^{\gamma/(2+\gamma)}. \end{aligned} \end{equation*} \notag
Since |\eta|=|\langle\xi,P\xi\rangle|\leqslant|\xi|^2, we have \mathsf{E}|\eta|^{1+\gamma/2}\leqslant (AN^{1/2})^{2+\gamma} and we conclude that
\begin{equation} \mathsf{P}\biggl(\eta \geqslant \frac12\, \varepsilon N\biggr)\geqslant t := \biggl(\frac{\varepsilon}{2A^2}\biggr)^{(2+\gamma)/\gamma}. \end{equation} \tag{20}

Now consider the denominator of (19):

\begin{equation*} \mathsf{E}\|P\xi\|_{p'}^{p'}=\sum_{k=1}^N \mathsf{E}|\langle P\xi,e_k\rangle|^{p'}= \sum_{k=1}^N \mathsf{E}|\langle \xi,Pe_k\rangle|^{p'} \leqslant NB^{p'}. \end{equation*} \notag
Using Markov’s inequality we obtain
\begin{equation} \mathsf{P}(\|P\xi\|_{p'}^{p'} \geqslant 2t^{-1} NB^{p'})\leqslant \frac12\, t. \end{equation} \tag{21}

We combine (20) and (21) to obtain the required bound

\begin{equation*} \rho(\xi,Q_n)_p \geqslant \frac{\langle \xi,P\xi\rangle}{\|P\xi\|_{p'}} \geqslant \frac{\varepsilon N/2}{2^{1/p'}t^{-1/p'}N^{1/p'}B} \geqslant \frac14 \, \varepsilon t^{1/2} N^{1/p}B^{-1} \end{equation*} \notag
with probability at least t/2.

The theorem is proved.

Remark 4.1. Note that if condition (18) is satisfied, then (17) also holds for A:=B and \gamma:=p'-2; see the proof of Theorem 4.2. However, in some cases we need a dependence of the form (\cdots)B^{-1} of the lower bound on the parameter B.

Let us formulate a corollary in function-theoretic terms.

Theorem 4.2. Let \varphi_1,\dots,\varphi_N be an orthonormal system in a space L_2(\mathcal X,\mu), \mu(\mathcal X)=1. Let 1<p<2, and suppose that \{\varphi_k\} has the S_{p'}-property, that is, for some B\geqslant1

\begin{equation*} \biggl\|\sum_{k=1}^N a_k\varphi_k \biggr\|_{p'} \leqslant B|a| \quad\forall\, a\in\mathbb{R}^N. \end{equation*} \notag
Then this system is rigid in L_p:
\begin{equation*} d_n^{\mathrm{avg}}(\{\varphi_1,\dots,\varphi_N\},L_p)_p \geqslant c(B,\varepsilon,p) \quad\textit{if } n\leqslant N(1-\varepsilon). \end{equation*} \notag

Proof. Indeed, let x be a random point in \mathcal X distributed according to \mu, and consider the random vector \xi=(\varphi_1(x),\dots,\varphi_N(x)). Condition (18) in Theorem 4.1 is exactly the S_{p'}-property. Let us check that condition (17) is fulfilled for A:=B and \gamma=p'-2. Indeed,
\begin{equation*} \|\varphi_k^2\|_{1+\gamma/2} =\|\varphi_k\|_{2+\gamma}^2=\|\varphi_k\|_{p'}^2 \leqslant B^2 \end{equation*} \notag
and
\begin{equation*} \biggl\|\biggl(\sum_{k=1}^N \varphi_k^2\biggr)^{1/2}\biggr\|_{2+\gamma}^2= \|\varphi_1^2+\dots+\varphi_N^2\|_{1+\gamma/2} \leqslant \sum_{k=1}^N \|\varphi_k^2\|_{1+\gamma/2} \leqslant B^2N. \end{equation*} \notag
Hence d_n^{\mathrm{avg}}(\xi,\ell_p^N)_p is bounded below. It remains to use Statement 2.1.

The theorem is proved.
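As a concrete example of an S_{p'}-system (not taken from the paper): the Rademacher system is an S_4-system with B=3^{1/4}, since Khintchine's equality gives \mathsf{E}\bigl(\sum_k a_k\varepsilon_k\bigr)^4=3|a|^4-2\sum_k a_k^4\leqslant 3|a|^4. The sketch below checks this identity by exact enumeration over sign vectors.

```python
import itertools
import random

def fourth_moment(a):
    # Exact E (sum_k a_k eps_k)^4 over independent signs eps_k = +-1.
    n = len(a)
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        s = sum(ai * si for ai, si in zip(a, signs))
        total += s ** 4
    return total / 2 ** n

random.seed(1)
for _ in range(20):
    a = [random.uniform(-1, 1) for _ in range(8)]
    norm2 = sum(x * x for x in a)
    # Khintchine: E S^4 = 3|a|^4 - 2 sum a_k^4, hence the S_4-bound with B = 3^(1/4)
    assert abs(fourth_moment(a) - (3 * norm2 ** 2 - 2 * sum(x ** 4 for x in a))) < 1e-9
print("Khintchine identity verified")
```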

Corollary 4.1. Let \varphi_1,\dots,\varphi_N be an orthonormal system in L_2(\mathcal X,\mu), \mu(\mathcal X)=1, that is uniformly bounded: \|\varphi_k\|_\infty\leqslant A. Then for 1<p<2 we have

\begin{equation*} d_n(\{\varphi_1,\dots,\varphi_N\},L_p) \geqslant c(A,p)>0 \quad\textit{if } n\leqslant \frac12 N^{2/p'}. \end{equation*} \notag

Proof. Indeed, by a theorem of Bourgain [15] one can find a subsystem of \{\varphi_k\} of size \lfloor N^{2/p'}\rfloor with the S_{p'}-property and apply Theorem 4.2 to this subsystem.

The corollary is proved.

4.2. Additional remarks

Explicit rigid sets

Previous considerations allow us to present explicit constructions of \ell_p-rigid subsets of \{-1,1\}^N of polynomial size.

Statement 4.1. For any p\in(1,2) and any sufficiently large N one can explicitly construct a set V\subset\{-1,1\}^N of size |V|\leqslant N^{C(p)} such that d_{N/2}(V,\ell_p^N)\geqslant c(p) N^{1/p}.

Proof. It is sufficient to construct N characters \chi_1,\dots,\chi_N\in \widehat{\mathbb Z_2^k} (with appropriate k) that form an S_{p'}-system. Then we apply Theorem 4.2: d_{N/2}^\mathrm{avg}(\{\chi_1,\dots,\chi_N\},L_p)_p\geqslant c>0. Hence one can take V:=\{(\chi_1(x),\dots,\chi_N(x))\colon x\in \mathbb Z_2^k\}.

In [32] an S_{2m}-system of 2^n characters in \mathbb Z_2^{mn} was constructed, so it remains to take 2^n\approx N, m=\lceil p'/2\rceil and k=mn.

The statement is proved.

Independence for p>2

An analogue of Theorem 3.1 with normalization in L_p also holds for 1<p\leqslant2. We can easily deduce from Theorem 4.1 that an independent system of functions is rigid (with an additional logarithmic factor); however, this result will be the subject of another paper. The analogue of Theorem 3.1 on the L_1-rigidity of independent functions is not true for p>2.

Statement 4.2. For any p\in(2,\infty) there exists \delta>0 such that for all sufficiently large N there exists a sequence \xi_1,\dots,\xi_N of independent identically distributed random variables such that \mathsf{E}\xi_i=0, \mathsf{E}|\xi_i|^p=1 and

\begin{equation*} d_{N^{1-\delta}}(\{\xi_1,\dots,\xi_N\},L_p) \leqslant C(p)N^{-\delta}. \end{equation*} \notag

Proof. We use the approximation of the octahedron B_1^N in \ell_\infty^N (see [33]). Let n=N^{1-\beta}, where \beta is to be chosen later, and let Q_n^* be an extremal n-dimensional \ell_\infty-subspace:
\begin{equation*} \rho(B_1^N,Q_n^*)_\infty=d_n(B_1^N,\ell_\infty^N) \leqslant C(\beta) n^{-1/2}. \end{equation*} \notag
We will approximately ‘simulate’ vertices of B_1^N. Fix some small \varepsilon>0, and let \xi_1 be a random variable such that
\begin{equation*} \mathsf{P}(\xi_1=0)=1-\varepsilon\quad\text{and} \quad \mathsf{P}(\xi_1=K)=\mathsf{P}(\xi_1=-K)=\frac\varepsilon2, \end{equation*} \notag
where K is defined by the condition \mathsf{E}|\xi_1|^p=1; so that \varepsilon K^p=1. Consider independent copies \xi_2,\dots,\xi_N of \xi_1 and the vector \xi=(\xi_1,\dots,\xi_N). We construct an approximation vector \eta that lies in Q_n^* (so that \eta_1,\dots,\eta_N lie in an n-dimensional subspace of L^p).

Consider three events: \mathcal{A}_0=\{\xi=0\}, \mathcal{A}_1: exactly one coordinate of \xi is nonzero, and \mathcal{A}_{2}: at least two coordinates of \xi are nonzero. Set \eta(\omega):=0 for \omega\in \mathcal{A}_0\cup \mathcal{A}_{2}, and let \eta(\omega) be the best approximation to \xi(\omega) from Q_n^* for \omega\in \mathcal{A}_1. In the latter case we have \|\xi-\eta\|_\infty \leqslant K\rho(B_1^N,Q_n^*)_\infty.

By symmetry it is sufficient to estimate \mathsf{E}|\xi_1-\eta_1|^p:

\begin{equation*} \mathsf{E}|\xi_1-\eta_1|^p \leqslant \mathsf{P}(\mathcal{A}_1) K^p\rho(B_1^N,Q_n^*)_\infty^p+ \mathsf{P}(\mathcal{A}_2) K^p. \end{equation*} \notag
The first term is bounded by
\begin{equation*} N\varepsilon K^p C(\beta)^p n^{-p/2} \ll N^{1-(1-\beta)p/2} \ll N^{-\delta} \end{equation*} \notag
if \beta is sufficiently small. The second term is bounded by N^2\varepsilon^2K^p = N^2\varepsilon, and we can make it small by letting \varepsilon tend to zero.

The statement is proved.

§ 5. Approximation of Walsh and trigonometric functions

Recall some notation. Given a finite Abelian group G, we denote the group of characters \chi\colon G\to\mathbb{C} by \widehat{G}. The probability measure \mu_G on G (\mu_G(A)=|A|/|G|) allows us to define the metrics L_{\mathrm{H}} and L_0 for functions on G.

We are interested in two particular cases, the group \mathbb Z_2^k and the cyclic group \mathbb Z_N.

In the first case characters are real and can be written as W_x\colon y\mapsto (-1)^{\langle x,y\rangle}, x,y\in\mathbb Z_2^k. We can identify these characters with functions in the well-known orthonormal Walsh system (in the Paley numeration) w_0\equiv1,w_1,w_2,\dots . Indeed, for n<2^k we have w_n\bigl(\sum_{j\geqslant 1}y_j2^{-j}\bigr)=W_x(y), where x=(x_1,\dots,x_k)\in\{0,1\}^k, y=(y_1,\dots,y_k)\in\{0,1\}^k and n=\sum_{j=1}^kx_j2^{j-1}.

The second case leads, of course, to the discrete trigonometric system y\mapsto e(xy/N), where e(\theta):=\exp(2\pi i\theta).
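The identification of the characters of \mathbb Z_2^k with Walsh functions can be checked directly. The snippet below (an illustration with a hypothetical helper `walsh_matrix`, not from the paper) builds all characters for k=3 and verifies that they form an orthonormal system with respect to \mu_G.

```python
import itertools

def walsh_matrix(k):
    # Characters W_x(y) = (-1)^{<x,y>} of Z_2^k; rows and columns are
    # indexed by the elements of {0,1}^k in lexicographic order.
    pts = list(itertools.product((0, 1), repeat=k))
    return [[(-1) ** sum(xi * yi for xi, yi in zip(x, y)) for y in pts]
            for x in pts]

k = 3
N = 2 ** k
W = walsh_matrix(k)
# orthonormality in L_2(mu_G): N^{-1} sum_y W_x(y) W_{x'}(y) = delta_{x,x'}
for m in range(N):
    for n in range(N):
        ip = sum(W[m][j] * W[n][j] for j in range(N)) / N
        assert ip == (1 if m == n else 0)
print("all", N, "characters orthonormal; W_0 == 1:", all(v == 1 for v in W[0]))
```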

5.1. Kolmogorov widths

Recall the following well-known statement (see, for instance, [34], § 2.1).

Lemma 5.1. Let G be a finite Abelian group, and let A,B\subset G and |A|+|B|>|G|. Then A+B=\{a+b\colon a\in A,\,b\in B\} = G.

We use an immediate consequence of this lemma.

Lemma 5.2. Let A,B\subset\widehat{G} and |A|+|B|>|G|. Then for any m,n\in\mathbb N

\begin{equation*} d_{mn}(\widehat{G},L_{\mathrm{H}}) \leqslant d_m(A,L_{\mathrm{H}})+d_n(B,L_{\mathrm{H}}). \end{equation*} \notag
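Lemma 5.1 is elementary (for each g\in G the sets A and g-B must intersect, since |A|+|g-B|>|G|), and it can be checked by brute force for cyclic groups; the snippet below is an illustrative test, not part of the paper.

```python
import random

def sumset_covers_group(N, A, B):
    # Check whether A + B = Z_N.
    return len({(a + b) % N for a in A for b in B}) == N

random.seed(2)
N = 30
for _ in range(200):
    ka = random.randint(1, N)
    kb = N + 1 - ka                      # |A| + |B| = N + 1 > N
    A = random.sample(range(N), ka)
    B = random.sample(range(N), kb)
    assert sumset_covers_group(N, A, B)  # Lemma 5.1: A + B is all of Z_N
print("Lemma 5.1 verified on 200 random pairs in Z_30")
```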

The next theorem is proved using methods from [8]. It is a modification of Theorem 1.2 in that paper.

Theorem 5.1. Let k\in\mathbb N. Then for any 1\leqslant \lambda\leqslant \sqrt{k}/4 we have

\begin{equation*} d_n(\widehat{\mathbb Z_2^k},L_{\mathrm{H}}) \leqslant 4\exp(-2\lambda^2) \quad \textit{if } n\geqslant \biggl(\frac{k}{\lambda^2}\biggr)^{4\lambda\sqrt{k}}. \end{equation*} \notag

Proof. We identify \mathbb Z_2^k\equiv\{0,1\}^k and denote characters by W_x:
\begin{equation*} W_x\colon y\mapsto (-1)^{\langle x,y\rangle}, \qquad x,y\in\{0,1\}^k. \end{equation*} \notag
Our goal is to approximate \{W_x\} in L_{\mathrm{H}}. Let \mathcal X be the set of x such that \bigl|\sum_{i=1}^k x_i- k/2\bigr|\leqslant\sqrt{k}. Hoeffding’s inequality shows that |\mathcal X|>2^{k-1}, so it is enough to approximate W_x for x\in\mathcal X (and then use Lemma 5.2).

Fix x\in\mathcal X. Let I:=\{i\colon x_i=1\} and

\begin{equation*} Y_I :=\biggl\{y\in\{0,1\}^k\colon \biggl|\sum_{i\in I} y_i - \frac{|I|}2\biggr| \leqslant \lambda|I|^{1/2}\biggr\}. \end{equation*} \notag
Then 2^{-k}|Y_I| \geqslant 1-2\exp(-2\lambda^2); moreover, for y\in Y_I we have
\begin{equation*} \begin{aligned} \, \biggl|\langle x,y\rangle-\frac{k}{4}\biggr| &=\biggl|\sum_{i\in I}y_i - \frac k4\biggr| \leqslant\biggl|\sum_{i\in I}y_i-\frac{|I|}2\biggr| +\biggl|\frac{|I|}2-\frac k4\biggr| \\ &\leqslant\lambda\sqrt{|I|}+\frac{\sqrt{k}}2 \leqslant\lambda\sqrt{\frac k2+\sqrt{k}}+\frac{\sqrt{k}}2 \leqslant 2\lambda\sqrt{k}. \end{aligned} \end{equation*} \notag
We consider the polynomial q(t) that interpolates (-1)^t for t\in\mathbb Z\cap[k/4-2\lambda\sqrt{k}, k/4+2\lambda\sqrt{k}]. Its degree d is at most 4\lambda\sqrt{k}; note that 4\lambda\sqrt{k}\leqslant k. Then W_x(y)=q(\langle x,y\rangle) for all y\in Y_I, hence
\begin{equation*} \|W_x - q(\langle x,\,\cdot\,\rangle)\|_{L_{\mathrm{H}}} \leqslant 2\exp(-2\lambda^2). \end{equation*} \notag

Let us bound the linear dimension of the family of functions \{q(\langle x,\,\cdot\,\rangle)\}_x (it is sufficient to consider x\in\mathcal X, but it is more convenient to work with all x\in\{0,1\}^k). Their dimension is equal to the rank of the matrix (q(\langle x,y\rangle))_{x,y\in\{0,1\}^k}, where x indexes rows and y indexes columns. The rank is bounded by the number of monomials in the polynomial q(x_1y_1+\dots+x_ky_k), because each monomial generates a rank-one matrix. Every monomial is a product of several of the x_iy_i, so their total number is at most

\begin{equation*} \operatorname{rank}(q(\langle x,y\rangle)_{x,y\in\{0,1\}^k}) \leqslant \binom{k}{0}+\dots+\binom{k}{d} \leqslant \biggl(\frac{ek}d\biggr)^d \leqslant \biggl(\frac{\sqrt{k}}\lambda\biggr)^{4\lambda\sqrt{k}}. \end{equation*} \notag
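The interpolation-plus-rank mechanism can be illustrated numerically. In the sketch below the parameters (k=10 and the interpolation window \{3,\dots,7\}) are chosen for the demonstration rather than taken from the proof; we check that q(\langle x,y\rangle) reproduces W_x(y) whenever \langle x,y\rangle lies in the window, and that the monomial count bounds the rank:

```python
import numpy as np
from itertools import product
from math import comb

# Illustrative sketch of the probabilistic-rank construction (toy parameters):
# interpolate (-1)^t by a polynomial q on a window of integers; then
# q(<x,y>) agrees with W_x(y) = (-1)^<x,y> whenever <x,y> is in the window,
# while the matrix (q(<x,y>))_{x,y} has rank at most C(k,0)+...+C(k,d).
k = 10
window = np.arange(3, 8)                        # interpolation nodes; d = 4
coeffs = np.polyfit(window, (-1.0) ** window, deg=len(window) - 1)
q = np.poly1d(coeffs)

X = np.array(list(product([0, 1], repeat=k)))   # all of {0,1}^k
G = X @ X.T                                     # inner products <x,y>
M = q(G)

# agreement with (-1)^<x,y> on the window
mask = (G >= window[0]) & (G <= window[-1])
assert np.allclose(M[mask], (-1.0) ** G[mask])

# the monomial count bounds the rank
d = len(window) - 1
bound = sum(comb(k, i) for i in range(d + 1))
assert np.linalg.matrix_rank(M) <= bound
```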
To finish the proof it remains to apply Lemma 5.2.

The theorem is proved.

Corollary 5.1. Let w_1,\dots,w_N,\dots be the Walsh system. Then

\begin{equation*} d_n(\{w_1,\dots,w_N\},L_{\mathrm{H}}[0,1])\leqslant 2\exp(-c\log^{0.3}N) \quad\textit{if } n\geqslant\exp(C\log^{0.7}N). \end{equation*} \notag

Theorem 5.2. Let k\in\mathbb N. For 1\leqslant \lambda \leqslant c_0k^{1/4} we have

\begin{equation*} d_n(\widehat{\mathbb Z_{2^k}},L_0) \leqslant c_1k\exp(-c_2\lambda^2) \quad \textit{if } n\geqslant \biggl(\frac k{\lambda^2}\biggr)^{c_3\lambda^3\sqrt{k}}. \end{equation*} \notag

Proof. We have to approximate characters y\mapsto e(xy/2^k). We write x=\sum_{i=0}^{k-1}x_i2^i and y=\sum_{j=0}^{k-1}y_j2^j to identify x,y\in\mathbb Z_{2^k} with the binary vectors (x_0,\dots,x_{k-1}) and (y_0,\dots,y_{k-1}). Using the 1-periodicity of e(\,\cdot\,) we obtain
\begin{equation*} e\biggl(\frac{xy}{2^k}\biggr)=e\biggl(\sum_{i,j=0}^{k-1}x_iy_j2^{i+j-k}\biggr) =e\biggl(\sum_{s=1}^k2^{-s}\sum_{i+j=k-s}x_iy_j\biggr). \end{equation*} \notag
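The expansion above can be checked directly; the following sketch (toy k=6) verifies that the terms with i+j\geqslant k indeed drop out by the 1-periodicity of e(\,\cdot\,):

```python
import numpy as np

# Numerical check of the bit-expansion identity (toy k): with
# x = sum x_i 2^i, y = sum y_j 2^j and e(t) = exp(2 pi i t), only the
# pairs with i + j < k survive modulo 1.
def e(t):
    return np.exp(2j * np.pi * t)

k = 6
rng = np.random.default_rng(0)
for x, y in rng.integers(0, 2 ** k, size=(100, 2)):
    xb = [(x >> i) & 1 for i in range(k)]
    yb = [(y >> j) & 1 for j in range(k)]
    rhs = sum(2.0 ** (-s) * sum(xb[i] * yb[k - s - i] for i in range(k - s + 1))
              for s in range(1, k + 1))
    assert np.isclose(e(x * y / 2 ** k), e(rhs))
```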

We fix a number s_0\leqslant\sqrt{k} and estimate the tail:

\begin{equation*} \sum_{s>s_0}2^{-s}\sum_{i+j=k-s}x_iy_j\leqslant k\sum_{s>s_0}2^{-s}\leqslant k2^{-s_0}. \end{equation*} \notag
The function e(\,\cdot\,) is (2\pi)-Lipschitz, so we have a uniform approximation
\begin{equation} \biggl|e\biggl(\frac{xy}{2^k}\biggr) - \psi_x(y)\biggr| \leqslant 2\pi k2^{-s_0}, \quad \psi_x(y) :=e\biggl(\sum_{s=1}^{s_0}2^{-s}\sum_{i+j=k-s}x_iy_j\biggr). \end{equation} \tag{22}
Hence it is sufficient to approximate the functions \psi_x.

As in the previous proof, first we deal with the set \mathcal X:=\bigl\{x\colon \bigl|\sum_{i=0}^{k-1}x_i-k/2\bigr|\leqslant\sqrt{k}\,\bigr\}. Fix x\in\mathcal X.

Fix an index s\in\{1,\dots,s_0\}. Then we have

\begin{equation*} \biggl|\sum_{i=0}^{k-s}x_i-\frac{k}2\biggr|\leqslant\sqrt{k}+s\leqslant2\sqrt{k}. \end{equation*} \notag
Let J_s=\{j\in\{0,\dots,k-s\}\colon x_{k-s-j}=1\}; then ||J_s|-k/2|\leqslant2\sqrt{k} and
\begin{equation*} \biggl|\sum_{j\in J_s}y_j-\frac12|J_s|\biggr|\leqslant\lambda|J_s|^{1/2} \end{equation*} \notag
for random y\in\{0,1\}^k with probability at least 1-2\exp(-2\lambda^2). For such y we have
\begin{equation*} \begin{aligned} \, \biggl|\sum_{i+j=k-s}x_iy_j-\frac{k}{4}\biggr| &=\biggl|\sum_{j\in J_s}y_j-\frac{k}{4}\biggr| \leqslant \biggl|\sum_{j\in J_s}y_j-\frac{|J_s|}{2}\biggr|+\biggl|\frac{|J_s|}{2}-\frac{k}{4}\biggr| \\ &\leqslant \lambda\sqrt{\frac{k}{2}+2\sqrt{k}}+\sqrt{k}\leqslant 2\lambda\sqrt{k}. \end{aligned} \end{equation*} \notag
(Here we may assume that k is rather large.)

We use interpolation again. There exists a polynomial q_s of degree d\leqslant 4\lambda\sqrt{k} which interpolates the function t\mapsto e(2^{-s}t) at the points \mathbb Z\cap [k/4-2\lambda\sqrt{k},k/4+2\lambda\sqrt{k}]. Then the function e\bigl(2^{-s}\sum_{i+j=k-s}x_iy_j\bigr) is approximated by q_s(\langle x,y\rangle) and the error in the Hamming metric is at most 2\exp(-2\lambda^2); moreover, \dim\{q_s(\langle x,\,\cdot\,\rangle)\}_x \leqslant (ek/d)^d \leqslant (\sqrt{k}/\lambda)^{4\lambda\sqrt{k}}.

The function that we wish to approximate is a product:

\begin{equation} \psi_x(y)=\prod_{s=1}^{s_0} e\biggl(2^{-s}\sum_{i+j=k-s}x_iy_j\biggr). \end{equation} \tag{23}
We approximate it by the product of the corresponding polynomials. The error in the Hamming metric is at most 2s_0\exp(-2\lambda^2). The rank of the product is at most the product of the ranks, that is, (\sqrt{k}/\lambda)^{4s_0\lambda\sqrt{k}}.

We use Lemma 5.2 to approximate \psi_x in L_{\mathrm{H}} for all x. Finally, we use (22) to obtain an L_0-approximation of the characters with accuracy 2\pi k2^{-s_0}+4s_0\exp(-2\lambda^2) by using dimension \leqslant (k/\lambda^2)^{4s_0\lambda\sqrt{k}}. It remains to take s_0\asymp\lambda^2.

The theorem is proved.

Corollary 5.2. The following inequality holds:

\begin{equation*} d_n(\{e(kx)\}_{k=-N}^N, L_0(\mathbb T))\leqslant 2\exp(-c\log^{0.2}N) \quad\textit{if } n\geqslant\exp(C\log^{0.9}N). \end{equation*} \notag

To approximate the continuous functions, we first discretize: we approximate them by functions that are piecewise constant on a sufficiently fine partition of the interval into 2^k subintervals, and then approximate the resulting matrix (of the discrete Fourier transform) using the above result.

5.2. Trigonometric widths in L_p, p<1

Recall that the trigonometric widths d_n^{\mathrm{T}} correspond to approximation by subspaces spanned by functions in the trigonometric system.

We consider two cases in parallel: the continuous case with harmonics e(kx) on \mathbb{T} and the discrete case with harmonics e(kx/N) on \mathbb Z_N. We use the shorthand L_p^N for the space L_p(\mathbb Z_N) with the usual (quasi)norm

\begin{equation*} \|f\|_{L_p^N}=\biggl(\frac1{N}\sum_{x\in\mathbb Z_N}|f(x)|^p\biggr)^{1/p}. \end{equation*} \notag

Theorem 5.3. For any p\in(0,1) and for sufficiently large N we have

\begin{equation} d_n^{\mathrm{T}}(\{e(kx)\}_{k=-N}^N,L_p) \leqslant \log^{-c_1(p)}N \quad\textit{if } n\geqslant N\exp(-\log^{c_2}N) \end{equation} \tag{24}
and
\begin{equation} d_n^{\mathrm{T}}\biggl(\biggl\{e\biggl(\frac{kx}N\biggr)\biggr\}_{k\in \mathbb Z_N}, L_p^N\biggr) \leqslant \log^{-c_1(p)}N \quad\textit{if } n\geqslant N\exp(-\log^{c_2}N). \end{equation} \tag{25}

Consider the spaces of trigonometric polynomials of degree at most m:

\begin{equation*} \mathcal T_m :=\biggl\{\sum_{k=-m}^m c_k e(kx)\biggr\} \quad\text{and}\quad \mathcal T_m^N :=\biggl\{\sum_{k=-m}^m c_k e\biggl(\frac{kx}N\biggr)\biggr\}. \end{equation*} \notag

Lemma 5.3. For any p\in(0,1) there exists \delta=\delta(p)>0 such that for all {m,N\in\mathbb N}, N\geqslant m we have

\begin{equation*} \min\{\|T\|_p\colon T\in\mathcal T_m, \,\widehat{T}(0)=1\} \leqslant C(p)m^{-\delta} \end{equation*} \notag
and
\begin{equation*} \min\{\|T\|_{L_p^N}\colon T\in\mathcal T_m^N,\,\widehat{T}(0)=1\} \leqslant C(p)m^{-\delta}. \end{equation*} \notag

For the proof one can take the Fejér kernel and use the well-known inequality |K_{m-1}(x)|\leqslant \min(m,c/(mx^2)).
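A numerical sketch of Lemma 5.3 on the discrete circle \mathbb Z_N (the parameters below are chosen only for the demonstration): the normalized Fejér kernel has \widehat{K}(0)=1, while its L_p^N-quasinorm for p<1 decays polynomially in m.

```python
import numpy as np

# Numerical sketch (toy parameters): the Fejer kernel on Z_N,
# K_{m-1}(x) = (1/m) |sum_{l=0}^{m-1} e(lx/N)|^2, has hat{K}(0) = 1
# and small L_p^N-quasinorm for p < 1 (here p = 1/2).
def fejer_lp(m, N, p):
    x = np.arange(N)
    D = np.abs(np.exp(2j * np.pi * np.outer(np.arange(m), x) / N).sum(axis=0))
    K = D ** 2 / m
    assert np.isclose(K.mean(), 1.0)     # hat{K}(0) = mean of K = 1
    return np.mean(K ** p) ** (1 / p)

N, p = 2 ** 12, 0.5
norms = [fejer_lp(m, N, p) for m in (4, 16, 256)]
assert norms[0] > norms[1] > norms[2]    # the quasinorm decays in m
assert norms[2] < 0.1                    # already small for m = 256
```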

Lemma 5.4. Let m,N\in\mathbb N, N \geqslant C m^4. Then there exists a set \Lambda\subset\mathbb Z_N such that for every k\in\mathbb Z_N one can find a step h\in\mathbb Z_N^* with k\pm h,k\pm 2h,\dots,k\pm mh\in\Lambda. Analogously, in the continuous case there exists a set \Lambda\subset\{-2N,\dots,2N\} such that for every k\in\{-N,\dots,N\} one can find a step h\in\mathbb N with k\pm h,\dots,k\pm mh\in\Lambda.

Moreover, in both cases

\begin{equation*} |\Lambda|\leqslant C N^{1-{1}/(2m)}\log^{1/m}N. \end{equation*} \notag

We now prove Theorem 5.3. We consider only the discrete case; the continuous case is completely analogous.

Proof of Theorem 5.3. We fix some number m and use Lemma 5.4 to find a set \Lambda and Lemma 5.3 to find a polynomial T\in\mathcal T_m^N with \widehat{T}(0)=1. In order to approximate e(kx/N) by the space \mathcal T(\Lambda) of trigonometric polynomials with spectrum in \Lambda we find h\in\mathbb Z_N^* such that k\pm h,\dots,k\pm mh\in\Lambda. Consider the polynomial
\begin{equation*} t(x)=\sum_{l=-m}^m \widehat T(l)e\biggl(\frac{(k+lh)x}{N}\biggr). \end{equation*} \notag
It equals e(kx/N)+s(x), where \operatorname{supp}\widehat{s}\subset\Lambda, so the error of approximation of e(kx/N) by -s\in\mathcal T(\Lambda) equals \|t\|_{L_p^N}. Note that t(x)=e(kx/N)T(hx), and since x\mapsto hx is a bijection of \mathbb Z_N,
\begin{equation*} \|t\|_{L_p^N}=\|T(hx)\|_{L_p^N}=\|T(x)\|_{L_p^N} \ll m^{-\delta}. \end{equation*} \notag
It remains to take m\asymp \log^{1/2} N and obtain |\Lambda|\leqslant N\exp(-(\log N)^{1/2+o(1)}).

The theorem is proved.
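The modulation trick used in the proof can be illustrated numerically. In the sketch below the parameters N, m, k, h and the coefficients of T are hypothetical, chosen only for the demonstration: multiplying T(hx) by e(kx/N) moves the spectrum onto the progression \{k+lh\} while preserving the quasinorm.

```python
import numpy as np

# Illustrative sketch (hypothetical parameters): if T has spectrum in
# {-m,...,m} and hat{T}(0) = 1, then t(x) = e(kx/N) T(hx) has spectrum in
# {k + lh mod N} and the same L_p^N-quasinorm, because |e(.)| = 1 and
# x -> hx permutes Z_N when gcd(h, N) = 1.
N, m, k, h = 101, 3, 10, 7           # prime N, so every nonzero h is a unit
rng = np.random.default_rng(1)
coef = rng.standard_normal(2 * m + 1)
coef[m] = 1.0                        # hat{T}(0) = 1

x = np.arange(N)
e = lambda t: np.exp(2j * np.pi * t)
T = sum(coef[l + m] * e(l * x / N) for l in range(-m, m + 1))
t = e(k * x / N) * T[(h * x) % N]

# the spectrum of t sits on the progression {k + lh mod N}
that = np.fft.fft(t) / N
supp = {j for j in range(N) if abs(that[j]) > 1e-9}
assert supp <= {(k + l * h) % N for l in range(-m, m + 1)}
assert np.isclose(that[k % N], 1.0)  # the coefficient at frequency k is 1

# the quasinorms coincide (p = 1/2)
p = 0.5
lp = lambda f: np.mean(np.abs(f) ** p) ** (1 / p)
assert np.isclose(lp(t), lp(T))
```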

Now we prove Lemma 5.4.

Proof of Lemma 5.4. We start with the discrete case. Let \Lambda\subset\mathbb Z_N be a random set: a point occurs in \Lambda independently of the other points with some probability \tau\in(0,1).

We call a step h\in\mathbb Z_N^* admissible for k\in \mathbb Z_N if k\pm lh\in\Lambda for l=1,\dots,m. We have

\begin{equation*} \mathsf{P}(h \text{ is admissible for }k)=\tau^{2m}. \end{equation*} \notag
We call k ‘bad’ if there are no admissible steps for k. Assume that for each k there are at least S steps h_1,\dots,h_S\in\mathbb Z_N^* such that the progressions k\pm lh_i, l=1,\dots,m, are disjoint (we estimate S below). Then the events that h_i is admissible for k are independent for different i. Hence the probability that k is bad is at most the probability that none of the h_i is admissible, that is, (1-\tau^{2m})^S.

The expected number of bad k is at most N(1-\tau^{2m})^S < N\exp(-S\tau^{2m}), so with probability \geqslant 2/3 this number does not exceed 3N\exp(-S\tau^{2m}). Moreover, \mathsf{E}|\Lambda|=N\tau, so with probability \geqslant 2/3 we have |\Lambda|\leqslant 3N\tau. Hence there exists \Lambda such that both estimates are true. If \tau is sufficiently small to ensure that

\begin{equation} 3N\exp(-S\tau^{2m}) < 1, \end{equation} \tag{26}
then there are no bad k and \Lambda is the required set.

It remains to estimate S from below. There are |\mathbb Z_N^*|\gg N/\log N possible steps in total. A step h_1 forbids those steps h that solve one of the 2m^2 equations lh \equiv \pm l_1h_1\pmod{N}, where l,l_1\in\{1,\dots,m\}. Each equation has at most \mathrm{gcd}(N,l)\leqslant m solutions, so we can take

\begin{equation*} S \asymp m^{-3}\frac{N}{\log N}. \end{equation*} \notag
Finally, to satisfy (26) we take
\begin{equation*} \tau=\biggl(\frac{\log (4N)}{S}\biggr)^{1/(2m)} \ll \biggl(\frac{\log^2N}{N}\biggr)^{1/(2m)}. \end{equation*} \notag

The continuous case is analogous. Now, \Lambda is a random subset of \{-2N,\dots,2N\} and we consider k\in\{-N,\dots,N\}. To ensure that the progressions \{k\pm lh, {l=1,\dots,m}\} are disjoint we take prime numbers h in the interval (m,N/m]; hence S \asymp m^{-1}N/\log N.

The lemma is proved.
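The random construction can be simulated; the sketch below uses toy parameters and a heuristic choice of \tau (a crude stand-in for the exact formula above, not the paper's constants), and spot-checks that every sampled k admits an admissible step:

```python
import numpy as np

# Numerical sketch of the random construction in Lemma 5.4 (toy parameters):
# include each point of Z_N in Lambda independently with probability tau and
# check that every sampled k has a step h with k +- h, ..., k +- mh in Lambda.
rng = np.random.default_rng(5)
N, m = 10007, 2                      # prime N: every nonzero h is a unit
tau = (np.log(4 * N) / (N / 50)) ** (1 / (2 * m))   # heuristic choice of tau
Lam = set(np.nonzero(rng.random(N) < tau)[0].tolist())

def has_step(k):
    for h in range(1, N):
        if all((k + l * h) % N in Lam and (k - l * h) % N in Lam
               for l in range(1, m + 1)):
            return True
    return False

bad = [k for k in range(0, N, 97) if not has_step(k)]   # spot-check a sample
assert bad == []                     # no bad k among the sampled ones
assert len(Lam) < 0.55 * N           # Lambda is a proper subset, |Lambda| ~ N*tau
```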

§ 6. Further work

In the most interesting case of the L_1-norm there is much more to do.

Question 6.1. Is the trigonometric system \{e(kx)\}_{1\leqslant k\leqslant N} rigid in L_1?

It is obvious that d_n(\{\varphi_1,\dots,\varphi_N\},L_1)\geqslant A^{-1} d_n(B_1^N,\ell_\infty^N) for a uniformly bounded (\|\varphi_k\|_\infty\leqslant A) orthonormal system. So good approximation is impossible for n\leqslant c\log N. Can one improve this bound?

Question 6.2. Prove that d_n(\{w_1,\dots,w_N\},L_1)\geqslant c for n=n(N) such that n/\log N\to\infty.

The independence (or unconditionality) is sufficient for L_1-rigidity but this condition is, of course, very restrictive. Can one find a weaker condition or at least prove the rigidity of some concrete system?

Question 6.3. Is the Rademacher chaos \{r_i(t)r_j(t)\}_{1\leqslant i,j\leqslant N} rigid in L_1?

Cf. Statement 3.4. This question is connected with the next one (also see Statement 4.1).

Question 6.4. Present explicit constructions of \ell_1-rigid sets W\subset \{-1,1\}^N of polynomial (or at least subexponential) size.

In [16] the non-rigidity of DFT matrices was connected with the non-rigidity of circulants. In terms of widths we obtain the following question.

Question 6.5. Let u\in[-1,1]^N, and let T be a cyclic coordinate shift operator. Is it true that

\begin{equation*} d_{N/2}(\{u,Tu,\dots,T^{N-1}u\},\ell_1^N)=o(N)? \end{equation*} \notag

We proceed to the L_0-case.

Question 6.6. Construct a uniformly bounded orthonormal system \varphi_1,\dots,\varphi_N\in D_N(0,1) that is L_0-rigid.

Question 6.7. Is it true that for any finite Abelian group G the set of characters admits a good approximation by very low-dimensional spaces:

\begin{equation*} d_n(\widehat{G},L_0(G))=o(1) \quad\text{for } n=\exp(\log^{1-c}|G|)? \end{equation*} \notag


Bibliography

1. A. Kolmogoroff (Kolmogorov), “Über die beste Annäherung von Funktionen einer gegebenen Funktionenklasse”, Ann. of Math. (2), 37:1 (1936), 107–110
2. G. G. Lorentz, M. von Golitschek and Y. Makovoz, Constructive approximation. Advanced problems, Grundlehren Math. Wiss., 304, Springer-Verlag, Berlin, 1996, xii+649 pp.
3. A. Pinkus, n-widths in approximation theory, Ergeb. Math. Grenzgeb. (3), 7, Springer-Verlag, Berlin, 1985, x+291 pp.
4. Dinh Dũng, V. Temlyakov and T. Ullrich, Hyperbolic cross approximation, Adv. Courses Math. CRM Barcelona, Birkhäuser/Springer, Cham, 2018, xi+218 pp.
5. V. M. Tikhomirov, “Approximation theory”, Analysis, v. II, Encyclopaedia Math. Sci., 14, Convex analysis and approximation theory, Springer-Verlag, Berlin, 1990, 93–243
6. Y. Malykhin, “Matrix and tensor rigidity and L_p-approximation”, J. Complexity, 72 (2022), 101651, 13 pp.
7. S. V. Lokam, “Complexity lower bounds using linear algebra”, Found. Trends Theor. Comput. Sci., 4:1–2 (2008), 1–155
8. J. Alman and R. Williams, “Probabilistic rank and matrix rigidity”, STOC'17 Proceedings of the 49th annual ACM SIGACT symposium on theory of computing (Montreal, QC 2017), ACM, New York, 2017, 641–652
9. S. M. Voronin and N. Temirgaliev, “On some applications of Banach measure”, Izv. Akad. Nauk Kaz. SSR, Ser. Fiz.-Mat., 1984, no. 5, 8–11 (Russian)
10. V. E. Maĭorov, “Kolmogorov's (n,\delta)-widths of spaces of smooth functions”, Russian Acad. Sci. Sb. Math., 79:2 (1994), 265–279
11. J. Creutzig, “Relations between classical, average, and probabilistic Kolmogorov widths”, J. Complexity, 18:1 (2002), 287–303
12. B. S. Kashin, “Lower bounds for m-term approximations in the metric of the discrete space L_n^0”, Russian Math. Surveys, 76:5 (2021), 933–935
13. V. F. Gaposhkin, “Lacunary series and independent functions”, Russian Math. Surveys, 21:6 (1966), 1–82
14. B. S. Kashin, Yu. V. Malykhin and K. S. Ryutin, “Kolmogorov width and approximate rank”, Proc. Steklov Inst. Math., 303 (2018), 140–153
15. J. Bourgain, “Bounded orthogonal systems and the \Lambda(p)-set problem”, Acta Math., 162:3–4 (1989), 227–245
16. Z. Dvir and A. Liu, “Fourier and circulant matrices are not rigid”, 34th computational complexity conference, LIPIcs. Leibniz Int. Proc. Inform., 137, Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2019, 17, 23 pp.
17. V. I. Ivanov and V. A. Yudin, “Trigonometric system in L_p, 0<p<1”, Math. Notes, 28:6 (1980), 884–889
18. T. Oikhberg and M. I. Ostrovskii, “Dependence of Kolmogorov widths on the ambient space”, Zh. Mat. Fiz. Anal. Geom., 9:1 (2013), 25–50
19. E. Novak and H. Woźniakowski, Tractability of multivariate problems, v. I, EMS Tracts Math., 6, Linear information, Eur. Math. Soc., Zürich, 2008, xii+384 pp.
20. R. S. Ismagilov, “On n-dimensional diameters of compacts in a Hilbert space”, Funct. Anal. Appl., 2:2 (1968), 125–132
21. N. N. Vakhania, V. I. Tarieladze and S. A. Chobanyan, Probability distributions on Banach spaces, Math. Appl. (Soviet Ser.), 14, D. Reidel Publishing Co., Dordrecht, 1987, xxvi+482 pp.
22. R. Vershynin, High-dimensional probability. An introduction with applications in data science, Camb. Ser. Stat. Probab. Math., 47, Cambridge Univ. Press, Cambridge, 2018, xiv+284 pp.
23. J. D. Vaaler, “A geometric inequality with applications to linear forms”, Pacific J. Math., 83:2 (1979), 543–553
24. S. Mendelson and R. Vershynin, “Entropy and the combinatorial dimension”, Invent. Math., 152:1 (2003), 37–55
25. N. Alon, P. Frankl and V. Rödl, “Geometrical realization of set systems and probabilistic communication complexity”, SFCS'85 Proceedings of the 26th annual symposium on foundations of computer science (Portland, OR 1985), IEEE Computer Soc., Washington, DC, 1985, 277–280
26. N. Alon, S. Moran and A. Yehudayoff, “Sign rank versus VC dimension”, 29th annual conference on learning theory, Proceedings of Machine Learning Research (PMLR), 49, 2016, 47–80, https://proceedings.mlr.press/v49/alon16.html
27. B. S. Kashin and A. A. Saakyan, Orthogonal series, Transl. Math. Monogr., 75, Amer. Math. Soc., Providence, RI, 1989, xii+451 pp.
28. I. Berkes, “On the uniform theory of lacunary series”, Number theory — Diophantine problems, uniform distribution and applications, Springer, Cham, 2017, 137–167
29. S. V. Astashkin, The Rademacher system in function spaces, Birkhäuser/Springer, Cham, 2020, xx+559 pp.
30. C. Bennett and K. Rudnick, On Lorentz–Zygmund spaces, Dissertationes Math. (Rozprawy Mat.), 175, PWN, Warszawa, 1980, 67 pp.
31. E. D. Gluskin, “Intersections of a cube with an octahedron are poorly approximated by subspaces of small dimension”, Approximation of functions by special classes of operators, Interuniv. Collect. Sci. Works, Min. pros. RSFSR, Vologod. Gos. Ped. Inst., Vologda, 1987, 35–41 (Russian)
32. D. J. Hajela, “Construction techniques for some thin sets in duals of compact abelian groups”, Ann. Inst. Fourier (Grenoble), 36:3 (1986), 137–166
33. B. S. Kashin, “The diameters of octahedra”, Uspekhi Mat. Nauk, 30:4(184) (1975), 251–252 (Russian)
34. Terence Tao and Van H. Vu, Additive combinatorics, Cambridge Stud. Adv. Math., 105, Reprint of the 2006 ed., Cambridge Univ. Press, Cambridge, 2010, xviii+512 pp.

Citation: Yu. V. Malykhin, “Widths and rigidity”, Sb. Math., 215:4 (2024), 543–571