Abstract:
We consider the Kolmogorov widths of finite sets of functions. Any orthonormal system of N functions in L_2 is rigid, that is, it cannot be well approximated by linear subspaces of dimension essentially smaller than N. This is not true for weaker metrics: it is known that in every L_p with p<2 the first N Walsh functions can be o(1)-approximated by a linear space of dimension o(N).
We present some sufficient conditions for rigidity. We prove that independence of functions (in the probabilistic sense) implies rigidity in L_1 and even in L_0, the metric that corresponds to convergence in measure. In the case of L_p with 1<p<2 the condition is weaker: any S_{p'}-system is L_p-rigid.
We also obtain some positive results, for example, that the first N trigonometric functions can be approximated by very low-dimensional spaces in L_0, and by subspaces generated by o(N) harmonics in L_p for p<1.
Bibliography: 34 titles.
This research was performed at Lomonosov Moscow State University and supported by the Russian Science Foundation under grant no. 22-11-00129,
https://rscf.ru/en/project/22-11-00129/.
In this paper we consider the Kolmogorov widths of finite sets of functions. Recall that the Kolmogorov width d_n(K,X) is defined as the infimum over n-dimensional subspaces Q_n of the space X of the deviation \sup_{x\in K}\rho(x,Q_n)_X of the set K from Q_n. This classical notion goes back to the paper [1] by Kolmogorov (1936). Widths are fundamental approximation characteristics of sets, and the behaviour of the sequence (d_n(K,X))_{n=1}^\infty for function classes and other sets, including finite-dimensional ones, was actively studied by many authors. The books [2], Chs. 13 and 14, and [3] are standard references on the subject; see also the recent survey [4], § 4.3.
We will informally say that a set is rigid if it cannot be well approximated by low-dimensional subspaces. The starting point for us is the well-known result that any orthonormal system of functions \varphi_1,\dots,\varphi_N is rigid in L_2(0,1) (of course, this holds true in each Euclidean space):
d_n(\{\varphi_1,\dots,\varphi_N\},L_2)=\sqrt{1-\frac{n}{N}}.
(This equality for widths was first used by Stechkin; see [5] for details.)
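As a quick numerical illustration (ours, not part of the original argument) of why \sqrt{1-n/N} appears: for an orthonormal system \varphi_1,\dots,\varphi_N in \mathbb{R}^N the quadratic mean of the distances to any n-dimensional subspace Q_n is exactly \sqrt{1-n/N}, because \sum_k|P_{Q_n}\varphi_k|^2=\operatorname{tr}P_{Q_n}=n. A sketch in numpy:

```python
import numpy as np

def quadratic_mean_distance(phis, Q):
    """Quadratic mean of Euclidean distances from the rows of `phis`
    (an orthonormal system) to the column span of Q."""
    U, _ = np.linalg.qr(Q)                  # orthonormal basis of the subspace
    proj = phis @ U                         # projection coefficients
    dist_sq = 1.0 - np.sum(proj**2, axis=1) # |phi_k| = 1 for orthonormal rows
    return np.sqrt(dist_sq.mean())

rng = np.random.default_rng(0)
N, n = 64, 20
# Orthonormal system: rows of a random orthogonal matrix.
phis, _ = np.linalg.qr(rng.standard_normal((N, N)))
# A random n-dimensional subspace; any choice gives the same quadratic mean.
Q = rng.standard_normal((N, n))
val = quadratic_mean_distance(phis, Q)
```

Since the quadratic mean does not depend on the subspace, every n-dimensional subspace leaves a maximal distance of at least \sqrt{1-n/N}, which is the lower bound in the equality above.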
The situation changes if we consider weaker metrics.
Theorem A [6]. Let w_1,w_2,\dots be the Walsh system in the Paley numeration. Then for any p\in[1,2) there exists \delta=\delta(p)>0 such that the following inequality holds for sufficiently large N:
d_n(\{w_1,\dots,w_N\},L_p[0,1])\leqslant
This theorem appeared as a corollary of methods introduced in complexity theory for the related problem of matrix rigidity. The rigidity function of a matrix A is defined as the Hamming distance from A to the matrices of prescribed rank:
Matrix rigidity was introduced by Valiant in 1977 as an approach to lower bounds for linear circuit complexity (for example, the inequality \operatorname{Rig}(A_N,\varepsilon N)\geqslant N^{1+\varepsilon} for some fixed \varepsilon>0 leads to superlinear lower bounds for circuit complexity); see the survey [7] for more details.
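For intuition about the scale of \operatorname{Rig}, recall the trivial upper bound \operatorname{Rig}(A,r)\leqslant(N-r)^2: assuming the leading r\times r block of A is invertible, it suffices to change the lower-right (N-r)\times(N-r) block so that every bottom row becomes a combination of the top r rows. This is a standard argument; the code below is our illustration of it:

```python
import numpy as np

def reduce_rank_by_patching(A, r):
    """Change only the (N-r) x (N-r) lower-right block of A so that the
    result has rank <= r (assumes the leading r x r block is invertible).
    This witnesses the classical bound Rig(A, r) <= (N - r)^2."""
    B = A.copy()
    A11, A12, A21 = A[:r, :r], A[:r, r:], A[r:, :r]
    # With this choice the bottom rows equal (A21 A11^{-1}) times the top
    # row block [A11, A12], so rank(B) <= r.
    B[r:, r:] = A21 @ np.linalg.solve(A11, A12)
    return B

rng = np.random.default_rng(1)
N, r = 12, 5
A = rng.standard_normal((N, N))
B = reduce_rank_by_patching(A, r)
changed = int(np.sum(A != B))          # number of modified entries
rank_B = np.linalg.matrix_rank(B)
```

Valiant's question is precisely whether one can do substantially better than this generic construction for explicit matrices.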
It was a surprising result of Alman and Williams [8] that the family of Walsh–Hadamard matrices is not rigid, that is, the rank of these matrices can be reduced significantly by changing a small number of entries. Theorem A is based on their construction.
In this paper we obtain sufficient conditions for systems of functions to be rigid in metrics weaker than L_2.
Average widths and L_1-rigidity
Our lower bounds on widths are based on the approximation properties of random vectors in \mathbb{R}^N. Let \xi_1,\dots,\xi_N be independent random variables such that \mathsf{E}\xi_i=0 and \mathsf{E}|\xi_i| = 1. We prove that for any n-dimensional space Q_n\subset\mathbb{R}^N,
We call this quantity the average Kolmogorov width of the random vector \xi. In fact, average widths were originally considered by Voronin and Temirgaliev in [9]. This notion was further studied by Maiorov [10], Creutzig [11] and others. We discuss the notation d_n^\mathrm{avg} and the properties of average widths in § 2.2. There is a simple equality
which links the rigidity of finite-dimensional vectors and the rigidity of finite function systems. Note that the averaging on the right-hand side is over a finite set of elements of L_1, that is, we minimize \frac1N\sum_{k=1}^N\rho(\xi_k,Q_n) over the subspaces of L_1.
Thus, we can say that independence implies rigidity.
Theorem 1.1. Let \xi_1,\dots,\xi_N be independent random variables such that \mathsf{E}\xi_i=0 and \mathsf{E}|\xi_i| = 1. Then
This also provides rigidity for the standard widths, because d_n \geqslant d_n^\mathrm{avg}. A classical example of an independent system is the Rademacher system.
The same result holds if we replace independence by unconditionality:
Approximation in the space of measurable functions
Consider the space L_0(\mathcal X,\mu) of all measurable functions on some measure space (\mathcal X,\mu). Convergence in measure is equivalent to convergence in certain metrics, say, \displaystyle\int_{\mathcal X}\frac{|f-g|}{1+|f-g|}\,d\mu or \sup\{\varepsilon\colon \mu(|f-g|\geqslant\varepsilon)\geqslant\varepsilon\}. We will use the latter. Of course, this metric is weaker than the usual L_p-metrics. Approximation in L_0 is much less studied than approximation in L_p, but we believe that this subject contains interesting problems.
In § 3.2 we prove that independence implies rigidity even in the case of the L_0-metric.
Theorem 1.2. For any \varepsilon\in(0,1) there exists \delta>0 such that if \xi_1,\dots,\xi_N are independent random variables satisfying \inf_{c\in\mathbb{R}}\|\xi_i-c\|_{L_0}\geqslant\varepsilon, i=1,\dots,N, then
\begin{equation*}
d_n^{\mathrm{avg}}(\{\xi_1,\dots,\xi_N\},L_0)\geqslant\delta \quad\textit{if } n\leqslant \delta N.
\end{equation*}
This is also a corollary to a finite-dimensional result. In fact, there is an exponential bound
Let us mention a recent result due to Kashin [12].
Theorem B. There exist absolute constants 0<\gamma_0<1 and c_1,c_2>0 such that for N=1,2,\dots, for any orthonormal basis \Psi=\{\psi_k\}_{k=1}^N in \mathbb{R}^N and any linear subspace Q\subset\mathbb{R}^N, \dim Q\leqslant c_1 N, the following inequality holds
It is known that lacunary systems \varphi(k_jx) behave like independent variables; see [13], for example. The rigidity property is not an exception. We will prove that lacunary systems \{\varphi(k_1x),\dots,\varphi(k_Nx)\} are rigid in L_0 if \min k_{j+1}/k_j is sufficiently large (see Statement 3.3).
Another source of rigid systems is random matrices. It is well known that an N\times N matrix \mathcal E consisting of random signs is very rigid with high probability: \operatorname{Rig}(\mathcal E,\delta N)\geqslant \delta N^2. We observe that this remains true in the weaker L_0-metric:
This gives us an example of an L_0-rigid orthonormal system \{\varphi_1,\dots,\varphi_N\} of functions that are piecewise constant on the intervals ((j-1)/N, j/N).
There is another metric in L_0(\mathcal X,\mu), namely, \mu\{x\colon f(x)\ne g(x)\}. We denote it by L_{\mathrm{H}}; the distance \|f-g\|_{L_{\mathrm{H}}} is an analogue of the Hamming distance. (It is called the ‘indicator distance’ in probability theory.) This metric appears in function theory in the context of ‘correction’ theorems. For example, Luzin’s theorem states that C is dense in L_{\mathrm{H}}, and Menshov’s theorem implies that the set of functions with uniformly convergent Fourier series is dense in L_{\mathrm{H}}. Note that matrix rigidity is a particular case of average Kolmogorov widths in the normalized Hamming distance L_{\mathrm{H}}^N. Let A be an M\times N real matrix and W_A=\{A_1,\dots,A_M\}\subset\mathbb{R}^N be the set of rows of A. Then
Some of our results can be stated in matrix terms; in some cases the matrix language is even more convenient. However, in this paper we prefer to stick to the functional-theoretic language; see also [14].
The S_{p'}-property
In the case 1<p<2 we establish a less restrictive sufficient condition for rigidity.
Theorem 1.3. Let \varphi_1,\dots,\varphi_N be an orthonormal system in a space L_2(\mathcal X,\mu), \mu(\mathcal X)=1. Let 1<p<2, and suppose that \{\varphi_k\} has the S_{p'}-property, that is,
Note that Bourgain’s theorem [15] states that any uniformly bounded orthonormal system has an S_{p'}-subsystem of size N^{2/p'}; this subsystem is rigid in L_p. Hence an o(1)-approximation of a uniformly bounded orthonormal system in L_p requires a dimension of at least n\geqslant N^{2/p'}/2. In particular, in Theorem A we must indeed have polynomial dependence between n and N.
The analogue of Theorem 1.1 also holds true for 1<p<2 (we do not prove it here), but it fails for p>2.
Positive approximation results
Let G be a finite Abelian group and \widehat{G} be the dual group of characters \chi\colon G\to\mathbb C^*.
It follows from [16] that characters are not rigid in L_{\mathrm{H}}:
Such a small approximation error, N^{-1+c\varepsilon}, is natural in the context of Valiant’s result on circuit complexity. However, we are interested in approximation in different ‘regimes’. We consider systems generated by specific characters, namely, the Walsh and trigonometric systems. First we show that they admit good approximation by very low-dimensional spaces.
We bound the trigonometric widths d_n^{\mathrm{T}} of the first N trigonometric functions, that is, we approximate them by trigonometric subspaces Q_n=\bigl\{\sum_{k\in \Lambda}c_ke(kx)\bigr\}, |\Lambda|=n.
Theorem 1.5. For any p\in(0,1) and for sufficiently large N we have
In § 2 we provide necessary definitions and some needed facts. In §§ 3–5 we prove the results mentioned in § 1, as well as some other results. In § 6 we discuss some remaining questions on the subject.
Acknowledgement
The author thanks Konstantin Sergeevich Ryutin, Sergei Vladimirovich Astashkin and Boris Sergeevich Kashin for fruitful discussions.
§ 2. Preliminaries
2.1. Metrics
Let (\mathcal X,\mu) be a measure space, \mu(\mathcal X)<\infty. As usual, we identify functions (or random variables) that differ on a set of zero measure.
The space L_0(\mathcal X,\mu) consists of all measurable functions. Convergence in measure in L_0 can be metrized in several ways. Let us describe the Ky Fan metric that we use. Set
We denote the unit ball in \ell_p^N by B_p^N. Note that the meaning of \|\cdot\|_p is clear from the context: it is either the norm \|\cdot\|_{L_p}, if the argument is a function, or the norm \|\cdot\|_{\ell_p^N} if the argument is a vector. In the Euclidean case we write |\cdot|.
We let L_p^N denote the L_p-spaces for the normalized (probability) measure on \{1,\dots,N\}. Obviously, \|x\|_{L_p^N} = N^{-1/p}\|x\|_{\ell_p^N}, 0<p\leqslant\infty.
The quantity \|x\|_{L_{\mathrm{H}}^N} is the share of nonzero coordinates of a vector, and the corresponding distance is the normalized Hamming distance. Note that \lim_{p\to+0}\|x\|_p^p = \#\{k\colon x_k\ne0\} = N\|x\|_{L_{\mathrm{H}}^N}.
In the case p=0 we do not work with \ell_0^N, but only with L_0^N:
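For a vector x\in\mathbb{R}^N the quantity \|x\|_{L_0^N}=\sup\{\varepsilon\colon \frac1N\#\{k\colon|x_k|\geqslant\varepsilon\}\geqslant\varepsilon\} (the Ky Fan-type expression from § 1, applied to the normalized counting measure) can be computed in closed form; the following sketch is our illustration, not part of the paper:

```python
def l0_norm(x):
    """||x||_{L_0^N} = sup{eps : (1/N) * #{k : |x_k| >= eps} >= eps}
    for a finite vector x, with the normalized counting measure."""
    N = len(x)
    a = sorted((abs(t) for t in x), reverse=True)
    # The share of coordinates with |x_k| >= a[k] is at least (k+1)/N, and
    # the distribution function only jumps at the values a[k]; hence the
    # supremum equals max over k of min(a[k], (k+1)/N).
    return max(min(a[k], (k + 1) / N) for k in range(N))

# One large coordinate out of ten contributes at most the share 1/10,
# while a constant small vector is bounded by the common value itself.
```

This makes explicit how L_0^N interpolates between the size of the values and the share of coordinates carrying them.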
Let X be a linear space equipped with a metric (or a more general distance functional) \rho. Recall that the Kolmogorov width of order n\in\mathbb Z_+ of a set K\subset X is defined by
where \rho(x,Q)_X :=\inf_{y\in Q} \rho(x,y) and the infimum is taken over the linear subspaces of X of dimension at most n. Usually, X is a normed space and \rho(x,y)=\|x-y\|_X; note that this is not required in Kolmogorov’s original paper, and we will indeed use widths in a more general context.
Average widths
Let X be a linear space with metric and \mu be a Borel measure on X. For p\in(0,\infty) we define the p-averaged Kolmogorov width of \mu in X by
If K is a subset of X and it is clear what a ‘random point in K’ means, then the width d_n^{\mathrm{avg}}(K,X)_p is defined as the width of such a random point. For example, for a finite K we have
Let us discuss the new notation d_n^{\mathrm{avg}}(\xi,X)_p. Some authors write d_n^{(\mathrm{a})} instead of d_n^{\mathrm{avg}}. This can be confusing because d_n^{\mathrm{a}} denotes absolute widths (see [18]). Also the use of \mathrm{avg} is standard in average settings in information-based complexity; see [19]. Some authors write d_n^{(\mathrm{a})}(X,\mu), but when we deal with sets and random variables the notation d_n^{\mathrm{avg}}(\xi,X) seems to be more intuitive (the object whose width is considered goes first).
Properties
The most basic inequality that we always have in mind is
Statement 2.1. Let \xi_1,\dots,\xi_N\colon\Omega\to\mathbb{R} be some random variables such that \mathsf{E}|\xi_i|^p<\infty for all i. Then for p\in[1,\infty) and n\in\mathbb Z_+ we have
Note that in the width on the left-hand side the averaging is over the finite set of ‘points’ \xi_1,\dots,\xi_N in the space L_p(\Omega); and on the right-hand side we see the averaged width of the random vector \xi=(\xi_1,\dots,\xi_N) in \mathbb{R}^N. In the case of discrete \Omega equality (3) is a simple consequence of the equality of the row-rank and the column-rank of a matrix. Let us give the full proof for completeness.
Proof. Consider the error of an approximation \xi_k\approx \eta_k:
Note that the random variables \eta_1,\dots,\eta_N lie in a subspace of dimension at most n of L_p(\Omega) if and only if there is a subspace Q\subset\mathbb{R}^N, \dim Q\leqslant n, such that
To prove the reverse inequality we have to construct \eta from an (almost) optimal subspace Q\subset\mathbb{R}^N. Note that for any \varepsilon>0 there is a Borel-measurable (for example, piecewise-constant) map \pi\colon\mathbb{R}^N\to Q, such that
Take \eta := \pi(\xi); this is a random vector because \pi is Borel. Moreover, \eta\in Q almost surely, so \eta_1,\dots,\eta_N span an at most n-dimensional space in L_p(\Omega). It remains to substitute x=\xi into the previous inequality, take the p-expectation and use (4) to finish the proof.
An average width in Hilbert space
We present a formula for the 2-averaged width of a random vector in Hilbert space. This formula is a generalization of the Eckart–Young theorem on matrix approximation in the Frobenius norm; it is also a reformulation of Ismagilov’s theorem [20]. Nevertheless, it takes a particularly convenient form in our terminology, so we state it explicitly. Our proof repeats Ismagilov’s and is presented for completeness.
Let H be a separable Hilbert space (of finite or infinite dimension) and \xi be a random vector in H such that \mathsf{E}|\xi|^2<\infty. Consider the correlation operator of \xi defined by \langle Au,v\rangle = \mathsf{E}\langle\xi,u\rangle\langle\xi,v\rangle. It is known that A is a compact positive selfadjoint operator (see, for example, [21], Ch. 3, § 2). A sequence (\lambda_n) is the sequence of eigenvalues of A if and only if there is an orthonormal basis \{\varphi_n\} (an eigenbasis of A) such that
\begin{equation}
\mathsf{E}\langle\xi,\varphi_k\rangle^2=\lambda_k, \qquad \mathsf{E}\langle\xi,\varphi_k\rangle\langle\xi,\varphi_l\rangle=0 \quad\text{for } k\ne l.
\tag{5}
\end{equation}
We arrange the eigenvalues in decreasing order: \lambda_1\geqslant\lambda_2\geqslant\dots\geqslant0.
Statement 2.2. We have d_n^\mathrm{avg}(\xi,H)_2 = (\sum_{k>n}\lambda_k)^{1/2}.
Proof. Fix a basis (5) and put \xi_k:=\langle\xi,\varphi_k\rangle. We bound the width from below. Consider some approximating space Q_n with orthonormal basis \{\psi_1,\dots,\psi_n\}. Then
We have \mathsf{E}|\xi|^2=\sum_{k\geqslant1}\mathsf{E}\xi_k^2=\sum_{k\geqslant 1}\lambda_k, so it is sufficient to find an upper bound for the projection:
Note that the numbers (\mu_k) belong to [0,1] and their sum equals n. By the monotonicity of the \lambda_k the sum \sum_{k\geqslant1}\lambda_k\mu_k is maximal when \mu_k = 1 for k\leqslant n and \mu_k=0 for k>n. We have obtained a lower bound for the width. Equality is attained at the subspace Q_n = \operatorname{span}\{\varphi_1,\dots,\varphi_n\}.
The statement is proved.
It is often convenient to work with singular values \sigma_k:=\lambda_k^{1/2} instead of eigenvalues. One can rewrite (5) in terms of the Schmidt decomposition:
Indeed, the singular value decomposition of X gives us a basis \{\varphi_k\}_{k=1}^M such that the coordinate vectors (\langle x^1,\varphi_k\rangle,\dots,\langle x^N,\varphi_k\rangle) are orthogonal and have length \sigma_k(X). If \xi is a uniformly random choice among the x^i, then \mathsf{E}\langle\xi,\varphi_k\rangle^2=N^{-1}\sigma_k^2(X) and \mathsf{E}\langle\xi,\varphi_k\rangle\langle\xi,\varphi_l\rangle=0, hence \sigma_k(\xi) = N^{-1/2}\sigma_k(X).
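In matrix terms, Statement 2.2 combined with the identity \sigma_k(\xi)=N^{-1/2}\sigma_k(X) says that d_n^{\mathrm{avg}}(\xi,H)_2=N^{-1/2}\|X-X_n\|_F, where X_n is the best rank-n approximation of X given by the truncated SVD (Eckart–Young). A quick numerical check (our sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, n = 40, 15, 5
X = rng.standard_normal((N, M))         # rows x^1,...,x^N; xi is a uniform row
corr = X.T @ X / N                      # correlation operator of xi
lam = np.sort(np.linalg.eigvalsh(corr))[::-1]   # eigenvalues, decreasing
width_by_eigs = np.sqrt(lam[n:].sum())  # Statement 2.2

# Eckart-Young: best rank-n approximation X_n via truncated SVD.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_n = (U[:, :n] * s[:n]) @ Vt[:n]
width_by_svd = np.linalg.norm(X - X_n) / np.sqrt(N)
```

The two quantities agree because \lambda_k=\sigma_k^2(X)/N and \|X-X_n\|_F^2=\sum_{k>n}\sigma_k^2(X).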
Finally, an orthonormal system \varphi_1,\dots,\varphi_N in a Euclidean space E corresponds to the identity matrix X (in appropriate coordinates), so we have (see also (1))
However, we need only the case when \Phi is the trigonometric system \{e(kx)\}_{k\in\mathbb Z} in the continuous case or \{e(kx/N)\}_{k\in\mathbb Z_N} in the discrete case. In this situation the width is called the trigonometric width and is denoted by d_n^{\mathrm{T}}.
Usually, we consider real spaces (and the linear dimension over \mathbb R). When complex-valued functions are approximated, one can also consider linear spaces over \mathbb C. The distinction here is not important for us because in all cases where complex-valued functions are approximated we are interested in the dimension n up to its order of magnitude.
2.3. Miscellaneous
The material in this subsection is contained in [22], §§ 2.2, 2.8 and 8.8.
Probability
We use the probabilistic notation \mathsf{P} for the probability, \mathsf{E} for the expectation and \operatorname{Law} for the distribution. We also use the conditional expectation \mathsf{E}(\xi\mid\eta), but only for \eta with a finite number of values.
We make use of standard tail estimates. Let X_i, \mathsf{E} X_i=0, be independent random variables.
Hoeffding’s inequality
Let a_i\leqslant X_i\leqslant b_i almost surely; then
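The standard form of this bound, \mathsf{P}\bigl(\sum X_i\geqslant t\bigr)\leqslant\exp\bigl(-2t^2/\sum(b_i-a_i)^2\bigr), is easy to check empirically; the following Python sketch (ours, not from the paper) does so for Rademacher sums, where a_i=-1, b_i=1:

```python
import random, math

random.seed(0)
N, trials, t = 100, 20000, 25.0
# Rademacher sums: X_i = +-1, so sum (b_i - a_i)^2 = 4N.
count = 0
for _ in range(trials):
    s = sum(random.choice((-1, 1)) for _ in range(N))
    if s >= t:
        count += 1
empirical = count / trials
hoeffding = math.exp(-2 * t**2 / (4 * N))   # = exp(-t^2 / (2N)) ~ 0.044
```

The empirical tail probability is well below the Hoeffding bound, as expected (the bound is not tight for bounded symmetric summands).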
Let us recall the definition of the combinatorial dimension of a class \mathcal F of functions f\colon I\to\mathbb{R}. We say that a set J\subset I is t-shattered (t>0) by \mathcal F if there is a function h\colon J\to\mathbb{R} such that for any choice of signs \sigma\colon J\to\{-1,1\} one can find a function f_\sigma\in\mathcal F satisfying
\begin{equation}
\min_{x\in J}\sigma(x)(f_\sigma(x)-h(x))\geqslant t.
\tag{6}
\end{equation}
Then the combinatorial dimension \operatorname{vc}(\mathcal F,t) is the maximum cardinality of a set t-shattered by \mathcal F.
If \mathcal F is convex and centrally symmetric, then one can assume that h(x)\equiv 0. Indeed, we can replace \{f_\sigma\} by \{\frac12(f_\sigma-f_{-\sigma})\}.
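For finite classes on a finite domain the combinatorial dimension can be computed by brute force. The sketch below (our illustration) takes h\equiv 0, as the symmetric case above allows, and represents functions as tuples of values:

```python
from itertools import combinations, product

def comb_dim(F, t):
    """Combinatorial dimension vc(F, t) of a finite symmetric class F of
    functions on a finite domain (functions given as tuples of values).
    With h = 0, a set J is t-shattered if every sign pattern sigma on J
    is realized by some f in F with sigma(x) * f(x) >= t for all x in J."""
    points = range(len(F[0]))
    best = 0
    for size in range(1, len(F[0]) + 1):
        for J in combinations(points, size):
            if all(any(all(s * f[x] >= t for s, x in zip(sigma, J)) for f in F)
                   for sigma in product((-1, 1), repeat=size)):
                best = size
    return best

# The full sign cube {-1,1}^3 1-shatters the whole 3-point domain,
# while a two-element symmetric class only 1-shatters single points.
cube = list(product((-1, 1), repeat=3))
```

This exhaustive search is exponential in the domain size, so it serves only to test intuition on tiny examples.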
We will use the simple (known) fact that
\begin{equation}
\text{if } \operatorname{vc}(\mathcal F,t)> n, \quad\text{then } d_n(\mathcal F,\ell_\infty(I))\geqslant t.
\tag{7}
\end{equation}
Indeed, let J be a t-shattered set, |J|=n+1, and suppose that there is an approximation by an n-dimensional subspace: \mathcal F\ni f\approx g, g\in Q_n, \|f-g\|_\infty < t. Then there exists a nontrivial linear functional \Lambda(f) := \sum_{x\in J}\lambda(x)f(x) such that \Lambda(g)=0 for all g\in Q_n. Assume that \Lambda(h)\geqslant 0, where h is from (6). Consider a function f such that \operatorname{sign}(\lambda(x))(f(x)-h(x))\geqslant t for all x\in J with \lambda(x)\ne0. Then for the approximating function we have \operatorname{sign}(\lambda(x))(g(x)-h(x))>0, hence \Lambda(g)=\Lambda(g-h)+\Lambda(h)>0. We obtain a contradiction. The case \Lambda(h)\leqslant0 is analogous.
We will also make use of the classical VC bound for classes \mathcal F\subset\{-1,1\}^k (that is, in fact, families of subsets of a k-element set):
\begin{equation}
|\mathcal F|>\binom{k}{0}+\binom{k}{1}+\dots+\binom{k}{d} \quad\Longrightarrow\quad \operatorname{vc}(\mathcal F,1) > d.
\tag{8}
\end{equation}
Thus, when the left-hand inequality in (8) holds, there is a set J, |J|>d, such that the projection of the class onto J is the whole cube \{-1,1\}^J.
There is a useful bound on binomial coefficients:
\begin{equation}
\binom{k}{0}+\dots+\binom{k}m \leqslant \biggl(\frac{ek}m\biggr)^m, \qquad 0\leqslant m \leqslant k.
\tag{9}
\end{equation}
Note that the function (ek/x)^x is increasing for x\in(0,k].
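Bound (9) is elementary to verify numerically on a grid of small parameters; the check below is our sketch:

```python
from math import comb, e

def binom_tail(k, m):
    """Sum of binomial coefficients C(k,0) + ... + C(k,m)."""
    return sum(comb(k, i) for i in range(m + 1))

# Check the bound sum_{i<=m} C(k,i) <= (e*k/m)^m for 1 <= m <= k <= 39.
ok = all(binom_tail(k, m) <= (e * k / m) ** m
         for k in range(1, 40) for m in range(1, k + 1))
```

The extreme case m=k reduces to 2^k\leqslant e^k, and m=1 to 1+k\leqslant ek, both obvious.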
We will often consider 1-periodic functions, that is, functions on \mathbb T:=\mathbb R/\mathbb Z. We also use the notation e(\theta):=\exp(2\pi i\theta).
We denote positive constants by c,c_1,c_2,C,\dots. If there is dependence on some parameters, then we specify it explicitly: c_p,c(\varepsilon),\dots .
§ 3. Sufficient conditions for rigidity in weak metrics
3.1. Rigidity in L_1
For definiteness we put \operatorname{sign}(x)\!=\!1 for x\!\geqslant\! 0 and {\operatorname{sign}(x)\!=\!-1} for x<0.
Theorem 3.1. Let \xi=(\xi_1,\dots,\xi_N) be a random vector in \mathbb{R}^N such that \mathsf{E}|\xi_i|\geqslant1 for all i. Let \varepsilon\in(0,1) and n\leqslant N(1-\varepsilon). Then
Before proving the theorem let us note that the distance in \ell_1^N of a vector x\in\mathbb{R}^N to a subspace Q_n\subset\mathbb{R}^N can be written in dual terms as
So central cross-sections of the cube B_\infty^N play an important role here. We prove the following lemma.
Lemma 3.1. Let K be an m-dimensional central section of the cube B_\infty^N, and let m\geqslant \varepsilon N. Then \operatorname{vc}(K,c_1(\varepsilon)) \geqslant c_2(\varepsilon)N.
The proof of the lemma relies on two useful results.
Theorem D [23]. Any m-dimensional cross-section of a volume-one cube has volume at least 1:
Here N_\delta(K,X) is the size of a minimal \delta-net for a set K in the space X.
Proof of Lemma 3.1. Let K=L_m\cap B_\infty^N. From Theorem D we obtain \operatorname{Vol}_m(K)\geqslant 2^m, and we can use the standard volume estimate
Lemma 3.1 produces a set of coordinates \Lambda\subset\{1,\dots,N\}, |\Lambda|\geqslant c(\varepsilon)N, and a number v\geqslant c(\varepsilon) such that for any choice of signs \sigma\colon\Lambda\to\{-1,1\} there is a vector z^\sigma\in K such that
\begin{equation}
\min_{i\in\Lambda} \sigma_i z^\sigma_i \geqslant v.
\tag{12}
\end{equation}
(Recall that, as K is convex and centrally-symmetric, we may assume that z^\sigma oscillates around zero.)
Fix \sigma^*\colon\Lambda\to\{-1,1\}, and let \mathsf{E}^* be the conditional expectation \mathsf{E}^*(\,\cdot\,)=\mathsf{E}(\,\cdot\mid\operatorname{sign}\xi_i =\sigma_i^*,\,i\in\Lambda). Then we have
Here we have used that \|z^{\sigma^*}\|_\infty\leqslant 1. We average over \sigma^* and obtain (13).
The theorem follows from (13), because the first sum there is at least v|\Lambda|\geqslant c_\varepsilon N, and for each term in the second sum we have
Let us present some corollaries of Theorem 3.1 for the case when \mathsf{E}(\xi_i\mid\{\operatorname{sign}\xi_j\}_{j\ne i})\equiv0.
Corollary 3.1. Let \xi_1,\dots,\xi_N be independent random variables such that \mathsf{E}\xi_i=0 and \mathsf{E}|\xi_i|\geqslant 1. Then for any random variables \eta_1,\dots,\eta_n, n\leqslant N(1-\varepsilon), and any coefficients \{a_{i,j}\} we have
The condition \|\xi_i-c\|_{L_0}\geqslant\varepsilon is essential here because it forbids an approximation of the vector \xi by a constant vector. This condition allows us to ‘separate’ slightly the values of the \xi_i.
Lemma 3.2. Let \varepsilon,\tau\in(0,1], and let \zeta be a random variable such that \inf_{c\in\mathbb{R}}\|{\zeta-c}\|_{L_0}\geqslant\varepsilon. Then there exist real numbers a<b, |b-a|\geqslant\frac{\tau\varepsilon}2, such that
Note that \mathsf{P}(\zeta\leqslant q_-)\geqslant \varepsilon/3 and \mathsf{P}(\zeta\geqslant q_+)\geqslant \varepsilon/3. We see that |q_+-q_-|\geqslant\varepsilon, because otherwise \|\zeta-c\|_{L_0}\leqslant\frac23\varepsilon for c=(q_-+q_+)/2. We fix k:=\lceil1/\tau\rceil, divide [q_-,q_+] into k equal subintervals and choose (a,b) to be the one having the least probability \mathsf{P}(a<\zeta<b). Then this probability is at most 1/k\leqslant\tau. Finally, |b-a|\geqslant |q_+-q_-|/k \geqslant \tau\varepsilon/2.
The lemma is proved.
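The selection procedure in the proof of Lemma 3.2 is easy to run on empirical data. The sketch below (function name and parameters are ours) mimics it with empirical quantiles q_- and q_+ at levels \varepsilon/3 and 1-\varepsilon/3:

```python
import numpy as np

def find_flat_interval(samples, eps, tau):
    """Empirical version of the construction in Lemma 3.2: split the
    quantile range [q_-, q_+] into ceil(1/tau) equal subintervals and
    return the one with the least empirical probability, together with
    that probability (which should be at most roughly tau)."""
    q_minus = np.quantile(samples, eps / 3)
    q_plus = np.quantile(samples, 1 - eps / 3)
    k = int(np.ceil(1 / tau))
    edges = np.linspace(q_minus, q_plus, k + 1)
    probs = [np.mean((samples > edges[i]) & (samples < edges[i + 1]))
             for i in range(k)]
    i = int(np.argmin(probs))
    return edges[i], edges[i + 1], probs[i]

rng = np.random.default_rng(3)
# For zeta uniform on (0,1) we have inf_c ||zeta - c||_{L_0} = 1/3.
zeta = rng.uniform(0.0, 1.0, 100_000)
a, b, p = find_flat_interval(zeta, eps=1/3, tau=0.2)
```

For the uniform distribution all five subintervals have probability about 0.16 \leqslant \tau, and their length comfortably exceeds \tau\varepsilon/2.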
Now we prove the theorem.
Proof of Theorem 3.2. We fix some small \delta>0 (to be chosen later) and estimate \mathsf{P}(\rho(\xi,Q_n)_{L_0^N}<\delta), where \dim Q_n\leqslant n\leqslant \delta N. Consider an almost best L_0^N approximation of (\xi_1,\dots,\xi_N) by a (random) vector (y_1,\dots,y_N) in the subspace Q_n. Let
be the (random) set of badly approximated coordinates. If \|\xi-y\|_{L_0^N} < \delta, then |\Lambda|<\delta N, so it is sufficient to bound the probability of the event \{|\Lambda| \leqslant \delta N\}.
We apply Lemma 3.2 with \tau := 5\delta/\varepsilon to each \xi_k and find intervals (a_k,b_k) such that
be the (random) set of ‘intermediate’ coordinates. We have \mathsf{E}|\Gamma|\leqslant \tau N; by Bernstein’s inequality we have \mathsf{P}(|\Gamma|\leqslant 2\tau N)\geqslant 1-2\exp(-c\tau N) and
Fix some sets \Lambda^\circ,\Gamma^\circ\subset\{1,\dots,N\} of size |\Lambda^\circ|\leqslant\delta N and |\Gamma^\circ|\leqslant 2\tau N and consider the event \{\Lambda=\Lambda^\circ,\,\Gamma=\Gamma^\circ\}. Let N'=N-|\Lambda^\circ\cup\Gamma^\circ|; note that N'\geqslant N/2. For x\in\mathbb{R}^N let x'\in\mathbb{R}^{N'} be the vector x without coordinates in \Lambda^\circ\cup\Gamma^\circ. Then \|\xi'-y'\|_{\ell_\infty^{N'}} \leqslant \delta, and we have y'\in Q_n'; therefore,
Consider the random vector \pi with coordinates \pi_k := -1 for \xi_k\leqslant a_k, \pi_k := 0 for a_k<\xi_k<b_k and \pi_k := 1 for \xi_k\geqslant b_k. Note that \pi'\in\{-1,1\}^{N'} by construction. We argue that the set S := \{\pi'(\omega)\colon \Lambda(\omega)=\Lambda^\circ,\,\Gamma(\omega)=\Gamma^\circ\} of all possible values of \pi' contains at most (eN'/n)^{n} elements. Indeed, otherwise (8) and (9) imply that S contains a cube of dimension n+1 and
so (7) implies that d_n(K,\ell_\infty^{N'})\geqslant 5\delta/4, which contradicts (16). For any s\in S we have \mathsf{P}(\pi'=s)\leqslant(1-\varepsilon/3)^{N'}, so
Indeed, there are exponentially many subspaces, so we can choose smaller {\delta=\delta(A,\varepsilon)} to provide an exponential bound on the probability.
Proof. Let d_n^\mathrm{avg}(\{\xi_1,\dots,\xi_N\},L_0) \leqslant \varepsilon\leqslant 1. Then for some \eta_1,\dots,\eta_N in an n-dimensional space we have \sum\|\xi_k-\eta_k\|_{L_0}\leqslant N\varepsilon. Consider the random sets \Lambda_t := \{k\colon |{\xi_k-\eta_k}|\geqslant t\}; note that \|\xi-\eta\|\geqslant t if and only if |\Lambda_t|\geqslant tN.
Let A be a signum matrix, that is, a matrix with entries \pm1. We define the signum rank \operatorname{rank}_\pm A as the minimum rank of a matrix B with nonzero entries such that \operatorname{sign} B_{i,j}\equiv A_{i,j}. There is a bound on the number of signum N\times N matrices with signum rank at most r:
(see [25], and also [26] and [6]). For r=cN we obtain the bound 2^{bN^2}, where b=b(c)\to0 as c\to0. Hence, if c is small, then there are very few signum matrices such that \operatorname{rank}_\pm A\leqslant cN. Moreover, almost all signum matrices are quite distant from such matrices in the Hamming metric. Hence almost all signum matrices are also distant from the low-rank matrices with \operatorname{rank} B\leqslant cN in the L_0-distance. Now we state a corollary in terms of random signum matrices (with equiprobable signs).
Statement 3.1. Let \mathbf{\mathcal E} be a random matrix with independent \pm1 entries. Then
Statement 3.2. Let \varphi_1,\dots,\varphi_N be a random system in D_N(0,1) such that \varphi_{k,j} :=\varphi_k|_{((j-1)/N,j/N)}\equiv\varepsilon_{k,j}, where the \varepsilon_{k,j} are independent random signs. Then
Note that, although we use independent random variables in the definition of \varphi_k, the system \{\varphi_k\}_1^N is by no means independent.
Proof of Statement 3.2. Let us prove that if \{\varphi_k\} is not rigid in L_0(0,1), then the matrix \Phi := (\varphi_{k,j}) can be well approximated in L_0^{N\times N}; this will contradict Statement 3.1.
Consider an approximation \varphi_k\approx g_k in L_0 by functions in some n-dimensional subspace. There is a technical subtlety: we cannot claim that g_k\in D_N (one cannot average over intervals because such an operator is not defined in L_0). Instead, we use Lemma 3.3: it shows that the rigidity of the system \xi_k := \varphi_k is equivalent to the rigidity of the vector (\xi_1,\dots,\xi_N), hence it is determined by the distribution of this vector. So we can assume that the measure on the space (0,1), where everything is defined, consists of ‘atoms’ ((j-1)/N,j/N).
Finally, if \varphi_k\approx g_k, g_k\in D_N, and the average of \|\varphi_k - g_k\|_{L_0} is less than \delta, then we have \|\varphi_k-g_k\|_{L_0}\leqslant \gamma := \delta^{1/2} for all but at most \gamma N indices. Therefore, \|\Phi-G\|_{L_0^{N\times N}} \leqslant 2\gamma.
The statement is proved.
Corollary 3.4. There exists an orthonormal system \varphi_1,\dots,\varphi_N\in D_N(0,1) that is rigid in L_0:
Proof. We construct a system \varphi_1,\dots,\varphi_N\!\in\! D_{2N}(0,2) instead. We define \varphi_1,\dots,\varphi_N on (0,1) using random values \varphi_i|_{((j-1)/N,j/N)}=\pm\delta for some \delta to be specified below. This system will be L_0-rigid with high probability.
Then we extend this system to (0,2) to make it orthonormal. A necessary and sufficient condition of the existence of such an extension is known:
(see, for instance, [27], Ch. 8). If \delta is sufficiently small, then this condition holds true with high probability, because the spectral norm of a random \pm1 matrix is O(N^{1/2}); see, for example, [22], Ch. 4.
Moreover, we can put the functions \varphi_k|_{(1,2)} in an arbitrary N-dimensional space (preserving the integrals \displaystyle\int_1^2\varphi_k\varphi_l), for example, in D_N(1,2).
The corollary is proved.
3.4. Lacunary systems
Let \lambda>1. We say that a sequence of positive integers k_1,\dots,k_N is \lambda-lacunary if k_{j+1}/k_j\geqslant\lambda for j=1,\dots,N-1.
Statement 3.3. Let \varphi be a Riemann-integrable Borel function on \mathbb{T} such that \mu\{x\in\mathbb{T}\colon \varphi(x)\ne a\}>0 for each a\in\mathbb{R}. Then the following hold.
1. There exists \lambda=\lambda(\varphi)>1 such that for any \lambda-lacunary sequence k_1,\dots,k_N we have
\begin{equation*}
d_n^{\mathrm{avg}}\bigl(\{\varphi(k_1x),\dots,\varphi(k_Nx)\},L_0(\mathbb{T})\bigr)\geqslant c(\varphi) \quad\textit{if } n\leqslant c(\varphi) N.
\end{equation*}
2. For any \varepsilon>0 there exists \lambda=\lambda(\varphi,\varepsilon)>1 such that for any \lambda-lacunary sequence k_1,\dots,k_N we have
Theorem F (see [28], Theorem 4.3). Let (k_j) be an increasing sequence of positive integers. Then there exists a sequence (g_j) of measurable functions on (0,1) which are independent and uniformly distributed over (0,1) and such that
Proof of Statement 3.3. Let us start with the L_1-case. First of all, we can shift and renormalize \varphi so that \displaystyle\int_\mathbb{T}\varphi(x)\,dx=0 and \displaystyle\int_\mathbb{T}|\varphi(x)|\,dx=1 (this does not affect rigidity). Given a \lambda-lacunary sequence k_1,\dots,k_N for some \lambda>1, we apply Theorem F to obtain functions g_1,\dots,g_N satisfying \|\{k_j x\}-g_j(x)\|_{L_0} \leqslant 2/\lambda.
Put f_j(x):=\varphi(g_j(x)). Then the functions f_j are independent, \displaystyle\int f_j=0 and \displaystyle\int|f_j|=1. Corollary 3.1 implies the rigidity of \{f_j\}. To prove the rigidity of \{\varphi(k_jx)\} we have to bound \|\varphi(k_jx)-f_j\|_1:
where A=\{x\colon |\{k_j x\}-g_j(x)|>2/\lambda\}. The first integral is at most 2\|\varphi\|_\infty\mu(A) \leqslant 4\|\varphi\|_\infty/\lambda. The second integral does not exceed
where \tau is the averaged modulus of continuity. It is known that {\lim_{h\to 0}\tau(\varphi,h)=0} for Riemann-integrable functions. Hence, for sufficiently large \lambda the value of \|\varphi(k_jx)-f_j(x)\|_1 is sufficiently small. This proves the L_1-case.
In this subsection we outline another method for the proof of rigidity, which is based on the notion of complementability.
Let f_1,\dots,f_N be an orthonormal system in L_2(0,1). Suppose that we aim to prove the rigidity of \{f_j\} in a larger space X\supset L_2. If there exists a bounded linear operator \pi\colon X\to L_2 that preserves the system: \pi f_j=f_j, j=1,\dots,N, then we obtain rigidity in X:
Proof. Let X be a symmetric space. It is known that for the Rademacher chaos, property (1) is equivalent to the embedding X\supset H (see [29], Theorem 6.4), where the space H is the closure of L_\infty in the Orlicz space L_{\varphi_1}, \varphi_1(t)=e^t-1. A criterion for complementability is the double embedding H\subset X\subset H' (see [29], Theorem 6.7). Thus, for X=H' both conditions are satisfied. The space H' coincides with the Lorentz space \Lambda(\psi_1) of functions satisfying \displaystyle\int_0^1 f^*(t)\log(2/t)\,dt<\infty (see [29], §§ 2.2 and 2.4). Further, this Lorentz space coincides with the space L\log L of functions satisfying \displaystyle\int_0^1 |f|\log(2+|f|)\,dt<\infty (see, for example, [30], Theorem D).
The corollary is proved.
§ 4. Rigidity in L_p, 1<p<2
4.1. The S_{p'}-property
We start with a finite-dimensional result. Its proof follows the paper [31] by Gluskin.
Theorem 4.1. Let 1<p<2 and N\in\mathbb N. Suppose that \xi=(\xi_1,\dots,\xi_N) is an isotropic random vector in \mathbb{R}^N: \mathsf{E}\xi_i^2=1 and \mathsf{E}\xi_i\xi_j=0, i\ne j, that satisfies two conditions:
Remark 4.1. Note that if condition (18) is satisfied, then (17) also holds for A:=B and \gamma:=p'-2; see the proof of Theorem 4.2. However, in some cases we need dependence of the form (\cdots)B^{-1} of the lower bound on the parameter B.
Let us formulate a corollary in function-theoretic terms.
Theorem 4.2. Let \varphi_1,\dots,\varphi_N be an orthonormal system in a space L_2(\mathcal X,\mu), \mu(\mathcal X)=1. Let 1<p<2, and suppose that \{\varphi_k\} has the S_{p'}-property, that is, for some B\geqslant1
\begin{equation*}
\Bigl\|\sum_{k=1}^N a_k\varphi_k\Bigr\|_{p'}\leqslant B\Bigl(\sum_{k=1}^N a_k^2\Bigr)^{1/2}\quad\text{for all }(a_k)\in\mathbb R^N.
\end{equation*}
Proof. Indeed, let x be a random point in \mathcal X, and consider the random vector \xi=(\varphi_1(x),\dots,\varphi_N(x)). Condition (18) in Theorem 4.1 is exactly the S_{p'}-property. Let us check that condition (17) is fulfilled for A:=B and \gamma=p'-2. Indeed,
Hence d_n^{\mathrm{avg}}(\xi,\ell_p^N)_p is bounded below. It remains to use Statement 2.1.
The theorem is proved.
Corollary 4.1. Let \varphi_1,\dots,\varphi_N be an orthonormal system in L_2(\mathcal X,\mu), \mu(\mathcal X)=1, that is uniformly bounded: \|\varphi_k\|_\infty\leqslant A. Then for 1<p<2 we have
Proof. Indeed, by a theorem of Bourgain [15] one can find a subsystem of size \lfloor N^{2/p'}\rfloor of \{\varphi_k\} with the S_{p'}-property and apply Corollary 3.5 to this subsystem.
The corollary is proved.
4.2. Additional remarks
Explicit rigid sets
Previous considerations allow us to present explicit constructions of \ell_p-rigid subsets of \{-1,1\}^N of polynomial size.
Statement 4.1. For any p\in(1,2) and any sufficiently large N one can explicitly construct a set V\subset\{-1,1\}^N of size |V|\leqslant N^{C(p)} such that d_{N/2}(V,\ell_p^N)\geqslant c(p) N^{1/p}.
Proof. It is sufficient to construct N characters \chi_1,\dots,\chi_N\in \widehat{\mathbb Z_2^k} (with appropriate k) that form an S_{p'}-system. Then we apply Theorem 4.2: d_{N/2}^\mathrm{avg}(\{\chi_1,\dots,\chi_N\},L_p)_p\geqslant c>0. Hence one can take V:=\{(\chi_1(x),\dots,\chi_N(x))\colon x\in \mathbb Z_2^k\}.
In [32] an S_{2m}-system of 2^n characters in \mathbb Z_2^{mn} was constructed, so it remains to take 2^n\approx N, m=\lceil p'/2\rceil and k=mn.
The statement is proved.
Independence for p>2
An analogue of Theorem 3.1 with normalization in L_p also holds for 1<p\leqslant2: we can easily deduce from Theorem 4.1 that an independent system of functions is rigid (with an additional log factor). However, this result will be the subject of another paper. The analogue of the theorem on the L_1-rigidity of independent functions is not true for p>2.
Statement 4.2. For any p\in(2,\infty) there exists \delta>0 such that for all sufficiently large N there exists a sequence \xi_1,\dots,\xi_N of independent identically distributed random variables such that \mathsf{E}\xi_i=0, \mathsf{E}|\xi_i|^p=1 and
Proof. We use the approximation of the octahedron B_1^N in \ell_\infty^N (see [33]). Let n=N^{1-\beta}, where \beta is to be chosen later, and let Q_n^* be an extremal n-dimensional \ell_\infty-subspace:
where K is defined by the condition \mathsf{E}|\xi_1|^p=1, so that \varepsilon K^p=1. Consider independent copies \xi_2,\dots,\xi_N of \xi_1 and the vector \xi=(\xi_1,\dots,\xi_N). We construct an approximation vector \eta that lies in Q_n^* (so that \eta_1,\dots,\eta_N lie in an n-dimensional subspace of L_p).
Consider three events: \mathcal{A}_0=\{\xi=0\}, \mathcal{A}_1: exactly one coordinate of \xi is nonzero, and \mathcal{A}_{2}: at least two coordinates of \xi are nonzero. Set \eta(\omega):=0 for \omega\in \mathcal{A}_0\cup \mathcal{A}_{2}, and let \eta(\omega) be the best approximation to \xi(\omega) from Q_n^* for \omega\in \mathcal{A}_1. In the latter case we have \|\xi-\eta\|_\infty \leqslant K\rho(B_1^N,Q_n^*)_\infty.
By symmetry it is sufficient to estimate \mathsf{E}|\xi_1-\eta_1|^p:
if \beta is sufficiently small. The second term is bounded by N^2\varepsilon^2K^p = N^2\varepsilon, and we can make it small by letting \varepsilon tend to zero.
The statement is proved.
§ 5. Approximation of Walsh and trigonometric functions
Recall some notation. Given a finite Abelian group G, we denote the group of characters \chi\colon G\to\mathbb{C} by \widehat{G}. The probability measure \mu_G on G (\mu_G(A)=|A|/|G|) allows us to define the metrics L_{\mathrm{H}} and L_0 for functions on G.
We are interested in two particular cases, the group \mathbb Z_2^k and the cyclic group \mathbb Z_N.
In the first case characters are real and can be written as W_x\colon y\mapsto (-1)^{\langle x,y\rangle}, x,y\in\mathbb Z_2^k. We can identify these characters with functions in the well-known orthonormal Walsh system (in the Paley numeration) w_0\equiv1,w_1,w_2,\dots . Indeed, for n<2^k we have w_n\bigl(\sum_{j\geqslant 1}y_j2^{-j}\bigr)=W_x(y), where x=(x_1,\dots,x_k)\in\{0,1\}^k, y=(y_1,\dots,y_k)\in\{0,1\}^k and n=\sum_{j=1}^kx_j2^{j-1}.
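The identification above is easy to verify directly for small k. The following Python sketch (ours, not from the paper; the helper names are illustrative) builds the character table W_x(y)=(-1)^{\langle x,y\rangle} of \mathbb Z_2^k, checks orthonormality with respect to the normalized counting measure \mu_G, and computes the Paley index n=\sum_j x_j2^{j-1}.

```python
from itertools import product

def walsh_character_table(k):
    """Rows are the characters W_x(y) = (-1)^{<x,y>} of the group Z_2^k."""
    pts = list(product((0, 1), repeat=k))
    return [[(-1) ** sum(xi * yi for xi, yi in zip(x, y)) for y in pts]
            for x in pts]

def paley_index(x):
    """Paley numeration: W_x corresponds to w_n with n = sum_j x_j 2^{j-1}."""
    return sum(xj << j for j, xj in enumerate(x))

k = 3
T = walsh_character_table(k)
# Orthonormality in L_2(mu_G): the mean of W_x W_{x'} over y is delta_{x,x'}.
for i, row_i in enumerate(T):
    for j, row_j in enumerate(T):
        ip = sum(a * b for a, b in zip(row_i, row_j)) / 2 ** k
        assert ip == (1.0 if i == j else 0.0)
```

Since the characters take values ±1, orthonormality here is an exact integer computation.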
The second case leads, of course, to the discrete trigonometric system y\mapsto e(xy/N), where e(\theta):=\exp(2\pi i\theta).
5.1. Kolmogorov widths
Recall the following well-known statement (see, for instance, [34], § 2.1).
Lemma 5.1. Let G be a finite Abelian group, and let A,B\subset G and |A|+|B|>|G|. Then A+B=\{a+b\colon a\in A,\,b\in B\} = G.
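For a small cyclic group the lemma can be checked exhaustively. A short Python sketch (ours; the helper name `sumset` is illustrative) verifies it for G=\mathbb Z_6 over all pairs of subsets:

```python
from itertools import combinations

def sumset(A, B, n):
    """A + B = {a + b mod n} for subsets of the cyclic group Z_n."""
    return {(a + b) % n for a in A for b in B}

n = 6
G = range(n)
subsets = [set(s) for r in range(n + 1) for s in combinations(G, r)]
# Whenever |A| + |B| > |G|, the sumset covers the whole group:
# for each g, the sets A and g - B must intersect by pigeonhole.
for A in subsets:
    for B in subsets:
        if len(A) + len(B) > n:
            assert sumset(A, B, n) == set(G)
```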
We use an immediate consequence of this lemma.
Lemma 5.2. Let A,B\subset\widehat{G} and |A|+|B|>|G|. Then for any m,n\in\mathbb N
Our goal is to approximate \{W_x\} in L_{\mathrm{H}}. Let \mathcal X be the set of x such that \bigl|\sum_{i=1}^k x_i- k/2\bigr|\leqslant\sqrt{k}. Hoeffding’s inequality shows that |\mathcal X|>2^{k-1}, so it is enough to approximate W_x for x\in\mathcal X (and then use Lemma 5.2).
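The counting estimate for \mathcal X can be confirmed exactly by summing binomial coefficients; here is a small Python check (ours, with a hypothetical helper name):

```python
from math import comb

def count_central(k):
    """Exact number of x in {0,1}^k with |sum_i x_i - k/2| <= sqrt(k)."""
    return sum(comb(k, i) for i in range(k + 1) if abs(i - k / 2) <= k ** 0.5)

# More than half of the cube is 'central', matching Hoeffding's bound
# P(|S_k - k/2| > sqrt(k)) <= 2 exp(-2) < 1/2.
for k in (4, 16, 64, 256):
    assert count_central(k) > 2 ** (k - 1)
```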
We consider the polynomial q(t) that interpolates (-1)^t for t\in\mathbb Z\cap[k/4-2\lambda\sqrt{k},\, k/4+2\lambda\sqrt{k}]. Its degree d is at most 4\lambda\sqrt{k}; note that 4\lambda\sqrt{k}\leqslant k. Then W_x(y)=q(\langle x,y\rangle) for all y\in Y_I, hence
\begin{equation*}
\|W_x-q(\langle x,\,\cdot\,\rangle)\|_{L_{\mathrm H}}\leqslant\mu_G\{y\notin Y_I\}\leqslant 2\exp(-2\lambda^2).
\end{equation*}
Let us bound the linear dimension of the family of functions \{q(\langle x,\,\cdot\,\rangle)\}_x (it is sufficient to consider x\in\mathcal X, but it is more convenient to work with all x\in\{0,1\}^k). Their dimension is equal to the rank of the matrix (q(\langle x,y\rangle))_{x,y\in\{0,1\}^k}, where x indexes rows and y indexes columns. The rank is bounded by the number of monomials in the polynomial q(x_1y_1+\dots+x_ky_k), because each monomial generates a rank-one matrix. Every monomial is a product of several x_iy_i, so we can bound the total number of them:
\begin{equation*}
\sum_{i=0}^{d}\binom{k}{i}\leqslant\Bigl(\frac{ek}{d}\Bigr)^{d}.
\end{equation*}
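The monomial-counting rank bound is easy to observe numerically. The following Python sketch (ours; the parameters k=6, d=2 and the polynomial are arbitrary illustrations) builds the matrix (q(\langle x,y\rangle)) on the Boolean cube and checks that its exact rank does not exceed \sum_{i\leqslant d}\binom{k}{i}, computed over the rationals to avoid floating-point rank issues:

```python
from itertools import product
from fractions import Fraction
from math import comb

def exact_rank(M):
    """Rank of an integer matrix via Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    rank, rows, cols = 0, len(M), len(M[0])
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if M[r][c]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rows):
            if r != rank and M[r][c]:
                f = M[r][c] / M[rank][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

k, d = 6, 2
q = lambda t: 3 * t * t - 5 * t + 7          # an arbitrary polynomial of degree d
pts = list(product((0, 1), repeat=k))
M = [[q(sum(xi * yi for xi, yi in zip(x, y))) for y in pts] for x in pts]

# Each monomial of q(x_1 y_1 + ... + x_k y_k) is a product of at most d factors
# x_i y_i (on {0,1} the squares collapse), and contributes a rank-one matrix.
bound = sum(comb(k, i) for i in range(d + 1))  # = 22 for k = 6, d = 2
assert exact_rank(M) <= bound
```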
Proof. We have to approximate characters y\mapsto e(xy/2^k). We write x=\sum_{i=0}^{k-1}x_i2^i and y=\sum_{j=0}^{k-1}y_j2^j to identify x,y\in\mathbb Z_{2^k} with the binary vectors (x_0,\dots,x_{k-1}) and (y_0,\dots,y_{k-1}). Using the 1-periodicity of e(\,\cdot\,) we obtain
\begin{equation*}
e\Bigl(\frac{xy}{2^k}\Bigr)=\prod_{s=1}^{k}e\Bigl(2^{-s}\sum_{i+j=k-s}x_iy_j\Bigr)=:\psi_x(y).
\end{equation*}
Hence it is sufficient to approximate the functions \psi_x.
As in the previous proof, first we deal with the set \mathcal X:=\bigl\{x\colon \bigl|\sum_{i=0}^{k-1}x_i-k/2\bigr|\leqslant\sqrt{k}\,\bigr\}. Fix x\in\mathcal X.
We use interpolation again. There exists a polynomial q_s of degree d\leqslant 4\lambda\sqrt{k} that interpolates the function t\mapsto e(2^{-s}t) at the points \mathbb Z\cap [k/4-2\lambda\sqrt{k},k/4+2\lambda\sqrt{k}]. Then the function e\bigl(2^{-s}\sum_{i+j=k-s}x_iy_j\bigr) is approximated by q_s(\langle x,y\rangle) and the error in the Hamming metric is at most 2\exp(-2\lambda^2); moreover, \dim\{q_s(\langle x,\,\cdot\,\rangle)\}_x \leqslant (ek/d)^d \leqslant (\sqrt{k}/\lambda)^{4\lambda\sqrt{k}}.
The function that we wish to approximate is a product:
\begin{equation*}
\prod_{s=1}^{s_0}e\Bigl(2^{-s}\sum_{i+j=k-s}x_iy_j\Bigr);
\end{equation*}
discarding the factors with s>s_0 changes \psi_x by at most 2\pi k2^{-s_0} in the uniform metric.
We approximate it by the product of the corresponding polynomials. The error in the Hamming metric is at most 2s_0\exp(-2\lambda^2). The rank of the product is at most the product of the ranks, that is, (\sqrt{k}/\lambda)^{4s_0\lambda\sqrt{k}}.
We use Lemma 5.2 to approximate \psi_x in L_{\mathrm{H}} for all x. Finally, we use (22) to obtain an L_0-approximation of the characters with accuracy 2\pi k2^{-s_0}+4s_0\exp(-2\lambda^2) by using dimension \leqslant (k/\lambda^2)^{4s_0\lambda\sqrt{k}}. It remains to take s_0\asymp\lambda^2.
To approximate continuous functions, we first discretize them: we approximate them by piecewise constant functions on a sufficiently fine partition into 2^k subintervals, and then approximate the resulting matrix (of the discrete Fourier transform) using the above result.
5.2. Trigonometric widths in L_p, p<1
Recall that the trigonometric widths d_n^{\mathrm{T}} correspond to approximation by subspaces spanned by functions in the trigonometric system.
We consider two cases in parallel: the continuous case with harmonics e(kx) on \mathbb{T} and the discrete case with harmonics e(kx/N) on \mathbb Z_N. We use the shorthand L_p^N for the space L_p(\mathbb Z_N) with the usual (quasi)norm
\begin{equation*}
\|f\|_{L_p^N}:=\Bigl(\frac1N\sum_{x\in\mathbb Z_N}|f(x)|^p\Bigr)^{1/p}.
\end{equation*}
The approximating spectra \Lambda constructed in Lemma 5.4 below have size
\begin{equation*}
|\Lambda|\leqslant C N^{1-{1}/(2m)}\log^{1/m}N.
\end{equation*}
We prove Theorem 5.3. We consider only the discrete case; the continuous case is quite analogous.
Proof of Theorem 5.3. We fix some number m and use Lemma 5.4 to find a set \Lambda and Lemma 5.3 to find a polynomial T\in\mathcal T_m^N. In order to approximate e(kx/N) by the space \mathcal T(\Lambda) of trigonometric polynomials with spectrum in \Lambda we find h\in\mathbb Z_N^* such that k\pm h,\dots,k\pm mh\in\Lambda. Consider the polynomial
Proof of Lemma 5.4. We start with the discrete case. Let \Lambda\subset\mathbb Z_N be a random set: a point occurs in \Lambda independently of the other points with some probability \tau\in(0,1).
We call a step h\in\mathbb Z_N^* admissible for k\in \mathbb Z_N if k\pm lh\in\Lambda for l=1,\dots,m. We have
\begin{equation*}
\mathsf{P}(h \text{ is admissible for }k)=\tau^{2m}.
\end{equation*}
We call k ‘bad’ if there are no admissible steps for k. Assume that for each k there are at least S steps h_1,\dots,h_S\in\mathbb Z_N^* such that the progressions k\pm lh_i, l=1,\dots,m, are disjoint (we estimate S below). Then the events that h_i is admissible for k are independent for different i. Hence the probability that k is bad is at most the probability that none of the h_i is admissible, that is, (1-\tau^{2m})^S.
The expected number of bad k is at most N(1-\tau^{2m})^S < N\exp(-S\tau^{2m}), so with probability \geqslant2/3 this number does not exceed 3N\exp(-S\tau^{2m}). Moreover, \mathsf{E}|\Lambda|=N\tau, so with probability \geqslant2/3 we have |\Lambda|\leqslant 3N\tau. Hence there exists \Lambda such that both estimates are true. If \tau is large enough to ensure that
\begin{equation*}
3N\exp(-S\tau^{2m})<1,
\end{equation*}
then there are no bad k and \Lambda is the required set.
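The probabilistic construction in the discrete case is easy to simulate. The following Python sketch (ours; the parameters N, m, \tau and the helper `find_spectrum` are illustrative, not from the paper) samples \Lambda at density \tau and verifies that every residue k has an admissible step; for \tau well above the threshold a valid \Lambda is found almost immediately.

```python
import random
from math import gcd

def find_spectrum(N, m, tau, rng, max_tries=200):
    """Sample Lambda <= Z_N at density tau until every residue k has an
    'admissible' step h in Z_N^*: all k +- l*h, l = 1..m, land in Lambda."""
    units = [h for h in range(1, N) if gcd(h, N) == 1]
    for _ in range(max_tries):
        Lam = {x for x in range(N) if rng.random() < tau}
        def admissible(k):
            return any(all((k + l * h) % N in Lam and (k - l * h) % N in Lam
                           for l in range(1, m + 1)) for h in units)
        if all(admissible(k) for k in range(N)):
            return Lam
    return None

rng = random.Random(0)           # fixed seed for reproducibility
N, m = 101, 2
tau = 0.5                        # comfortably above the threshold
Lam = find_spectrum(N, m, tau, rng)
assert Lam is not None and len(Lam) < N
```

Here each h is admissible for a given k with probability \tau^{2m}=1/16, and with about a hundred candidate steps per residue a bad k is very unlikely, in line with the estimate above.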
It remains to estimate S from below. There are |\mathbb Z_N^*|\gg N/\log N possible steps in total. A step h_1 forbids the other steps h that solve the 2m^2 equations lh \equiv \pm l_1h_1\pmod{N}; l,l_1\in\{1,\dots,m\}. Each equation has at most \mathrm{gcd}(N,l)\leqslant m solutions, so we can take
\begin{equation*}
S \asymp m^{-3}\frac{N}{\log N}.
\end{equation*}
The continuous case is analogous. Now \Lambda is a random subset of \{-2N,\dots,2N\} and we consider k\in\{-N,\dots,N\}. To ensure that the progressions \{k\pm lh,\ l=1,\dots,m\} are disjoint we take prime numbers h in the interval (m,N/m]; hence S \asymp m^{-1}N/\log N.
The lemma is proved.
§ 6. Further work
In the most interesting case of the L_1-norm there is much more to do.
Question 6.1. Is the trigonometric system \{e(kx)\}_{1\leqslant k\leqslant N} rigid in L_1?
It is obvious that d_n(\{\varphi_1,\dots,\varphi_N\},L_1)\geqslant A^{-1} d_n(B_1^N,\ell_\infty^N) for a uniformly bounded (\|\varphi_k\|_\infty\leqslant A) orthonormal system. So good approximation is impossible for n\leqslant c\log N. Can one improve this bound?
Question 6.2. Prove that d_n(\{w_1,\dots,w_N\},L_1)\geqslant c for n=n(N) such that n/\log N\to\infty.
The independence (or unconditionality) is sufficient for L_1-rigidity but this condition is, of course, very restrictive. Can one find a weaker condition or at least prove the rigidity of some concrete system?
Question 6.3. Is the Rademacher chaos \{r_i(t)r_j(t)\}_{1\leqslant i,j\leqslant N} rigid in L_1?
Cf. Statement 3.4. This question is connected with the next one (also see Statement 4.1).
Question 6.4. Present explicit constructions of \ell_1-rigid sets W\subset \{-1,1\}^N of polynomial (or at least subexponential) size.
In [16] the non-rigidity of DFT matrices was connected with the non-rigidity of circulants. In terms of widths we obtain the following question.
Question 6.5. Let u\in[-1,1]^N, and let T be a cyclic coordinate shift operator. Is it true that
1.
A. Kolmogoroff (Kolmogorov), “Über die beste Annäherung von Funktionen einer gegebenen Funktionenklasse”, Ann. of Math. (2), 37:1 (1936), 107–110
2.
G. G. Lorentz, M. von Golitschek and Y. Makovoz, Constructive approximation. Advanced problems, Grundlehren Math. Wiss., 304, Springer-Verlag, Berlin, 1996, xii+649 pp.
3.
A. Pinkus, n-widths in approximation theory, Ergeb. Math. Grenzgeb. (3), 7, Springer-Verlag, Berlin, 1985, x+291 pp.
4.
Dinh Dũng, V. Temlyakov and T. Ullrich, Hyperbolic cross approximation, Adv. Courses Math. CRM Barcelona, Birkhäuser/Springer, Cham, 2018, xi+218 pp.
5.
V. M. Tikhomirov, “Approximation theory”, Analysis, v. II, Encyclopaedia Math. Sci., 14, Convex analysis and approximation theory, Springer-Verlag, Berlin, 1990, 93–243
6.
Y. Malykhin, “Matrix and tensor rigidity and L_p-approximation”, J. Complexity, 72 (2022), 101651, 13 pp.
7.
S. V. Lokam, “Complexity lower bounds using linear algebra”, Found. Trends Theor. Comput. Sci., 4:1–2 (2008), 1–155
8.
J. Alman and R. Williams, “Probabilistic rank and matrix rigidity”, STOC'17 Proceedings of the 49th annual ACM SIGACT symposium on theory of computing (Montreal, QC 2017), ACM, New York, 2017, 641–652
9.
S. M. Voronin and N. Temirgaliev, “On some applications of Banach measure”, Izv. Akad. Nauk Kaz. SSR, Ser. Fiz.-Mat., 1984, no. 5, 8–11 (Russian)
10.
V. E. Maĭorov, “Kolmogorov's (n,\delta)-widths of spaces of smooth functions”, Russian Acad. Sci. Sb. Math., 79:2 (1994), 265–279
11.
J. Creutzig, “Relations between classical, average, and probabilistic Kolmogorov widths”, J. Complexity, 18:1 (2002), 287–303
12.
B. S. Kashin, “Lower bounds for m-term approximations in the metric of the discrete space L_n^0”, Russian Math. Surveys, 76:5 (2021), 933–935
13.
V. F. Gaposhkin, “Lacunary series and independent functions”, Russian Math. Surveys, 21:6 (1966), 1–82
14.
B. S. Kashin, Yu. V. Malykhin and K. S. Ryutin, “Kolmogorov width and approximate rank”, Proc. Steklov Inst. Math., 303 (2018), 140–153
15.
J. Bourgain, “Bounded orthogonal systems and the \Lambda(p)-set problem”, Acta Math., 162:3–4 (1989), 227–245
16.
Z. Dvir and A. Liu, “Fourier and circulant matrices are not rigid”, 34th computational complexity conference, LIPIcs. Leibniz Int. Proc. Inform., 137, Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2019, 17, 23 pp.
17.
V. I. Ivanov and V. A. Yudin, “Trigonometric system in L_p, 0<p<1”, Math. Notes, 28:6 (1980), 884–889
18.
T. Oikhberg and M. I. Ostrovskii, “Dependence of Kolmogorov widths on the ambient space”, Zh. Mat. Fiz. Anal. Geom., 9:1 (2013), 25–50
19.
E. Novak and H. Woźniakowski, Tractability of multivariate problems, v. I, EMS Tracts Math., 6, Linear information, Eur. Math. Soc., Zürich, 2008, xii+384 pp.
20.
R. S. Ismagilov, “On n-dimensional diameters of compacts in a Hilbert space”, Funct. Anal. Appl., 2:2 (1968), 125–132
21.
N. N. Vakhania, V. I. Tarieladze and S. A. Chobanyan, Probability distributions on Banach spaces, Math. Appl. (Soviet Ser.), 14, D. Reidel Publishing Co., Dordrecht, 1987, xxvi+482 pp.
22.
R. Vershynin, High-dimensional probability. An introduction with applications in data science, Camb. Ser. Stat. Probab. Math., 47, Cambridge Univ. Press, Cambridge, 2018, xiv+284 pp.
23.
J. D. Vaaler, “A geometric inequality with applications to linear forms”, Pacific J. Math., 83:2 (1979), 543–553
24.
S. Mendelson and R. Vershynin, “Entropy and the combinatorial dimension”, Invent. Math., 152:1 (2003), 37–55
25.
N. Alon, P. Frankl and V. Rödl, “Geometrical realization of set systems and probabilistic communication complexity”, SFCS'85 Proceedings of the 26th annual symposium on foundations of computer science (Portland, OR 1985), IEEE Computer Soc., Washington, DC, 1985, 277–280
26.
N. Alon, S. Moran and A. Yehudayoff, “Sign rank versus VC dimension”, 29th annual conference on learning theory, Proceedings of Machine Learning Research (PMLR), 49, 2016, 47–80, https://proceedings.mlr.press/v49/alon16.html
27.
B. S. Kashin and A. A. Saakyan, Orthogonal series, Transl. Math. Monogr., 75, Amer. Math. Soc., Providence, RI, 1989, xii+451 pp.
28.
I. Berkes, “On the uniform theory of lacunary series”, Number theory — Diophantine problems, uniform distribution and applications, Springer, Cham, 2017, 137–167
29.
S. V. Astashkin, The Rademacher system in function spaces, Birkhäuser/Springer, Cham, 2020, xx+559 pp.
30.
C. Bennett and K. Rudnick, On Lorentz–Zygmund spaces, Dissertationes Math. (Rozprawy Mat.), 175, PWN, Warszawa, 1980, 67 pp.
31.
E. D. Gluskin, “Intersections of a cube with an octahedron are poorly approximated by subspaces of small dimension”, Approximation of functions by special classes of operators, Interuniv. Collect. Sci. Works, Min. pros. RSFSR, Vologod. Gos. Ped. Inst., Vologda, 1987, 35–41 (Russian)
32.
D. J. Hajela, “Construction techniques for some thin sets in duals of compact abelian groups”, Ann. Inst. Fourier (Grenoble), 36:3 (1986), 137–166
33.
B. S. Kashin, “The diameters of octahedra”, Uspekhi Mat. Nauk, 30:4(184) (1975), 251–252 (Russian)
34.
Terence Tao and Van H. Vu, Additive combinatorics, Cambridge Stud. Adv. Math., 105, Reprint of the 2006 ed., Cambridge Univ. Press, Cambridge, 2010, xviii+512 pp.
Citation:
Yu. V. Malykhin, “Widths and rigidity”, Sb. Math., 215:4 (2024), 543–571