Abstract:
The paper presents elements of the theory of local extremum in the problem of optimal control with free right end and, in general, uncertain initial position of trajectories on the basis of exact formulae for the increment (variations of infinite order) of the objective functional. Necessary conditions for optimality of ‘feedback’ type are obtained: their formulations involve auxiliary feedback controls which generate program descent controls (in the minimum problem). The conditions proposed in this work provide an alternative to the classical Pontryagin principle (and even improve it in some special cases) and open the way to constructing indirect methods for local search without procedures for adjustment of the parameters of ‘descent depth’.
Bibliography: 26 titles.
Keywords:
optimal control, exact formulae for the increment of the objective functional, feedback necessary conditions of optimality, Pontryagin's maximum principle, continuity equation.
This work elaborates the subject of [1]–[3], which is asymmetric to Krotov’s sufficient optimality conditions (see [4]); it is devoted to so-called feedback necessary optimality conditions for program controls and to the corresponding methods of descent in the classical problem of dynamic optimization
(P)  I[u] ≐ ℓ(x[u](1)) → min,
(d/dt)x(t) ≐ ˙x(t) = f(x(t),u(t)),  x(0) = y ∈ Rn,  (1.1)
u ∈ U ≐ L∞(I;U),  U ⊂ Rm,  (1.2)
and in one of its qualitative generalizations, namely, the problem of control for an ensemble of trajectories (§ 3). The first condition in (1.1) is assumed to hold almost everywhere (in the sense of the Lebesgue measure) on the interval I≐[0,1]; we let x=x[u] denote the Carathéodory solution of the Cauchy problem (1.1) which corresponds to the control u.
Assume that a given process ¯σ=(¯x,¯u), ¯u∈U, ¯x=x[¯u], has to be examined for optimality. The conventional necessary conditions for a strong (as well as a Pontryagin or an L1-) extremum (see [5] and [6]) are based on classes of needle-shaped and weak variations of the control ¯u, which generate processes σ=(x,u) with the property of potential descent with respect to the functional
I[u] ≐ ℓ(x(1)) ⩽ ℓ(¯x(1)) ≐ I[¯u].  (1.3)
We adopt the terminology of [1] and say that such variations reject ¯u, and we refer to the corresponding processes as comparison processes.
In feedback necessary conditions the role of rejecting variations is played by feedback controls (as the term ‘feedback’ already indicates). These variations widen the set of comparison processes by augmenting it with some sliding modes, namely, admissible processes for the convexified problem (coP). (Footnote 1: The consistency of such an extension is based on an approximation of sliding modes by admissible processes for problem (P) and a continuous extension of the objective functional.) Trajectories admitted for comparison with ¯x are curves (in general, inadmissible in (P)) generated by feedback controls
w(t,x) ∈ argmin_{υ∈U} ∇x¯φt(x)⋅f(x,υ),  (1.4)
where ¯φ:I×Rn→R is a sufficiently regular weakly nonincreasing (Footnote 2: a function φ:I×Rn→R, (t,x)↦φt(x), is said to satisfy some monotonicity property weakly with respect to the control system (1.1), (1.2) if the corresponding property is exhibited by its composition φ∘x:I→R, t↦φt(x(t)), with at least one admissible trajectory x) solution of the boundary problem for the Hamilton–Jacobi inequality
∂tφt(x) + min_{υ∈U} ∇xφt(x)⋅f(x,υ) ⩽ 0,  φ1 = ℓ,  (1.5)
associated with ¯σ in one way or another (in the general case the boundary condition can contain an arbitrary majorant of the function ℓ); ∇x is the gradient operator with respect to the variable x∈Rn, and the dot denotes the scalar product of two vectors.
Once ¯φ has been defined, the strategy (1.4) of control for the system that delivers a minimum to (1.5) gives us a formal structure of control for descent from ¯σ, like in dynamic programming. Let us put aside for a while the (nontrivial) aspect of the implementation of this strategy and discuss the problem of the selection of the function ¯φ, which was called in [1] the majorant of the objective functional at the point ¯σ. Among all solutions of inequality (1.5) it is reasonable to select one of those for which the construction (1.4) provides the deepest descent. It should be noted that in general conditions (1.5) themselves do not guarantee any descent at all: it can occur (see Example 2 in § 8.3) that for badly chosen ¯φ all curves x=x[w] synthesized by the rule (1.4) are definitely ‘worse’ than ¯x, that is, ℓ(x(1))>ℓ(¯x(1)).
The problem of the construction of adequate majorants in the nonlinear problem (P) was qualified in [3] as an open problem of the theory. (Footnote 3: The ideal majorant is, of course, the Bellman function, but finding it is much harder than solving problem (P) itself.) It is common practice to consider linear or quadratic (in the state variable) functions generated by constructions of Pontryagin’s maximum principle (PMP) and second-order conditions (solutions of the conjugate system), namely, matrix impulses of Gabasov and Riccati type. These functions do indeed generate descent controls in the corresponding particular classes of problems, linear and linear-quadratic in the state variable; in the general case, however, their use is heuristic in nature.
In this work we propose a universal (and fairly simple) class of nonlinear majorants in problem (P): the sought-for majorant is the function ¯p:I×Rn→R which takes a constant value along the flow ¯Xt,1 of the vector field (t,x)↦f(x,¯u(t)) on the interval [t,1] and coincides with ℓ at the terminal instant. In other words, the majorant is defined by the relation ¯pt(x)≐ℓ(¯Xt,1(x)) for all (t,x)∈I×Rn.
Note that under the standard regularity assumptions for the vector field f the mapping (t,x)↦¯Xt,1(x), as well as the majorant ¯p, is absolutely continuous in t and continuously differentiable in x. In particular, for each x∈Rn we have the equalities (see § 3.1)
∂t¯pt(x) + ∇x¯pt(x)⋅f(x,¯u(t)) = 0 for almost all t∈I;  ¯p1(x) = ℓ(x).  (1.6)
Below we interpret the transport equation in this ‘pointwise’ sense. Note at the same time that the function ¯p is also the unique weak solution (that is, a solution in the sense of distributions) of problem (1.6); see [7].
The majorant constructed above possesses all required properties. First, it is weakly monotone, namely, ‘weakly constant’. Second, ¯p satisfies the Hamilton–Jacobi inequality (1.5):
∂t¯pt(x) + min_{υ∈U} ∇x¯pt(x)⋅f(x,υ) ⩽ ∂t¯pt(x) + ∇x¯pt(x)⋅f(x,¯u(t)) = 0
at all points of I×Rn where the derivative ∂t¯pt(x) exists. Finally, the corresponding feedback strategies
w(t,x) ∈ argmin_{υ∈U} ∇x¯pt(x)⋅f(x,υ)  (1.7)
generate (at least one) trajectory (of a sliding mode) satisfying (1.3). We establish this result in § 7. This fact is obvious under the assumption that some implementation of the synthesis of w gives a program control u∈U and the corresponding solution x=x[u] for which
Then (d/dt)¯pt(x(t))⩽0 and ℓ(x(1))=¯p1(x(1))⩽¯p0(y)=¯p1(¯x(1))=ℓ(¯x(1)). Here the first and last equalities follow from the boundary condition (1.6), the inequality is due to the nonincreasing behaviour of the composition ¯p∘x, and the second equality is a consequence of the fact that ¯p is constant along ¯x.
Below we will see that the function ¯p is a solution of the boundary problem (1.5) for which the cost of the process synthesized by the rule (1.4) is certainly at most I[¯u]=ℓ(¯x(1)). This will be established as a consequence of an exact formula for the increment of the functional, which generalizes such formulae for the linear and linear-quadratic settings (see [8]).
By P(Rs) we denote the set of all probability measures on Rs and by Pc(Rn) the subset of measures with compact support. The latter set can be endowed with the structure of a metric space by equipping it with the Kantorovich p-metric Wp, p⩾1 (see [9]) (this space is not complete, but it is dense in any complete space (Pp(Rn),Wp), where Pp(Rn) is the subset of P(Rn) consisting of all measures with finite pth moment: see [10]). It is always assumed below that Pc(Rn) denotes the metric space (Pc(Rn),W1) without further qualification.
We distinguish two Borel measures on Rs, namely, Ls is the classical Lebesgue measure and δx is the atomic Dirac measure concentrated at a point x∈Rs. The abbreviation ‘a.e.’ stands for ‘almost everywhere’ or for ‘almost all’ points of the corresponding set with respect to the measure indicated. When the measure is not indicated, it is assumed to be the one-dimensional Lebesgue measure L1.
Given a norm |⋅| on Rn, we denote the matrix norm consistent with it by the same symbol. Finally, DxV is the matrix of partial derivatives (∂xiVj) of the vector field V:Rn→Rn, where the Vj are components of V.
In this work the following basic assumptions are supposed to hold.
Remark 1. We restrict our considerations to the autonomous case, since the approach elaborated here involves (in § 6) elements of the theory developed by Krasovskii and Subbotin [11], in which the continuity of the vector field with respect to the variable t plays an essential role. At the same time the case when the dependence mentioned above is sufficiently regular reduces to the autonomous case by extending the phase space.
§ 2. Variational approach. Bilinear problem
The method of the derivation of feedback necessary conditions described above does not emphasize their variational character. However, the condition corresponding to the feedback variation (1.7) can be derived in a classical way, by considering the increment of the functional. On the whole this approach reproduces the standard algorithm for deriving the PMP; it is most demonstrative in the case of a bilinear problem.
For the sake of simplicity let us make an additional assumption:
(A4) the mapping υ↦f(x,υ) is affine and the set U is convex.
Recall that in this case the operator u↦x[u] is continuous as a function L∞(I;Rm)→C(I;Rn), where L∞ is endowed with the weak* topology σ(L∞,L1) of duality with L1. The way the main results can be carried over to the general case (sliding modes of control) is discussed in § 7.
In the PMP theory the canonical class of rejecting controls is formed by needle-shaped variations; however, in the u-affine case it is sufficient to restrict consideration to so-called weak variations:
¯uλ(t) = ¯u(t) + λ(u(t) − ¯u(t)),  u∈U,  λ∈[0,1).  (2.1)
As is well known, the linear part of the mapping
λ ↦ Δ¯uλI[¯u] ≐ I[¯uλ] − I[¯u] = λ (d/dλ)|λ=0 I[¯uλ] + o(λ)  (2.2)
in a neighborhood of the origin (the first variation of the functional I, the Gateaux derivative in the direction u−¯u) can be represented in the form
which guarantees that for sufficiently small values of λ controls of the form (2.1) are ‘not worse than’ ¯u. In combination with a certain method for adjusting the parameter λ (the method of linear descent) formulae (2.1) and (2.5) describe an algorithm of consecutive approximations in problem (P), an analogue of the method of gradient descent (see, for example, [8]). The condition of the unimprovability of the control ¯u and the halting criterion is, obviously, the validity of the relation
which determines the classical Pontryagin principle (in the form of the minimum principle).
It is obvious that if the initial conditions of the problem are sufficiently regular, then one can go even further and consider the Taylor series expansion of the function λ↦Δ¯uλI[¯u] by taking the second and higher variations (a common practice is not to go beyond the second variation). This gives necessary conditions of higher order and more sophisticated methods for local search.
It is known (see [8]) that in the particular case when problem (P) has a linear-quadratic structure there is an analogue of representation (2.2), (2.3) without remainder terms and hence without any parameters adjusting the closeness of the controls ¯u and u. Such a representation can be considered an ‘infinite-order’ variation of I. Moreover, if the functions x↦f(x,υ) and x↦ℓ(x) are linear (that is, the problem is linear in state), then this representation is fully expressed in terms of standard constructions of the PMP (Footnote 4: this representation is easily derived from the relation ΔuI[¯u]=¯φ1(x(1)), where ¯φt(x)=¯ψ(t)⋅(x−¯x(t))):
Let us compare (2.7) and (2.3). The only difference between them is that now the first argument is a point on the ‘new’ trajectory x=x[u] rather than on the reference trajectory. As above, one can provide the inequality ΔuI[¯u]⩽0 by choosing u so as to satisfy the condition of pointwise minimization of the integrand
However, in contrast to the pointwise condition (2.5), which gives us a control in an explicit form, the operator equation (2.8) determines the function u implicitly. If the control ¯u is not extremal, then at first sight it is by no means obvious that there exist any solutions of this equation in the class U.
One could act in the following way: first, distinguish a ‘control construction’ in the form of the feedback w(t,x) as a solution of the problem
H(x,¯ψ(t),w(t,x))=minυ∈UH(x,¯ψ(t),υ),
then substitute this function into the system,
˙x = f(x,w(t,x)),  x(0) = y,  (2.9)
and, finally, if the last system has a Carathéodory solution x, put u(t)=w(t,x(t)). This new program control would provide an optimality criterion for ¯u which has the same form as (2.6):
This is what a ‘feedback analogue’ of Pontryagin’s principle might look like.
Let us emphasize once again that, in contrast to the PMP, which involves only the process ¯σ to be tested for optimality, the formulation of the feedback condition assumes that there exists an additional comparison process σ=(x=x[u],u) which is not supposed to be close to ¯σ either in control or in trajectory.
Of course, in the case of a general position the function x↦w(t,x) is not continuous, the system (2.9) has no Carathéodory solutions, and nonclassical solutions (such as Krasovskii–Subbotin motions) are not generated by any control u∈U (for nonconvex U or a nonconvex set f(x,U)≐{f(x,υ)|υ∈U}). Moreover, formula (2.7) itself is valid only in the case when problem (P) is linear in state.
To carry over the idea presented above to the general nonlinear case it is convenient to embed the classical problem (P) into a weaker statement which is linear in the corresponding state variable, namely, the problem of control for an ensemble of trajectories (on the metric space of probability measures). This relaxation is discussed in § 3. The linearity of the weakened problem allows us to involve elements of the duality theory (§ 4). In § 5 we derive two symmetric exact formulae for the increment of the objective functional in the transformed problem (and, as a corollary, in (P)), which are similar to Weierstrass’ classical formula in the calculus of variations (see [12]). As a corollary of these formulae, we obtain a series of necessary optimality conditions similar to (2.10): in § 6 the corresponding conditions are derived in the case of affine dependence on u and a convex set U; this result is generalized in § 7 to the general statement by going over to sliding modes of control. Section 8 is devoted to the discussion of the status of the necessary conditions obtained in this work among similar results, in particular, their relation to Pontryagin’s principle. In § 9 we formulate the method of descent along the functional, which involves the extremal construction of feedback controls (1.7), and establish its convergence. Sections 10 and 11 contain necessary technical results.
§ 3. Relaxation
Let us show that a representation similar to (2.7) is valid for problem (P) of the general form. Below we use the notation fυ(x)≐f(x,υ) and ∫≐∫Rn.
3.1. Flows of vector fields. The transport equation
We start by recalling some necessary facts. Suppose that the assumptions (A2) and (A3) are satisfied. Then for each u∈U the function (t,x)↦fu(t)(x) generates the mapping X=X[u]:(s,t,x)↦Xs,t(x), which is called the flow of the nonautonomous vector field f. Here t↦Xs,t(x) is the solution of the Cauchy problem
∂tXs,t(x)=fu(t)(Xs,t(x)),Xs,s(x)=x.
For all s,t∈R the mapping Xs,t:x↦Xs,t(x) is a C1-diffeomorphism Rn→Rn with the property Xτ,t∘Xs,τ=Xs,t for all s,τ,t∈R. These facts, in particular, imply the invertibility of Xs,t and the relation (Xs,t)−1=Xt,s.
Fixing some s we introduce the shorthand notation Pt=Xs,t and Qt=Xt,s. Clearly,
0=∂t(id)=∂t(Qt∘Pt)=(∂tQt+DxQtfu(t))∘Pt,
where id denotes the identity mapping Rn→Rn. Since the expression in parentheses vanishes for all values z=Pt(x) and the mapping x↦Pt(x), Rn→Rn, is bijective for each t∈I, it can be concluded that t↦Qt satisfies on I×Rn the conditions
∂tQt+DxQtfu(t)=0,Qs=id,
which are treated in the same pointwise sense as equation (1.6). This yields a useful expression for the derivative of the flow with respect to the first index:
∂tXt,s=−DxXt,sfu(t)≐−Jt,sfu(t),
where s↦Jt,s[u](x) for each x∈Rn is a solution (see [13], Theorem 2.3.2) of the Cauchy problem for the linear matrix equation
∂sJt,s=Dxfu(s)∘Xt,s(x)Jt,s,Jt,t=E;
E=En denotes the identity matrix of size n×n. Let ξ∈C1(Rn;R). Then it turns out (Footnote 5: this fact is easily verified by direct differentiation if one puts the equality pt=ξ∘Qt into the equivalent form pt∘Pt=ξ) that the function p≐ξ∘Q is a solution (in the sense indicated above) of the nonconservative transport equation
∂tpt+∇xpt⋅fu(t)=0
with the intermediate condition ps=ξ.
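The direct differentiation mentioned above can be written out as follows. This is only a sketch under the standing regularity assumptions; the shorthand z = Pt(x) is introduced here for convenience and is not notation from the text.

```latex
\begin{align*}
0 = \partial_t\,\xi(x)
  &= \partial_t\bigl(p_t(P_t(x))\bigr) \\
  &= (\partial_t p_t)(P_t(x)) + \nabla_x p_t(P_t(x))\cdot \partial_t P_t(x) \\
  &= \bigl(\partial_t p_t + \nabla_x p_t\cdot f_{u(t)}\bigr)(z),
  \qquad z = P_t(x),
\end{align*}
% the last step uses \partial_t P_t(x) = f_{u(t)}(P_t(x)), i.e. the Cauchy problem defining the flow
```

Since x↦Pt(x) is a bijection of Rn, the point z is arbitrary, so the transport equation holds pointwise for almost all t; the condition ps=ξ follows from Qs=id.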
3.2. The problem of control for a (statistical) ensemble of trajectories
Note that the phase space Rn in problem (P) can be endowed with the natural structure of a probability space (Rn,F,P) by introducing the canonical probability measure P=δy. (Footnote 6: It is obvious that for this choice of P the particular choice of the σ-algebra F does not matter.) In this case the function t↦Xt(x)≐X0,t(x)[u]≐x[u](t) must be treated as a (deterministic) random process. For each t∈I the distribution of the random variable x↦Xt(x) is determined by the probability measure μt≐(Xt)♯δy=δx[u](t)∈P(Rn). It is well known that under the standard regularity assumptions the function μ=μ[u]:t↦μt, which describes the behaviour of this measure in time, is a weak solution (see the definition below) of the linear partial differential equation
∂tμt+∇x⋅(fu(t)μt)=0.
This formal equation is a direct generalization of the classical continuity equation to the case of arbitrary probability (or nonnegative) measures. If P=δy, then it is equivalent to (its characteristic) ordinary differential equation (1.1): the only weak solution of the continuity equation on I with the initial condition μ0=δy for a control u∈U is the curve t↦δx[u](t).
In turn, the quality criterion for the problem (P) can be formulated in terms of the linear mapping P(Rn)→R,
ℓ(x[u](1))=∫ℓdμ1[u]≐⟨μ1[u],ℓ⟩,
the minimum of which over all μ=μ[u], u∈U, coincides with the value of (P). Thus, we arrive at an equivalent statement of the original control problem, which is now linear in the new state variable.
Now we put aside the particular choice of the probability structure (F,P) and consider the extremal problem
(RP)  J[u] ≐ ⟨μ1[u],ℓ⟩ → min,
∂tμt + ∇x⋅(fu(t)μt) = 0,  (3.6)
μ0 = ϑ,  (3.7)
u ∈ U.
Here the role of states is played by the probability measures μt∈P(Rn) on the phase space of problem (P); the initial distribution ϑ∈P(Rn) is specified. The class U of admissible controls remains the same. Assumptions (A1)–(A4) are still supposed to be fulfilled, along with the additional assumption
(A5) the measure ϑ has a compact support (ϑ∈Pc(Rn)).
A weak solution of equation (3.6) is a function μ∈C(I;P(Rn)) for which the Newton–Leibniz formula holds:
Remark 2. We adopt the nonclassical definition of a weak solution (with a wider class of test functions), which is here equivalent to the classical one; see [9], Remark 2.5 and Lemma 2.6.
It is known (see [9]) that under the assumptions (A2), (A3) and ϑ∈P1(Rn) (in particular, (A5)) the weak solution of the Cauchy problem (3.6), (3.7) does exist, is unique and admits the representation
μt=(X0,t)♯ϑ.
Here the operator F♯:P(Rn)→P(Rn) defines the image of the measure under the action of the Borel measurable vector field F:Rn→Rn:
⟨F♯μ,φ⟩=⟨μ,φ∘F⟩∀φ:φ∘F∈Lμ1(Rn;Rn).
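For a measure with finitely many atoms the image operator reduces to moving each atom by F while keeping its weight. The following minimal Python sketch illustrates the defining identity on such a measure; the atoms, weights, the map F and the test function phi are hypothetical data chosen only for illustration.

```python
def pair(measure, phi):
    """Pairing <mu, phi> for a discrete measure given as [(atom, weight), ...]."""
    return sum(w * phi(x) for x, w in measure)


def pushforward(F, measure):
    """Image F#mu of a discrete measure: atoms are moved by F, weights are kept."""
    return [(F(x), w) for x, w in measure]


# Hypothetical data: three atoms on the real line, weights summing to 1.
mu = [(-1.0, 0.2), (0.5, 0.5), (2.0, 0.3)]
F = lambda x: 2.0 * x + 1.0   # a Borel map R -> R (illustrative choice)
phi = lambda x: x * x         # a test function (illustrative choice)

lhs = pair(pushforward(F, mu), phi)      # <F#mu, phi>
rhs = pair(mu, lambda x: phi(F(x)))      # <mu, phi o F>
```

In the ensemble setting the same construction with F = X0,t applied to a sampled initial distribution gives a particle approximation of μt.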
Remark 3. Under assumption (A5) the family of measures (3.9) satisfies condition (3.8) for an arbitrary φ∈C1(I×Rn) (φ does not necessarily have compact support with respect to x).
The setup (RP) is called the ensemble control problem. It generalizes problem (P) to the case of an uncertain initial state (more realistic from the standpoint of applications). Although the variational analysis of the original problem on the basis of exact formulae for the increment can be performed directly, the approach proposed in our work can almost literally be carried over to the generalized model, and it is reasonable (and is even simpler in a certain sense) to present it in terms of the latter. Moreover, the main advantage of this approach — the absence of variation parameters — is most pronounced in problems of control for distributed systems, in which Pontryagin’s principle is formulated in terms of the conjugate partial differential equation (see [14], Theorem 2) and the ‘computational cost’ of the classical and feedback optimality condition is almost the same.
Note that under assumptions (A1)–(A4) the minimum in (RP) is attained in the class U of admissible controls: see [15], Theorem 3.2.
The reader familiar with geometric control theory can draw a parallel between the setup (RP) and the formalism of chronological calculus [16], [17], in which the probability structure is replaced with the algebraic one.
§ 4. Duality
Let ξ∈C1(Rn;R) and u∈U be fixed, and let X=X[u] be the flow of the system (1.1) corresponding to the control u. Consider the function p=p[u]: [s,1]×Rn→R,
pt = ξ∘(Xs,t)−1 ≐ ξ∘Xt,s.  (4.1)
As mentioned above, this function is a solution of the Cauchy problem (3.5). It is easily seen that the action of the measure μt on pt does not depend on t:
⟨μt,pt⟩≐⟨pt,(X0,t)♯ϑ⟩=⟨pt∘X0,t,ϑ⟩=⟨ξ∘X0,s,ϑ⟩,
which allows us to consider the trajectory p as the conjugate of μ. Using this we can get rid of the variable μ and reformulate problem (RP) in terms of the variable p. Indeed, putting s=1 and ξ=ℓ we obtain
⟨μ1,ℓ⟩≐⟨μ1,p1⟩=⟨μ0,p0⟩≐⟨ϑ,p0⟩
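The invariance of the pairing ⟨μt,pt⟩ in t can be checked numerically on a toy example. The sketch below assumes the scalar system ẋ = C·x with a constant control value C, for which the flow is known in closed form; the value of C, the cost ell and the discrete initial ensemble theta are hypothetical choices, not data from the paper.

```python
import math

C = 0.7  # hypothetical constant control value


def flow(s, t, x):
    """X_{s,t}(x) for dx/dt = C * x: state at time t when started from x at time s."""
    return x * math.exp(C * (t - s))


def ell(x):
    """Illustrative smooth terminal cost."""
    return math.sin(x) + x * x


def p(t, x):
    """p_t = ell o X_{t,1}, i.e. the choice s = 1 and xi = ell."""
    return ell(flow(t, 1.0, x))


# Hypothetical initial ensemble: atoms with weights summing to 1.
theta = [(-1.0, 0.25), (0.0, 0.25), (0.5, 0.5)]


def pairing(t):
    """<mu_t, p_t> with mu_t = (X_{0,t})# theta."""
    return sum(w * p(t, flow(0.0, t, x)) for x, w in theta)


# The pairing sampled on a grid of times in I = [0, 1].
values = [pairing(k / 10) for k in range(11)]
terminal_cost = sum(w * ell(flow(0.0, 1.0, x)) for x, w in theta)  # <mu_1, ell>
```

Up to rounding, all entries of `values` agree with `terminal_cost`, which is the discrete counterpart of the chain ⟨μ1,ℓ⟩=⟨ϑ,p0⟩.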
and conclude that the optimum in (RP) coincides with the solution of the problem
which is equivalent to the original (classical nonlinear) problem (P). This result is closely related to Theorem 2.1 in [18] (see also [19]) and it can be looked upon as a generalization of relations of nonclassical duality (Remark 4 in § 5; see [20]).
§ 5. Exact formulae for the increment
5.1. The increment of the functional of the weakened problem
Problem (RP) admits an exact representation of the increment of the functional which is similar to (2.7).
Proposition 1. Suppose that conditions (A1)–(A3) hold. Let u,¯u∈U, u≠¯u, be arbitrary controls, X=X[u] and ¯X=X[¯u] be the corresponding flows of the characteristic system (1.1) and μ=μ[u] be the solution of equation (3.6) corresponding to the control u.
Then the increment ΔuJ[¯u]≐J[u]−J[¯u] of the objective functional of problem (RP) can be represented in the form
ΔuJ[¯u]=∫I⟨μt,¯Ht(⋅,u(t))−¯Ht(⋅,¯u(t))⟩dt,
where
¯Ht(x,υ)≐H(x,¯ξt(x),υ),¯ξt=∇x¯pt≐¯J∗t∇ℓ∘¯Xt,1,
H is the classical Pontryagin function and ¯p is defined by condition (4.1) for ξ=ℓ, s=1, and u=¯u; the matrix ¯Jt≐¯Jt,1≐Dx¯Xt,1 is the solution of the Cauchy problem (3.4) at the instant s=1 for u=¯u.
Proof. The proof of Proposition 1 is based on elementary facts from calculus and the theory of ordinary differential equations.
(1) Let us show that the function s↦ℓ∘¯Xs,1∘X0,s(x), I→R, is Lipschitz continuous for each x∈Rn. Consider the orbit OI(x)={X0,1[ωs[u,¯u]](x)∣s∈I} of the point x under the mapping s↦X0,1[ωs[u,¯u]](x)≐¯Xs,1∘X0,s(x), I→Rn, where
ωs[u,v] ≐ u on [0,s),  ωs[u,v] ≐ v on [s,1].  (5.3)
The standard arguments based on the Grönwall–Bellman inequality show that under assumptions (A2) and (A3) the set OI(x) is bounded. Then its closure clOI(x) is a compact subset of Rn. It follows from assumption (A1) that the function ℓ is locally Lipschitz continuous, hence (in view of the local compactness of Rn) its restriction to clOI(x) is Lipschitz continuous. Now the required fact follows from the Lipschitz continuity of the functions t↦X0,t(x) and t↦¯Xt,1(x) with regard to the uniform (in t) local Lipschitz continuity of the function x↦¯Xt,1(x) (Lemma 1):
where L1=Lip(ℓ;clOI(x)) is the Lipschitz constant of the objective function ℓ on clOI(x), L2=Lip(¯Xt,1(⋅);{X0,t(x)∣t∈I}) is the Lipschitz constant of x↦¯Xt,1(x) on the phase portrait {X0,t(x)∣t∈I}, L3=Lip(X0,⋅(x);I) and L4=maxs∈ILip(¯X⋅,1(X0,s(x));I).
(2) With regard to the definition (3.10) and the equalities Xs,s=¯Xs,s=id for all s we represent the increment of the functional in the form
As the mapping t↦ℓ∘¯Xt,1∘X0,t(x) is absolutely continuous, one can extend the chain of equalities and convert the last difference with the use of the Newton–Leibniz formula:
⟨ϑ,∫I∂t(ℓ∘¯Xt,1∘X0,t−ℓ∘¯Xt,1∘¯X0,t)dt⟩.
By the semigroup property of the flow ¯X the quantity ℓ∘¯Xt,1∘¯X0,t=ℓ∘¯X0,1 does not depend on t. Consequently,
ΔuJ[¯u]=⟨ϑ,∫I∂t(ℓ∘¯Xt,1∘X0,t)dt⟩.
Let us calculate the derivative under the integral sign:
To finish the proof it remains to apply Fubini’s theorem and take representation (3.9) into account.
The proof of the proposition is complete.
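The derivative under the integral sign in step (2) can be written out as follows. This is a sketch, not a quotation of the source: it assumes the standard form H(x,ψ,υ)=ψ⋅f(x,υ) of the Pontryagin function and uses the shorthand z=X0,t(x), which is introduced here only for readability.

```latex
\begin{align*}
\partial_t\bigl(\ell\circ\bar X_{t,1}\circ X_{0,t}\bigr)(x)
 &= \nabla\ell\bigl(\bar X_{t,1}(z)\bigr)\cdot
    \Bigl[(\partial_t\bar X_{t,1})(z) + D_x\bar X_{t,1}(z)\,\partial_t X_{0,t}(x)\Bigr] \\
 &= \nabla\ell\bigl(\bar X_{t,1}(z)\bigr)\cdot
    \bar J_{t,1}(z)\,\bigl[f_{u(t)}(z) - f_{\bar u(t)}(z)\bigr] \\
 &= \bar\xi_t(z)\cdot\bigl[f(z,u(t)) - f(z,\bar u(t))\bigr]
  = \bar H_t(z,u(t)) - \bar H_t(z,\bar u(t)),
 \qquad z = X_{0,t}(x).
\end{align*}
% second line: \partial_t \bar X_{t,1} = -\bar J_{t,1} f_{\bar u(t)} (derivative of the flow in the first index)
%              and \partial_t X_{0,t}(x) = f_{u(t)}(X_{0,t}(x));
% third line:  \bar\xi_t = \bar J_t^{*}\,\nabla\ell\circ\bar X_{t,1}.
```

The equalities hold for almost all t∈I. Pairing with ϑ and using μt=(X0,t)♯ϑ turns the integrand into ⟨μt,¯Ht(⋅,u(t))−¯Ht(⋅,¯u(t))⟩, which is the integrand of (5.1).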
5.2. A ‘direct’ formula for the increment in problem (P)
We refine Proposition 1 for the original setup (P). Putting ϑ=δy (which yields the equality μt[u]=δx[u](t) for each t∈I) we obtain J[u]=I[u]. Then (5.1) takes the form
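The specialized formula can be reconstructed directly from (5.1); the display below is such a reconstruction rather than a quotation of the source. It uses only the fact that pairing a continuous function g with δx(t) gives g(x(t)).

```latex
\[
\Delta_u I[\bar u]
 = \int_I \Bigl[\bar H_t\bigl(x(t),u(t)\bigr) - \bar H_t\bigl(x(t),\bar u(t)\bigr)\Bigr]\,dt,
 \qquad x = x[u].
\]
```

Note that the integrand is evaluated along the trajectory x=x[u] of the comparison control, while ¯Ht is built from the reference control ¯u through ¯p.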
Remark 4. It is easily seen that in problem (P), which is linear in state, the composition ∇x¯p∘¯x coincides on I with the reference cotrajectory ¯ψ≐ψ[¯u]. This follows from the representation
where t↦¯J∗t,1 is the fundamental matrix solution in the inverse time of the equation in (2.4) (here ¯Jt,1 does not depend on x). In this case formula (5.1) reduces to (2.7).
In the nonlinear case the equality ∇x¯p∘¯x=¯ψ holds under an additional regularity assumption:
(A6) the function ℓ is twice continuously differentiable, as also is the function x↦f(x,υ) for each υ∈U.
This fact is established by direct differentiation of the function t↦∇x¯pt(¯x(t)).
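The exact increment formula for problem (P), that is, the specialization of (5.1) to ϑ=δy in which the increment equals the integral over I of ¯Ht(x(t),u(t))−¯Ht(x(t),¯u(t)), can be sanity-checked on a toy bilinear problem. The sketch below assumes the scalar system ẋ=υ·x, the cost ℓ(x)=x², constant controls, and H(x,ψ,υ)=ψ·f(x,υ); all of these are illustrative assumptions, chosen so that the flow and ¯p are available in closed form.

```python
import math


def flow(c, s, t, x):
    """X_{s,t}(x) for dx/dt = c * x with a constant control value c."""
    return x * math.exp(c * (t - s))


def ell(x):
    """Illustrative terminal cost."""
    return x * x


def grad_p_bar(c_bar, t, x):
    """Gradient in x of p_bar_t(x) = ell(X_bar_{t,1}(x)) = x^2 * exp(2*c_bar*(1-t))."""
    return 2.0 * x * math.exp(2.0 * c_bar * (1.0 - t))


def H_bar(c_bar, t, x, v):
    """H_bar_t(x, v) = grad p_bar_t(x) * f(x, v) with f(x, v) = v * x."""
    return grad_p_bar(c_bar, t, x) * (v * x)


def increment_via_formula(a, c_bar, y=1.0, n=2000):
    """Midpoint-rule approximation of the integral in the exact increment formula."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x = flow(a, 0.0, t, y)  # point on the NEW trajectory x[u], u = a
        total += (H_bar(c_bar, t, x, a) - H_bar(c_bar, t, x, c_bar)) * h
    return total


def increment_direct(a, c_bar, y=1.0):
    """I[u] - I[u_bar] computed from the terminal states."""
    return ell(flow(a, 0.0, 1.0, y)) - ell(flow(c_bar, 0.0, 1.0, y))


formula_value = increment_via_formula(-1.0, 0.5)
direct_value = increment_direct(-1.0, 0.5)
```

The two values agree up to quadrature error although the controls −1 and 0.5 are far apart, which is the point of a formula without remainder terms.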
5.3. The dual formula for the increment
Interchanging the roles of ¯u and u, we obtain a ‘dual’ representation for the increment of the functional in problem (RP):
ΔuJ[¯u] = −∫I ⟨¯μt, Ht(⋅,¯u(t)) − Ht(⋅,u(t))⟩ dt,  (5.7)
where ¯μ≐μ[¯u], Ht(x,υ)≐H(x,ξt(x),υ), ξt=∇xpt[u]≐J∗t∇ℓ∘Xt,1 and p=p[u] is defined by condition (4.1) for s=1 and the control u (X=X[u] is the corresponding flow of system (1.1) and t↦Jt≐DxXt,1 is a solution of (3.4)). Specializing this representation to problem (P) we obtain an exact formula for the increment:
Let u and ¯u be arbitrary admissible controls, and X and ¯X be the corresponding flows. For each s∈[0,1] consider the intermediate control ωs[u,¯u]∈U defined by equality (5.3) and the flow X^s of the system (1.1) generated by this control. Note that X^s_{0,1}=¯Xs,1∘X0,s. It is obvious that the function γ:I→Rn,
γ(s) = X^s_{0,1}(y),  s∈I,  (5.9)
specifies a parametrization of a curve on the attainability set D1(y) of the system (1.1), (1.2). This curve joins the points ¯X0,1(y)≐¯x(1)≐x[¯u](1) and X0,1(y)≐x(1)≐x[u](1) (Figure 1). Recall (formula (4.1)) that ¯ps=ℓ∘¯Xs,1. Hence ¯ps(x) is the cost of the reference control ¯u in problem (P) as restricted to the interval of time [s,1] with the initial condition x(s)=x. Now assume that u takes the initial state x(0)=y to a point x in time s, that is, x=X0,s(y). Then ¯ps(x) is the cost of the intermediate control ωs[u,¯u] in (P). A small variation s+Δt of the moment of ‘switching’ between the controls u and ¯u has the cost
we obtain an exact formula for the increment (5.6).
For a rigorous proof of relation (5.10) we go in the opposite direction: we apply the formula for the increment (5.6) and take into account that the curve ℓ∘γ is Lipschitz continuous (see the proof of Proposition 1). This, in particular, shows that (5.10) holds only for almost all s∈[0,1].
The dual formula (5.8) has a similar representation involving another class of curves ζ on D1(y) generated by variations ωs[¯u,u] of the control of the form (5.3) with the arguments ¯u and u in the reverse order, and joining the points ¯x(1) and x(1) in the ‘opposite direction’: ζ(0)=x(1) and ζ(1)=¯x(1). It is clear that there exist even more sophisticated parametrizations of the ‘motion’ between ¯x(1) and x(1) inside D1(y) — those akin to ‘packages of needles’ in the PMP theory — for example, parametrizations corresponding to the controls
Representation (5.10) suggests an obvious way to organize a monotone descent along the functional I: one must construct a curve γ along which the function ℓ does not increase, that is, ∇ℓ∘γ⋅˙γ⩽0 almost everywhere on I. For example, one can take a feedback control w satisfying inclusion (1.7). Then the process (x,u) with control u(t)=w(t,x(t)) (of course, if such a process is well defined) generates the required curve γ and therefore is an improving process. A rigorous implementation of this idea is the subject of the rest of this paper.
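The descent idea just described can be sketched numerically. The code below is a minimal illustration, not the method of the paper: it assumes a toy scalar system ẋ=υ·x with U=[−1,1] sampled on a finite grid, the cost ℓ(x)=x², explicit Euler time stepping, and a finite-difference gradient of ¯p. The names `u_bar`, `U_GRID`, `N` and `DT` are hypothetical, and no claim of monotone descent is made for the discretized scheme in general.

```python
def f(x, v):
    """Toy bilinear dynamics dx/dt = v * x (illustrative assumption)."""
    return v * x


def ell(x):
    """Illustrative terminal cost."""
    return x * x


U_GRID = [-1.0, -0.5, 0.0, 0.5, 1.0]  # finite sample of U = [-1, 1]
N = 200                                # number of Euler steps on I = [0, 1]
DT = 1.0 / N


def u_bar(t):
    """Reference program control to be improved (hypothetical choice)."""
    return 0.5


def p_bar(k, x):
    """Discrete analogue of p_bar_t(x) at t = k*DT: cost of running u_bar from (t, x) to time 1."""
    for j in range(k, N):
        x = x + DT * f(x, u_bar(j * DT))
    return ell(x)


def grad_p_bar(k, x, eps=1e-6):
    """Central finite-difference approximation of the x-gradient of p_bar."""
    return (p_bar(k, x + eps) - p_bar(k, x - eps)) / (2.0 * eps)


def feedback(k, x):
    """w(t, x): a minimizer over the grid of grad p_bar_t(x) * f(x, v)."""
    g = grad_p_bar(k, x)
    return min(U_GRID, key=lambda v: g * f(x, v))


def descent(y):
    """Closed-loop Euler simulation; returns the terminal state and the sampled program control."""
    x, u = y, []
    for k in range(N):
        v = feedback(k, x)
        u.append(v)
        x = x + DT * f(x, v)
    return x, u


x1, u_new = descent(1.0)
cost_ref = p_bar(0, 1.0)   # discrete cost of the reference control
cost_new = ell(x1)         # discrete cost of the control read off from the feedback
```

In this example the feedback selects the grid value −1 at every step, and the resulting cost does not exceed that of the reference control.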
§ 6. Necessary conditions of optimality
We turn back to the problem of the generation of descent directions for (P). Throughout this section assumptions (A1)–(A4) are supposed to hold.
6.1. The principle of optimality with feedback controls
As discussed in § 2, the inequality ΔuI[¯u]⩽0 is guaranteed by choosing u so as to minimize the integrand in (5.6). This brings us back to the problem of the solvability of the operator equation
similar to (2.8), in the class of admissible program controls U.
Let us verify that for each ¯u∈U the set
S[¯u]≐{σ=(x,u)∣x=x[u],u∈U, and (6.1) holds}
is nonempty. To be more precise, we demonstrate that the required property is characteristic for all processes in which:
∙ x is an arbitrary Krasovskii–Subbotin solution (see § 12) corresponding to some feedback control
w(t,x) ∈ argmin_{υ∈U} ¯Ht(x,υ) ≐ argmin_{υ∈U} H(x,∇x¯pt(x),υ)
(the set of such solutions is denoted by ¯KS),
∙ u∈U is an arbitrary program control generating the function x as a solution of system (1.1) (such a control exists by Proposition 6; clearly, if w is Borel measurable, then the class of such controls includes the Borel equivalence class of u such that u(t)≐w(t,x(t)) for almost all t∈I).
Proposition 2. Let x∈¯KS, u∈U and x=x[u]. Then the pair σ≐(x,u) satisfies (6.1).
This fact follows from a more general result of Proposition 4 for sliding modes.
It follows from the representation (5.6) that in problem (P) any control u satisfying (6.1) is ‘not worse’ than the reference one, that is, I[¯u]⩾I[u]. If the process ¯σ is optimal, then it is obvious that any process σ∈S[¯u] is optimal as well. This observation can be considered a necessary optimality condition in the spirit of the feedback conditions [1]. It is clear that if this condition holds, then I[¯u]=I[u] for all σ∈S[¯u].
Note that for any u having the property (6.1) the integrand in (5.6) is nonpositive; then we can reformulate our necessary condition in a form close to the PMP.
Theorem 1 (minimum principle with feedback controls). Let ¯σ=(¯x,¯u) be an optimal process in problem (P). Then the condition
¯Ht(x(t),¯u(t)) = min_{υ∈U} ¯Ht(x(t),υ) (= ¯Ht(x(t),u(t))) for almost all t∈I  (6.3)
holds for each σ=(x,u)∈S[¯u].
In fact, Theorem 1 contains a series of necessary conditions parametrized by the (nonempty) set S[¯u], whose elements we still call comparison processes; in contrast to [1], only admissible processes are admitted for comparison. This theorem proposes a concept of ‘feedback extremal’ alternative to that in [3]: a feedback extremal is a pair (¯σ,σ) of processes satisfying condition (6.3) (in particular, the equality in parentheses). With regard to Remark 4 the claim that the process ¯σ is extremal in the classical sense is equivalent to the inclusion ¯σ∈S[¯u], which means that the pair (¯σ,¯σ) is a feedback extremal. The relationship between the two types of extremality is discussed in § 8.1 below.
In conclusion we give a rigorous interpretation of the feedback condition in terms of the curve of monotone descent on the attainability set of the control system (§ 5.4).
Proposition 3. Let ¯σ=(¯x,¯u) be an admissible process, σ=(x,u)∈S[¯u] be some comparison process and the curve γ:I→Rn be defined by condition (5.9). Then the objective function ℓ does not increase along γ.
6.2. Descent controls in the weakened problem. Co- and bifeedback optimality conditions
The method for the generation of comparison controls that was presented in § 6 remains unchanged in the weakened problem (RP): the expression (5.1) gives the structure of a descent control in the form of a feedback with respect to the measure μ (it characterizes the state of the system):
w[μ](t)∈argminυ∈U⟨μ,¯Ht(⋅,υ)⟩.
This representation can be used to derive a feedback necessary condition and to construct nonlocal numerical algorithms in the problem of control for an ensemble of trajectories. Some results in this area were obtained in [21].
The dual representations (5.7) and (5.8) suggest a construction for a cofeedback descent control in the form of a functional feedback
w[p](t)∈argminυ∈UH(¯x(t),∇xpt(¯x(t)),υ)
and produce a series of ‘cofeedback’ necessary conditions, which can be combined with Theorem 1. The feedback strategies (6.4) can be implemented with the use of the Krasovskii–Subbotin scheme in reverse time, starting from the point ¯x(1)=ζ(1)≐Xs,1∘¯X0,s|s=1 (see § 5.4). We do not elaborate on this idea here and limit our considerations to the direct approach.
§ 7. General case. Sliding modes
Now we abandon assumption (A4) and suppose that υ↦f(x,υ) is an arbitrary mapping satisfying (A3) and U is an arbitrary (nonconvex) compact set. Although this case is much more general, from the technical point of view it is little different from the one discussed above, provided that we apply a classical trick originating from the theory of Young measures (see [22]): we relax the class of admissible controls U by identifying functions u with elements ν of the set
Y=Y(U)≐{ν∈P(I×U)∣[(t,υ)↦t]♯ν=L1|I}
whose marginals t↦νt — families of measures obtained by disintegrating ν with respect to the Lebesgue measure L1 on I — have the form t↦δu(t) (in control theory the mappings t↦νt are called controls of Gamkrelidze or Warga–Gamkrelidze type, in differential games they are called mixed strategies). This causes a relaxation of the original control system
x(t)=y+∫[0,t]×Uf(x(s),υ)dν(s,υ),ν∈Y,
which corresponds to the convexification of the set of velocities (1.1), (1.2):
˙x=∫Uf(x,υ)dνt(υ)⟺˙x∈co{f(x,υ)∣υ∈U};x(0)=y.
Let x=x[ν] be the solution of (7.1) that corresponds to ν∈Y. Processes (x,ν) are called sliding modes of the control. The mapping ν↦x[ν] is well known to be continuous as a function Y↦C(I;Rn), where Y is endowed with the topology of weak convergence of probability measures. The flow (s,t,x)↦Xs,t[ν] of system (7.1) and the corresponding mapping (s,t,x)↦Js,t[ν] can be defined in the same way as in § 3.1.
It is obvious that problem (coRP) of the minimization of the linear form ⟨ℓ,μ1[ν]⟩ over the curves in the family t↦μt[ν]≐(X0,t[ν])♯ϑ, ν∈Y, is the convexification of problem (RP). The convexified version (coP) of the original problem (P) is a particular case of (coRP) that corresponds to the initial measure ϑ=δy. Note that equation (7.1) is linear in the variable of generalized control and the corresponding (weakened) problem (coRP) is bilinear in the pair (μ,ν).
Now all results of § 6 can be rewritten (almost word for word) in terms of the generalized control ν and the function
¯Ht(x,ϱ)≐∫U¯Ht(x,υ)dϱ(υ),I×Rn×P(U)→R.
The following proposition holds.
Proposition 4. Let x∈¯KS, ν∈Y and x=x[ν]. Then
¯Ht(x(t),νt)=minϱ∈P(U)¯Ht(x(t),ϱ) for a.e. t∈I.
Proof. One must only notice that the function
g(t,x,ω)≐¯Ht(x,ω)−minυ∈U¯Ht(x,υ)
satisfies all the assumptions of Proposition 7 by virtue of Lemma 2. The proof is complete.
We note that the minimum on the right-hand side of (7.2) is attained at any measure ϱ∈P(U) with the property
spt(ϱ)⊆argminυ∈U¯Ht(x(t),υ),
where spt denotes the support of the measure. Proposition 4 claims the existence of (at least one) comparison process in the class of sliding modes; under assumption (A4) it reduces to Proposition 2. In turn, a direct generalization of Theorem 1 is as follows.
Theorem 2. Let (¯x,¯ν) be an optimal process of problem (coP). Then the relation
As above, to check a process ¯σ in the original problem (P) for optimality, one can rewrite conditions (7.4) and (7.5) in the form
¯Ht(x(t),¯u(t))=minϱ∈P(U)¯Ht(x(t),ϱ) for a.e. t∈I
and
spt(νt)⊆argminυ∈U¯Ht(x(t),υ)∪{¯u(t)} for a.e. t∈I,
respectively. Since one selector of the multivalued mapping I→P(U) generated by the last inclusion is the function t↦δ¯u(t), condition (7.4), in particular, contains Pontryagin’s principle. The same ‘additive’ inclusion of Pontryagin’s extremals in the class of comparison processes is assumed by the formulation of the feedback minimum principle [1]. However, the relationship between the conditions mentioned above and Theorems 1 and 2 is not that trivial. It is discussed in the next section.
§ 8. Discussion and examples
Here we discuss the status of Theorems 1 and 2 among close results.
8.1. The relation to Pontryagin’s principle
We turn back to problem (P) under assumptions (A1)–(A4) and (A6). It follows from the equality ∇x¯p∘¯x=¯ψ (Remark 4) that the process ¯σ satisfies the PMP once the comparison trajectory x coincides with ¯x on the whole interval I (it is possible, however, that u≠¯u). This is definitely so, for example, in the case when for all (t,x)∈I×Rn the extremal problem (6.2) has a unique solution. [Footnote 7: If a problem is affine in control, then this can be accomplished by means of a standard ‘regularization’, that is, the addition of a convex integral term (α/2)∫I|u(t)|²dt with a small positive weight α to the objective functional. This method is widely used in the numerical solution of control problems. However, such a perturbation of the problem can lead to a degeneration of the feedback optimality conditions (see Example 3 in § 8.3). Sometimes a more efficient method of perturbation is the concave ‘antiregularization’ proposed by Dykhta, namely, the subtraction of the above integral from the objective function.] We present several versions of such statements. We introduce a feedback analogue of the regularity property for an extremal, namely, the absence of so-called singular pieces of control components.
Definition 1. Let (¯σ,σ) be a pair of admissible processes. A component ¯uk of a control ¯u=(¯u1,…,¯um) is said to be regular if
∂υk¯Ht|x=x(t)≠0∀t∈I
(except, perhaps, finitely many points t∈I). The pair (¯σ,σ) is said to be regular if inequality (8.1) holds for every k=1,…,m.
We can verify that in some typical situations all regular feedback extremals of problem (P) which is affine in control are extremals of Pontryagin’s principle.
Theorem 4 (relation to PMP). Suppose that assumptions (A1)–(A4) and (A6) are satisfied, and let (¯σ,σ) be a feedback extremal. Suppose that one of the following conditions holds.
It is also clear that the regularity of the component ¯uk, in combination with condition (6.3), implies the equality uk(t)=¯uk(t) for almost all t∈I, and since all components of ¯u are supposed to be regular, we have u=¯u. Now the PMP-extremality of ¯σ follows from the equality ∇x¯p∘¯x=¯ψ.
(2) By (8.1), for almost all t∈I the linear form υ↦υ⋅∇υ¯Ht|x=x(t) is nondegenerate and all of its minimum points lie on the boundary of the compact set U; if U is strictly convex, then such a point is unique. Then, as above, condition (6.3) implies the equality u=¯u.
The proof of the theorem is complete.
In the regular case the feedback condition is not weaker than the PMP (and Example 3 for α=0 illustrates the phenomenon of strict strengthening). A natural question arises: which types of PMP-extremal processes are excluded, and which are not excluded, by Theorems 1 and 2? We have already seen that processes in the class S[¯u] cannot be better than the PMP-extremal ¯σ corresponding to a point of local minimum of the function ℓ on the attainability set of the (convexified) system, since the principle of constructing γ as a curve of monotone nonincrease of ℓ does not involve ‘ascending’ along the level lines of ℓ in order to leave ¯x(1). However, feedback extremals can correspond to other types of stationary points (Example 3).
8.2. The relation to the feedback minimum principle
The centerpiece of the theory of feedback necessary conditions is the so-called feedback minimum principle (FMP). This condition was originally formulated in terms of a linear majorant of the objective functional with a reference cotrajectory (that is, in the framework of the standard objects of the PMP; see [1]), while the most general result of this type — with a nonlinear majorant — was presented in [3]. Recall the original formulation of the FMP in problem (P): the optimality of a pair ¯σ=(¯x,¯u) implies the optimality of the curve ¯x in the so-called adjoint problem
(AP¯σ)minℓ(x(T)),x∈X[¯u],
where X[¯u] is the set of all Carathéodory and Krasovskii–Subbotin solutions corresponding to the selectors w(t,x)∈argminυ∈UH(x,p(t,x),υ) and the function p is defined by the expression
p(t,x)≐¯ψ(t)+∇xℓ(x)−∇xℓ(¯x(t)).
Here the construction of feedback controls is the same as in (6.2) up to replacing ∇x¯p by p. In contrast to the conditions obtained above, the FMP has a variational form. In this setting it is assumed that the optimal process ¯σ
Here condition (a) yields directly (see [2]) the extremality of the process ¯σ, as the PMP is certainly ‘included’ in the FMP; this (slightly unnatural) inclusion is provided by the use of feedback Carathéodory solutions, which can be absent if ¯σ is not extremal. As illustrated by examples below, the PMP by no means follows from the ‘essential’ part of the FMP, condition (b).
Let us compare the FMP with Theorems 1 and 2. To do this we reveal the relationship between the function p and the reference solution ¯p of the transport equation. If ℓ is linear and the assumption (A6) holds, then p=¯ψ is obviously the gradient ∇x of the linear approximation of the solution ¯p in a neighbourhood of the characteristic curve ¯x:
The function η is a rather crude approximation of the ‘exact majorant’ ¯p, which combines (8.2) with the linear approximation of the objective function: ℓ(x)≈ℓ(¯x(t))+∇xℓ(¯x(t))⋅(x−¯x(t)).
If (P) is linear in state, then Theorem 2 can be weakened by restricting it to the sliding modes, which are applied only to the set of curves X[¯u] (our formulation also allows other curves). Then the result obtained reduces to (8.2). Hence the proposed necessary condition is not weaker than the FMP. In the bilinear problem the FMP (in its essential part (b)) coincides with the statement of Theorem 1.
8.3. Examples
We begin by illustrating an application of Theorem 2 in comparison with the PMP and FMP.
in which the infimum is attained at the sliding mode (ˇx,ˇν), ˇνt≡(1/2)(δ−1+δ1). We write out the auxiliary constructions of the PMP. The Pontryagian has the form
H(x,p,υ)=H(x,px,py,pz,υ)=pxυ+pyυ²+(1/2)pzx²,
and the conjugate trajectory ψ=(ψx,ψy,ψz) is defined by the conditions
ψy≡−y(1), ψz≡1; ˙ψx=−x and ψx(1)=0.
(1) Improving an extremal process. Consider the control ¯u≡0 that generates the direct and conjugate trajectories
¯x=¯y=¯z≡0;¯ψ=(¯ψx,¯ψy,¯ψz)≡(0,0,1).
Clearly, the process ¯σ=(¯x,¯y,¯z;¯u) is a singular extremal and I[¯u]=0. We apply the FMP and Theorem 2 to this process. Here p≡(0,−y,1) and ¯pt=(1−t)x²/2−y²/2+z. The FMP offers for comparison all ‘feedback’ controls
Both these classes include descent strategies generating, in particular, the optimal sliding mode by the Krasovskii–Subbotin scheme. However, it is only the second class that provides important information, namely, the explicit structure of the feedback: since y(0)=0, the FMP gives back the original set of (all admissible) program controls, which means that it actually degenerates.
(2) The FMP: lack of improvement of a nonextremal process. The control ¯u≡1 as the reference control gives
and define the multivalued mapping ¯U(t,x,y,z) as the solution of the minimization problem
H(x,p(t,x,y,z),υ)−x²/2=((1−t²)/2)υ−(y+1−t)υ²→min, |υ|⩽1.
We are interested in the case when y⩾0 and t∈[0,1]. Note that the quantity γ=γ(t,y)≐(y+1−t) is strictly positive for t∈[0,1) (the point t=1 can be ignored). Hence we have to minimize the concave function
βυ−γυ²=γυ(β/γ−υ), γ>0.
Here the quantity β=β(t)≐(1−t²)/2 is also strictly positive for t∈[0,1), and therefore the minimization over υ∈[−1,1] gives us the extremal mapping ¯U≡{−1}. In other words, all feedback controls of the FMP are exhausted by the single function w≡−1, which generates the unique (in any sense) solution
−x(t)=y(t)=¯x(t)=¯y(t)=t, z(t)=¯z(t)=t³/6
with the same cost as the reference process: I[u≡−1]=−1/3. We arrive at the conclusion that the FMP does not improve the nonextremal process under consideration.
Thus, the sets of processes satisfying the PMP and the comparison condition from the FMP are not proper subsets of each other. This means that the PMP and FMP are in fact two independent necessary conditions of optimality.
(3) Theorem 2: improvement of the control ¯u≡1. To apply Theorem 2 we find that
over υ∈[−1,1] for small values of t gives us a single control δ−1, which is realized until the time t=1/3. On the interval [1/3,1] the sliding mode ν=(3/4)δ1+(1/4)δ−1 is applied, for which x(t)=(t−1)/2 and ¯U={±1}. As a result, the new process looks as follows:
and it has cost I[ν]=−13/27<−1/3. We see that in the case under study Theorem 2 improves both the PMP and FMP.
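The costs quoted in parts (2) and (3) can be checked by exact arithmetic. The sketch below is only an illustration: it assumes that the example has the dynamics ˙x=υ, ˙y=υ², ˙z=x²/2 with zero initial state and the cost ℓ=z−y²/2, which is what the Pontryagian and the function ¯p written above suggest. The helper name `cost` and the encoding of a relaxed control as a list of (duration, measure) segments are hypothetical.

```python
from fractions import Fraction as F

def cost(segments):
    """Cost z(1) - y(1)^2 / 2 of a piecewise constant relaxed control.

    Assumed dynamics (inferred, not stated in the text): x' = v, y' = v^2, z' = x^2 / 2,
    zero initial state. Each segment is (duration, {value: weight}), where the dict
    describes a discrete probability measure on [-1, 1].
    """
    x = y = z = F(0)
    for duration, measure in segments:
        d = F(duration)
        m1 = sum(w * v for v, w in measure.items())      # mean velocity of x
        m2 = sum(w * v * v for v, w in measure.items())  # mean velocity of y
        # x is linear on the segment, so x^2 / 2 is integrated in closed form
        z += (x * x * d + x * m1 * d**2 + m1**2 * d**3 / 3) / 2
        y += m2 * d
        x += m1 * d
    return z - y * y / 2

reference = cost([(1, {1: 1})])                                   # control u = 1
improved = cost([(F(1, 3), {-1: F(1)}),                           # delta_{-1} on [0, 1/3]
                 (F(2, 3), {1: F(3, 4), -1: F(1, 4)})])           # sliding mode on [1/3, 1]
relaxed_optimum = cost([(1, {-1: F(1, 2), 1: F(1, 2)})])          # (delta_{-1} + delta_1) / 2
```

Under these assumptions the three values are −1/3, −13/27 and −1/2, in agreement with the numbers given in the text for ¯u≡1 and for the improved process.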
As shown in [3], in this example even the second-order FMP ‘does not work’. The improvement is due to a certain strengthening of the FMP by passing to the extremal of the convexified problem on a series of controls which is equivalent to ¯u (see [3], Proposition 1).
In Example 1 we see ‘looping’ iterations of the FMP; however, the feedback controls obtained there do not worsen the reference process in any case. Let us show that the use of the linear majorant in a nonlinear problem can cause a strict worsening of a nonextremal control.
The dynamics of the system is organized so that the strategy u≡1 is optimal for large values of the parameter ε and counteroptimal for small values. Consider a nonextremal control ¯u≡0 and write out all objects of the PMP:
H=pxυ+py(x²−x³); ψy≡1; ˙ψx=3x²−2x, ψx(1)=0.
The reference ‘bitrajectory’ has the form
¯x≡ε, ¯y=(ε²−ε³)t; ¯ψx=ε(3ε−2)(t−1),
and I[¯u]=ε²−ε³. For 0<ε<2/3 the only solution of the problem
pxυ=¯ψx(t)υ≐ε(3ε−2)(t−1)υ→min, |υ|⩽1,
on [0,1) is the program w≡−1, which generates the trajectory
x[w](t)=ε−t, y[w](t)=(t−ε)³/3+(t−ε)⁴/4+ε³/3−ε⁴/4
with cost
I[w]=(1−ε)³/3+(1−ε)⁴/4+ε³/3−ε⁴/4>ε²−ε³=I[¯u], 0<ε≪1
(this inequality becomes obvious as ε tends to zero).
As in the previous problem, here the phase variables and control variable are separated and the functional is linear. Therefore, the control generated by the FMP turns out to be a program control. At the initial instant the FMP ‘determines’ correctly the direction of local descent, but the absence of feedback makes it impossible to adapt the strategy afterwards. As a result, the FMP ‘makes a mistake’, and this mistake is stable under small variations of the parameter ε.
In contrast to the FMP, Theorem 1 ‘introduces’ the missing feedback and generates a locally optimal synthesis w(t,x)=−sign((2−3x)x(1−t)) (here w(0,ε)=−1 for small values of ε and w(0,ε)=1 for large values); we leave out the calculations and only indicate that ¯pt(x,y)=(x²−x³)(1−t)+y.
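A minimal sketch of how this synthesis acts in the Krasovskii–Subbotin scheme, under the assumption (read off from the Pontryagian above) that the example has the dynamics ˙x=υ, ˙y=x²−x³, x(0)=ε, y(0)=0, |υ|⩽1 and the cost I=y(1). The function names, the choice w=0 at points where the gradient vanishes, and the uniform partition are illustrative assumptions and are not taken from the original text.

```python
from fractions import Fraction as F

def G(s):
    """Antiderivative of s^2 - s^3."""
    return s**3 / 3 - s**4 / 4

def sign(a):
    return (a > 0) - (a < 0)

def feedback(t, x):
    """-sign of the x-derivative of (x^2 - x^3)(1 - t) + y; 0 is chosen where it vanishes."""
    return -sign(x * (2 - 3 * x) * (1 - t))

def sampled_cost(eps, n):
    """Cost y(1) of the Euler polygon built on the uniform partition with n intervals."""
    h = F(1, n)
    x, y = F(eps), F(0)
    for i in range(n):
        u = feedback(i * h, x)       # control frozen at the node (t_i, x_i)
        if u == 0:
            y += (x**2 - x**3) * h   # x stays constant on this interval
        else:
            x_next = x + u * h
            y += (G(x_next) - G(x)) / u  # exact integral of x^2 - x^3 along a linear x
            x = x_next
    return y

def program_cost(eps, u):
    """Cost of a constant program control u in {-1, 0, 1}."""
    eps = F(eps)
    return eps**2 - eps**3 if u == 0 else (G(eps + u) - G(eps)) / u
```

For ε=1/10 and 1000 sampling intervals the polygon reaches x=0 and stays there, which gives the cost ε³/3−ε⁴/4; this is below I[¯u]=ε²−ε³, whereas the program w≡−1 produced by the linear majorant gives a larger cost.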
The last example illustrates the phenomenon of the absence of feedback improvement of a PMP-extremal not corresponding to a point of local minimum of the objective function.
Here H=(px−x)υ+(α/2)υ². For any α⩾0 the control ¯u≡0 corresponds to an extremal (singular for α=0 and strict otherwise) with cotrajectory ¯ψx≡0, ¯ψy=¯ψz≡1. It is easily seen that for α∈(0,1) the vector (¯y(1),¯z(1)) is neither a point of local minimum, nor a point of local maximum of the function ℓ(x,y,z)=y+z on the attainability set of the convexified system (for example, for a constant u≠0 we have I[u]<0, and for a sliding mode νλ,ε=(1−λ)δ0+(λ/2)(δε+δ−ε), where ε,λ∈(0,1], it turns out that I[νλ,ε]>0). [Footnote 10: For α=0 the extremal under consideration is a point of global maximum.] Since the problem is linear in state, the FMP and Theorem 1 produce the same result (¯p≡y+z). For α=0 (when the problem is bilinear) this result is the feedback control w(x)=signx generating, in particular, the global solutions u≡±1. However, for α>0 the only feedback strategy
w(x)={−1 if x<−α; x/α if |x|⩽α; 1 if x>α},
admitted for comparison leaves the reference point unchanged.
In conclusion, we comment on the role of feedback conditions in calculus of variations: in contrast to the PMP, which sometimes provides information about the structure and properties of an unknown optimal process, Theorems 1 and 2 (as well as the FMP) cannot be applied without the knowledge of the reference approximation ¯u. From the analytic point of view it is reasonable to consider these results as an additional test for the optimality of the already constructed PMP-extremal control. The direct iterative application of these theorems produces an algorithm for the numerical solution of the optimization problems under study. The formulation and properties of this algorithm are discussed in the next section.
§ 9. Descent method
Let us turn again to problem (P) that is affine in control. On the set U×U we introduce the nonnegative functional
Clearly, the equality E[u,v]=0 is equivalent to the claim that (x[v],v) is a comparison process for (x[u],u), and E[u,u]=0 means the PMP-extremality of u.
Let us describe an iteration of the conceptual descent algorithm based on Theorem 1.
Descent method. Put u0=¯u and suppose that uk∈U has already been calculated.
Remark 5. Finding the function ¯p as a solution of the transport partial differential equation is a well-known computational problem, which can be solved by the classical grid methods of integration only in spaces of low dimension (actually, for n⩽3). However, with the use of the explicit representation ¯pt=ℓ∘¯Xt,1 and the Krasovskii–Subbotin method this obstacle can be avoided: let (ti,xi) be the current node of the polygonal approximation of the synthesized trajectory (§ 12). In accordance with the method of descent, the computation of the (i+1)st node assumes the knowledge of the gradient ∇x¯pt(x)≐¯J∗t(x)∇ℓ(Xt,1(x)) at the point (ti,xi). To achieve this it is sufficient to solve the phase system (1.1) and the linearized system (3.4) with the Cauchy conditions x(ti)=xi and Jti,ti=E, respectively.
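The computation described in Remark 5 can be sketched as follows for the system ˙x=υ, ˙y=x²−x³ with ℓ(x,y)=y (the dynamics suggested by the Pontryagian of Example 2 in § 8.3, taken here as an assumption). The state equation and the variational equation ˙J=Dxf·J, J(t)=E, are integrated together from t to 1 by a hand-written Runge–Kutta step, and the gradient is then read off as J∗∇ℓ. The form assumed for the linearized system (3.4), the function name `grad_p` and the number of steps are assumptions of this sketch.

```python
def grad_p(t, x, y, u_ref=lambda s: 0.0, steps=200):
    """Gradient of p_t = l(X_{t,1}(.)) at (x, y) for x' = u_ref(s), y' = x^2 - x^3, l = y."""

    def rhs(s, v):
        xs, ys, j11, j12, j21, j22 = v
        a = 2.0 * xs - 3.0 * xs**2  # d/dx (x^2 - x^3), the only nonzero entry of D_x f
        # state equations followed by the rows of D_x f * J
        return [u_ref(s), xs**2 - xs**3, 0.0, 0.0, a * j11, a * j12]

    v = [x, y, 1.0, 0.0, 0.0, 1.0]  # J starts from the identity matrix
    h = (1.0 - t) / steps
    s = t
    for _ in range(steps):
        k1 = rhs(s, v)
        k2 = rhs(s + h / 2, [vi + h / 2 * ki for vi, ki in zip(v, k1)])
        k3 = rhs(s + h / 2, [vi + h / 2 * ki for vi, ki in zip(v, k2)])
        k4 = rhs(s + h, [vi + h * ki for vi, ki in zip(v, k3)])
        v = [vi + h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
             for vi, a1, a2, a3, a4 in zip(v, k1, k2, k3, k4)]
        s += h
    _, _, j11, j12, j21, j22 = v
    # J^T applied to grad l = (0, 1) picks out the second row of J
    return (j21, j22)
```

For the reference control ¯u≡0 the result can be compared with the closed form x(2−3x)(1−t) for the x-component, which follows from ¯pt(x,y)=(x²−x³)(1−t)+y. Only one trajectory and one matrix equation are integrated per node, so no grid in the state space is needed.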
The convergence of the method of descent is established by the following result.
Proposition 5. Suppose that assumptions (A1)–(A4) are fulfilled, and let (uk) be a sequence of controls produced by the method. Then the following hold.
(1) There exists a subsequence ωkj⊆ωk≐{(u2k,u2k+1)} that converges in the space U×U with the direct product topology, where U is equipped with the topology σ(L∞,L1).
(2) Let (u,v) be a partial limit of the sequence ωk. Then E[u,v]=0.
Proof. (1) Since U is a convex compact set in Rm, the family U is compact in the space L∞ with the topology σ(L∞,L1) by the Banach–Alaoglu theorem. Then by Tychonoff’s classical theorem the space U×U is compact in the direct product topology. Since L1 is separable, the weak* topology σ(L∞,L1) on U is metrizable, and therefore compactness is equivalent to sequential compactness, that is, the existence of a convergent subsequence.
(2) Under the above assumptions the operators u↦X[u](x) and u↦J[u](x) are continuous as functions U→C(I;Rn) for any x∈Rn (as systems (3.1) and (3.4) are affine in control). Hence for each x∈Rn the mapping u↦∇xp[u](x)≐∇ℓ∘X⋅,1[u](x)J[u](x) is a continuous function U→C(I;Rn). Since the function x↦∇xp[u](x), Rn→Rn, is also continuous, the mapping (u,v)↦∇xp[u](x[v]), U×U→C(I;Rn), is continuous as a composition of continuous functions. Then the operator (u,v)↦E[u,v], U×U→R, is also continuous as a composition of continuous operators.
By the definition of the residual E the equality E[uk,uk+1]=I[uk]−I[uk+1] holds for every k. The sequence of numbers {I[uk]} is monotone and bounded, hence convergent. Consequently, limk→∞E[uk,uk+1]=0. Then the subsequence ωkj≐{(u2kj,u2kj+1)}⊆ωk having the limit (u,v) satisfies the equality 0=limj→∞E[u2kj,u2kj+1]=E[u,v].
The proof of the proposition is complete.
The output of the method proposed above is a feedback (in a particular case, Pontryagin’s) extremal or a sequence of controls lying on the level set of the functional I corresponding to such an extremal (the case of ‘looping’). Its implementation with the use of the Krasovskii–Subbotin scheme produces an algorithm of dynamical optimization which, in contrast to indirect algorithms based on the PMP [8], does not contain parameters of the ‘descent depth’ (hence no internal procedures for line search). [Footnote 11: A halting criterion for the iterative process can consist in the condition E[uk,uk+1]=I[uk]−I[uk+1]<ε with a predetermined accuracy ε>0.] [Footnote 12: It can be said that the method of feedback variations contains a method for the computation of an optimal step of gradient descent.] Of course, in practice this algorithm is applicable to the general problem (P) (without assumption (A4)), as well as to the convexified problem (coP), without any modification. In the latter problem, to approximate sliding modes with property (7.5) one can use feedback controls of the form
wλ(t,x)≐(1−λ)¯u(t)+λw(t,x),λ∈(0,1],
where w obeys inclusion (6.2). This is consistent with the informal idea of the ‘locally optimal’ synthesis of control (see [23]).
§ 10. Appendix: auxiliary assertions
Below Lip(φ;A) is the minimum Lipschitz constant of the function φ:Rn→R on the set A⊆Rn and Lip(φ)≐Lip(φ;Rn); Br is the closed ball of radius r centred at the origin.
Lemma 1. Suppose that assumptions (A1)–(A3) are fulfilled. Then the following hold.
(1) For any compact set K⊂Rn the attainability set {Xs,t[ν](x)∣s,t∈I,x∈K,ν∈Y} of system (7.1) from the set K on the time interval [s,t] is contained in a closed ball BRf whose radius Rf=Rf(K,U)>0 depends only on Cf, K and U (and is the same for all s and t).
(2) For any compact set K⊂Rn and any s,t∈I, x∈K and ν∈Y the following estimate is valid:
(3) For any compact set K⊂Rn the functions t↦Xs,t[ν](x), t↦Xt,s[ν](x) and t↦Jt,s[ν](x) are Lipschitz continuous on I with Lipschitz constants LX(K,U) and LJ(K,U) which are common to all s∈I, x∈K and ν∈Y.
(4) The functions x↦Xt,s[ν](x) and x↦Jt,s(x)≐DxXt,s[ν](x) are locally Lipschitz continuous on Rn uniformly in s,t∈I and ν∈Y.
Proof. We restrict our consideration to the proof of the uniform local Lipschitz continuity of the family Jt,s[ν], s,t∈I, ν∈Y. All other facts are well known in the theory of ordinary differential equations.
Let x,z∈K⊂Rn, where K is a compact set; suppose for definiteness that s>t. It follows from (3.4) that
Lemma 2. Suppose that conditions (A1)–(A3) hold, ¯u∈U and K⊂Rn is a compact set. Then the restriction of the family of functions (t,x)↦¯Ht(x,υ), υ∈U, to the set I×K is a Lipschitz continuous function whose Lipschitz constant depends only on Cf, Lip(ℓ;K), LX(K,U), LJ(K,U), CJ(K,U), Rf(K,U) and Lf(K,U), where Lf(K,U) is a common Lipschitz constant for the mappings x↦f(x,υ), υ∈U, on the set K and LX, LJ, CJ and Rf were defined in Lemma 1.
Proof. In view of Lemma 1 the claim of the lemma follows from the representation ∇x¯pt=¯J∗t∇ℓ∘¯Xt,1 and the following estimates, which are valid for any s,t∈I, x,z∈K and υ∈U:
§ 11. Appendix: feedback controls and sliding modes
This section contains a brief review of the main facts about feedback controls and their relation to sliding modes. A feedback control for system (1.1) can be an arbitrary function (t,x)↦w(t,x), I×Rn→U. In general, this function is not even supposed to be measurable, which makes its direct substitution (for u) into (1.1) impossible. The action of w on the system can be defined by the use of various ‘sampling’ schemes, the most widely known (and simplest) of which is the Krasovskii–Subbotin scheme.
Definition 2. Let w:I×Rn→U be a given function. Consider the sequence {πN}⊂I of partitions of the interval I by points
{tNi}Ni=0, 0≐tN0<tN1<⋯<tNN−1<tNN≐1, N⩾1,
with the properties πN+1⊃πN (the sequence {πN} is nondecreasing by inclusion) and
limN→∞max1⩽i⩽N|tNi−tNi−1|=0.
We define the sequence of ‘Euler’s polygons’ iteratively: xN(t)=x[w(tNi,xN(tNi))](t), t∈[tNi,tNi+1), i=0,…,N−1, and define xN at the point t=1 by continuity. Let {xNs}⊆{xN} be a subsequence of the sequence N↦xN that has a uniform limit x on I as s→∞. All such partial limits x=x[w] are called Krasovskii–Subbotin solutions of the closed equation (2.9).
By construction each of the indicated solutions is contained in the tube of trajectories of the convexified system (7.1), that is, presents a sliding mode. In other words, x[w]=x[ν] for some ν∈Y.
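To see how a sliding mode emerges from this construction, consider the scalar system ˙x=u with U={−1,1} and the discontinuous feedback w(x)=−1 for x⩾0, w(x)=1 for x<0, started at x(0)=0. This toy system is an illustration chosen here and is not an example from the text; uniform partitions with N=2^k points are used so that the sequence of partitions is nested. The Euler polygons zigzag with amplitude 1/N, so their uniform limit is x≡0, which is not a trajectory of any program control with values in {−1,1} but corresponds to the generalized control νt=(1/2)(δ−1+δ1).

```python
from fractions import Fraction as F

def w(t, x):
    """Discontinuous feedback: push towards the origin."""
    return -1 if x >= 0 else 1

def euler_polygon(feedback, x0, n):
    """Nodes x_N(t_i) of the Euler polygon for dx/dt = u on the partition t_i = i/n."""
    h = F(1, n)
    xs = [F(x0)]
    for i in range(n):
        u = feedback(i * h, xs[-1])  # control frozen on [t_i, t_{i+1})
        xs.append(xs[-1] + u * h)
    return xs

def sup_norm(nodes):
    """The polygon is piecewise linear, so its sup-norm is attained at a node."""
    return max(abs(x) for x in nodes)

# The deviation from the limit x = 0 equals the mesh size 1/n.
deviations = {n: sup_norm(euler_polygon(w, 0, n)) for n in (8, 64, 512)}
```

The sampled controls take the values −1 and 1 on alternating intervals, which is the discrete counterpart of the measure (1/2)(δ−1+δ1).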
Suppose that assumptions (A2) and (A3) hold, and let KS[w] be the system of all Krasovskii–Subbotin solutions corresponding to w. It is clear that any subset of KS[w] is relatively compact in C(I;Rn), since all the trajectories of sliding modes are uniformly bounded and Lipschitz continuous with a common constant, and therefore satisfy the hypotheses of the Arzelà–Ascoli theorem. This yields the existence of (at least one) solution x[w] for each w. This solution is not unique even in the simplest cases (see [11]).
If the operator u↦x[u] is continuous on U, then among the generalized controls ν that generate the curve x[w] as a trajectory of a sliding mode there is a measure with the property νt=δu(t) for almost all t∈I, u∈U (see [24], Corollary 1), which yields the following result.
Proposition 6. Suppose that assumptions (A2)–(A4) hold, and let w:I×Rn→U be an arbitrary feedback control and x=x[w] be one of the Krasovskii–Subbotin solutions of system (2.9) generated by it. Then there exists a program control u∈U such that x[u]=x[w].
If the piecewise constant strategies in the Krasovskii–Subbotin scheme are chosen in accordance with some (sufficiently regular) rule, then it is natural to expect that the last assertion also holds for the limiting sliding mode.
Proposition 7. Let
g∈C(I×Rn×U;R)
be a given nonnegative function such that the mapping (t,x)↦g(t,x,υ) is locally Lipschitz continuous uniformly in υ∈U. Further, let w:I×Rn→U be an arbitrary function, {πN}⊂I be a sequence of partitions (11.1) of the interval I which is nondecreasing by inclusion, {uN} be a sequence of piecewise constant controls of the form
uN(t)≡uN,i∈U for t∈ΔtNi≐(tNi−1,tNi),i=1,…,N,N⩾1,
and {xN≐x[uN]} be the corresponding sequence of Euler polygons. Finally, let ν be a partial limit of the sequence (νN) of Young measures for which νNt=δuN(t) and x=x[ν] be the corresponding partial uniform limit of the sequence {xN}. Suppose that the condition
g(tNi,xN(tNi),uN,i)=0∀tNi∈πN
holds for every N⩾1. Then
∫I×Ug(t,x(t),υ)dν(t,υ)=0.
Proof. For the sake of simplicity we restrict our consideration to uniform partitions of the interval I by the points tNi=i/N, i=0,…,N. It follows from the continuity of the function g, the disintegration theorem and the definition of a Young measure that
Let K⊂Rn be a compact set containing the tube of trajectories (1.1), (1.2), and Lg(K) be the minimal common Lipschitz constant for the functions (t,x)↦g(t,x,υ), υ∈U, on I×K. Each term in the last sum can be estimated as follows:
Here the constant LX(K,U) is the same as in Lemma 1. Note that M does not depend on N. Now recalling that g⩾0 and summing over i=1,…,N we obtain the estimate
0⩽∫Idt∫Ug(t,xN(t),υ)dνNt(υ)⩽O(1/N).
Finally, taking the limit as N→∞ completes the proof.
In conclusion we pay special attention to one detail concerned with the use of program controls as feedback ones. Each representative of the class u∈U can formally be interpreted as a feedback. Then different realizations of u — functions which differ from one another on a set of Lebesgue measure zero — generate, in general, different sets of Krasovskii–Subbotin solutions (since the set of functions w is subject to no factorization). This effect can be avoided by adopting a more accurate rule of selection, for example, by restricting the set of values u(tk) at points of the partition (11.1) to elements of the closure (the suitable one-sided closure at the endpoints) of the function u with respect to the Lebesgue measure [25].
Another way consists in abandoning the Krasovskii–Subbotin scheme in favour of an alternative sampling algorithm in which piecewise constant strategies for constructing the polygons are replaced by ‘piecewise program’ strategies (see, for example, [26]).
§ 12. Conclusion
This work presents a new approach to the theory of local extremum in problems of optimal control, which is alternative (but related) to Pontryagin’s principle. Advantages of the approach developed include, along with improving the classical necessary condition and a natural algorithmization, rather simple proofs of the main results. Further research will be devoted to aspects of the practical implementation of the descent method proposed here. The results obtained can easily (as we believe) be carried over to some classes of problems of optimal stochastic control and mean-field control.
Acknowledgments
The authors are grateful to V. A. Dykhta for the discussions of this work and a series of valuable suggestions that he made and to anonymous referees, whose constructive observations contributed to an essential improvement of the original version of this paper.
Bibliography
1. V. A. Dykhta, “Weakly monotone solutions of the Hamilton–Jacobi inequality and optimality conditions with feedback controls”, Autom. Remote Control, 75:5 (2014), 829–844
2. V. A. Dykhta, “Variational necessary optimality conditions with feedback descent controls for optimal control problems”, Dokl. Math., 91:3 (2015), 394–396
3. V. A. Dykhta, “Feedback minimum principle: variational strengthening of the concept of extremality in optimal control”, Izv. Irkutsk. Gos. Univ. Ser. Mat., 41 (2022), 19–39 (Russian)
4. V. F. Krotov, “Global methods to improve control and optimal control of resonance interaction of light and matter”, Modeling and control of systems in engineering, quantum mechanics, economics and biosciences (Sophia–Antipolis 1988), Lect. Notes Control Inf. Sci., 121, Springer, Berlin, 1989, 267–298
5. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze and E. F. Mishchenko, The mathematical theory of optimal processes, Intersci. Publ. John Wiley & Sons, Inc., New York–London, 1962, viii+360 pp.
6. A. Ya. Dubovitskii and A. A. Milyutin, “Extremum problems in the presence of restrictions”, U.S.S.R. Comput. Math. Math. Phys., 5:3 (1965), 1–80
7. R. J. DiPerna and P. L. Lions, “Ordinary differential equations, transport theory and Sobolev spaces”, Invent. Math., 98:3 (1989), 511–547
8. V. A. Srochko, Iterative methods of solution of optimal control problems, Fizmatlit, Moscow, 2000, 160 pp. (Russian)
9. L. Ambrosio and G. Savaré, “Gradient flows of probability measures”, Handbook of differential equations: evolutionary equations, v. III, Handb. Differ. Equ., Elsevier/North-Holland, Amsterdam, 2007, 1–136
10. V. I. Bogachev, Weak convergence of measures, Math. Surveys Monogr., 234, Amer. Math. Soc., Providence, RI, 2018, xii+286 pp.
11. N. N. Krasovskiĭ and A. I. Subbotin, Game-theoretical control problems, Springer Ser. Soviet Math., Springer-Verlag, New York, 1988, xii+517 pp.
12. M. A. Lavrentyev and L. A. Lyusternik, A course of variational calculus, GONTI–NKTI, Moscow–Leningrad, 1938, 192 pp. (Russian)
13. A. Bressan and B. Piccoli, Introduction to the mathematical theory of control, AIMS Ser. Appl. Math., 2, Amer. Inst. Math. Sci. (AIMS), Springfield, MO, 2007, xiv+312 pp.
14. N. Pogodaev, “Program strategies for a dynamic game in the space of measures”, Optim. Lett., 13:8 (2019), 1913–1925
15. N. Pogodaev and M. Staritsyn, “Impulsive control of nonlocal transport equations”, J. Differential Equations, 269:4 (2020), 3585–3623
16. A. A. Agrachev and Yu. L. Sachkov, Control theory from the geometric viewpoint, Encyclopaedia Math. Sci., 87, Control Theory Optim., II, Springer-Verlag, Berlin, 2004, xiv+412 pp.
17. R. J. Kipka and Yu. S. Ledyaev, “Extension of chronological calculus for dynamical systems on manifolds”, J. Differential Equations, 258:5 (2015), 1765–1790
18. R. Vinter, “Convex duality and nonlinear optimal control”, SIAM J. Control Optim., 31:2 (1993), 518–538
19. F. H. Clarke and C. Nour, “Nonconvex duality in optimal control”, SIAM J. Control Optim., 43:6 (2005), 2036–2048
20. V. A. Dykhta, “Nonstandard duality and nonlocal necessary optimality conditions in nonconvex optimal control problems”, Autom. Remote Control, 75:11 (2014), 1906–1921
21. M. Staritsyn, N. Pogodaev, R. Chertovskih and F. Lobo Pereira, “Feedback maximum principle for ensemble control of local continuity equations: an application to supervised machine learning”, IEEE Control Syst. Lett., 6 (2022), 1046–1051
22. C. Castaing, P. Raynaud de Fitte and M. Valadier, Young measures on topological spaces. With applications in control theory and probability theory, Math. Appl., 571, Kluwer Acad. Publ., Dordrecht, 2004, xii+320 pp.
23. V. I. Gurman, The extension principle in control problems, 2nd revised and augmented ed., Fizmatlit, Moscow, 1997, 288 pp. (Russian)
24. N. Pogodaev, “Optimal control of continuity equations”, NoDEA Nonlinear Differential Equations Appl., 23:2 (2016), 21, 24 pp.
25. A. V. Arutyunov, D. Yu. Karamzin and F. L. Pereira, “Conditions for the absence of jumps of the solution to the adjoint system of the maximum principle for optimal control problems with state constraints”, Proc. Steklov Inst. Math. (Suppl.), 292, suppl. 1 (2016), 27–35
26. M. Staritsyn and S. Sorokin, “On feedback strengthening of the maximum principle for measure differential equations”, J. Global Optim., 76:3 (2020), 587–612
Citation:
N. I. Pogodaev, M. V. Staritsyn, “Exact formulae for the increment of the objective functional and necessary optimality conditions, alternative to Pontryagin's maximum principle”, Sb. Math., 215:6 (2024), 790–822
\Bibitem{PogSta24}
\by N.~I.~Pogodaev, M.~V.~Staritsyn
\paper Exact formulae for the increment of the objective functional and necessary optimality conditions, alternative to Pontryagin's maximum principle
\jour Sb. Math.
\yr 2024
\vol 215
\issue 6
\pages 790--822
\mathnet{http://mi.mathnet.ru/eng/sm9967}
\crossref{https://doi.org/10.4213/sm9967e}
\mathscinet{http://mathscinet.ams.org/mathscinet-getitem?mr=4804039}
\zmath{https://zbmath.org/?q=an:07945696}
\adsnasa{https://adsabs.harvard.edu/cgi-bin/bib_query?2024SbMat.215..790P}
\isi{https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=Publons&SrcAuth=Publons_CEL&DestLinkType=FullRecord&DestApp=WOS_CPL&KeyUT=001334620600005}
\scopus{https://www.scopus.com/record/display.url?origin=inward&eid=2-s2.0-85206914941}
This publication is cited in the following article:
E. V. Goncharova, N. I. Pogodaev, M. V. Staritsyn, “Exact formulae for the increment of the objective functional in the optimal control problem for a linear balance equation”, Izvestiya Irkutskogo gosudarstvennogo universiteta. Seriya Matematika, 51 (2025), 3–20