Abstract:
Let $\Xi_d=(\xi_t,\mathcal F_t,\mathsf P^d_x)_{t\in\mathbb N}$, $d\in D$, be a family of Markov processes on $(\Omega,\mathcal F)$ with values in $(X,\mathscr X)$. Any sequence
$$\delta=\{d_0(x_0),\,d_1(x_0,x_1),\,\dots,\,d_k(x_0,\dots,x_k),\,\dots\},$$
where $d_k\colon (X,\mathscr X)^{k+1}\to(D,\mathscr D)$ and $\mathscr D$ is a $\sigma$-algebra in $D$, is called a control policy. For each control policy $\delta$, a controlled Markov process $\Xi_\delta=(\xi_t,\mathcal F_t,\mathsf P^\delta_x)_{t\in\mathbb N}$ is constructed.
Let $\overline{\mathfrak M}$ be the set of stopping times with respect to $\{\mathcal F_t,\ t\in\mathbb N\cup\{+\infty\}\}$, let $\Delta$ be the set of control policies, and set
$$\overline\Sigma=\overline{\mathfrak M}\times\Delta;\qquad \Sigma=\{[\tau,\delta]\in\overline\Sigma:\ \mathsf P^\delta_x\{\tau<\infty\}=1\},\qquad \Sigma_n=\{[\tau,\delta]\in\Sigma:\ \mathsf P^\delta_x\{\tau\leqslant n\}=1\}.$$
Let $g(x)$ be a real $\overline{\mathscr X}$-measurable function with $g^-(x)\leqslant k<\infty$, and let
$$\bar s(x)=\sup_{[\tau,\delta]\in\overline\Sigma}\mathsf M^\delta_x g(\xi_\tau),\quad g(\xi_\infty)=\varlimsup_{n} g(\xi_n);\qquad s(x)=\sup_{[\tau,\delta]\in\Sigma}\mathsf M^\delta_x g(\xi_\tau),\qquad s_n(x)=\sup_{[\tau,\delta]\in\Sigma_n}\mathsf M^\delta_x g(\xi_\tau).$$
We show that the gain functions $\bar s(x)$ and $s(x)$ are equal and that $s(x)$ is the least excessive majorant of $g(x)$. For each $\varepsilon>0$ and each probability measure $\mu$ on $(X,\mathscr X)$, $(\mu,\varepsilon,s)$- and $(\mu,\varepsilon,s_n)$-optimal strategies $[\tau,\delta]$ are constructed. We also show that $s_n(x)\to s(x)$ as $n\to\infty$.
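As an illustration not taken from the paper: in the special case of a finite state space with a single admissible control (so that $\Delta$ is trivial and the supremum runs over stopping times only), the truncated gains $s_n$ can be computed by the backward induction $s_0=g$, $s_{n+1}=\max(g,\,Ps_n)$, and their monotone limit is the least excessive majorant of $g$. A minimal Python sketch with a hypothetical 3-state transition matrix:

```python
import numpy as np

# Hypothetical 3-state chain (single control, so the policy set is trivial).
# Backward induction: s_0 = g, s_{n+1}(x) = max(g(x), (P s_n)(x)).
# The sequence s_n increases to s, the least excessive majorant of g.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
g = np.array([1.0, 0.0, 2.0])

s = g.copy()
for n in range(500):
    s_next = np.maximum(g, P @ s)          # one step of backward induction
    if np.max(np.abs(s_next - s)) < 1e-12:  # s_n has numerically converged
        break
    s = s_next

# The limit majorizes g and is excessive: s >= g and s >= P s.
assert np.all(s >= g - 1e-9)
assert np.all(s >= P @ s - 1e-9)
print(np.round(s, 6))
```

Here the chain is irreducible, so from every state one can wait until the maximal payoff state is reached, and the computed $s$ equals $\max_x g(x)$ everywhere; in general $s$ exceeds $g$ only on the continuation region.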
Citation:
A. Barbarošie, “On the theory of controlled Markov processes”, Teor. Veroyatnost. i Primenen., 22:1 (1977), 55–71; Theory Probab. Appl., 22:1 (1977), 53–69