Exploitation of the Value Function in a Bilevel Optimal Control Problem

Abstract. The paper discusses a class of bilevel optimal control problems with optimal control problems at both levels. The problem is transformed into an equivalent single level problem by means of the value function of the lower level optimal control problem. Although the computation of the value function is difficult in general, we present a pursuit-evasion Stackelberg game for which the value function of the lower level problem can even be derived analytically. A direct discretization method is then used to solve the transformed single level optimal control problem together with some smoothing of the value function.


Introduction
Bilevel optimization problems occur in various applications, e.g. in locomotion and biomechanics, see [15,20,2,1], in optimal control under safety constraints, see [18,19,12], or in Stackelberg dynamic games, compare [24,10]. An abstract bilevel optimization problem (BOP) reads as follows:

Minimize F(x, y) with respect to (x, y) ∈ X × Y subject to the constraints

    G(x, y) ∈ K,   H(x, y) = 0,   y ∈ M(x),

where M(x) is the set of minimizers of the lower level optimization problem

    Minimize f(x, y) w.r.t. y ∈ Y s.t. g(x, y) ∈ C, h(x, y) = 0.
Herein, X, Y are (finite- or infinite-dimensional) Banach spaces, F, f : X × Y → R are sufficiently smooth, G, H, g, h are sufficiently smooth mappings into Banach spaces V_u, V, W_u, W, and K ⊂ W_u, C ⊂ W are convex and closed cones.
Bilevel optimization problems turn out to be very challenging with regard to both the investigation of theoretical properties and the development of numerical methods, compare [8]. Necessary conditions have been investigated, e.g., in [25,9]. Typical solution approaches aim at reducing the bilevel structure to a single level optimization problem. In the MPCC approach, a single level optimization problem subject to complementarity constraints (MPCC) is obtained by replacing the lower level problem by its first order necessary conditions, compare [1]. However, if the lower level problem is non-convex, the MPCC is in general not equivalent to the original bilevel problem, since non-optimal stationary points or non-global solutions may satisfy the necessary conditions as well. Still, the approach is often used owing to a well-established theory and the availability of numerical methods for MPCCs, especially for finite dimensional problems.
In this paper we focus on an equivalent transformation of the bilevel problem into a single level problem (see [7] for an alternative way). The equivalence can be guaranteed by exploiting the value function V : X → R of the lower level problem, which is defined as the optimal objective value

    V(x) = inf { f(x, y) : y ∈ Y, g(x, y) ∈ C, h(x, y) = 0 }.

An equivalent reformulation of the bilevel optimization problem is then given by the following single level problem, compare [22,25,26]:

    Minimize F(x, y) w.r.t. (x, y) ∈ X × Y s.t. G(x, y) ∈ K, H(x, y) = 0, g(x, y) ∈ C, h(x, y) = 0, f(x, y) ≤ V(x).

The advantage of the value function approach is its equivalence with the bilevel problem. On the downside, one has to be able to compute the value function, which in general might be intractable. Moreover, the value function is non-smooth in general (often merely Lipschitz continuous), and hence suitable methods from non-smooth optimization are required to solve the resulting single level problem. In Section 2 we discuss a class of bilevel optimal control problems that fit into the problem class BOP. In Section 3 we derive an analytical expression for the value function for an example and present numerical results. The new contribution of this paper is the discussion of a particular example, which combines the analytical expression of the value function of the lower level problem and a direct discretization method for the reformulated single level problem. This problem may serve as a test problem for theoretical and numerical investigations. It already exhibits most features of more challenging problems, such as non-convexity, pure state constraints in the upper level problem, and control constraints on both levels.

A class of bilevel optimal control problems
Let T > 0 be the fixed final time and let all data functions of both levels (objectives, dynamics, and constraint mappings) be sufficiently smooth. With these definitions, the following class of bilevel optimal control problems (BOCP) subject to control-state constraints and boundary conditions fits into the general bilevel optimization problem BOP.
where M(x(0), x(T), p) is the set of minimizers of the lower level problem OCP_L(x(0), x(T), p). Herein, (x, u, p) ∈ X are the state, the control, and the parameter vector of the upper level problem, and (y, v, q) ∈ Y are the state, the control, and the parameter vector of the lower level problem. Please note that the lower level problem only depends on the initial and terminal states x(0), x(T) and the parameter vector p of the upper level problem. The value function V is then a function defined on the finite dimensional space R^{n_x} × R^{n_x} × R^{n_p}.

Remark 1. In a formal way the problem class can easily be extended such that the lower level dynamics f and the lower level control-state constraints s depend on x and u as well. However, in that case the value function of the lower level problem would be a functional V : X → R, i.e. a functional defined on the Banach space X rather than on the finite dimensional space R^{n_x} × R^{n_x} × R^{n_p}. Computing the mapping V : X → R numerically would be computationally intractable in most cases.
Using the value function V we arrive at the following equivalent single level optimal control problem subject to control-state constraints, smooth boundary conditions, and an in general non-smooth boundary condition with the value function.
It remains to compute the value function V and to solve the potentially non-smooth single level optimal control problem. Both are challenging tasks owing to non-smoothness and non-convexity. The value function can sometimes be derived analytically, as we shall demonstrate in Section 3. Otherwise, if Bellman's optimality principle applies, the value function satisfies a Hamilton-Jacobi-Bellman (HJB) equation, see [3]. Various methods exist for its numerical solution, compare [21,14,11,17,4]. The HJB approach is feasible if the state dimension n_y does not exceed 5 or 6. If no analytical formula is available and the HJB approach is not feasible, a pointwise evaluation of V at (x(0), x(T), p) can be realized using suitable optimal control software, e.g. [13]. However, if the lower level problem is non-convex, it is usually not possible to guarantee global optimality by such an approach. The single level problem can be approached via the non-smooth necessary conditions in [6,5]. Alternatively, direct discretization methods may be applied. The non-smoothness of V in (7) has to be taken into account, e.g., by using bundle type methods, see [23], or by smoothing the value function and applying standard software. Finally, the HJB approach could also be applied to the single level problem itself.
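The smoothing idea mentioned above can be illustrated for the two typical sources of non-smoothness in a value function, the maximum and the absolute value. The following minimal sketch (the function names and the smoothing parameter eps are our own choices, not from the paper) replaces both by C¹ approximations:

```python
import math

def smooth_abs(x, eps=1e-6):
    # C^1 approximation of |x|; the error is largest at x = 0, where it equals eps.
    return math.sqrt(x * x + eps * eps)

def smooth_max(a, b, eps=1e-6):
    # C^1 approximation of max(a, b) via the identity
    # max(a, b) = (a + b + |a - b|) / 2, with |.| smoothed as above.
    return 0.5 * (a + b + smooth_abs(a - b, eps))
```

Replacing max and |·| by these approximations turns a piecewise-smooth value function into a continuously differentiable one, so that standard NLP software can be applied; the price is an O(eps) perturbation near the kinks.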

A follow-the-leader application
We consider a pursuit-evasion dynamic Stackelberg game of two vehicles moving in the plane. Throughout, we assume that the evader knows the optimal strategy of the pursuer and can optimize its own strategy accordingly. This gives rise to a bilevel optimal control problem. The lower level player (the pursuer P) aims to capture the upper level player (the evader E) in minimum time T. The evader aims to minimize a linear combination of the negative capture time −T and its control effort. The players have individual dynamics and constraints. The coupling occurs through capture conditions at the final time.

The bilevel optimal control problem
The evader E aims to solve the following optimal control problem, called the upper level problem OCP_U, subject to the constraints stated below, where M(x_E(T), y_E(T)) denotes the set of minimizers of the lower level problem OCP_L(x_E(T), y_E(T)).
The equations of motion of E describe a simplified car model of length ℓ > 0 moving in the plane. The controls are the steering angle velocity w and the acceleration a with given bounds ±w_max, a_min, and a_max, respectively. The velocity v_E is bounded by the state constraint v_E(t) ∈ [0, v_E,max] with a given bound v_E,max > 0. The position of the car's rear axle is given by z_E = (x_E, y_E)⊤ and its velocity by v_E. ψ denotes the yaw angle, and α_1, α_2 ≥ 0 are weights in the objective function. The initial state is fixed by the values x_E,0, y_E,0, ψ_0, δ_0, v_E,0. The final time T is determined by the lower level player P, who aims to solve the following optimal control problem, called the lower level problem OCP_L(x_E,T, y_E,T) with its set of minimizers denoted by M(x_E,T, y_E,T):

Minimize T = ∫_0^T 1 dt subject to the constraints

    ż_P(t) = v_P(t), v̇_P(t) = u_P(t), z_P(0) = z_P,0, v_P(0) = 0, z_P(T) = (x_E,T, y_E,T)⊤, v_P(T) = 0, u_P(t) ∈ [−u_max, u_max]².

Herein, z_P = (x_P, y_P)⊤, v_P = (v_P,1, v_P,2)⊤, and u_P = (u_P,1, u_P,2)⊤ denote the position vector, the velocity vector, and the acceleration vector, respectively, of P in the two-dimensional plane. z_P,0 = (x_P,0, y_P,0)⊤ ∈ R² is a given initial position and u_max > 0 is a given control bound for the acceleration. The dynamics of the pursuer allow it to move independently in the x and y directions, which models, e.g., a robot with omnidirectional wheels.
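The pursuer's model decouples into two double integrators, one per axis. The following sketch (function name and step count are illustrative) integrates one axis under a bang-bang acceleration with a single switch at the midpoint, assuming the pursuer starts and ends at rest, and uses the per-step exact update for piecewise constant acceleration:

```python
import math

def simulate_axis(d, u_max, n=10000):
    """Integrate z' = v, v' = u over [0, T] with u = +u_max on the first
    half and u = -u_max on the second half (one switch at T/2).
    T = 2*sqrt(d/u_max) is the candidate rest-to-rest minimum time for
    covering the distance d > 0."""
    T = 2.0 * math.sqrt(d / u_max)
    dt = T / n
    z, v = 0.0, 0.0
    for k in range(n):
        u = u_max if k < n // 2 else -u_max
        # exact update for constant acceleration u on [t, t + dt]
        z += v * dt + 0.5 * u * dt * dt
        v += u * dt
    return z, v
```

For d = 4 and u_max = 1 the trajectory reaches z(T) = 4 with v(T) = 0 at T = 4 (up to rounding), consistent with a rest-to-rest bang-bang transfer with one switch.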

The lower-level problem and its value function
The lower level problem admits an analytical solution. To this end, the Hamilton function (regular case only) reads as

    H(z_P, v_P, u_P, λ_z, λ_v) = 1 + λ_z⊤ v_P + λ_v⊤ u_P.

The first order necessary optimality conditions for a minimum (ẑ_P, v̂_P, û_P, T̂) are given by the minimum principle, compare [16]. There exist adjoint multipliers λ_z and λ_v with λ̇_z = 0 and λ̇_v = −λ_z, i.e. λ_z,i(t) ≡ c_z,i and λ_v,i(t) = c_v,i − c_z,i t with constants c_z,i, c_v,i, and the minimum condition

    H(ẑ_P(t), v̂_P(t), û_P(t), λ_z(t), λ_v(t)) ≤ H(ẑ_P(t), v̂_P(t), u_P, λ_z(t), λ_v(t))

holds for all u_P ∈ [−u_max, u_max]² for almost every t ∈ [0, T̂]. The latter implies the bang-bang law û_P,i(t) = −u_max sgn(λ_v,i(t)) whenever λ_v,i(t) ≠ 0; on intervals where λ_v,i vanishes, the component û_P,i is singular. In this case, the minimum principle provides no information on the singular control except feasibility. Notice furthermore that not all control components can be singular, since this would lead to trivial multipliers in contradiction to the minimum principle. Hence, there is at least one index i for which the control component û_P,i is non-singular. In the non-singular case there can be at most one switch of each component û_P,i, i ∈ {1, 2}, in the time interval [0, T̂], since λ_v,i is linear in time. The switching time t̂_s,i for the i-th control component computes to t̂_s,i = c_v,i/c_z,i if c_z,i ≠ 0. We discuss several cases for non-singular controls.
Case 1: No switching occurs in û_P,i, i.e. û_P,i(t) ≡ ±u_max for i ∈ {1, 2}. By integration we obtain v̂_P,i(t) = ±u_max t and thus v̂_P,i(T̂) = ±u_max T̂ ≠ 0, in contradiction to the boundary condition v̂_P,i(T̂) = 0. Consequently, each non-singular control component switches exactly once in [0, T̂].

Case 2: The switching structure for control component i ∈ {1, 2} is û_P,i(t) = u_max for t ∈ [0, t̂_s,i) and û_P,i(t) = −u_max for t ∈ (t̂_s,i, T̂]. By integration and the boundary conditions we find the velocity and position components in closed form. The boundary conditions for v̂_P,i(T̂) and ẑ_P,i(T̂) yield t̂_s,i = T̂_i/2 and

    T̂_1 = 2 √((x_E,T − x_P,0)/u_max),  T̂_2 = 2 √((y_E,T − y_P,0)/u_max).

Case 3: The switching structure for control component i ∈ {1, 2} is û_P,i(t) = −u_max for t ∈ [0, t̂_s,i) and û_P,i(t) = u_max for t ∈ (t̂_s,i, T̂]. This case can be handled analogously to Case 2, and we obtain

    T̂_1 = 2 √(−(x_E,T − x_P,0)/u_max),  T̂_2 = 2 √(−(y_E,T − y_P,0)/u_max).

The above analysis reveals the shortest times T̂_i, i ∈ {1, 2}, in which the i-th state can reach its terminal boundary condition. The minimum time T̂ for a given terminal position is thus given by the value function V of OCP_L(x_E,T, y_E,T) (= minimum time function) with

    V(x_E,T, y_E,T) = max{T̂_1, T̂_2},  T̂_1 = 2 √(|x_E,T − x_P,0|/u_max),  T̂_2 = 2 √(|y_E,T − y_P,0|/u_max).  (18)

The equivalent single level problem (SL-OCP) reads as follows: Minimize (8) subject to the constraints (9)-(14), (15)-(17) with (x_E,T, y_E,T) = (x_E(T), y_E(T)) and the non-smooth constraint

    T ≤ V(x_E(T), y_E(T)).  (19)

Numerical results
For the numerical solution of the single level problem SL-OCP we applied the direct shooting method OCPID-DAE1 [13]. The non-smooth constraint T ≤ V(x_E(T), y_E(T)) with V from (18) was replaced by a continuously differentiable constraint, which was obtained by smoothing the maximum function and the absolute value function in (18). Figure 2 shows a numerical solution of the pursuit-evasion Stackelberg bilevel optimal control problem for the data v_E,0 = 10, ψ_E(0) = π/4, α_1 = 10, α_2 = 0, w_max = 0.5, v_E,max = 20, a_min = −5, a_max = 1, u_max = 5, N = 50, T ≈ 18.01. Figure 3 shows several trajectories for the pursuer and the evader for different initial yaw angles covering the interval [0, 2π).

Remark 2.
The constraint (19) may become infeasible under discretization. Strictly speaking, the value function V_h of the discretized lower level optimal control problem should be used instead. However, since V_h is hardly available for arbitrary discretizations, we use the relaxed constraint T ≤ V(x_E(T), y_E(T)) + ε with some ε > 0.
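Combining the smoothing of the max and |·| in (18) with the ε-relaxation of this remark, the inequality handed to the NLP solver can be sketched as follows (the function name, the smoothing parameter delta, and the convention that c ≥ 0 encodes feasibility are our own assumptions):

```python
import math

def smoothed_relaxed_constraint(T, x_ET, y_ET, z_P0, u_max,
                                delta=1e-3, eps=1e-2):
    """Return c such that c >= 0 encodes the relaxed, smoothed
    constraint T <= V_delta(x_ET, y_ET) + eps, where V_delta replaces
    the max and |.| in the analytic value function (18) by C^1
    approximations with smoothing parameter delta."""
    def sabs(x):                      # smoothed absolute value
        return math.sqrt(x * x + delta * delta)
    T1 = 2.0 * math.sqrt(sabs(x_ET - z_P0[0]) / u_max)
    T2 = 2.0 * math.sqrt(sabs(y_ET - z_P0[1]) / u_max)
    V = 0.5 * (T1 + T2 + sabs(T1 - T2))   # smoothed max{T1, T2}
    return V + eps - T
```

The relaxation ε dominates the smoothing error, so a capture time slightly below the smoothed value function remains feasible for the discretized problem.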

Conclusions and Outlook
The paper discusses a specific bilevel optimal control problem and its reformulation as an equivalent single level problem using the value function of the lower level problem. For a sample problem it is possible to compute the value function analytically and to solve the overall bilevel problem numerically using a direct discretization method. This first numerical study leaves many issues open that have to be investigated in future research for the general problem setting. Amongst them are smoothness properties of the value function, the representation of subdifferentials, the development of appropriate solution methods for non-smooth problems, and the derivation of necessary (and sufficient) optimality conditions for the class of bilevel optimal control problems.