Optimality conditions
Assume u is the solution of a problem parameterized by r. The state equation \(E(r,u)=0\) implicitly defines u as a function of r. The derivative \(A=E_u\) is the matrix of the system, assumed regular (invertible). Differentiating the implicit function gives:
\[E_r + A\,u_r = 0\]
that is:
\[u_r = -A^{-1}E_r\]
These are the sensitivities of the solution with respect to the parameters.
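As a minimal sketch of the sensitivity formula \(u_r=-A^{-1}E_r\), the following Python snippet uses a scalar state equation chosen purely for illustration, \(E(r,u)=u^3+u-r\); this equation, the function names and the tolerances are assumptions, not part of the original text. The implicit-function value is compared with a central finite difference.

```python
def solve_state(r, u0=0.0, tol=1e-13, max_iter=50):
    """Newton iteration for the illustrative state equation E(r, u) = u**3 + u - r = 0."""
    u = u0
    for _ in range(max_iter):
        residual = u**3 + u - r
        if abs(residual) < tol:
            break
        # A = E_u = 3 u^2 + 1 is always >= 1, so the system is regular
        u -= residual / (3.0 * u**2 + 1.0)
    return u


def sensitivity(r):
    """Sensitivity u_r = -A^{-1} E_r from the implicit function theorem."""
    u = solve_state(r)
    A = 3.0 * u**2 + 1.0   # E_u
    E_r = -1.0             # partial derivative of E with respect to r
    return -E_r / A


r = 2.0
u_r = sensitivity(r)

# Independent check: central finite difference on the solved state
h = 1e-6
u_r_fd = (solve_state(r + h) - solve_state(r - h)) / (2.0 * h)
```

For a vector state, the division by A becomes a linear solve with the matrix \(A\), one per parameter.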
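Given a cost function \(J = J(r,u)\), the Euler-Lagrange optimality conditions are derived from the Lagrangian function \(L=J+\lambda\cdot E\), where \(\lambda\) is the Lagrange multiplier of the state equation, called the adjoint state. With gradients written as column vectors, the conditions read:
\[L_r = J_r + E_r^T\lambda = 0,\qquad L_u = J_u + A^T\lambda = 0,\qquad L_\lambda = E = 0\]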
The variable \(\lambda\) can be deduced from the second equation (stationarity with respect to u), rewritten as \(A^T\lambda=-J_u\). Substituting \(\lambda\) into the first equation (stationarity with respect to r), we get
\[J_r - E_r^T A^{-T} J_u = 0\]
as could also have been derived by differentiating the composition \(J=J(r,u(r))\). This expression is the total derivative of J with respect to r. Evaluating it needs the solution of only one linear system, with matrix \(A^T\), which gives the adjoint state \(\lambda\); this is called the adjoint method. The direct method, by contrast, needs the solution of as many linear systems with matrix A as there are parameters r, one for each sensitivity. This is why the adjoint method is preferred in gradient-based optimisation methods. For higher derivatives or higher-order methods, such as the second-order Newton's algorithm, both the adjoint state and the derivatives of u are useful.
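The comparison between the adjoint and direct methods can be illustrated with a small sketch. The two-state, three-parameter problem below (the matrix in `state_matrix`, the right-hand side, the quadratic cost and the weight `alpha`) is hypothetical and chosen only so that everything can be checked by hand; a 2x2 Cramer solve stands in for the linear solver a real code would use. The adjoint gradient needs one solve with \(A^T\), the direct gradient needs one solve with \(A\) per parameter, and both are compared with finite differences.

```python
def solve2(M, b):
    """Solve a 2x2 linear system M x = b by Cramer's rule."""
    (m00, m01), (m10, m11) = M
    det = m00 * m11 - m01 * m10
    return [(m11 * b[0] - m01 * b[1]) / det,
            (m00 * b[1] - m10 * b[0]) / det]


def state_matrix(r):
    """A = E_u for the illustrative state equation E(r, u) = A(r) u - f(r)."""
    return [[2.0 + r[0], -1.0], [-0.5, 2.0 + r[1]]]


def solve_state(r):
    """Solve E(r, u) = 0 with right-hand side f(r) = [1, r[2]]."""
    return solve2(state_matrix(r), [1.0, r[2]])


def cost(r, u, alpha=0.1):
    """J(r, u) = |u|^2 / 2 + alpha |r|^2 / 2."""
    return 0.5 * sum(x * x for x in u) + 0.5 * alpha * sum(x * x for x in r)


def partials(r, u, alpha=0.1):
    """Partial derivatives J_u, J_r and the columns of E_r (one per parameter)."""
    J_u = list(u)
    J_r = [alpha * x for x in r]
    E_r_cols = [[u[0], 0.0], [0.0, u[1]], [0.0, -1.0]]
    return J_u, J_r, E_r_cols


def gradient_adjoint(r):
    """dJ/dr = J_r + E_r^T lambda, with a single solve A^T lambda = -J_u."""
    u = solve_state(r)
    J_u, J_r, E_r_cols = partials(r, u)
    A = state_matrix(r)
    A_T = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
    lam = solve2(A_T, [-J_u[0], -J_u[1]])
    return [J_r[k] + E_r_cols[k][0] * lam[0] + E_r_cols[k][1] * lam[1]
            for k in range(len(r))]


def gradient_direct(r):
    """dJ/dr = J_r + u_r^T J_u, with one solve A u_r = -E_r per parameter."""
    u = solve_state(r)
    J_u, J_r, E_r_cols = partials(r, u)
    A = state_matrix(r)
    grad = []
    for k in range(len(r)):
        u_rk = solve2(A, [-E_r_cols[k][0], -E_r_cols[k][1]])
        grad.append(J_r[k] + J_u[0] * u_rk[0] + J_u[1] * u_rk[1])
    return grad


def gradient_fd(r, h=1e-6):
    """Central finite differences on r -> J(r, u(r)), used only as a check."""
    grad = []
    for k in range(len(r)):
        rp, rm = list(r), list(r)
        rp[k] += h
        rm[k] -= h
        grad.append((cost(rp, solve_state(rp)) - cost(rm, solve_state(rm))) / (2.0 * h))
    return grad


r = [0.5, 1.0, 2.0]
g_adj = gradient_adjoint(r)   # 1 linear solve with A^T
g_dir = gradient_direct(r)    # 3 linear solves with A
g_fd = gradient_fd(r)
```

With three parameters the saving is small, but the number of adjoint solves stays at one however many parameters there are, whereas the direct method scales with the length of r.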
Solving the optimality conditions by Newton's algorithm requires derivatives one order higher than solving the state equation alone, since the first-order derivatives of the state function already appear in the conditions. Differentiating the matrix of the linearized state equation? Isn't that too hard and too big, in a word foolish?
In fact, since we want the linearized system of optimality conditions reduced to the parameter space, these worrying derivatives only appear inside a matrix product (a Schur complement in the right-hand side), so the computation should remain feasible with standard computations of right-hand sides and matrices.
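One way to write this reduction, as a sketch rather than the definitive formulation: assume the state and adjoint equations are satisfied at the current iterate (\(E=0\), \(L_u=0\)) and write \(u_r=-A^{-1}E_r\) for the sensitivities. The symbols \(L_{rr}, L_{ru}, L_{ur}, L_{uu}\) (second derivatives of the Lagrangian), \(H\) (reduced Hessian) and \(\delta r, \delta u, \delta\lambda\) (Newton increments) are notation introduced here for illustration. A Newton step on the optimality conditions solves
\[\begin{pmatrix} L_{rr} & L_{ru} & E_r^T \\ L_{ur} & L_{uu} & A^T \\ E_r & A & 0 \end{pmatrix}\begin{pmatrix} \delta r \\ \delta u \\ \delta\lambda \end{pmatrix} = -\begin{pmatrix} L_r \\ 0 \\ 0 \end{pmatrix}\]
The third row gives \(\delta u = u_r\,\delta r\), the second row gives \(\delta\lambda\), and substituting both into the first row leaves a system in the parameter space only:
\[H\,\delta r = -L_r,\qquad H = L_{rr} + L_{ru}\,u_r + u_r^T L_{ur} + u_r^T L_{uu}\,u_r\]
The second derivatives of E enter only through terms such as \(L_{uu}=J_{uu}+\lambda\cdot E_{uu}\), and always multiplied by \(u_r\) or by a vector, so in principle they can be evaluated as directional derivatives without storing a third-order tensor.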
Inequality conditions
Optimisation problems seldom come without inequality constraints, such as an interval of definition for r or other constraints ensuring the validity of the state equation. Just consider a domain whose thickness must necessarily be positive…
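As a minimal sketch of how a bound on a thickness parameter can be handled, the snippet below applies a projected gradient iteration, which is one common technique among others and is not claimed to be the method intended here. The model is hypothetical: a scalar state equation \(E(r,u)=ru-f\) (thickness r, load f), a cost \(J=fu+cr\), and the values of `f`, `c`, `step` and the bounds are illustrative assumptions. The gradient is computed with the adjoint state.

```python
def gradient(r, f=1.0, c=4.0):
    """Adjoint gradient of J = f*u + c*r under the state equation r*u - f = 0."""
    u = f / r            # state: A = E_u = r, so u = f / r
    lam = -f / r         # adjoint: A^T lam = -J_u, with J_u = f
    return c + u * lam   # dJ/dr = J_r + E_r * lam, with J_r = c and E_r = u


def projected_gradient(r0, r_min, r_max, step=0.1, n_iter=200):
    """Gradient descent where each iterate is clipped back into [r_min, r_max]."""
    r = r0
    for _ in range(n_iter):
        r = min(max(r - step * gradient(r), r_min), r_max)
    return r


# Lower bound inactive: the iteration reaches the interior optimum r = f / sqrt(c)
r_free = projected_gradient(2.0, r_min=0.1, r_max=5.0)

# Lower bound active: the iterate stays on the bound r_min
r_bound = projected_gradient(2.0, r_min=0.8, r_max=5.0)
```

The lower bound keeps r strictly positive, which is also what keeps the state equation solvable in this toy model.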
Some examples…
… to come.