Mathematical models in physics abound with gradients, divergences, Laplacians, and curls, all of which are spatial derivatives; time derivatives appear in unsteady models.
We are concerned here with a more general kind of differentiation, going as far as differentiating the models themselves with respect to various kinds of variables.
Motivation
Such differentiation arises as soon as the equations are established, either because they derive from a potential, or because their solution uses the information provided by the Jacobian when the model is linearized.
Beyond solving the equations, optimizing the parameters with respect to some criterion on the solution benefits from knowing the derivatives of the solution with respect to those parameters, also called the sensitivity of the solution to the parameters.
Very different methods of differentiation
Whether the variables are unknowns or data, once the problem is discretized it amounts to differentiating a mathematical formula with respect to a list of its computational inputs. Starting from a code that computes the formula to be differentiated, there are three ways of obtaining a new code that provides the derivative.
1. Finite difference
This is the simplest method: rerun the original code with the input parameter of interest slightly perturbed, take the difference between the two results, and divide it by the size of the perturbation. With a second evaluation at the opposite perturbation (a central difference), the result is second-order accurate instead of first-order.
Unfortunately, the mathematics collides with floating-point arithmetic, whose rounding prevents the difference from being accurate: the digits involved become smaller and smaller while remaining relative to the same central value, so the achievable precision is necessarily limited. Moreover, the choice of the step size, epsilon, may never be satisfactory if the quantity to differentiate is a sum whose terms have very different orders of magnitude, uncorrelated with the orders of magnitude of their variations: the small term with a significant variation will be swamped!
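The two points above can be sketched in a few lines; this is a minimal illustration (the function and step sizes are chosen arbitrarily), comparing the first-order one-sided difference with the second-order central one:

```python
import math

def forward_diff(f, x, eps):
    """First-order one-sided difference: one extra evaluation."""
    return (f(x + eps) - f(x)) / eps

def central_diff(f, x, eps):
    """Second-order central difference: two evaluations at opposite offsets."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Derivative of sin at x = 1 is cos(1); watch the error as eps shrinks.
x, exact = 1.0, math.cos(1.0)
for eps in (1e-2, 1e-5, 1e-10):
    fwd = forward_diff(math.sin, x, eps)
    ctr = central_diff(math.sin, x, eps)
    print(f"eps={eps:.0e}  fwd err={abs(fwd - exact):.2e}  ctr err={abs(ctr - exact):.2e}")
```

At moderate step sizes the central difference is markedly more accurate; at very small ones, rounding in the subtraction of two nearly equal values degrades both, which is the precision limit described above.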
Semi-analytical method
When the solution comes from solving a linear system, rather than differencing the solutions themselves, it is preferable to difference the right-hand sides and solve the corresponding system: this is both faster and more accurate. The matrix must of course be constant, or converged in the linearized case of a Newton algorithm; the analysis then shows that the differentiation can be transferred to the right-hand side. When the latter is computed by finite difference, the method is called semi-analytical.
To do this, the code can no longer be a black box, since we need access to the assembled right-hand side, but that requires only high-level insight. This so-called semi-analytical sensitivity is offered by some software vendors.
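A minimal sketch of the idea, on a hypothetical toy system where the matrix K is independent of the parameter p and only the load vector depends on it (both chosen arbitrarily for illustration):

```python
import numpy as np

# Hypothetical toy model: K u = f(p), with K independent of the parameter p.
K = np.array([[4.0, 1.0],
              [1.0, 3.0]])

def load(p):
    """Parameter-dependent right-hand side (arbitrary example)."""
    return np.array([p**2, 2.0 * p])

p, eps = 2.0, 1e-6

# Semi-analytical sensitivity: difference the right-hand sides, then solve
# the same system once more -- no differencing of full solutions.
df_dp = (load(p + eps) - load(p - eps)) / (2 * eps)
du_dp = np.linalg.solve(K, df_dp)

# Exact sensitivity for comparison: df/dp = [2p, 2].
du_dp_exact = np.linalg.solve(K, np.array([2 * p, 2.0]))
```

Only the cheap right-hand side is differenced; the expensive factorization of K can be reused, which is why the method is both faster and better conditioned than differencing two full solutions.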
2. Code differentiation (algorithmic, AD)
This is, in theory, the easiest method to use, thanks to automatic differentiation tools: the code is provided, its outputs and inputs are specified, and the tool generates the code that computes the derivative. To do this, line after line, the tool emits code carrying both the value and the derivative (the elementary functions have known derivatives), by forward chaining or reverse chaining (the adjoint method). Initially limited to Fortran because of its lack of pointers, the method was extended to C++ by other tools that exploit operator overloading to minimize re-engineering of the original code: the operations are overloaded to propagate both the value and the derivative chain instead of the value alone.
Algorithmic differentiation is most advantageous for complex legacy codes because, by relying on the same computations, it avoids debugging new code. It nevertheless requires expert human assistance to be operational and effective.
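The operator-overloading variant of forward chaining can be sketched with a minimal "dual number" class (a toy illustration, not any particular AD tool): each arithmetic operation produces both a value and a chained derivative.

```python
import math

class Dual:
    """Forward-mode AD value: carries (value, derivative) and overloads
    arithmetic so every operation also chains the derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def dsin(x):
    """Elementary function with a known derivative."""
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# Differentiate f(x) = x*sin(x) + 3x at x = 2 by seeding dx/dx = 1.
x = Dual(2.0, 1.0)
y = x * dsin(x) + 3 * x
# y.val is f(2); y.dot is f'(2) = sin(2) + 2*cos(2) + 3.
```

The original expression is left almost untouched; only the type of the variables changes, which is why this approach minimizes re-engineering of the code.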
3. Symbolic differentiation
Ideally, one would differentiate the formula symbolically and encode the resulting computation. Doing so by hand incurs exactly the development time one wanted to avoid, while going through computer-algebra software (CAS) has some major drawbacks.
- On the one hand, it is difficult to express the formula when it rests on complex concepts such as finite elements, integration schemes and meshes.
- On the other hand, the expanded result loses the tensor structure from which it derives and does not give the best performance. Code-generation optimization in these tools relies on spotting repeated elementary operations such as multiplications and additions of scalar variables, whereas one can theoretically do better by working with tensors.
According to the speakers at the Algorithmic Differentiation Workshop organized by the GDR Calcul (the CNRS research group on scientific computing) on January 24, 2019 in Paris, this can lead to an explosion in the complexity of the expressions.
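The expression swell is easy to observe with a CAS; this is a small illustration using SymPy on an arbitrary composite function, counting operations as the order of differentiation increases:

```python
import sympy as sp

x = sp.symbols('x')
expr = sp.sin(x**2) * sp.exp(sp.cos(x))

d1 = sp.diff(expr, x)      # first derivative
d4 = sp.diff(expr, x, 4)   # fourth derivative

# count_ops measures the size of each symbolic expression; repeated
# differentiation makes it grow quickly.
print(sp.count_ops(d1), sp.count_ops(d4))
```

Each differentiation multiplies products and compositions, so the operation count grows rapidly with the order, which is the complexity explosion mentioned above.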