Differential machine learning
Antoine Savine and Brian Huge, Danske Bank
AAD revolutionized both ML and finance. In ML, where it is called backpropagation, it made it possible to train deep neural networks (NN) in reasonable time, enabling the subsequent successes of e.g. computer vision and natural language processing. In finance, AAD computes vast numbers of differential sensitivities, with analytic accuracy and miraculous speed, giving us instantaneous model calibration and realtime risk reports of complex trading books.
AAD made differentials massively available in finance, opening a world of possibilities of which we have only scratched the surface. Realtime risk reports are only the beginning.
Recall how we produce Monte-Carlo risk reports. We first compute pathwise differentials, the sensitivities of (adequately smoothed) payoffs (sums of cashflows) wrt market variables, path by path across simulations; see this tutorial for a refresher. Then, we average sensitivities across Monte-Carlo paths to produce the risk report, collapsing a wealth of information into aggregated metrics. For example, consider a delta-hedged European call. The risk report only shows a delta of zero; from that point of view it might as well be an empty book. But before collapsing into a risk report, pathwise differentials measured the nonlinear impact of the underlying price on the final outcome in a large number of scenarios. This is a vast amount of information, from which we can e.g. extract the principal risk factors over the transaction lifetime, identify static hedges and measure their effectiveness, or learn pricing and risk functions of market states on future dates.
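As a minimal sketch of these mechanics, assuming a Black-Scholes underlying and writing the pathwise differential by hand rather than obtaining it with AAD (all function and parameter names here are illustrative only):

```python
import numpy as np

# Minimal sketch: pathwise deltas of a European call under Black-Scholes.
# In production the differentials come from AAD; here they are written by hand.
def pathwise_call_deltas(spot, strike, vol, rate, maturity, n_paths, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    # terminal spots under the risk-neutral measure
    s_T = spot * np.exp((rate - 0.5 * vol**2) * maturity + vol * np.sqrt(maturity) * z)
    # discounted payoff per path, and its derivative wrt the initial spot:
    # d/dS0 [ exp(-rT) max(S_T - K, 0) ] = exp(-rT) 1{S_T > K} S_T / S0
    payoffs = np.exp(-rate * maturity) * np.maximum(s_T - strike, 0.0)
    deltas = np.exp(-rate * maturity) * (s_T > strike) * s_T / spot
    return payoffs, deltas

payoffs, deltas = pathwise_call_deltas(100.0, 100.0, 0.2, 0.02, 1.0, 100_000)
print("price:", payoffs.mean())   # Monte-Carlo price
print("delta:", deltas.mean())    # the single number the risk report keeps
```

Averaging the pathwise deltas collapses them into the one number a risk report shows; the per-path values are the information that differential ML puts to work.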
At Danske Bank, where AAD was fully implemented in production early on, the pathwise sensitivities of complex trading books are readily available for research and development of improved risk management strategies. Superfly Analytics, Danske Bank’s quantitative research department, initiated a major project to train a new breed of ML models on pathwise differentials to learn effective pricing and risk functions. The trained models compute value and risk functions of the market state with near-analytic speed, effectively resolving the computational load of scenario-based risk reports, backtesting of hedge strategies and regulatory calculations like XVA, CCR, FRTB or SIMM-MVA.
As expected, we found that models trained on differentials with adequate algorithms vastly outperform existing machine learning and deep learning (DL) methodologies.
We published a detailed description in a working paper, along with numerical results and a vast amount of additional material (mathematical proofs, practical implementation details and extensions to other ML models than NN) in online appendices. We also posted a simple TensorFlow implementation on the companion GitHub repo. You can run it on Google Colab. Don’t forget to enable GPU support in the Runtime menu.
As a simple example, consider 15 stocks in a correlated Bachelier model. We want to learn the price of a basket option as a function of the 15 stocks. Of course, the correct solution is known in closed form, so we can easily measure performance. We trained a standard DL model on m simulated examples with initial state X (a vector in dimension 15) along with payoff Y (a real number), à la Longstaff-Schwartz. We also trained a differential DL model on a training set augmented with pathwise differentials Z=dY/dX, and tested the performance of both models against the correct formula on an independent dataset.
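For concreteness, here is a sketch of how such a training set (X, Y, Z) could be simulated, assuming a simple flat correlation structure and analytic pathwise differentials; all parameters and names are illustrative, not taken from the paper:

```python
import numpy as np

# Sketch of a training set (X, Y, Z) for the basket option example,
# assuming a correlated Bachelier model with illustrative parameters.
def bachelier_basket_training_set(m, n_assets=15, maturity=1.0, seed=0):
    rng = np.random.default_rng(seed)
    weights = np.full(n_assets, 1.0 / n_assets)          # basket weights
    vols = 0.2 * np.ones(n_assets)                       # normal volatilities
    corr = 0.5 * np.ones((n_assets, n_assets)) + 0.5 * np.eye(n_assets)
    chol = np.linalg.cholesky(corr)

    X = 1.0 + 0.5 * rng.standard_normal((m, n_assets))   # randomized initial states
    z = rng.standard_normal((m, n_assets))
    S_T = X + np.sqrt(maturity) * (z @ chol.T) * vols    # Bachelier terminal values
    basket = S_T @ weights
    strike = 1.0

    Y = np.maximum(basket - strike, 0.0)                 # payoff labels
    # pathwise differentials: dY/dX = 1{basket > strike} * weights, since dS_T/dX = I
    Z = (basket > strike)[:, None] * weights[None, :]
    return X, Y, Z
```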
Differential training learns with remarkable accuracy on small datasets, making it applicable in realistic situations. The results carry over to transactions and simulation models of arbitrary complexity. In fact, the improvement from differential ML considerably increases with complexity. For example, we simulated the cashflows of a medium-sized Danske Bank netting set, including single and cross-currency swaps and swaptions in 10 different currencies, in Danske Bank’s proprietary XVA model, where interest rates are simulated with a four-factor, sixteen-state Cheyette model per currency. We then compared the performance of a standard DL model trained on 64k paths with a differential model trained on 8k paths. Evidently, we don’t have a closed form to compare with; instead, we ran nested simulations overnight as a reference. The chart below shows performance on an independent test set, with correct (nested) values on the horizontal axis and predictions of the trained ML models on the vertical axis.
We see that differential ML achieves high quality approximation on small datasets, unthinkable with standard ML even on much bigger training sets. A correct articulation of AAD with ML gives us unreasonably effective pricing approximation in realistic time.
Besides, the core idea is very simple, and its implementation is straightforward, as seen on the TensorFlow notebook.
The trained neural net approximates the price in its output layer by feedforward inference from the state variables in the input layer. The gradient of the price wrt the state is effectively computed by backpropagation. By making backprop part of the network, we get a twin network capable of predicting prices together with Greeks.
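As an illustration, a minimal twin network can be sketched in TensorFlow as below, with the gradient recovered by automatic differentiation (tf.GradientTape) rather than a hand-written backpropagation pass; layer sizes and activations are arbitrary choices, not those of the paper:

```python
import tensorflow as tf

# Minimal twin-network sketch: a feedforward value network, with the gradient
# of the predicted value wrt the inputs recovered by automatic differentiation.
def make_value_net(n_inputs, n_units=20, n_layers=4):
    inputs = tf.keras.Input(shape=(n_inputs,))
    h = inputs
    for _ in range(n_layers):
        h = tf.keras.layers.Dense(n_units, activation="softplus")(h)
    outputs = tf.keras.layers.Dense(1)(h)
    return tf.keras.Model(inputs, outputs)

def twin_predict(value_net, x):
    # forward pass predicts values, backprop through the net predicts Greeks
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = value_net(x)
    dydx = tape.gradient(y, x)
    return y, dydx
```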
All that remains is to train the twin network to predict correct prices and Greeks, by minimising a cost function that combines prediction errors on values and on differentials, the latter compared with the differential labels, a.k.a. the pathwise differentials computed with AAD.
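A natural form for such a combined cost, written here in our own notation with a balancing hyperparameter \(\lambda\) (details such as the normalisation of the differential terms are omitted for brevity), is

\[
C(\theta) \;=\; \frac{1}{m}\sum_{i=1}^{m}\bigl(\hat y_i - y_i\bigr)^2
\;+\; \frac{\lambda}{m}\sum_{i=1}^{m}\bigl\lVert \hat z_i - z_i \bigr\rVert^2,
\qquad \hat z_i \;=\; \frac{\partial \hat y_i}{\partial x_i},
\]

where \(y_i\) are the payoff labels, \(z_i\) the pathwise differential labels, and \(\hat y_i\), \(\hat z_i\) the predictions of the twin network.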
Differential training imposes a penalty on incorrect Greeks, in the same way that traditional regularisation like Tikhonov’s favours small weights. Contrary to conventional regularisation, however, differential ML effectively mitigates overfitting without introducing a bias. To see this, consider training on differentials alone. We prove in the mathematical appendix that the trained model converges to an approximation with all the correct differentials, i.e. the correct pricing function modulo an additive constant. Hence, there is no bias-variance tradeoff and no need to tweak hyperparameters by cross-validation. It just works.
Differential machine learning is more similar to data augmentation, which in turn may be seen as a better form of regularisation. Data augmentation is consistently applied, e.g. in computer vision, with documented success. The idea is to produce multiple labeled images from a single one, e.g. by cropping, zooming, rotation or recoloring. In addition to extending the training set for negligible cost, data augmentation teaches the ML model important invariances. Similarly, differential labels not only increase the amount of information in the training set for very small cost (as long as they are computed with AAD), but also teach ML models the shape of pricing functions.
Prior to joining Danske Bank in 2013, Antoine held multiple leadership positions in quantitative finance, including Head of Research at BNP-Paribas. He also teaches volatility and computational finance at Copenhagen University. He is best known for his work on volatility and rates, and he was influential in the wide adoption of cashflow scripting in finance. At Danske Bank, Antoine wrote the book on AAD, published with Wiley, and was a key contributor to the bank’s XVA system, winner of the In-House System of the Year 2015 Risk award.
Brian has worked in Danske Bank’s quantitative research since 2001 and has made notable contributions to quantitative finance with Jesper Andreasen, including the iconic ZABR: expansion for the masses, and the LVI volatility interpolation method coupled with the Random Grid algorithm, winner of the Quant of the Year 2012 Risk award. All these algorithms are implemented in Superfly, Danske Bank’s proprietary risk management platform, and used every day for the management of the bank’s trading books.
Antoine and Brian both hold PhDs in mathematics from Copenhagen University. They are regular speakers at QuantMinds and RiskMinds events.
Neural networks with asymptotics control
Webinar presented by Alexandre Antonov, Chief Analyst, Danske Bank
Artificial neural networks (ANNs) have recently been proposed as accurate and fast approximators in various derivatives pricing applications. ANNs typically excel at fitting the functions they approximate at the input parameters they are trained on, and are often quite good at interpolating between them. However, for standard ANNs, the extrapolation behaviour – an important aspect for financial applications – cannot be controlled, due to the complex functional forms typically involved. We overcome this significant limitation and develop a new type of neural network that incorporates large-value asymptotics, when known, allowing explicit control over extrapolation.
This new type of asymptotics-controlled ANNs is based on two novel technical constructs, a multi-dimensional spline interpolator with prescribed asymptotic behaviour, and a custom ANN layer that guarantees zero asymptotics in chosen directions. Asymptotics control brings a number of important benefits to ANN applications in finance such as well-controlled behaviour under stress scenarios, graceful handling of regime switching, and improved interpretability.
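Purely as an illustration of the kind of construct described here (this is our own toy sketch, not the layer presented in the webinar), one way to force the output of a layer to vanish along chosen input directions is to damp a standard dense layer:

```python
import tensorflow as tf

# Illustrative only: a layer whose output vanishes as the input grows along
# chosen coordinates. This is a toy construction, not the webinar's layer.
class ZeroAsymptoticsLayer(tf.keras.layers.Layer):
    def __init__(self, units, decay_dims, scale=1.0, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(units, activation="softplus")
        self.decay_dims = decay_dims   # input coordinates along which output must vanish
        self.scale = scale

    def call(self, x):
        core = self.dense(x)
        # Gaussian damping in the chosen coordinates forces the output to zero
        # as those coordinates become large in absolute value.
        chosen = tf.gather(x, self.decay_dims, axis=-1)
        damping = tf.exp(-tf.reduce_sum(tf.square(chosen), axis=-1, keepdims=True)
                         / self.scale)
        return core * damping
```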
Looking forward to backward-looking rates: completing the generalised forward market model
Webinar presented by Fabio Mercurio, Global Head of Quant Analytics, Bloomberg L.P.
In this talk, we show how the generalised forward market model (FMM) introduced by Lyashenko and Mercurio (2019) can be extended to make it a complete term-structure model describing the evolution of all points on a yield curve, as well as that of the bank account. The ability to model the bank account, in addition to the forward curve, is going to be of crucial importance once Libor rates are replaced with setting-in-arrears rates in derivative and cash contracts, where fixings are defined in terms of the realised bank account values.
To achieve our goal, we “embed” the FMM into a Markovian HJM model with separable volatility structure by aligning the HJM and FMM dynamics of the forward term rates modelled by the FMM. This FMM-aligned HJM model is effectively a hybrid between an instantaneous forward-rate model and an LMM, and shares the advantages of both approaches, with the caveat that the number of variables to simulate could be too high.
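As a reminder of what a separable volatility structure buys (a standard result, recalled here for context rather than taken from the talk): in a one-factor HJM model with forward-rate volatility of the separable form

\[
\sigma_f(t,T) \;=\; \sigma_r(t)\, e^{-\int_t^T \kappa(u)\,du},
\]

the dynamics become Markovian in two state variables,

\[
dx_t = \bigl(y_t - \kappa(t)\,x_t\bigr)\,dt + \sigma_r(t)\,dW_t,
\qquad
dy_t = \bigl(\sigma_r(t)^2 - 2\kappa(t)\,y_t\bigr)\,dt,
\]

and zero-coupon bonds reconstruct in closed form,

\[
P(t,T) \;=\; \frac{P(0,T)}{P(0,t)}
\exp\!\Bigl(-G(t,T)\,x_t - \tfrac12\,G(t,T)^2\,y_t\Bigr),
\qquad
G(t,T) = \int_t^T e^{-\int_t^s \kappa(u)\,du}\,ds.
\]

This is the classical one-factor quasi-Gaussian (Cheyette) setup; multi-factor versions follow the same pattern.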
A more efficient approach is then derived by expressing the zero-coupon bonds and the bank account as functions of the modelled forward term rates and their volatilities. In this FMM-HJM construct, FMM acts as a “coarse” model capturing a “macro” structure of the market such as the covariance structure of the set of modelled rates, while the FMM-aligned HJM serves as a finer modelling environment used to fill the gaps left by the coarser FMM.
The problem of recovering the whole yield curve evolution from the modelled set of Libor rates has been extensively discussed in the LMM literature, and is often referred to as Libor-rate interpolation, or front- and back-stub interpolations. Contrary to existing methods, the approach we propose is not only arbitrage-free by construction, but it also allows for the generation of bank account values.