Peter Quell, Head of Portfolio Modelling for Market & Credit Risk, DZ Bank
Due to the pandemic, many banks experienced large numbers of backtest outliers in the first quarter of 2020 when they compared actual profit and loss figures with VaR estimates. The simple reason was that their regulatory VaR systems could not adapt to rapidly changing market conditions as volatility spiked to ever-higher levels. The problem lies in the amount of data required to “train” the VaR system.
In a nutshell, the amount of training data is a compromise between having enough observations to compute relevant statistical quantities (here we want lots of data) and the degree to which this data is still relevant for the current environment (here we only want recent data). Whereas in quiet times collecting a lot of data to reduce measurement uncertainty seems to be a priority in some model risk management approaches, the situation changes once there is a rapid regime shift such as the one experienced in March 2020. Obviously, this may call for some mechanism to “learn” how to spot a crisis environment.
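As a purely illustrative sketch (not the Monte Carlo studies mentioned below), the following Python snippet simulates a volatility spike and compares a long-window historical-simulation VaR with an exponentially weighted (EWMA) estimate; the window length, decay factor and spike size are arbitrary choices made here for demonstration.

```python
# Illustrative sketch only: a 500-day historical-simulation VaR versus an
# EWMA-based VaR during a simulated volatility spike. All parameters below
# (window, decay factor, spike size) are arbitrary demonstration choices.
import numpy as np

rng = np.random.default_rng(0)

# 750 "quiet" days followed by 60 "crisis" days with four times the volatility.
quiet = rng.normal(0.0, 0.01, 750)
crisis = rng.normal(0.0, 0.04, 60)
returns = np.concatenate([quiet, crisis])

alpha = 0.99
window = 500          # long historical-simulation window
lam = 0.94            # RiskMetrics-style decay factor

hs_var, ewma_var = [], []
sigma2 = returns[:window].var()
for t in range(window, len(returns)):
    # Historical simulation: empirical quantile of the last `window` returns.
    hs_var.append(-np.quantile(returns[t - window:t], 1 - alpha))
    # EWMA: variance updated daily, so it reacts quickly to the regime shift.
    sigma2 = lam * sigma2 + (1 - lam) * returns[t - 1] ** 2
    ewma_var.append(2.326 * np.sqrt(sigma2))   # normal 99% quantile

losses = -returns[window:]
print("HS outliers:  ", int((losses > np.array(hs_var)).sum()))
print("EWMA outliers:", int((losses > np.array(ewma_var)).sum()))
```

In this simulated setting the long-window estimate typically produces noticeably more outliers during the crisis period, which is exactly the trade-off described above.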
More details as well as additional Monte Carlo studies can be found in the third edition of the book Risk Model Validation by my colleague Christian Meyer and me (riskbooks.com).
Of course, that could only be a first step towards improving classical regulatory VaR systems, since such a mechanism does not address other deficiencies (e.g. the treatment of tail events). Are there any machine learning approaches that could be used to improve the situation?
Even though there are quite a number of success stories, e.g. in image and speech recognition, the application of neural networks in time series analysis seems to be rather limited. The first issue relates to the low signal-to-noise ratio usually encountered in financial market data. Due to the large noise component, the algorithm might learn the noise pattern, even though (at least in theory) there is nothing relevant to learn there. A clear indication of overfitting in this case is good performance of the algorithm on the training data combined with deteriorating performance on new data. Because of their complexity, neural networks are prone to overfitting the training data, i.e. they fit spurious random aspects of the training data as well as structure that is true for the entire population. There are techniques to handle overfitting, and all machine learning developers should make use of them.
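A minimal sketch of the symptom described above, using nothing more than numpy polynomial fits on simulated pure noise (degrees and sample size are arbitrary choices): in-sample error keeps falling as the model becomes more flexible, while out-of-sample error does not improve. Monitoring out-of-sample performance is also the simplest of the techniques that keep overfitting in check.

```python
# Overfitting on pure noise: a more flexible model fits the training data
# better and better, but the error on fresh data does not improve.
import numpy as np

rng = np.random.default_rng(1)

n = 60
x = np.linspace(-1, 1, n)
y = rng.normal(0.0, 1.0, n)          # pure noise: nothing relevant to learn
x_new = np.linspace(-1, 1, n)
y_new = rng.normal(0.0, 1.0, n)      # fresh data from the same "market"

def mse(deg):
    coeffs = np.polyfit(x, y, deg)                          # fit on training data only
    in_sample = np.mean((np.polyval(coeffs, x) - y) ** 2)
    out_sample = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    return in_sample, out_sample

for deg in (1, 5, 10, 15):
    ins, outs = mse(deg)
    print(f"degree {deg:2d}: in-sample MSE {ins:.2f}, out-of-sample MSE {outs:.2f}")
```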
Finally, what are the main challenges when it comes to the application of machine learning in a regulatory context?
Explainability / interpretability: One should be in a position to explain how the algorithm makes a prediction or decision for one specific case at a time (see the sketch after this list).
Robustness and transient environments: One should account for the fact that markets or environments can change, which calls for a good balance between adaptability and robustness.
Bias and adversarial attacks: Compared with classical statistics, (training) data plays a much more prominent role in machine learning applications, so any bias in that data, or deliberate manipulation of it, feeds directly into the model's output.
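As a hedged illustration of the explainability point above, the sketch below computes a crude single-case attribution for a toy classifier: each feature of one case is replaced in turn by its training mean and the change in the predicted probability is recorded. The feature names and the data are entirely hypothetical, and a production setting would rely on established methods such as Shapley-value-based explanations.

```python
# Crude perturbation-based explanation of one prediction. Feature names,
# data and model choice are hypothetical illustration choices, not a
# recommendation of a specific method.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)

# Hypothetical features of a credit exposure (purely simulated data).
feature_names = ["rating_score", "leverage", "volatility", "sector_pd"]
X = rng.normal(size=(500, 4))
y = (X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.normal(size=500) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

case = X[0]                                  # the one case to be explained
baseline = X.mean(axis=0)
p_case = model.predict_proba(case.reshape(1, -1))[0, 1]

print(f"predicted probability for this case: {p_case:.3f}")
for i, name in enumerate(feature_names):
    perturbed = case.copy()
    perturbed[i] = baseline[i]               # neutralise one feature at a time
    p_pert = model.predict_proba(perturbed.reshape(1, -1))[0, 1]
    print(f"{name:>12}: change vs baseline {p_case - p_pert:+.3f}")
```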