Dr Svetlana Borovkova and Alexandru Giurca
Machine learning is resolutely marching its way into finance. From credit scoring to option pricing and from fraud detection to algo trading, machine learning algorithms are proving their effectiveness in many financial applications. The massive generalization power of these algorithms allows them to tackle a wide variety of problems. Machine learning is particularly successful in situations where there is a clear underlying nonlinear
function (which can be very complex and completely unknown) relating the inputs to the output. An excellent example of such a situation is the problem of option pricing, where a highly nonlinear relationship connects input parameters such as volatility, the price of the underlying, time to maturity, dividends and interest rates to the option price. Successful applications of machine learning in this area have already been extensively documented.
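To see how nonlinear this mapping is, consider the familiar Black-Scholes formula for a European call. The short Python sketch below (a standard textbook implementation, not a machine-learning pricer) makes the input-output relationship explicit:

```python
from math import exp, log, sqrt

from scipy.stats import norm

def bs_call_price(S, K, T, r, sigma, q=0.0):
    """Black-Scholes price of a European call: a nonlinear map from
    (spot, strike, maturity, rate, volatility, dividend yield) to price."""
    d1 = (log(S / K) + (r - q + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * exp(-q * T) * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

# Example: an at-the-money one-year call with 20% volatility and a 1% rate
print(bs_call_price(S=100.0, K=100.0, T=1.0, r=0.01, sigma=0.2))
```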
Another, related area is the hedging of options. Here, too, a complex nonlinear function relates the option's parameters to the option's delta and other greeks. This problem is not only nonlinear, it is also a sequential decision-making problem: hedging decisions must be made continuously throughout the lifetime of an option, and these decisions are accompanied by a clear notion of "reward". A machine learning tool excellently suited to this type of problem is reinforcement learning: a class of machine learning algorithms which sequentially tune their parameters according to the rewards associated with actions. The main components of reinforcement learning, state/environment --> action --> reward, are all present in a hedging problem, where the action can be thought of as a delta-hedging decision and the reward as the hedging cost, or the P/L of your hedged portfolio.
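As a rough illustration of this state/action/reward structure, here is a minimal sketch of a delta-hedging environment in Python. The state variables, transaction-cost model and reward definition are our illustrative choices, not the exact setup used in the papers discussed below.

```python
import numpy as np

class DeltaHedgingEnv:
    """Toy episodic hedging environment: state -> action -> reward."""

    def __init__(self, underlying_path, option_path, tc=0.001):
        self.underlying_path = underlying_path  # simulated spot prices, length n_steps + 1
        self.option_path = option_path          # option values along the same path
        self.tc = tc                            # proportional transaction cost
        self.t = 0
        self.hedge = 0.0                        # current holding in the underlying

    def state(self):
        # A simple state: (time step, current spot, current hedge position)
        return np.array([self.t, self.underlying_path[self.t], self.hedge])

    def step(self, action):
        """action = new hedge ratio, i.e. the 'delta' chosen by the agent."""
        s0 = self.underlying_path[self.t]
        s1 = self.underlying_path[self.t + 1]
        cost = self.tc * abs(action - self.hedge) * s0   # cost of rebalancing
        # One-step P/L of the hedged position (short option, long 'action' shares)
        pnl = action * (s1 - s0) - (self.option_path[self.t + 1] - self.option_path[self.t]) - cost
        self.hedge = action
        self.t += 1
        done = self.t == len(self.underlying_path) - 1
        reward = -abs(pnl)   # one possible reward: penalize the hedging error
        return self.state(), reward, done
```

A reinforcement learning agent interacts with such an environment episode after episode, learning which action to take in each state so as to maximize the cumulative reward.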
Reinforcement learning has recently been applied to the hedging problem by Cao, Chen, Hull and Poulos (2019) and by Kolm and Ritter (2019), who demonstrated, based on simulated option prices, that it is possible to train a reinforcement learning algorithm to hedge a particular option in the framework of a specific model (be it geometric Brownian motion (GBM) or a stochastic volatility (SV) model). In this way they demonstrated the potential of this machine learning technique for making hedging decisions. However, the reality of hedging presents several obstacles to implementing their approach directly. First, we do not know what the exact data-generating mechanism is, i.e., what the price process of the underlying is. Second, we typically do not have enough real option trading data to properly train a reinforcement learning algorithm. Finally, training a separate algorithm for each specific option (i.e., with a specific moneyness and maturity) is infeasible due to the time and computational effort involved.
So, we asked ourselves two questions: will a reinforcement learning algorithm, trained on just one type of option, be able to cope with hedging a wide variety of options? And, if we train a reinforcement learning algorithm on a versatile range of price processes, will it transfer its acquired knowledge to the real-world hedging environment? This last question, of transferring knowledge learned on synthetic data to deal with the real market environment, is considered a "holy grail" of machine learning and it even has a special name: transfer learning. In other words, we wanted to know: can our reinforcement learning machine learn to build with toy bricks and then go and build a real house?
So, we trained two well-known reinforcement learning algorithms (also called "agents"), Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG), in various environments: GBM, the Heston SV model and an SV model with jumps in the underlying. Such training environments were meant to be as diverse as possible in terms of price and volatility dynamics, to let the agents generalize well on real market data. We then compared their performance to that of the same algorithms trained in the same type of environment (and for the same type of options) as in the evaluation set. One would suspect that agents trained to perform a more specific task would do much better than those trained to deal with more "general" situations, but this turned out not to be the case.
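For illustration, this is roughly how such training scenarios can be generated; the parameter values below are arbitrary examples of ours, and the jump-diffusion variant is omitted for brevity.

```python
import numpy as np

def simulate_gbm(S0, mu, sigma, T, n_steps, n_paths, rng):
    """Geometric Brownian motion paths (one of the training environments)."""
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_ret = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    return S0 * np.exp(np.cumsum(log_ret, axis=1))

def simulate_heston(S0, v0, mu, kappa, theta, xi, rho, T, n_steps, n_paths, rng):
    """Euler scheme for the Heston SV model (another training environment)."""
    dt = T / n_steps
    S = np.full(n_paths, S0, dtype=float)
    v = np.full(n_paths, v0, dtype=float)
    paths = np.empty((n_paths, n_steps))
    for i in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1 - rho ** 2) * rng.standard_normal(n_paths)
        S *= np.exp((mu - 0.5 * v) * dt + np.sqrt(v * dt) * z1)
        v = np.maximum(v + kappa * (theta - v) * dt + xi * np.sqrt(v * dt) * z2, 0.0)
        paths[:, i] = S
    return paths

# Example: one year of daily scenarios under each model
rng = np.random.default_rng(0)
gbm_paths = simulate_gbm(100, 0.0, 0.2, 1.0, 252, 1000, rng)
heston_paths = simulate_heston(100, 0.04, 0.0, 2.0, 0.04, 0.5, -0.7, 1.0, 252, 1000, rng)
```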
First of all, it turns out that reinforcement learning algorithms perform well when hedging European options with a wide variety of characteristics (moneyness and maturities), even though they were trained to hedge one specific European call option. So they are able to "generalize" their acquired hedging knowledge to a wider range of options than just those they have been trained on.
Furthermore, agents trained on a rich data set containing scenarios with different volatility levels show hedging performance indistinguishable from that of agents trained on only one volatility level, i.e., the same level as in the testing set. This means that using a versatile training set, which covers many volatility regimes, makes the agents robust in the real application environment.
Finally, what about the performance of the agents in the real market environment (i.e., when hedging real traded options), after they were trained on scenarios generated from a multitude of price models? The hedging costs obtained by these agents were 30% lower than those of the Black-Scholes delta-hedging strategy and 10% lower than those of the Wilmott hedging strategy, and had lower variances. So the knowledge transfer was successful, but it could be improved even further. For example, we noticed a few situations in which the agents made less than optimal hedging choices. This could be the consequence of them encountering situations never seen before, which can be dealt with by further extending the training environment. Also, we tested our agents on a relatively stable time period: 2019. It would be interesting to see how they would cope with a high-volatility regime such as the one we saw in 2020. This is currently being investigated; but, in any case, we saw that reinforcement learning is a robust and flexible tool that can be used in real-world hedging.
Due to the scarcity and high cost of historical market data, and the computational resources required to train the algorithms, transfer learning is of fundamental importance in this quantitative finance application. The ideal goal, to train an algorithm on synthetic data for one specific derivative and then use it for similar but different derivatives in a real hedging environment, definitely seems within reach.
Cao, J., Chen, J., Hull, J.C. and Poulos, Z. (2019). Deep Hedging of Derivatives Using Reinforcement Learning. Available at SSRN: https://ssrn.com/abstract=3514586
Giurca, A. and Borovkova, S. (2021). Delta Hedging of Derivatives Using Deep Reinforcement Learning. Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3847272
Kolm, P.N. and Ritter, G. (2019). Dynamic Replication and Hedging: A Reinforcement Learning Approach. The Journal of Financial Data Science, Winter 2019, 1 (1), pp. 159-171.