By Thomas Dierckx, Wim Schoutens and Jesse Davis
In this study, we show how both machine learning and alternative data can be successfully leveraged to improve and develop trading strategies. Starting from a trading strategy that harvests the EUR/USD volatility risk premium by selling one-week straddles every weekday, we present a machine learning approach to more skillfully time new trades and thus prevent unfavorable ones. To this end, we build probability-calibrated Random Forests on various predictors, extracted from both traditional market data and financial news, to predict the closing Sharpe ratio of short one-week delta-hedged straddles. We then demonstrate how the output of these calibrated machine learning models can be used to engineer intuitive new trading strategies. Ultimately, we show that our proposed strategies outperform the original strategy on risk-based performance measures. Moreover, the features that we derived from financial news articles significantly improve the performance of the approach.
The price of an option contract is determined by, among other things, the expected risk, or volatility, of the underlying asset over the life of the contract. Future volatility is extremely difficult to predict. In fact, it is well known that the market tends to overestimate future volatility when trading option contracts.[1] In other words, the volatility implied by option prices, known as implied volatility, often exceeds the volatility that the underlying subsequently realizes, referred to here as historical volatility. The difference between implied and historical volatility is known as the volatility risk premium, and it is a popular target for many trading strategies. Indeed, market participants attempt to isolate and trade this premium through a range of complex derivative strategies.
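As a rough illustration of the quantity involved, the sketch below measures the premium after the fact as the implied volatility quoted at inception minus the annualized standard deviation of daily log returns over the option's life. The function names, the 252-trading-day annualization, the use of close-to-close returns and the example prices are assumptions for illustration, not the definition used in this study.

```python
import math
import statistics

def realized_vol(prices, periods_per_year=252):
    """Annualized close-to-close volatility of a price series (needs at least 3 prices).

    Assumption: sample standard deviation of daily log returns, scaled by sqrt(252).
    """
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return statistics.stdev(log_returns) * math.sqrt(periods_per_year)

def volatility_risk_premium(implied_vol, prices, periods_per_year=252):
    """Implied volatility at trade inception minus volatility realized afterwards."""
    return implied_vol - realized_vol(prices, periods_per_year)

# Hypothetical EUR/USD closes over one week and a hypothetical 8% implied volatility.
closes = [1.1000, 1.1010, 1.0995, 1.1005, 1.1002, 1.0998]
print(volatility_risk_premium(0.08, closes))  # positive: implied exceeded realized
```

A positive value means the option seller was, in hindsight, paid for more volatility than actually materialized.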
Most existing studies that investigate trading the volatility premium are situated in stock markets and report underwhelming results (e.g. [2], [3], [4]). The size of the premium fluctuates over time, making it hard to trade profitably. However, recent work by Société Générale suggests the existence of a steady volatility premium on the EUR/USD currency pair.[5] They propose a trading strategy in which a new delta-hedged at-the-money straddle with seven days to maturity is systematically sold every day, and show that their approach was profitable throughout the last decade. Nonetheless, their strategy periodically suffers from disappointing results: on average, one out of three trades ends up incurring a loss. Our goal is to improve their approach by reducing the number of loss-making trades. Specifically, we investigate whether machine-learned models trained on both market and alternative data can identify the days on which the strategy is likely to make money, and hence should be employed.
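To make the mechanics of a short delta-hedged straddle concrete, the following is a minimal sketch of the profit and loss of one such trade held to expiry. It assumes Black-Scholes pricing at the quoted implied volatility with zero domestic and foreign interest rates, one unit of notional, one hedge rebalance per day at the implied-volatility delta, calendar-day time (dt = 1/365) and no transaction costs. The function names are hypothetical and this is not the exact P&L accounting used by Société Générale or in this study.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def straddle_price_and_delta(spot, strike, vol, t):
    """Black-Scholes straddle value and delta, assuming zero interest rates and t > 0."""
    if t <= 0:
        raise ValueError("time to expiry must be positive")
    sd = vol * math.sqrt(t)
    d1 = (math.log(spot / strike) + 0.5 * sd * sd) / sd
    d2 = d1 - sd
    call = spot * norm_cdf(d1) - strike * norm_cdf(d2)
    put = strike * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call + put, 2.0 * norm_cdf(d1) - 1.0  # straddle delta = call delta + put delta

def short_straddle_pnl(path, implied_vol, dt=1 / 365):
    """P&L of selling an ATM straddle at path[0], delta-hedging daily, settling at path[-1]."""
    n = len(path) - 1  # number of daily steps to expiry
    strike = path[0]   # at-the-money at inception
    premium, _ = straddle_price_and_delta(path[0], strike, implied_vol, n * dt)
    hedge_pnl = 0.0
    for i in range(n):
        _, delta = straddle_price_and_delta(path[i], strike, implied_vol, (n - i) * dt)
        # The short straddle carries delta -delta, so the hedge holds +delta units.
        hedge_pnl += delta * (path[i + 1] - path[i])
    payoff = abs(path[-1] - strike)  # paid to the option buyer at expiry
    return premium - payoff + hedge_pnl

# Hypothetical one-week paths (8 daily fixings) at 8% implied volatility.
print(short_straddle_pnl([1.10] * 8, 0.08))           # calm week: keeps the premium
print(short_straddle_pnl([1.10] * 7 + [1.20], 0.08))  # jump at expiry: loss
```

A flat path keeps the full premium, while a large move close to expiry that the hedge cannot follow turns the trade into a loss; these are the kind of trades the approach in this study tries to avoid.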
The combination of machine learning and alternative data is a promising approach within computational finance. In recent years, the field of finance has seen an explosion of interest in more exotic sources of information to serve alongside traditional market data.
Academic literature suggests that machine learning can be used to extract valuable insights from sources such as social media (e.g. [6], [7]), news (e.g. [8], [9]), and earnings reports (e.g. [10], [11]) for a variety of applications. A key distinguishing characteristic of these alternative data sources is that they are typically textual in nature. This contrasts with traditional market data, which is numerical and readily used with modern statistical methods. The ability to extract and quantify the information residing in text is therefore an essential problem to solve.
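One very simple way to turn text into numbers is to aggregate articles per day and count keyword hits. The sketch below does this with a small, hypothetical uncertainty lexicon; it is a stand-in that shows the shape of the problem (unstructured text in, per-day numeric features out), not the news features proposed in this study.

```python
import re
from collections import defaultdict

# Hypothetical lexicon chosen for illustration only.
UNCERTAINTY_TERMS = {"uncertain", "uncertainty", "risk", "volatile", "volatility", "crisis"}

def daily_news_features(articles):
    """Map (date, text) pairs to per-day features: article count and share of uncertain articles."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    for date, text in articles:
        tokens = re.findall(r"[a-z]+", text.lower())  # crude lowercase word tokenizer
        counts[date] += 1
        if any(tok in UNCERTAINTY_TERMS for tok in tokens):
            hits[date] += 1
    return {
        d: {"n_articles": counts[d], "uncertainty_share": hits[d] / counts[d]}
        for d in counts
    }

# Hypothetical headlines.
articles = [
    ("2020-03-09", "ECB faces uncertainty as euro slides"),
    ("2020-03-09", "Dollar steady ahead of Fed"),
    ("2020-03-10", "Markets volatile after oil shock"),
]
features = daily_news_features(articles)
print(features)
```

Features of this kind can then be joined by date with market-based predictors before model training.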
The contribution of this study is two-fold. First, we demonstrate that Random Forests trained on historical market conditions can predict the closing Sharpe ratio of short one-week delta-hedged straddles on EUR/USD. In addition, we propose a number of features that can be derived from financial news and show that using them results in improved performance compared to solely using market-based features.
Second, we show how predictions from probability-calibrated Random Forests can be used in developing new and improved trading strategies. Empirically, our strategies outperform the original one out-of-sample based on risk-based performance measures.
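The idea of using calibrated probabilities to decide when to trade can be sketched as a simple gate: a new trade is opened only when the predicted probability of a favorable outcome is at least some threshold, and the resulting trade returns are compared by Sharpe ratio. The threshold of 0.5, the zero risk-free rate, the square-root-of-252 annualization of per-trade returns, the helper names and the example numbers are illustrative assumptions; the probabilities are assumed to come from an already calibrated classifier.

```python
import math
import statistics

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-trade returns, assuming a zero risk-free rate."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)

def gated_trades(probabilities, trade_returns, threshold=0.5):
    """Keep only trades whose predicted probability of a favorable outcome clears the threshold."""
    return [r for p, r in zip(probabilities, trade_returns) if p >= threshold]

# Hypothetical calibrated probabilities and realized per-trade returns.
probs = [0.9, 0.2, 0.8, 0.3, 0.7, 0.6]
rets = [0.02, -0.05, 0.01, -0.03, 0.015, 0.01]
kept = gated_trades(probs, rets)
print(kept)  # [0.02, 0.01, 0.015, 0.01]
print(sharpe_ratio(rets), sharpe_ratio(kept))
```

Because the probabilities are calibrated, the threshold can be read directly as a minimum required chance of success, which keeps the resulting trading rule intuitive.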
The remainder of this paper is structured as follows. Section 2 details the necessary background on the methods used in our study, Section 3 describes our data acquisition and preparation steps, Section 4 outlines the methodology used to study our research objectives, Section 5 presents and discusses the results of our experiments, and Section 6 concludes.
[1] P. Carr, L. Wu, Variance risk premiums, Review of Financial Studies, 22 (2009), pp. 1311-1341, 10.1093/rfs/hhn038
[2] J.P. Dapena, J.R. Siri, Index Options Realized Returns Distributions from Passive Investment Strategies, ERN: Asset Pricing Models (Topic) (2015), 10.2139/ssrn.2733774
[3] O. Bondarenko, An analysis of index option writing with monthly and weekly rollover, SSRN Electronic Journal (2016)
[4] D. Schulte, M. Stamos, The performance of equity index option strategies during the financial crisis, SSRN Electronic Journal (2015)
[5] O. Daviaud, O. Korber, A. Mukhopadhyay, S. Ungari, Systematic Trading in Options, Société Générale, Cross Asset Research (2020)
[6] M. Checkley, D.A. Higón, H. Alles, The hasty wisdom of the mob: how market sentiment predicts stock market behavior, Expert Systems with Applications, 77 (2017), pp. 256-263, 10.1016/j.eswa.2017.01.029
[7] N. Oliveira, P. Cortez, N. Areal, The impact of microblogging data for stock market prediction: using twitter to predict returns, volatility, trading volume and survey sentiment indices, Expert Systems with Applications, 73 (2017), pp. 125-144, 10.1016/j.eswa.2016.12.036
[8] C. Curme, Y.D. Zhuo, H.S. Moat, T. Preis, Quantifying the diversity of news around stock market moves, Journal of Network Theory in Finance, 3 (2015), pp. 1-20, 10.21314/JNTF.2017.027
[9] S. Feuerriegel, A. Ratku, D. Neumann, Analysis of how underlying topics in financial news affect stock prices using latent dirichlet allocation, 2016 49th Hawaii International Conference on System Sciences (HICSS) (2016), pp. 1072-1081, 10.1109/HICSS.2016.137
[10] S. Feuerriegel, N. Pröllochs, Investor reaction to financial disclosures across topics: an application of latent dirichlet allocation, Decision Sciences, 52 (2018), 10.1111/deci.12346
[11] K. Theil, S. Stajner, H. Stuckenschmidt, Word embeddings-based uncertainty detection in financial disclosures, ECONLP@ACL (2018), 10.18653/v1/W18-3104