Forecasting inflation and speeding up Python

By Saeed Amen, Cofounder, Turnleaf Analytics

Broadly speaking, we can think of several main drivers of inflation. The first is the balance of supply and demand for goods and services in the economy. For example, if demand for goods and services increases whilst supply is constrained, this will spur inflation. If demand for labour increases, this will push up wages and hence inflation. If energy costs increase, this will push up the costs of goods and services, as we have seen during the Russian invasion of Ukraine. We also have the monetary channel that drives inflation, i.e., the supply and demand for money. If central banks hike rates, this will decrease the supply of credit within the economy and hence help to push inflation lower.

Inflation has always been a critical economic variable. In recent decades, many central banks have adopted inflation targets as part of their mandates. Whilst inflation has always been a major issue in emerging markets, given their relatively frequent bouts of hyperinflation, in recent months it has also become a critical issue in developed markets, where we have seen double-digit inflation for the first time in decades.

How can we forecast inflation? There are many different approaches. One way is to use a relatively small number of variables that model some of the factors listed above, such as money supply and unemployment, as our x variables to forecast our y variable, inflation, using a linear regression model such as ordinary least squares (OLS). However, many of the interactions between these variables may be nonlinear, and also unstable, changing over time. Inflation has also gone through different regimes over recent decades. Also, many of the variables we need to use are released with a lag. How can we solve these various problems?
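
As a minimal sketch of the OLS baseline described above, the snippet below regresses a synthetic inflation series on two synthetic drivers using NumPy's least-squares solver. The data and the variable names (`money_supply_growth`, `unemployment`) are hypothetical placeholders, not Turnleaf's dataset or model.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; return [b0, b1, ..., bk]."""
    # Prepend a column of ones so the first coefficient is the intercept
    X_design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return coefs

def predict_ols(coefs, X):
    """Apply fitted OLS coefficients to new rows of X."""
    return coefs[0] + X @ coefs[1:]

# Hypothetical synthetic drivers: money supply growth (%) and unemployment (%)
rng = np.random.default_rng(0)
money_supply_growth = rng.normal(5.0, 2.0, size=120)
unemployment = rng.normal(6.0, 1.0, size=120)
X = np.column_stack([money_supply_growth, unemployment])

# Synthetic "inflation" generated from a known linear rule plus noise
inflation = 1.0 + 0.4 * money_supply_growth - 0.3 * unemployment + rng.normal(0, 0.1, size=120)

coefs = fit_ols(X, inflation)
print(coefs)  # should be close to [1.0, 0.4, -0.3]
```

Because the synthetic data are generated by a fixed linear rule, OLS recovers it well here; the difficulty with real inflation data is precisely that the relationships may be nonlinear and shift across regimes.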

At QuantMinds, I’ll be presenting how we forecast inflation at Turnleaf Analytics. Our key objective has been to forecast inflation over longer horizons (e.g., from several weeks out up to a year), as opposed to nowcasting inflation, where you are effectively trying to forecast the next official inflation number in the very short term, such as over the next few days.

The use cases for forecasting inflation are numerous. Clearly, central banks want to forecast inflation. Traders of instruments such as interest rate swaps and inflation swaps can also profit directly from accurate inflation forecasts. For corporates, inflation forecasting can help with planning, whether for wages or for setting the prices of their goods and services.

As with any data science problem, there are two major parts which we need to get right: the data, and the model which consumes that data to come up with forecasts.

For the model, we wanted to strike a balance between the simplicity of a model and the accuracy of its forecasts. At the same time, we wanted to capture some of the more complex relationships you observe between inflation and its drivers. We looked at various machine learning models. Of course, the first step is usually to use the simplest model, i.e., OLS! What about models which have regularisation? We also looked at various extensions such as ridge and lasso regressions, as well as elastic nets.
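
To make the regularisation idea concrete, here is a minimal NumPy sketch of ridge regression using its closed-form solution beta = (X'X + lambda*I)^-1 X'y on centred data, so that the intercept is left unpenalised. The synthetic data (five candidate drivers, only two of which matter) are hypothetical. Lasso and elastic nets add an L1 penalty, have no closed form and are usually fitted iteratively (for example by coordinate descent, as in scikit-learn's `Lasso` and `ElasticNet`), so they are not shown here.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Ridge regression via the closed form (X'X + lam*I)^-1 X'y.

    X and y are centred first so that the intercept is not penalised.
    Returns (intercept, slope_coefficients).
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    k = X.shape[1]
    # Solve the penalised normal equations rather than inverting explicitly
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Hypothetical synthetic data: 5 candidate drivers, only the first 2 matter
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = 2.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, size=60)

for lam in (0.0, 10.0, 100.0):
    intercept, beta = fit_ridge(X, y, lam)
    # Larger lam shrinks the coefficient vector towards zero
    print(lam, round(float(np.linalg.norm(beta)), 3))
```

With lam = 0 this reduces to OLS; as lam grows, the coefficients shrink towards zero, trading a little bias for lower variance.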

We found that these models could be helpful for extrapolation when forecasting inflation, in situations where inflation was behaving in more unusual ways (i.e., ways we had not observed in our training dataset). We also examined models such as random forests. These seemed to be good at interpolating within our dataset, in situations which more closely resembled our training set. In practice, an ensemble of different models seemed to work best, so we could address both interpolation of scenarios observed before and extrapolation of those we hadn’t observed.
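
The ensemble idea can be sketched as a weighted average of the point forecasts from individual models. The combiner below is a hypothetical, minimal illustration in plain Python; the model names, weights and forecast values are placeholders, and equal weighting is only an assumed default, since the article does not say how the models are combined.

```python
def ensemble_forecast(forecasts, weights=None):
    """Combine point forecasts from several models into one.

    forecasts: dict mapping model name -> forecast value.
    weights:   optional dict mapping model name -> non-negative weight;
               defaults to an equal-weighted average (an assumption).
    """
    if not forecasts:
        raise ValueError("need at least one forecast")
    if weights is None:
        weights = {name: 1.0 for name in forecasts}
    total = sum(weights[name] for name in forecasts)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * value for name, value in forecasts.items()) / total

# Illustrative year-on-year inflation forecasts (%) from hypothetical models
forecasts = {"elastic_net": 4.2, "random_forest": 3.6, "ols": 4.5}
print(ensemble_forecast(forecasts))
print(ensemble_forecast(forecasts, {"elastic_net": 0.5, "random_forest": 0.3, "ols": 0.2}))
```

Mixing a regularised linear model (better at extrapolating) with a tree-based model (better at interpolating) is what lets the combined forecast lean on whichever is more reliable in a given scenario.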

So, what about the data we were feeding the model? As well as the macroeconomic and market variables which have traditionally been important parts of inflation forecasting, we also use alternative data. Alexander Denev and I wrote The Book of Alternative Data before we cofounded Turnleaf Analytics, and it’s been an area of interest of ours for many years. Take industrial production, one of the many variables used within inflation forecasting: it is usually available only with a lag, which for some countries might be up to 2-3 months. Using alternative data, we can proxy this variable with a dataset that is available with very little lag (if any). In our case, we use pollution data as a proxy for industrial production, since a rise in pollution is indicative of more industrial activity. We also used many other alternative datasets that could give us a high-frequency view of the economy, such as mobility data, restaurant reservations and so on.
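
A minimal sketch of the proxy idea, assuming a simple linear relationship: fit industrial production on pollution over the months where both are available, then use the latest pollution readings to estimate the months for which industrial production has not yet been released. All figures below are made up, and the single-variable linear fit is an assumption for clarity, not the method used at Turnleaf.

```python
def fit_line(x, y):
    """One-variable least squares: return (intercept, slope) for y ~ a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def fill_with_proxy(proxy, target):
    """Fill missing (None) values of a lagged target series using a proxy.

    proxy:  list of proxy readings (e.g. a pollution index), fully up to date.
    target: list of the same length, with None where the official figure
            has not yet been published.
    Assumes a linear proxy -> target relationship (illustrative only).
    """
    known = [(p, t) for p, t in zip(proxy, target) if t is not None]
    a, b = fit_line([p for p, _ in known], [t for _, t in known])
    return [t if t is not None else a + b * p for p, t in zip(proxy, target)]

# Hypothetical monthly data: the pollution index is available immediately,
# whereas the industrial production (IP) index is published with a 2-month lag
pollution = [100, 104, 98, 110, 115, 108, 120, 125]
ip_index = [50.0, 52.0, 49.0, 55.0, 57.5, 54.0, None, None]
print(fill_with_proxy(pollution, ip_index))
```

The filled-in values can then be fed to the forecasting model in place of the not-yet-released official figures.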

More broadly, finding datasets and pre-processing the data for forecasting inflation is very time-consuming. Some steps can be automated; other parts require significant human intervention. A large amount of domain knowledge, and an understanding of how various economic factors can impact inflation, are also very important. If we do not apply any domain knowledge, the process of finding data can result in adding datasets which have little or no rationale for their use, increasing the chance of including variables which have spurious relationships with inflation. Furthermore, without domain knowledge to guide us towards which data to use, our problem can become intractable.

Another key part of the presentation will be explaining the technology stack we use to forecast inflation, and in particular tips and tricks to speed up Python, with a couple of examples drawn from data science problems. We use Python extensively to fit our models. Fitting hyperparameters and doing sensitivity analysis can use a lot of compute power. If our code is very slow and entirely single-threaded, we will have to wait a very long time for our results!

As is fairly widely known, Python isn’t the fastest language. I’ll be showing how Numba can allow us to speed up loops and mathematical functions in Python, with specific examples of the normal CDF and moving averages. I’ll also be exploring libraries for working with very large datasets, such as Vaex and Dask.
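
As a rough sketch of the kind of code Numba accelerates, the functions below compute the standard normal CDF via the error function and a trailing moving average with an explicit loop. They use Numba's `njit` decorator when Numba is installed and fall back to plain Python otherwise, so the snippet runs either way. These are illustrative examples written for this post, not the code from the presentation.

```python
import math
import numpy as np

try:
    from numba import njit  # JIT-compile to machine code if Numba is installed
except ImportError:
    def njit(func):  # fallback: leave the function as plain Python
        return func

@njit
def norm_cdf(x):
    """Standard normal CDF, Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

@njit
def norm_cdf_array(xs):
    """Apply norm_cdf element-wise with an explicit loop (fast once compiled)."""
    out = np.empty(xs.shape[0])
    for i in range(xs.shape[0]):
        out[i] = norm_cdf(xs[i])
    return out

@njit
def moving_average(values, window):
    """Trailing simple moving average; the first window-1 entries are NaN."""
    n = values.shape[0]
    out = np.full(n, np.nan)
    running_sum = 0.0
    for i in range(n):
        running_sum += values[i]
        if i >= window:
            running_sum -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out[i] = running_sum / window
    return out

# Illustrative usage on synthetic inputs
xs = np.linspace(-3.0, 3.0, 1_000_000)
print(norm_cdf_array(xs)[:3])
print(moving_average(np.arange(10.0), 3))
```

With Numba, the first call to each function includes compilation time; later calls reuse the compiled machine code, which is where loop-heavy code like this benefits most. The running sum keeps the moving average at O(n) rather than re-summing each window.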

Saeed Amen
Cofounder, Turnleaf Analytics

Don't miss Saeed's presentation at QuantMinds International on Monday 7 November.
