7 Reasons Most Machine Learning Funds Fail: Summary

I just finished working on a stock market prediction model for a fintech company, and before I could kick back and relax, I started reading Advances in Financial Machine Learning by Marcos Lopez de Prado. As I read, I realized there were a lot of things I could have done to make my project better. Since I wanted to learn more about his thought process before working through the full textbook, I listened to one of his talks and wrote some notes to guide my reading of the book. There are a lot of lessons here that can help in creating machine learning algorithms for financial markets.

The first thing to realize is that finance faces several problems that make it difficult to simply implement econometric formulas: relationships are extremely complex, non-linear, often non-numeric and unstructured, and correlations are heavily influenced by outliers. Because of these factors, there are many mistakes that can be made if one is not careful.

1 Sisyphean Quants

Sisyphus is punished with rolling a stone up a hill every day, but every night the stone rolls back to the bottom. Most quants in finance feel they are in a similar situation. Unlike the scientific world, where multiple scientists work together on a problem, most firms have several quants who are each tasked with creating a strategy on their own. The better solution would be to let quants work together, specialize their expertise, and share information about the best ways to generate alpha. There should be a data scientist for each specific part of the process instead of several data scientists who are generalists. Even the Manhattan Project used high school students to calculate the yield of the atomic bomb. Finance should be able to separate roles so that people can work in parallel on different parts of the project.

[Image: Sisyphus rolling a stone up a hill]

The current approach leads to quants either creating models that look great in backtests (overfit) or falling back on standard, tried-and-true models based on accepted principles that will not yield any results on future data.

Investing is very complex and can be split up in many ways:

  • Data collection, curation, processing, structuring
  • HPC infrastructure
  • Software development
  • Feature analysis
  • Execution simulators
  • Backtesting

The first problem is with the industry itself and how most companies look at quantitative finance. Instead of creating a team of data scientists to examine parts of the market and building a lab to solve problems, quants are treated like fundamental portfolio managers and left to fend for themselves. As we have seen, there are many tasks involved in creating a successful trading strategy; the best way forward is to specialize roles and have a team work together to produce better results.

2 Integer Differentiation

Inferential analysis involves working with invariant processes (changes in returns, yield changes, and volatility changes). When we make a prediction on new data, we forget what we've learned in the past. Stationary models need memory to assess how far the price has drifted away from its long-term expected value. Returns are stationary (they revert to a long-term average) but memory-less. Prices have memory but are non-stationary.

Computing returns (differencing the data) makes the series stationary, but at the expense of removing the memory of the original series. Since memory is the basis for predictive power, we need to find the best equilibrium between a stationary series and preserving as much memory as possible.

The Augmented Dickey-Fuller (ADF) test checks whether a time series is stationary (its mean and variance are constant over time). Running this experiment across many liquid trading instruments, we find that a fractional differentiation of order d < 0.5 is enough to make most financial series stationary.

Our goal is to generalize the notion of returns and work with stationary series where not all memory has been erased.
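
To make the idea concrete, here is a minimal sketch (not de Prado's exact implementation) of fractional differentiation with a fixed-width window, plus a search for the smallest d that passes the ADF test. It assumes `close` is a reasonably long pandas Series of (log) prices; the helper names, grid, and truncation threshold are my own illustrative choices.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def frac_diff_weights(d, threshold=1e-4):
    """Binomial-expansion weights for (1 - B)^d, truncated once they become tiny."""
    w = [1.0]
    k = 1
    while abs(w[-1]) > threshold:
        w.append(-w[-1] * (d - k + 1) / k)   # w_k = -w_{k-1} * (d - k + 1) / k
        k += 1
    return np.array(w[::-1])                 # oldest weight first

def frac_diff(series, d, threshold=1e-4):
    """Fractionally differentiate a series with a fixed-width window of weights."""
    w = frac_diff_weights(d, threshold)
    width = len(w)
    values = series.values
    out = [np.dot(w, values[i - width + 1 : i + 1]) for i in range(width - 1, len(values))]
    return pd.Series(out, index=series.index[width - 1 :])

def min_stationary_d(series, d_grid=np.arange(0.0, 1.05, 0.05), p_value=0.05):
    """Smallest d whose fractionally differentiated series passes the ADF test."""
    for d in d_grid:
        result = adfuller(frac_diff(series, d).dropna(), maxlag=1, regression="c")
        if result[1] < p_value:              # result[1] is the ADF p-value
            return d
    return None
```

In practice you would also track how correlated each differentiated series stays with the original prices, and pick the smallest d that achieves stationarity while keeping that correlation (memory) as high as possible.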

3 Chronological Sampling

We tend to view the market as very efficient: all information is already baked into the price. Instead of looking at the markets on a tick-by-tick basis, where there is no difference between one buy and one sell, it is suggested to look at volume to see how confident people are in their trades. This in turn should give us insight into what is happening in the underlying market. Some alternative ways of sampling bars (a sketch of one follows the list below):

  • Trade Bars
  • Volume Bars
  • Dollar Bars
  • Volatility Bars
  • Order Imbalance Bars
  • Entropy Bars
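
As an illustration, here is a minimal sketch of dollar bars: emit a new bar every time a fixed dollar value has traded, so more bars are produced when more money changes hands. It assumes a tick-level DataFrame with 'price' and 'size' columns; the column names and the threshold are assumptions, not part of the talk.

```python
import pandas as pd

def dollar_bars(ticks, dollar_threshold=1_000_000):
    """One OHLCV bar each time the cumulative dollar value traded crosses the threshold."""
    bars, bucket = [], []
    dollars = 0.0
    for t, row in ticks.iterrows():
        bucket.append(row)
        dollars += row["price"] * row["size"]
        if dollars >= dollar_threshold:
            chunk = pd.DataFrame(bucket)
            bars.append({
                "end_time": t,
                "open": chunk["price"].iloc[0],
                "high": chunk["price"].max(),
                "low": chunk["price"].min(),
                "close": chunk["price"].iloc[-1],
                "volume": chunk["size"].sum(),
            })
            bucket, dollars = [], 0.0
    return pd.DataFrame(bars)
```

The loop over ticks is kept for clarity; a production version would vectorize the cumulative sum.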

4 Wrong Labeling

Fixed time horizon methods are the standard in financial analysis, but they do not exhibit good statistical properties. A good alternative is to vary the threshold based on a rolling weighted standard deviation of returns. One reason not to stick with the traditional approach is that overnight moves only show up in the opening price. We should not treat prices the same way when they have different volume characteristics.
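
A minimal sketch of such a dynamic threshold, implemented here (as an assumption) as an exponentially weighted standard deviation of returns; the span of 100 observations is an arbitrary illustrative choice.

```python
import pandas as pd

def rolling_volatility(close: pd.Series, span: int = 100) -> pd.Series:
    """Exponentially weighted standard deviation of simple returns,
    usable as a per-date threshold for labeling."""
    returns = close.pct_change()
    return returns.ewm(span=span).std()
```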

The Triple Barrier Method is a good way of generating labels that reflect how a model's predictions would actually play out. Instead of looking only at the end result and what the total return would be, we look at the fund's potential behavior along the way as we test the algorithm. The three barriers are: sell at the target price, sell at the stop-loss, or hold until a time limit expires if neither level is reached.
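
Below is a minimal sketch of triple-barrier labeling, not de Prado's exact code: +1 if the profit-taking barrier is hit first, -1 if the stop-loss is hit first, 0 if the time (vertical) barrier expires untouched. It assumes `close` is a pandas Series of prices and `threshold` is a volatility estimate such as the `rolling_volatility` sketch above; the horizon and barrier multipliers are illustrative.

```python
import pandas as pd

def triple_barrier_labels(close, threshold, horizon=10, pt_mult=1.0, sl_mult=1.0):
    """+1: target hit first, -1: stop-loss hit first, 0: time barrier expired."""
    labels = pd.Series(index=close.index, dtype=float)
    for i, t0 in enumerate(close.index[:-1]):
        # Path of returns from t0 over the next `horizon` bars
        path = close.iloc[i : i + horizon + 1] / close.iloc[i] - 1.0
        upper = pt_mult * threshold.iloc[i]        # profit-taking barrier
        lower = -sl_mult * threshold.iloc[i]       # stop-loss barrier
        hit_up = path[path >= upper].index.min()
        hit_dn = path[path <= lower].index.min()
        if pd.isna(hit_up) and pd.isna(hit_dn):
            labels.loc[t0] = 0.0                   # held until the time barrier
        elif pd.isna(hit_dn) or (not pd.isna(hit_up) and hit_up <= hit_dn):
            labels.loc[t0] = 1.0                   # target price reached first
        else:
            labels.loc[t0] = -1.0                  # stop-loss reached first
    return labels
```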

A good way to implement this for a more traditional fundamental firm is to use machine learning to adjust the size of investments based on confidence in the trade. What we mostly face is low precision and high recall: the phenomenon of predicting 20 of the last 8 recessions. We can use statistics to find the threshold that optimizes the harmonic average of precision and recall.
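
A minimal sketch of that idea, assuming `y_true` holds realized binary labels and `y_prob` holds model-predicted probabilities: sweep the probability cutoff and keep the one that maximizes the F1 score (the harmonic mean of precision and recall), then size bets by confidence above that cutoff.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(y_true, y_prob):
    """Probability cutoff that maximizes the harmonic mean of precision and recall."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    return thresholds[np.argmax(f1[:-1])]   # last precision/recall pair has no threshold
```

Bets above the cutoff can then be sized in proportion to the model's confidence, for example `(y_prob - cutoff) / (1 - cutoff)` clipped to [0, 1]; the exact sizing rule is an assumption here.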

5 Weighting of Non-IID Samples

The most common problem faced in financial analysis is that most data is not independent and identically distributed (IID). In science, a test tube drawn from one person is independent of and identically distributed with respect to a tube drawn from another person, whereas in finance, Netflix stock will somehow be related to Home Depot stock. We simply cannot account for all the correlations. Finance does not have clean observations: data can be multicollinear, non-normally distributed, full of pseudo-correlations, or have very high cross-correlations.

Once observations are sampled and weighted with these overlaps in mind, the number of out-of-bag (OOB) samples increases dramatically.
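
A minimal sketch of one way to do this, assuming `t1` is a pandas Series mapping each label's start time (its index) to the time the label is resolved: count how many labels are concurrently alive at each bar and weight each sample by its average uniqueness. The names are illustrative, not de Prado's exact API.

```python
import pandas as pd

def average_uniqueness(t1, index):
    """For each label, the mean of 1 / (number of concurrent labels) over its lifespan."""
    concurrency = pd.Series(0, index=index)
    for start, end in t1.items():
        concurrency.loc[start:end] += 1            # labels alive at each bar
    weights = pd.Series(index=t1.index, dtype=float)
    for start, end in t1.items():
        weights.loc[start] = (1.0 / concurrency.loc[start:end]).mean()
    return weights
```

These weights can be passed as `sample_weight` to a bagged classifier such as sklearn's `RandomForestClassifier(oob_score=True)`, so redundant samples count less and the out-of-bag estimate becomes more meaningful.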

6 Cross-Validation Leakage

  • Purged k-fold CV: remove from the training set any observations whose labels overlap in time with observations in the test set.
  • Embargo: additionally drop from the training set a buffer of observations that immediately follow the test set. Without this, we skew the information by effectively telling the model that the price is eventually supposed to reach a certain level first.
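
A minimal sketch of purged k-fold splits with an embargo, again assuming `t1` is a pandas Series mapping each observation's start time to the time its label is determined; the embargo fraction is an arbitrary illustrative choice and this is not de Prado's exact code.

```python
import numpy as np
import pandas as pd

def purged_kfold_splits(t1, n_splits=5, embargo_pct=0.01):
    """Yield (train_indices, test_indices) with purging and an embargo."""
    indices = np.arange(len(t1))
    embargo = int(len(t1) * embargo_pct)
    for fold in np.array_split(indices, n_splits):
        test_start, test_end = t1.index[fold[0]], t1.values[fold[-1]]
        train_mask = np.ones(len(t1), dtype=bool)
        train_mask[fold] = False
        # Purge: drop training observations whose label interval overlaps the test window.
        overlaps = (t1.index <= test_end) & (t1.values >= test_start)
        train_mask &= ~overlaps
        # Embargo: also drop a buffer of observations right after the test window.
        train_mask[fold[-1] + 1 : fold[-1] + 1 + embargo] = False
        yield indices[train_mask], fold
```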

7 Backtest Overfitting

  • Computing investment strategies
  • When conducting 1,000 experiments on a series that is unpredictable, the expected maximum Sharpe ratio is about 3; keep adding trials and the expected maximum Sharpe ratio rises to 5.
  • Conducting multiple tests without adjusting for the phenomenon that false positives appear more frequently as a result of more testing.
  • Most discoveries in finance are false.
  • Strategy development: we will always find profitable strategies where there are none.
  • The “False Strategy” theorem: two variables that are never included (the number of trials and the variance of outcomes across trials).
  • We should expect a Sharpe ratio of 5 if we keep increasing the number of trials.
  • Deflated Sharpe Ratio: adjust the Sharpe ratio (mean of returns over the standard deviation of returns) for the number of trials and the variance of outcomes.
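
A minimal Monte Carlo check of the expected-maximum-Sharpe point, assuming each trial is one year of Gaussian daily returns with a true Sharpe ratio of zero; all parameters are illustrative.

```python
import numpy as np

def expected_max_sharpe(n_trials=1000, n_days=252, n_sims=100, seed=0):
    """Average, over simulations, of the best annualized Sharpe ratio among n_trials noise strategies."""
    rng = np.random.default_rng(seed)
    maxima = []
    for _ in range(n_sims):
        returns = rng.normal(0.0, 0.01, size=(n_trials, n_days))   # pure-noise strategies
        sharpe = returns.mean(axis=1) / returns.std(axis=1) * np.sqrt(252)
        maxima.append(sharpe.max())
    return np.mean(maxima)
```

With 1,000 unpredictable trials this comes out a little above 3, and it keeps climbing as the number of trials grows, which is why the number of trials and the spread of outcomes must be reported and the Sharpe ratio deflated accordingly.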

Q and A

  • The rate at which papers are withdrawn or rescinded (recalled) in finance is 0, whereas it is around 20% in health.
  • Observations at t and t+1 are very similar; if t is in the training set and t+1 is in the test set, we have to purge t because it is so similar to t+1.
  • It depends on the quality of the data; data that is hard to process is the best data.
  • Feature importance is a big part of creating variables or strategies.
  • Once we are in backtesting, the research is done. We’ll have to find what features are important and then create a strategy based on those features.
  • You can extract a lot of information that other people already have. We are looking for fractional working.