Mutual Fund Recommendation System

  • Background
  • Reason for creating the algorithm
  • Future projects

My background is in financial services, and there are many places in the industry where machine learning could be applied. I was surprised that so little has been done with recommendation systems for mutual funds. It seemed like such a simple step for a broker-dealer to implement. However, the fact that broker-dealers get sponsorship dollars from mutual fund companies would make implementing this much less appealing to them.

There is a need for wholesalers of investment products. Some asset classes are too difficult for one advisor to understand, and a good wholesaler can help an advisor navigate a complicated space or investment. However, a lot of cost is built into the price of a mutual fund because of this overhead of highly paid external wholesalers, internal wholesalers, and their marketing teams. This cost of distribution does not even touch the actual management of the product by the portfolio managers, service professionals, and technology specialists.

The average mutual fund expense ratio is between 0.5% and 1%. This has come down over the last decade as passively managed mutual funds and ETFs have started garnering attention. Mutual funds such as those from Vanguard have outperformed their actively managed peers in the last bull market while priced at a fraction of the cost. The largest mutual fund by assets is the Vanguard 500 Index Fund (ticker: VFIAX), which has a net expense ratio of 0.04%. If VFIAX has the same performance before fees as an actively managed mutual fund with an expense ratio of 1%, VFIAX will have a net outperformance of 96 bp for every year that the funds tie. If the actively managed fund also underperforms the benchmark, it is easy to see why investors have gone to the cheaper options.
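The fee arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 7% gross return, the $10,000 starting balance, and the 10-year horizon are assumptions chosen for the example, not figures from any fund, and net return is approximated as gross return minus the expense ratio.

```python
def fee_gap_bp(active_expense_pct, index_expense_pct):
    """Difference between two expense ratios, in basis points (1% = 100 bp)."""
    return round((active_expense_pct - index_expense_pct) * 100, 6)


def ending_value(start, gross_return, expense_ratio, years):
    """Grow `start` for `years`, approximating net return as gross minus fees."""
    net_return = gross_return - expense_ratio
    return start * (1 + net_return) ** years


# Expense ratios from the text: 1% for the active fund, 0.04% for VFIAX.
gap = fee_gap_bp(1.00, 0.04)

# Hypothetical scenario: both funds earn the same 7% before fees for 10 years.
index_value = ending_value(10_000, 0.07, 0.0004, 10)
active_value = ending_value(10_000, 0.07, 0.01, 10)

print(f"Annual fee gap: {gap:.0f} bp")
print(f"Index fund after 10 years:  ${index_value:,.2f}")
print(f"Active fund after 10 years: ${active_value:,.2f}")
```

Because the gap compounds, the difference in ending balances grows with every year that the two funds tie before fees.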

My idea for a mutual fund recommendation system stems from the need for advisors to have an easy choice when it comes to finding appropriate investments. One of the most frustrating things an advisor can come across is the sudden loss of a favorite fund, either because the fund closes or because it is not available on their broker-dealer's platform.

The first and simplest way to use my algorithm is to find the funds that are closest in terms of similarity. While we can use the cheapest option as a further filter, it makes sense to compare funds first in terms of performance, yield, and holdings. Once we have a list of potential replacements, it is easier to choose based on the criteria that the clients are looking for.
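A minimal sketch of that lookup is below. It assumes each fund has already been turned into a vector of scaled features and uses Euclidean distance as the similarity measure; the distance metric, the tickers, and the feature values are all hypothetical and chosen only for illustration.

```python
import math


def nearest_funds(target, funds, k=3):
    """Return the k tickers whose feature vectors are closest to `target`."""
    target_vec = funds[target]
    distances = []
    for ticker, vec in funds.items():
        if ticker == target:
            continue  # a fund is not a replacement for itself
        distances.append((math.dist(target_vec, vec), ticker))
    distances.sort()  # smallest distance first
    return [ticker for _, ticker in distances[:k]]


# Hypothetical scaled features: [performance, yield, holdings overlap score]
funds = {
    "AAA": [0.50, 0.50, 0.50],
    "BBB": [0.52, 0.48, 0.50],
    "CCC": [0.90, 0.10, 0.20],
    "DDD": [0.45, 0.55, 0.60],
}

print(nearest_funds("AAA", funds, k=2))
```

The returned shortlist can then be filtered further, for example by expense ratio, before choosing a replacement.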

Fund Inception of 1991.

The second way to use the recommendation system is as a check on whether rating agencies have the fund in the correct category. Even fact sheets, official literature that can be used to solicit business, can show great performance that was earned under a completely different strategy. The fund in this example reports performance against the 3yr municipal bond index since 1991. Before 2015 the fund was a taxable short-term bond fund, and when it changed names and strategy in 2015 it became a top performer in the short-term bond space because it kept the returns of the previous strategy. When a fund is run through the recommendation system, the system shows the funds it should actually be compared to. The algorithm looks not only at similarities in holdings and yield but also at performance and risk metrics to group funds together.
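One way to turn that check into code is a nearest-neighbor vote: find the funds most similar to the target and see which category most of them carry. The sketch below assumes scaled feature vectors and Euclidean distance; the tickers, features, and category labels are made up for the example.

```python
import math
from collections import Counter


def implied_category(target, features, categories, k=3):
    """Most common category among the k funds nearest to `target`."""
    ranked = sorted(
        (math.dist(features[target], vec), ticker)
        for ticker, vec in features.items()
        if ticker != target
    )
    votes = Counter(categories[ticker] for _, ticker in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical scaled features: [yield, risk metric]
features = {
    "X": [0.20, 0.80],
    "M1": [0.22, 0.79],
    "M2": [0.18, 0.82],
    "M3": [0.25, 0.75],
    "T1": [0.80, 0.20],
    "T2": [0.85, 0.15],
}
# Categories as assigned by a (hypothetical) rating agency.
categories = {
    "X": "short-term bond",
    "M1": "municipal bond",
    "M2": "municipal bond",
    "M3": "municipal bond",
    "T1": "short-term bond",
    "T2": "short-term bond",
}

suggested = implied_category("X", features, categories, k=3)
if suggested != categories["X"]:
    print(f"X is listed as {categories['X']} but behaves like {suggested}")
```

A mismatch does not prove the category is wrong, but it flags the fund for a closer look.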

My Approach

The most difficult part of the project was deciding how to set up the data. I scraped all of it from a popular aggregator of investment data.

I scraped 14,000 mutual funds and came back with around 200 categories. Making sense of the data was easy, but cleaning it was more time-consuming because different asset classes carry different information. For example, bond funds have interest rate risk and bond maturity exposures, while those columns do not exist for equity funds.
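One possible way to reconcile those mismatched columns is to put every fund onto the union of all columns and fill the gaps with a placeholder. The sketch below does that with plain dictionaries. The column names are hypothetical, and filling missing values with 0.0 is an assumption made for illustration, not necessarily the choice used in this project.

```python
def align_columns(records, fill_value=0.0):
    """Give every record the same columns, filling gaps with `fill_value`."""
    # Collect every column name seen in any record, in first-seen order.
    columns = []
    for record in records:
        for name in record:
            if name not in columns:
                columns.append(name)
    aligned = [
        {name: record.get(name, fill_value) for name in columns}
        for record in records
    ]
    return columns, aligned


# Hypothetical scraped rows: a bond fund and an equity fund.
records = [
    {"expense_ratio": 0.45, "duration_years": 2.7, "avg_maturity_years": 3.1},
    {"expense_ratio": 0.04, "pe_ratio": 21.5},
]

columns, aligned = align_columns(records)
for row in aligned:
    print(row)
```

Whatever fill value is chosen affects the similarity scores later on, so it is worth treating as a modeling decision rather than a formatting detail.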

Once everything was in the right place, I had to scale the dataframe. Since we are looking for similarities, I don't want the algorithm to mistake columns with large numbers (AUM) for more important features than columns with small numbers (expense ratio). I looked at different scaling methods and eventually settled on the default that everyone uses: the Min/Max Scaler. It rescales each column to the range 0 to 1, so every value is compared only against the other values in its own column. Because min/max scaling is sensitive to extreme values, it works best once outliers have been dealt with.
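For reference, min/max scaling maps each value to (x - min) / (max - min) within its own column, which is the same formula scikit-learn's `MinMaxScaler` applies with its default range. The sketch below implements it in plain Python; the AUM and expense ratio numbers are made up, and mapping a constant column to 0.0 is a choice made for this example.

```python
def min_max_scale(rows):
    """Scale each column of `rows` to the range [0, 1] independently."""
    columns = list(zip(*rows))  # transpose rows into columns
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]  # transpose back


# Hypothetical rows: [AUM in $ millions, expense ratio in %]
rows = [
    [100.0, 0.25],
    [300.0, 0.50],
    [500.0, 0.75],
]

for row in min_max_scale(rows):
    print(row)
```

After scaling, a difference of $200 million in AUM and a difference of 0.25 percentage points in expense ratio carry the same weight in a distance calculation.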