This post will outline an easy-to-make mistake in writing vectorized backtests: namely, using a signal obtained at the end of a period to enter (or exit) a position in that same period. The difference in results one obtains is massive.
Today, I saw two separate posts from Alpha Architect and Mike Harris, both referencing a paper by Valeriy Zakamulin arguing that some previous trend-following research by Glabadanidis was done shoddily, and that Glabadanidis's results were only reproducible by introducing lookahead bias.
The following code shows how to reproduce this lookahead bias.
First, the setup of a basic moving average strategy on the S&P 500 index, going back as far as Yahoo data will provide.
require(quantmod)
require(xts)
require(TTR)
require(PerformanceAnalytics)

getSymbols('^GSPC', src='yahoo', from = '1900-01-01')
monthlyGSPC <- Ad(GSPC)[endpoints(GSPC, on = 'months')]

# change this line for signal lookback
movAvg <- SMA(monthlyGSPC, 10)

signal <- monthlyGSPC > movAvg
gspcRets <- Return.calculate(monthlyGSPC)
And here is how to institute the lookahead bias.
lookahead <- signal * gspcRets       # period t's return from a signal only known at t's close
correct <- lag(signal) * gspcRets    # period t's return from the prior period's signal
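To see the off-by-one in isolation, here is a toy sketch of the same mechanics in base R (made-up returns and signals, no market data required):

```r
# rets: hypothetical period returns; signal: 1 when price closed above the
# moving average at the END of that same period, 0 otherwise
rets   <- c(0.02, -0.01, 0.03, -0.02)
signal <- c(1, 1, 0, 1)

# Lookahead: trades period t using a signal only knowable at period t's close
lookahead <- signal * rets

# Correct: base-R equivalent of lag(signal) * rets -- period t trades on signal t-1
correct <- c(NA, head(signal, -1)) * rets

lookahead   # 0.02 -0.01  0.00 -0.02
correct     # NA for the first period: no prior signal exists yet
```

The `NA` at the start of `correct` is exactly the row that `na.omit` drops later in the post.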
These are the “results”:
compare <- na.omit(cbind(gspcRets, lookahead, correct))
colnames(compare) <- c("S&P 500", "Lookahead", "Correct")
charts.PerformanceSummary(compare)
rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare))
logRets <- log(cumprod(1+compare))
chart.TimeSeries(logRets, legend.loc='topleft')
Of course, this equity curve is of no use, so here’s one in log scale.
As can be seen, lookahead bias makes a massive difference.
Here are the numerical results:
                              S&P 500  Lookahead    Correct
Annualized Return           0.0740000 0.15550000  0.0695000
Annualized Std Dev          0.1441000 0.09800000  0.1050000
Annualized Sharpe (Rf=0%)   0.5133000 1.58670000  0.6623000
Worst Drawdown              0.5255586 0.08729914  0.2699789
Calmar Ratio                0.1407286 1.78119192  0.2575219
Again, absolutely ridiculous.
Note that when using Return.Portfolio (the function in PerformanceAnalytics), that package will automatically apply your weights to the next period's return, instead of the current one. However, for those writing "simple" backtests that can be quickly done using vectorized operations, an off-by-one error can make all the difference between a backtest in the realm of the reasonable and pure nonsense. That said, should one wish to test for said nonsense when faced with impossible-to-replicate results, the mechanics demonstrated above are the way to do it.
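The effect of that weight-shifting behavior can be sketched in base R (made-up returns and weights; this mirrors the behavior described above, not Return.Portfolio's actual internals):

```r
# A weight decided at the close of period t should earn the return of
# period t+1, not the return of period t itself.
rets    <- c(0.02, -0.01, 0.03)   # hypothetical period returns
weights <- c(1, 0, 1)             # hypothetical weights, set at each close

# Shift the weights forward one period before multiplying
realized <- c(NA, head(weights, -1)) * rets
realized  # NA for the first period, since no prior weight exists
```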
Now, onto other news: I’d like to thank Gerald M for staying on top of one of the Logical Invest strategies–namely, their simple global market rotation strategy outlined in an article from an earlier blog post.
Up until March 2015 (the date of the blog post), the strategy had performed well. However, after said date?
It has been a complete disaster, which, in hindsight, was evident when I passed it through the hypothesis-driven development framework I wrote about earlier.
So, while there has been a great deal written about not simply throwing away a strategy because of short-term underperformance, and that anomalies such as momentum and value exist because of career risk due to said short-term underperformance, it’s never a good thing when a strategy creates historically large losses, particularly after being published in such a humble corner of the quantitative financial world.
In any case, this was a post demonstrating some mechanics, and an update on a strategy I blogged about not too long ago.
Thanks for reading.
NOTE: I am always interested in hearing about new opportunities which may benefit from my expertise, and am always happy to network. You can find my LinkedIn profile here.
There are other examples of problems with LI strategies post publication. To their credit, they post results showing the date of publication. There also appear to be widespread problems with many non-peer-reviewed, pseudo-academic publications (SSRN is a breeding ground for this). Whenever I see an inflection point in 2008, for example, I know the strategy has been curve fit through selection bias. The author just selected instruments that worked in that unique period. But there are other biases – some pretty subtle. I’ve been monitoring many of the strategies published on your blog with troubling results. Even EAA, a fantastic idea and paper, has some red flags. EAA never booked a single yearly loss from 1998 through to publication in January 2015 (using MF data from Yahoo). Not even in 2008! Yet it lost nearly 5% in 2015 and is slightly negative YTD. Thank you for creating great code. Your blog is a treasure!
To be fair, 2015 was a horrid year for momentum, so I can let that one slide. A 5% loss isn’t the end of the world. This year I think is a bit lukewarm as well.
Momentum generally looks bad when the markets are in a sort of sustained consolidation wishy-washy phase. I’m sure there’s some technique out there that can more scientifically say what state a market is in a bit more definitively than looking at a chart, though.
Ilya, thanks for this useful article.
GeraldM, if you invest money with systems you pick from (il)logical invest then you deserve to lose your money to someone who can make better use of it.
Well, they do charge for their services. So, it’s never a good thing when those who pay for signals get crappy signals.
Tarantino, I have never used LI. It’s an obvious curve fit. It has been entertaining to monitor though. They publish charts that show when they went live which is commendable because it shows that the systems are actually not working. That makes me think that they don’t really appreciate the problems of over optimization. Just an opinion of course.
I think one problem with your code could be that you eliminate rows with NAs (na.omit) from a matrix of *returns*. You have to first merge the price-vectors, then remove the NAs and only after that calculate returns, otherwise you can get distorted results.
It makes next to no difference. All it means is that things start from when everything has returns.
It might be true in this special case but think of the following situation: you have two price timeseries, one goes like this: 100, 110, 100, 120, the other one is missing the third price. Now you convert to returns: 1.1, 0.91, 1.2, after that you merge, the first one now has: 1.1, 1.2 which is wrong. This wouldn’t have happened if you had first merged and then converted to returns.
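The commenter's point can be sketched in base R, with letters standing in for dates and made-up prices:

```r
p1 <- c(a = 100, b = 110, c = 100, d = 120)  # full price series
p2 <- c(a = 50, b = 55, d = 60)              # same dates, but "c" is missing

# Wrong order: compute returns first, then keep only the shared dates
r1 <- p1[-1] / head(p1, -1)                  # gross returns: 1.10, 0.909, 1.20
r1_dropped <- r1[names(r1) %in% names(p2)]   # silently drops the 0.909 return

# Right order: align prices on shared dates first, then compute returns
p1_aligned <- p1[names(p2)]
r1_correct <- p1_aligned[-1] / head(p1_aligned, -1)

prod(r1_dropped)   # 1.32 -- overstates cumulative growth
prod(r1_correct)   # 1.20 -- matches the actual path from 100 to 120
```

Dropping a return in the middle of a series removes a leg of the price path, while dropping a price before computing returns merely lengthens one holding period, which is why merging prices first is the safe order.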
Ah, fair enough. Yes, when I use ETF data, there aren’t missing daily prints, thankfully.
Other years in the backtest weren’t good for momentum either. Market momentum characteristics in 2015 weren’t much different than prior periods (see http://www.priceactionlab.com/Blog/2015/08/momersion-indicator/). A likely problem with momentum is the explosion in papers, blog posts and algorithms detailing every aspect of momentum. Everyone knows about it, so how can there be an edge for something that is so commonly understood? I agree that one year does not make for a robust data point, but the fact remains that EAA proceeded to lose once applied to real out-of-sample data, whereas it didn’t during the biggest bubble (2000) or the worst financial crisis (2008) in a generation. The defensive portfolio is also in its deepest drawdown since the start of the backtest period (1998). I find that troubling. The end result is that EAA went into drawdown immediately after publication and has not recovered since. The problems with Logical Invest are over-optimization (the unrealistic CAGRs are enough to think that). The problems with EAA are not easy to identify. Maybe there isn’t a problem and 2015 (and so far in 2016) are somehow worse than any period since 1998. Or maybe momentum is now fully arbitraged out and will not work for the foreseeable future. I don’t know. I will keep monitoring these strategies though.