# Hypothesis-Driven Development Part II

This post will evaluate signals based on the rank regression hypotheses covered in the last post.

Last time, we saw that the rank regression produced a highly statistically significant result. The next step is to evaluate the basic signal itself: is there any statistical significance in the returns from actually trading it? Since the strategy from SeekingAlpha simply selects the top-ranked ETF every month, this is a very easy signal to evaluate.

Simply put: for each formation period from 1 to 24 months, rank the ETFs by the cumulative sum of their monthly returns over that period, select the highest-ranked ETF, and hold it for one month.
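To make the selection rule concrete, here is a minimal base-R sketch on simulated returns. It is only an illustration: `toyRets`, `pickTopRanked`, and `holdTopRanked` are hypothetical names, the data are random rather than the ETF returns from the last post, and the loops stand in for the `runSum`, `rank`, and `Return.portfolio` calls used in the actual backtest.

```r
# Toy illustration only: simulated returns, not the ETF data from the post.
set.seed(42)
toyRets <- matrix(rnorm(36 * 5, mean = 0.005, sd = 0.03),
                  ncol = 5,
                  dimnames = list(NULL, paste0("ETF", 1:5)))

# Hypothetical helper: for each month, the column with the largest trailing
# nMonths-sum of returns (NA until enough history exists).
pickTopRanked <- function(rets, nMonths) {
  picks <- rep(NA_integer_, nrow(rets))
  for (i in nMonths:nrow(rets)) {
    formation <- colSums(rets[(i - nMonths + 1):i, , drop = FALSE])
    picks[i] <- which.max(formation)
  }
  picks
}

# Hypothetical helper: return earned by holding each month's pick
# over the *following* month, so the signal never sees the return it trades.
holdTopRanked <- function(rets, nMonths) {
  picks <- pickTopRanked(rets, nMonths)
  held <- rep(NA_real_, nrow(rets))
  for (i in seq_len(nrow(rets) - 1)) {
    if (!is.na(picks[i])) held[i + 1] <- rets[i + 1, picks[i]]
  }
  held
}

sigRets <- holdTopRanked(toyRets, nMonths = 3)
ewRets  <- rowMeans(toyRets)   # equal-weight benchmark, rebalanced every month
excess  <- sigRets - ewRets    # signal minus equal weight
mean(excess, na.rm = TRUE)
```

The one-month lag between forming the rank and earning the return is written out by hand here; in the code from the post, that lag is handled by passing the weights to `Return.portfolio`.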

Here’s the code to evaluate the signal (continued from the last post), given the returns, a month parameter, and an EW portfolio to compare with the signal.

```
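# Assumes xts, TTR (runSum), and PerformanceAnalytics (Return.portfolio)
# are already loaded, as in the last post.
signalBacktest <- function(returns, nMonths, ewPortfolio) {
  # rolling nMonths-sum of monthly returns per asset (the formation-period score)
  nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
  nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
  # rank assets within each month; the best performer gets the highest rank
  nMonthAvgRank <- t(apply(nMonthAverage, 1, rank))
  nMonthAvgRank <- xts(nMonthAvgRank, order.by = index(returns))
  # weight of 1 on the top-ranked asset (hard-coded as rank 5 in the original)
  selection <- (nMonthAvgRank == ncol(returns)) * 1
  sigTest <- Return.portfolio(R = returns, weights = selection)
  difference <- sigTest - ewPortfolio
  diffZscore <- mean(difference) / sd(difference)
  sigZscore <- mean(sigTest) / sd(sigTest)
  return(list(sigTest, difference, mean(sigTest), sigZscore, mean(difference), diffZscore))
}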

ewPortfolio <- Return.portfolio(monthRets, rebalance_on="months")

sigBoxplots <- list()
excessBoxplots <- list()
sigMeans <- list()
sigZscores <- list()
diffMeans <- list()
diffZscores <- list()
for(i in 1:24) {
  tmp <- signalBacktest(monthRets, nMonths = i, ewPortfolio)
  sigBoxplots[[i]] <- tmp[[1]]
  excessBoxplots[[i]] <- tmp[[2]]
  sigMeans[[i]] <- tmp[[3]]
  sigZscores[[i]] <- tmp[[4]]
  diffMeans[[i]] <- tmp[[5]]
  diffZscores[[i]] <- tmp[[6]]
}

sigBoxplots <- do.call(cbind, sigBoxplots)
excessBoxplots <- do.call(cbind, excessBoxplots)
sigMeans <- do.call(c, sigMeans)
sigZscores <- do.call(c, sigZscores)
diffMeans <- do.call(c, diffMeans)
diffZscores <- do.call(c, diffZscores)

par(mfrow=c(2,1))
plot(as.numeric(sigMeans)*100, type='h', main = 'signal means',
ylab = 'percent per month', xlab='formation period')
plot(as.numeric(sigZscores), type='h', main = 'signal Z scores',
ylab='Z scores', xlab='formation period')

plot(as.numeric(diffMeans)*100, type='h', main = 'mean difference between signal and EW',
ylab = 'percent per month', xlab='formation period')
plot(as.numeric(diffZscores), type='h', main = 'difference Z scores',
ylab = 'Z score', xlab='formation period')

boxplot(as.matrix(sigBoxplots), main = 'signal boxplots', xlab='formation period')
abline(h=0, col='red')
points(sigMeans, col='blue')

boxplot(as.matrix(sigBoxplots[,1:12]), main = 'signal boxplots 1 through 12 month formations',
xlab='formation period')
abline(h=0, col='red')
points(sigMeans[1:12], col='blue')

boxplot(as.matrix(excessBoxplots), main = 'difference (signal - EW) boxplots',
xlab='formation period')
abline(h=0, col='red')
points(diffMeans, col='blue')

boxplot(as.matrix(excessBoxplots[,1:12]), main = 'difference (signal - EW) boxplots 1 through 12 month formations',
xlab='formation period')
abline(h=0, col='red')
points(diffMeans[1:12], col='blue')
```

Okay, so what’s going on here is that I compare the signal against the equal weight portfolio, and compute means and z scores (mean divided by standard deviation) for both the signal's own returns and the difference between the signal and the equal weight portfolio. I then plot these values, along with boxplots of the distributions of both the signal's returns and that difference.
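For anyone who wants to line these numbers up against textbook significance thresholds: the `mean/sd` ratio computed in `signalBacktest` is a per-month ratio, and a one-sample t statistic for the mean is that ratio multiplied by the square root of the number of observations. The sketch below shows the relationship in base R on simulated excess returns; the sample size, mean, and standard deviation are made-up values, not results from this backtest.

```r
# Simulated monthly excess returns; NOT the results from the post.
set.seed(1)
excess <- rnorm(120, mean = 0.003, sd = 0.04)

ratio <- mean(excess) / sd(excess)       # the mean/sd ratio, as in signalBacktest
tStat <- ratio * sqrt(length(excess))    # one-sample t statistic for H0: mean = 0
tTest <- t.test(excess, mu = 0)          # base R's t-test reports the same statistic

c(ratio = ratio,
  tStat = tStat,
  tFromTest = unname(tTest$statistic),
  pValue = tTest$p.value)
```

Either form can be used to compare formation periods against one another; the scaled version is the one to check against the usual critical values.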

Here are the results:

To note, the percents are already multiplied by 100, so in the best cases, the rank strategy outperforms the equal weight strategy by about 30 basis points per month. However, these results are…not even in the same parking lot as statistical significance, let alone in the same ballpark.

Now, at this point, in case some people haven’t yet read Brian Peterson’s paper on strategy development, the point of hypothesis-driven development is to *reject* hypothetical strategies ASAP before looking at any sort of equity curve and trying to do away with periods of underperformance. So, at this point, I would like to reject this entire strategy because there’s no statistical evidence to actually continue. Furthermore, because August 2015 was a rather interesting month, especially in terms of volatility dispersion, I want to return to volatility trading strategies, now backed by hypothesis-driven development.

If anyone wants to see me continue on to rule testing with this process, let me know. If not, I have more ideas on the way.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

## 12 thoughts on “Hypothesis-Driven Development Part II”

1. Keep it up, I love this more skeptical approach to system validation! Please do it to your volatility ETF strategy!

2. Really dig the new approach of seeking statistical validation before proceeding. That said, looking beyond the BTFD (RSI2), short vol and momentum work, it would be great if you could conduct some research on other sources of risk-premia that tend to get overlooked (at least by the blog community).

• BTFD? Also, the reason that it’s hard to do research on other sources of risk premia is that it’s hard to find data for those other sources of risk premia for free.

• BTFD: buy the ‘failed’ dip.

re: data, sigh that makes sense… one area of interest could be carry, mainly commodity and interest rates carry. you could approximate it for all asset classes using the futures term structure.

just my 2c.

4. Hi, how do you reconcile the statistical significance shown in the first post with the lack of significance in this post? Many thanks.

• Because you’re measuring different things.