Introduction to Hypothesis Driven Development — Overview of a Simple Strategy and Indicator Hypotheses

This post will begin to apply a hypothesis-driven development framework (that is, the framework written by Brian Peterson on how to do strategy construction correctly, found here) to a strategy I came across on SeekingAlpha. Namely, Cliff Smith posted about a conservative bond rotation strategy, which makes use of short-term treasuries, long-term treasuries, convertibles, emerging market debt, and high-yield corporate debt–that is, SHY, TLT, CWB, PCY, and JNK. This post will try to put a more formal framework around the question of whether this strategy is valid to begin with.

One note: to keep this post succinct for blog consumption and to get to the computational techniques more quickly, I'll be glossing over the background research write-up for this strategy. It is yet another take on time-series/cross-sectional momentum, pared down to something implementable for individual investors, as opposed to something that requires a massive collection of different instruments suited to massive, institutional-class portfolios.

Introduction, Overview, Objectives, Constraints, Assumptions, and Hypotheses to be Tested:

Momentum. It has been documented many times. For the sake of brevity, I'll let readers follow the links if they're so inclined, but among them are Jegadeesh and Titman's seminal 1993 paper, Mark Carhart's 1997 paper, Andreu et al. (2012), Barroso and Santa-Clara (2013), Ilmanen's Expected Returns (which covers momentum), and others. This list, of course, is far from exhaustive, but the point stands. Formation periods of several months (up to a year) should predict returns over some holding period, be it several months or, as is more commonly seen, one month.

Furthermore, momentum applies in two varieties–cross sectional, and time-series. Cross-sectional momentum asserts that assets that outperformed among a group will continue to outperform, while time-series momentum asserts that assets that have risen in price during a formation period will continue to do so for the short-term future.
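To make the distinction concrete, here is a minimal sketch of how each signal might be computed, assuming an xts object of monthly returns called monthRets, one column per asset (one such object is constructed later in this post); the three-month formation window is purely illustrative:

require(xts)

# Time-series momentum: each asset's own trailing three-month return sum;
# a positive value is a buy signal for that asset in isolation.
tsMomentum <- na.omit(rollapplyr(monthRets, width = 3, FUN = sum))

# Cross-sectional momentum: rank the assets against one another each month;
# the top-ranked asset is preferred regardless of its absolute performance.
csRanks <- t(apply(tsMomentum, 1, rank))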

Cliff Smith’s strategy depends on the latter, effectively, among a group of five bond ETFs. I am not certain of the objective of the strategy (he didn’t mention it), as PCY, JNK, and CWB, while they may be fixed-income in name, possess volatility on the order of equities. I suppose one possible “default” objective would be to achieve an outperforming total return against an equal-weighted benchmark, both rebalanced monthly.
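As a minimal sketch of that default objective's benchmark, assuming a 'returns' xts of daily returns for the five funds (built later in this post), PerformanceAnalytics can construct the equal-weighted, monthly-rebalanced comparison directly:

require(PerformanceAnalytics)

# Equal-weighted benchmark of all five funds, rebalanced monthly.
equalWeightBench <- Return.portfolio(returns,
                                     weights = rep(1/ncol(returns), ncol(returns)),
                                     rebalance_on = "months")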

The constraints are that one would need a sufficient amount of capital such that fixed transaction costs are negligible, since the strategy is a single-instrument rotation type, meaning that each month may have two-way turnover of 200% (sell one ETF, buy another). On the other hand, one would assume that the amount of capital deployed is small enough such that execution costs of trading do not materially impact the performance of the strategy. That is to say, moving multiple billions from one of these ETFs to the other is a non-starter. As all returns are computed close-to-close for the sake of simplicity, this creates the implicit assumption that the market impact and execution costs are very small compared to overall returns.
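To put a rough number on that fixed-cost assumption, here is a back-of-the-envelope sketch of the drag from full monthly turnover; the five-basis-point one-way cost is purely hypothetical, not an estimate for these particular ETFs:

# Hypothetical cost drag from switching funds every single month.
oneWayCost <- 0.0005      # assumed 5 bps per one-way trade
monthlyTurnover <- 2      # sell one ETF, buy another: 200% two-way turnover
annualDrag <- oneWayCost * monthlyTurnover * 12
annualDrag                # 0.012, i.e. about 1.2% per year in the worst case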

There are two overarching hypotheses to be tested in order to validate the efficacy of this strategy:

1) Time-series momentum: while it has been documented for equities and even industry/country ETFs, it may not yet have been formally documented for fixed-income ETFs and their corresponding mutual funds. In order to validate this strategy, it should be investigated whether the particular instruments it selects adhere to the same phenomenon.

2) Cross-sectional momentum: again, while this has been heavily demonstrated in the past with regard to equities, ETFs are fairly new, and of the five mutual funds Cliff Smith selected, the latest one only has data going back to 1997, making easy access to diversified fixed-income markets a relatively new innovation for less sophisticated investors.

Essentially, both of these can be tested over a range of parameters (1-24 months).

Another note: with hypothesis-driven strategy development, the backtest is to be *nothing more than a confirmation of all the hypotheses up to that point*. That is, re-optimizing on the backtest itself means overfitting. Any proposed change to a strategy should come in the form of tested hypotheses, as opposed to running a bunch of backtests and selecting the best trials. Put another way, every single proposed element of a strategy needs some form of strong hypothesis accompanying it in order to be justified.

So, here are the two hypotheses I tested on the corresponding mutual funds:

require(quantmod)
require(PerformanceAnalytics)
require(reshape2)

# Mutual fund analogues of the five ETFs, used for their longer histories.
symbols <- c("CNSAX", "FAHDX", "VUSTX", "VFISX", "PREMX")
getSymbols(symbols, from='1900-01-01')

# Collect adjusted closes into one price matrix and compute daily returns.
prices <- list()
for(symbol in symbols) {
  prices[[symbol]] <- Ad(get(symbol))
}
prices <- do.call(cbind, prices)
colnames(prices) <- substr(colnames(prices), 1, 5)
returns <- na.omit(Return.calculate(prices))

# In-sample period: from the inception of the youngest mutual fund through
# March 2009, since the ETF data begins in April 2009.
sample <- returns['1997-08/2009-03']
monthRets <- apply.monthly(sample, Return.cumulative)

returnRegression <- function(returns, nMonths) {
  # n-month momentum signal: running sum of the past nMonths of monthly
  # returns, lagged by one period so that it predicts the *next* month.
  nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
  nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
  nMonthAverage <- na.omit(lag(nMonthAverage))
  returns <- returns[index(nMonthAverage)]
  
  # Cross-sectional ranks of the momentum signal and of the realized returns.
  rankAvg <- t(apply(nMonthAverage, 1, rank))
  rankReturn <- t(apply(returns, 1, rank))
  
  # Pool all five instruments into single vectors for the regressions.
  meltedAverage <- melt(data.frame(nMonthAverage))
  meltedReturns <- melt(data.frame(returns))
  meltedRankAvg <- melt(data.frame(rankAvg))
  meltedRankReturn <- melt(data.frame(rankReturn))
  
  # Time-series test: next month's return on past momentum, with no intercept
  # (returns are roughly zero-centered). Cross-sectional test: rank-rank
  # regression, with an intercept (ranks are not zero-centered).
  lmfit <- lm(meltedReturns$value ~ meltedAverage$value - 1)
  rankLmfit <- lm(meltedRankReturn$value ~ meltedRankAvg$value)
  # The rbind has three rows: 1 = momentum slope, 2 = rank regression
  # intercept, 3 = rank regression slope.
  return(rbind(summary(lmfit)$coefficients, summary(rankLmfit)$coefficients))
}

pvals <- list()
estimates <- list()
rankPs <- list()
rankEstimates <- list()
for(i in 1:24) {
  tmp <- returnRegression(monthRets, nMonths=i)
  # Row 1 holds the momentum slope; row 3 holds the rank-rank slope
  # (row 2 is the rank regression's intercept, which is not of interest).
  pvals[[i]] <- tmp[1,4]
  estimates[[i]] <- tmp[1,1]
  rankPs[[i]] <- tmp[3,4]
  rankEstimates[[i]] <- tmp[3,1]
}
pvals <- do.call(c, pvals)
estimates <- do.call(c, estimates)
rankPs <- do.call(c, rankPs)
rankEstimates <- do.call(c, rankEstimates)

Essentially, in this case, I run a pooled regression (that is, I pool the five instruments together into one giant vector) and regress the next month's return on the running sum of the previous n months' returns. I also do the same thing using cross-sectional ranks for each month, performing a rank-rank regression. The sample I used was the five mutual funds (CNSAX, FAHDX, VUSTX, VFISX, and PREMX) from their common inception through March 2009; since the data for the final ETF begins in April of 2009, the ETF data is set aside for out-of-sample backtesting.

Here are the results:

plot(estimates, type='h', xlab = 'Months regressed on', ylab='momentum coefficient', 
     main='future returns regressed on past momentum')
plot(pvals, type='h', xlab='Months regressed on', ylab='p-value', main='momentum significance')
abline(h=.05, col='green')
abline(h=.1, col='red')

plot(rankEstimates, type='h', xlab='Months regressed on', ylab='Rank coefficient',
     main='future return ranks regressed on past momentum ranks')
plot(rankPs, type='h', xlab='Months regressed on', ylab='p-value',
     main='rank momentum significance')

It is interesting to note that while much of the momentum literature describes a reversion effect in time-series momentum at 12 months or greater, all of the regression coefficients in this case (even up to 24 months!) proved to be positive, with the very long-term coefficients possessing more statistical significance than the short-term ones. Nevertheless, Cliff Smith's chosen parameters (the two- and four-month settings) possess statistical significance at least at the 10% level. If one were to be highly conservative about rejecting strategies, however, that in and of itself may be reason enough to reject the strategy right here.

However, the rank-rank regression (that is, regressing the future month's cross-sectional rank on the past n-month-sum cross-sectional rank) also shows a positive, statistically significant slope (on the order of 0.10, with a p-value around .013 in the example fit reproduced in the comments below). In short, there is evidence for cross-sectional momentum among these five assets. Furthermore, since VFISX, the short-term treasury fund (the mutual fund analogue of SHY), is among the assets chosen and serves as a proxy for the risk-free rate, including it in the cross-sectional rankings means that in order for a risky fund to be invested in (as this is a top-1 asset rotation strategy), it must outperform the risk-free asset; otherwise, by process of elimination, the strategy invests in the risk-free asset itself.
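As a minimal sketch of that selection logic (the two-month formation period is illustrative here, not necessarily Cliff Smith's exact rule):

# Hold the single fund with the best trailing momentum each month; when all
# risky funds rank below VFISX, the strategy defaults to the risk-free proxy.
momentum <- na.omit(lag(rollapplyr(monthRets, width = 2, FUN = sum)))
holdings <- colnames(momentum)[apply(momentum, 1, which.max)]
head(holdings)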

In upcoming posts, I’ll look into testing hypotheses on signals and rules.

Lastly, Volatility Made Simple has just released a blog post on the performance of volatility-based strategies for the month of August. Given the massive volatility spike, the dispersion in performance of strategies is quite interesting. I’m happy that in terms of YTD returns, the modified version of my strategy is among the top 10 for the year.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

Comments

  1. Ilya, good post. I have two questions:
    Why are you not removing the intercept for the rankings as you do for the returns (y~x-1 vs y~x)? The estimates and probabilities actually refer to the intercept in the case of the rankings.
    Why do you use averages of discrete returns rather than cumulative returns or averages of log returns?

    Keep up the good work.

    • Hello Hugo,

      Actually, I do use the p-value for the regression estimate. The second row is the regression estimate, not the intercept, which you can find accessed inside the loop here:

      for(i in 1:24) {
        tmp <- returnRegression(monthRets, nMonths=i)
        pvals[[i]] <- tmp[1,4]
        estimates[[i]] <- tmp[1,1]
        rankPs[[i]] <- tmp[2,4]
        rankEstimates[[i]] <- tmp[2,1]
      }

      As for averages of discrete returns instead of cumulative returns: ROC is the difference between two points, so this gives me more data. But it's most likely very similar in nature.

      And I don't remove the intercept for the rankings because returns are already zero-centered while ranks aren't, so I keep the intercept there.

      • Maybe I am missing something… The second row seems to be the intercept of the rank linear regression:

        rbind(summary(lmfit)$coefficients, summary(rankLmfit)$coefficients)
                               Estimate   Std. Error    t value     Pr(>|t|)
        meltedAverage$value  0.01829089  0.006436298   2.841835 4.643492e-03
        (Intercept)          2.69224138  0.137225579  19.619093 4.568979e-66
        meltedRankAvg$value  0.10258621  0.041375069   2.479421 1.344357e-02

        Thanks for the explanation about why the intercept is needed.

      • Hugo,

        > a <- rnorm(100)
        > b <- rnorm(100)
        > lmfit <- lm(a ~ b)
        > summary(lmfit)

        Call:
        lm(formula = a ~ b)

        Residuals:
             Min       1Q   Median       3Q      Max
        -2.56744 -0.76535  0.06351  0.76057  2.46539

        Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
        (Intercept) -0.002372   0.105546  -0.022    0.982
        b           -0.002547   0.113137  -0.023    0.982

        Residual standard error: 1.047 on 98 degrees of freedom
        Multiple R-squared: 5.17e-06, Adjusted R-squared: -0.0102
        F-statistic: 0.0005067 on 1 and 98 DF, p-value: 0.9821

        The value is the second row of the coefficients.

        Hope this helps.

  2. Thanks for your post. I suggest adding the line of code

    require(reshape2)

    below the other “require” lines. When I first ran your script, R complained about not finding the “melt” function.

  3. I tested the code with a random portfolio and the rank-rank regression looks very similar. Any thoughts about that?

    This was the code to generate the random rankings (I hope I got it right):

    nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
    nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
    nMonthAverage <- na.omit(lag(nMonthAverage))

    random <- returns
    for(i in 1:nrow(random)) {
      random[i,] <- runif(ncol(random))
    }
    nMonthAverage <- random

  4. Why do you subtract 1 when running the regression here?

    lmfit <- lm(meltedReturns$value ~ meltedAverage$value - 1)

      • My stats knowledge isn’t great. How are you sure the intercept is zero here? I checked the qqplot and it looks fine, but I don’t get the intuition. Thanks

  5. I don’t understand your answer to Hugo.
    As you used rbind, the object tmp consists of three rows:
    the first row is the regression coefficient of meltedAverage$value,
    the second row is the intercept of the rank regression, and
    the third row is the regression coefficient of meltedRankAvg$value.

    So I guess tmp[1,] and tmp[3,] are needed to show the regression coefficients.
