# Are R^2s Useful In Finance? Hypothesis-Driven Development In Reverse

This post will shed light on the values of R^2s behind two rather simplistic strategies — the simple 10 month SMA, and its relative, the 10 month momentum (which is simply a difference of SMAs, as Alpha Architect showed in their book DIY Financial Advisor.

Not too long ago, a friend of mine named Josh asked me a question regarding R^2s in finance. He’s finishing up his PhD in statistics at Stanford, so when people like that ask me questions, I’d like to answer them. His assertion is that in some instances, models that have less than perfect predictive power (EG R^2s of .4, for instance), can still deliver very promising predictions, and that if someone were to have a financial model that was able to explain 40% of the variance of returns, they could happily retire with that model making them very wealthy. Indeed, .4 is a very optimistic outlook (to put it lightly), as this post will show.

In order to illustrate this example, I took two “staple” strategies — buy SPY when its closing monthly price is above its ten month simple moving average, and when its ten month momentum (basically the difference of a ten month moving average and its lag) is positive. While these models are simplistic, they are ubiquitously talked about, and many momentum strategies are an improvement upon these baseline, “out-of-the-box” strategies.

Here’s the code to do that:

```require(xts)
require(quantmod)
require(PerformanceAnalytics)
require(TTR)

getSymbols('SPY', from = '1990-01-01', src = 'yahoo')

smaSig <- smaRatio > 0
rocSig <- spyROC > 0

smaRets <- lag(smaSig) * spyRets
rocRets <- lag(rocSig) * spyRets
```

And here are the results:

```strats <- na.omit(cbind(smaRets, rocRets, spyRets))
charts.PerformanceSummary(strats, main = "strategies")
rbind(table.AnnualizedReturns(strats), maxDrawdown(strats), CalmarRatio(strats))
``` ```                              SMA10     MOM10   BuyHold
Annualized Return         0.0975000 0.1039000 0.0893000
Annualized Std Dev        0.1043000 0.1080000 0.1479000
Annualized Sharpe (Rf=0%) 0.9346000 0.9616000 0.6035000
Worst Drawdown            0.1663487 0.1656176 0.5078482
Calmar Ratio              0.5860332 0.6270657 0.1757849
```

In short, the SMA10 and the 10-month momentum (aka ROC 10 aka MOM10) both handily outperform the buy and hold, not only in absolute returns, but especially in risk-adjusted returns (Sharpe and Calmar ratios). Again, simplistic analysis, and many models get much more sophisticated than this, but once again, simple, illustrative example using two strategies that outperform a benchmark (over the long term, anyway).

Now, the question is, what was the R^2 of these models? To answer this, I took a rolling five-year window that essentially asked: how well did these quantities (the ratio between the closing price and the moving average – 1, or the ten month momentum) predict the next month’s returns? That is, what proportion of the variance is explained through the monthly returns regressed against the previous month’s signals in numerical form (perhaps not the best framing, as the signal is binary as opposed to continuous which is what is being regressed, but let’s set that aside, again, for the sake of illustration).

Here’s the code to generate the answer.

```predictorsAndPredicted <- na.omit(cbind(lag(smaRatio), lag(spyROC), spyRets))
R2s <- list()
for(i in 1:(nrow(predictorsAndPredicted)-59))  { #rolling five-year regression
subset <- predictorsAndPredicted[i:(i+59),]
smaLM <- lm(subset[,3]~subset[,1])
smaR2 <- summary(smaLM)\$r.squared
rocLM <- lm(subset[,3]~subset[,2])
rocR2 <- summary(rocLM)\$r.squared
R2row <- xts(cbind(smaR2, rocR2), order.by=last(index(subset)))
R2s[[i]] <- R2row
}
R2s <- do.call(rbind, R2s)
par(mfrow=c(1,1))
colnames(R2s) <- c("SMA", "Momentum")
chart.TimeSeries(R2s, main = "R2s", legend.loc = 'topleft')
```

And the answer, in pictorial form: In short, even in the best case scenarios, namely, crises which provide momentum/trend-following/call it what you will its raison d’etre, that is, its risk management appeal, the proportion of variance explained by the actual signal quantities was very small. In the best of times, around 20%. But then again, think about what the R^2 value actually is–it’s the percentage of variance explained by a predictor. If a small set of signals (let alone one) was able to explain the majority of the change in the returns of the S&P 500, or even a not-insignificant portion, such a person would stand to become very wealthy. More to the point, given that two strategies that handily outperform the market have R^2s that are exceptionally low for extended periods of time, it goes to show that holding the R^2 up as some form of statistical holy grail certainly is incorrect in the general sense, and anyone who does so either is painting with too broad a brush, is creating disingenuous arguments, or should simply attempt to understand another field which may not work the way their intuition tells them.

# Hypothesis-Driven Development Part V: Stop-Loss, Deflating Sharpes, and Out-of-Sample

This post will demonstrate a stop-loss rule inspired by Andrew Lo’s paper “when do stop-loss rules stop losses”? Furthermore, it will demonstrate how to deflate a Sharpe ratio to account for the total number of trials conducted, which is presented in a paper written by David H. Bailey and Marcos Lopez De Prado. Lastly, the strategy will be tested on the out-of-sample ETFs, rather than the mutual funds that have been used up until now (which actually cannot be traded more than once every two months, but have been used simply for the purpose of demonstration).

First, however, I’d like to fix some code from the last post and append some results.

A reader asked about displaying the max drawdown for each of the previous rule-testing variants based off of volatility control, and Brian Peterson also recommended displaying max leverage, which this post will provide.

Here’s the updated rule backtest code:

```ruleBacktest <- function(returns, nMonths, dailyReturns,
nSD=126, volTarget = .1) {
nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
nMonthAverage <- na.omit(xts(nMonthAverage, order.by = index(returns)))
nMonthAvgRank <- t(apply(nMonthAverage, 1, rank))
nMonthAvgRank <- xts(nMonthAvgRank, order.by=index(nMonthAverage))
selection <- (nMonthAvgRank==5) * 1 #select highest average performance
dailyBacktest <- Return.portfolio(R = dailyReturns, weights = selection)
constantVol <- volTarget/(runSD(dailyBacktest, n = nSD) * sqrt(252))
monthlyLeverage <- na.omit(constantVol[endpoints(constantVol), on ="months"])
wts <- cbind(monthlyLeverage, 1-monthlyLeverage)
constantVolComponents <- cbind(dailyBacktest, 0)
out <- Return.portfolio(R = constantVolComponents, weights = wts)
out <- apply.monthly(out, Return.cumulative)
maxLeverage <- max(monthlyLeverage, na.rm = TRUE)
return(list(out, maxLeverage))
}

t1 <- Sys.time()
allPermutations <- list()
allDDs <- list()
leverages <- list()
for(i in seq(21, 252, by = 21)) {
monthVariants <- list()
ddVariants <- list()
leverageVariants <- list()
for(j in 1:12) {
trial <- ruleBacktest(returns = monthRets, nMonths = j, dailyReturns = sample, nSD = i)
sharpe <- table.AnnualizedReturns(trial[])[3,]
dd <- maxDrawdown(trial[])
monthVariants[[j]] <- sharpe
ddVariants[[j]] <- dd
leverageVariants[[j]] <- trial[]
}
allPermutations[[i]] <- do.call(c, monthVariants)
allDDs[[i]] <- do.call(c, ddVariants)
leverages[[i]] <- do.call(c, leverageVariants)
}
allPermutations <- do.call(rbind, allPermutations)
allDDs <- do.call(rbind, allDDs)
leverages <- do.call(rbind, leverages)
t2 <- Sys.time()
print(t2-t1)
```

Drawdowns: Leverage: Here are the results presented as a hypothesis test–a linear regression of drawdowns and leverage against momentum formation period and volatility calculation period:

```ddLM <- lm(meltedDDs\$MaxDD~meltedDDs\$volFormation + meltedDDs\$momentumFormation)
summary(ddLM)

Call:
lm(formula = meltedDDs\$MaxDD ~ meltedDDs\$volFormation + meltedDDs\$momentumFormation)

Residuals:
Min 1Q Median 3Q Max
-0.08022 -0.03434 -0.00135 0.02911 0.20077

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.240146 0.010922 21.99 < 2e-16 ***
meltedDDs\$volFormation -0.000484 0.000053 -9.13 6.5e-16 ***
meltedDDs\$momentumFormation 0.001533 0.001112 1.38 0.17
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0461 on 141 degrees of freedom
Multiple R-squared: 0.377, Adjusted R-squared: 0.368
F-statistic: 42.6 on 2 and 141 DF, p-value: 3.32e-15

levLM <- lm(meltedLeverage\$MaxLeverage~meltedLeverage\$volFormation + meltedDDs\$momentumFormation)
summary(levLM)

Call:
lm(formula = meltedLeverage\$MaxLeverage ~ meltedLeverage\$volFormation +
meltedDDs\$momentumFormation)

Residuals:
Min 1Q Median 3Q Max
-0.9592 -0.5179 -0.0908 0.3679 3.1022

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.076870 0.164243 24.82 <2e-16 ***
meltedLeverage\$volFormation -0.009916 0.000797 -12.45 <2e-16 ***
meltedDDs\$momentumFormation 0.009869 0.016727 0.59 0.56
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.693 on 141 degrees of freedom
Multiple R-squared: 0.524, Adjusted R-squared: 0.517
F-statistic: 77.7 on 2 and 141 DF, p-value: <2e-16
```

Easy interpretation here–the shorter-term volatility estimates are unstable due to the one-asset rotation nature of the system. Particularly silly is using the one-month volatility estimate. Imagine the system just switched from the lowest-volatility instrument to the highest. It would then take excessive leverage and get blown up that month for no particularly good reason. A longer-term volatility estimate seems to do much better for this system. So, while the Sharpe is generally improved, the results become far more palatable when using a more stable calculation for volatility, which sets maximum leverage to about 2 when targeting an annualized volatility of 10%. Also, to note, the period to compute volatility matters far more than the momentum formation period when addressing volatility targeting, which lends credence (at least in this case) to so many people that say “the individual signal rules matter far less than the position-sizing rules!”. According to some, position sizing is often a way for people to mask only marginally effective (read: bad) strategies with a separate layer to create a better result. I’m not sure which side of the debate (even assuming there is one) I fall upon, but for what it’s worth, there it is.

Moving on, I want to test out one more rule, which is inspired by Andrew Lo’s stop-loss rule. Essentially, the way it works is this (to my interpretation): it evaluates a running standard deviation, and if the drawdown exceeds some threshold of the running standard deviation, to sit out for some fixed period of time, and then re-enter. According to Andrew Lo, stop-losses help momentum strategies, so it seems as good a rule to test as any.

However, rather than test different permutations of the stop rule on all 144 prior combinations of volatility-adjusted configurations, I’m going to take an ensemble strategy, inspired by a conversation I had with Adam Butler, the CEO of ReSolve Asset Management, who stated that “we know momentum exists, but we don’t know the perfect way to measure it”, from the section I just finished up and use an equal weight of all 12 of the momentum formation periods with a 252-day rolling annualized volatility calculation, and equal weight them every month.

Here are the base case results from that trial (bringing our total to 169).

```strat <- list()
for(i in 1:12) {
strat[[i]] <- ruleBacktest(returns = monthRets, nMonths = i, dailyReturns = sample, nSD = 252)[]
}
strat <- do.call(cbind, strat)
strat <- Return.portfolio(R = na.omit(strat), rebalance_on="months")

rbind(table.AnnualizedReturns(strat), maxDrawdown(strat), CalmarRatio(strat))
```

With the following result:

```portfolio.returns
Annualized Return 0.12230
Annualized Std Dev 0.10420
Annualized Sharpe (Rf=0%) 1.17340
Worst Drawdown 0.09616
Calmar Ratio 1.27167
```

Of course, also worth nothing is that the annualized standard deviation is indeed very close to 10%, even with the ensemble. And it’s nice that there is a Sharpe past 1. Of course, given that these are mutual funds being backtested, these results are optimistic due to the unrealistic execution assumptions (can’t trade sooner than once every *two* months).

Anyway, let’s introduce our stop-loss rule, inspired by Andrew Lo’s paper.

```loStopLoss <- function(returns, sdPeriod = 12, sdScaling = 1, sdThresh = 1.5, cooldown = 3) {
stratRets <- list()
count <- 1
stratComplete <- FALSE
originalRets <- returns
ddThresh <- -runSD(returns, n = sdPeriod) * sdThresh * sdScaling
while(!stratComplete) {
retDD <- PerformanceAnalytics:::Drawdowns(returns)
DDbreakthrough <- retDD < ddThresh & lag(retDD) > ddThresh
firstBreak <- which.max(DDbreakthrough) #first threshold breakthrough, if 1, we have no breakthrough
#the above line is unintuitive since this is a boolean vector, so it returns the first value of TRUE
if(firstBreak > 1) { #we have a drawdown breakthrough if this is true
stratRets[[count]] <- returns[1:firstBreak,] #subset returns through our threshold breakthrough
nextPoint <- firstBreak + cooldown + 1 #next point of re-entry is the point after the cooldown period
if(nextPoint <= (nrow(returns)-1)) { #if we can re-enter, subset the returns and return to top of loop
returns <- returns[nextPoint:nrow(returns),]
ddThresh <- ddThresh[nextPoint:nrow(ddThresh),]
count <- count+1
} else { #re-entry point is after data exhausted, end strategy
stratComplete <- TRUE
}
} else { #there are no more critical drawdown breakthroughs, end strategy
stratRets[[count]] <- returns
stratComplete <- TRUE
}
}
stratRets <- do.call(rbind, stratRets) #combine returns
expandRets <- cbind(stratRets, originalRets) #account for all the days we missed
expandRets[is.na(expandRets[,1]), 1] <- 0 #cash positions will be zero
rets <- expandRets[,1]
colnames(rets) <- paste(cooldown, sdThresh, sep="_")
return(rets)
}
```

Essentially, the way it works is like this: the function computes all the drawdowns for a return series, along with its running standard deviation (non-annualized–if you want to annualize it, change the sdScaling parameter to something like sqrt(12) for monthly or sqrt(252) for daily data). Next, it looks for when the drawdown crossed a critical threshold, then cuts off that portion of returns and standard deviation history, and moves ahead in history by the cooldown period specified, and repeats. Most of the code is simply dealing with corner cases (is there even a time to use the stop rule? What about iterating when there isn’t enough data left?), and then putting the results back together again.

In any case, for the sake of simplicity, this function doesn’t use two different time scales (IE compute volatility using daily data, make decisions monthly), so I’m sticking with using a 12-month rolling volatility, as opposed to 252 day rolling volatility multiplied by the square root of 21.

Finally, here are another 54 runs to see if Andrew Lo’s stop-loss rule works here. Essentially, the intuition behind this is that if the strategy breaks down, it’ll continue to break down, so it would be prudent to just turn it off for a little while.

Here are the trial runs:

```
threshVec <- seq(0, 2, by=.25)
cooldownVec <- c(1:6)
sharpes <- list()
params <- expand.grid(threshVec, cooldownVec)
for(i in 1:nrow(params)) {
configuration <- loStopLoss(returns = strat, sdThresh = params[i,1],
cooldown = params[i, 2])
sharpes[[i]] <- table.AnnualizedReturns(configuration)[3,]
}
sharpes <- do.call(c, sharpes)

loStoplossFrame <- cbind(params, sharpes)
loStoplossFrame\$improvement <- loStoplossFrame[,3] - table.AnnualizedReturns(strat)[3,]

colnames(loStoplossFrame) <- c("Threshold", "Cooldown", "Sharpe", "Improvement")
```

And a plot of the results.

```ggplot(loStoplossFrame, aes(x = Threshold, y = Cooldown, fill=Improvement)) +
geom_tile()+scale_fill_gradient2(high="green", mid="yellow", low="red", midpoint = 0)
```

Result: Result: at this level, and at this frequency (retaining the monthly decision-making process), the stop-loss rule basically does nothing in order to improve the risk-reward trade-off in the best case scenarios, and in most scenarios, simply hurts. 54 trials down the drain, bringing us up to 223 trials. So, what does the final result look like?

```charts.PerformanceSummary(strat)
```

Here’s the final in-sample equity curve–and the first one featured in this entire series. This is, of course, a *feature* of hypothesis-driven development. Playing whack-a-mole with equity curve bumps is what is a textbook case of overfitting. So, without further ado: And now we can see why stop-loss rules generally didn’t add any value to this strategy. Simply, it had very few periods of sustained losses at the monthly frequency, and thus, very little opportunity for a stop-loss rule to add value. Sure, the occasional negative month crept in, but there was no period of sustained losses. Furthermore, Yahoo Finance may not have perfect fidelity on dividends on mutual funds from the late 90s to early 2000s, so the initial flat performance may also be a rather conservative estimate on the strategy’s performance (then again, as I stated before, using mutual funds themselves is optimistic given the unrealistic execution assumptions, so maybe it cancels out). Now, if this equity curve were to be presented without any context, one may easily question whether or not it was curve-fit. To an extent, one can argue that the volatility computation period may be optimized, though I’d hardly call a 252-day (one-year) rolling volatility estimate a curve-fit.

Next, I’d like to introduce another concept on this blog that I’ve seen colloquially addressed in other parts of the quantitative blogging space, particularly by Mike Harris of Price Action Lab, namely that of multiple hypothesis testing, and about the need to correct for that.

Luckily for that, Drs. David H. Bailey and Marcos Lopez De Prado wrote a paper to address just that. Also, I’d like to note one very cool thing about this paper: it actually has a worked-out numerical example! In my opinion, there are very few things as helpful as showing a simple result that transforms a collection of mathematical symbols into a result to demonstrate what those symbols actually mean in the span of one page. Oh, and it also includes *code* in the appendix (albeit Python — even though, you know, R is far more developed. If someone can get Marcos Lopez De Prado to switch to R–aka the better research language, that’d be a godsend!).

In any case, here’s the formula for the deflated Sharpe ratio, implemented straight from the paper.

```deflatedSharpe <- function(sharpe, nTrials, varTrials, skew, kurt, numPeriods, periodsInYear) {
emc <- .5772
sr0_term1 <- (1 - emc) * qnorm(1 - 1/nTrials)
sr0_term2 <- emc * qnorm(1 - 1/nTrials * exp(-1))
sr0 <- sqrt(varTrials * 1/periodsInYear) * (sr0_term1 + sr0_term2)

numerator <- (sharpe/sqrt(periodsInYear) - sr0)*sqrt(numPeriods - 1)

skewnessTerm <- 1 - skew * sharpe/sqrt(periodsInYear)
kurtosisTerm <- (kurt-1)/4*(sharpe/sqrt(periodsInYear))^2

denominator <- sqrt(skewnessTerm + kurtosisTerm)

result <- pnorm(numerator/denominator)
pval <- 1-result
return(pval)
}
```

The inputs are the strategy’s Sharpe ratio, the number of backtest runs, the variance of the sharpe ratios of those backtest runs, the skewness of the candidate strategy, its non-excess kurtosis, the number of periods in the backtest, and the number of periods in a year. Unlike the De Prado paper, I choose to return the p-value (EG 1-.

Let’s collect all our Sharpe ratios now.

```allSharpes <- c(as.numeric(table.AnnualizedReturns(sigBoxplots)[3,]),
meltedSharpes\$Sharpe,
as.numeric(table.AnnualizedReturns(strat)[3,]),
loStoplossFrame\$Sharpe)
```

And now, let’s plug and chug!

```stratSignificant <- deflatedSharpe(sharpe = as.numeric(table.AnnualizedReturns(strat)[3,]),
nTrials = length(allSharpes), varTrials = var(allSharpes),
skew = as.numeric(skewness(strat)), kurt = as.numeric(kurtosis(strat)) + 3,
numPeriods = nrow(strat), periodsInYear = 12)
```

And the result!

```> stratSignificant
 0.01311
```

Success! At least at the 5% level…and a rejection at the 1% level, and any level beyond that.

So, one last thing! Out-of-sample testing on ETFs (and mutual funds during the ETF burn-in period)!

```symbols2 <- c("CWB", "JNK", "TLT", "SHY", "PCY")
getSymbols(symbols2, from='1900-01-01')
prices2 <- list()
for(tmp in symbols2) {
}
prices2 <- do.call(cbind, prices2)
colnames(prices2) <- substr(colnames(prices2), 1, 3)
returns2 <- na.omit(Return.calculate(prices2))

monthRets2 <- apply.monthly(returns2, Return.cumulative)

oosStrat <- list()
for(i in 1:12) {
oosStrat[[i]] <- ruleBacktest(returns = monthRets2, nMonths = i, dailyReturns = returns2, nSD = 252)[]
}
oosStrat <- do.call(cbind, oosStrat)
oosStrat <- Return.portfolio(R = na.omit(oosStrat), rebalance_on="months")

symbols <- c("CNSAX", "FAHDX", "VUSTX", "VFISX", "PREMX")
getSymbols(symbols, from='1900-01-01')
prices <- list()
for(symbol in symbols) {
}
prices <- do.call(cbind, prices)
colnames(prices) <- substr(colnames(prices), 1, 5)
oosMFreturns <- na.omit(Return.calculate(prices))
oosMFmonths <- apply.monthly(oosMFreturns, Return.cumulative)

oosMF <- list()
for(i in 1:12) {
oosMF[[i]] <- ruleBacktest(returns = oosMFmonths, nMonths = i, dailyReturns = oosMFreturns, nSD=252)[]
}
oosMF <- do.call(cbind, oosMF)
oosMF <- Return.portfolio(R = na.omit(oosMF), rebalance_on="months")
oosMF <- oosMF["2009-04/2011-03"]

fullOOS <- rbind(oosMF, oosStrat)

rbind(table.AnnualizedReturns(fullOOS), maxDrawdown(fullOOS), CalmarRatio(fullOOS))
charts.PerformanceSummary(fullOOS)
```

And the results:

```portfolio.returns
Annualized Return 0.1273
Annualized Std Dev 0.0901
Annualized Sharpe (Rf=0%) 1.4119
Worst Drawdown 0.1061
Calmar Ratio 1.1996
```

And one more equity curve (only the second!). In other words, the out-of-sample statistics compare to the in-sample statistics. The Sharpe ratio is higher, the Calmar slightly lower. But on a whole, the performance has kept up. Unfortunately, the strategy is currently in a drawdown, but that’s the breaks.

So, whew. That concludes my first go at hypothesis-driven development, and has hopefully at least demonstrated the process to a satisfactory degree. What started off as a toy strategy instead turned from a rejection to a not rejection to demonstrating ideas from three separate papers, and having out-of-sample statistics that largely matched if not outperformed the in-sample statistics. For those thinking about investing in this strategy (again, here is the strategy: take 12 different portfolios, each selecting the asset with the highest momentum over months 1-12, target an annualized volatility of 10%, with volatility defined as the rolling annualized 252-day standard deviation, and equal-weight them every month), what I didn’t cover was turnover and taxes (this is a bond ETF strategy, so dividends will play a large role).

Now, one other request–many of the ideas for this blog come from my readers. I am especially interested in things to think about from readers with line-management responsibilities, as I think many of the questions from those individuals are likely the most universally interesting ones. If you’re one such individual, I’d appreciate an introduction, and knowing who more of the individuals in my reader base are.

NOTE: while I am currently consulting, I am always open to networking, meeting up, consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

# Hypothesis Driven Development Part IV: Testing The Barroso/Santa Clara Rule

This post will deal with applying the constant-volatility procedure written about by Barroso and Santa Clara in their paper “Momentum Has Its Moments”.

The last two posts dealt with evaluating the intelligence of the signal-generation process. While the strategy showed itself to be marginally better than randomly tossing darts against a dartboard and I was ready to reject it for want of moving onto better topics that are slightly less of a toy in terms of examples than a little rotation strategy, Brian Peterson told me to see this strategy through to the end, including testing out rule processes.

First off, to make a distinction, rules are not signals. Rules are essentially a way to quantify what exactly to do assuming one acts upon a signal. Things such as position sizing, stop-loss processes, and so on, all fall under rule processes.

This rule deals with using leverage in order to target a constant volatility.

So here’s the idea: in their paper, Pedro Barroso and Pedro Santa Clara took the Fama-French momentum data, and found that the classic WML strategy certainly outperforms the market, but it has a critical downside, namely that of momentum crashes, in which being on the wrong side of a momentum trade will needlessly expose a portfolio to catastrophically large drawdowns. While this strategy is a long-only strategy (and with fixed-income ETFs, no less), and so would seem to be more robust against such massive drawdowns, there’s no reason to leave money on the table. To note, not only have Barroso and Santa Clara covered this phenomena, but so have others, such as Tony Cooper in his paper “Alpha Generation and Risk Smoothing Using Volatility of Volatility”.

In any case, the setup here is simple: take the previous portfolios, consisting of 1-12 month momentum formation periods, and every month, compute the annualized standard deviation, using a 21-252 (by 21) formation period, for a total of 12 x 12 = 144 trials. (So this will put the total trials run so far at 24 + 144 = 168…bonus points if you know where this tidbit is going to go).

Here’s the code (again, following on from the last post, which follows from the second post, which follows from the first post in this series).

```require(reshape2)
require(ggplot2)

ruleBacktest <- function(returns, nMonths, dailyReturns,
nSD=126, volTarget = .1) {
nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
nMonthAvgRank <- t(apply(nMonthAverage, 1, rank))
nMonthAvgRank <- xts(nMonthAvgRank, order.by=index(returns))
selection <- (nMonthAvgRank==5) * 1 #select highest average performance
dailyBacktest <- Return.portfolio(R = dailyReturns, weights = selection)
constantVol <- volTarget/(runSD(dailyBacktest, n = nSD) * sqrt(252))
monthlyLeverage <- na.omit(constantVol[endpoints(constantVol), on ="months"])
wts <- cbind(monthlyLeverage, 1-monthlyLeverage)
constantVolComponents <- cbind(dailyBacktest, 0)
out <- Return.portfolio(R = constantVolComponents, weights = wts)
out <- apply.monthly(out, Return.cumulative)
return(out)
}

t1 <- Sys.time()
allPermutations <- list()
for(i in seq(21, 252, by = 21)) {
monthVariants <- list()
for(j in 1:12) {
trial <- ruleBacktest(returns = monthRets, nMonths = j, dailyReturns = sample, nSD = i)
sharpe <- table.AnnualizedReturns(trial)[3,]
monthVariants[[j]] <- sharpe
}
allPermutations[[i]] <- do.call(c, monthVariants)
}
allPermutations <- do.call(rbind, allPermutations)
t2 <- Sys.time()
print(t2-t1)

rownames(allPermutations) <- seq(21, 252, by = 21)
colnames(allPermutations) <- 1:12

baselineSharpes <- table.AnnualizedReturns(algoPortfolios)[3,]
baselineSharpeMat <- matrix(rep(baselineSharpes, 12), ncol=12, byrow=TRUE)

diffs <- allPermutations - as.numeric(baselineSharpeMat)
require(reshape2)
require(ggplot2)
meltedDiffs <-melt(diffs)

colnames(meltedDiffs) <- c("volFormation", "momentumFormation", "sharpeDifference")
ggplot(meltedDiffs, aes(x = momentumFormation, y = volFormation, fill=sharpeDifference)) +

meltedSharpes <- melt(allPermutations)
colnames(meltedSharpes) <- c("volFormation", "momentumFormation", "Sharpe")
ggplot(meltedSharpes, aes(x = momentumFormation, y = volFormation, fill=Sharpe)) +
geom_tile()+scale_fill_gradient2(high="green", mid="yellow", low="red", midpoint = mean(allPermutations))
```

Again, there’s no parallel code since this is a relatively small example, and I don’t know which OS any given instance of R runs on (windows/linux have different parallelization infrastructure).

So the idea here is to simply compare the Sharpe ratios with different volatility lookback periods against the baseline signal-process-only portfolios. The reason I use Sharpe ratios, and not say, CAGR, volatility, or drawdown is that Sharpe ratios are scale-invariant. In this case, I’m targeting an annualized volatility of 10%, but with a higher targeted volatility, one can obtain higher returns at the cost of higher drawdowns, or be more conservative. But the Sharpe ratio should stay relatively consistent within reasonable bounds.

So here are the results:

Sharpe improvements: In this case, the diagram shows that on a whole, once the volatility estimation period becomes long enough, the results are generally positive. Namely, that if one uses a very short estimation period, that volatility estimate is more dependent on the last month’s choice of instrument, as opposed to the longer-term volatility of the system itself, which can create poor forecasts. Also to note is that the one-month momentum formation period doesn’t seem overly amenable to the constant volatility targeting scheme (there’s basically little improvement if not a slight drag on risk-adjusted performance). This is interesting in that the baseline Sharpe ratio for the one-period formation is among the best of the baseline performances. However, on a whole, the volatility targeting actually does improve risk-adjusted performance of the system, even one as simple as throwing all your money into one asset every month based on a single momentum signal.

Absolute Sharpe ratios: In this case, the absolute Sharpe ratios look fairly solid for such a simple system. The 3, 7, and 9 month variants are slightly lower, but once the volatility estimation period reaches between 126 and 252 days, the results are fairly robust. The Barroso and Santa Clara paper uses a period of 126 days to estimate annualized volatility, which looks solid across the entire momentum formation period spectrum.

In any case, it seems the verdict is that a constant volatility target improves results.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

# Hypothesis Driven Development Part III: Monte Carlo In Asset Allocation Tests

This post will show how to use Monte Carlo to test for signal intelligence.

Although I had rejected this strategy in the last post, I was asked to do a monte-carlo analysis of a thousand random portfolios to see how the various signal processes performed against said distribution. Essentially, the process is quite simple: as I’m selecting one asset each month to hold, I simply generate a random number between 1 and the amount of assets (5 in this case), and hold it for the month. Repeat this process for the number of months, and then repeat this process a thousand times, and see where the signal processes fall across that distribution.

I didn’t use parallel processing here since Windows and Linux-based R have different parallel libraries, and in the interest of having the code work across all machines, I decided to leave it off.

Here’s the code:

```randomAssetPortfolio <- function(returns) {
numAssets <- ncol(returns)
numPeriods <- nrow(returns)
assetSequence <- sample.int(numAssets, numPeriods, replace=TRUE)
wts <- matrix(nrow = numPeriods, ncol=numAssets, 0)
wts <- xts(wts, order.by=index(returns))
for(i in 1:nrow(wts)) {
wts[i,assetSequence[i]] <- 1
}
randomPortfolio <- Return.portfolio(R = returns, weights = wts)
return(randomPortfolio)
}

t1 <- Sys.time()
randomPortfolios <- list()
set.seed(123)
for(i in 1:1000) {
randomPortfolios[[i]] <- randomAssetPortfolio(monthRets)
}
randomPortfolios <- do.call(cbind, randomPortfolios)
t2 <- Sys.time()
print(t2-t1)

algoPortfolios <- sigBoxplots[,1:12]
randomStats <- table.AnnualizedReturns(randomPortfolios)
algoStats <- table.AnnualizedReturns(algoPortfolios)

par(mfrow=c(3,1))
hist(as.numeric(randomStats[1,]), breaks = 20, main = 'histogram of monte carlo annualized returns',
xlab='annualized returns')
abline(v=as.numeric(algoStats[1,]), col='red')
hist(as.numeric(randomStats[2,]), breaks = 20, main = 'histogram of monte carlo volatilities',
xlab='annualized vol')
abline(v=as.numeric(algoStats[2,]), col='red')
hist(as.numeric(randomStats[3,]), breaks = 20, main = 'histogram of monte carlo Sharpes',
xlab='Sharpe ratio')
abline(v=as.numeric(algoStats[3,]), col='red')

allStats <- cbind(randomStats, algoStats)
aggregateMean <- apply(allStats, 1, mean)
aggregateDevs <- apply(allStats, 1, sd)

algoPs <- 1-pnorm(as.matrix((algoStats - aggregateMean)/aggregateDevs))

plot(as.numeric(algoPs[1,])~c(1:12), main='Return p-values',
xlab='Formation period', ylab='P-value')
abline(h=0.05, col='red')
abline(h=.1, col='green')

plot(1-as.numeric(algoPs[2,])~c(1:12), ylim=c(0, .5), main='Annualized vol p-values',
xlab='Formation period', ylab='P-value')
abline(h=0.05, col='red')
abline(h=.1, col='green')

plot(as.numeric(algoPs[3,])~c(1:12), main='Sharpe p-values',
xlab='Formation period', ylab='P-value')
abline(h=0.05, col='red')
abline(h=.1, col='green')
```

And here are the results:  In short, compared to monkeys throwing darts, to use some phrasing from the Price Action Lab blog, these signal processes are only marginally intelligent, if at all, depending on the variation one chooses. Still, I was recommended to see this process through the end, and evaluate rules, so next time, I’ll evaluate one easy-to-implement rule.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

# Hypothesis-Driven Development Part II

This post will evaluate signals based on the rank regression hypotheses covered in the last post.

The last time around, we saw that rank regression had a very statistically significant result. Therefore, the next step would be to evaluate the basic signals — whether or not there is statistical significance in the actual evaluation of the signal–namely, since the strategy from SeekingAlpha simply selects the top-ranked ETF every month, this is a very easy signal to evaluate.

Simply, using the 1-24 month formation periods for cumulative sum of monthly returns, select the highest-ranked ETF and hold it for one month.

Here’s the code to evaluate the signal (continued from the last post), given the returns, a month parameter, and an EW portfolio to compare with the signal.

```
signalBacktest <- function(returns, nMonths, ewPortfolio) {
nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
nMonthAvgRank <- t(apply(nMonthAverage, 1, rank))
nMonthAvgRank <- xts(nMonthAvgRank, order.by=index(returns))
selection <- (nMonthAvgRank==5) * 1 #select highest average performance
sigTest <- Return.portfolio(R = returns, weights = selection)
difference <- sigTest - ewPortfolio
diffZscore <- mean(difference)/sd(difference)
sigZscore <- mean(sigTest)/sd(sigTest)
return(list(sigTest, difference, mean(sigTest), sigZscore, mean(difference), diffZscore))
}

ewPortfolio <- Return.portfolio(monthRets, rebalance_on="months")

sigBoxplots <- list()
excessBoxplots <- list()
sigMeans <- list()
sigZscores <- list()
diffMeans <- list()
diffZscores <- list()
for(i in 1:24) {
tmp <- signalBacktest(monthRets, nMonths = i, ewPortfolio)
sigBoxplots[[i]] <- tmp[]
excessBoxplots[[i]] <- tmp[]
sigMeans[[i]] <- tmp[]
sigZscores[[i]] <- tmp[]
diffMeans[[i]] <- tmp[]
diffZscores[[i]] <- tmp[]
}

sigBoxplots <- do.call(cbind, sigBoxplots)
excessBoxplots <- do.call(cbind, excessBoxplots)
sigMeans <- do.call(c, sigMeans)
sigZscores <- do.call(c, sigZscores)
diffMeans <- do.call(c, diffMeans)
diffZscores <- do.call(c, diffZscores)

par(mfrow=c(2,1))
plot(as.numeric(sigMeans)*100, type='h', main = 'signal means',
ylab = 'percent per month', xlab='formation period')
plot(as.numeric(sigZscores), type='h', main = 'signal Z scores',
ylab='Z scores', xlab='formation period')

plot(as.numeric(diffMeans)*100, type='h', main = 'mean difference between signal and EW',
ylab = 'percent per month', xlab='formation period')
plot(as.numeric(diffZscores), type='h', main = 'difference Z scores',
ylab = 'Z score', xlab='formation period')

boxplot(as.matrix(sigBoxplots), main = 'signal boxplots', xlab='formation period')
abline(h=0, col='red')
points(sigMeans, col='blue')

boxplot(as.matrix(sigBoxplots[,1:12]), main = 'signal boxplots 1 through 12 month formations',
xlab='formation period')
abline(h=0, col='red')
points(sigMeans[1:12], col='blue')

boxplot(as.matrix(excessBoxplots), main = 'difference (signal - EW) boxplots',
xlab='formation period')
abline(h=0, col='red')
points(sigMeans, col='blue')

boxplot(as.matrix(excessBoxplots[,1:12]), main = 'difference (signal - EW) boxplots 1 through 12 month formations',
xlab='formation period')
abline(h=0, col='red')
points(sigMeans[1:12], col='blue')
```

Okay, so what’s going on here is that I compare the signal against the equal weight portfolio, and take means and z scores of both the signal values in general, and against the equal weight portfolio. I plot these values, along with boxplots of the distributions of both the signal process, and the difference between the signal process and the equal weight portfolio.

Here are the results:    To note, the percents are already multiplied by 100, so in the best cases, the rank strategy outperforms the equal weight strategy by about 30 basis points per month. However, these results are…not even in the same parking lot as statistical significance, let alone in the same ballpark.

Now, at this point, in case some people haven’t yet read Brian Peterson’s paper on strategy development, the point of hypothesis-driven development is to *reject* hypothetical strategies ASAP before looking at any sort of equity curve and trying to do away with periods of underperformance. So, at this point, I would like to reject this entire strategy because there’s no statistical evidence to actually continue. Furthermore, because August 2015 was a rather interesting month, especially in terms of volatility dispersion, I want to return to volatility trading strategies, now backed by hypothesis-driven development.

If anyone wants to see me continue to rule testing with this process, let me know. If not, I have more ideas on the way.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

# Introduction to Hypothesis Driven Development — Overview of a Simple Strategy and Indicator Hypotheses

This post will begin to apply a hypothesis-driven development framework (that is, the framework written by Brian Peterson on how to do strategy construction correctly, found here) to a strategy I’ve come across on SeekingAlpha. Namely, Cliff Smith posted about a conservative bond rotation strategy, which makes use of short-term treasuries, long-term treasuries, convertibles, emerging market debt, and high-yield corporate debt–that is, SHY, TLT, CWB, PCY, and JNK. What this post will do is try to put a more formal framework on whether or not this strategy is a valid one to begin with.

One note: For the sake of balancing succinctness for blog consumption and to demonstrate the computational techniques more quickly, I’ll be glossing over background research write-ups for this post/strategy, since it’s yet another take on time-series/cross-sectional momentum, except pared down to something more implementable for individual investors, as opposed to something that requires a massive collection of different instruments for massive, institutional-class portfolios.

Introduction, Overview, Objectives, Constraints, Assumptions, and Hypotheses to be Tested:

Momentum. It has been documented many times. For the sake of brevity, I’ll let readers follow the links if they’re so inclined, but among them are Jegadeesh and Titman’s seminal 1993 paper, Mark Carhart’s 1997 paper, Andreu et. Al (2012), Barroso and Santa-Clara (2013), Ilmanen’s Expected Returns (which covers momentum), and others. This list, of course, is far from exhaustive, but the point stands. Formation periods of several months (up to a year) should predict returns moving forward on some holding period, be it several months, or as is more commonly seen, one month.

Furthermore, momentum applies in two varieties–cross sectional, and time-series. Cross-sectional momentum asserts that assets that outperformed among a group will continue to outperform, while time-series momentum asserts that assets that have risen in price during a formation period will continue to do so for the short-term future.

Cliff Smith’s strategy depends on the latter, effectively, among a group of five bond ETFs. I am not certain of the objective of the strategy (he didn’t mention it), as PCY, JNK, and CWB, while they may be fixed-income in name, possess volatility on the order of equities. I suppose one possible “default” objective would be to achieve an outperforming total return against an equal-weighted benchmark, both rebalanced monthly.

The constraints are that one would need a sufficient amount of capital such that fixed transaction costs are negligible, since the strategy is a single-instrument rotation type, meaning that each month may have two-way turnover of 200% (sell one ETF, buy another). On the other hand, one would assume that the amount of capital deployed is small enough such that execution costs of trading do not materially impact the performance of the strategy. That is to say, moving multiple billions from one of these ETFs to the other is a non-starter. As all returns are computed close-to-close for the sake of simplicity, this creates the implicit assumption that the market impact and execution costs are very small compared to overall returns.

There are two overarching hypotheses to be tested in order to validate the efficacy of this strategy:

1) Time-series momentum: while it has been documented for equities and even industry/country ETFs, it may not have been formally done so yet for fixed-income ETFs, and their corresponding mutual funds. In order to validate this strategy, it should be investigated if the particular instruments it selects adhere to the same phenomena.

2) Cross-sectional momentum: again, while this has been heavily demonstrated in the past with regards to equities, ETFs are fairly new, and of the five mutual funds Cliff Smith selected, the latest one only has data going back to 1997, thus allowing less sophisticated investors to easily access diversified fixed income markets a relatively new innovation.

Essentially, both of these can be tested over a range of parameters (1-24 months).

Another note: with hypothesis-driven strategy development, the backtest is to be *nothing more than a confirmation of all the hypotheses up to that point*. That is, re-optimizing on the backtest itself means overfitting. Any proposed change to a strategy should be done in the form of tested hypotheses, as opposed to running a bunch of backtests and selecting the best trials. Taken another way, this means that every single proposed element of a strategy needs to have some form of strong hypothesis accompanying it, in order to be justified.

So, here are the two hypotheses I tested on the corresponding mutual funds:

```require(quantmod)
require(PerformanceAnalytics)
require(reshape2)
symbols <- c("CNSAX", "FAHDX", "VUSTX", "VFISX", "PREMX")
getSymbols(symbols, from='1900-01-01')
prices <- list()
for(symbol in symbols) {
}
prices <- do.call(cbind, prices)
colnames(prices) <- substr(colnames(prices), 1, 5)
returns <- na.omit(Return.calculate(prices))

sample <- returns['1997-08/2009-03']
monthRets <- apply.monthly(sample, Return.cumulative)

returnRegression <- function(returns, nMonths) {
nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
nMonthAverage <- na.omit(lag(nMonthAverage))
returns <- returns[index(nMonthAverage)]

rankAvg <- t(apply(nMonthAverage, 1, rank))
rankReturn <- t(apply(returns, 1, rank))

meltedAverage <- melt(data.frame(nMonthAverage))
meltedReturns <- melt(data.frame(returns))
meltedRankAvg <- melt(data.frame(rankAvg))
meltedRankReturn <- melt(data.frame(rankReturn))
lmfit <- lm(meltedReturns\$value ~ meltedAverage\$value - 1)
rankLmfit <- lm(meltedRankReturn\$value ~ meltedRankAvg\$value)
return(rbind(summary(lmfit)\$coefficients, summary(rankLmfit)\$coefficients))
}

pvals <- list()
estimates <- list()
rankPs <- list()
rankEstimates <- list()
for(i in 1:24) {
tmp <- returnRegression(monthRets, nMonths=i)
pvals[[i]] <- tmp[1,4]
estimates[[i]] <- tmp[1,1]
rankPs[[i]] <- tmp[2,4]
rankEstimates[[i]] <- tmp[2,1]
}
pvals <- do.call(c, pvals)
estimates <- do.call(c, estimates)
rankPs <- do.call(c, rankPs)
rankEstimates <- do.call(c, rankEstimates)
```

Essentially, in this case, I take a pooled regression (that is, take the five instruments and pool them together into one giant vector), and regress the cumulative sum of monthly returns against the next month’s return. Also, I do the same thing as the above, except also using cross-sectional ranks for each month, and performing a rank-rank regression. The sample I used was the five mutual funds (CNSAX, FAHDX, VUSTX, VFISX, and PREMX) since their inception to March 2009, since the data for the final ETF begins in April of 2009, so I set aside the ETF data for out-of-sample backtesting.

Here are the results:

```pvals <- list()
estimates <- list()
rankPs <- list()
rankEstimates <- list()
for(i in 1:24) {
tmp <- returnRegression(monthRets, nMonths=i)
pvals[[i]] <- tmp[1,4]
estimates[[i]] <- tmp[1,1]
rankPs[[i]] <- tmp[2,4]
rankEstimates[[i]] <- tmp[2,1]
}
pvals <- do.call(c, pvals)
estimates <- do.call(c, estimates)
rankPs <- do.call(c, rankPs)
rankEstimates <- do.call(c, rankEstimates)

plot(estimates, type='h', xlab = 'Months regressed on', ylab='momentum coefficient',
main='future returns regressed on past momentum')
plot(pvals, type='h', xlab='Months regressed on', ylab='p-value', main='momentum significance')
abline(h=.05, col='green')
abline(h=.1, col='red')

plot(rankEstimates, type='h', xlab='Months regressed on', ylab="Rank coefficient",
main='future return ranks regressed on past momentum ranks', ylim=c(0,3))
plot(rankPs, type='h', xlab='Months regressed on', ylab='P-values')
```    Of interest to note is that while much of the momentum literature specifies a reversion effect on time-series momentum at 12 months or greater, all the regression coefficients in this case (even up to 24 months!) proved to be positive, with the very long-term coefficients possessing more statistical significance than the short-term ones. Nevertheless, Cliff Smith’s chosen parameters (the two and four month settings) possess statistical significance at least at the 10% level. However, if one were to be highly conservative in terms of rejecting strategies, that in and of itself may be reason enough to reject this strategy right here.

However, the rank-rank regression (that is, regressing the future month’s cross-sectional rank on the past n month sum cross sectional rank) proved to be statistically significant beyond any doubt, with all p-values being effectively zero. In short, there is extremely strong evidence for cross-sectional momentum among these five assets, which extends out to at least two years. Furthermore, since SHY or VFISX, aka the short-term treasury fund, is among the assets chosen, since it’s a proxy for the risk-free rate, by including it among the cross-sectional rankings, the cross-sectional rankings also implicitly state that in order to be invested into (as this strategy is a top-1 asset rotation strategy), it must outperform the risk-free asset, otherwise, by process of elimination, the strategy will invest into the risk-free asset itself.

In upcoming posts, I’ll look into testing hypotheses on signals and rules.

Lastly, Volatility Made Simple has just released a blog post on the performance of volatility-based strategies for the month of August. Given the massive volatility spike, the dispersion in performance of strategies is quite interesting. I’m happy that in terms of YTD returns, the modified version of my strategy is among the top 10 for the year.