Trading The Odds Volatility Risk Premium: Addressing Data Mining and Curve-Fitting

Several readers, upon seeing the risk and return ratio along with the other statistics in the previous post, suggested that the results may have been the product of data mining/over-optimization/curve-fitting/overfitting, or some other bad practice of creating an amazing in-sample equity curve whose performance will decay out of sample.

Fortunately, there’s a way to test that assertion. In their book “Trading Systems: A New Approach to System Development and Portfolio Optimization”, Urban Jaekle and Emilio Tomasini use the concept of the “stable region” to demonstrate a way of visualizing whether or not a parameter specification is overfit. The question a stable region answers is: how robust is a parameter specification to slight changes going forward? If the system just happened to find one good point in a sea of losers, the strategy is likely to fail out of sample. However, if small changes to the parameter specifications still result in profitable configurations, then the chosen parameter set lies in a valid region.

As Frank’s trading strategy has only two parameters (the standard deviation computation period, aka runSD for the R function, and the SMA period), rather than make line graphs, I decided to do a brute-force grid search over both parameters and plot the results in the form of heatmaps.

Here’s the modified script for the computations (no parallel syntax in use for the sake of simplicity):

download("https://dl.dropboxusercontent.com/s/jk6der1s5lxtcfy/XIVlong.TXT",
         destfile="longXIV.txt")

download("https://dl.dropboxusercontent.com/s/950x55x7jtm9x2q/VXXlong.TXT", 
         destfile="longVXX.txt") #requires downloader package

xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxmt <- xts(read.zoo("vxmtdailyprices.csv", format="%m/%d/%Y", sep=",", header=TRUE))

getSymbols("^VIX", from="2004-03-29")

vixvxmt <- merge(Cl(VIX), Cl(vxmt))
vixvxmt[is.na(vixvxmt[,2]),2] <- vixvxmt[is.na(vixvxmt[,2]),1]

xivRets <- Return.calculate(Cl(xiv))
vxxRets <- Return.calculate(Cl(vxx))

getSymbols("^GSPC", from="1990-01-01")
spyRets <- diff(log(Cl(GSPC)))

t1 <- Sys.time()
MARmatrix <- list()
SharpeMatrix <- list()
for(i in 2:21) {
  
  smaMAR <- list()
  smaSharpe <- list()
  for(j in 2:21){
    spyVol <- runSD(spyRets, n=i)     #i-day rolling sample standard deviation of SPX returns
    annSpyVol <- spyVol*100*sqrt(252) #annualized and expressed in percentage points, like VXMT
    vols <- merge(vixvxmt[,2], annSpyVol, join='inner')
    
    vols$smaDiff <- SMA(vols[,1] - vols[,2], n=j) #j-day SMA of the implied-minus-realized volatility spread
    vols$signal <- vols$smaDiff > 0
    vols$signal <- lag(vols$signal, k = 1)        #lag the signal one day to avoid look-ahead bias
    
    stratRets <- vols$signal*xivRets + (1-vols$signal)*vxxRets #long XIV when the spread is positive, otherwise VXX
    #charts.PerformanceSummary(stratRets)
    #stratRets[is.na(stratRets)] <- 0
    #plot(log(cumprod(1+stratRets)))
    
    stats <- data.frame(cbind(Return.annualized(stratRets)*100, 
                              maxDrawdown(stratRets)*100, 
                              SharpeRatio.annualized(stratRets)))
    
    colnames(stats) <- c("Annualized Return", "Max Drawdown", "Annualized Sharpe")
    MAR <- as.numeric(stats[1])/as.numeric(stats[2])    
    smaMAR[[j-1]] <- MAR
    smaSharpe[[j-1]] <- stats[,3]
  }
  rm(vols)
  smaMAR <- do.call(c, smaMAR)
  smaSharpe <- do.call(c, smaSharpe)
  MARmatrix[[i-1]] <- smaMAR
  SharpeMatrix[[i-1]] <- smaSharpe
}
t2 <- Sys.time()
print(t2-t1)

Essentially, just wrap the previous script in a nested for loop over the two parameters.
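
Since each cell of the grid is independent of the others, the outer loop is also trivial to parallelize if the brute-force search feels slow. Here is a minimal sketch using the foreach and doParallel packages (my own addition; the sequential script above is what actually produced the results below). Note that it returns a list of small matrices rather than the two lists of vectors used above, so the downstream reshaping would need a slight adjustment.

#a hedged sketch: parallelize the outer loop over the runSD period
#assumes spyRets, vixvxmt, xivRets, and vxxRets exist as in the script above
library(foreach)
library(doParallel)

cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)

gridResults <- foreach(i = 2:21, .packages = c("xts", "TTR", "PerformanceAnalytics")) %dopar% {
  out <- matrix(NA, nrow = 20, ncol = 2,
                dimnames = list(paste0("SMA", 2:21), c("MAR", "Sharpe")))
  for (j in 2:21) {
    spyVol <- runSD(spyRets, n = i)
    annSpyVol <- spyVol * 100 * sqrt(252)
    vols <- merge(vixvxmt[, 2], annSpyVol, join = 'inner')
    vols$smaDiff <- SMA(vols[, 1] - vols[, 2], n = j)
    vols$signal <- lag(vols$smaDiff > 0, k = 1)
    stratRets <- vols$signal * xivRets + (1 - vols$signal) * vxxRets
    out[j - 1, "MAR"] <- as.numeric(Return.annualized(stratRets)) /
      as.numeric(maxDrawdown(stratRets))
    out[j - 1, "Sharpe"] <- as.numeric(SharpeRatio.annualized(stratRets))
  }
  out
}
stopCluster(cl)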

I chose ggplot2 to plot the heatmaps, for more control over the coloring.

Here’s the heatmap for the MAR ratio (that is, returns over max drawdown):

MARmatrix <- do.call(cbind, MARmatrix)
rownames(MARmatrix) <- paste0("SMA", c(2:21))
colnames(MARmatrix) <- paste0("runSD", c(2:21))
MARlong <- melt(MARmatrix)
colnames(MARlong) <- c("SMA", "runSD", "MAR")
MARlong$SMA <- as.numeric(gsub("SMA", "", MARlong$SMA))
MARlong$runSD <- as.numeric(gsub("runSD", "", MARlong$runSD))
MARlong$scaleMAR <- scale(MARlong$MAR)
ggplot(MARlong, aes(x=SMA, y=runSD, fill=scaleMAR))+geom_tile()+scale_fill_gradient2(high="skyblue", mid="blue", low="red")

Here’s the result:

Immediately, we start to see some answers to the question of overfitting. First off, is the parameter set published by TradingTheOdds optimized? Yes. In fact, not only is it optimized, it’s far and away the best value on the heatmap. However, when discussing overfitting, curve-fitting, and the like, the question to ask isn’t “is this the best parameter set available?” but rather “does the parameter set sit in a stable region?” In my opinion, the answer is yes, as shown by the range of SMA values that remain profitable with the 2-day sample standard deviation. Note that, being a sample standard deviation computed over only two observations, this quantity is simply the square root of the sum of the two squared residuals about the two-day mean, which works out to the absolute difference between the two daily returns divided by the square root of 2.
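
As a quick sanity check of that last point (a new snippet, not part of the original replication, using made-up returns), TTR’s runSD with n=2 does indeed reduce to the absolute difference of consecutive returns divided by the square root of 2:

#illustration only: two-day rolling sample standard deviation versus the closed form
library(TTR)

set.seed(42)
fakeRets <- rnorm(10, mean = 0, sd = 0.01) #made-up daily returns

twoDaySD <- runSD(fakeRets, n = 2)         #NA for the first observation
manualSD <- abs(diff(fakeRets))/sqrt(2)

all.equal(as.numeric(twoDaySD[-1]), manualSD) #TRUE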

Here are the MAR values for the 2-day standard deviation across the different SMA configurations:

> MARmatrix[1:10,1]
    SMA2     SMA3     SMA4     SMA5     SMA6     SMA7     SMA8     SMA9    SMA10    SMA11 
2.471094 2.418934 2.067463 3.027450 2.596087 2.209904 2.466055 1.394324 1.860967 1.650588 

In this case, not only is the region stable, but the MAR values are all above 2 through SMA8, before dropping off at SMA9.

Furthermore, note that aside from the stable region around the 2-day sample standard deviation, there is another stable region using a standard deviation of around ten days with less smoothing from the SMA (since the longer standard deviation window already does some averaging of its own). Let’s examine those values.

> MARmatrix[2:5, 9:16]
      runSD10  runSD11  runSD12  runSD13  runSD14  runSD15  runSD16   runSD17
SMA3 1.997457 2.035746 1.807391 1.713263 1.803983 1.994437 1.695406 1.0685859
SMA4 2.167992 2.034468 1.692622 1.778265 1.828703 1.752648 1.558279 1.1782665
SMA5 1.504217 1.757291 1.742978 1.963649 1.923729 1.662687 1.248936 1.0837615
SMA6 1.695616 1.978413 2.004710 1.891676 1.497672 1.471754 1.194853 0.9326545

Apparently, a standard deviation lookback of roughly two to three weeks (10 to 17 trading days) with minimal SMA smoothing also produced results comparable to the 2-day variant.

Off to the northeast of the plot, using longer periods for the parameters simply causes the risk-to-reward performance to drop steeply. This is essentially an illustration of the detriments of lag.

Finally, there’s a small rough patch between the two aforementioned stable regions. Here’s the data for that.

> MARmatrix[1:5, 4:8]
       runSD5    runSD6    runSD7   runSD8   runSD9
SMA2 1.928716 1.5825265 1.6624751 1.033216 1.245461
SMA3 1.528882 1.5257165 1.2348663 1.364103 1.510653
SMA4 1.419722 0.9497827 0.8491229 1.227064 1.396193
SMA5 1.023895 1.0630939 1.3632697 1.547222 1.465033
SMA6 1.128575 1.3793244 1.4085513 1.440324 1.964293

As you can see, there are some patches where the MAR is below 1, and many where it’s below 1.5. All of these are pretty detached from the stable regions.
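
Rather than just eyeballing the heatmap, the “stable region” claim can also be summarized numerically. Here is a small sketch (my own addition, run after the MARmatrix construction above) that reports how much of the full 20x20 grid clears a given MAR hurdle, and how the immediate neighborhood of the best cell behaves:

#a hedged sketch: quantify the grid instead of eyeballing the heatmap
fracAbove1 <- mean(MARmatrix > 1)     #share of all 400 configurations with MAR above 1
fracAbove1.5 <- mean(MARmatrix > 1.5) #share with MAR above 1.5

best <- which(MARmatrix == max(MARmatrix), arr.ind = TRUE) #row = SMA position, col = runSD position

#3x3 neighborhood around the best cell, clipped at the grid edges
rows <- max(1, best[1] - 1):min(nrow(MARmatrix), best[1] + 1)
cols <- max(1, best[2] - 1):min(ncol(MARmatrix), best[2] + 1)
neighborhood <- MARmatrix[rows, cols]

c(fracAbove1 = fracAbove1, fracAbove1.5 = fracAbove1.5,
  bestMAR = max(MARmatrix), neighborhoodMin = min(neighborhood))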

Let’s repeat this process with the Sharpe Ratio heatmap.

SharpeMatrix <- do.call(cbind, SharpeMatrix)
rownames(SharpeMatrix) <- paste0("SMA", c(2:21))
colnames(SharpeMatrix) <- paste0("runSD", c(2:21))
sharpeLong <- melt(SharpeMatrix)
colnames(sharpeLong) <- c("SMA", "runSD", "Sharpe")
sharpeLong$SMA <- as.numeric(gsub("SMA", "", sharpeLong$SMA))
sharpeLong$runSD <- as.numeric(gsub("runSD", "", sharpeLong$runSD))
ggplot(sharpeLong, aes(x=SMA, y=runSD, fill=Sharpe))+geom_tile()+
  scale_fill_gradient2(high="skyblue", mid="blue", low="darkred", midpoint=1.5)

And the result:

Again, the TradingTheOdds parameter configuration lights up, but within a region of strong configurations. This time, we can see that, in comparison to the rest of the heatmap, the northern stable region seems to be clustered around a 10- or 11-day standard deviation with SMAs of 2, 3, and 4. The regions to the northeast are also more subdued by comparison, with the Sharpe ratio bottoming out around 1.

Let’s look at the numerical values again for the same regions.

Two-day standard deviation region:

> SharpeMatrix[1:10,1]
    SMA2     SMA3     SMA4     SMA5     SMA6     SMA7     SMA8     SMA9    SMA10    SMA11 
1.972256 2.210515 2.243040 2.496178 1.975748 1.965730 1.967022 1.510652 1.963970 1.778401 

Again, these are numbers I myself haven’t been able to achieve with more conventional strategies, and numbers I haven’t really seen anywhere else for anything on daily data. So either the strategy is fantastic, or something is terribly wrong outside the scope of the parameter optimization.

Two- to three-week standard deviation region:

> SharpeMatrix[1:5, 9:16]
      runSD10  runSD11  runSD12  runSD13  runSD14  runSD15  runSD16  runSD17
SMA2 1.902430 1.934403 1.687430 1.725751 1.524354 1.683608 1.719378 1.506361
SMA3 1.749710 1.758602 1.560260 1.580278 1.609211 1.722226 1.535830 1.271252
SMA4 1.915628 1.757037 1.560983 1.585787 1.630961 1.512211 1.433255 1.331697
SMA5 1.684540 1.620641 1.607461 1.752090 1.660533 1.500787 1.359043 1.276761
SMA6 1.735760 1.765137 1.788670 1.687369 1.507831 1.481652 1.318751 1.197707

Again, pretty outstanding numbers.

The rough patch:

> SharpeMatrix[1:5, 4:8]
       runSD5   runSD6   runSD7   runSD8   runSD9
SMA2 1.905192 1.650921 1.667556 1.388061 1.454764
SMA3 1.495310 1.399240 1.378993 1.527004 1.661142
SMA4 1.591010 1.109749 1.041914 1.411985 1.538603
SMA5 1.288419 1.277330 1.555817 1.753903 1.685827
SMA6 1.278301 1.390989 1.569666 1.650900 1.777006

All the Sharpe ratios are higher than 1, though some fall below 1.5.

So, to conclude this post:

Was the replication using optimized parameters? Yes. However, those optimized parameters were found within a stable (and even strong) region. Furthermore, it isn’t as though the strategy exhibits poor risk-to-reward metrics outside those regions, either. Short of raising the lookback periods on both the moving average and the standard deviation to levels that no longer resemble the original replication, performance ranged from solid to stellar.

Does this necessarily mean that there is nothing wrong with the strategy? No. It could be that the performance is an artifact of optimistic “observe the close, enter at the close” execution assumptions. For instance, quantstrat (the go-to backtesting engine in R for more trading-oriented statistics) uses a next-bar execution method that defaults to the *next* day’s close (which is why, if you look back over my quantstrat posts, I use prefer="open" to get the open of the next bar instead of its close). It could also be that VXMT itself is an instrument that isn’t very well known in the public sphere, seeing as how Yahoo Finance barely has any data on it. Lastly, it could simply be that although the reward-to-risk ratios seem amazing, many investors, mutual fund managers, and so on probably don’t want to sit through the thought “I’m down 40-60% from my peak.” That said, it’s arguably easier to tame a strategy with a good reward-to-risk ratio but excess risk by adding cash (to use a cooking analogy, think of your favorite spice: good in small quantities) than it is to find leverage for a good reward-to-risk strategy with very small returns, not to mention incurring all the other risks that come with leverage in the first place, such as a 50% drawdown wiping out an account leveraged two to one.
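
One cheap way to probe the execution-assumption concern within this vectorized framework is to delay the trade by one extra day, that is, observe today’s close but only switch positions at tomorrow’s close, and see how much the statistics degrade. Here’s a minimal sketch (my own addition, reusing the objects from the script above and using the 2-day runSD / 5-day SMA configuration purely for illustration):

#a hedged sketch: same-close execution versus next-close execution
spyVol <- runSD(spyRets, n = 2)
annSpyVol <- spyVol*100*sqrt(252)
vols <- merge(vixvxmt[,2], annSpyVol, join='inner')
vols$smaDiff <- SMA(vols[,1] - vols[,2], n = 5)

sameCloseSignal <- lag(vols$smaDiff > 0, k = 1) #the assumption used throughout this post
nextCloseSignal <- lag(vols$smaDiff > 0, k = 2) #one extra day of delay

sameCloseRets <- na.omit(sameCloseSignal*xivRets + (1 - sameCloseSignal)*vxxRets)
nextCloseRets <- na.omit(nextCloseSignal*xivRets + (1 - nextCloseSignal)*vxxRets)

execCompare <- cbind(table.AnnualizedReturns(sameCloseRets),
                     table.AnnualizedReturns(nextCloseRets))
colnames(execCompare) <- c("same close", "next close")
execCompare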

However, to address the question of overfitting, these are the results I found through a modified version of the technique from Jaekle and Tomasini (2009).

Thanks for reading.

Note: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

14 thoughts on “Trading The Odds Volatility Risk Premium: Addressing Data Mining and Curve-Fitting”

  1. Pingback: The Whole Street’s Daily Wrap for 11/19/2014 | The Whole Street

  2. Ilya, great post!
    Just a few thoughts regarding optimization (not in a general sense, only in the special case of this VIX strategy).
    I agree with you that the posting from [“tradingtheodds”](http://www.tradingtheodds.com/2014/11/ddns-volatility-risk-premium-strategy-revisited-2/) might be overly optimistic, but I think this is not the point. I have traded a variation of the strategy for more than a year, and you can play around (and download the file) by pointing your browser to [my app](https://alphaminer.shinyapps.io/VolaStrat/). The strategy employs a SINGLE variable (the SMA of a ratio of 2 points on the term structure). The strategy is robust over the whole range; even if you wanted to, there is almost no way to get a mediocre performance! This gets me to my point: there must be something other than simple optimization in play regarding the many VIX (VXX/XIV) strategies, all of them with excellent backtested results, that pop up all over the investing/quant blogs.
    To put the results in perspective: out of the 10 years charted, because of available price history, understanding of the products, and liquidity, in reality only the last 2 or 3 could have been traded in real time (and the returns are probably coming down already).

    I think the reason why this strategy has worked so well in the past is that this is not a kind of inefficiency of the markets but a real risk premium! Future returns will depend on the shrinkage/expansion of this premium. If the cake is gone (or the pie much smaller, so to speak), it’s gone! No clever optimization will change this fact. To “harvest” the premium, I think the simplest strategies are the most efficient, because by adding too many variables and making it overly complicated you risk not getting a piece of the cake even while it is there :-).
    A side note concerning the entry: an entry on the NEXT close is even more profitable than entering on the close, as you can check with [the app](https://alphaminer.shinyapps.io/VolaStrat/) (there is some short-term mean-reversion).

  3. Nice validation of parameter sensitivity! I’ve done this in my own work (brute force loop through the parameters) and some strategies do not hold up. I appreciate the thoroughness. Thanks!

  4. Ilya, I will email you this weekend as I am quite busy today. In the meantime, you can try to find out the answer yourself; if you read my comment carefully and download the file, it’s not too difficult to find out :-)

  5. It’s dubious to say “optimized parameters were found within a stable (and even strong) region.” The region may not be stable and the peaks may not even be statistically significant.

    You are looking at cross-sectional stability. But remember that strategies that are close together in your heatmap are very highly correlated since they have many trades in common. So if one square has high returns the squares around it will too. That does NOT make for stability. The high returns in a region may be due to a lucky couple of months – that is not stability. Also you omitted a bootstrap analysis which will determine if the peaks in the heatmap are statistically significant or not.

    You need to look at time series stability – do the same parameters produce the best returns over all periods of time? If they do THEN you have robustness.

    Finally, what is runSD for n=2? It’s just the difference between yesterday’s and today’s returns. That’s an extremely noisy quantity. Trading on that is out of my realm of expertise because it is subject to market microstructure effects, trading hours effects, and is pretty close to day trading. Then there are market frictions on top of that. I doubt that any of the gains are due to roll yield or the VRP or anything discussed in my paper. It’s more to do with S&P 500 serial correlation effects. If you want to make money from those, there are (I’m guessing) possibly better ways of doing that.

    • Tony,

      I believe this is why Frank used a moving average for this quantity. As you can see on the heatmap, as you move to the left of SMA5, the MAR decreases (though it’s still strong).

      With bootstrapping, do you mean simply to do a random repeated drawing of the squares on the heatmap?

      Regarding time series stability, that’s actually something I don’t remember Jaekle and Tomasini formally specifying. But come to think of it, that’s a very good point. I haven’t considered it, since I generally don’t like to over-optimize my parameters, but simply go with round numbers (e.g., a short-term MA crossover would be, say, a 10/50, while a medium-term one might be a 50/100, even though I know for certain those are probably not the best values to use).

      Come to think of it, I really, *really* like the idea of the time-series returns (or MAR, or whatever else) comparison. What would you say would be the proper method of looking at that? Monthly-aggregated cross-sectional ranks?

  6. Pingback: A New Volatility Strategy, And A Heuristic For Analyzing Robustness | QuantStrat TradeR

  7. Pingback: Backtesting Introduction - Bespoke Options | Bespoke Options

  8. Pingback: Robustness Testing – Volatility Strategy – Time Series Bootstrapping – Quantitative Analysis And Back Testing
