An Introduction to Change Points (packages: ecp and BreakoutDetection)

A forewarning, this post is me going out on a limb, to say the least. In fact, it’s a post/project requested from me by Brian Peterson, and it follows a new paper that he’s written on how to thoroughly replicate research papers. While I’ve replicated results from papers before (with FAA and EAA, for instance), this is a first for me in terms of what I’ll be doing here.

In essence, it is a thorough investigation into the paper “Leveraging Cloud Data to Mitigate User Experience from ‘Breaking Bad’”, and follows the process from the aforementioned paper. So, here we go.


Twitter Breakout Detection Package
Leveraging Cloud Data to Mitigate User Experience From ‘Breaking Bad’

Summary of Paper

Introduction: in a paper detailing the foundation of the breakout detection package (arXiv ID 1411.7955v1), James, Kejariwal, and Matteson demonstrate an algorithm that detects breakouts in twitter’s production-level cloud data. The paper begins by laying the mathematical foundation and motivation for energy statistics, the permutation test, and the E-divisive with medians algorithm, which create a fast way of detecting a shift in median between two nonparametric distributions that is robust to the presence of anomalies. Next, the paper demonstrates a trial run through some of twitter’s production cloud data, and compares the non-parametric E-divisive with medians to an algorithm called PELT. For the third topic, the paper discusses potential applications, one of which is quantitative trading/computational finance. Lastly, the paper states its conclusion, which is the addition of the E-divisive with medians algorithm to the existing literature of change point detection methodologies.

The quantitative and computational methodologies for the paper use a modified variant of energy statistics more resilient against anomalies through the use of robust statistics (viz. median). The idea of energy statistics is to compare the distances of means of two random variables contained within a larger time series. The hypothesis test to determine if this difference is statistically significant is called the permutation test, which permutes data from the two time series a finite number of times to make the process of comparing permuted time series computationally tractable. However, the presence of anomalies, such as in twitter’s production cloud data, would limit the effectiveness of using this process when using simple means. To that end, the paper proposes using the median, and due to the additional computational time resulting from the weaker distribution assumptions to extend the generality of the procedure, the paper devises the E-divisive with medians algorithms, one of which works off of distances between observations, and one works with the medians of the observations themselves (as far as I understand). To summarize, the E-divisive with medians algorithms exist as a way of creating a computationally tractable procedure for determining whether or not a new chunk of time series data is considerably different from the previous through the use of advanced distance statistics robust to anomalies such as those present in twitter’s cloud data.

To compare the performance of the E-divisive with medians algorithms, the paper compares the algorithms to an existing algorithm called PELT (which stands for Pruned Extract Linear Time) in various quantitative metrics, such as “Time To Detect”, meaning the exact moment of the breakout to when the algorithms report it (if at all), along with precision, recall, and the F-measure, defined as the product of precision and recall over their respective sum. Comparing PELT to the E-divisive with medians algorithm showed that the E-divisive algorithm outperformed the PELT algorithm in the majority of data sets. Even when anomalies were either smoothed by taking the rolling median of their neighbors, or by removing them altogether, the E-divisive algorithm still outperformed PELT. Of the variants of the EDM algorithm (EDM head, EDM tail, and EDM-exact), the EDM-tail variant (i.e. the one using the most recent observations) was also quickest to execute. However, due to fewer assumptions about the nature of the underlying generating distributions, the various E-divisive algorithms take longer to execute than the PELT algorithm, with its stronger assumptions, but worse general performance. To summarize, the EDM algorithms outperform PELT in the presence of anomalies, and generally speaking, the EDM-tail variant seems to work best when considering computational running time as well.

The next section dealt with the history and applications of change-point/breakout detection algorithms, in fields such as finance, medical applications, and signal processing. As finance is of a particular interest, the paper acknowledges the ARCH and various flavors of GARCH models, along with the work of James and Matteson in devising a trading strategy based on change-point detection. Applications in genomics to detect cancer exist as well. In any case, the paper cites many sources showing the extension and applications of change-point/breakout detection algorithms, of which finance is one area, especially through work done by Matteson. This will be covered further in the literature review.

To conclude, the paper proposes a new algorithm called the E-divisive with medians, complete with a new statistical permutation test using advanced distance statistics to determine whether or not a time series has had a change in its median. This method makes fewer assumptions about the nature of the underlying distribution than a competitive algorithm, and is robust in the face of anomalies, such as those found in twitter’s production cloud data. This algorithm outperforms a competing algorithm which possessed stronger assumptions about the underlying distribution, detecting a breakout sooner in a time series, even if it took longer to run. The applications of such work range from finance to medical devices, and further beyond. As change-point detection is a technique around which trading strategies can be constructed, it has particular relevance to trading applications.

Statement of Hypothesis

Breakouts can occur in data which does not conform to any known regular distribution, thus rendering techniques that assume a certain distribution less effective. Using the E-divisive with medians algorithm, the paper attempts to predict the presence of breakouts using time series with innovations from no regular distribution as inputs, and if effective, will outperform an existing algorithm that possesses stronger assumptions about distributions. To validate or refute a more general form of this hypothesis, which is the ability of the algorithm to detect breakouts in a timely fashion, this summary test it on the cumulative squared returns of the S&P 500, and compare the analysis created by the breakpoints to the analysis performed by Dr. Robert J. Frey of Keplerian Finance, a former managing director at Renaissance Technologies.

Literature Review


A good portion of the practical/applied motivation of this paper stems from the explosion of growth in mobile internet applications, A/B testing, and other web-specific reasons to detect breakouts. For instance, longer loading time on a mobile web page necessarily results in lower revenues. To give another example, machines in the cloud regularly fail.

However, the more salient literature regarding the topic is the literature dealing with the foundations of the mathematical ideas behind the paper.

Key References

Paper 1:

David S. Matteson and Nicholas A. James. A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109(505):334–345, 2013.

Thesis of work: this paper is the original paper for the e-divisive and e-agglomerative algorithms, which are offline, nonparametric methods of detecting change points in time series. Unlike Paper 3, this paper lays out the mathematical assumptions, lemmas, and proofs for a formal and mathematical presentation of the algorithms. Also, it documents performance against the PELT algorithm, presented in Paper 6 and technically documented in Paper 5. This performance compares favorably. The source paper being replicated builds on the exact mathematics presented in this paper, and the subject of this report uses the ecp R package that is the actual implementation/replication of this work to form a comparison for its own innovations.

Paper 2:

M. L. Rizzo and G. J. Sz´ekely. DISCO analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics, 4(2):1034–1055, 2010

Thesis of work: this paper generalizes the ANOVA using distance statistics. This technique aims to find differences among distributions outside their sample means. Through the use of distance statistics, the techniques aim to more generally answer queries about the nature of distributions (EG identical means, but different distributions as a result of different factors). Its applicability to the source paper is that it forms the basis of the ideas for the paper’s divergence measure, as detailed in its second section.

Paper 3:

Nicholas A. James and David S. Matteson. ecp: An R package for nonparametric multiple change point analysis of multivariate data. Technical report, Cornell University, 2013.

Thesis of work: the paper introduces the ecp package which contains the e-agglomerative and e-divisive algorithms for detecting change points in time series in the R statistical programming language (in use on at least one elite trading desk). The e-divisive method recursively partitions a time series and uses a permutation test to determine change points, but it is computationally intensive. The e-agglomerative algorithm allows for inputs from the user for initial time-series segmentation and is a computationally faster algorithm. Unlike most academic papers, this paper also includes examples of data and code in order to facilitate the use of these algorithms. Furthermore, the paper includes applications to real data, such as the companies found in the Dow Jones Industrial Index, further proving the effectiveness of these methods. This paper is important to the topic in question as the E-divisive algorithm created by James and Matteson form the base changepoint detection process for which the paper builds its own innovations for, and visually compares against; furthermore, the source paper restates many of the techniques found in this paper.

Paper 4:

Owen Vallis, Jordan Hochenbaum, and Arun Kejariwal. A novel technique for long-term anomaly detection in the cloud. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14), June 2014.

Thesis of work: the paper proposes the use of piecewise median and median absolute deviation statistics to detect anomalies in time series. The technique builds upon the ESD (Extreme Studentized Deviate) technique and uses piecewise medians to approximate a long-term trend, before extracting seasonality effects from periods shorter than two weeks. The piecewise median method of anomaly detection has a greater F-measure of detecting anomalies than does the standard STL (seasonality trend loess decomposition) or quantile regression techniques. Furthermore, piecewise median executes more than three times faster. The relevance of this paper to the source paper is that it forms the idea of using robust statistics and building the techniques in the paper upon the median as opposed to the mean.

Paper 5:

Rebecca Killick and Kaylea Haynes. changepoint: An R package for changepoint analysis

Thesis of work: manual for the implementation of the PELT algorithm written by Rebecca Killick and Kaylea Haynes. This package is a competing change-point detection package, mainly focused around the Pruned Extraction Linear Time algorithm, although containing other worse algorithms, such as the segment neighborhoods algorithm. Essentially, it is a computational implementation of the work in Paper 2. Its application toward the source paper is that the paper at hand compares its own methodology against PELT, and often outperforms it.

Paper 6:

Rebecca Killick, Paul Fearnhead, and IA Eckley. Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500):1590–1598, 2012

Thesis of work: the paper proposes an algorithm (PELT) that scales linearly in running time with the size of the input time series to detect exact locations of change points. The paper aims to replace both an approximate binary partitioning algorithm, and an optimal segmentation algorithm that doesn’t involve a pruning mechanism to speed up the running time. The paper uses an MLE algorithm at the heart of its dynamic partitioning in order to locate change points. The relevance to the source paper is that through the use of the non-robust MLE procedure, this algorithm is vulnerable to poor performance due to the presence of anomalies/outliers in the data, and thus underperforms the new twitter change point detection methodology which employs robust statistics.

Paper 7:

Wassily Hoeffding. The strong law of large numbers for u-statistics. Institute of Statistics mimeo series, 302, 1961.

Thesis of work: this paper establishes a convergence of the mean of tuples of many random variables to the mean of said random variables, given enough such observations. This paper is a theoretical primer on establishing the above thesis. The mathematics involve use of measure theory and other highly advanced and theoretical manipulations. Its relevance to the source paper is in its use to establish a convergence of an estimated characteristic function.

Similar Work

In terms of financial applications, the papers covering direct applications of change points to financial time series are listed above. Particularly, David Matteson presented his ecp algorithms at R/Finance several years ago, and his work is already in use on at least one professional trading desk. Beyond this, the paper cites works on technical analysis and the classic ARCH and GARCH papers as similar work. However, as this change point algorithm is created to be a batch process, direct comparison with other trend-following (that is, breakout) methods would seem to be a case of apples and oranges, as indicators such as MACD, Donchian channels, and so on, are online methods (meaning they do not have access to the full data set like the e-divisive and the e-divisive with medians algorithms do). However, they are parameterized in terms of their lookback period, and are thus prone to error in terms of inaccurate parameterization resulting from a static lookback value.

In his book Cycle Analytics for Traders, Dr. John Ehlers details an algorithm for computing the dominant cycle of a security—that is, a way to dynamically parameterize the lookback parameter, and if this were to be successfully implemented in R, it may very well allow for improved breakout detection methods than the classic parameterized indicators popularized in the last century.

References With Implementation Hints

Reference 1: Breakout Detection In The Wild

This blog post is a reference contains the actual example included in the R package for the model, written by one of the authors of the source paper. As the data used in the source paper is proprietary twitter production data, and the model is already implemented in the package discussed in this blog post, this makes the package and the included data the go-to source for starting to work with the results presented in the source paper.

Reference 2: Twitter BreakoutDetection R package evaluation

This blog post is that of a blogger altering the default parameters in the model. His analysis of traffic to his blog contains valuable information as to greater flexibility in the use of the R package that is the implementation of the source paper.


The data contained in the source paper comes from proprietary twitter cloud production data. Thus, it is not realistic to obtain a copy of that particular data set. However, one of the source paper’s co-authors, Arun Kejariwal, was so kind as to provide a tutorial, complete with code and sample data, for users to replicate at their convenience. It is this data that we will use for replication.

Building The Model

Stemming from the above, we are fortunate that the results of the source paper have already been implemented in twitter’s released R package, BreakoutDetection. This package has been written by Nicholas A. James, a PhD candidate at Cornell University studying under Dr. David S. Matteson. His page is located here.

In short, all that needs to be done on this end is to apply the model to the aforementioned data.

Validate the Results

To validate the results—that is, to obtain the same results as one of the source paper’s authors, we will execute the code on the data that he posted on his blog post (see Reference 1).

install_github(repo="BreakoutDetection", username="twitter")

res = breakout(Scribe, min.size=24, method='multi', beta=.001, degree=1, plot=TRUE)

This is the resulting image, identical from the blog post.

Validation of the Hypothesis

This validation was inspired by the following post:

The Relevance of History

The post was written by Dr. Robert J. Frey, professor of Applied Math and Statistics at Stony Brook University, the head of its Quantitative Finance program, and former managing director at Renaissance Technologies (yes, the Renaissance Technologies founded by Dr. Jim Simons). While the blog is inactive at the moment, I sincerely hope it will become more active again.

Essentially, it uses mathematica to detect changes in the slope of cumulative squared returns, and the final result is a map of spikes, mountains, and plains, the x-axis being time, and the y-axis the annualized standard deviation. Using the more formalized e-divisive and e-divisive with medians algorithms, this analysis will attempt to detect change points, and use the PerformanceAnalytics library to compute the annualized standard deviation from the data of the GSPC returns itself, and output a similarly-formatted plot.

Here’s the code:


getSymbols("^GSPC", from = "1984-12-25", to = "2013-05-31")
monthlyEp <- endpoints(GSPC, on = "months")
GSPCmoCl <- Cl(GSPC)[monthlyEp,]
GSPCmoRets <- Return.calculate(GSPCmoCl)
GSPCsqRets <- GSPCmoRets*GSPCmoRets
GSPCsqRets <- GSPCsqRets[-1,] #remove first NA as a result of return computation
GSPCcumSqRets <- cumsum(GSPCsqRets)

This results in the following image:

So far, so good. Let’s now try to find the amount of changepoints that Dr. Frey’s graph alludes to.

t1 <- Sys.time()
ECPmonthRes <- e.divisive(X = GSPCsqRets, min.size = 2)
t2 <- Sys.time()
print(t2 - t1)

t1 <- Sys.time()
BDmonthRes <- breakout(Z = GSPCsqRets, min.size = 2, beta=0, degree=1)
t2 <- Sys.time()
print(t2 - t1)


With the following results:

> ECPmonthRes$estimates
[1]   1 285 293 342
> BDres$loc
[1] 47 87

In short, two changepoints for each. Far from the 20 or so regimes present in Dr. Frey’s analysis. So, not close to anything that was expected. My intuition tells me that the main reason for this is that these algorithms are data-hungry, and there is too little data for them to do much more than what they have done thus far. So let’s go the other way and use daily data.

dailySqRets <- Return.calculate(Cl(GSPC))*Return.calculate(Cl(GSPC))
dailySqRets <- dailySqRets["1985::"]


And here’s the new plot:

First, let’s try the e-divisive algorithm from the ecp package to find our changepoints, with a minimum size of 20 days between regimes. (Blog note: this is a process that takes an exceptionally long time. For me, it took more than 2 hours.)

t1 <- Sys.time()
ECPres <- e.divisive(X = dailySqRets, min.size=20)
t2 <- Sys.time()
print(t2 - t1)
Time difference of 2.214813 hours

With the following results:

 [1] "1985-01-02" "1987-10-14" "1987-11-11" "1998-07-21" "2002-07-01" "2003-07-28" "2008-09-15" "2008-12-09"
 [9] "2009-06-02" NA   

The first and last are merely the endpoints of the data. So essentially, it encapsulates Black Monday and the crisis, among other things. Let’s look at how the algorithm split the volatility regimes. For this, we will use the xtsExtra package for its plotting functionality (thanks to Ross Bennett for the work he did in implementing it).

xtsExtra::addLines(index(dailySqRets)[ECPres$estimates[-c(1, length(ECPres$estimates))]], on = 1, col = "blue", lwd = 2)

With the resulting plot:

In this case, the e-divisive algorithm from the ecp package does a pretty great job segmenting the various volatility regimes, as can be thought of roughly as the slope of the cumulative squared returns. The algorithm’s ability to accurately cluster the Black Monday events, along with the financial crisis, shows its industrial-strength applicability. How does this look on the price graph?

xtsExtra::addLines(index(dailySqRets)[ECPres$estimates[-c(1, length(ECPres$estimates))]], on = 1, col = "blue", lwd = 2)

In this case, Black Monday is clearly visible, along with the end of the Clinton bull run through the dot-com bust, the consolidation, the run-up to the crisis, the crisis itself, the consolidation, and the new bull market.

Note that the presence of a new volatility regime may not necessarily signify a market top or bottom, but the volatility regime detection seems to have worked very well in this case.

For comparison, let’s examine the e-divisive with medians algorithm.

t1 <- Sys.time()
BDres <- breakout(Z = dailySqRets, min.size = 20, beta=0, degree=1)
t2 <- Sys.time()


With the following result:

Time difference of 2.900167 secs
> BDres$loc
[1] 5978
> BDres$loc
[1] 5978
> index(dailySqRets)[BDres$loc]
[1] "2008-09-12"

So while the algorithm is a lot faster, its volatility regime detection, it only sees the crisis as the one major change point. Beyond that, to my understanding, the e-divisive with medians algorithm may be “too robust” (even without any penalization) against anomalies (after all, the median is robust to changes in 50% of the data). In short, I think that while it clearly has applications, such as twitter cloud production data, it doesn’t seem to obtain a result that’s in the ballpark of two other separate procedures.

Lastly, let’s try and create a plot similar to Dr. Frey’s, with spikes, mountains, and plains.

GSPCrets <- Return.calculate(Cl(GSPC))
GSPCrets <- GSPCrets["1985::"]
GSPCrets$regime <- ECPres$cluster
GSPCrets$annVol <- NA

for(i in unique(ECPres$cluster)) {
  regime <- GSPCrets[GSPCrets$regime==i,]
  annVol <- StdDev.annualized(regime[,1])
  GSPCrets$annVol[GSPCrets$regime==i,] <- annVol

plot(GSPCrets$annVol, ylim=c(0, max(GSPCrets$annVol)), main="GSPC volatility regimes, 1985 to 2013-05")

With the corresponding image, inspired by Dr. Robert Frey:

This concludes the research replication.


Whew. Done. While I gained some understanding of what change points are useful for, I won’t profess to be an expert on them (some of the math involved uses PhD-level mathematics such as characteristic functions that I never learned). However, it was definitely interesting pulling together several different ideas and uniting them under a rigorous process.

Special thanks for this blog post:

Brian Peterson, for the process paper and putting a formal structure to the research replication process (and requesting this post).
Robert J. Frey, for the “volatility landscape” idea that I could objectively point to as an objective benchmark to validate the hypothesis of the paper.
David S. Matteson, for the ecp package.
Nicholas A. James, for the work done in the BreakoutDetection package (and clarifying some of its functionality for me).
Arun Kejariwal, for the tutorial on using the BreakoutDetection package.

Thanks for reading.

NOTE: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

An Update On EAA and a Volatility Strategy

Again, before starting this post, I’d like to inform readers that the book Quantitative Trading With R, written by Harry Georgakopoulos, with contributions from myself, is now available for order on Amazon. Already, it has garnered a pair of five-star reviews, and it deals not only with quantstrat, but with aspects such as spread trading, high frequency data, and options. I highly recommend it.

So, first things first, I want to inform everyone that EAA (that is, Elastic Asset Allocation, the new algorithm recently released by Dr. Wouter Keller a couple of weeks ago) is now in my IKTrading package. I made some modifications to deal with incongruous security starting dates (that is, handled NA momentum, and so on, similarly to the process in FAA). Again, no particular guarantees, but at this point, I think the algorithm won’t regularly break (but I may be missing some edge case, so feedback is always appreciated). Also, after thinking about it a bit more, I don’t foresee EAA as it stands being able to make use of a conditional correlation algorithm, since rather than using correlation simply for security selection, it uses correlations to make weighting decisions, which raises the question of what the correlation value of the first security would be. 0? -1? Ideas on how to address this are always welcome, since applying conditional correlation outside of a ranking context is a topic now of interest to me.

Furthermore, TrendXplorer has recently posted his own post on EAA after seeing mine on his blog. It is *very* comprehensive, and for those that are more inclined towards AmiBroker, you’ll be in Nirvana. It can be found here. Also, it seems he has done some work with another SeekingAlpha contributor named Cliff Smith (and seems to have worked hand in hand with him), and thus, had a far more positive experience than I did going solo replicating Harry Long’s strategies (or, if some of you may like, marketing materials). TrendXplorer did some work with a strategy called QTS, which I hope I’ll be able to cover in the near future. That can all be found here. So, I’d like to formally extend thanks to TrendXplorer for the work he has done with both EAA, and also pointing me towards yet another viable asset allocation strategy.

In terms of my own updated EAA, to test it out, I added Tesla Motors to the original seven securities. So let’s look at the as-of-now-current EAA.

"EAA" <- function(monthlyPrices, wR=1, wV=0, wC=.5, wS=2, errorJitter=1e-6, 
                cashAsset=NULL, bestN=1+ceiling(sqrt(ncol(monthlyPrices))),
                enableCrashProtection = TRUE, returnWeights=FALSE, monthlyRiskFree=NULL) {
  returns <- Return.calculate(monthlyPrices)
  returns <- returns[-1,] #return calculation uses one observation
  if(!is.null(monthlyRiskFree)) {
    returnsRF <- Return.calculate(monthlyRiskFree)
    returnsRF <- returnsRF[-1,]
  if(is.null(cashAsset)) {
    returns$zeroes <- 0
    cashAsset <- "zeroes"
    warning("No cash security specified. Recommended to use one of: quandClean('CHRIS/CME_US'), SHY, or VFISX. 
            Using vector of zeroes instead.")
  cashCol <- grep(cashAsset, colnames(returns))
  weights <- list()
  for(i in 1:(nrow(returns)-11)) {
    returnsData <- returns[i:(i+11),] #each chunk will be 12 months of returns data
    #per-month mean of cumulative returns of 1, 3, 6, and 12 month periods
    periodReturn <- ((returnsData[12,] + Return.cumulative(returnsData[10:12,]) + 
                      Return.cumulative(returnsData[7:12,]) + Return.cumulative(returnsData)))/22
    if(!is.null(monthlyRiskFree)) {
      rfData <- returnsRF[i:(i+11),]
      rfReturn <- ((rfData[12,] + Return.cumulative(rfData[10:12,]) + 
                    Return.cumulative(rfData[7:12,]) + Return.cumulative(rfData)))/22
      periodReturn <- periodReturn - as.numeric(rfReturn)
    vols <- StdDev.annualized(returnsData) 
    mktIndex <- xts(rowMeans(returnsData, na.rm=TRUE), #equal weight returns of universe
    cors <- cor(returnsData, mktIndex) #correlations to market index
    weightedRets <- periodReturn ^ wR
    weightedCors <- (1 - as.numeric(cors)) ^ wC
    weightedVols <- (vols + errorJitter) ^ wV
    wS <- wS + errorJitter
    z <- (weightedRets * weightedCors / weightedVols) ^ wS #compute z_i and zero out negative returns
    z[periodReturn < 0] <- 0
    crashProtection <- sum(z==0, na.rm=TRUE)/sum(! #compute crash protection cash cushion
    orderedZ <- sort(as.numeric(z), decreasing=TRUE)
    selectedSecurities <- z >= orderedZ[bestN]
    preNormalizedWeights <- z*selectedSecurities #select top N securities, keeping z_i scores
    periodWeights <- preNormalizedWeights/sum(preNormalizedWeights, na.rm=TRUE) #normalize
    if (enableCrashProtection) {
      periodWeights <- periodWeights * (1-crashProtection) #CP rule
    periodWeights[] <- 0
    weights[[i]] <- periodWeights
  weights <-, weights)
  weights[, cashCol] <- weights[, cashCol] + 1-rowSums(weights) #add to risk-free asset all non-invested weight
  strategyReturns <- Return.rebalancing(R = returns, weights = weights) #compute strategy returns
  if(returnWeights) {
    return(list(weights, strategyReturns))
  } else {

Essentially, little changed aside from some lines dealing with NAs (AKA securities that were not yet around at the time whose prices are given as NAs).

To test out whether the algorithm worked, I added TSLA to see if it didn’t break the code. Here is the new test code.


symbols <- c("VTSMX", "FDIVX", "VEIEX", "VBMFX", "VFISX", "VGSIX", "QRAAX", "TSLA")

getSymbols(symbols, from="1990-01-01")
prices <- list()
for(i in 1:length(symbols)) {
  prices[[i]] <- Ad(get(symbols[i]))  
prices <-, prices)
colnames(prices) <- gsub("\\.[A-z]*", "", colnames(prices))
ep <- endpoints(prices, "months")
prices <- prices[ep,]
prices <- prices["1997-03::"]

getSymbols("^IRX", from="1990-01-01")
dailyYield <- (1+(Cl(IRX)/100))^(1/252) - 1
threeMoPrice <- cumprod(1+dailyYield)
threeMoPrice <- threeMoPrice["1997-03::"]
threeMoPrice <- threeMoPrice[endpoints(threeMoPrice, "months"),]

offensive <- EAA(prices, cashAsset="VBMFX", bestN=3)
defensive <- EAA(prices, cashAsset="VBMFX", bestN=3, wS=.5, wC=1)
offRF <- EAA(prices, cashAsset="VBMFX", bestN=3, monthlyRiskFree = threeMoPrice)
defRF <- EAA(prices, cashAsset="VBMFX", bestN=3, wS=.5, wC=1, monthlyRiskFree = threeMoPrice)
compare <- cbind(offensive, defensive, offRF, defRF)
colnames(compare) <- c("Offensive", "Defensive", "OffRF", "DefRF")
stats <- rbind(Return.annualized(compare)*100, StdDev.annualized(compare)*100, maxDrawdown(compare)*100, SharpeRatio.annualized(compare))
rownames(stats)[3] <- "Worst Drawdown"

With the following statistics table and equity curve:

> stats
                                 Offensive Defensive      OffRF     DefRF
Annualized Return               17.6174693 13.805683 16.7376777 13.709368
Annualized Standard Deviation   22.7328695 13.765444 22.3854966 13.504313
Worst Drawdown                  25.3534015 12.135310 25.3559118 12.146654
Annualized Sharpe Ratio (Rf=0%)  0.7749778  1.002923  0.7477019  1.015184

Essentially, TSLA — a high momentum, high-volatility stock causes some consternation in the offensive variant of the algorithm. Let’s look at the weight statistics of TSLA when it was in the portfolio.

test <- EAA(prices, cashAsset = "VBMFX", bestN=3, returnWeights=TRUE)
weights <- test[[1]]
summary(weights$TSLA[weights$TSLA > 0])

With the results:

    Index                 TSLA        
 Min.   :2011-07-29   Min.   :0.01614  
 1st Qu.:2012-09-14   1st Qu.:0.32345  
 Median :2013-07-31   Median :0.48542  
 Mean   :2013-06-20   Mean   :0.51415  
 3rd Qu.:2014-04-15   3rd Qu.:0.75631  
 Max.   :2014-12-31   Max.   :0.95793  

Also, to be clear, R’s summary function was not created with xts type objects in mind, so the Index statistics are just pure nonsense (R is trying to do summary statistics on the underlying numerical values of the date index — they have no relation to the TSLA weights), so if you ever call summary on anything in an xts, be aware that it isn’t actually providing you the dates of the corresponding weights (if they exist at all — E.G. the mean of the weights isn’t an actual weight at any point in time).

In any case, it seems that the offensive variant of the algorithm is susceptible to creating portfolios that are very poorly diversified, since the offensive variant doesn’t place any weight on security volatility–simply correlation. So if there was a very volatile instrument that was on a roaring trend, EAA would tell you to just place your entire portfolio in that one instrument–which of course, can be the correct thing to do if you know for certain that said trend will continue, until, of course, it doesn’t.

I’m sure there are still some methods to account for instruments of wildly different risk/return profiles, even without the need of additional code, by varying the parameters. I just wanted to demonstrate the need to be aware of this phenomenon, which I happened upon simply by testing the portfolio for incongruous starting dates and just so happened to pick a “hot topic” stock.

Last (for this post), I’d like to make readers aware that the blogger Volatility Made Simple has created a variant of a strategy I had written about earlier (again, thanks to Mr. Helmuth Vollmeier for providing the initial foundation), in which he mixed signals from the three variants I had found to be in stable regions, and I’m really happy he has done so, as he’s one of the first people who have explicitly extended my work.

Unfortunately, said strategy is currently in drawdown. However, looking at its drawdown curve against that of XIV itself, it seems that volatility has been doing crazy things lately, and the drawdown has been worse in the past. I am concerned, however, that it may be a strategy prone to overfitting, and it’s a constant reminder that there is still more to learn, and more techniques to use to convince oneself that a backtest isn’t just an overfit, data-mined, sample-dependent illusion with good marketing that will break down immediately upon looking at a larger sample. However, as I did not originate the strategy myself, I’d at least like to hope that whoever was the first person who came up with the VXV/VXMT ratio idea had some good rationale for the strategy to begin with.

In the immediate future, I’ll be looking into change point analysis and twitter’s new breakout detection package.

Thanks for reading.

NOTE: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

An Update to the Robustness Heuristic and a Variation of a Volatility Strategy

So, before revealing a slight wrinkle on the last strategy I wrote about, I’d like to clear up a bit of confusion regarding Jaekle and Tomasini’s idea of a stable region.

Essentially, the entire idea *is* that similar parameter configurations behave in very similar ways, and so, are supposed to be highly correlated. It does not mean the strategy may not be overfit in other ways, but that incremental changes to a parameter should mean incremental changes to performance, rather than seeing some sort of lucky spike in a sea of poor performance.

In any case, the one change to the strategy from last week is that rather than get in at the current close (aka observe close, execute at close), to get in at the next day’s close.

Again, here’s the strategy script:



vxv <- xts(read.zoo("vxvData.csv", header=TRUE, sep=",", format="%m/%d/%Y", skip=2))
vxmt <- xts(read.zoo("vxmtData.csv", header=TRUE, sep=",", format="%m/%d/%Y", skip=2))
ratio <- Cl(vxv)/Cl(vxmt)


         destfile="longVXX.txt") #requires downloader package

xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))

xiv <- merge(xiv, ratio, join='inner')
vxx <- merge(vxx, ratio, join='inner')
colnames(xiv)[5] <- colnames(vxx)[5] <- "ratio"

xivRets <- Return.calculate(Cl(xiv))
vxxRets <- Return.calculate(Cl(vxx))

retsList <- list()
count <- 1
for(i in 10:200) {
  ratioSMA <- SMA(ratio, n=i)
  vxxSig <- lag(ratio > 1 & ratio > ratioSMA, 2)
  xivSig <- lag(ratio < 1 & ratio < ratioSMA, 2)
  rets <- vxxSig*vxxRets + xivSig*xivRets
  colnames(rets) <- i
  retsList[[i]]  <- rets
  count <- count+1  
retsList <-, retsList)
colnames(retsList) <- gsub("X", "", colnames(retsList))
retsList <- retsList[![,191]),]
retsList <- retsList[-1,]

The one change I made is that rather than go with the default lag value, I went with a lag of 2. A lag of zero induces look-ahead bias. In any case, let’s run through the process again of analyzing for robustness.

rankComparison <- function(rets, perfAfun="Return.cumulative") {
  fun <-
  monthlyFun <- apply.monthly(rets, fun)
  monthlyRank <- t(apply(monthlyFun, MARGIN=1, FUN=rank))
  meanMonthlyRank <- apply(monthlyRank, MARGIN=2, FUN=mean)
  rankMMR <- rank(meanMonthlyRank)
  aggFun <- fun(rets)
  aggFunRank <- rank(aggFun)
  bothRanks <- data.frame(cbind(aggFunRank, rankMMR, names(rankMMR)), stringsAsFactors=FALSE)
  names(bothRanks) <- c("aggregateRank", "averageMonthlyRank", "configName")
  bothRanks$aggregateRank <- as.numeric(bothRanks$aggregateRank)
  bothRanks$averageMonthlyRank <- as.numeric(bothRanks$averageMonthlyRank)
  bothRanks$sum <- bothRanks[,1] + bothRanks[,2]
  bothRanks <- bothRanks[order(bothRanks$sum, decreasing=TRUE),]
  plot(aggFunRank~rankMMR, main=perfAfun)
  print(cor(aggFunRank, meanMonthlyRank))

retRank <- rankComparison(retsList)
sharpeRank <- rankComparison(retsList, perfAfun="SharpeRatio.annualized")

In this case, I added some functionality to not only do the plotting and correlation, but to spit out a table comparing both the aggregate metric along with the rank of the average monthly rank (again, dual ranking layer), and ordered the table by the sum of both the aggregate and the monthly metric, starting with the highest.

For instance, here’s the output from the returns comparison:

> retRank <- rankComparison(retsList)
[1] 0.736377

> head(retRank, 20)
    aggregateRank averageMonthlyRank configName sum
62            190                191         62 381
63            189                187         63 376
60            185                189         60 374
66            191                182         66 373
65            187                183         65 370
59            184                185         59 369
56            181                186         56 367
64            188                179         64 367
152           174                190        152 364
61            183                178         61 361
67            186                167         67 353
151           165                188        151 353
57            179                173         57 352
153           167                184        153 351
58            182                164         58 346
154           170                175        154 345
53            164                180         53 344
155           166                176        155 342
158           163                177        158 340
150           157                181        150 338

So, for this configuration, the correlation went down from above .8 to around .74…which is still strong and credence that the strategy configurations have validity outside some lucky months. The new feature I added was the data frame of the two ranks side by side, along with their configuration name (in this case, my names were simply the SMA parameter, but the names could be anything such as say, SMA_60_lag_2), and the sum of the two rankings, which orders the configurations. As there were 191 configurations (SMA ranging from 10 to 200), the best score that could be achieved was 382. Furthermore, note that although there seems to be a strong region from SMA 53 to SMA 67, there also seems to be another region, at least when it comes to absolute return, of an SMA parameter at SMA 150+.

Here’s the same table for annualized Sharpe (this variation takes a bit longer to compute due to the monthly annualized Sharpes).

> sharpeRank <- rankComparison(retsList, perfAfun="SharpeRatio.annualized")
[1] 0.5590881
> head(sharpeRank, 20)
    aggregateRank averageMonthlyRank configName   sum
62            190              191.0         62 381.0
59            185              190.0         59 375.0
61            183              186.5         61 369.5
60            186              181.0         60 367.0
63            189              175.0         63 364.0
66            191              164.0         66 355.0
152           166              173.0        152 339.0
58            182              155.0         58 337.0
56            181              151.0         56 332.0
53            174              153.0         53 327.0
57            179              148.0         57 327.0
151           159              162.0        151 321.0
76            177              143.0         76 320.0
150           152              163.0        150 315.0
54            173              140.0         54 313.0
77            178              131.0         77 309.0
65            187              119.0         65 306.0
143           146              156.0        143 302.0
74            167              132.0         74 299.0
153           161              138.0        153 299.0

So, largely the same sort of results as we see with the annualized returns. A correlation of .5 gives some cause for concern, which will hopefully show up in the line plot of the rank of the four metrics (returns, Sharpe, drawdowns, and return to drawdown), which will reveal the regions with strong performance, and not-so-strong performances.

Here’s the ranking line plot.

aggReturns <- Return.annualized(retsList)
aggSharpe <- SharpeRatio.annualized(retsList)
aggMAR <- Return.annualized(retsList)/maxDrawdown(retsList)
aggDD <- maxDrawdown(retsList)

plot(rank(aggReturns)~as.numeric(colnames(aggReturns)), type="l", ylab="annualized returns rank", xlab="SMA", 
     main="Risk and return rank comparison")
lines(rank(aggSharpe)~as.numeric(colnames(aggSharpe)), type="l", ylab="annualized Sharpe rank", xlab="SMA", col="blue")
lines(rank(aggMAR)~as.numeric(colnames(aggMAR)), type="l", ylab="Max return over max drawdown", xlab="SMA", col="red")
lines(rank(-aggDD)~as.numeric(colnames(aggDD)), type="l", ylab="max DD", xlab="SMA", col="green")
legend("bottomright", c("Return rank", "Sharpe rank", "MAR rank", "Drawdown rank"), pch=0, col=c("black", "blue", "red", "green"))

And the resulting plot:

There are several regions that show similar, strong metrics for similar parameter choices for the value of SMA when we use a “delayed” entry. Namely, the regions around the 60 day SMA, the 150 day SMA, and the 125 day SMA.

Let’s look at those configurations.

truncRets <- retsList[,c(51, 116, 141)]
stats <- data.frame(cbind(t(Return.annualized(truncRets)),
colnames(stats) <- c("A.Return", "A.Sharpe", "Worst_Drawdown")
stats$MAR <- stats[,1]/stats[,3]
stats <- round(stats, 3)

And the results:

> stats
    A.Return A.Sharpe Worst_Drawdown   MAR
60     1.103    2.490          0.330 3.342
125    0.988    2.220          0.368 2.683
150    0.983    2.189          0.404 2.435

And the resulting performance, on both a regular, and log scale:


logRets <- log(cumprod(1+truncRets))

Perfect strategies? There’s probably room for improvement. As good if not better than the volatility strategies posted elsewhere on the internet? Probably. Is there more investigation that can be done regarding the differences in signal delay? Yes.

So, in conclusion for this post, I’m hoping that the rank comparison heuristic and its new output gives people another tool to consider, along with another vol strategy to consider as well.

Thanks for reading.

NOTE: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

A New Volatility Strategy, And A Heuristic For Analyzing Robustness

This post is motivated by a discussion that arose when I tested a strategy by Frank of Trading The Odds (post here). One point, brought up by Tony Cooper of Double Digit Numerics, the original author of the paper that Trading The Odds now trades (I consider it a huge honor that my blog is read by authors of original trading strategies), is that my heatmap analysis only looked cross-sectional performance, as opposed to performance over time–that is, performance that could have been outstanding over the course of the entire backtest could have been the result of a few lucky months. This is a fair point, which I hope this post will address in terms of a heuristic using both visual and analytical outputs.

The strategy for this post is the following, provided to me kindly by Mr. Helmuth Vollmeier (whose help in all my volatility-related investigations cannot be understated):

Consider VXV and VXMT, the three month and six month implied volatility on the usual SP500. Define contango as VXV/VXMT < 1, and backwardation vice versa. Additionally, take an SMA of said ratio. Go long VXX when the ratio is greater than 1 and above its SMA, and go long XIV when the converse holds. Or in my case, get in at the close when that happens and exit at the next day's close after the converse occurs (that is, my replication is slightly off due to using some rather simplistic coding for illustrative purposes).

In any case, here is the script for setting up the strategy, most of which is just downloading the data–the strategy itself is just a few lines of code:



vxv <- xts(read.zoo("vxvData.csv", header=TRUE, sep=",", format="%m/%d/%Y", skip=2))
vxmt <- xts(read.zoo("vxmtData.csv", header=TRUE, sep=",", format="%m/%d/%Y", skip=2))
ratio <- Cl(vxv)/Cl(vxmt)


         destfile="longVXX.txt") #requires downloader package

xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))

xiv <- merge(xiv, ratio, join="inner")
vxx <- merge(vxx, ratio, join="inner")
colnames(xiv)[5] <- colnames(vxx)[5] <- "ratio"

xivRets <- Return.calculate(Cl(xiv))
vxxRets <- Return.calculate(Cl(vxx))

retsList <- list()
count <- 1
for(i in 10:200) {
  ratioSMA <- SMA(ratio, n=i)
  vxxSig <- lag(ratio > 1 & ratio > ratioSMA)
  xivSig <- lag(ratio < 1 & ratio < ratioSMA)
  rets <- vxxSig*vxxRets + xivSig*xivRets
  colnames(rets) <- i
  retsList[[i]]  <- rets
  count <- count+1  
retsList <-, retsList)
colnames(retsList) <- gsub("X", "", colnames(retsList))
retsList <- retsList[![,191]),]
retsList <- retsList[-1,]

retsList <- retsList["::2014-11-28"] #for monthly aggregation, remove start of Dec 2014

About as straightforward as things get (the results, as we'll see, are solid, as well). In this case, I tested every SMA between a 10 day and a classic 200 day SMA. And since this strategy is a single-parameter strategy (unless you want to get into adjusting the ratio critical values up and down away from 1), instead of heatmaps, we'll suffice with basic scatter plots and line plots (which make things about as simple as they come).

The heuristic I decided upon was to take some PerfA functions (Return.annualized, SharpeRatio.annualized, for instance), and compare the rank of the average of their monthly ranks (that is, a two-layer rank, very similar to the process in Flexible Asset Allocation) to the aggregate, whole time-period rank. The idea here is that performance based on a few lucky months may have a high aggregate ranking, but a much lower monthly ranking, which would be reflected in a scatter plot. Ideally, the scatter plot would go from lower left to upper right in terms of ranks comparisons, with a correlation of 1, meaning that the strategy with the best overall return will have the best average monthly return rank, and so on down the list. This can also apply to the Sharpe ratio, and so on.

Here is my off-the-cuff implementation of such an idea:

rankComparison <- function(rets, perfAfun="Return.cumulative") {
  fun <-
  monthlyFun <- apply.monthly(rets, fun)
  monthlyRank <- t(apply(monthlyFun, MARGIN=1, FUN=rank))
  meanMonthlyRank <- apply(monthlyRank, MARGIN=2, FUN=mean)
  rankMMR <- rank(meanMonthlyRank)
  aggFun <- fun(rets)
  aggFunRank <- rank(aggFun)
  plot(aggFunRank~rankMMR, main=perfAfun)
  print(cor(aggFunRank, meanMonthlyRank))

So, I get a chart and a correlation of average monthly ranks against a single-pass whole-period rank. Here are the results for cumulative returns and Sharpe ratio:

> rankComparison(retsList)
[1] 0.8485374

Basically, the interpretation is this: the outliers above and to the left of the main cluster can be interpreted as those having those “few lucky months”, while those to the lower right consistently perform somewhat well, but for whatever reason, are just stricken with bad luck. However, the critical results that we’re looking for is that the best overall performers (the highest aggregate rank) are also those with the most *consistent* performance (the highest monthly rank), which is generally what we see.

Furthermore, the correlation of .85 also lends credence that this is a robust strategy.

Here’s the process repeated with the annualized Sharpe ratio:

> rankComparison(retsList, perfAfun="SharpeRatio.annualized")
[1] 0.8647353

In other words, an even clearer relationship here, and again, we see that the best performers overall are also the best monthly performers, so we can feel safe in the robustness of the strategy.

So what’s the punchline? Well, the idea is now that we’ve established that the best results on aggregate are also the strongest results when analyzing the results across time, let’s look to see if the various rankings of risk and reward metrics reveal which configurations those are.

Here’s a chart of the aggregate rankings of annualized return (aka cumulative return), annualized Sharpe, MAR (return over max drawdown), and max drawdown.

aggReturns <- Return.annualized(retsList)
aggSharpe <- SharpeRatio.annualized(retsList)
aggMAR <- Return.annualized(retsList)/maxDrawdown(retsList)
aggDD <- maxDrawdown(retsList)

plot(rank(aggReturns)~as.numeric(colnames(aggReturns)), type="l", ylab="annualized returns rank", xlab="SMA",
     main="Risk and return rank comparison")
lines(rank(aggSharpe)~as.numeric(colnames(aggSharpe)), type="l", ylab="annualized Sharpe rank", xlab="SMA", col="blue")
lines(rank(aggMAR)~as.numeric(colnames(aggMAR)), type="l", ylab="Max return over max drawdown", xlab="SMA", col="red")
lines(rank(-aggDD)~as.numeric(colnames(aggDD)), type="l", ylab="max DD", xlab="SMA", col="green")
legend("bottomright", c("Return rank", "Sharpe rank", "MAR rank", "Drawdown rank"), pch=0, col=c("black", "blue", "red", "green"))

And the resulting plot itself:

So, looking at these results, here are some interpretations, moving from left to right:

At the lower end of the SMA, the results are just plain terrible. Sure, the drawdowns are lower, but the returns are in the basement.
The spike around the 50-day SMA makes me question if there is some sort of behavioral bias at work here.
Next, there’s a region with fairly solid performance between that and the 100-day SMA, but is surrounded on both sides by pretty abysmal performance.
Moving onto the 100-day SMA region, the annualized returns and Sharpe ratios are strong, but get the parameter estimation incorrect going forward, and there’s a severe risk of incurring heavy drawdowns. The jump improvement in the drawdown metric is also interesting. Again, is there some sort of bias towards some of the round numbers? (50, 100, etc.)
Lastly, there’s nothing particularly spectacular about the performances until we get to the high 100s and the 200 day SMA, at which point, we see a stable region of configurations with high ranks in all categories.

Let’s look at that region more closely:

truncRets <- retsList[,161:191]
stats <- data.frame(cbind(t(Return.annualized(truncRets)),
colnames(stats) <- c("A.Return", "A.Sharpe", "Worst_Drawdown")
stats$MAR <- stats[,1]/stats[,3]
stats <- round(stats, 3)

And the results:

> stats
    A.Return A.Sharpe Worst_Drawdown   MAR
170    0.729    1.562          0.427 1.709
171    0.723    1.547          0.427 1.693
172    0.723    1.548          0.427 1.694
173    0.709    1.518          0.427 1.661
174    0.711    1.522          0.427 1.665
175    0.711    1.522          0.427 1.665
176    0.711    1.522          0.427 1.665
177    0.711    1.522          0.427 1.665
178    0.696    1.481          0.427 1.631
179    0.667    1.418          0.427 1.563
180    0.677    1.441          0.427 1.586
181    0.677    1.441          0.427 1.586
182    0.677    1.441          0.427 1.586
183    0.675    1.437          0.427 1.582
184    0.738    1.591          0.427 1.729
185    0.760    1.637          0.403 1.886
186    0.794    1.714          0.403 1.970
187    0.798    1.721          0.403 1.978
188    0.802    1.731          0.403 1.990
189    0.823    1.775          0.403 2.042
190    0.823    1.774          0.403 2.041
191    0.823    1.774          0.403 2.041
192    0.819    1.765          0.403 2.031
193    0.822    1.772          0.403 2.040
194    0.832    1.792          0.403 2.063
195    0.832    1.792          0.403 2.063
196    0.802    1.723          0.403 1.989
197    0.810    1.741          0.403 2.009
198    0.782    1.677          0.403 1.941
199    0.781    1.673          0.403 1.937
200    0.779    1.670          0.403 1.934

So starting from SMA 186 through SMA 200, we see some fairly strong performance–returns in the high 70s to the low 80s, and MARs in the high 1s to low 2s. And of course, since this is about a trading strategy, equity curves are of course, obligatory. Here is what that looks like:

strongRets <- retsList[,177:191]

Basically, on aggregate, some very strong performance. However, it is certainly not *smooth* performance. New equity highs are followed by strong drawdowns, which are then followed by a recovery and new, higher equity highs.

To conclude (for the moment, I’ll have a new post on this next week with a slight wrinkle that gets even better results), I hope that I presented not only a simple but effective strategy, but also a simple but effective (if a bit time consuming, to do all the monthly computations on 191 return streams) heuristic suggested/implied by Tony Cooper of double digit numerics for analyzing the performance and robustness of your trading strategies. Certainly, while many professors and theorists elucidate on robustness (with plenty of math that makes stiff bagels look digestible), I believe not a lot of attention is actually paid to it in more common circles, using more intuitive methods. After all, if someone would want to be an unscrupulous individual selling trading systems or signals (instead of worrying about the strategy’s capacity for capital), it’s easy to show an overfit equity curve while making up some excuse so as to not reveal the (most likely overfit) strategy. One thing I’d hope this post inspires is for individuals to ask not only look at equity curves, but also plots such as aggregate against average monthly (or higher frequencies, if the strategies are tested over mere months, for instance, such as intraday trading) metric rankings when performing parameter optimization.

Is this heuristic the most robust and advanced that can be done? Probably not. Would one need to employ even more advanced techniques if computing time becomes an issue? Probably (bootstrapping and sampling come to mind). Can this be built on? Of course. *Will* someone build on it? I certainly plan on revisiting this topic in the future.

Lastly, on the nature of the strategy itself: while Trading The Odds presented a strategy functioning on a very short time frame, I’m surprised that instead, we have a strategy whose parameters are on a much higher end of the numerical spectrum.

Thanks for reading.

NOTE: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

Trading The Odds Volatility Risk Premium: Addressing Data Mining and Curve-Fitting

Several readers, upon seeing the risk and return ratio along with other statistics in the previous post stated that the result may have been the result of data mining/over-optimization/curve-fitting/overfitting, or otherwise bad practice of creating an amazing equity curve whose performance will decay out of sample.

Fortunately, there’s a way to test that assertion. In their book “Trading Systems: A New Approach to System Development and Portfolio Optimization”, Urban Jaekle and Emilio Tomasini use the concept of the “stable region” to demonstrate a way of visualizing whether or not a parameter specification is indeed overfit. The idea of a stable region is that going forward, how robust is a parameter specification to slight changes? If the system just happened to find one good small point in a sea of losers, the strategy is likely to fail going forward. However, if small changes in the parameter specifications still result in profitable configurations, then the chosen parameter set is a valid configuration.

As Frank’s trading strategy only has two parameters (standard deviation computation period, aka runSD for the R function, and the SMA period), rather than make line graphs, I decided to do a brute force grid search just to see other configurations, and plotted the results in the form of heatmaps.

Here’s the modified script for the computations (no parallel syntax in use for the sake of simplicity):


         destfile="longVXX.txt") #requires downloader package

xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxmt <- xts(read.zoo("vxmtdailyprices.csv", format="%m/%d/%Y", sep=",", header=TRUE))

getSymbols("^VIX", from="2004-03-29")

vixvxmt <- merge(Cl(VIX), Cl(vxmt))
vixvxmt[[,2]),2] <- vixvxmt[[,2]),1]

xivRets <- Return.calculate(Cl(xiv))
vxxRets <- Return.calculate(Cl(vxx))

getSymbols("^GSPC", from="1990-01-01")
spyRets <- diff(log(Cl(GSPC)))

t1 <- Sys.time()
MARmatrix <- list()
SharpeMatrix <- list()
for(i in 2:21) {
  smaMAR <- list()
  smaSharpe <- list()
  for(j in 2:21){
    spyVol <- runSD(spyRets, n=i)
    annSpyVol <- spyVol*100*sqrt(252)
    vols <- merge(vixvxmt[,2], annSpyVol, join='inner')
    vols$smaDiff <- SMA(vols[,1] - vols[,2], n=j)
    vols$signal <- vols$smaDiff > 0
    vols$signal <- lag(vols$signal, k = 1)
    stratRets <- vols$signal*xivRets + (1-vols$signal)*vxxRets
    #stratRets[] <- 0
    stats <- data.frame(cbind(Return.annualized(stratRets)*100, 
    colnames(stats) <- c("Annualized Return", "Max Drawdown", "Annualized Sharpe")
    MAR <- as.numeric(stats[1])/as.numeric(stats[2])    
    smaMAR[[j-1]] <- MAR
    smaSharpe[[j-1]] <- stats[,3]
  smaMAR <-, smaMAR)
  smaSharpe <-, smaSharpe)
  MARmatrix[[i-1]] <- smaMAR
  SharpeMatrix[[i-1]] <- smaSharpe
t2 <- Sys.time()

Essentially, just wrap the previous script in a nested for loop over the two parameters.

I chose GGplot2 to plot the heatmaps for more control with coloring.

Here’s the heatmap for the MAR ratio (that is, returns over max drawdown):

MARmatrix <-, MARmatrix)
rownames(MARmatrix) <- paste0("SMA", c(2:21))
colnames(MARmatrix) <- paste0("runSD", c(2:21))
MARlong <- melt(MARmatrix)
colnames(MARlong) <- c("SMA", "runSD", "MAR")
MARlong$SMA <- as.numeric(gsub("SMA", "", MARlong$SMA))
MARlong$runSD <- as.numeric(gsub("runSD", "", MARlong$runSD))
MARlong$scaleMAR <- scale(MARlong$MAR)
ggplot(MARlong, aes(x=SMA, y=runSD, fill=scaleMAR))+geom_tile()+scale_fill_gradient2(high="skyblue", mid="blue", low="red")

Here’s the result:

Immediately, we start to see some answers to questions regarding overfitting. First off, is the parameter set published by TradingTheOdds optimized? Yes. In fact, not only is it optimized, it’s by far and away the best value on the heatmap. However, when discussing overfitting, curve-fitting, and the like, the question to ask isn’t “is this the best parameter set available”, but rather “is the parameter set in a stable region?” The answer, in my opinion to that, is yes, as noted by the differing values of the SMA for the 2-day sample standard deviation. Note that this quantity, due to being the sample standard deviation, is actually the square root of the two squared residuals of that time period.

Here are the MAR values for those configurations:

> MARmatrix[1:10,1]
    SMA2     SMA3     SMA4     SMA5     SMA6     SMA7     SMA8     SMA9    SMA10    SMA11 
2.471094 2.418934 2.067463 3.027450 2.596087 2.209904 2.466055 1.394324 1.860967 1.650588 

In this case, not only is the region stable, but the MAR values are all above 2 until the SMA9 value.

Furthermore, note that aside from the stable region of the 2-day sample standard deviation, a stable region using a standard deviation of ten days with less smoothing from the SMA (because there’s already an average inherent in the sample standard deviation) also exists. Let’s examine those values.

> MARmatrix[2:5, 9:16]
      runSD10  runSD11  runSD12  runSD13  runSD14  runSD15  runSD16   runSD17
SMA3 1.997457 2.035746 1.807391 1.713263 1.803983 1.994437 1.695406 1.0685859
SMA4 2.167992 2.034468 1.692622 1.778265 1.828703 1.752648 1.558279 1.1782665
SMA5 1.504217 1.757291 1.742978 1.963649 1.923729 1.662687 1.248936 1.0837615
SMA6 1.695616 1.978413 2.004710 1.891676 1.497672 1.471754 1.194853 0.9326545

Apparently, a standard deviation between 2 and 3 weeks with minimal SMA smoothing also produced some results comparable to the 2-day variant.

Off to the northeast of the plot, using longer periods for the parameters simply causes the risk-to-reward performance to drop steeply. This is essentially an illustration of the detriments of lag.

Finally, there’s a small rough patch between the two aforementioned stable regions. Here’s the data for that.

> MARmatrix[1:5, 4:8]
       runSD5    runSD6    runSD7   runSD8   runSD9
SMA2 1.928716 1.5825265 1.6624751 1.033216 1.245461
SMA3 1.528882 1.5257165 1.2348663 1.364103 1.510653
SMA4 1.419722 0.9497827 0.8491229 1.227064 1.396193
SMA5 1.023895 1.0630939 1.3632697 1.547222 1.465033
SMA6 1.128575 1.3793244 1.4085513 1.440324 1.964293

As you can see, there are some patches where the MAR is below 1, and many where it’s below 1.5. All of these are pretty detached from the stable regions.

Let’s repeat this process with the Sharpe Ratio heatmap.

SharpeMatrix <-, SharpeMatrix)
rownames(SharpeMatrix) <- paste0("SMA", c(2:21))
colnames(SharpeMatrix) <- paste0("runSD", c(2:21))
sharpeLong <- melt(SharpeMatrix)
colnames(sharpeLong) <- c("SMA", "runSD", "Sharpe")
sharpeLong$SMA <- as.numeric(gsub("SMA", "", sharpeLong$SMA))
sharpeLong$runSD <- as.numeric(gsub("runSD", "", sharpeLong$runSD))
ggplot(sharpeLong, aes(x=SMA, y=runSD, fill=Sharpe))+geom_tile()+
  scale_fill_gradient2(high="skyblue", mid="blue", low="darkred", midpoint=1.5)

And the result:

Again, the TradingTheOdds parameter configuration lights up, but among a region of strong configurations. This time, we can see that in comparison to the rest of the heatmap, the northern stable region seems to have become clustered around the 10-day standard deviation (or 11) with SMAs of 2, 3, and 4. The regions to the northeast are also more subdued by comparison, with the Sharpe ratio bottoming out around 1.

Let’s look at the numerical values again for the same regions.

Two-day standard deviation region:

> SharpeMatrix[1:10,1]
    SMA2     SMA3     SMA4     SMA5     SMA6     SMA7     SMA8     SMA9    SMA10    SMA11 
1.972256 2.210515 2.243040 2.496178 1.975748 1.965730 1.967022 1.510652 1.963970 1.778401 

Again, numbers the likes of which I myself haven’t been able to achieve with more conventional strategies, and numbers the likes of which I haven’t really seen anywhere for anything on daily data. So either the strategy is fantastic, or something is terribly wrong outside the scope of the parameter optimization.

Two week standard deviation region:

> SharpeMatrix[1:5, 9:16]
      runSD10  runSD11  runSD12  runSD13  runSD14  runSD15  runSD16  runSD17
SMA2 1.902430 1.934403 1.687430 1.725751 1.524354 1.683608 1.719378 1.506361
SMA3 1.749710 1.758602 1.560260 1.580278 1.609211 1.722226 1.535830 1.271252
SMA4 1.915628 1.757037 1.560983 1.585787 1.630961 1.512211 1.433255 1.331697
SMA5 1.684540 1.620641 1.607461 1.752090 1.660533 1.500787 1.359043 1.276761
SMA6 1.735760 1.765137 1.788670 1.687369 1.507831 1.481652 1.318751 1.197707

Again, pretty outstanding numbers.

The rough patch:

> SharpeMatrix[1:5, 4:8]
       runSD5   runSD6   runSD7   runSD8   runSD9
SMA2 1.905192 1.650921 1.667556 1.388061 1.454764
SMA3 1.495310 1.399240 1.378993 1.527004 1.661142
SMA4 1.591010 1.109749 1.041914 1.411985 1.538603
SMA5 1.288419 1.277330 1.555817 1.753903 1.685827
SMA6 1.278301 1.390989 1.569666 1.650900 1.777006

All Sharpe ratios higher than 1, though some below 1.5

So, to conclude this post:

Was the replication using optimized parameters? Yes. However, those optimized parameters were found within a stable (and even strong) region. Furthermore, it isn’t as though the strategy exhibits poor risk-to-return metrics beyond those regions, either. Aside from raising the lookback period on both the moving average and the standard deviation to levels that no longer resemble the original replication, performance was solid to stellar.

Does this necessarily mean that there is nothing wrong with the strategy? No. It could be that the performance is an artifact of “observe the close, enter at the close” optimistic execution assumptions. For instance, quantstrat (the go-to backtest engine in R for more trading-oriented statistics) uses a next-bar execution method that defaults on the *next* day’s close (so if you look back over my quantstrat posts, I use prefer=”open” so as to get the open of the next bar, instead of its close). It could also be that VXMT itself is an instrument that isn’t very well known in the public sphere, either, seeing as how Yahoo finance barely has any data on it. Lastly, it could simply be the fact that although the risk to reward ratios seem amazing, many investors/mutual fund managers/etc. probably don’t want to think “I’m down 40-60% from my peak”, even though it’s arguably easier to adjust a strategy with a good reward to risk ratio with excess risk by adding cash (to use a cooking analogy, think about your favorite spice. Good in small quantities.), than it is to go and find leverage for a good reward to risk strategy with very small returns (not to mention incurring all the other risks that come with leverage to begin with, such as a 50% drawdown wiping out an account leveraged two to one).

However, to address the question of overfitting, through a modified technique from Jaekle and Tomasini (2009), these are the results I found.

Thanks for reading.

Note: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

Volatility Risk Premium: Sharpe 2+, Return to Drawdown 3+

First, before starting this post, I’d like to give one last comment about my previous post:

I called Vanguard to inquire about the trading policies on VWEHX and VFISX, and there are two-month cooldown periods (aka frequent-trading policies) on those mutual funds. However, the HYG ETF does indeed pay dividends, so the adjusted ETF variant is most likely the closest performance an investor can expect. Still, a Sharpe ratio higher than 1.25 is nothing to scoff at. Of course, no transaction costs are assumed on any of my strategies, so make sure your broker isn’t ripping you off if you actually intend on seriously investing in anything I publish on this blog (I hear interactive brokers has $1 per transaction), and once again, remember that none of this constitutes official advice.

Now, onto this post:

Judging by the attention some of my previous volatility posts have garnered through my replication of SeekingAlpha strategies, today, I am going to share a strategy whose statistics boggle my mind.

The strategy was presented by TradingTheOdds in this post. I had to replicate it for myself to be sure it worked as advertised, but unless I have something horrendously incorrect, this strategy works…quite well. Here’s the strategy:

Using the actual S&P 500 index, compute the two-day annualized historical volatility. Subtract that from the VXMT, which is the six-month expected volatility of the S&P 500 (prior to 2008, use the actual VIX). Then, take the 5-day SMA of that difference. If this number is above 0, go long XIV, otherwise go long VXX. In my replication, the strategy uses market-on-close orders (AKA observe “near” the close, buy at the close), so the strategy should be taken with a little bit of a grain of salt. Here’s the code:


         destfile="longVXX.txt") #requires downloader package

xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxmt <- xts(read.zoo("vxmtdailyprices.csv", format="%m/%d/%Y", sep=",", header=TRUE))

getSymbols("^VIX", from="2004-03-29")

vixvxmt <- merge(Cl(VIX), Cl(vxmt))
vixvxmt[[,2]),2] <- vixvxmt[[,2]),1]

getSymbols("^GSPC", from="1990-01-01")
spyRets <- diff(log(Cl(GSPC)))

spyVol <- runSD(spyRets, n=2)
annSpyVol <- spyVol*100*sqrt(252)

vols <- merge(vixvxmt[,2], annSpyVol, join='inner')
vols$smaDiff <- SMA(vols[,1] - vols[,2], n=5)
vols$signal <- vols$smaDiff > 0
vols$signal <- lag(vols$signal, k = 1)

xivRets <- Return.calculate(Cl(xiv))
vxxRets <- Return.calculate(Cl(vxx))
stratRets <- vols$signal*xivRets + (1-vols$signal)*vxxRets

The VXMT data is taken from the link I showed earlier. So, as I interpret it, to me, this strategy seems to be stating this:

Since we are subtracting the long-term expected volatility (VXMT) from the near-term historical volatility, which I suppose is meant to be a proxy for the “forecast of current volatility”, the implied hypothesis seems to be that volatility is being overestimated by the VXMT, so we should go short volatility. Conversely, if the near-term historical volatility is higher than the expected volatility, it means we should be long volatility instead. So, here’s the punchline that is the equity curve:


Yes, you’re looking at that correctly–over approximately ten years (slightly longer), this strategy had a cumulative return of about $10,000 for every $1 invested. Just to put this into perspective, here’s a log-scale equity curve.

There are a few noticeable dips which correspond to around 40% drawdowns on the regular scale. Now, let’s look at the usual statistics:

stats <- data.frame(cbind(Return.annualized(stratRets)*100, 

colnames(stats) <- c("Annualized Return", "Max Drawdown", "Annualized Sharpe")
stats$MAR <- as.numeric(stats[1])/as.numeric(stats[2])

With the following result:

                  Annualized Return Max Drawdown Annualized Sharpe      MAR
Annualized Return          137.7875     45.64011          2.491509 3.019001

Risky, as judging from maximum drawdown alone? Yes. But is there risk for the reward? Absolutely.

To put a lower bound on the performance of the strategy, here are the same diagrams and statistics with the signal lagged one more day (that is, since the returns use close-to-close data and work off of the closing prices, the signal is lagged by a day to avoid lookahead bias. Lagging the signal by one more day would mean receiving the signal at the close of day t, but only entering on the close of day t+1).

In effect, this is the coding change:

vols$signal <- lag(vols$signal, k = 1)


vols$signal <- lag(vols$signal, k = 2)

Here are the results:

Equity curve/drawdowns:

So, still impressive on the returns, but now there are some very pronounced drawdowns–slightly larger, but more concerning, that they’re longer.

Log equity curve:

Again, slightly more pronounced dips.

Here are the statistics:

                  Annualized Return Max Drawdown Annualized Sharpe      MAR
Annualized Return          84.36546     56.77219          1.521165 1.486035

So still quite respectable for a strategy this understandable.

I’m fairly certain that there’s still room for improvement on this strategy, considering that anywhere there’s a 5-day SMA, there’s often room for better trend-following indicators, and possibly a better indication for the volatility of SPY than the 2-day rolling annualized metric. But as it stands, I think based on its risk/reward characteristics, this strategy has a place as an aggressive, returns-generating component of a portfolio of strategies.

Thanks for reading.

Note: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

Seeking Volatility and Leverage

So Harry Long recently posted several articles, a couple of them all that have variations on a theme of a combination of leveraging SPY (aka SPXL), leveraging TLT (aka TMF), and some small exposure to the insanely volatile volatility indices (VXX, TVIX, ZIV, etc.), which can have absolutely insane drawdowns. Again, before anything else, a special thanks to Mr. Helmuth Vollmeier for his generosity in providing long-dated VXX and ZIV data, both of which will be leveraged for this post (in more ways than one).

In any case, here is the link to the two articles:

A Weird All-Long Strategy That Beats the S&P 500 Every Year
A Refined All-Long Strategy III

As usual, the challenge is that the exact ETFs in question didn’t exist prior to the financial crisis, giving a very handy justification as to why not to show the downsides of the strategy/strategies. From a conceptual standpoint, it’s quite trivial to realize that upon reading the articles, that when a large chunk of the portfolio consists of a leveraged SPY exposure, one is obviously going to look like a genius outperforming the SPY itself in a bull run. The question, obviously, is what happens when the market doesn’t support the strategy. If offered a 50% coin flip, with the outcome of heads winning a million dollars and being told nothing else, the obvious question to ask would be: “and what happens on tails?”

This post aims to address this for three separate configurations of the strategy.

First off, in order to create a believable backtest, the goal is to first create substitutes to the short-dated newfangled ETFs (SPXL and TMF), which will be done very simply: leverage the adjusted returns of SPY and TLT, respectively (I had to use adjusted due to the split in SPXL–normally I don’t like using adjusted data for anything, but splits sort of necessitate this evil).

Here we go:



getSymbols("SPXL", from="1990-01-01")
spxlRets <- Return.calculate(Ad(SPXL)) #have to use adjusted due to split 
getSymbols("SPY", from="1990-01-01")
SPYrets <- Return.calculate(Ad(SPY))
spxl3SPY <- merge(spxlRets, 3*SPYrets, join='inner')

So, the adjusted data in use for this simulation will slightly overshoot in regards to the absolute returns. That stated, it isn’t so much the returns we care about in this post (we know they’re terrific when times are good), but the drawdowns. The drawdowns are basically on top of one another, which is good.

Let’s repeat this with TMF and TLT:

getSymbols("TMF", from="1990-01-01")
TMFrets <- Return.calculate(Ad(TMF))
getSymbols("TLT", from="1990-01-01")
TLTrets <- Return.calculate(Ad(TLT))
tmf3TLT <- merge(TMFrets, 3*TLTrets, join='inner')

The result:

> Return.annualized(tmf3TLT[,2]-tmf3TLT[,1])
Annualized Return   0.03123479

A bit more irritating, as there’s clearly a bit of discrepancy to the tune of approximately 3.1% a year in terms of annualized returns in favor of the leveraged TLT vs. the actual TMF (so if you can borrow for less than 3% a year, this may be a good strategy for you–though I’m completely in the dark about why this sort of mechanic exists–is it impossible to actually short TMF, or buy TLT on margin? If someone is more intimately familiar with this trade, let me know), so, I’m going to make like an engineer and apply a little patch to remove the bias–subtract the daily returns of the discrepancy from the leveraged adjusted TLT.

discrepancy <- as.numeric(Return.annualized(tmf3TLT[,2]-tmf3TLT[,1]))
tmf3TLT[,2] <- tmf3TLT[,2] - ((1+discrepancy)^(1/252)-1)


Much better. Let’s save those modified TLT returns (our synthetic TMF):

modifiedTLT <- 3*TLTrets - ((1+discrepancy)^(1/252)-1)

With VXX, luckily, we simply need to compare Mr. Vollmeier’s data to Yahoo’s return data so that we can verify if two separate return streams check out.

#get long VXX -- thank you so much, Mr. Helmuth Vollmeier
         destfile="longVXX.txt") #requires downloader package
VXXlong <- read.csv("longVXX.txt", stringsAsFactors=FALSE)
VXXlong <- xts(VXXlong[,2:5],$Date))
VXXrets <- Return.calculate(Cl(VXXlong)) #long data only has close

getSymbols("VXX", from="1990-01-01")
vxxYhooRets <- Return.calculate(Ad(VXX))
vxx2source <- merge(VXXrets, vxxYhooRets, join='inner')
charts.PerformanceSummary(vxx2source) #identical

And the result:

No discrepancies here whatsoever. So once again, I am very fortunate to have experienced readers commenting on this blog.

So, with this in mind, let’s attempt to recreate the equity curve of the first strategy, which consists of 50% SPXL, 45% TMF, and 5% VXX.

rSPY_TLT_VXX <- cbind(3*SPYrets, modifiedTLT, VXXrets)
rSPY_TLT_VXX <- rSPY_TLT_VXX[![,3]),]
colnames(rSPY_TLT_VXX) <- c("SPY", "TLT", "VXX")

strat <- Return.rebalancing(R = rSPY_TLT_VXX, weights = c(.5, .45, .05), 
                            rebalance_on = "years", geometric=TRUE)
stratAndSPY <- merge(strat, SPYrets, join='inner')

About there.

One other note, on a purely mechanical issue: when using the

geometric = TRUE

argument with R, when creating synthetic leverage, you cannot create it in the actual


argument, or it will leverage your capital at every rebalancing period, giving you obviously incorrect results. Furthermore, these results were achieved using geometric = TRUE in two places: one in the Return.rebalancing argument (which implies reinvesting the capital), and then once again when calling the PerformanceAnalytics functions. Essentially, the implication of this is reinvesting all gains at the rebalancing period, and not touching any position no matter what. Used inappropriately, this will create results that border on the optimistic.

Now that we’ve replicated the general shape and pattern of the original equity curve, let’s look at this strategy on a whole.


If you just look at the top chart, it looks pretty amazing, doesn’t it? Now look at the bottom chart. Not only is there a massive drawdown, but there’s a massive spike up, and then *another* massive, larger drawdown. Imagine what would have happened to someone who didn’t follow this strategy to the letter. Get out at the very worst moment, get back in after a run-up, and then get hit *again*.

Here are the usual statistics I use:

> Return.annualized(stratAndSPY)
                  portfolio.returns SPY.Adjusted
Annualized Return         0.2305339   0.07937642
> maxDrawdown(stratAndSPY)
               portfolio.returns SPY.Adjusted
Worst Drawdown         0.4901882    0.5518672
> SharpeRatio.annualized(stratAndSPY)
                                portfolio.returns SPY.Adjusted
Annualized Sharpe Ratio (Rf=0%)         0.9487574    0.3981902

An annualized Sharpe just shy of 1, using adjusted data, with a CAGR/max drawdown ratio of less than one half, and a max drawdown far beyond the levels of acceptable (even 20% may be too much for some people, though I’d argue it’s acceptable over a long enough time frame provided it’s part of a diversified portfolio of other such uncorrelated strategies).

Now, the claim is that this strategy consistently beats the S&P 500 year after year? That can also be tested.

diff <- stratAndSPY[,1] - stratAndSPY[,2]
diffAndModTLT <- cbind(diff, modifiedTLT)

Essentially, I shorted the SPY against the strategy (which would simply mean still long the SPY, except at 50% instead of 150%), and this is the result, in comparison to the 3x leveraged TLT (and cut down by the original discrepancy on a daily level)

So even after shorting the SPY and its massive drawdown away, one is still left with what amounts to a diluted TMF position, which has its own issues. Here are the three statistics, once again:

> Return.annualized(diffAndModTLT)
                  portfolio.returns TLT.Adjusted
Annualized Return         0.1181923    0.1356003
> maxDrawdown(diffAndModTLT)
               portfolio.returns TLT.Adjusted
Worst Drawdown         0.3930016    0.6348332
> SharpeRatio.annualized(diffAndModTLT)
                                portfolio.returns TLT.Adjusted
Annualized Sharpe Ratio (Rf=0%)         0.4889822    0.3278975

In short, for a strategy that markets itself on beating the SPY, shorting the SPY against it costs more in upside than is gained on the downside. Generally, anytime I see an article claiming “this strategy does really well against benchmark XYZ”, my immediate intuition is: “so what does the equity curve look like when you short your benchmark against your strategy?” If the performance deteriorates, that once again means some tough questions need asking. That stated, the original strategy handily trounced the SPY benchmark, and the difference trounced the leveraged TLT. Just that my own personal benchmark is an annualized return over max drawdown of 1 or more (meaning that even the worst streak can be made up for within a year’s time–or, more practically, that generally, you don’t go a year without getting paid).

Let’s move on to the second strategy. In this instance, it’s highly similar–50% SPXL (3x SPY), 40% TMF (3x TLT), and 10% TVIX (2x VXX). Again, let’s compare synthetic to actual.

getSymbols("TVIX", from="1990-01-01")
TVIXrets <- Return.calculate(Ad(TVIX))
vxxTvix <- merge(2*VXXrets, TVIXrets, join='inner')
charts.PerformanceSummary(vxxTvix) #about identical

We’re in luck. This chart is about identical, so no tricks necessary.

The other two instruments are identical, so we can move straight to the strategy.

First, let’s replicate an equity curve:

rSPY_TLT_VXX2 <- cbind(3*SPYrets, modifiedTLT,  2*VXXrets)
rSPY_TLT_VXX2 <- rSPY_TLT_VXX2[![,3]),]

stratTwo <- Return.rebalancing(R=rSPY_TLT_VXX, weights = c(.5, .4, .1), rebalance_on="years", geometric=TRUE)
stratTwoAndSPY <- merge(stratTwo, SPYrets, join='inner')
charts.PerformanceSummary(stratTwoAndSPY["2010-11-30::"], geometric=TRUE)

General shape and pattern of the strategy’s equity curve achieved. What does it look like since the inception of the original VIX futures?


Very similar to the one before. Let’s look at them side by side.

bothStrats <- merge(strat, stratTwo, join='inner')
colnames(bothStrats) <- c("strategy one", "strategy two")

First of all, let’s do a side by side comparison of the three statistics:

> Return.annualized(bothStrats)
                  strategy one strategy two
Annualized Return    0.2305339    0.2038783
> maxDrawdown(bothStrats)
               strategy one strategy two
Worst Drawdown    0.4901882    0.4721624
> SharpeRatio.annualized(bothStrats)
                                strategy one strategy two
Annualized Sharpe Ratio (Rf=0%)    0.9487574    0.9075242

The second strategy seems to be strictly worse than the first. If we’d short the second against the first, essentially, it’d mean we have a 15% exposure to TLT, and a -15% exposure to VXX. For a fun tangent, what does such a strategy’s equity curve look like?

stratDiff <- bothStrats[,1] - bothStrats[,2]

With the following statistics:

> Return.annualized(stratDiff)
                  strategy one
Annualized Return   0.02606221
> maxDrawdown(stratDiff)
[1] 0.1254502
> SharpeRatio.annualized(stratDiff)
                                strategy one
Annualized Sharpe Ratio (Rf=0%)    0.8544455

Basically, a 1 to 5 annualized return to max drawdown ratio. In short, this may be how a lot of mediocre managers go out of business–see an idea that looks amazing, leverage it up, then have one short period of severe underperformance, where everything goes wrong for a small amount of time (EG equity market-neutral quant meltdown of August 2007, flash crash, etc.), and then a whole fund keels over. In fact, these spikes of underperformance are the absolute worst type of phenomena that can happen to many systematic strategies, since they trigger the risk-exit mechanisms, and then recover right before the strategy can make it back in.

Finally, we have our third strategy, which introduces one last instrument–ZIV. Here’s the specification for that strategy:

30% SPXL
30% ZIV
30% TMF
10% TVIX

Again, let’s go through the process and get our replicated equity curve.

download("", destfile="longZIV.txt")
ZIVlong <- read.csv("longZIV.txt", stringsAsFactors=FALSE)
ZIVlong <- xts(ZIVlong[,2:5],$Date))
ZIVrets <- Return.calculate(Cl(ZIVlong))

strat3components <- cbind(3*SPYrets, ZIVrets, modifiedTLT, 2*VXXrets)
strat3components <- strat3components[![,4]),]
stratThree <- Return.rebalancing(strat3components, weights=c(.3, .3, .3, .1), rebalance_on="years", geometric=TRUE)
stratThreeAndSPY <- merge(stratThree, SPYrets, join='inner')

With the resulting equity curve replication:

And again, the full-backtest equity curve:


To put it together, let’s combine all three strategies, and the SPY.

threeStrats <- merge(bothStrats, stratThree, join='inner')
threeStratsSPY <- merge(threeStrats, SPYrets, join='inner')
colnames(threeStratsSPY)[3] <- "strategy three"

stats <- data.frame(cbind(t(Return.annualized(threeStratsSPY))*100, 
stats$returnToDrawdown <- stats[,1]/stats[,2]

The resultant equity curve:

The resultant statistics:

> stats
               Annualized.Return Worst.Drawdown Annualized.Sharpe.Ratio..Rf.0.. returnToDrawdown           23.053387       49.01882                       0.9487574        0.4702967
strategy.two           20.387835       47.21624                       0.9075242        0.4317970
strategy three         15.812291       39.31843                       0.8019835        0.4021597
SPY.Adjusted            7.937642       55.18672                       0.3981902        0.1438325

In short, all of them share the same sort of profile–very strong annualized returns, even scarier drawdowns, Sharpe ratios close to 1 (albeit using adjusted data), and return to drawdown ratios slightly less than .5 (also scary). Are these complete strategies on their own? No. Do they beat the S&P 500? Yes. Does it make sense that they beat the S&P 500? Considering that two of these configurations have a greater than 100% market exposure, and the direction of the equity markets tends to be up over time (at least over the period during which the VIX futures traded), then this absolutely makes sense. Should one short the S&P against these strategies? I wouldn’t say so.

One last thing to note–the period over which these strategies were tested (inception of VIX futures) had no stagflation, and the Fed’s QE may be partially (or a great deal) responsible for the rise in the equity markets since the crisis. If the market declines as a result of the fed raising rates (which it inevitably will have to at some point), these strategies might be seriously hurt, so I’d certainly advise a great deal of caution, even going forward.

In any case, for better or for worse, here are a few strategies from SeekingAlpha, replicated to as far as synthetic data is available.

Thanks for reading.