Create Amazing Looking Backtests With This One Wrong–I Mean Weird–Trick! (And Some Troubling Logical Invest Results)

This post will outline an easy-to-make mistake in writing vectorized backtests–namely in using a signal obtained at the end of a period to enter (or exit) a position in that same period. The difference in results one obtains is massive.

Today, I saw two separate posts from Alpha Architect and Mike Harris, both referencing a paper by Valeriy Zakamulin arguing that some previous trend-following research by Glabadanidis was shoddily done, and that Glabadanidis’s results were only reproducible by introducing lookahead bias.

The following code shows how to reproduce this lookahead bias.

First, the setup of a basic moving average strategy on the S&P 500 index from as far back as Yahoo data will provide.

require(quantmod)
require(xts)
require(TTR)
require(PerformanceAnalytics)

getSymbols('^GSPC', src='yahoo', from = '1900-01-01')
monthlyGSPC <- Ad(GSPC)[endpoints(GSPC, on = 'months')]

# change this line for signal lookback
movAvg <- SMA(monthlyGSPC, 10)

signal <- monthlyGSPC > movAvg
gspcRets <- Return.calculate(monthlyGSPC)

And here is how to institute the lookahead bias.

lookahead <- signal * gspcRets
correct <- lag(signal) * gspcRets
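
To see the off-by-one in isolation, here is a minimal sketch on synthetic data using only base R. The random-walk prices, the seed, and the variable names are assumptions made for illustration; nothing here is real market data.

```r
# Synthetic monthly returns and prices (illustrative only, not real data)
set.seed(42)
n      <- 600
rets   <- rnorm(n, mean = 0.005, sd = 0.04)
prices <- 100 * cumprod(1 + rets)

# 10-period trailing simple moving average in base R (first 9 values are NA)
sma10 <- as.numeric(stats::filter(prices, rep(1 / 10, 10), sides = 1))

# Signal that is only known at the END of period t
signal <- as.numeric(prices > sma10)

# Lookahead: the end-of-period signal is applied to that same period's return
lookahead <- signal * rets

# Correct: the signal from period t-1 is applied to the return of period t
laggedSignal <- c(NA, head(signal, -1))
correct      <- laggedSignal * rets

# Average per-period return of each variant, over rows where both are defined
ok <- !is.na(lookahead) & !is.na(correct)
print(c(lookahead = mean(lookahead[ok]), correct = mean(correct[ok])))
```

The only difference between the two variants is the one-period shift in the signal, which is the same thing lag(signal) does in the xts version above.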

These are the “results”:

compare <- na.omit(cbind(gspcRets, lookahead, correct))
colnames(compare) <- c("S&P 500", "Lookahead", "Correct")
charts.PerformanceSummary(compare)
rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare))
logRets <- log(cumprod(1+compare))
chart.TimeSeries(logRets, legend.loc='topleft')

Of course, this equity curve is of no use, so here’s one in log scale.

As can be seen, lookahead bias makes a massive difference.

Here are the numerical results:

                            S&P 500  Lookahead   Correct
Annualized Return         0.0740000 0.15550000 0.0695000
Annualized Std Dev        0.1441000 0.09800000 0.1050000
Annualized Sharpe (Rf=0%) 0.5133000 1.58670000 0.6623000
Worst Drawdown            0.5255586 0.08729914 0.2699789
Calmar Ratio              0.1407286 1.78119192 0.2575219

Again, absolutely ridiculous.

Note that Return.portfolio (a function in PerformanceAnalytics) will automatically give you the next period’s return, instead of the current one, for your weights. However, for those writing “simple” backtests that can be quickly done using vectorized operations, an off-by-one error can make all the difference between a backtest in the realm of reasonable, and pure nonsense. That said, should one wish to test for said nonsense when faced with impossible-to-replicate results, the mechanics demonstrated above are the way to do it.

Now, onto other news: I’d like to thank Gerald M for staying on top of one of the Logical Invest strategies–namely, their simple global market rotation strategy outlined in an article from an earlier blog post.

Up until March 2015 (the date of the blog post), the strategy had performed well. However, after said date?

It has been a complete disaster, which, in hindsight, was evident when I passed it through the hypothesis-driven development framework process I wrote about earlier.

So, while there has been a great deal written about not simply throwing away a strategy because of short-term underperformance, and that anomalies such as momentum and value exist because of career risk due to said short-term underperformance, it’s never a good thing when a strategy creates historically large losses, particularly after being published in such a humble corner of the quantitative financial world.

In any case, this was a post demonstrating some mechanics, and an update on a strategy I blogged about not too long ago.

Thanks for reading.

NOTE: I am always interested in hearing about new opportunities which may benefit from my expertise, and am always happy to network. You can find my LinkedIn profile here.

On The Relationship Between the SMA and Momentum

Happy new year. This post will be a quick one covering the relationship between the simple moving average and time series momentum. The implication is that one can potentially derive better time series momentum indicators than the classical one applied in so many papers.

Okay, so the main idea for this post is quite simple:

I’m sure we’re all familiar with classical momentum. That is, the price now compared to the price however long ago (3 months, 10 months, 12 months, etc.). E.G. P(now) – P(10)
And I’m sure everyone is familiar with the simple moving average indicator, as well. E.G. SMA(10).

Well, as it turns out, these two quantities are actually related.

If, instead of expressing momentum as the difference of two numbers, it is expressed as a sum of one-month price changes, it can be written (for a 10 month momentum) as:

MOM_10 = return of this month + return of last month + return of 2 months ago + … + return of 9 months ago, for a total of 10 months in our little example.

This can be written as MOM_10 = (P(0) – P(1)) + (P(1) – P(2)) + … + (P(9) – P(10)). (Each difference within parentheses denotes one month’s worth of returns.)

Which can then be regrouped as: (P(0) + P(1) + … + P(9)) – (P(1) + P(2) + … + P(10)).

In other words, momentum, aka the difference between two prices, can be rewritten as the difference between two rolling sums of prices. And what is a simple moving average? Simply a rolling sum of prices divided by however many prices were summed over. Dividing both sums by 10 therefore gives the SMA(10) as of now minus the SMA(10) as of one month ago, so 10-month momentum is simply 10 times the one-month change in the 10-month SMA.
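
For those who prefer symbols, the same regrouping can be written compactly. The notation is introduced here for convenience: $P_k$ is the price $k$ months ago, $\mathrm{SMA}_{10}(t)$ is the 10-month simple moving average as of now, and $\mathrm{SMA}_{10}(t-1)$ is the same average as of one month ago.

```latex
\begin{aligned}
\mathrm{MOM}_{10} &= P_0 - P_{10} = \sum_{k=0}^{9} \left(P_k - P_{k+1}\right) \\
                  &= \sum_{k=0}^{9} P_k \;-\; \sum_{k=1}^{10} P_k \\
                  &= 10\,\bigl(\mathrm{SMA}_{10}(t) - \mathrm{SMA}_{10}(t-1)\bigr),
\qquad \text{where } \mathrm{SMA}_{10}(t) = \frac{1}{10}\sum_{k=0}^{9} P_k,\quad
\mathrm{SMA}_{10}(t-1) = \frac{1}{10}\sum_{k=1}^{10} P_k .
\end{aligned}
```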

Here’s some R code to demonstrate.

require(quantmod)
require(TTR)
require(PerformanceAnalytics)

getSymbols('SPY', from = '1990-01-01')
monthlySPY <- Ad(SPY)[endpoints(SPY, on = 'months')]
monthlySPYrets <- Return.calculate(monthlySPY)
#dividing by 10 since that's the moving average period for comparison
signalTSMOM <- (monthlySPY - lag(monthlySPY, 10))/10 
signalDiffMA <- diff(SMA(monthlySPY, 10))

# rounding to avoid floating-point mismatches in the equality check
sum(round(signalTSMOM, 3)==round(signalDiffMA, 3), na.rm=TRUE)

With the resulting number of times these two signals are equal:

[1] 267

In short, every time.

Now, what exactly is the punchline of this little example? Here’s the punchline:

The simple moving average is…fairly simplistic as far as filters go. It works as a pedagogical example, but it has some well known weaknesses regarding lag, windowing effects, and so on.

Here’s a toy example of how one can get a different momentum signal by changing the filter.

toyStrat <- monthlySPYrets * lag(signalTSMOM > 0)

emaSignal <- diff(EMA(monthlySPY, 10))
emaStrat <- monthlySPYrets * lag(emaSignal > 0)

comparison <- cbind(toyStrat, emaStrat)
colnames(comparison) <- c("DiffSMA10", "DiffEMA10")
charts.PerformanceSummary(comparison)
table.AnnualizedReturns(comparison)

With the following results:

                          DiffSMA10 DiffEMA10
Annualized Return            0.1051    0.0937
Annualized Std Dev           0.1086    0.1076
Annualized Sharpe (Rf=0%)    0.9680    0.8706

While the difference of EMA10 strategy didn’t do better than the difference of SMA10 (aka standard 10-month momentum), that’s not the point. The point is that the momentum signal is derived from a simple moving average filter, and that by using a different filter, one can still use a momentum type of strategy.

Or, put differently, the main/general takeaway here is that momentum is the slope of a filter, and one can compute momentum in an infinite number of ways depending on the filter used, and can come up with a myriad of different momentum strategies.

Thanks for reading.

NOTE: I am currently contracting in Chicago, and am always open to networking. Contact me at my email at ilya.kipnis@gmail.com or find me on my LinkedIn here.

A First Attempt At Applying Ensemble Filters

This post will outline a first failed attempt at applying the ensemble filter methodology to come up with a weighting process on SPY that should, in theory, shift gradually between full conviction in a bull market, full conviction in a bear market, and anywhere in between. This is a follow-up post to this blog post.

So, my thinking went like this: in a bull market, as one transitions from responsiveness to smoothness, responsive filters should be higher than smooth filters, and vice versa, as there’s generally a trade-off between the two. In fact, in my particular formulation, the quantity of the square root of the EMA of squared returns punishes any deviation from a flat line altogether (although inspired by Basel’s measure of volatility, which is the square root of the 18-day EMA of squared returns), while the responsiveness quantity punishes any deviation from the time series of the realized prices. Whether these are the two best measures of smoothness and responsiveness is a topic I’d certainly appreciate feedback on.

In any case, an idea I had off the top of my head was this: in addition to weighing multiple filters by their responsiveness (deviation from price action) and smoothness (deviation from a flat line), one could take the sum of the signs of the differences between each filter and its neighbor on the responsiveness-to-smoothness spectrum. Provided enough ensemble filters (say, 101, so there are 100 differences), one would obtain a way to move from full conviction of a bull market, to a bear market, to anything in between, and have this be a smooth process that doesn’t have schizophrenic swings of conviction.

Here’s the code to do this on SPY from inception to 2003:

require(TTR)
require(quantmod)
require(PerformanceAnalytics)

getSymbols('SPY', from = '1990-01-01')

smas <- list()
for(i in 2:250) {
  smas[[i]] <- SMA(Ad(SPY), n = i)
}
smas <- do.call(cbind, smas)

xtsApply <- function(x, FUN, n, ...) {
  out <- xts(apply(x, 2, FUN, n = n, ...), order.by=index(x))
  return(out)
}

sumIsNa <- function(x){
  return(sum(is.na(x)))
}

ensembleFilter <- function(data, filters, n = 20, conviction = 1, emphasisSmooth = .51) {
  
  # smoothness error
  filtRets <- Return.calculate(filters)
  sqFiltRets <- filtRets * filtRets * 100 #multiply by 100 to prevent instability
  smoothnessError <- sqrt(xtsApply(sqFiltRets, EMA, n = n))
  
  # responsiveness error
  repX <- xts(matrix(data, nrow = nrow(filters), ncol=ncol(filters)), 
              order.by = index(filters))
  dataFilterReturns <- repX/filters - 1
  sqDataFilterQuotient <- dataFilterReturns * dataFilterReturns * 100 #multiply by 100 to prevent instability
  responseError <- sqrt(xtsApply(sqDataFilterQuotient, EMA, n = n))
  
  # place smoothness and responsiveness errors on same notional quantities
  meanSmoothError <- rowMeans(smoothnessError)
  meanResponseError <- rowMeans(responseError)
  ratio <- meanSmoothError/meanResponseError
  ratio <- xts(matrix(ratio, nrow=nrow(filters), ncol=ncol(filters)),
               order.by=index(filters))
  responseError <- responseError * ratio
  
  # for each term in emphasisSmooth, create a separate filter
  ensembleFilters <- list()
  for(term in emphasisSmooth) {
    
    # compute total errors, raise them to a conviction power, find the normalized inverse
    totalError <- smoothnessError * term + responseError * (1-term)
    totalError <- totalError ^ conviction
    invTotalError <- 1/totalError
    normInvError <- invTotalError/rowSums(invTotalError)
    
    # ensemble filter is the sum of candidate filters in proportion
    # to the inverse of their total error
    tmp <- xts(rowSums(filters * normInvError), order.by=index(data))
    
    #NA out time in which one or more filters were NA
    initialNAs <- apply(filters, 1, sumIsNa) 
    tmp[initialNAs > 0] <- NA
    tmpName <- paste("emphasisSmooth", term, sep="_")
    colnames(tmp) <- tmpName
    ensembleFilters[[tmpName]] <- tmp
  }
  
  # compile the filters
  out <- do.call(cbind, ensembleFilters)
  return(out)
}

t1 <- Sys.time()
filts <- ensembleFilter(Ad(SPY), smas, n = 20, conviction = 2, emphasisSmooth = seq(0, 1, by=.01))
t2 <- Sys.time()

par(mfrow=c(3,1))
filtDiffs <- sign(filts[,1:100] - filts[,2:101])
sumDiffs <- xts(rowSums(filtDiffs), order.by=index(filtDiffs))

plot(Ad(SPY)["::2003"])
plot(sumDiffs["::2003"])
plot(diff(sumDiffs["::2003"]))

And here’s the very underwhelming result:

Essentially, while I expected to see changes in conviction of maybe 20 at most, instead, my indicator of sum of sign differences did exactly as I had hoped it wouldn’t, which is to be a very binary sort of mechanic. My intuition was that between an “obvious bull market” and an “obvious bear market” that some differences would be positive, some negative, and that they’d net each other out, and the conviction would be zero. Furthermore, that while any individual crossover is binary, all one hundred signs being either positive or negative would be a more gradual process. Apparently, this was not the case. To continue this train of thought later, one thing to try would be an all-pairs sign difference. Certainly, I don’t feel like giving up on this idea at this point, and, as usual, feedback would always be appreciated.

Thanks for reading.

NOTE: I am currently consulting in an analytics capacity in downtown Chicago. However, I am also looking for collaborators that wish to pursue interesting trading ideas. If you feel my skills may be of help to you, let’s talk. You can email me at ilya.kipnis@gmail.com, or find me on my LinkedIn here.

Review: Inovance’s TRAIDE application

This review will be about Inovance Tech’s TRAIDE system. It is an application geared towards letting retail investors apply proprietary machine learning algorithms to assist them in creating systematic trading strategies. Currently, my one-line review is that while I hope the company founders mean well, the application is still in an early stage, and so, should be checked out by potential users/venture capitalists as something with proof of potential, rather than a finished product ready for mass market. While this acts as a review, it’s also my thoughts as to how Inovance Tech can improve its product.

A bit of background: I have spoken several times to some of the company’s founders, who sound like individuals at about my age level (so, fellow millennials). Ultimately, the selling point is this:

Systematic trading is cool.
Machine learning is cool.
Therefore, applying machine learning to systematic trading is awesome! (And a surefire way to make profits, as Renaissance Technologies has shown.)

While this may sound a bit snarky, it’s also, in some ways, true. Machine learning has become the talk of the town, from IBM’s Watson (RenTec itself hired a bunch of speech recognition experts from IBM a couple of decades back), to Stanford’s self-driving car (invented by Sebastian Thrun, who now heads Udacity), to the Netflix prize, to god knows what Andrew Ng is doing with deep learning at Baidu. Considering how well machine learning has done at much more complex tasks than “create a half-decent systematic trading algorithm”, it shouldn’t be too much to ask this powerful field at the intersection of computer science and statistics to help the retail investor glued to watching charts generate a lot more return on his or her investments than through discretionary chart-watching and noise trading. To my understanding from conversations with Inovance Tech’s founders, this is explicitly their mission.

(Note: Dr. Wes Gray and Alpha Architect, in their book DIY Financial Advisor, have already established that listening to pundits, and trying to succeed at discretionary trading, is, on the whole, a loser’s game.)

However, I am not sure that Inovance’s TRAIDE application actually accomplishes this mission in its current state.

Here’s how it works:

Users select one asset at a time, and select a date range (data going back to Dec. 31, 2009). Assets are currently limited to highly liquid currency pairs, and can take the following settings: 1 hour, 2 hour, 4 hour, 6 hour, or daily bar time frames.

Users then select from a variety of indicators, ranging from technical (moving averages, oscillators, volume calculations, etc. Mostly an assortment of 20th century indicators, though the occasional adaptive moving average has managed to sneak in–namely KAMA–see my DSTrading package, and MAMA–aka the Mesa Adaptive Moving Average, from John Ehlers) to more esoteric ones such as some sentiment indicators. Here’s where things start to head south for me, however. Namely, that while it’s easy to add as many indicators as a user would like, there is basically no documentation on any of them, with no links to reference, etc., so users will have to bear the onus of actually understanding what each and every one of the indicators they select actually does, and whether or not those indicators are useful. The TRAIDE application makes zero effort (thus far) to actually get users acquainted with the purpose of these indicators, what their theoretical objective is (measure conviction in a trend, detect a trend, oscillator type indicator, etc.)

Furthermore, regarding indicator selections, users also specify one parameter setting for each indicator per strategy. E.G. if I had an EMA crossover, I’d have to create a new strategy for a 20/100 crossover, a 21/100 crossover, rather than specifying something like this:

short EMA: 20-60
long EMA: 80-200

Quantstrat itself has this functionality, and while I don’t recall covering parameter robustness checks/optimization (in other words, testing multiple parameter sets–whether one uses them for optimization or robustness is up to the user, not the functionality) in quantstrat on this blog specifically, this information very much exists in what I deem “the official quantstrat manual”, found here. In my opinion, the option of covering a range of values is mandatory so as to demonstrate that any given parameter setting is not a random fluke. Outside of quantstrat, I have demonstrated this methodology in my Hypothesis Driven Development posts, and in coming up for parameter selection for volatility trading.

Where TRAIDE may do something interesting, however, is that after the user specifies his indicators and parameters, its “proprietary machine learning” algorithms (WARNING: COMPLETELY BLACK BOX) determine which ranges of values of the indicators in question generated the best results within the backtest, and assign them bullishness and bearishness scores. In other words, “looking backwards, these were the indicator values that did best over the course of the sample”. While there is definite value to exploring the relationships between indicators and future returns, I think that TRAIDE needs to do more in this area, such as reporting P-values, conviction, and so on.

For instance, if you combine enough indicators, your “rule” is a market order that’s simply the intersection of all of the ranges of your indicators. For example, TRAIDE may tell a user that the strongest bullish signal occurs when the difference of the moving averages is between 1 and 2, the ADX is between 20 and 25, the ATR is between 0.5 and 1, and so on. Each setting the user selects further narrows down the number of trades the simulation makes. In my opinion, there are more ways to explore the interplay of indicators than simply one giant AND statement, such as an “OR” statement of some sort (E.G. select all values, and put on a trade when 3 out of 5 indicators fall into the selected bullish range, in order to place more trades). While it may be wise to filter down trades to very rare instances if trading a massive number of instruments, such that of several thousand possible instruments, only several are trading at any given time, with TRAIDE, a user selects only *one* asset class (currently, one currency pair) at a time, so I’m hoping to see TRAIDE create more functionality in terms of what constitutes a trading rule.
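
To make the difference between one giant AND statement and a looser voting rule concrete, here is a small base R sketch. Everything in it is hypothetical: the indicator readings, the “bullish” ranges, and the threshold k are made up for illustration and have nothing to do with TRAIDE’s actual output.

```r
# Hypothetical indicator readings for five bars (made-up numbers)
indicators <- data.frame(
  maDiff = c(1.5, 0.2, 1.1, 1.9, 2.5),
  adx    = c(22,  21,  30,  24,  40),
  atr    = c(0.7, 0.8, 0.6, 1.4, 0.9)
)

# Hypothetical "bullish" ranges, one lower and one upper bound per indicator
lower <- c(maDiff = 1, adx = 20, atr = 0.5)
upper <- c(maDiff = 2, adx = 25, atr = 1)

# Logical matrix: TRUE where a reading falls inside its bullish range
inRange <- sapply(names(lower), function(nm) {
  indicators[[nm]] >= lower[nm] & indicators[[nm]] <= upper[nm]
})

# Strict AND rule: every indicator must be in range
andRule <- rowSums(inRange) == ncol(inRange)

# k-of-n rule: at least k indicators must be in range
k <- 2
voteRule <- rowSums(inRange) >= k

print(data.frame(andRule, voteRule))
```

On these made-up numbers, the AND rule fires on a single bar, while the 2-of-3 rule fires on four of the five, which is the sort of trade-count difference the paragraph above is getting at.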

After the user selects both a long and a short rule (by simply filtering on indicator ranges that TRAIDE’s machine learning algorithms have said are good), TRAIDE turns that into a backtest with a long equity curve, short equity curve, total equity curve, and trade statistics for aggregate, long, and short trades. By contrast, in quantstrat, one only receives aggregate trade statistics. Whether long or short, all that matters to quantstrat is whether or not the trade made or lost money. For sophisticated users, it’s trivial enough to turn one set of rules on or off, but TRAIDE does more to hold the user’s hand in that regard.

Lastly, TRAIDE then generates MetaTrader4 code for a user to download.

And that’s the process.

In my opinion, while what Inovance Tech has set out to do with TRAIDE is interesting, I wouldn’t recommend it in its current state. For sophisticated individuals that know how to go through a proper research process, TRAIDE is too stringent in terms of parameter settings (one at a time), pre-coded indicators (its target audience probably can’t program too well), and asset classes (again, one at a time). However, for retail investors, my issue with TRAIDE is this:

There is a whole assortment of undocumented indicators, which then move to black-box machine learning algorithms. The result is that the user has very little understanding of what the underlying algorithms actually do, and why the logic he or she is presented with is the output. While TRAIDE makes it trivially easy to generate any one given trading system, as multiple individuals have stated in slightly different ways before, writing a strategy is the easy part. Doing the work to understand if that strategy actually has an edge is much harder. Namely, checking its robustness, its predictive power, its sensitivity to various regimes, and so on. Given TRAIDE’s rather short data history (2010 onwards), and coupled with the opaqueness that the user operates under, my analogy would be this:

It’s like giving an inexperienced driver the keys to a sports car in a thick fog on a winding road. Nobody disputes that a sports car is awesome. However, the true burden of the work lies in making sure that the user doesn’t wind up smashing into a tree.

Overall, I like the TRAIDE application’s mission, and I think it may have potential as something for the retail investors that don’t intend to learn the ins-and-outs of coding a trading system in R (despite me demonstrating many times over how to put such systems together). I just think that there needs to be more work put into making sure that the results a user sees are indicative of an edge, rather than open the possibility of highly-flexible machine learning algorithms chasing ghosts in one of the noisiest and most dynamic data sets one can possibly find.

My recommendations are these:

1) Multiple asset classes
2) Allow parameter ranges, and cap the number of trials at any given point (E.G. 4 indicators with ten settings each = 10,000 possible trading systems = blow up the servers). To narrow down the number of trial runs, use techniques from experimental design to arrive at decent combinations. (I wish I remembered my response surface methodology techniques from my master’s degree about now!)
3) Allow modifications of order sizing (E.G. volatility targeting, stop losses), such as I wrote about in my hypothesis-driven development posts.
4) Provide *some* sort of documentation for the indicators, even if it’s as simple as a link to investopedia (preferably a lot more).
5) Far more output is necessary, especially for users who don’t program. Namely, to distinguish whether or not there is a legitimate edge, or if there are too few observations to reject the null hypothesis of random noise.
6) Far longer data histories. 2010 onwards just seems too short of a time-frame to be sure of a strategy’s efficacy, at least on daily data (may not be true for hourly).
7) Factor in transaction costs. Trading on an hourly time frame will mean far less P&L per trade than on a daily resolution. If MT4 charges a fixed ticket price, users need to know how this factors into their strategy.
8) Lastly, dogfooding. When I spoke last time with Inovance Tech’s founders, they claimed they were using their own algorithms to create a forex strategy, which was doing well in live trading. By the time more of these suggestions are implemented, it’d be interesting to see if they have a track record as a fund, in addition to as a software provider.
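
As a quick check on the arithmetic in recommendation 2, here is a base R sketch of how fast a parameter grid grows and one crude way to cap the number of trials. The indicator names and ranges are hypothetical, and the random subset is only a stand-in for a proper experimental design, not a substitute for one.

```r
# Four hypothetical indicators with ten candidate settings each
grid <- expand.grid(
  shortEMA = seq(20, 65, by = 5),
  longEMA  = seq(80, 170, by = 10),
  adxLen   = 10:19,
  atrLen   = 5:14
)
print(nrow(grid))  # 10 * 10 * 10 * 10 parameter combinations

# Crude cap on the number of trials: a random subset of the full grid
set.seed(1)
trials <- grid[sample(nrow(grid), 100), ]
print(head(trials))
```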

If all of these things are accounted for and automated, the product will hopefully accomplish its mission of bringing systematic trading and machine learning to more people. I think TRAIDE has potential, and I’m hoping that its staff will realize that potential.

Thanks for reading.

NOTE: I am currently contracting in downtown Chicago, and am always interested in networking with professionals in the systematic trading and systematic asset management/allocation spaces. Find my LinkedIn here.

EDIT: Today in my email (Dec. 3, 2015), I received a notice that Inovance was making TRAIDE completely free. Perhaps they want a bunch more feedback on it?

A Filter Selection Method Inspired From Statistics

This post will demonstrate a method to create an ensemble filter based on a trade-off between smoothness and responsiveness, two properties looked for in a filter. An ideal filter would be both responsive to price action, so as to not hold incorrect positions, and smooth, so as to not incur false signals and unnecessary transaction costs.

So, ever since my volatility trading strategy, using three very naive filters (all SMAs), completely missed a 27% month in XIV, I’ve decided to try and find ways to create better indicators in trend following. Now, under the realization that there can potentially be tons of complex filters in existence, I decided instead to focus on a way to create ensemble filters, by using an analogy from statistics/machine learning.

In static data analysis, for a regression or classification task, there is a trade-off between bias and variance. In a nutshell, variance is bad because of the possibility of overfitting on a few irregular observations, and bias is bad because of the possibility of underfitting legitimate data. Filtering time series raises similar concerns, except bias is called lag, and variance can be thought of as a “whipsawing” indicator. Essentially, an ideal indicator would move quickly with the data, while at the same time, not possess a myriad of small bumps-and-reverses along the way, which may send false signals to a trading strategy.

So, here’s how my simple algorithm works:

The inputs to the function are the following:

A) The time series of the data you’re trying to filter
B) A collection of candidate filters
C) A period over which to measure smoothness and responsiveness, defined as the square root of the n-day EMA (2/(n+1) convention) of the following:
a) Responsiveness: the squared quantity of price/filter – 1
b) Smoothness: the squared quantity of filter(t)/filter(t-1) – 1 (aka R’s Return.calculate function)
D) A conviction factor, to which power the errors will be raised. This should probably be between .5 and 3
E) A vector that defines the emphasis on smoothness (vs. emphasis on responsiveness), which should range from 0 to 1.
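
In symbols, the inputs above combine roughly as follows. This is a sketch in notation introduced here ($P_t$ for the data, $F_{i,t}$ for candidate filter $i$, $n$ for the measurement period, $c$ for the conviction factor, $s$ for the emphasis on smoothness), and it leaves out the scaling steps that the code in this post applies for numerical stability and to put the two errors on comparable footing.

```latex
\begin{aligned}
R_{i,t} &= \sqrt{\mathrm{EMA}_n\!\left[\left(\frac{P_t}{F_{i,t}} - 1\right)^{2}\right]}
\quad \text{(responsiveness error)} \\
S_{i,t} &= \sqrt{\mathrm{EMA}_n\!\left[\left(\frac{F_{i,t}}{F_{i,t-1}} - 1\right)^{2}\right]}
\quad \text{(smoothness error)} \\
E_{i,t} &= \bigl(s\,S_{i,t} + (1 - s)\,R_{i,t}\bigr)^{c} \\
w_{i,t} &= \frac{1/E_{i,t}}{\sum_{j} 1/E_{j,t}}, \qquad
\mathrm{Ensemble}_t = \sum_{i} w_{i,t}\,F_{i,t}
\end{aligned}
```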

Here’s the code:

require(TTR)
require(quantmod)
require(PerformanceAnalytics) # provides Return.calculate, used in ensembleFilter

getSymbols('SPY', from = '1990-01-01')

smas <- list()
for(i in 2:250) {
  smas[[i]] <- SMA(Ad(SPY), n = i)
}
smas <- do.call(cbind, smas)

xtsApply <- function(x, FUN, n, ...) {
  out <- xts(apply(x, 2, FUN, n = n, ...), order.by=index(x))
  return(out)
}

sumIsNa <- function(x){
  return(sum(is.na(x)))
}

This gets SPY data, and creates two utility functions–xtsApply, which is simply a column-wise apply that restores the original index that a plain apply discards, and sumIsNa, which I use later for counting the number of NAs in a given row. It also creates my candidate filters, which, to keep things simple, are just SMAs 2-250.

Here’s the actual code of the function, with comments in the code itself to better explain the process from a technical level (for those still unfamiliar with R, look for the hashtags):

ensembleFilter <- function(data, filters, n = 20, conviction = 1, emphasisSmooth = .51) {
  
  # smoothness error
  filtRets <- Return.calculate(filters)
  sqFiltRets <- filtRets * filtRets * 100 #multiply by 100 to prevent instability
  smoothnessError <- sqrt(xtsApply(sqFiltRets, EMA, n = n))
  
  # responsiveness error
  repX <- xts(matrix(data, nrow = nrow(filters), ncol=ncol(filters)), 
              order.by = index(filters))
  dataFilterReturns <- repX/filters - 1
  sqDataFilterQuotient <- dataFilterReturns * dataFilterReturns * 100 #multiply by 100 to prevent instability
  responseError <- sqrt(xtsApply(sqDataFilterQuotient, EMA, n = n))
  
  # place smoothness and responsiveness errors on same notional quantities
  meanSmoothError <- rowMeans(smoothnessError)
  meanResponseError <- rowMeans(responseError)
  ratio <- meanSmoothError/meanResponseError
  ratio <- xts(matrix(ratio, nrow=nrow(filters), ncol=ncol(filters)),
               order.by=index(filters))
  responseError <- responseError * ratio
  
  # for each term in emphasisSmooth, create a separate filter
  ensembleFilters <- list()
  for(term in emphasisSmooth) {
    
    # compute total errors, raise them to a conviction power, find the normalized inverse
    totalError <- smoothnessError * term + responseError * (1-term)
    totalError <- totalError ^ conviction
    invTotalError <- 1/totalError
    normInvError <- invTotalError/rowSums(invTotalError)
    
    # ensemble filter is the sum of candidate filters in proportion
    # to the inverse of their total error
    tmp <- xts(rowSums(filters * normInvError), order.by=index(data))
    
    #NA out time in which one or more filters were NA
    initialNAs <- apply(filters, 1, sumIsNa) 
    tmp[initialNAs > 0] <- NA
    tmpName <- paste("emphasisSmooth", term, sep="_")
    colnames(tmp) <- tmpName
    ensembleFilters[[tmpName]] <- tmp
  }
  
  # compile the filters
  out <- do.call(cbind, ensembleFilters)
  return(out)
}

The vast majority of the computational time takes place in the two xtsApply calls. On 249 different simple moving averages, the process takes about 30 seconds.

Here’s the output, using a conviction factor of 2:

t1 <- Sys.time()
filts <- ensembleFilter(Ad(SPY), smas, n = 20, conviction = 2, emphasisSmooth = c(0, .05, .25, .5, .75, .95, 1))
t2 <- Sys.time()
print(t2-t1)


plot(Ad(SPY)['2007::2011'])
lines(filts[,1], col='blue', lwd=2)
lines(filts[,2], col='green', lwd = 2)
lines(filts[,3], col='orange', lwd = 2)
lines(filts[,4], col='brown', lwd = 2)
lines(filts[,5], col='maroon', lwd = 2)
lines(filts[,6], col='purple', lwd = 2)
lines(filts[,7], col='red', lwd = 2)

And here is an example, looking at SPY from 2007 through 2011.

In this case, I chose to go from blue to green, orange, brown, maroon, purple, and finally red for smoothness emphasis of 0, 5%, 25%, 50%, 75%, 95%, and 1, respectively.

Notice that the blue line is very wiggly, while the red line sometimes barely moves, such as during the 2011 drop-off.

One thing that I noticed in the course of putting this process together is something that eluded me earlier–namely, that naive trend-following strategies which are either fully long or fully short based on a crossover signal can lose money quickly in sideways markets.

However, theoretically, by finely varying the jumps between 0% to 100% emphasis on smoothness, whether in steps of 1% or finer, one can have a sort of “continuous” conviction, by simply adding up the signs of differences between various ensemble filters. In an “uptrend”, the difference as one moves from the most responsive to most smooth filter should constantly be positive, and vice versa.
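
Here is a toy sketch of that sum-of-signs idea in base R. The matrix values are made up for illustration, with columns assumed to be ordered from the most responsive ensemble filter (left) to the smoothest (right).

```r
# Toy matrix: rows are dates, columns are ensemble filters ordered from
# most responsive (left) to smoothest (right); values are made up
filts <- rbind(
  c(105, 104, 102, 100),  # each filter above its smoother neighbor
  c( 98,  99, 101, 102),  # each filter below its smoother neighbor
  c(101, 100, 101, 100)   # mixed ordering
)

# Sign of each neighbor-to-neighbor difference along the spectrum
filtDiffs <- sign(filts[, -ncol(filts)] - filts[, -1])

# Conviction per row: ranges from -(ncol - 1) to +(ncol - 1)
conviction <- rowSums(filtDiffs)
print(conviction)
```

With 101 ensemble filters instead of four, the same calculation would range from -100 to 100.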

In the interest of brevity, this post doesn’t even have a trading strategy attached to it. However, an implied trading strategy can be to be long or short the SPY depending on the sum of signs of the differences in filters as you move from responsiveness to smoothness. Of course, as the candidate filters are all SMAs, it probably wouldn’t be particularly spectacular. However, for those out there who use more complex filters, this may be a way to create ensembles out of various candidate filters, and create even better filters. Furthermore, I hope that given enough candidate filters and an objective way of selecting them, it would be possible to reduce the chances of creating an overfit trading system. However, anything with parameters can potentially be overfit, so that may be wishful thinking.

All in all, this is still a new idea for me. For instance, the filter used to compute the error terms can probably be improved. The inspiration for an EMA 20 essentially came from how Basel computes volatility (if I recall correctly, it uses the square root of an 18-day EMA of squared returns), and the very fact that I use an EMA can itself be questioned (why an EMA instead of some other, more complex filter?). In fact, I’m always open to suggestions from readers on how I can improve this concept (and others).

Thanks for reading.

NOTE: I am currently contracting in Chicago in an analytics capacity. If anyone would like to meet up, let me know. You can email me at ilya.kipnis@gmail.com, or contact me through my LinkedIn here.

How well can you scale your strategy?

This post will deal with a quick, finger-in-the-air way of seeing how well a strategy scales, namely, how sensitive it is to latency between signal and execution, using a simple volatility trading strategy as an example. The signal will be the VIX/VXV ratio trading VXX and XIV, an idea I got from Volatility Made Simple’s amazing blog, particularly this post. The three signals compared will be the “magical thinking” signal (observe the close, buy the close, named from the ruleOrderProc setting in quantstrat), buy on next-day open, and buy on next-day close.

Let’s get started.

require(downloader)
require(PerformanceAnalytics)
require(IKTrading)
require(TTR)

download("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vxvdailyprices.csv", 
         destfile="vxvData.csv")
download("https://dl.dropboxusercontent.com/s/jk6der1s5lxtcfy/XIVlong.TXT",
         destfile="longXIV.txt")
download("https://dl.dropboxusercontent.com/s/950x55x7jtm9x2q/VXXlong.TXT", 
         destfile="longVXX.txt") #requires downloader package
getSymbols('^VIX', from = '1990-01-01')


xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxv <- xts(read.zoo("vxvData.csv", header=TRUE, sep=",", format="%m/%d/%Y", skip=2))
vixVxv <- Cl(VIX)/Cl(vxv)



vxxCloseRets <- Return.calculate(Cl(vxx))
vxxOpenRets <- Return.calculate(Op(vxx))
xivCloseRets <- Return.calculate(Cl(xiv))
xivOpenRets <- Return.calculate(Op(xiv))

vxxSig <- vixVxv > 1
xivSig <- 1-vxxSig

magicThinking <- vxxCloseRets * lag(vxxSig) + xivCloseRets * lag(xivSig)
nextOpen <- vxxOpenRets * lag(vxxSig, 2) + xivOpenRets * lag(xivSig, 2)
nextClose <- vxxCloseRets * lag(vxxSig, 2) + xivCloseRets * lag(xivSig, 2)
tradeWholeDay <- (nextOpen + nextClose)/2

compare <- na.omit(cbind(magicThinking, nextOpen, nextClose, tradeWholeDay))
colnames(compare) <- c("Magic Thinking", "Next Open", 
                       "Next Close", "Execute Through Next Day")
charts.PerformanceSummary(compare)
rbind(table.AnnualizedReturns(compare), 
      maxDrawdown(compare), CalmarRatio(compare))

par(mfrow=c(1,1))
chart.TimeSeries(log(cumprod(1+compare), base = 10), legend.loc='topleft', ylab='log base 10 of additional equity',
                 main = 'VIX vs. VXV different execution times')

So here’s the run-through. In addition to the magical thinking strategy (observe the close, buy that same close), I tested three other variants: one which transacts at the next open, one which transacts at the next close, and the average of those two. I feel these three give a sense of a strategy’s performance under more realistic conditions, that is, how well the strategy performs if transacted throughout the day, assuming you’re managing a sum of money too large to just plow into the market in the closing minutes (and if you hope to get rich off of trading, you will have a larger sum of money than the amount you can apply magical thinking to). Ideally, I’d use VWAP pricing, but that’s not available for free anywhere I know of, so readers couldn’t replicate the results even if I had such data.
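To make the alignment of signals and returns concrete, here is a small base-R sketch of the same lagging logic on made-up numbers. The `lagVec` helper and all of the prices are hypothetical; it only mimics what `lag()` does to an xts object in the code above.

```r
# Shift a vector forward by k periods, padding with NA (mimics lag() on xts)
lagVec <- function(x, k = 1) c(rep(NA, k), head(x, -k))

# Made-up prices and a signal observed at each day's close
open   <- c(10, 11, 12, 13, 14, 15)
close  <- c(10.5, 11.5, 12.5, 13.5, 14.5, 15.5)
signal <- c(0, 1, 1, 0, 1, 1)

closeRets <- c(NA, diff(close) / head(close, -1))   # close[t-1] -> close[t]
openRets  <- c(NA, diff(open) / head(open, -1))     # open[t-1]  -> open[t]

# Observe close t-1 and trade that same close: earn close[t-1] -> close[t]
magicThinking <- closeRets * lagVec(signal, 1)
# Observe close t-2, buy the open of t-1: earn open[t-1] -> open[t]
nextOpen <- openRets * lagVec(signal, 2)
# Observe close t-2, buy the close of t-1: earn close[t-1] -> close[t]
nextClose <- closeRets * lagVec(signal, 2)

data.frame(signal, magicThinking, nextOpen, nextClose)
```

The only difference between the magical thinking and next-close columns is one extra period of lag on the signal, which is exactly the latency being tested.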

In any case, here are the results.

Equity curves:

Log scale (for Mr. Tony Cooper and others):

Stats:

                          Magic Thinking Next Open Next Close Execute Through Next Day
Annualized Return               0.814100 0.8922000  0.5932000                 0.821900
Annualized Std Dev              0.622800 0.6533000  0.6226000                 0.558100
Annualized Sharpe (Rf=0%)       1.307100 1.3656000  0.9529000                 1.472600
Worst Drawdown                  0.566122 0.5635336  0.6442294                 0.601014
Calmar Ratio                    1.437989 1.5831686  0.9208586                 1.367510

My reaction? The performance of executing on the next day’s close being vastly lower than that of the other configurations (with that deterioration occurring in the most recent years) essentially means that the fills will have to come pretty quickly at the beginning of the day. While the strategy seems somewhat scalable through the lens of this finger-in-the-air technique, if waiting through the first full day of possible execution after signal reception tanks a strategy from a 1.44 Calmar to a .92, that’s a massive drop-off, holding everything else constant. I think this is quite a valid question to ask anyone who simply sells signals, as opposed to managing assets: namely, how sensitive are the signals to execution on the next day? After all, unless those signals come at 3:55 PM, one is most likely going to be getting filled the next day.
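For anyone who wants to check the arithmetic, the Calmar ratios in the table are just annualized return divided by worst drawdown. Here is a small sketch that recomputes them from the rounded figures printed above (the variable names are mine, and tiny differences from the table come from that rounding).

```r
# Figures copied from the stats table above
annRet <- c(MagicThinking = 0.8141, NextOpen = 0.8922,
            NextClose = 0.5932, WholeDay = 0.8219)
maxDD  <- c(0.566122, 0.5635336, 0.6442294, 0.601014)

# Calmar ratio: annualized return over worst drawdown
calmar <- annRet / maxDD
round(calmar, 2)

# Relative deterioration from magical thinking to next-day close
dropOff <- unname(1 - calmar["NextClose"] / calmar["MagicThinking"])
dropOff
```

By this measure, one day of latency costs a bit more than a third of the Calmar ratio.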

Now, while this strategy is a bit of a tomato can in terms of how good volatility trading strategies can get (they can get a *lot* better in my opinion), I think it made for a simple little demonstration of this technique. Again, a huge thank you to Mr. Helmuth Vollmeier for so kindly keeping up his dropbox all this time for the volatility data!

Thanks for reading.

NOTE: I am currently contracting in a data science capacity in Chicago. You can email me at ilya.kipnis@gmail.com, or find me on my LinkedIn here. I’m always open to beers after work if you’re in the Chicago area.

NOTE 2: Today, on October 21, 2015, if you’re in Chicago, there’s a Chicago R Users Group conference at Jaks Tap at 6:00 PM. Free pizza, networking, and R, hosted by Paul Teetor, who’s a finance guy. Hope to see you there.

Volatility Stat-Arb Shenanigans

This post deals with an impossible-to-implement statistical arbitrage strategy using VXX and XIV. The strategy is simple: if the average daily return of VXX and XIV was positive, short both of them at the close. This strategy makes two assumptions of varying dubiousness: that one can “observe the close and act on the close”, and that one can short VXX and XIV.

So, recently, I decided to play around with everyone’s two favorite instruments on this blog–VXX and XIV, with the idea that “hey, these two instruments are diametrically opposed, so shouldn’t there be a stat-arb trade here?”

So, in order to do a lick-finger-in-the-air visualization, I implemented Mike Harris’s momersion indicator.

momersion <- function(R, n, returnLag = 1) {
  momentum <- sign(R * lag(R, returnLag))
  momentum[momentum < 0] <- 0
  momersion <- runSum(momentum, n = n)/n * 100
  colnames(momersion) <- "momersion"
  return(momersion)
}
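
To see what the indicator measures, here is a base-R sketch of the same calculation on two toy return series. `momersionToy` is a hypothetical re-implementation for illustration only (no xts or TTR), so treat it as an approximation of the function above rather than a replacement.

```r
# Percentage of the last n periods in which the return had the same sign
# as the previous period's return
momersionToy <- function(r, n) {
  prodLag  <- r * c(NA, head(r, -1))      # r[t] * r[t-1]
  sameSign <- as.numeric(prodLag > 0)     # 1 = continuation, 0 = reversal
  out <- rep(NA_real_, length(r))
  for (t in seq_along(r)) {
    if (t > n) out[t] <- mean(sameSign[(t - n + 1):t]) * 100
  }
  out
}

trend <- rep(0.01, 10)             # every return continues the last one
chop  <- rep(c(0.01, -0.01), 5)    # every return reverses the last one

momersionToy(trend, 5)
momersionToy(chop, 5)
```

A persistently trending series sits at the top of the scale, while a series that flips sign every period sits at the bottom.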

And then I ran the spread through it.


xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))

xivRets <- Return.calculate(Cl(xiv))
vxxRets <- Return.calculate(Cl(vxx))

volSpread <- xivRets + vxxRets
volSpreadMomersion <- momersion(volSpread, n = 252)
plot(volSpreadMomersion)

In other words, this spread is certainly mean-reverting at just about all times.

And here is the code for the results from 2011 onward, from when the XIV and VXX actually started trading.

#both sides
sig <- -lag(sign(volSpread))
longShort <- sig * volSpread
charts.PerformanceSummary(longShort['2011::'], main = 'long and short spread')

#long spread only
sig <- -lag(sign(volSpread))
sig[sig < 0] <- 0
longOnly <- sig * volSpread
charts.PerformanceSummary(longOnly['2011::'], main = 'long spread only')


#short spread only
sig <- -lag(sign(volSpread))
sig[sig > 0] <- 0
shortOnly <- sig * volSpread
charts.PerformanceSummary(shortOnly['2011::'], main = 'short spread only')

threeStrats <- na.omit(cbind(longShort, longOnly, shortOnly))["2011::"]
colnames(threeStrats) <- c("LongShort", "Long", "Short")
rbind(table.AnnualizedReturns(threeStrats), CalmarRatio(threeStrats))

Here are the equity curves:

Long-short:

Long-only:

Short-only:

With the following statistics:

                          LongShort      Long    Short
Annualized Return          0.115400 0.0015000 0.113600
Annualized Std Dev         0.049800 0.0412000 0.027900
Annualized Sharpe (Rf=0%)  2.317400 0.0374000 4.072100
Calmar Ratio               1.700522 0.0166862 7.430481

In other words, the short side is absolutely amazing as a trade–except for the one small fact of having it be impossible to actually execute, or at least as far as I’m aware. Anyhow, this was simply a for-fun post, but hopefully it served some purpose.

Thanks for reading.

NOTE: I am currently contracting and am looking to network in the Chicago area. You can find my LinkedIn here.

Hypothesis-Driven Development Part II

This post will evaluate signals based on the rank regression hypotheses covered in the last post.

The last time around, we saw that rank regression had a very statistically significant result. Therefore, the next step is to evaluate the basic signal, that is, whether or not there is statistical significance in the actual performance of the signal. Since the strategy from SeekingAlpha simply selects the top-ranked ETF every month, this is a very easy signal to evaluate.

Simply, using the 1-24 month formation periods for cumulative sum of monthly returns, select the highest-ranked ETF and hold it for one month.
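
As a quick illustration of the selection step, here is a base-R sketch on made-up formation-period returns. The asset names A through E and the numbers are hypothetical.

```r
# Made-up formation-period returns: rows are months, columns are assets
formation <- matrix(c(0.02, -0.01,  0.03, 0.00, 0.01,
                      0.01,  0.04, -0.02, 0.00, 0.02),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("month1", "month2"),
                                    c("A", "B", "C", "D", "E")))

# Rank within each month; the highest return gets the highest rank
ranks <- t(apply(formation, 1, rank))

# 1 for the top-ranked asset, 0 elsewhere: next month's portfolio weights
selection <- (ranks == ncol(formation)) * 1
selection
```

Comparing against `ncol(formation)` rather than a hard-coded 5 keeps the sketch independent of the number of assets. One caveat: `rank()` averages ties by default, so a tie for first place would leave a row with no selection at all.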

Here’s the code to evaluate the signal (continued from the last post), given the returns, a month parameter, and an EW portfolio to compare with the signal.


signalBacktest <- function(returns, nMonths, ewPortfolio) {
  nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
  nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
  nMonthAvgRank <- t(apply(nMonthAverage, 1, rank))
  nMonthAvgRank <- xts(nMonthAvgRank, order.by=index(returns))
  selection <- (nMonthAvgRank==5) * 1 #select highest average performance
  sigTest <- Return.portfolio(R = returns, weights = selection)
  difference <- sigTest - ewPortfolio
  diffZscore <- mean(difference)/sd(difference)
  sigZscore <- mean(sigTest)/sd(sigTest)
  return(list(sigTest, difference, mean(sigTest), sigZscore, mean(difference), diffZscore))
}

ewPortfolio <- Return.portfolio(monthRets, rebalance_on="months")

sigBoxplots <- list()
excessBoxplots <- list()
sigMeans <- list()
sigZscores <- list()
diffMeans <- list()
diffZscores <- list()
for(i in 1:24) {
  tmp <- signalBacktest(monthRets, nMonths = i, ewPortfolio)
  sigBoxplots[[i]] <- tmp[[1]]
  excessBoxplots[[i]] <- tmp[[2]]
  sigMeans[[i]] <- tmp[[3]]
  sigZscores[[i]] <- tmp[[4]]
  diffMeans[[i]] <- tmp[[5]]
  diffZscores[[i]] <- tmp[[6]]
}

sigBoxplots <- do.call(cbind, sigBoxplots)
excessBoxplots <- do.call(cbind, excessBoxplots)
sigMeans <- do.call(c, sigMeans)
sigZscores <- do.call(c, sigZscores)
diffMeans <- do.call(c, diffMeans)
diffZscores <- do.call(c, diffZscores)

par(mfrow=c(2,1))
plot(as.numeric(sigMeans)*100, type='h', main = 'signal means', 
     ylab = 'percent per month', xlab='formation period')
plot(as.numeric(sigZscores), type='h', main = 'signal Z scores', 
     ylab='Z scores', xlab='formation period')

plot(as.numeric(diffMeans)*100, type='h', main = 'mean difference between signal and EW',
     ylab = 'percent per month', xlab='formation period')
plot(as.numeric(diffZscores), type='h', main = 'difference Z scores',
     ylab = 'Z score', xlab='formation period')

boxplot(as.matrix(sigBoxplots), main = 'signal boxplots', xlab='formation period')
abline(h=0, col='red')
points(sigMeans, col='blue')

boxplot(as.matrix(sigBoxplots[,1:12]), main = 'signal boxplots 1 through 12 month formations', 
        xlab='formation period')
abline(h=0, col='red')
points(sigMeans[1:12], col='blue')

boxplot(as.matrix(excessBoxplots), main = 'difference (signal - EW) boxplots', 
        xlab='formation period')
abline(h=0, col='red')
points(diffMeans, col='blue')

boxplot(as.matrix(excessBoxplots[,1:12]), main = 'difference (signal - EW) boxplots 1 through 12 month formations', 
        xlab='formation period')
abline(h=0, col='red')
points(diffMeans[1:12], col='blue')

Okay, so what’s going on here is that I compare the signal against the equal weight portfolio, and take means and z scores of both the signal values in general, and against the equal weight portfolio. I plot these values, along with boxplots of the distributions of both the signal process, and the difference between the signal process and the equal weight portfolio.

Here are the results:




To note, the percents are already multiplied by 100, so in the best cases, the rank strategy outperforms the equal weight strategy by about 30 basis points per month. However, these results are…not even in the same parking lot as statistical significance, let alone in the same ballpark.
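
One thing worth spelling out: the “Z scores” computed in the code are simply mean over standard deviation of the monthly values, so turning one into a test statistic means scaling by the square root of the number of months. Here is a sketch on simulated data. The 140 months, the 30 basis point mean, and the 3% monthly standard deviation are all assumptions for illustration, not estimates from the actual backtest.

```r
# Simulated monthly excess returns (hypothetical parameters)
set.seed(1)
excess <- rnorm(140, mean = 0.003, sd = 0.03)

# The per-month ratio used in the post
zPerMonth <- mean(excess) / sd(excess)

# One-sample t statistic: the same ratio scaled by sqrt(n)
tStat <- zPerMonth * sqrt(length(excess))

# Cross-check against base R's t-test
tTest <- t.test(excess)
c(zPerMonth = zPerMonth, tStat = tStat, tTestStat = unname(tTest$statistic))
```

The scaled statistic matches the one reported by `t.test`, which is a convenient way to put the per-month ratios on a familiar footing.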

Now, at this point, in case some people haven’t yet read Brian Peterson’s paper on strategy development, the point of hypothesis-driven development is to *reject* hypothetical strategies ASAP before looking at any sort of equity curve and trying to do away with periods of underperformance. So, at this point, I would like to reject this entire strategy because there’s no statistical evidence to actually continue. Furthermore, because August 2015 was a rather interesting month, especially in terms of volatility dispersion, I want to return to volatility trading strategies, now backed by hypothesis-driven development.

If anyone wants to see me continue on to rule testing with this process, let me know. If not, I have more ideas on the way.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

Introduction to Hypothesis Driven Development — Overview of a Simple Strategy and Indicator Hypotheses

This post will begin to apply a hypothesis-driven development framework (that is, the framework written by Brian Peterson on how to do strategy construction correctly, found here) to a strategy I’ve come across on SeekingAlpha. Namely, Cliff Smith posted about a conservative bond rotation strategy, which makes use of short-term treasuries, long-term treasuries, convertibles, emerging market debt, and high-yield corporate debt–that is, SHY, TLT, CWB, PCY, and JNK. What this post will do is try to put a more formal framework on whether or not this strategy is a valid one to begin with.

One note: for the sake of succinctness for blog consumption, and to demonstrate the computational techniques more quickly, I’ll be glossing over background research write-ups for this post/strategy, since it’s yet another take on time-series/cross-sectional momentum, except pared down to something more implementable for individual investors, as opposed to something that requires a massive collection of different instruments for massive, institutional-class portfolios.

Introduction, Overview, Objectives, Constraints, Assumptions, and Hypotheses to be Tested:

Momentum. It has been documented many times. For the sake of brevity, I’ll let readers follow the links if they’re so inclined, but among them are Jegadeesh and Titman’s seminal 1993 paper, Mark Carhart’s 1997 paper, Andreu et al. (2012), Barroso and Santa-Clara (2013), Ilmanen’s Expected Returns (which covers momentum), and others. This list, of course, is far from exhaustive, but the point stands. Formation periods of several months (up to a year) should predict returns moving forward on some holding period, be it several months, or as is more commonly seen, one month.

Furthermore, momentum applies in two varieties–cross sectional, and time-series. Cross-sectional momentum asserts that assets that outperformed among a group will continue to outperform, while time-series momentum asserts that assets that have risen in price during a formation period will continue to do so for the short-term future.
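
To make the distinction concrete, here is a tiny base-R sketch of the two signal types applied to one set of formation-period returns. The numbers are invented purely for illustration.

```r
# Made-up formation-period returns for the five ETFs
formationRets <- c(SHY = 0.01, TLT = -0.03, CWB = 0.06, PCY = 0.04, JNK = -0.01)

# Cross-sectional momentum: pick the best performer relative to the group
crossSectional <- names(formationRets)[which.max(formationRets)]

# Time-series momentum: hold anything that rose over the formation period
timeSeries <- names(formationRets)[formationRets > 0]

crossSectional
timeSeries
```

The cross-sectional rule always holds something, even if every asset fell, while the time-series rule can end up holding nothing.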

Cliff Smith’s strategy effectively depends on the former, applied among a group of five bond ETFs. I am not certain of the objective of the strategy (he didn’t mention it), as PCY, JNK, and CWB, while they may be fixed-income in name, possess volatility on the order of equities. I suppose one possible “default” objective would be to achieve an outperforming total return against an equal-weighted benchmark, both rebalanced monthly.

The constraints are that one would need a sufficient amount of capital such that fixed transaction costs are negligible, since the strategy is a single-instrument rotation type, meaning that each month may have two-way turnover of 200% (sell one ETF, buy another). On the other hand, one would assume that the amount of capital deployed is small enough such that execution costs of trading do not materially impact the performance of the strategy. That is to say, moving multiple billions from one of these ETFs to the other is a non-starter. As all returns are computed close-to-close for the sake of simplicity, this creates the implicit assumption that the market impact and execution costs are very small compared to overall returns.

There are two overarching hypotheses to be tested in order to validate the efficacy of this strategy:

1) Time-series momentum: while it has been documented for equities and even industry/country ETFs, it may not yet have been formally documented for fixed-income ETFs and their corresponding mutual funds. In order to validate this strategy, it should be investigated whether the particular instruments it selects adhere to the same phenomenon.

2) Cross-sectional momentum: again, while this has been heavily demonstrated in the past with regard to equities, ETFs are fairly new, and of the five mutual funds Cliff Smith selected, the latest one only has data going back to 1997, which makes easy access to diversified fixed income markets for less sophisticated investors a relatively new innovation.

Essentially, both of these can be tested over a range of parameters (1-24 months).

Another note: with hypothesis-driven strategy development, the backtest is to be *nothing more than a confirmation of all the hypotheses up to that point*. That is, re-optimizing on the backtest itself means overfitting. Any proposed change to a strategy should be done in the form of tested hypotheses, as opposed to running a bunch of backtests and selecting the best trials. Taken another way, this means that every single proposed element of a strategy needs to have some form of strong hypothesis accompanying it, in order to be justified.

So, here are the two hypotheses I tested on the corresponding mutual funds:

require(quantmod)
require(PerformanceAnalytics)
require(reshape2)
symbols <- c("CNSAX", "FAHDX", "VUSTX", "VFISX", "PREMX")
getSymbols(symbols, from='1900-01-01')
prices <- list()
for(symbol in symbols) {
  prices[[symbol]] <- Ad(get(symbol))
}
prices <- do.call(cbind, prices)
colnames(prices) <- substr(colnames(prices), 1, 5)
returns <- na.omit(Return.calculate(prices))

sample <- returns['1997-08/2009-03']
monthRets <- apply.monthly(sample, Return.cumulative)

returnRegression <- function(returns, nMonths) {
  nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
  nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
  nMonthAverage <- na.omit(lag(nMonthAverage))
  returns <- returns[index(nMonthAverage)]
  
  rankAvg <- t(apply(nMonthAverage, 1, rank))
  rankReturn <- t(apply(returns, 1, rank))
  
  
  meltedAverage <- melt(data.frame(nMonthAverage))
  meltedReturns <- melt(data.frame(returns))
  meltedRankAvg <- melt(data.frame(rankAvg))
  meltedRankReturn <- melt(data.frame(rankReturn))
  lmfit <- lm(meltedReturns$value ~ meltedAverage$value - 1)
  rankLmfit <- lm(meltedRankReturn$value ~ meltedRankAvg$value)
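  # row 1: pooled momentum slope (no intercept in that regression)
  # row 2: rank-rank slope (the rank regression's intercept row is dropped,
  #        so that row 2 is the coefficient on past rank, not the intercept)
  return(rbind(summary(lmfit)$coefficients, 
               summary(rankLmfit)$coefficients[2, , drop = FALSE]))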
}

pvals <- list()
estimates <- list()
rankPs <- list()
rankEstimates <- list()
for(i in 1:24) {
  tmp <- returnRegression(monthRets, nMonths=i)
  pvals[[i]] <- tmp[1,4]
  estimates[[i]] <- tmp[1,1]
  rankPs[[i]] <- tmp[2,4]
  rankEstimates[[i]] <- tmp[2,1]
}
pvals <- do.call(c, pvals)
estimates <- do.call(c, estimates)
rankPs <- do.call(c, rankPs)
rankEstimates <- do.call(c, rankEstimates)

Essentially, in this case, I take a pooled regression (that is, take the five instruments and pool them together into one giant vector), and regress the next month’s return on the cumulative sum of the past n monthly returns. I also do the same thing using cross-sectional ranks for each month, performing a rank-rank regression. The sample I used was the five mutual funds (CNSAX, FAHDX, VUSTX, VFISX, and PREMX) from their inception to March 2009. As the data for the final ETF begins in April of 2009, I set aside the ETF data for out-of-sample backtesting.
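
For readers who want to see the mechanics in isolation, here is a self-contained base-R sketch of a pooled regression and a rank-rank regression on simulated data. Everything in it is an assumption for illustration: the data are random, the 0.2 dependence is planted by construction, and `as.vector` stands in for the `melt` calls above.

```r
# Simulated past-momentum and next-month returns for 5 assets (hypothetical)
set.seed(7)
nObs <- 120
nAssets <- 5
past    <- matrix(rnorm(nObs * nAssets, sd = 0.03), ncol = nAssets)
nextRet <- 0.2 * past + matrix(rnorm(nObs * nAssets, sd = 0.03), ncol = nAssets)

# Pooled regression without an intercept: one row of coefficients
pooledFit <- lm(as.vector(nextRet) ~ as.vector(past) - 1)

# Cross-sectional ranks per month, then a rank-rank regression with intercept
rankPast <- t(apply(past, 1, rank))
rankNext <- t(apply(nextRet, 1, rank))
rankFit  <- lm(as.vector(rankNext) ~ as.vector(rankPast))

# Stacking both full coefficient tables gives three rows
coefTable <- rbind(summary(pooledFit)$coefficients,
                   summary(rankFit)$coefficients)
rownames(coefTable) <- c("pooled slope", "rank intercept", "rank slope")
coefTable
```

Because the rank regression includes an intercept, it contributes two rows to the stacked table, and the coefficient on past rank is the second of them. With five assets and no ties, the ranks always average 3, so the intercept is mechanically 3 times one minus the slope and says little on its own.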

Here are the results:



plot(estimates, type='h', xlab = 'Months regressed on', ylab='momentum coefficient', 
     main='future returns regressed on past momentum')
plot(pvals, type='h', xlab='Months regressed on', ylab='p-value', main='momentum significance')
abline(h=.05, col='green')
abline(h=.1, col='red')

plot(rankEstimates, type='h', xlab='Months regressed on', ylab="Rank coefficient",
     main='future return ranks regressed on past momentum ranks', ylim=c(0,3))
plot(rankPs, type='h', xlab='Months regressed on', ylab='P-values')




Of interest to note is that while much of the momentum literature specifies a reversion effect on time-series momentum at 12 months or greater, all the regression coefficients in this case (even up to 24 months!) proved to be positive, with the very long-term coefficients possessing more statistical significance than the short-term ones. Nevertheless, Cliff Smith’s chosen parameters (the two and four month settings) possess statistical significance at least at the 10% level. However, if one were to be highly conservative in terms of rejecting strategies, that in and of itself may be reason enough to reject this strategy right here.

However, the rank-rank regression (that is, regressing the future month’s cross-sectional rank on the cross-sectional rank of the past n-month sum) proved to be statistically significant beyond any doubt, with all p-values being effectively zero. In short, there is extremely strong evidence for cross-sectional momentum among these five assets, which extends out to at least two years. Furthermore, SHY (or VFISX), the short-term treasury fund, is among the assets chosen, and it is a proxy for the risk-free rate. Including it in the cross-sectional rankings implicitly requires that an asset outperform the risk-free asset in order to be invested in (as this is a top-1 asset rotation strategy); otherwise, by process of elimination, the strategy will invest in the risk-free asset itself.

In upcoming posts, I’ll look into testing hypotheses on signals and rules.

Lastly, Volatility Made Simple has just released a blog post on the performance of volatility-based strategies for the month of August. Given the massive volatility spike, the dispersion in performance of strategies is quite interesting. I’m happy that in terms of YTD returns, the modified version of my strategy is among the top 10 for the year.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

Why Backtesting On Individual Legs In A Spread Is A BAD Idea

So after reading the last post, the author of quantstrat had mostly critical feedback, largely about the philosophy that prompted its writing in the first place. Basically, the reason I wrote it, as I stated before, is that I’ve seen many retail users of quantstrat constantly ask “how do I model individual spread instruments”, and otherwise try to look like they’re sophisticated by trading spreads.

The truth is that real professionals use industrial-strength tools to determine their intraday hedge ratios (such a tool is called a spreader). The purpose of quantstrat is not to be an execution modeling system, but to be a *strategy* modeling system. Basically, the purpose of your backtest isn’t to look at individual instruments, since in the last post, the aggregate trade statistics told us absolutely nothing about how our actual spread trading strategy performed. The backtest was a mess as far as the analytics were concerned, rendering it more or less useless. So this post, by request of the author of quantstrat, is about how to do the analysis better by looking at what matters more: the actual performance of the strategy on the actual relationship being traded, namely the *spread*, rather than the two components.

So, without further ado, let’s look at the revised code:

require(quantmod)
require(quantstrat)
require(IKTrading)

getSymbols("UNG", from="1990-01-01")
getSymbols("DGAZ", from="1990-01-01")
getSymbols("UGAZ", from="1990-01-01")
UNG <- UNG["2012-02-22::"]
UGAZ <- UGAZ["2012-02-22::"]

spread <- 3*OHLC(UNG) - OHLC(UGAZ)

initDate='1990-01-01'
currency('USD')
Sys.setenv(TZ="UTC")
symbols <- c("spread")
stock(symbols, currency="USD", multiplier=1)

strategy.st <- portfolio.st <- account.st <-"spread_strategy_done_better"
rm.strat(portfolio.st)
rm.strat(strategy.st)
initPortf(portfolio.st, symbols=symbols, initDate=initDate, currency='USD')
initAcct(account.st, portfolios=portfolio.st, initDate=initDate, currency='USD')
initOrders(portfolio.st, initDate=initDate)
strategy(strategy.st, store=TRUE)

#### parameters

nEMA = 20

### indicator

add.indicator(strategy.st, name="EMA",
              arguments=list(x=quote(Cl(mktdata)), n=nEMA),
              label="ema")

### signals

add.signal(strategy.st, name="sigCrossover",
           arguments=list(columns=c("Close", "EMA.ema"), relationship="gt"),
           label="longEntry")

add.signal(strategy.st, name="sigCrossover",
           arguments=list(columns=c("Close", "EMA.ema"), relationship="lt"),
           label="longExit")

### rules

add.rule(strategy.st, name="ruleSignal", 
         arguments=list(sigcol="longEntry", sigval=TRUE, ordertype="market", 
                        orderside="long", replace=FALSE, prefer="Open", orderqty=1), 
         type="enter", path.dep=TRUE)

add.rule(strategy.st, name="ruleSignal", 
         arguments=list(sigcol="longExit", sigval=TRUE, orderqty="all", ordertype="market", 
                        orderside="long", replace=FALSE, prefer="Open"), 
         type="exit", path.dep=TRUE)

#apply strategy
t1 <- Sys.time()
out <- applyStrategy(strategy=strategy.st,portfolios=portfolio.st)
t2 <- Sys.time()
print(t2-t1)

In this case, things are a LOT simpler. Rather than jumping through the hoops of pre-computing an indicator, along with the shenanigans of separate rules for both the long and the short end, we simply have a spread as it’s theoretically supposed to work–three of an unleveraged ETF against the 3x leveraged ETF, and we can go long the spread, or short the spread. In this case, the dynamic seems to be on the up, and we want to capture that.
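
To show what the synthetic instrument looks like, here is a base-R sketch of the same spread arithmetic on two days of made-up bars. The prices are hypothetical and plain data frames stand in for the xts objects.

```r
# Two days of made-up OHLC bars for each leg
ung  <- data.frame(Open = c(20, 21), High = c(22, 22),
                   Low = c(19, 20), Close = c(21, 20))
ugaz <- data.frame(Open = c(30, 33), High = c(36, 34),
                   Low = c(27, 29), Close = c(33, 29))

# Three units of the unleveraged leg against one unit of the 3x leg
spread <- 3 * ung - ugaz
spread
```

One thing to keep in mind with this construction is that the High and Low columns are just differences of the legs’ own highs and lows, which need not have occurred at the same moment, so they are not necessarily the true intraday extremes of the spread. The strategy above only uses the Close for its signal and the Open for its fills.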

So how did we do?

#set up analytics
updatePortf(portfolio.st)
dateRange <- time(getPortfolio(portfolio.st)$summary)[-1]
updateAcct(portfolio.st,dateRange)
updateEndEq(account.st)

#trade statistics
tStats <- tradeStats(Portfolios = portfolio.st, use="trades", inclZeroDays=FALSE)
tStats[,4:ncol(tStats)] <- round(tStats[,4:ncol(tStats)], 2)
print(data.frame(t(tStats[,-c(1,2)])))
(aggPF <- sum(tStats$Gross.Profits)/-sum(tStats$Gross.Losses))
(aggCorrect <- mean(tStats$Percent.Positive))
(numTrades <- sum(tStats$Num.Trades))
(meanAvgWLR <- mean(tStats$Avg.WinLoss.Ratio[tStats$Avg.WinLoss.Ratio < Inf], na.rm=TRUE))

And here’s the output:

> print(data.frame(t(tStats[,-c(1,2)])))
                   spread
Num.Txns            76.00
Num.Trades          38.00
Net.Trading.PL       9.87
Avg.Trade.PL         0.26
Med.Trade.PL        -0.10
Largest.Winner       7.76
Largest.Loser       -1.06
Gross.Profits       21.16
Gross.Losses       -11.29
Std.Dev.Trade.PL     1.68
Percent.Positive    39.47
Percent.Negative    60.53
Profit.Factor        1.87
Avg.Win.Trade        1.41
Med.Win.Trade        0.36
Avg.Losing.Trade    -0.49
Med.Losing.Trade    -0.46
Avg.Daily.PL         0.26
Med.Daily.PL        -0.10
Std.Dev.Daily.PL     1.68
Ann.Sharpe           2.45
Max.Drawdown        -4.02
Profit.To.Max.Draw   2.46
Avg.WinLoss.Ratio    2.87
Med.WinLoss.Ratio    0.78
Max.Equity          13.47
Min.Equity          -1.96
End.Equity           9.87
> (aggPF <- sum(tStats$Gross.Profits)/-sum(tStats$Gross.Losses))
[1] 1.874225
> (aggCorrect <- mean(tStats$Percent.Positive))
[1] 39.47
> (numTrades <- sum(tStats$Num.Trades))
[1] 38
> (meanAvgWLR <- mean(tStats$Avg.WinLoss.Ratio[tStats$Avg.WinLoss.Ratio < Inf], na.rm=TRUE))
[1] 2.87

In other words, the typical profile for a trend follower, rather than the uninformative analytics from the last post. Furthermore, the position sizing and equity curve chart actually make sense now. Here they are.
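
As a sanity check on that trend-follower profile, here is a short sketch that recomputes a few of the statistics from the rounded figures in the output above. The variable names are mine, and small discrepancies come from the rounding.

```r
# Figures copied from the tradeStats output above
grossProfits <- 21.16
grossLosses  <- -11.29
pctPositive  <- 39.47 / 100
avgWin       <- 1.41
avgLoss      <- -0.49
numTrades    <- 38

profitFactor <- grossProfits / -grossLosses   # gross profits over gross losses
netPL        <- grossProfits + grossLosses    # net trading P&L
avgWinLoss   <- avgWin / -avgLoss             # average win over average loss

# Expected P&L per trade: win rate times average win plus loss rate times average loss
expectancy <- pctPositive * avgWin + (1 - pctPositive) * avgLoss

round(c(profitFactor = profitFactor, netPL = netPL,
        avgWinLoss = avgWinLoss, expectancy = expectancy,
        avgTradePL = netPL / numTrades), 2)
```

Winning less than 40% of the time is fine here because the average winner is nearly three times the size of the average loser, which is what leaves the expectancy per trade positive.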

To conclude, while it’s possible to model spreads using individual legs, it makes far more sense in terms of analytics to actually examine the performance of the strategy on the actual relationship being traded, which is the spread itself. Furthermore, after constructing the spread as a synthetic instrument, it can be treated like any other regular instrument in the context of analysis in quantstrat.

Thanks for reading.

NOTE: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.