The Quarterly Tactical Strategy (aka QTS)

This post covers the Quarterly Tactical Strategy (QTS), introduced by Cliff Smith in a Seeking Alpha article. It presents a variation on the typical dual-momentum strategy that turns over only once a quarter, yet delivers a seemingly solid risk/return profile. The original article leaves off a protracted period of unimpressive performance at the turn of the millennium, however.

First off, due to the imprecision of the English language, I received some help from TrendXplorer in implementing this strategy. Those who are fans of Amibroker are highly encouraged to visit his blog.

In any case, this strategy is fairly simple:

Take a group of securities (in this case, 8 mutual funds), and do the following:

Rank each security on a long momentum (105 days) and a short momentum (20 days), and invest in the security with the highest composite rank, with ties broken in favor of the long momentum (that is, something like .501*longRank + .499*shortRank). If the price of the security with the highest composite rank is above its three-month SMA, invest in that security; otherwise, hold cash.

There are two critical points that must be made here:

1) The three-month SMA is *not* a 63-day SMA. It is a three-point SMA computed on the monthly endpoints of that security (see the short sketch after this list).
2) Unlike in flexible asset allocation or elastic asset allocation, the cash asset is not treated as a formal asset.
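
To make the first point concrete, here is a minimal sketch of the distinction, assuming a hypothetical daily xts price series called dailyPrices (illustration only, not part of the strategy code below):

require(xts)
require(TTR)

#monthly closes: the last observation of each month
monthEnds <- endpoints(dailyPrices, on="months")
monthlyCloses <- dailyPrices[monthEnds,]

#the filter the strategy uses: a three-point SMA on those monthly closes
smaThreeMonth <- SMA(monthlyCloses, n=3)

#what the strategy does NOT use: a 63-day SMA on the daily data
smaSixtyThreeDay <- SMA(dailyPrices, n=63)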

Let’s look at the code. Here’s the data–adjusted mutual fund prices (with quarterly turnover, the frequent-trading constraint of not quickly trading out of these funds is satisfied, though I’m not sure how dividends are treated–that is, whether a retail investor would actually realize these returns less a hopefully tiny transaction cost through their broker–ideally not much more than $1 per transaction):

require(quantmod)
require(PerformanceAnalytics)
require(TTR)

#get our data from yahoo, use adjusted prices
symbols <- c("NAESX", #small cap
             "PREMX", #emerging bond
             "VEIEX", #emerging markets
             "VFICX", #intermediate investment grade
             "VFIIX", #GNMA mortgage
             "VFINX", #S&P 500 index
             "VGSIX", #MSCI REIT
             "VGTSX", #total intl stock idx
             "VUSTX") #long term treasury (cash)

getSymbols(symbols, from="1990-01-01")
prices <- list()
for(i in 1:length(symbols)) {
  prices[[i]] <- Ad(get(symbols[i]))  
}
prices <- do.call(cbind, prices)
colnames(prices) <- gsub("\\.[A-z]*", "", colnames(prices))

#define our cash asset and keep track of which column it is
cashAsset <- "VUSTX"
cashCol <- grep(cashAsset, colnames(prices))

#start our data off on the security with the least data (VGSIX in this case)
prices <- prices[!is.na(prices[,7]),] 

#cash is not a formal asset in our ranking
cashPrices <- prices[, cashCol]
prices <- prices[, -cashCol]

Nothing anybody hasn’t seen before up to this point: get the data, start it at the inception of the youngest mutual fund, separate out the cash prices, and move along.

What follows is a rather rough implementation of QTS, not wrapped up in any sort of function that others can plug and play with (though I hope I made the code readable enough for others to tinker with).

Let’s define parameters and compute momentum.

#define our parameters
nShort <- 20
nLong <- 105
nMonthSMA <- 3

#compute momentums
rocShort <- prices/lag(prices, nShort) - 1
rocLong <- prices/lag(prices, nLong) - 1

Now comes some endpoints functionality (or, more colloquially, magic) that the xts library provides. It’s what allows people to get work done in R much faster than in other programming languages.
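
If you haven’t seen it before, here’s a tiny standalone illustration (separate from the strategy code) of what endpoints returns–a vector of row indices marking the last observation of each period, prefixed with a zero:

tmp <- xts(1:10, order.by=as.Date("2015-01-28") + 0:9)
endpoints(tmp, on="months")
#should print 0 4 10: rows 4 and 10 are the last days of January and February,
#and tmp[endpoints(tmp, on="months"),] subsets down to just those two rows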

#take the endpoints of quarter start/end
quarterlyEps <- endpoints(prices, on="quarters")
monthlyEps <- endpoints(prices, on = "months")

#take the prices at quarterly endpoints
quarterlyPrices <- prices[quarterlyEps,]

#short momentum at quarterly endpoints (20 day)
rocShortQtrs <- rocShort[quarterlyEps,]

#long momentum at quarterly endpoints (105 day)
rocLongQtrs <- rocLong[quarterlyEps,]

In short, get the quarterly endpoints (and the monthly ones–we need those for the monthly SMA, which you’ll see shortly) and subset our momentum computations to those quarterly endpoints. Now let’s get the total rank for those quarterly momentum computations.

#rank short momentum, best highest rank
rocSrank <- t(apply(rocShortQtrs, 1, rank))

#rank long momentum, best highest rank
rocLrank <- t(apply(rocLongQtrs, 1, rank))

#total rank, long slightly higher than short, sum them
totalRank <- 1.01*rocLrank + rocSrank 

#function that takes 100% position in highest ranked security
maxRank <- function(rankRow) {
  return(rankRow==max(rankRow))
}

#apply above function to our quarterly ranks every quarter
rankPos <- t(apply(totalRank, 1, maxRank))

So as you can see, I rank the momentum computations by row, take a weighted sum (in slight favor of the long momentum), and then simply take the security with the highest rank at every period, giving me one 1 in every row and 0s otherwise.

Now let’s handle the other half of what determines the position: the SMA filter. In this case, we need monthly data points for our three-month SMA, which we then subset to quarters so that it’s on the same timescale as the quarterly ranks.

#SMA of securities, only use monthly endpoints
#subset to quarters
#then filter
monthlyPrices <- prices[monthlyEps,]
monthlySMAs <- xts(apply(monthlyPrices, 2, SMA, n=nMonthSMA), order.by=index(monthlyPrices))
quarterlySMAs <- monthlySMAs[index(quarterlyPrices),]
smaFilter <- quarterlyPrices > quarterlySMAs

Now let’s put it together to get our final positions. Our cash position is simply one whenever we have no investment in a given period, and zero otherwise.

finalPos <- rankPos*smaFilter
finalPos <- finalPos[!is.na(rocLongQtrs[,1]),]
cash <- xts(1-rowSums(finalPos), order.by=index(finalPos))
finalPos <- merge(finalPos, cash, join='inner')

Now we can finally compute our strategy returns.

prices <- merge(prices, cashPrices, join='inner')
returns <- Return.calculate(prices)
stratRets <- Return.portfolio(returns, finalPos)
table.AnnualizedReturns(stratRets)
maxDrawdown(stratRets)
charts.PerformanceSummary(stratRets)
plot(log(cumprod(1+stratRets)))

So what do things look like?

Like this:

> table.AnnualizedReturns(stratRets)
                          portfolio.returns
Annualized Return                    0.1899
Annualized Std Dev                   0.1619
Annualized Sharpe (Rf=0%)            1.1730
> maxDrawdown(stratRets)
[1] 0.1927991

And since the first equity curve doesn’t give much of an indication in the early years, I’ll take Tony Cooper’s (of Double Digit Numerics) advice and show the log equity curve as well.

In short, from 1997 through 2002, this strategy seemed to go nowhere, and then it took off. Since I was able to get this backtest going back to 1997, it makes me wonder why the SeekingAlpha article started only in 2003, since even with 1997-2002 included, the strategy’s risk/reward profile still looks fairly solid: a CAR-to-max-drawdown ratio of about 1 (slightly less, but that’s okay for something that turns over so infrequently, and in so few securities!), and a Sharpe ratio higher than 1. Certainly better than what the market itself offered retail investors over the same period of time. Perhaps Cliff Smith himself could chime in regarding his choice of time frame.

In any case, Cliff Smith marketed the strategy as having a CAGR higher than 28%; his article was published on August 15, 2014, and its backtest started in 2003. Let’s see if we can replicate those results.

stratRets <- stratRets["2002-12-31::2014-08-15"]
table.AnnualizedReturns(stratRets)
maxDrawdown(stratRets)
charts.PerformanceSummary(stratRets)
plot(log(cumprod(1+stratRets)))

Which results in this:

> table.AnnualizedReturns(stratRets)
                          portfolio.returns
Annualized Return                    0.2862
Annualized Std Dev                   0.1734
Annualized Sharpe (Rf=0%)            1.6499
> maxDrawdown(stratRets)
[1] 0.1911616

A far improved risk/return profile without 1997-2002 (or the out-of-sample period after Cliff Smith’s publishing date). Here are the two equity curves in-sample.

In short, the results look better, and the SeekingAlpha article’s results are validated.

Now, let’s look at the out-of-sample periods on their own.

stratRets <- Return.portfolio(returns, finalPos)
earlyOOS <- stratRets["::2002-12-31"]
table.AnnualizedReturns(earlyOOS)
maxDrawdown(earlyOOS)
charts.PerformanceSummary(earlyOOS)

Here are the results:

> table.AnnualizedReturns(earlyOOS)
                          portfolio.returns
Annualized Return                    0.0321
Annualized Std Dev                   0.1378
Annualized Sharpe (Rf=0%)            0.2327
> maxDrawdown(earlyOOS)
[1] 0.1927991

And with the corresponding equity curve (which does not need a log-scale this time).

In short, it basically did nothing for an entire five years. That’s rough, and I definitely don’t like that this period was left off of the SeekingAlpha article. Anytime I can extend a backtest further back than a strategy’s original author and find skeletons in the closet (as happened with each and every one of Harry Long’s strategies), it sets off red flags on this end, so I’m hoping there’s a good explanation for leaving off 1997-2002 that I’m simply missing.

Lastly, let’s look at the out-of-sample performance after the article’s publication date.

lateOOS <- stratRets["2014-08-15::"]
charts.PerformanceSummary(lateOOS)
table.AnnualizedReturns(lateOOS)
maxDrawdown(lateOOS)

With the following results:

> table.AnnualizedReturns(lateOOS)
                          portfolio.returns
Annualized Return                    0.0752
Annualized Std Dev                   0.1426
Annualized Sharpe (Rf=0%)            0.5277
> maxDrawdown(lateOOS)
[1] 0.1381713

And the following equity curve:

Basically, while it’s ugly, it made new equity highs over only two more transactions (and in such a small sample size, anything can happen), so I’ll put this one down as a small, ugly win, but a win nevertheless.

If anyone has any questions or comments about this strategy, I’d love to see them, as this is basically a first-pass replica. To Mr. Cliff Smith’s credit, the results check out, and when the worst thing one can say about a strategy is that it had a period of flat performance (namely when the market crested at the end of the Clinton administration, right before the dot-com bubble burst), well, that’s not the worst thing in the world.

More replications (including one requested by several readers) will be upcoming.

Thanks for reading.

NOTE: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

25 thoughts on “The Quarterly Tactical Strategy (aka QTS)”

  1. Hi Ilya,

    Very nice, thanks very much for this analysis. I can answer the question about start dates. Cliff does all his testing on ETFReplay, and there are a lot of limitations there. For starters, they don’t have any data prior to 2003. Also, they mostly list ETFs, and have only a limited number of mutual funds available for testing purposes. So there’s no way that the backtest could have been extended further back using ETFReplay.

    That’s one reason why this study is valuable.

    Terence

  2. Ilya, one other thing: Can you calculate the distribution of all 12-month rolling returns, along with the mean/median/SD/SE/max/min and especially, the 95% confidence interval limits? Also, some plot of the trend in the 12-month rolling returns, maybe using a moving average or something?

    That would be great if you could do this.

    Thanks for the great work,

    Terence

    • Terence,

      That’s definitely possible. The cumulative return series divided by its own 252-day lag gives a 252-day (approximately 12-month) rolling return, and the rest is basic statistics, so it can definitely be done.
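
      For instance, a rough sketch (reusing the stratRets object from the post, and taking 252 trading days as the 12-month window):

      wealth <- cumprod(1 + stratRets)
      roll12 <- na.omit(wealth/lag(wealth, 252) - 1)

      r12 <- as.numeric(roll12)
      mean(r12); median(r12); sd(r12); max(r12); min(r12)
      sd(r12)/sqrt(length(r12)) #naive SE (overlapping windows, so take with a grain of salt)
      quantile(r12, probs=c(0.025, 0.975)) #crude 95% interval of the observed rolling returns

      #a rough look at the trend: 63-day moving average of the rolling returns
      plot(SMA(roll12, n=63))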

      -Ilya

  3. Thanks, Ilya, for producing these results. I think, in general, you replicated the results I published on Seeking Alpha from 2003 to mid-2014 (mid-2014 was when the article was published). I worked with TrendXplorer as he was trying to match my results, and I think that except for two selections, we matched exactly. And I think we finally decided the differences were caused by the data we used. TrendXplorer used Yahoo Finance adjusted data. I used ETFreplay in my calculations, and I think their data are more accurate than Yahoo data. I have posted about some of the issues with Yahoo data on my SA Instablog. Here are the links:
    1. http://seekingalpha.com/instablog/8190911-cliff-smith/3466275-dividend-errors-present-in-yahoo-data
    2. http://seekingalpha.com/instablog/8190911-cliff-smith/3277135-inherent-inaccuracy-using-yahoo-adjusted-price-data-in-backtesting-taa-strategies

    I’m not sure what adjusted data you used. Maybe I missed it in your posting.

    Let me explain a few things for you so you and your readers don’t get the mistaken impression that I was “hiding” anything. Like I said, I used ETFreplay for my calculations. ETFreplay limits calculations to 2003 for the backtesting method I used. That’s why I only posted results from 2003-present (mid-2014 in the case of my article).

    I did attempt to take the QTS results back to 1999 on my SA Instablog using hand calculations and stockcharts.com plot data. It was a very tedious process as you might imagine. But I was very careful and hopefully I didn’t make any mistakes. It showed the QTS strategy did well from 1999-2003 using these hand calculations. I think stockcharts.com data is very accurate (of the same quality as ETFreplay’s data in my estimation). The results of my hand calculations are shown here: http://seekingalpha.com/instablog/8190911-cliff-smith/3218375-extending-the-backtesting-of-the-quarterly-tactical-strategy-to-midminus-1999

    So that leads me to try to understand why the strategy did relatively poorly using “out-of-sample” data in your calculations. I’m not trying to say the “out-of-sample” results are not valid, but I think if you used Yahoo data for the mutual fund proxies, there could be reasons why results prior to 2003 might be suspect. If you look at the adjusted data for the mutual funds prior to 2000, you will see some of them will have adjusted values quite small, somewhere between $10.00 and $1.00. Yahoo only downloads the adjusted price values to $0.01, so you can see large inaccuracies in the data when the values are so small. I’m not sure if 5-month returns or 20-day returns can be accurately determined in this case.

    I think the proof of whether QTS produces good results in the future will be determined as we go. Because quarterly updating is employed, there are relatively few data points even if we go back to 1997.
    We can estimate the statistical 95% confidence interval in a crude manner by taking the CAGR as the mean annual return and the standard deviation of the strategy (that is, 16.3% based on ETFreplay calculations). The 95% confidence interval is defined as plus or minus 2 standard deviations from the mean. Thus, taking the CAGR of about 28% plus or minus two standard deviations, one can have 95% confidence going forward that the annual return will be between -4.6% and 60.6%.

    Cliff

    • Thanks so much, Cliff.

      I suspected that there was a reason that you didn’t go back to 1997, and I was pretty sure it wasn’t simply because “oh hey let’s leave off the results and hope nobody questions it”.

      That’s an excellent point about Yahoo data, and that’s exactly where I get the data from. So I suppose those results should be asterisked. I agree with the proof being what the OOS results produce, however. As for confidence intervals on returns, that’s a valid point, and indeed, it would be a crude estimate, since it assumes all market conditions that generated the observations are identical, which they aren’t. But, for instance, it’s better than nothing when comparing, say, two different return streams (e.g., a strategy vs. a benchmark), since they would have faced identical market conditions.

      -Ilya

      • Ilya,

        If you want, we can compare the mid-1999-2003 picks/results I calculated using stockcharts.com data with the picks/returns you calculated using Yahoo data. My picks/returns were:

        July-Sept 1999 PREMX Growth = +4.98%
        Oct-Dec 1999 PREMX Growth = +19.22%
        1999 (partial) = +25.5%; SPY = +8.0%; Difference = 17.5%

        Jan-Mar 2000 VEIEX Growth = -2.40%
        Apr-June 2000 PREMX Growth = +2.94%
        July-Sept 2000 PREMX Growth = +6.47%
        Oct-Dec 2000 VFIIX Growth = +3.71%
        2000 = +11.1%; SPY = -9.8%; Difference = 20.9%

        Jan-Mar 2001 PREMX Growth = +4.62%
        Apr-June 2001 VFIIX Growth = +1.23%
        July-Sept 2001 VGSIX Growth = -2.48%
        Oct-Dec 2001 PREMX Growth = +7.44%
        2001 = +11.1%; SPY = -12.1%; Difference = 23.2%

        Jan-Mar 2002 VEIEX Growth = +10.39%
        Apr-June 2002 VGSIX Growth = +4.80%
        July-Sept 2002 VGSIX Growth = -8.45%
        Oct-Dec 2002 VFICX Growth = +1.59%
        2002 = +7.6%; SPY = -21.5%; Difference = 29.1%

        Jan-Mar 2003 PREMX Growth = +5.85%
        Apr-June 2003 PREMX Growth = +13.76%
        July-Sept 2003 NAESX Growth = +8.65%
        Oct-Dec 2003 VEIEX Growth = +18.85%
        2003 = +55.5%; SPY = +28.2%; Difference = 27.3%

        Cliff

    • Cliff,

      Here are my early year selections for the time period you’re referencing.

      [1,] "1999-06-30" "VEIEX"
      [2,] "1999-09-30" "VFIIX"
      [3,] "1999-12-31" "VEIEX"
      [4,] "2000-03-31" "PREMX"
      [5,] "2000-06-30" "PREMX"
      [6,] "2000-09-29" "VGSIX"
      [7,] "2000-12-29" "VGSIX"
      [8,] "2001-03-30" "VFICX"
      [9,] "2001-06-29" "VGSIX"
      [10,] "2001-09-28" "VFICX"
      [11,] "2001-12-31" "VEIEX"
      [12,] "2002-03-28" "VGSIX"
      [13,] "2002-06-28" "VGSIX"
      [14,] "2002-09-30" "VFICX"
      [15,] "2002-12-31" "PREMX"
      [16,] "2003-03-31" "PREMX"
      [17,] "2003-06-30" "NAESX"
      [18,] "2003-09-30" "VEIEX"
      [19,] "2003-12-31" "VEIEX"

      It seems Yahoo dropped the ball on this one, as our results contain quite a few differences.

      • Looks like we match for 2002 and 2003, but there are six misses out of ten for mid-1999 through 2001. That is similar to what TrendXplorer showed if I remember correctly.

        Cliff

    • As Terry has reported in SA comments and I have reported on my Seeking Alpha Instablog posts, the other error with Yahoo data is that they periodically miss dividends. These dividend misses are very random, and can pop up anytime. This problem of missed dividends is hard to resolve because there isn’t a Yahoo phone number or email address to report the errors.

      Cliff

      • It turns out that Yahoo misses five dividends for PREMX in 2002 and 2003. Here are the misses:
        Apr 2002
        May 2002
        Nov 2002
        Jan 2003
        Feb 2003

        Cliff

    • Gregor,

      TrendXplorer actually tested a very wide range of long and short momentum returns, and even varied the split between long momentum and short momentum. He found that there were a number of parameters that produced very robust results (high CAGRs with low maxDDs). Robustness means that the parameters can be varied quite a bit with little degradation in performance. The most robust parameters are not the ones I initially used and reported in my article (where I was severely limited by ETFreplay).

      Having a high degree of robustness implies that there wasn’t a high degree of curve-fitting. Plus I would suggest that the out-of-sample results using higher fidelity data gave good performance (see the comments I made above).

      Thanks,
      Cliff

    • Gregor, that’s a common knee-jerk response to ANY algorithmic strategy. Look at any blog or article presenting testing of algorithmic strategies and there will usually be some that make the same accusation, invariably without a shred of evidence to support the presumption. The problem is two-fold: 1) It is easy to speculate that this might have occurred, but quite another to provide any evidence to support it; and 2) curve-fitting is not necessarily bad at all, and in fact can be a very good thing.

      The issue is not so much whether curve-fitting has occurred (that’s a given; nobody tests random strategies composed of any old random parameters). The real issues are whether backtesting results were likely the result of chance, and what the predictive validity might be. Regarding the former, if stress testing provides evidence of robustness, then that argues strongly that results were NOT the result of chance. Regarding the latter, if there is reasonable predictive validity, then curve-fitting is a non-issue.

      One example: when you go to your doctor and he measures your cholesterol and blood pressure, then gives you a prescription for a statin or a blood pressure medication, he is very definitely curve-fitting. In this case, the data mining was done on the town of Framingham, MA. The backtesting of the people in this town directly led to algorithms–that’s right, algorithms–that predict risk of heart attacks, etc. So your doctor is overfitting the data when he is trying to determine whether you could benefit from a statin. But if this prevents you from keeling over dead from a heart attack, then can you say that your doctor’s curve fitting was a bad thing?

      Similarly, if “overfitted” data has enough predictive validity to make money for you, can you really say that curve fitting was a bad thing?

      TMD

  4. Pingback: The Whole Street’s Daily Wrap for 2/13/2015 | The Whole Street

  5. I suspect that there is a lot of overfitting in this strategy. Three clues:

    (1) Cliff says “TrendXplorer actually tested a very wide range of long and short momentum returns.” “Very wide” is the danger signal here. The more you test the more likely overfitting is.

    (2) The out-of-sample returns were poor.

    (3) 2003 to 2015 is a period where there was a bull, bear, bull market sequence. You only have to get two dates right and you can get 22% annual returns for SPY. It is so so easy to find a strategy that picks these two dates that my cat could do it. Invariably any data mining strategy for the last decade will overfit to these dates.

    Some comments:
    “Having a high degree of robustness implies that there wasn’t a high degree of curve-fitting.” This isn’t true. Similar strategies are highly correlated with each other. (e.g. 19 day rule will have a high correlation to a 20 day rule). High correlations should NOT be interpreted as “robustness.”

    Nor does it mean that there is no overfitting – overfitted strategies can also be highly correlated with each other and thus appear to be “robust.”

    “Bad thing” – overfitting is, by definition bad. That’s what “over” means – too much. In this context overfitting means that the strategy won’t work when you put real money on it.

    Nice exercise – thanks Ilya and Cliff.

      • “How do you determine when a strategy is fit or overfit” – I don’t. I don’t work that way. I don’t develop a strategy and then test to see if it is overfit. I use an optimisation method that avoids overfitting in the first place. You know the method – it’s called cross validation.

        Specifically it’s called time series cross-validation. I added it to the R package caret. It can be used to optimise a strategy in a “robust” way. By robust I mean robust across time periods not across parameters as described previously.

        I can’t describe it here. Cross validation with correlated data is an art. For market data everything is correlated with everything else. There are so many traps for the unwary.

        Before you do anything read Elements of Statistical Learning section 7.10.2 “The Wrong and Right Way to Do Cross-validation.”

        Then read Statistically Sound Machine Learning for Algorithmic Trading of Financial Instruments. It’s a tough read but if you want to be “sound” it’s the only reference I know.

        Other comments:

        I didn’t say that high correlations are a bad thing – I just said that robustness tests have to take them into account. And by “poor” I mean relatively poor and that’s pretty typical for overfitted data – it doesn’t prove that overfitting has occurred, it’s just evidence.

        “Very different conceptual bases” – that’s true. I come from a strict machine learning / data mining perspective. Other people come from a trading perspective and don’t even realise that they are doing machine learning. The difference is quite funny. You have stolen some of our terminology and changed its meaning. No wonder we can’t communicate. I don’t understand why you use the term “curve fitting” to mean overfitting. I have no idea what “85,000 distinct iterative tests” means.

    • Tony, you’ve got several things wrong.

      First of all, the out of sample returns were NOT poor. Ilya’s analysis is wrong, because the input data is hopelessly flawed.

      I think we have very different conceptual bases. Maybe you can explain what exactly you mean by “overfitting”, vs. just “fitting.” How do you determine when a strategy is “fit” or “overfit?”

      I disagree that high correlation necessarily is a bad thing. It is only a bad thing if the strategies do not work. If you developed a strategy that trades SPX and gave a sell signal in June 2008, and a buy signal in March 2009, then developed another similar strategy that trades QQQ that also gave a sell signal in June 2008 and a buy signal in March 2009, these strategies are highly correlated. Should one reject both just because they are highly correlated? My point is that correlation says nothing about either performance or predictive validity (or lack thereof). And that is the crux of the issue: does a strategy have predictive validity?

      RE your point #3: You have some things wrong. If you read the discussion, you will see that the strategy was backtested not to just 2003, but to 1999. That included two bear markets, several corrections, a global economic crisis, a period of rising interest rates (albeit less than a year), and a period of declining interest rates. So, some of the start dates occurred at the worst possible times, e.g. at market highs, and they were included in the overall analyses. Note that we didn’t just calculate CAGR from one start date and one end date. We calculated CAGRs from several thousand start dates and end dates, then calculated summary statistics describing the distribution of returns, along with 95% CI limits. Are you asserting that four or five thousand results all occurred by chance, and that all are completely irrelevant to forward performance? Are you also asserting that 85,000 distinct iterative tests also occurred by chance? If so, what is the evidence to support that conclusion?

      Thanks,

      Terence

    • Hi Tony,

      Thanks for your comments. It is much appreciated. In all of these tactical strategies, there is some degree of curve fitting by definition. The thing we want to avoid is over-fitting, in which just one set of parameters works, and all other combinations do not work. That’s where robustness comes in. If we can use a wide range of parameters and still get good performance and relatively low drawdown, then we have not used excessive curve fitting in my opinion.

      When I said we have tested a wide range of long and short momentum returns, it wasn’t to just pick the “best” set of parameters for performance, but to find the most robust parameters. We found a set of parameters that allowed us to vary the settings by about 15% and still see good results. So if the low momentum parameter was 50 days, it could be changed from 43 days to 57 days without affecting the overall results.

      The out-of-sample results were not poor. In other comments on this blog I explained the poor fidelity of the Yahoo adjusted data that Ilya used. When I used stockcharts data, I showed QTS did well from mid-1999 to 2002.

      Yes, it is important to do well in 2002-2003 and 2008 to see acceptable performance, but the strategy also has to do well in other situations, e.g. the correction in 2011, the rising interest rates of 2013, the choppy market of 2014, etc. Since mid-1999, QTS has picked a winner in 85% of the quarters. That means it has done well most of the time.

      Thanks,
      Cliff

  6. Pingback: An Attempt At Replicating David Varadi’s Percentile Channels Strategy | QuantStrat TradeR

  7. One way to test the robustness would be to compute the CAGR by throwing out the returns for two or three of the years with the highest returns. If the CAGR changes too much, the strategy would seem to have benefited from some singular events.
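
    For example, a rough sketch of that check (reusing the stratRets object and the PerformanceAnalytics functions from the post):

    #calendar-year returns of the strategy
    yearly <- as.numeric(apply.yearly(stratRets, Return.cumulative))

    #geometric mean annual return with and without the two best years
    prod(1 + yearly)^(1/length(yearly)) - 1
    trimmed <- sort(yearly)[1:(length(yearly) - 2)]
    prod(1 + trimmed)^(1/length(trimmed)) - 1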

    • Gregor, what you are suggesting is that the CAGR is unduly influenced by a few outlier years. I agree that this can happen. The problem is one of sampling: if you only examine calendar year returns, then maybe you have 20 data points or so. That’s not even close to being sufficient to get an idea of what the returns might realistically be prospectively.

      CAGR becomes increasingly irrelevant as the SD of returns increases. In the limit, when the SD of returns is very high, CAGR is all but useless and totally misleading.

      That’s why we calculated all possible rolling returns (month, quarter, and 12-month), then described the distribution of returns. If you have 21 years of returns, then you will have about 5,000 rolling 12-month periods, so 5,000 CAGRs. One can then describe this distribution using summary statistics, but the best single statistic in my opinion is the 95% confidence interval limits. This gives the range of returns one can anticipate with a probability of 95%.

      Terence

  8. I changed all of the tickers to ETFs and came out with some great results with this symbol list
    symbols <- c("VB", #small cap (VSMAX,NAESX)
    "ITE", #Intermediate-Term Treasury
    "VCIT", #intermediate investment grade (VFICX)
    "VMBS", #GNMA mortgage (VFIJX)
    "SPY", #S&P 500 index
    "VNQ", #MSCI REIT
    "VXUS", #total intl stock idx (admiral VTIAX)
    "VUSTX") #long term treasury (cash)

  9. Pingback: The Downside of Rankings-Based Strategies | QuantStrat TradeR
