This post will investigate using Principal Components as part of a momentum strategy.

Recently, I ran across a post from David Varadi that I thought I’d investigate further and translate into code I can display explicitly (as David Varadi doesn’t). As David Varadi is a quantitative research director with whom I’ve done good work in the past, I find that investigating his ideas is worth the time spent.

So, here’s the basic idea: in an allegedly balanced universe containing both aggressive assets (e.g. equity asset class ETFs) and defensive assets (e.g. fixed income asset class ETFs), principal component analysis, a cornerstone of machine learning, should have some effectiveness at constructing an effective portfolio.

I decided to put that idea to the test with the following algorithm:

Using the same assets that David Varadi does, I first use a rolling window (between 6 and 18 months) to compute principal components. I ensure that the loading on SPY in the first PC (the PC we use) is always positive; if it is negative, I multiply the entire PC by -1. I then create two portfolios: one comprised of the normalized positive loadings of the first PC, and one comprised of the normalized negative loadings.
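As a toy illustration of that sign convention (using randomly generated returns and hypothetical tickers, not the actual ETF data), the first PC’s loadings can be flipped and split like this:

```r
# Toy sketch of the PC1 sign convention: flip the first principal
# component so the reference asset ("SPY") loads positively, then split
# the loadings into normalized long-only "plus" and "minus" weights.
# The return matrix here is random data, purely for illustration.
set.seed(42)
fakeRets <- matrix(rnorm(300, sd = 0.01), ncol = 3,
                   dimnames = list(NULL, c("SPY", "TLT", "GLD")))

firstPc <- prcomp(fakeRets)$rotation[, 1]
if (firstPc["SPY"] < 0) firstPc <- -firstPc   # enforce positive SPY loading

wtsPlus  <- pmax(firstPc, 0)
wtsPlus  <- wtsPlus / sum(wtsPlus)              # normalized positive loadings
wtsMinus <- pmax(-firstPc, 0)
wtsMinus <- wtsMinus / (sum(wtsMinus) + 1e-16)  # guard: loadings may all be positive
```

The `1e-16` guard matters because nothing forces PC1 to have loadings of both signs; when every loading is positive, the minus portfolio’s weights sum to zero.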

Next, every month, I apply a momentum lookback (1, 3, 6, 10, or 12 months), invest in whichever of the two portfolios performed best over that period for the next month, and repeat.
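In miniature, the monthly switch amounts to comparing cumulative returns over the lookback and holding the winner (the monthly return figures below are made up):

```r
# Minimal sketch of the momentum switch: compound each candidate
# portfolio's returns over the lookback window, hold whichever wins.
cumRet <- function(r) prod(1 + r) - 1

lookbackPlus  <- c(0.010,  0.020, -0.005)  # hypothetical PC-plus monthly returns
lookbackMinus <- c(0.003, -0.001,  0.004)  # hypothetical PC-minus monthly returns

moms <- c(plus = cumRet(lookbackPlus), minus = cumRet(lookbackMinus))
hold <- names(which.max(moms))  # portfolio to hold for the next month
```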

Here’s the source code to do that (for those who have difficulty following, I highly recommend James Picerno’s Quantitative Investment Portfolio Analytics in R book):

require(PerformanceAnalytics)
require(quantmod)
require(stats)
require(xts)

symbols <- c("SPY", "EFA", "EEM", "DBC", "HYG", "GLD", "IEF", "TLT")

# get free data from Yahoo
rets <- list()
getSymbols(symbols, src = 'yahoo', from = '1990-12-31')
for(i in 1:length(symbols)) {
  returns <- Return.calculate(Ad(get(symbols[i])))
  colnames(returns) <- symbols[i]
  rets[[i]] <- returns
}
rets <- na.omit(do.call(cbind, rets))

# defaults: 12-month rolling PC window, 3-month momentum window
pcPlusMinus <- function(rets, pcWindow = 12, momWindow = 3) {
  ep <- endpoints(rets)
  wtsPc1Plus <- NULL
  wtsPc1Minus <- NULL

  for(i in 1:(length(ep) - pcWindow)) {
    # get subset of returns
    returnSubset <- rets[(ep[i] + 1):(ep[i + pcWindow])]

    # perform PCA, get the loadings on the first PC (i.e. pc1)
    pcs <- prcomp(returnSubset)
    firstPc <- pcs$rotation[, 1]

    # make sure SPY always has a positive loading; otherwise, SPY and
    # related assets may have negative loadings sometimes and positive
    # loadings other times, which creates a chaotic return series
    if(firstPc['SPY'] < 0) {
      firstPc <- firstPc * -1
    }

    # create weight vector from the negative loadings of pc1
    wtsMinus <- firstPc * -1
    wtsMinus[wtsMinus < 0] <- 0
    wtsMinus <- wtsMinus/(sum(wtsMinus) + 1e-16) # in case of all-zero weights
    wtsMinus <- xts(t(wtsMinus), order.by = last(index(returnSubset)))
    wtsPc1Minus[[i]] <- wtsMinus

    # create weight vector from the positive loadings of pc1
    wtsPlus <- firstPc
    wtsPlus[wtsPlus < 0] <- 0
    wtsPlus <- wtsPlus/(sum(wtsPlus) + 1e-16)
    wtsPlus <- xts(t(wtsPlus), order.by = last(index(returnSubset)))
    wtsPc1Plus[[i]] <- wtsPlus
  }

  # combine positive and negative PC1 weights
  wtsPc1Minus <- do.call(rbind, wtsPc1Minus)
  wtsPc1Plus <- do.call(rbind, wtsPc1Plus)

  # get returns of the two PC portfolios
  pc1MinusRets <- Return.portfolio(R = rets, weights = wtsPc1Minus)
  pc1PlusRets <- Return.portfolio(R = rets, weights = wtsPc1Plus)

  # combine them
  combine <- na.omit(cbind(pc1PlusRets, pc1MinusRets))
  colnames(combine) <- c("PCplus", "PCminus")

  # each month, hold the portfolio with the best trailing momentum
  momEp <- endpoints(combine)
  momWts <- NULL
  for(i in 1:(length(momEp) - momWindow)) {
    momSubset <- combine[(momEp[i] + 1):(momEp[i + momWindow])]
    momentums <- Return.cumulative(momSubset)
    momWts[[i]] <- xts(momentums == max(momentums),
                       order.by = last(index(momSubset)))
  }
  momWts <- do.call(rbind, momWts)

  out <- Return.portfolio(R = combine, weights = momWts)
  colnames(out) <- paste("PCwin", pcWindow, "MomWin", momWindow, sep = "_")
  return(list(out, wtsPc1Minus, wtsPc1Plus, combine))
}

pcWindows <- c(6, 9, 12, 15, 18)
momWindows <- c(1, 3, 6, 10, 12)
permutes <- expand.grid(pcWindows, momWindows)

# summary statistics for each strategy variant
stratStats <- function(rets) {
  stats <- rbind(table.AnnualizedReturns(rets), maxDrawdown(rets))
  stats[5,] <- stats[1,]/stats[4,]
  stats[6,] <- stats[1,]/UlcerIndex(rets)
  rownames(stats)[4] <- "Worst Drawdown"
  rownames(stats)[5] <- "Calmar Ratio"
  rownames(stats)[6] <- "Ulcer Performance Index"
  return(stats)
}

results <- NULL
for(i in 1:nrow(permutes)) {
  tmp <- pcPlusMinus(rets = rets, pcWindow = permutes$Var1[i],
                     momWindow = permutes$Var2[i])
  results[[i]] <- tmp[[1]]
}
results <- do.call(cbind, results)
stats <- stratStats(results)

After a cursory look at the results, performance seems fairly miserable with my implementation, even by the standards of tactical asset allocation models (the good ones have Calmar and Sharpe ratios above 1).

Here are histograms of the Calmar and Sharpe ratios.

These values are generally too low for my liking. Here’s a screenshot of the table of all 25 results.

While my strategy for choosing which portfolio to hold differs from David Varadi’s (momentum, instead of whether or not the aggressive portfolio is above its 200-day moving average), numerous studies show that these two methods are closely related. Yet the results feel starkly different from (and worse than) those on his site.
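For reference, the 200-day SMA rule I’m comparing against can be sketched as follows (the price path is simulated; this is not Varadi’s actual implementation):

```r
# Rough sketch of the alternative regime filter: hold the aggressive
# (PC-plus) portfolio only while its price sits above its 200-day simple
# moving average; otherwise hold the defensive (PC-minus) portfolio.
# The price series below is synthetic, purely for illustration.
set.seed(1)
prices <- cumprod(1 + rnorm(300, mean = 4e-4, sd = 0.01))

sma200 <- mean(tail(prices, 200))            # 200-day simple moving average
holdAggressive <- tail(prices, 1) > sma200   # TRUE -> hold PC-plus portfolio
```

Both rules are trend filters: momentum compares trailing compound returns, while the SMA rule compares the latest price to its trailing average, which is why the two tend to produce correlated signals.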

I’d certainly be willing to entertain suggestions on how to improve the process, which will hopefully produce more meaningful results. I also know that AllocateSmartly expressed interest in implementing something along these lines for their estimable library of TAA strategies, so I thought I’d try it and see what I’d find; in this case, the results aren’t too promising.

Thanks for reading.

NOTE: I am networking, and actively seeking a position related to my skill set in either Philadelphia, New York City, or remotely. If you know of a position which may benefit from my skill set, feel free to let me know. You can reach me on my LinkedIn profile here, or email me.

Hi Ilya, thanks for working this up! I saw David Varadi’s article pair and thought, “Wow, that’s quite an amazing result.” But then I couldn’t replicate his result (I was getting inferior returns) and didn’t pursue it deeply at the time.

So I popped your R script into RStudio tonight, and first got an error on Line 16:

Error in do.call(cbind, rets2) : object ‘rets2’ not found

But changing rets2 to rets makes the script run just fine. The next issue is that I’m getting different results. Similar in an order-of-magnitude sort of way, but definitely different. Example from the first rows of the ‘stats’ print:

> stats
                          PCwin_6_MomWin_1 PCwin_9_MomWin_1 PCwin_12_MomWin_1 PCwin_15_MomWin_1 PCwin_18_MomWin_1
Annualized Return                0.0794000        0.0996000         0.1162000          -1.00000         0.1228000
Annualized Std Dev               0.1736000        0.1699000         0.1676000           0.44000         0.1462000
Annualized Sharpe (Rf=0%)        0.4571000        0.5862000         0.6933000          -2.27260         0.8400000
Worst Drawdown                   0.1922706        0.1920255         0.1905680           1.00000         0.1881902
Calmar Ratio                     0.4129597        0.5186812         0.6097563          -1.00000         0.6525313
Ulcer Performance Index          1.2303255        1.6009687         1.8358537         -13.92493         1.7611151

( the whole text is here at pastebin: https://pastebin.com/BW0iU9MV )

Granted, the formatting didn’t carry well, but the numbers are there to compare with your screenshot, and they’re quite different, with a significantly higher annualized Std Dev (which probably can’t be explained by a few days’ difference in the ending date of the price data). Any thoughts on this discrepancy? Perhaps post a new version of the script with a prescribed date window for testing? Particularly troubling is the -1.0000 CAGR for PCwin_15_MomWin_1; this happens again with all the other MomWins except MomWin_6, which gives ordinary-looking values (not matching yours, with higher Std Dev).

Best, Paul

Paul,

Check your data. There’s no way this strategy would lose all its money, which is what a CAGR of -1 represents. And thanks for catching the rets2 thing; I thought I had converted all of them to rets when I edited the script (the original rets used Quandl data).

-Ilya

Pingback: Quantocracy's Daily Wrap for 09/17/2018 | Quantocracy

Sure, I get that, but as presented the script isn’t giving reproducible results. MY results reproduce (I get the same crappy results each time I run it, including the notably higher std dev), but they don’t match yours since I guess you used Quandl data. I’ll dive deeper, thanks.

Hi again Ilya. How do YOU import using Quandl? I’m having trouble getting the import and formats right. Perhaps you could stick your Quandl approach in the script, commented out (and sans your API key), so that we could better replicate your findings? Sorry to be a bother.

Oh, I think I know the issue: you need to update your PerformanceAnalytics. Given that you got a CAGR of -1.00, you’re probably using an outdated PerformanceAnalytics that doesn’t automatically reallocate weights that don’t sum to 1 to a zero-weight asset. You should update that library. I just re-ran my script and got my results.

> packageVersion("PerformanceAnalytics")
[1] '1.5.2'

Ran it again, same thing. Removing "GLD" from the list makes it work (no -1.00 CAGR), albeit with higher-than-expected std dev. I exported the rets to a csv file with write.table, and other than some 0 values for returns on some days, nothing looks wrong with the data (max 11.29%, min -8.78%).

Did you rerun with Yahoo data, or Quandl?

Reran it with Yahoo. I would update all your libraries.

I’m baffled, same results again. After updating, and re-copy-pasting your code from above:

> packageVersion("PerformanceAnalytics")
[1] '1.5.2'

> packageVersion("quantmod")
[1] '0.4.13'

> packageVersion("stats")
[1] '3.5.0'

> packageVersion("xts")
[1] '0.11.1'

and R version 3.5.0 (2018-04-23)

I’ve written my rets out with write.table as a csv file (actually space-separated instead of comma-separated), which you can look at to see if it’s any different from yours, and put the whole thing in pastebin: https://pastebin.com/EZT6LUH0

Pingback: Distilled News | Data Analytics & R

Hi Ilya. Running your code produced an error for me too. I think it has to do with the fact that the pc1 vector sometimes shows a positive loading for all symbols, meaning that the sum of wtsMinus is zero, causing Return.portfolio to return NaN.

I can make it work with an if statement that checks for this condition and gives wtsMinus a positive loading on "IEF" (the defensive asset), like this:

if(sum(wtsMinus) == 0) {
  wtsMinus["IEF"] <- 1
}

Not sure if this messes up the logic though.
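One way to check whether this edge case is actually biting (a base-R diagnostic sketch, with hypothetical weight rows) is to scan the weight matrix for rows that sum to roughly zero before handing it to Return.portfolio:

```r
# Diagnostic sketch: when every PC1 loading shares the same sign, the
# "minus" weight row ends up all zeros. Flag any such rows so they can
# be inspected or patched. The example matrix below is made up.
wtsExample <- rbind(c(0.6, 0.4, 0.0),   # a normal weight row
                    c(0.0, 0.0, 0.0))   # a degenerate all-zero row
badRows <- which(abs(rowSums(wtsExample)) < 1e-8)
```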

Ilya, the source code you provided loads data from Yahoo starting in 1990, but I believe not all tickers (e.g. GLD) start that early. Do you somehow create artificial data for such tickers prior to their inception?

Thank you!

I use na.omit to truncate to the latest start date after combining.
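The effect is easy to demonstrate in miniature (plain vectors standing in for the xts return series):

```r
# Sketch of the truncation: cbind pads the late starter's early history
# with NA, and na.omit then drops every row before the latest common
# start. The return values below are invented.
spy <- c(0.010, -0.020, 0.005,  0.010)  # longer history
gld <- c(NA,     NA,    0.002, -0.003)  # GLD's data begins two periods later
combined <- na.omit(cbind(SPY = spy, GLD = gld))
nrow(combined)  # 2: only the overlapping rows survive
```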

Ilya, from my understanding your simulation is completely different from Varadi’s. Varadi computes a single PCA over the whole data period (1995-2018), then calculates the weights for the two portfolios, offense and defense. He then uses those calculated weights (which are only knowable in 2018) to simulate starting from 1995. The difference between your implementation and his is not momentum versus the 200-day SMA; it’s that he used a single weight calculation, known only as of 2018, to simulate starting from 1995.

Oh, if that’s the case, well, sure, that’s a huge explanation. Varadi’s simulation, then, is basically explaining versus predicting.