# The Marcos Lopez de Prado Hierarchical Risk Parity Algorithm

This post will be about replicating the Marcos Lopez de Prado algorithm from his paper, Building Diversified Portfolios That Outperform Out of Sample. This algorithm attempts a tradeoff between the classic mean-variance optimization algorithm, which takes a covariance structure into account but is unstable, and an inverse-volatility algorithm, which ignores covariance but is more stable.

This is a paper that I struggled with until I ran the code in Python (I have Anaconda installed, but have trouble installing some packages such as keras because I'm on Windows...I would love to have someone walk me through setting up a Linux dual-boot), as I had assumed that the clustering algorithm was able to concretely group every asset into a particular cluster (I.E. ETF 1 would be in cluster 1, ETF 2 in cluster 3, etc.). Turns out, that isn't the case at all.

Here’s how the algorithm actually works.

First off, it computes a covariance and correlation matrix (created from simulated data in Marcos's paper). Next, it runs a hierarchical clustering algorithm on a distance-transformed correlation matrix, with the "single" method (I.E. friend-of-friends; do ?hclust in R to read up more on this). The key output here is the order of the assets as given by the clustering algorithm. Note well: this order is the only relevant artifact of the entire clustering step.
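
As an aside, the distance transform in the paper is d = sqrt((1 – rho)/2), applied element-wise to the correlation matrix. A minimal sketch of that step, assuming corMat holds the correlation matrix (the code later in this post simply calls dist() on the correlation matrix instead):

```# the paper's correlation-to-distance transform
distMat <- sqrt((1 - as.matrix(corMat)) / 2)
clustOrder <- hclust(as.dist(distMat), method = 'single')$order
```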

Using this order, it then runs an algorithm that does the following:

Initialize a vector of weights equal to 1 for each asset.

Then, run the following recursive algorithm:

1) Break the order vector up into two lists of equal length (or as close to equal length as possible).

2) For each half of the list, compute the inverse-variance weights (using just the diagonal) of the covariance matrix slice containing the assets of interest, and then compute the variance of the cluster when multiplied by those weights (I.E. w' * S * w, where S is the covariance slice).

3) Then, do a basic inverse-variance weighting between the two clusters. Call the weight of cluster 0 alpha = 1 – cluster_variance_0/(cluster_variance_0 + cluster_variance_1), and the weight of cluster 1 its complement, 1 – alpha.

4) Multiply all weights in the original vector corresponding to assets in cluster 0 by the weight of cluster 0, and all weights corresponding to assets in cluster 1 by the weight of cluster 1. That is, weights[index_assets_cluster_0] *= alpha, and weights[index_assets_cluster_1] *= 1 – alpha.

5) Lastly, if a half contains more than one asset, repeat this entire process on that half, until every asset sits in its own cluster.
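
To make steps 2 through 4 concrete, here's a toy computation with made-up cluster variances (purely illustrative):

```cVar0 <- 0.04 # made-up variance of cluster 0
cVar1 <- 0.01 # made-up variance of cluster 1
alpha <- 1 - cVar0/(cVar0 + cVar1) # 0.2 -- the higher-variance cluster gets the smaller weight

w <- rep(1, 4) # four assets: 1-2 in cluster 0, 3-4 in cluster 1
w[1:2] <- w[1:2] * alpha
w[3:4] <- w[3:4] * (1 - alpha)
w # 0.2 0.2 0.8 0.8
```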

Here is the implementation in R code.

First off, here are the correlation matrix and the covariance matrix for use in this code, obtained from Marcos Lopez de Prado's code in the appendix of his paper.

```> covMat
V1           V2           V3           V4           V5          V6           V7           V8           V9          V10
1   1.000647799 -0.003050479  0.010033224 -0.010759689 -0.005036503 0.008762563  0.998201625 -0.001393196 -0.001254522 -0.009365991
2  -0.003050479  1.009021349  0.008613817  0.007334478 -0.009492688 0.013031817 -0.009420720 -0.015346223  1.010520047  1.013334849
3   0.010033224  0.008613817  1.000739363 -0.000637885  0.001783293 1.001574768  0.006385368  0.001922316  0.012902050  0.007997935
4  -0.010759689  0.007334478 -0.000637885  1.011854725  0.005759976 0.000905812 -0.011912269  0.000461894  0.012572661  0.009621670
5  -0.005036503 -0.009492688  0.001783293  0.005759976  1.005835878 0.005606343 -0.009643250  1.008567427 -0.006183035 -0.007942770
6   0.008762563  0.013031817  1.001574768  0.000905812  0.005606343 1.064309825  0.004413960  0.005780148  0.017185396  0.011601336
7   0.998201625 -0.009420720  0.006385368 -0.011912269 -0.009643250 0.004413960  1.058172027 -0.006755374 -0.008099181 -0.016240271
8  -0.001393196 -0.015346223  0.001922316  0.000461894  1.008567427 0.005780148 -0.006755374  1.074833155 -0.011903469 -0.013738378
9  -0.001254522  1.010520047  0.012902050  0.012572661 -0.006183035 0.017185396 -0.008099181 -0.011903469  1.075346677  1.015220126
10 -0.009365991  1.013334849  0.007997935  0.009621670 -0.007942770 0.011601336 -0.016240271 -0.013738378  1.015220126  1.078586686
> corMat
V1           V2           V3           V4           V5          V6           V7           V8           V9          V10
1   1.000000000 -0.003035829  0.010026270 -0.010693011 -0.005020245 0.008490954  0.970062043 -0.001343386 -0.001209382 -0.009015412
2  -0.003035829  1.000000000  0.008572055  0.007258718 -0.009422702 0.012575370 -0.009117080 -0.014736040  0.970108941  0.971348946
3   0.010026270  0.008572055  1.000000000 -0.000633903  0.001777455 0.970485047  0.006205079  0.001853505  0.012437239  0.007698212
4  -0.010693011  0.007258718 -0.000633903  1.000000000  0.005709500 0.000872861 -0.011512172  0.000442908  0.012052964  0.009210090
5  -0.005020245 -0.009422702  0.001777455  0.005709500  1.000000000 0.005418538 -0.009347204  0.969998023 -0.005945165 -0.007625721
6   0.008490954  0.012575370  0.970485047  0.000872861  0.005418538 1.000000000  0.004159261  0.005404237  0.016063910  0.010827955
7   0.970062043 -0.009117080  0.006205079 -0.011512172 -0.009347204 0.004159261  1.000000000 -0.006334331 -0.007592568 -0.015201540
8  -0.001343386 -0.014736040  0.001853505  0.000442908  0.969998023 0.005404237 -0.006334331  1.000000000 -0.011072068 -0.012759610
9  -0.001209382  0.970108941  0.012437239  0.012052964 -0.005945165 0.016063910 -0.007592568 -0.011072068  1.000000000  0.942667300
10 -0.009015412  0.971348946  0.007698212  0.009210090 -0.007625721 0.010827955 -0.015201540 -0.012759610  0.942667300  1.000000000
```

Now, for the implementation.

This reads in the two matrices above and gets the clustering order.

```covMat <- read.csv('cov.csv', header = FALSE)
corMat <- read.csv('cor.csv', header = FALSE) # the correlation matrix needs reading in as well; file name assumed

clustOrder <- hclust(dist(corMat), method = 'single')$order
```

This is the clustering order:

```> clustOrder
[1]  9  2 10  1  7  3  6  4  5  8
```

Next, the getIVP (get inverse-variance portfolio) and getClusterVar functions (note: I'm trying to keep the naming conventions identical to those in Dr. Lopez de Prado's paper).

```getIVP <- function(covMat) {
  # get inverse variance portfolio from diagonal of covariance matrix
  invDiag <- 1/diag(as.matrix(covMat))
  weights <- invDiag/sum(invDiag)
  return(weights)
}

getClusterVar <- function(covMat, cItems) {
  # compute cluster variance from the inverse variance portfolio above
  covMatSlice <- covMat[cItems, cItems]
  weights <- getIVP(covMatSlice)
  cVar <- t(weights) %*% as.matrix(covMatSlice) %*% weights
  return(cVar)
}
```

Next, my code diverges from the code in the paper: I do not use the list-comprehension structure, but instead opt for a recursive algorithm, as I find that style more readable.

One wrinkle to note is the use of the double-arrow assignment operator (<<-) to update a variable outside the scope of the recurFun function. I assign the initial weights vector w in the global environment, and update it from within the recurFun function. I am aware that it is a faux pas to create variables in the global environment, but my attempts at creating a temporary environment in which to update the weight vector did not produce the updating mechanism I had hoped for, so a little bit of assistance with refactoring this code would be appreciated (one possible approach is sketched after the code block).

```getRecBipart <- function(covMat, sortIx) {
  # keeping track of weights vector in the global environment
  assign("w", value = rep(1, ncol(covMat)), envir = .GlobalEnv)

  # run recursion function
  recurFun(covMat, sortIx)
  return(w)
}

recurFun <- function(covMat, sortIx) {
  # get first half of sortIx which is a cluster order
  subIdx <- 1:trunc(length(sortIx)/2)

  # subdivide ordering into first half and second half
  cItems0 <- sortIx[subIdx]
  cItems1 <- sortIx[-subIdx]

  # compute cluster variances of covariance matrices indexed
  # on first half and second half of ordering
  cVar0 <- getClusterVar(covMat, cItems0)
  cVar1 <- getClusterVar(covMat, cItems1)

  # as.numeric drops the 1x1 matrix dimension so the weight updates recycle cleanly
  alpha <- as.numeric(1 - cVar0/(cVar0 + cVar1))

  # updating weights outside the function using scoping mechanics
  w[cItems0] <<- w[cItems0] * alpha
  w[cItems1] <<- w[cItems1] * (1-alpha)

  # rerun the function on a half if the length of that half is greater than 1
  if(length(cItems0) > 1) {
    recurFun(covMat, cItems0)
  }
  if(length(cItems1) > 1) {
    recurFun(covMat, cItems1)
  }
}
```
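
As for that refactoring request: one way to avoid writing to the global environment, sketched below under the assumption that getClusterVar is already defined, is to nest the recursion inside the outer function, so that <<- resolves to the outer function's local weight vector rather than to a global variable. This is my own suggestion, not code from the paper; the original version above is what's used for the results that follow.

```# sketch: same algorithm, but the weights vector lives in getRecBipart2's
# environment instead of the global one
getRecBipart2 <- function(covMat, sortIx) {
  w <- rep(1, ncol(covMat))

  recurFun <- function(sortIx) {
    subIdx <- 1:trunc(length(sortIx)/2)
    cItems0 <- sortIx[subIdx]
    cItems1 <- sortIx[-subIdx]
    cVar0 <- getClusterVar(covMat, cItems0)
    cVar1 <- getClusterVar(covMat, cItems1)
    alpha <- as.numeric(1 - cVar0/(cVar0 + cVar1))

    # <<- walks up to the enclosing function's w, not the global environment
    w[cItems0] <<- w[cItems0] * alpha
    w[cItems1] <<- w[cItems1] * (1 - alpha)

    if(length(cItems0) > 1) recurFun(cItems0)
    if(length(cItems1) > 1) recurFun(cItems1)
  }

  recurFun(sortIx)
  return(w)
}
```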

Lastly, let’s run the function.

```out <- getRecBipart(covMat, clustOrder)
```

With the result (which matches the paper):

```> out
[1] 0.06999366 0.07592151 0.10838948 0.19029104 0.09719887 0.10191545 0.06618868 0.09095933 0.07123881 0.12790318
```

So, hopefully this democratizes the use of this technology in R. While I have seen a raw Rcpp implementation and one from the Systematic Investor Toolbox, neither of those implementations satisfied me from a "plug and play" perspective. This implementation solves that issue: anyone can copy and paste these functions into their environment and immediately make use of an algorithm devised by one of the top minds in quantitative finance.

A demonstration in a backtest using this methodology will be forthcoming.

NOTE: I am always interested in networking and full-time opportunities which may benefit from my skills. Furthermore, I am also interested in project work in the volatility ETF trading space. My linkedin profile can be found here.

# Constant Expiry VIX Futures (Using Public Data)

This post will be about creating constant-expiry contracts (E.G. a rolling 30-day contract) using VIX settlement data from the CBOE and the spot VIX calculation (from Yahoo Finance, or wherever else). Such contracts may be tradable under certain circumstances, but not always (in particular, not when the desired expiry is shorter than the front month's time to expiry).

The last time I visited this topic, I created a term structure using publicly available data from the CBOE, along with an external expiry calendar.

The logical next step, of course, is to create constant-expiry contracts, which may or may not be tradable (a constant 30-day contract, for instance, cannot be traded on days when the front month itself has more than 30 days to expiry).

So here’s where we left off: a way to create a continuous term structure using CBOE settlement VIX data.

So from here, before anything, we need to get VIX data. While the getSymbols command used to make this easy, Yahoo broke its API (what else do you expect from an otherwise-irrelevant, washed-up web 1.0 dinosaur?), so it's not currently possible to get free Yahoo data (in the event that Josh Ulrich doesn't fix this issue in the near future, I'm open to suggestions for other free sources of data of reputable quality). Instead, we need to get VIX data from elsewhere: particularly, the CBOE itself, which is a one-stop shop for all VIX-related data, and most likely some other interesting futures as well.

So here’s how to get VIX data from the CBOE (thanks, all you awesome CBOE people! And a shoutout to all my readers from the CBOE, I’m sure some of you are from there).

```require(quantmod) # for Cl() and xts handling
require(data.table) # for fread()

VIX <- fread("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv", skip = 1)
VIXdates <- VIX$Date
VIX$Date <- NULL; VIX <- xts(VIX, order.by=as.Date(VIXdates, format = '%m/%d/%Y'))
spotVix <- Cl(VIX)
```

Next, there’s a need for some utility functions to help out with identifying which futures contracts to use for constructing synthetics.

```# find column with greatest days to expiry less than or equal to desired days to expiry
shortDurMinMax <- function(row, daysToExpiry) {
  return(max(which(row <= daysToExpiry)))
}

# find column with least days to expiry greater than desired days to expiry
longDurMinMax <- function(row, daysToExpiry) {
  return(min(which(row > daysToExpiry)))
}

# gets the difference between the two furthest-out contracts (either expiry days or price)
getLastDiff <- function(row) {
  indices <- rev(which(!is.na(row)))
  out <- row[indices[1]] - row[indices[2]]
  return(out)
}

# gets the rightmost non-NA value of a row
getLastValue <- function(row) {
  indices <- rev(which(!is.na(row)))
  out <- row[indices[1]]
  return(out)
}
```

The first two functions determine the short-duration and long-duration contracts. Simply, provided a row of data and the desired constant time to expiry, the first function finds the contract with the greatest time to expiry less than or equal to the desired amount, while the second finds the contract with the least time to expiry greater than the desired amount.

The next two functions are utilized in the scenario of a synthetic whose time to expiry is greater than that of the longest trading contract. Such a synthetic would obviously not be able to be traded, but can be created for use as an indicator. The third function gets the difference between the last two non-NA values in a row (I.E. the two last prices, or the two last times to expiry), and the fourth one simply gets the rightmost non-NA value in a row. A toy example of their behavior appears below.
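
For instance, with a hypothetical row of days to expiry across six contract columns:

```exampleRow <- c(5, 33, 61, 96, NA, NA) # hypothetical days to expiry, nearest to furthest

shortDurMinMax(exampleRow, 30) # 1 -- the 5-day contract is the longest-dated one <= 30 days
longDurMinMax(exampleRow, 30)  # 2 -- the 33-day contract is the shortest-dated one > 30 days
getLastDiff(exampleRow)        # 35 -- difference between the two furthest non-NA values (96 - 61)
getLastValue(exampleRow)       # 96 -- the rightmost non-NA value
```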

The algorithm to create a synthetic constant-expiry contract/indicator is divided into three scenarios:

One, in which the desired time to expiry of the contract is shorter than that of the front month, such as a constant 30-day expiry contract when the front month has more than 30 days to maturity (such as on Nov 17, 2016). At that point, the weight on the front month is the desired time to expiry divided by the front month's remaining time to expiry, with the remainder in spot VIX (another asset that cannot be traded, at least conventionally).

The second scenario is one in which the desired time to expiry is longer than that of the last traded contract. For instance, if the desire is to create a contract with a year to expiry when the furthest out is eight months, there obviously won't be data for such a contract. In such a case, the algorithm computes the linear slope between the last two available contracts, and adds the product of that slope and the time remaining between the desired expiry and the last contract's expiry to the price of the last contract. A sketch with made-up numbers appears below.
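
Extrapolating a hypothetical 365-day synthetic past the last traded contract:

```lastPrice <- 17.0; prevPrice <- 16.5 # prices of the two furthest contracts (made up)
lastExpiry <- 240; prevExpiry <- 210  # their days to expiry (made up)
slope <- (lastPrice - prevPrice) / (lastExpiry - prevExpiry) # price change per day of expiry
lastPrice + slope * (365 - lastExpiry) # extrapolated 365-day price: 17 + (0.5/30) * 125 = ~19.08
```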

Lastly, the third scenario (and the most common one under most use cases) is that of the synthetic for which there is both a trading contract with less time to expiry than the desired constant rate, and one with more time to expiry. In this instance, a bit of linear algebra (included in the comments of the function below) gives the weight of the short-expiry contract as (desired – expiry_long)/(expiry_short – expiry_long).
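
A quick numeric check of that formula, again with made-up expiries:

```# a 30-day synthetic from a hypothetical 23-day and 51-day contract
daysToExpiry <- 30; exShort <- 23; exLong <- 51
weightShort <- (daysToExpiry - exLong) / (exShort - exLong)  # 0.75
weightShort * exShort + (1 - weightShort) * exLong           # 0.75*23 + 0.25*51 = 30
```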

The algorithm iterates through all three scenarios, and because xts automatically sorts by timestamp when rbinding, the result is an xts object of a synthetic, constant-expiry futures contract, in date order.

Here is the code for the function.

```
constantExpiry <- function(spotVix, termStructure, expiryStructure, daysToExpiry) {

# Compute synthetics that are too long (more time to expiry than furthest contract)

# can be Inf if no column contains values greater than daysToExpiry (I.E. expiry is 3000 days)
suppressWarnings(longCol <- xts(apply(expiryStructure, 1, longDurMinMax, daysToExpiry), order.by=index(termStructure)))
longCol[longCol == Inf] <- 10

# xts for too long to expiry -- need a NULL for rbinding if empty
tooLong <- NULL

# Extend the last term structure slope an arbitrarily long amount of time for those with too long expiry
tooLongIdx <- index(longCol[longCol==10])
if(length(tooLongIdx) > 0) {
tooLongTermStructure <- termStructure[tooLongIdx]
tooLongExpiryStructure <- expiryStructure[tooLongIdx]

# difference in price/expiry for longest two contracts, use it to compute a slope
priceDiff <- xts(apply(tooLongTermStructure, 1, getLastDiff), order.by = tooLongIdx)
expiryDiff <- xts(apply(tooLongExpiryStructure, 1, getLastDiff), order.by = tooLongIdx)
slope <- priceDiff/expiryDiff

# get longest contract price and compute additional days to expiry from its time to expiry
# I.E. if daysToExpiry is 180 and longest is 120, additionalDaysToExpiry is 60
maxDaysToExpiry <- xts(apply(tooLongExpiryStructure, 1, max, na.rm = TRUE), order.by = tooLongIdx)
longestContractPrice <- xts(apply(tooLongTermStructure, 1, getLastValue), order.by = tooLongIdx)
additionalDaysToExpiry <- daysToExpiry - maxDaysToExpiry

# add slope multiplied by additional days to expiry to longest contract price
tooLong <- longestContractPrice + additionalDaysToExpiry * slope
}

# compute synthetics that are too short (less time to expiry than shortest contract)

# can be -Inf if no column contains values less than daysToExpiry (I.E. expiry is 5 days)
suppressWarnings(shortCol <- xts(apply(expiryStructure, 1, shortDurMinMax, daysToExpiry), order.by=index(termStructure)))
shortCol[shortCol == -Inf] <- 0

# xts for too short to expiry -- need a NULL for rbinding if empty
tooShort <- NULL

tooShortIdx <- index(shortCol[shortCol==0])

if(length(tooShortIdx) > 0) {
tooShort <- termStructure[,1] * daysToExpiry/expiryStructure[,1] + spotVix * (1 - daysToExpiry/expiryStructure[,1])
tooShort <- tooShort[tooShortIdx]
}

# compute everything in between (when time to expiry is between longest and shortest)

# get unique permutations for contracts that term structure can create
colPermutes <- cbind(shortCol, longCol)
colnames(colPermutes) <- c("short", "long")
colPermutes <- colPermutes[colPermutes$short > 0,]
colPermutes <- colPermutes[colPermutes$long < 10,]

regularSynthetics <- NULL

# if we can construct synthetics from regular futures -- someone might enter an extremely long expiry
# so this may not always be the case

if(nrow(colPermutes) > 0) {

# pasting long and short expiries into a single string for easier subsetting
shortLongPaste <- paste(colPermutes$short, colPermutes$long, sep="_")
uniqueShortLongPaste <- unique(shortLongPaste)

regularSynthetics <- list()
for(i in 1:length(uniqueShortLongPaste)) {
# get unique permutation of short-expiry and long-expiry contracts
permuteSlice <- colPermutes[which(shortLongPaste==uniqueShortLongPaste[i]),]
expirySlice <- expiryStructure[index(permuteSlice)]
termStructureSlice <- termStructure[index(permuteSlice)]

# what are the parameters?
shortCol <- unique(permuteSlice$short); longCol <- unique(permuteSlice$long)

# computations -- some linear algebra

# S/L are weights, ex_S/ex_L are time to expiry
# D is desired constant time to expiry

# S + L = 1
# L = 1 - S
# S + (1-S) = 1
#
# ex_S * S + ex_L * (1-S) = D
# ex_S * S + ex_L - ex_L * S = D
# ex_S * S - ex_L * S = D - ex_L
# S(ex_S - ex_L) = D - ex_L
# S = (D - ex_L)/(ex_S - ex_L)

weightShort <- (daysToExpiry - expirySlice[, longCol])/(expirySlice[, shortCol] - expirySlice[, longCol])
weightLong <- 1 - weightShort
syntheticValue <- termStructureSlice[, shortCol] * weightShort + termStructureSlice[, longCol] * weightLong

regularSynthetics[[i]] <- syntheticValue
}

regularSynthetics <- do.call(rbind, regularSynthetics)
}

out <- rbind(tooShort, regularSynthetics, tooLong)
colnames(out) <- paste0("Constant_", daysToExpiry)
return(out)
}
```

And here’s how to use it:

```constant30 <- constantExpiry(spotVix = spotVix, termStructure = termStructure, expiryStructure = expiryStructure, daysToExpiry = 30)
constant180 <- constantExpiry(spotVix = spotVix, termStructure = termStructure, expiryStructure = expiryStructure, daysToExpiry = 180)

constantTermStructure <- cbind(constant30, constant180)

chart.TimeSeries(constantTermStructure, legend.loc = 'topright', main = "Constant Term Structure")
```

With the result:

Which means that between the CBOE data itself, and this function that creates constant expiry futures from CBOE spot and futures prices, one can obtain any futures contract, whether real or synthetic, to use as an indicator for volatility trading strategies. This allows for exploration of a wide variety of volatility trading strategies.

NOTE: I am always interested in networking and hearing about full-time opportunities related to my skill set. My linkedin can be found here.

Furthermore, if you are a volatility ETF/futures trading professional, I am interested in possible project-based collaboration. If you are interested, please contact me.

# Creating a VIX Futures Term Structure In R From Official CBOE Settlement Data

This post will be detailing a process to create a VIX term structure from freely available CBOE VIX settlement data and a calendar of freely obtainable VIX expiry dates. This has applications for volatility trading strategies.

So this post, as has been the usual for quite some time, will not be about a strategy, but rather a tool that can be used for exploring future strategies. Particularly, volatility strategies, which seem to have been a hot topic on this blog some time ago (and might very well be still, especially since the Volatility Made Simple blog stopped tracking open-sourced strategies within the past year).

This post’s topic is the VIX term structure–that is, creating a set of continuous contracts–properly rolled according to VIX contract specifications, rather than a hodgepodge of generic algorithms as found on some other websites. The idea is, as of the settlement of a previous day (or whenever the CBOE actually releases their data), you can construct a curve of contracts, and see if it’s in contango (front month cheaper than next month and so on) or backwardation (front month more expensive than next month, etc.).

The first (and most code-intensive) part of the procedure is fairly simple–map the contracts to an expiration date, then put their settlement dates and times to expiry into two separate xts objects, with one column for each contract.

The expiries text file is simply a collection of copied and pasted expiry dates from this site. It includes the January 2018 expiration date. Here is what it looks like:

```> head(expiries)
V1       V2   V3
1 18  January 2006
2 15 February 2006
3 22    March 2006
4 19    April 2006
5 17      May 2006
6 21     June 2006
```
```require(xts)
require(data.table)

# 06 through 17
years <- c(paste0("0", c(6:9)), as.character(c(10:17)))

# futures months
futMonths <- c("F", "G", "H", "J", "K", "M",
               "N", "Q", "U", "V", "X", "Z")

# expiries come from http://www.macroption.com/vix-expiration-calendar/
# (read in from the copied-and-pasted text file described above; file name assumed)
expiries <- read.table("expiries.txt", header = FALSE)

# convert expiries into dates in R
dateString <- paste(expiries$V3, expiries$V2, expiries$V1, sep = "-")
dates <- as.Date(dateString, format = "%Y-%B-%d")

# map futures months to numbers for dates
monthMaps <- cbind(futMonths, c("01", "02", "03", "04", "05", "06",
"07", "08", "09", "10", "11", "12"))
monthMaps <- data.frame(monthMaps)
colnames(monthMaps) <- c("futureStem", "monthNum")

dates <- data.frame(dates)
dates$dateMon <- substr(dates$dates, 1, 7)

contracts <- expand.grid(futMonths, years)
contracts <- paste0(contracts[,1], contracts[,2])
contracts <- c(contracts, "F18")
#contracts <- paste0(stem, contracts, "_VX.csv")

# NOTE: 'stem' (the path/URL prefix for the CBOE settlement csv files) was never
# defined in this snippet; at the time, the files lived at a URL of roughly the
# form below -- treat it as an assumption and substitute your own data source
stem <- "http://cfe.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_"

masterlist <- list()
timesToExpiry <- list()
for(i in 1:length(contracts)) {

  # obtain data
  contract <- contracts[i]
  dataFile <- paste0(stem, contract, "_VX.csv")
  expiryYear <- paste0("20",substr(contract, 2, 3))
  expiryMonth <- monthMaps$monthNum[monthMaps$futureStem == substr(contract,1,1)]
  expiryDate <- dates$dates[dates$dateMon == paste(expiryYear, expiryMonth, sep="-")]
  data <- suppressWarnings(fread(dataFile))

  # create dates (the date column in the CBOE files is assumed to be
  # 'Trade Date' in mm/dd/YYYY format)
  dataDates <- as.Date(data$`Trade Date`, format = '%m/%d/%Y')

  # create time to expiration xts
  toExpiry <- xts(as.numeric(expiryDate - dataDates), order.by = dataDates)
  colnames(toExpiry) <- contract
  timesToExpiry[[i]] <- toExpiry

  # get settlements (settlement column assumed to be named 'Settle')
  settlement <- xts(data$Settle, order.by = dataDates)
  colnames(settlement) <- contract
  masterlist[[i]] <- settlement
}

# cbind outputs
masterlist <- do.call(cbind, masterlist)
timesToExpiry <- do.call(cbind, timesToExpiry)

# NA out zeroes in settlements
masterlist[masterlist==0] <- NA
```

From there, we need to visualize how many contracts are being traded at once on any given day (I.E. what's a good steady-state number for the term structure?).

```sumNonNA <- function(row) {
  return(sum(!is.na(row)))
}

simultaneousContracts <- xts(apply(masterlist, 1, sumNonNA), order.by=index(masterlist))
chart.TimeSeries(simultaneousContracts)
```

The result looks like this:

So, 8 contracts (give or take) at any given point in time. This is confirmed by the end of the master list of settlements.

```dim(masterlist)
tail(masterlist[,135:145])
```
```> dim(masterlist)
[1] 3002  145
> tail(masterlist[,135:145])
H17    J17    K17    M17    N17    Q17    U17    V17    X17    Z17   F18
2017-04-18  NA 14.725 14.325 14.525 15.175 15.475 16.225 16.575 16.875 16.925    NA
2017-04-19  NA 14.370 14.575 14.525 15.125 15.425 16.175 16.575 16.875 16.925    NA
2017-04-20  NA     NA 14.325 14.325 14.975 15.375 16.175 16.575 16.875 16.900    NA
2017-04-21  NA     NA 14.325 14.225 14.825 15.175 15.925 16.350 16.725 16.750    NA
2017-04-24  NA     NA 12.675 13.325 14.175 14.725 15.575 16.025 16.375 16.475 17.00
2017-04-25  NA     NA 12.475 13.125 13.975 14.425 15.225 15.675 16.025 16.150 16.75
```

Using this information, an algorithm can create eight continuous contracts, ranging from the front month to eight months out. The algorithm starts at the first day of the master list and runs to the first expiry, then moves between expiry windows, appending the front-month contract and the next seven contracts to a list before rbinding everything together; it does the same with the expiry structure.

```termStructure <- list()
expiryStructure <- list()
masterDates <- unique(c(first(index(masterlist)), dates$dates[dates$dates %in% index(masterlist)], Sys.Date()-1))
for(i in 1:(length(masterDates)-1)) {
  subsetDates <- masterDates[c(i, i+1)]
  dateRange <- paste(subsetDates[1], subsetDates[2], sep="::")
  subset <- masterlist[dateRange,c(i:(i+7))]
  subset <- subset[-1,]
  expirySubset <- timesToExpiry[index(subset), c(i:(i+7))]
  colnames(subset) <- colnames(expirySubset) <- paste0("C", c(1:8))
  termStructure[[i]] <- subset
  expiryStructure[[i]] <- expirySubset
}

termStructure <- do.call(rbind, termStructure)
expiryStructure <- do.call(rbind, expiryStructure)
```

Again, one more visualization of when we have a suitable number of contracts:

```simultaneousContracts <- xts(apply(termStructure, 1, sumNonNA), order.by=index(termStructure))
chart.TimeSeries(simultaneousContracts)
```

And in order to preserve the most data, we’ll cut the burn-in period off when we first have 7 contracts trading at once.

```first(index(simultaneousContracts)[simultaneousContracts >= 7])
termStructure <- termStructure["2006-10-23::"]
expiryStructure <- expiryStructure[index(termStructure)]
```

So there you have it: your continuous VIX futures term structure, as given by the official CBOE settlements. While some may try to simulate a trading strategy based on these contracts, I myself prefer to use them as indicators or features for a model that would rather buy XIV or VXX.

One last trick, for those who want to visualize things: a way to actually plot the term structure on any given day, in particular the most recent one in the term structure.

```plot(t(coredata(last(termStructure))), type = 'b')
```

A clear display of contango.
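
And a quick programmatic check of that statement (a small sketch):

```# the curve is in contango when each successive contract is pricier
lastCurve <- as.numeric(coredata(last(termStructure)))
all(diff(lastCurve) > 0, na.rm = TRUE) # TRUE when the curve is in contango
```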

A post on how to compute synthetic constant-expiry contracts (E.G. constant 30-day expiry contracts) will be forthcoming in the near future.

NOTE: I am currently interested in networking and full-time positions which may benefit from my skills. I may be contacted at my LinkedIn profile found here.

# Nuts and Bolts of Quantstrat, Part V

This post will be about pre-processing custom indicators in quantstrat–that is, how to add values to your market data that do not arise from the market data itself.

The first four parts of my Nuts and Bolts of Quantstrat series were well received. They are even available as a DataCamp course. For those that want to catch up to today's post, I highly recommend the course.

To motivate this post, the idea is this: say you're using alternative data that isn't simply derived from a transformation of the market data itself. I.E. you have a proprietary alternative data stream that may predict an asset's price, or you want to employ a cross-sectional ranking system, or any number of things. How do you do this within the context of quantstrat?

The answer is that it’s as simple as binding a new xts to your asset data, as this demonstration will show.

First, let’s get the setup out of the way.

```require(quantstrat)
require(PerformanceAnalytics)

initDate="1990-01-01"
from="2003-01-01"
to="2012-12-31"
options(width=70)

options("getSymbols.warning4.0"=FALSE)

currency('USD')
Sys.setenv(TZ="UTC")

symbols <- 'SPY'

# get the SPY data itself (needed for the indicator cbind below; the original
# snippet assumed it was already loaded)
getSymbols(symbols, from=from, to=to, src='yahoo', adjust=TRUE)

stock(symbols, currency="USD", multiplier=1)
```

Now, we have our non-derived indicator. In this case, it’s a toy example–the value is 1 if the year is odd (I.E. 2003, 2005, 2007, 2009), and 0 if it’s even. We compute that and simply column-bind (cbind) it to the asset data.

```nonDerivedIndicator <- as.numeric(as.character(substr(index(SPY), 1, 4)))%%2 == 1
nonDerivedIndicator <- xts(nonDerivedIndicator, order.by=index(SPY))

SPY <- cbind(SPY, nonDerivedIndicator)
colnames(SPY)[7] = "nonDerivedIndicator"
```

Next, we just have a very simple strategy–buy a share of SPY on odd years, sell on even years. That is, buy when the nonDerivedIndicator column crosses above 0.5 (from 0 to 1), and sell when the opposite occurs.

```strategy.st <- portfolio.st <- account.st <- "nonDerivedData"
rm.strat(strategy.st)
initPortf(portfolio.st, symbols=symbols, initDate=initDate, currency='USD')
initAcct(account.st, portfolios=portfolio.st, initDate=initDate, currency='USD')
initOrders(portfolio.st, initDate=initDate)
strategy(strategy.st, store=TRUE)

# enter when the indicator crosses above 0.5, exit when it crosses below;
# these are the standard quantstrat sigThreshold/ruleSignal invocations
# matching the arguments shown
add.signal(strategy.st, name = "sigThreshold",
           arguments = list(column = "nonDerivedIndicator", threshold = 0.5, relationship = "gte", cross = TRUE),
           label = "longEntry")

add.signal(strategy.st, name = "sigThreshold",
           arguments = list(column = "nonDerivedIndicator", threshold = 0.5, relationship = "lte", cross = TRUE),
           label = "longExit")

tmp <- applySignals(strategy = strategy.st, mktdata=SPY)

add.rule(strategy.st, name = "ruleSignal",
         arguments = list(sigcol="longEntry", sigval=TRUE, ordertype="market",
                          orderside="long", replace=FALSE, prefer="Open", orderqty = 1),
         type = "enter", path.dep = TRUE)

add.rule(strategy.st, name = "ruleSignal",
         arguments = list(sigcol="longExit", sigval=TRUE, orderqty="all",
                          ordertype="market", orderside="long",
                          replace=FALSE, prefer="Open"),
         type = "exit", path.dep = TRUE)

#apply strategy
t1 <- Sys.time()
out <- applyStrategy(strategy=strategy.st, portfolios=portfolio.st)
t2 <- Sys.time()
print(t2-t1)

#set up analytics
updatePortf(portfolio.st)
dateRange <- time(getPortfolio(portfolio.st)$summary)[-1]
updateAcct(portfolio.st, dateRange)
updateEndEq(account.st)
```

And the result:

```chart.Posn(portfolio.st, 'SPY')
```

In conclusion, you can create signals based off of any data in quantstrat. Whether that means volatility ratios, fundamental data, cross-sectional ranking, or whatever proprietary alternative data source you may have access to, this very simple process is how you can use quantstrat to add all of those things to your systematic trading backtest research.

Note: I am always interested in full-time opportunities which may benefit from my skills. I have experience in data analytics, asset management, and systematic trading research. If you know of any such opportunities, do not hesitate to contact me on my LinkedIn, found here.

# Ehlers’s Autocorrelation Periodogram

This post will introduce John Ehlers's Autocorrelation Periodogram mechanism–a mechanism designed to dynamically find a lookback period. After all, the lookback period is the single most commonly optimized parameter in backtests.

Before beginning this post, I must give credit where it's due, to one Mr. Fabrizio Maccallini, the head of structured derivatives at Nordea Markets in London. You can find the rest of the repository he made for Dr. John Ehlers's Cycle Analytics for Traders on his GitHub. I am grateful and honored that such intelligent and experienced individuals are helping to bring some of Dr. Ehlers's methods into R.

The point of the Ehlers Autocorrelation Periodogram is to dynamically set a period between a minimum and a maximum period length. While I leave the exact explanation of the mechanic to Dr. Ehlers’s book, for all practical intents and purposes, in my opinion, the punchline of this method is to attempt to remove a massive source of overfitting from trading system creation–namely specifying a lookback period.

SMA of 50 days? 100 days? 200 days? Well, this algorithm takes that possibility of overfitting out of your hands: simply specify an upper and lower bound for your lookback, and it does the rest. How well it does its job is a topic of discussion for those well-versed in the methodologies of electrical engineering (I'm not), so feel free to leave comments that discuss how well the algorithm performs, and feel free to blog about it as well.

In any case, here’s the original algorithm code, courtesy of Mr. Maccallini:

```require(xts)
require(TTR) # for runCor and EMA, used below

AGC <- function(loCutoff = 10, hiCutoff = 48, slope = 1.5) {
  accSlope = -slope # acceptableSlope = 1.5 dB
  ratio = 10 ^ (accSlope / 20)
  if ((hiCutoff - loCutoff) > 0)
    factor <- ratio ^ (2 / (hiCutoff - loCutoff))
  return (factor)
}

autocorrPeriodogram <- function(x, period1 = 10, period2 = 48, avgLength = 3) {
  # high pass filter
  alpha1 <- (cos(sqrt(2) * pi / period2) + sin(sqrt(2) * pi / period2) - 1) / cos(sqrt(2) * pi / period2)
  hp <- (1 - alpha1 / 2) ^ 2 * (x - 2 * lag(x) + lag(x, 2))
  hp <- hp[-c(1, 2)]
  hp <- filter(hp, (1 - alpha1), method = "recursive")
  hp <- c(NA, NA, hp)
  hp <- xts(hp, order.by = index(x))

  # super smoother
  a1 <- exp(-sqrt(2) * pi / period1)
  b1 <- 2 * a1 * cos(sqrt(2) * pi / period1)
  c2 <- b1
  c3 <- -a1 * a1
  c1 <- 1 - c2 - c3
  filt <- c1 * (hp + lag(hp)) / 2
  filt <- filter(filt, c(c2, c3), method = "recursive")
  filt <- xts(filt, order.by = index(x))

  # Pearson correlation for each value of lag
  autocorr <- matrix(0, period2, length(filt))
  for (lag in 2:period2) {
    # Set the average length as M
    if (avgLength == 0) M <- lag
    else M <- avgLength
    autocorr[lag, ] <- runCor(filt, lag(filt, lag), M)
  }
  autocorr[is.na(autocorr)] <- 0

  # Discrete Fourier transform
  # Correlate autocorrelation values with the cosine and sine of each period of interest
  # The sum of the squares of each value represents relative power at each period
  cosinePart <- sinePart <- sqSum <- R <- Pwr <- matrix(0, period2, length(filt))
  for (period in period1:period2) {
    for (N in 2:period2) {
      cosinePart[period, ] = cosinePart[period, ] + autocorr[N, ] * cos(2 * N * pi / period)
      sinePart[period, ] = sinePart[period, ] + autocorr[N, ] * sin(2 * N * pi / period)
    }
    sqSum[period, ] = cosinePart[period, ] ^ 2 + sinePart[period, ] ^ 2
    R[period, ] <- EMA(sqSum[period, ] ^ 2, ratio = 0.2)
  }
  R[is.na(R)] <- 0

  # Normalising Power
  K <- AGC(period1, period2, 1.5)
  maxPwr <- rep(0, length(filt))
  for(period in period1:period2) {
    for (i in 1:length(filt)) {
      if (R[period, i] >= maxPwr[i]) maxPwr[i] <- R[period, i]
      else maxPwr[i] <- K * maxPwr[i]
    }
  }
  for(period in 2:period2) {
    Pwr[period, ] <- R[period, ] / maxPwr
  }

  # Compute the dominant cycle using the Center of Gravity of the spectrum
  Spx <- Sp <- rep(0, length(filt))
  for(period in period1:period2) {
    Spx <- Spx + period * Pwr[period, ] * (Pwr[period, ] >= 0.5)
    Sp <- Sp + Pwr[period, ] * (Pwr[period, ] >= 0.5)
  }
  dominantCycle <- Spx / Sp
  dominantCycle[is.nan(dominantCycle)] <- 0
  dominantCycle <- xts(dominantCycle, order.by=index(x))
  dominantCycle <- dominantCycle[dominantCycle > 0]
  return(dominantCycle)
  #heatmap(Pwr, Rowv = NA, Colv = NA, na.rm = TRUE, labCol = "", add.expr = lines(dominantCycle, col = 'blue'))
}
```

One thing I do notice is that this code uses a loop that says for(i in 1:length(filt)), which is a loop over every single data point, something I view as the plague in R. While I've used Rcpp before, it's been for only the most basic of loops, so this is definitely a place where the algorithm can stand to be improved with Rcpp, due to R's inherently poor looping performance.

Those interested in the exact logic of the algorithm will, once again, find it in John Ehlers’s Cycle Analytics For Traders book (see link earlier in the post).

Of course, the first thing to do is to test how well the algorithm does what it purports to do, which is to dictate the lookback period of an algorithm.

Let’s run it on some data.

```getSymbols('SPY', from = '1990-01-01')

t1 <- Sys.time()
out <- autocorrPeriodogram(Ad(SPY), period1 = 120, period2 = 252, avgLength = 3)
t2 <- Sys.time()
print(t2-t1)
```

And the result:

```> t1 <- Sys.time()
> out <- autocorrPeriodogram(Ad(SPY), period1 = 120, period2 = 252, avgLength = 3)
> t2 <- Sys.time()
> print(t2-t1)
Time difference of 33.25429 secs
```

Now, what does the algorithm-set lookback period look like?

```plot(out)
```

Let’s zoom in on 2001 through 2003, when the markets went through some upheaval.

```plot(out['2001::2003'])
```

In this zoomed-in image, we can see that the algorithm’s estimates seem fairly jumpy.

Here’s some code to feed the algorithm’s estimates of n into an indicator to compute an indicator with a dynamic lookback period as set by Ehlers’s autocorrelation periodogram.

```acpIndicator <- function(x, minPeriod, maxPeriod, indicatorFun = EMA, ...) {
  acpOut <- autocorrPeriodogram(x = x, period1 = minPeriod, period2 = maxPeriod)
  roundedAcpNs <- round(acpOut, 0) # round to the nearest integer
  uniqueVals <- unique(roundedAcpNs) # unique integer values
  out <- xts(rep(NA, length(roundedAcpNs)), order.by=index(roundedAcpNs))

  for(i in 1:length(uniqueVals)) { # loop through unique values, compute indicator
    tmp <- indicatorFun(x, n = uniqueVals[i], ...)
    out[roundedAcpNs==uniqueVals[i]] <- tmp[roundedAcpNs==uniqueVals[i]]
  }
  return(out)
}
```

And here is the function applied with an SMA, to tune between 120 and 252 days.

```ehlersSMA <- acpIndicator(Ad(SPY), 120, 252, indicatorFun = SMA)
```

And the result:

As seen, this algorithm is less consistent than I would like, at least when it comes to using a simple moving average.

For now, I’m going to leave this code here, and let people experiment with it. I hope that someone will find that this indicator is helpful to them.

NOTES: I am always interested in networking/meet-ups in the northeast (Philadelphia/NYC). Furthermore, if you believe your firm will benefit from my skills, please do not hesitate to reach out to me. My linkedin profile can be found here.

Lastly, I am volunteering to curate the R section for books on Quantocracy. If you have a book about R that can apply to finance, be sure to let me know about it, so that I can review it and possibly recommend it. Thank you.

# The Problem With Depmix For Online Regime Prediction

This post will be about attempting to use the depmix package for online state prediction. While the depmix package performs admirably when it comes to describing the states of the past, when used for one-step-ahead prediction, under the assumption that tomorrow's state will be identical to today's, the hidden Markov model process found within the package does not perform to expectations.

So, to start off, this post was motivated by Michael Halls-Moore, who recently posted some R code about using the depmixS4 library for hidden Markov models. Generally, I am loath to create posts on topics I don't feel I have an absolutely front-to-back understanding of, but I'm doing this in the hope of learning from others about how to appropriately do online state-space prediction, or "regime switching" detection, as it may be called in more financial parlance.

While I’ve seen the usual theory of hidden Markov models (that is, it can rain or it can be sunny, but you can only infer the weather judging by the clothes you see people wearing outside your window when you wake up), and have worked with toy examples in MOOCs (Udacity’s self-driving car course deals with them, if I recall correctly–or maybe it was the AI course), at the end of the day, theory is only as good as how well an implementation can work on real data.

For this experiment, I decided to take SPY data since inception, and do a full in-sample “backtest” on the data. That is, given that the HMM algorithm from depmix sees the whole history of returns, with this “god’s eye” view of the data, does the algorithm correctly classify the regimes, if the backtest results are any indication?

Here’s the code to do so, inspired by Dr. Halls-Moore’s.

```require(depmixS4)
require(quantmod)
require(PerformanceAnalytics) # for Return.calculate and the performance tables

getSymbols('SPY', from = '1990-01-01', src='yahoo', adjust = TRUE)

# daily returns on the adjusted close (the depmix formula below
# references the SPY.Adjusted column name)
spyRets <- na.omit(Return.calculate(Ad(SPY)))

set.seed(123)

hmm <- depmix(SPY.Adjusted ~ 1, family = gaussian(), nstates = 3, data=spyRets)
hmmfit <- fit(hmm, verbose = FALSE)
post_probs <- posterior(hmmfit)
post_probs <- xts(post_probs, order.by=index(spyRets))
plot(post_probs$state)
summaryMat <- data.frame(summary(hmmfit))
colnames(summaryMat) <- c("Intercept", "SD")
bullState <- which(summaryMat$Intercept > 0)
bearState <- which(summaryMat$Intercept < 0)

hmmRets <- spyRets * lag(post_probs$state == bullState) - spyRets * lag(post_probs$state == bearState)
charts.PerformanceSummary(hmmRets)
table.AnnualizedReturns(hmmRets)
```

Essentially, while I did select three states, I noted that anything with an intercept above zero is a bull state, and below zero is a bear state, so essentially, it reduces to two states.

With the result:

```table.AnnualizedReturns(hmmRets)
Annualized Return               0.1355
Annualized Std Dev              0.1434
Annualized Sharpe (Rf=0%)       0.9448
```

So, not particularly terrible. The algorithm works, kind of, sort of, right?

Well, let’s try online prediction now.

```require(doMC) # the registerDoMC/foreach parallel backend
require(foreach)

dailyHMM <- function(data, nPoints) {
  subRets <- data[1:nPoints,]
  hmm <- depmix(SPY.Adjusted ~ 1, family = gaussian(), nstates = 3, data = subRets)
  hmmfit <- fit(hmm, verbose = FALSE)
  post_probs <- posterior(hmmfit)
  summaryMat <- data.frame(summary(hmmfit))
  colnames(summaryMat) <- c("Intercept", "SD")
  bullState <- which(summaryMat$Intercept > 0)
  bearState <- which(summaryMat$Intercept < 0)
  if(last(post_probs$state) %in% bullState) {
    state <- xts(1, order.by=last(index(subRets)))
  } else if (last(post_probs$state) %in% bearState) {
    state <- xts(-1, order.by=last(index(subRets)))
  } else {
    state <- xts(0, order.by=last(index(subRets)))
  }
  colnames(state) <- "State"
  return(state)
}

# took 3 hours in parallel
t1 <- Sys.time()
set.seed(123)
registerDoMC((detectCores() - 1))
states <- foreach(i = 500:nrow(spyRets), .combine=rbind) %dopar% {
  dailyHMM(data = spyRets, nPoints = i)
}
t2 <- Sys.time()
print(t2-t1)
```

So what I did here was take an expanding window, starting from 500 days after SPY's inception, and keep increasing it by one day at a time. My prediction was, trivially enough, the most recent day's state, using a 1 for a bull state and a -1 for a bear state. I ran this process in parallel (on a Linux cluster, because Windows's doParallel library seems to not even know that certain packages are loaded, and it's messier), and the first big issue is that this process took about three hours on seven cores for about 23 years of data. Not exactly encouraging, but computing time isn't expensive these days.

So let’s see if this process actually works.

First, let’s test if the algorithm does what it’s actually supposed to do and use one day of look-ahead bias (that is, the algorithm tells us the state at the end of the day–how correct is it even for that day?).

```
onlineRets <- spyRets * states
charts.PerformanceSummary(onlineRets)
table.AnnualizedReturns(onlineRets)

```

With the result:

```> table.AnnualizedReturns(onlineRets)
Annualized Return               0.2216
Annualized Std Dev              0.1934
Annualized Sharpe (Rf=0%)       1.1456
```

So, allegedly, the algorithm seems to do what it was designed to do, which is to classify a state for a given data set. Now, the most pertinent question: how well do these predictions do even one day ahead? You'd think that state-space predictions would be stable from day to day, given the long history, correct?

```
onlineRets <- spyRets * lag(states)
charts.PerformanceSummary(onlineRets)
table.AnnualizedReturns(onlineRets)

```

With the result:

```> table.AnnualizedReturns(onlineRets)
Annualized Return               0.0172
Annualized Std Dev              0.1939
Annualized Sharpe (Rf=0%)       0.0888
```

That is, without the lookahead bias, the state space prediction algorithm is atrocious. Why is that?

Well, here’s the plot of the states:

In short, the online hmm algorithm in the depmix package seems to change its mind very easily, with obvious (negative) implications for actual trading strategies.

So, that wraps it up for this post. Essentially, the main message here is this: there's a vast difference between doing descriptive analysis (AKA "where have you been, why did things happen") and predictive analysis (that is, "if I correctly predict the future, I get a positive payoff"). In my opinion, while descriptive statistics have their purpose in terms of explaining why a strategy may have performed how it did, ultimately, we're always looking for better prediction tools. In this case, depmix, at least in this out-of-the-box demonstration, does not seem to be the tool for that.

If anyone has had success using depmix (or another regime-switching algorithm in R) for prediction, I would love to see work that details the procedure taken, as it's an area I'm looking to expand my toolbox into, but I don't have any particularly good leads. Essentially, I'd like to think of this post as me describing my own experiences with the package.

NOTE: On Oct. 5th, I will be in New York City. On Oct. 6th, I will be presenting at The Trading Show on the Programming Wars panel.

NOTE: My current analytics contract is up for review at the end of the year, so I am officially looking for other offers as well. If you have a full-time role which may benefit from the skills you see on my blog, please get in touch with me. My linkedin profile can be found here.

# An Introduction to Portfolio Component Conditional Value At Risk

This post will introduce the component conditional value at risk mechanics found in PerformanceAnalytics, from a paper written by Brian Peterson, Kris Boudt, and Peter Carl. It is an easy-to-call mechanism for computing the component expected shortfall of asset returns as they apply to a portfolio. While the exact mechanics are fairly complex, the upside is that the running time is nearly instantaneous, and the method is a solid tool to include in asset allocation analysis.

For those interested in an in-depth analysis of the intuition of component conditional value at risk, I refer them to the paper written by Brian Peterson, Peter Carl, and Kris Boudt.

Essentially, here’s the idea: all assets in a given portfolio have a marginal contribution to its total conditional value at risk (also known as expected shortfall)–that is, the expected loss when the loss surpasses a certain threshold. For instance, if you want to know your 5% expected shortfall, then it’s the average of the worst 5 returns per 100 days, and so on. While it may sound as though daily-resolution returns over a short window (one year or less) would never provide enough data points for such a tail calculation, the expected shortfall formula in PerformanceAnalytics defaults to an approximation using a Cornish-Fisher expansion, which delivers very good results so long as the p-value isn’t too extreme (that is, it works for relatively sane p-values such as the 1%-10% range).
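
To illustrate that default on a single return stream (my own demonstration of the PerformanceAnalytics ES function, not code from the paper):

```require(PerformanceAnalytics)
data(edhec)

# modified (Cornish-Fisher) expected shortfall vs. the plain historical average
# of the worst 5% of observations
ES(edhec[, "Convertible Arbitrage"], p = .95, method = "modified")
ES(edhec[, "Convertible Arbitrage"], p = .95, method = "historical")
```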

Component Conditional Value at Risk has two uses: first off, given no input weights, it uses an equal weight default, which allows it to provide a risk estimate for each individual asset without burdening the researcher to create his or her own correlation/covariance heuristics. Secondly, when provided with a set of weights, the output changes to reflect the contribution of various assets in proportion to those weights. This means that this methodology works very nicely with strategies that exclude assets based on momentum, but need a weighting scheme for the remaining assets. Furthermore, using this methodology also allows an ex-post analysis of risk contribution to see which instrument contributed what to risk.

First, a demonstration of how the mechanism works using the edhec data set. There is no strategy here, just a demonstration of syntax.

```require(quantmod)
require(PerformanceAnalytics)

data(edhec)

tmp <- CVaR(edhec, portfolio_method = "component")
```

This will assume an equal-weight contribution from all of the funds in the edhec data set.

So tmp is the contribution to expected shortfall from each of the various edhec managers over the entire time period. Here’s the output:

```$MES
[,1]
[1,] 0.03241585

$contribution
Convertible Arbitrage             CTA Global  Distressed Securities       Emerging Markets  Equity Market Neutral
0.0074750513          -0.0028125166           0.0039422674           0.0069376579           0.0008077760
Event Driven Fixed Income Arbitrage           Global Macro      Long/Short Equity       Merger Arbitrage
0.0037114666           0.0043125937           0.0007173036           0.0036152960           0.0013693293
Relative Value          Short Selling         Funds of Funds
0.0037650911          -0.0048178690           0.0033924063

$pct_contrib_MES
Convertible Arbitrage             CTA Global  Distressed Securities       Emerging Markets  Equity Market Neutral
0.23059863            -0.08676361             0.12161541             0.21402052             0.02491917
Event Driven Fixed Income Arbitrage           Global Macro      Long/Short Equity       Merger Arbitrage
0.11449542             0.13303965             0.02212817             0.11152864             0.04224258
Relative Value          Short Selling         Funds of Funds
0.11614968            -0.14862694             0.10465269
```

The salient part of this is the percent contribution (the last output). Notice that it can be negative, meaning that certain funds gain when others lose. At least, this was the case over the current data set. These assets diversify a portfolio and actually lower expected shortfall.

```> tmp2 <- CVaR(edhec, portfolio_method = "component", weights = c(rep(.1, 10), rep(0,3)))
> tmp2
$MES
[,1]
[1,] 0.04017453

$contribution
Convertible Arbitrage             CTA Global  Distressed Securities       Emerging Markets  Equity Market Neutral
0.0086198045          -0.0046696862           0.0058778855           0.0109152240           0.0009596620
Event Driven Fixed Income Arbitrage           Global Macro      Long/Short Equity       Merger Arbitrage
0.0054824325           0.0050398011           0.0009638502           0.0044568333           0.0025287234
Relative Value          Short Selling         Funds of Funds
0.0000000000           0.0000000000           0.0000000000

$pct_contrib_MES
Convertible Arbitrage             CTA Global  Distressed Securities       Emerging Markets  Equity Market Neutral
0.21455894            -0.11623499             0.14630875             0.27169512             0.02388732
Event Driven Fixed Income Arbitrage           Global Macro      Long/Short Equity       Merger Arbitrage
0.13646538             0.12544767             0.02399157             0.11093679             0.06294345
Relative Value          Short Selling         Funds of Funds
0.00000000             0.00000000             0.00000000
```

In this case, I equally weighted the first ten managers in the edhec data set, and put zero weight in the last three. Furthermore, we can see what happens when the weights are not equal.

```> tmp3 <- CVaR(edhec, portfolio_method = "component", weights = c(.2, rep(.1, 9), rep(0,3)))
> tmp3
$MES
[,1]
[1,] 0.04920372

$contribution
Convertible Arbitrage             CTA Global  Distressed Securities       Emerging Markets  Equity Market Neutral
0.0187406982          -0.0044391078           0.0057235762           0.0102706768           0.0007710434
Event Driven Fixed Income Arbitrage           Global Macro      Long/Short Equity       Merger Arbitrage
0.0051541429           0.0055944367           0.0008028457           0.0044085104           0.0021768951
Relative Value          Short Selling         Funds of Funds
0.0000000000           0.0000000000           0.0000000000

$pct_contrib_MES
Convertible Arbitrage             CTA Global  Distressed Securities       Emerging Markets  Equity Market Neutral
0.38087972            -0.09021895             0.11632406             0.20873782             0.01567043
Event Driven Fixed Income Arbitrage           Global Macro      Long/Short Equity       Merger Arbitrage
0.10475109             0.11369947             0.01631677             0.08959710             0.04424249
Relative Value          Short Selling         Funds of Funds
0.00000000             0.00000000             0.00000000
```

This time, notice that as the weight increased in the convertible arb manager, so too did his contribution to the modified expected shortfall (the MES figure in the output).

For a future backtest, I would like to make some data requests. I would like to use the universe found in Faber's Global Asset Allocation book. That said, the simulations in that book go back to 1972, and I was wondering if anyone out there has daily returns for those assets/indices. While some ETFs go back into the early 2000s, others start rather late, such as DBC (commodities, early 2006), GLD (gold, early 2004), BWX (foreign bonds, late 2007), and FTY (NAREIT, early 2007). As an eight-year backtest would be a bit short, I was wondering if anyone had data with more history.

One other thing: I will be in New York for the trading show, speaking on the "programming wars" panel on October 6th.

NOTE: While I am currently contracting, I am also looking for a permanent position which can benefit from my skills for when my current contract ends. If you have or are aware of such an opening, I will be happy to speak with you.

# A Return.Portfolio Wrapper to Automate Harry Long Seeking Alpha Backtests

This post will cover a function to simplify creating Harry Long (of Houston) type rebalancing strategies from Seeking Alpha for interested readers. As Harry Long has stated, most, if not all, of his strategies are more for demonstrative purposes than actual recommended investments.

So, since Harry Long has been posting some more articles on Seeking Alpha, I've had a reader or two ask me to analyze his strategies (again). Instead of doing that, however, I'll simply put this tool here: a wrapper that automates the acquisition of data and simulates portfolio rebalancing with one line of code.

Here’s the tool.

```require(quantmod)
require(PerformanceAnalytics)

LongSeeker <- function(symbols, weights, rebalance_on = "years",
                       displayStats = TRUE, outputReturns = FALSE) {
  getSymbols(symbols, src='yahoo', from = '1990-01-01')
  prices <- list()
  for(i in 1:length(symbols)) {
    if(symbols[i] == "ZIV") {
      # ZIV's extended history comes from a flat file (courtesy of Helmuth
      # Vollmeier); the download URL was lost from this snippet, so point
      # download.file at your own copy of the data
      download.file("<ZIV data URL>", destfile="ziv.txt")
      ziv <- xts(read.zoo("ziv.txt", header=TRUE, sep=",", format="%Y-%m-%d"))
      prices[[i]] <- Cl(ziv)
    } else if (symbols[i] == "VXX") {
      # same idea for VXX's extended history
      download.file("<VXX data URL>", destfile="vxx.txt")
      vxx <- xts(read.zoo("vxx.txt", header=TRUE, sep=",", format="%Y-%m-%d"))
      prices[[i]] <- Cl(vxx)
    } else {
      # everything else comes straight from the getSymbols call above
      prices[[i]] <- Ad(get(symbols[i]))
    }
  }
  prices <- do.call(cbind, prices)
  prices <- na.locf(prices)
  returns <- na.omit(Return.calculate(prices))

  # weights that don't sum to 1 put the remainder in a zero-return (cash) column
  returns$zeroes <- 0
  weights <- c(weights, 1-sum(weights))
  stratReturns <- Return.portfolio(R = returns, weights = weights, rebalance_on = rebalance_on)

  if(displayStats) {
    stats <- rbind(table.AnnualizedReturns(stratReturns), maxDrawdown(stratReturns), CalmarRatio(stratReturns))
    rownames(stats)[4] <- "Max Drawdown"
    print(stats)
    charts.PerformanceSummary(stratReturns)
  }

  if(outputReturns) {
    return(stratReturns)
  }
}
```

It fetches the data for you (usually from Yahoo, but a big thank you to Mr. Helumth Vollmeier in the case of ZIV and VXX), and has the option of either simply displaying an equity curve and some statistics (CAGR, annualized standard dev, Sharpe, max Drawdown, Calmar), or giving you the return stream as an output if you wish to do more analysis in R.
Here’s an example of simply getting the statistics, with an 80% XLP/SPLV (they’re more or less interchangeable) and 20% TMF (aka 60% TLT, so an 80/60 portfolio), from one of Harry Long’s articles.
LongSeeker(c("XLP", "TLT"), c(.8, .6))

Statistics:

portfolio.returns
Annualized Return                 0.1321000
Annualized Std Dev                0.1122000
Annualized Sharpe (Rf=0%)         1.1782000
Max Drawdown                      0.2330366
Calmar Ratio                      0.5670285

Equity curve:

Nothing out of the ordinary of what we might expect from a balanced equity/bonds portfolio. Generally does well, has its largest drawdown in the financial crisis, and some other bumps in the road, but overall, I’d think a fairly vanilla “set it and forget it” sort of thing.
And here would be the way to get the stream of individual daily returns, assuming you wanted to rebalance these two instruments weekly, instead of yearly (as is the default).
tmp &lt;- LongSeeker(c("XLP", "TLT"), c(.8, .6), rebalance_on="weeks",
displayStats = FALSE, outputReturns = TRUE)

And now let’s get some statistics.
```
table.AnnualizedReturns(tmp)
maxDrawdown(tmp)
CalmarRatio(tmp)
```

Which give:
```
> table.AnnualizedReturns(tmp)
                          portfolio.returns
Annualized Return                    0.1328
Annualized Std Dev                   0.1137
Annualized Sharpe (Rf=0%)            1.1681
> maxDrawdown(tmp)
[1] 0.2216417
> CalmarRatio(tmp)
             portfolio.returns
Calmar Ratio         0.5990087
```

Turns out, moving the rebalancing from annually to weekly didn’t have much of an effect here (besides giving a bunch of money to your broker, if you factored in transaction costs, which this doesn’t).
So, that’s how this tool works. The results, of course, begin from the latest instrument’s inception. The trick, in my opinion, is to find proxy substitutes with longer histories for newer ETFs that are simply leveraged versions of older instruments, such as using a 60% weight in TLT with an 80% weight in XLP instead of a 20% weight in TMF with an 80% allocation in SPLV.

For instance, here are some proxies:

SPLV = XLP
SPXL/UPRO = SPY * 3
TMF = TLT * 3
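
To make the proxy idea concrete, here is a minimal sketch of constructing a synthetic TMF return stream from TLT, per the TMF = TLT * 3 relationship above. This is an approximation: tripling daily TLT returns ignores the fund’s expense ratio and financing costs.

```
require(quantmod)
require(PerformanceAnalytics)

# synthetic TMF: triple TLT's daily returns (a daily-rebalanced 3x proxy)
getSymbols("TLT", src = 'yahoo', from = '1990-01-01')
tltRets <- na.omit(Return.calculate(Ad(TLT)))
syntheticTMF <- tltRets * 3
colnames(syntheticTMF) <- "syntheticTMF"
```
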
That said, I’ve worked with Harry Long before, and he develops more sophisticated strategies behind the scenes, so I’d recommend that SeekingAlpha readers take his publicly released strategies as concept demonstrations, as opposed to fully-fledged investment ideas, and contact Mr. Long himself about more customized, private solutions for investment institutions if you are so interested.
NOTE: I am currently in the northeast. While I am currently contracting, I am interested in networking with individuals or firms with regards to potential collaboration opportunities.

# How To Compute Turnover With Return.Portfolio in R

This post will demonstrate how to take into account turnover when dealing with returns-based data using PerformanceAnalytics and the Return.Portfolio function in R. It will demonstrate this on a basic strategy on the nine sector SPDRs.

So, first off, this is in response to a question posed by one Robert Wages on the R-SIG-Finance mailing list. While many of the questions that come my way are already demonstrated on this blog, occasionally an industry veteran, a PhD statistics student from Stanford, or some other very intelligent individual will ask about a topic I haven’t yet touched on here, which prompts a post demonstrating another technical aspect of R. This is one of those times.

So, this demonstration will be about computing turnover in returns space using the PerformanceAnalytics package. Simply put, outside of the PortfolioAnalytics package, PerformanceAnalytics, with its Return.portfolio function, is the go-to R package for portfolio management simulations, as it can take a set of weights and a set of returns and generate a set of portfolio returns for analysis with the rest of PerformanceAnalytics’s functions.

Again, the strategy is this: take the nine three-letter sector SPDRs (since there are four-letter ETFs now), and at the end of every month, if the adjusted price is above its 200-day moving average, invest in it. Normalize across all invested sectors (that is, 1/9th each if invested in all nine, 100% into one if only one is invested in, and 100% cash, denoted with a zero-return vector, if no sectors are invested in). It’s a simple toy strategy, as the strategy isn’t the point of the demonstration.

Here’s the basic setup code:

```
require(TTR)
require(PerformanceAnalytics)
require(quantmod)

# the nine three-letter sector SPDRs
symbols <- c("XLF", "XLK", "XLU", "XLE", "XLP", "XLI", "XLB", "XLV", "XLY")
getSymbols(symbols, src = 'yahoo', from = '1990-01-01')
prices <- list()
for(i in 1:length(symbols)) {
  prices[[i]] <- Ad(get(symbols[i]))
}
prices <- do.call(cbind, prices)

# Our signal is a simple adjusted price over 200-day SMA
signal <- prices > xts(apply(prices, 2, SMA, n = 200), order.by = index(prices))

# equal weight all assets with price above SMA200
returns <- Return.calculate(prices)
weights <- signal / (rowSums(signal) + 1e-16)

# With Return.portfolio, need all weights to sum to 1
weights$zeroes <- 1 - rowSums(weights)
returns$zeroes <- 0

monthlyWeights <- na.omit(weights[endpoints(weights, on = 'months'),])
weights <- na.omit(weights)
returns <- na.omit(returns)
```

So, get the SPDRs, put them together, compute their returns, generate the signal, and create the zero vector, since Return.portfolio treats weights summing to less than 1 as a withdrawal, and weights summing to more than 1 as the addition of more capital (big FYI here).

Now, here’s how to compute turnover:

```
out <- Return.portfolio(R = returns, weights = monthlyWeights, verbose = TRUE)
beginWeights <- out$BOP.Weight
endWeights <- out$EOP.Weight
# transactions: where this period's starting weights differ from where
# the previous period's drift left the portfolio
txns <- beginWeights - lag(endWeights)
# sum absolute transactions across the nine sector columns
monthlyTO <- xts(rowSums(abs(txns[, 1:9])), order.by = index(txns))
plot(monthlyTO)
```

So, the trick is this: when you call Return.portfolio, use the verbose = TRUE option. This creates several objects, among them returns, BOP.Weight, and EOP.Weight, the latter two standing for Beginning Of Period Weight and End Of Period Weight.
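
For instance, the verbose output is a list whose components can be inspected directly (the names below are from the version of PerformanceAnalytics I have on hand, so check against yours):

```
names(out)
# [1] "returns"      "contribution" "BOP.Weight"   "EOP.Weight"
# [5] "BOP.Value"    "EOP.Value"
```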

The way that turnover is computed is simply as the difference between the beginning-of-period weight and where the previous period’s return drifted the portfolio. That is, the end-of-period weight is the beginning-of-period weight after taking into account that period’s return for each asset. The next beginning-of-period weight is then the end-of-period weight plus any transacting that was done. Thus, to find the actual transactions (or turnover), one subtracts the previous end-of-period weight from the current beginning-of-period weight.
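
As a toy single-asset illustration of that accounting (the numbers here are hypothetical, not from the strategy above):

```
bopWeight <- 0.50                  # beginning-of-period weight in the asset
assetRet  <- 0.10                  # the asset returns 10% this period
portRet   <- bopWeight * assetRet  # the rest of the portfolio sits in flat cash
eopWeight <- bopWeight * (1 + assetRet) / (1 + portRet)  # drifts to ~0.524
nextBOP   <- 0.50                  # rebalancing back to the 50% target...
nextBOP - eopWeight                # ...means transacting roughly -2.4%
```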

This is what such transactions look like for this strategy.

Something we can do with such data is take a one-year rolling turnover, accomplished with the following code:

```
# monthlyTO is indexed daily, so 252 observations make up roughly one year
yearlyTO <- runSum(monthlyTO, 252)
plot(yearlyTO, main = "running one year turnover")
```

It looks like this:

This essentially means that one year’s worth of two-way turnover (that is, if selling an entirely invested portfolio is 100% turnover, and buying an entirely new set of assets is another 100%, then two-way turnover is 200%) is around 800% at maximum. That may be pretty high for some people.

Now, here’s the application when you penalize transaction costs at 20 basis points of the amount traded (that is, it costs 20 cents to transact $100).

```
txnCosts <- monthlyTO * -.0020
retsWithTxnCosts <- out$returns + txnCosts
compare <- na.omit(cbind(out$returns, retsWithTxnCosts))
colnames(compare) <- c("NoTxnCosts", "TxnCosts20BPs")
charts.PerformanceSummary(compare)
table.AnnualizedReturns(compare)
```

And the result:

```
NoTxnCosts TxnCosts20BPs
Annualized Return             0.0587        0.0489
Annualized Std Dev            0.1554        0.1553
Annualized Sharpe (Rf=0%)     0.3781        0.3149
```

So, at 20 basis points on transaction costs, that takes about one percent in returns per year out of this (admittedly, terrible) strategy. This is far from negligible.

So, that is how you actually compute turnover and transaction costs. In this case, the transaction cost model was very simple. However, given that Return.portfolio returns transactions at the individual asset level, one could get as complex as one would like with modeling the transaction costs.
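
For instance, here’s a sketch of a per-asset cost model, using a hypothetical schedule of 5 basis points on each sector SPDR and nothing on the cash column (the figures are made up purely for illustration):

```
# hypothetical per-asset cost schedule: 5 bps per sector, 0 for the cash column
costBps <- c(rep(5e-4, 9), 0)
assetCosts <- xts(abs(coredata(txns)) %*% costBps, order.by = index(txns))
retsPerAssetCosts <- na.omit(out$returns - assetCosts)
table.AnnualizedReturns(retsPerAssetCosts)
```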

NOTE: I will be giving a lightning talk at R/Finance, so for those attending, you’ll be able to find me there.

# Create Amazing Looking Backtests With This One Wrong–I Mean Weird–Trick! (And Some Troubling Logical Invest Results)

This post will outline an easy-to-make mistake in writing vectorized backtests–namely in using a signal obtained at the end of a period to enter (or exit) a position in that same period. The difference in results one obtains is massive.

Today, I saw two separate posts from Alpha Architect and Mike Harris, both referencing a paper by Valeriy Zakamulin showing that some previous trend-following research by Glabadanidis was done with shoddy methodology, and that its results were only reproducible through instituting lookahead bias.

The following code shows how to reproduce this lookahead bias.

First, the setup of a basic moving average strategy on the S&P 500 index from as far back as Yahoo data will provide.

```
require(quantmod)
require(xts)
require(TTR)
require(PerformanceAnalytics)

getSymbols('^GSPC', src='yahoo', from = '1900-01-01')
monthlyGSPC <- Ad(GSPC)[endpoints(GSPC, on = 'months')]

# change this line for signal lookback
movAvg <- SMA(monthlyGSPC, 10)

signal <- monthlyGSPC > movAvg
gspcRets <- Return.calculate(monthlyGSPC)
```

And here is how to institute the lookahead bias.

```
lookahead <- signal * gspcRets
correct <- lag(signal) * gspcRets
```

These are the “results”:

```
compare <- na.omit(cbind(gspcRets, lookahead, correct))
colnames(compare) <- c("S&P 500", "Lookahead", "Correct")
charts.PerformanceSummary(compare)
rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare))
logRets <- log(cumprod(1+compare))
chart.TimeSeries(logRets, legend.loc='topleft')
```

Of course, this equity curve is of no use, so here’s one in log scale.

As can be seen, lookahead bias makes a massive difference.

Here are the numerical results:

```
                            S&P 500  Lookahead   Correct
Annualized Return         0.0740000 0.15550000 0.0695000
Annualized Std Dev        0.1441000 0.09800000 0.1050000
Annualized Sharpe (Rf=0%) 0.5133000 1.58670000 0.6623000
Worst Drawdown            0.5255586 0.08729914 0.2699789
Calmar Ratio              0.1407286 1.78119192 0.2575219
```

Again, absolutely ridiculous.

Note that when using Return.portfolio (the function in PerformanceAnalytics), the package will automatically apply your weights to the next period’s returns rather than the current one. However, for those writing “simple” backtests that can be quickly done using vectorized operations, an off-by-one error can make all the difference between a backtest in the realm of reasonable and pure nonsense. However, should one wish to test for said nonsense when faced with impossible-to-replicate results, the mechanics demonstrated above are the way to do it.
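
One quick way to eyeball the alignment, using the objects from the code above (a sketch, but it makes the off-by-one shift visible):

```
# the position held in month t should use the signal observed at the end
# of month t-1, which is exactly what lag() produces
aligned <- merge(signal, lag(signal))
colnames(aligned) <- c("signalNow", "signalUsed")
tail(aligned, 3)
```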

Now, onto other news: I’d like to thank Gerald M for staying on top of one of the Logical Invest strategies–namely, their simple global market rotation strategy outlined in an article from an earlier blog post.

Up until March 2015 (the date of the blog post), the strategy had performed well. However, after said date?

It has been a complete disaster, which, in hindsight, was evident when I passed it through the hypothesis-driven development framework process I wrote about earlier.

So, while a great deal has been written about not simply throwing away a strategy because of short-term underperformance, and about how anomalies such as momentum and value exist partly because of the career risk created by said short-term underperformance, it’s never a good thing when a strategy produces historically large losses, particularly after being published in such a humble corner of the quantitative financial world.

In any case, this was a post demonstrating some mechanics, and an update on a strategy I blogged about not too long ago.