# Introducing Stepwise Correlation Rank

So in the last post, I attempted to replicate the Flexible Asset Allocation paper. I’d like to offer a thanks to Pat of Intelligent Trading Tech (not updated recently, hopefully this will change) for helping me corroborate the results, so that I have more confidence there isn’t an error in my code.

One of the procedures the authors of the FAA paper used is a correlation rank, which I interpreted as the average correlation of each security to the others.

The issue, pointed out to me in a phone conversation I had with David Varadi is that when considering correlation, shouldn’t the correlations the investor is concerned about be between instruments within the portfolio, as opposed to simply all the correlations, including to instruments not in the portfolio? To that end, when selecting assets (or possibly features in general), conceptually, it makes more sense to select in a stepwise fashion–that is, start off at a subset of the correlation matrix, and then rank assets in order of their correlation to the heretofore selected assets, as opposed to all of them. This was explained in Mr. Varadi’s recent post.

Here’s a work in progress function I wrote to formally code this idea:

```stepwiseCorRank <- function(corMatrix, startNames=NULL, stepSize=1, bestHighestRank=FALSE) {
#edge cases
if(dim(corMatrix) == 1) {
return(corMatrix)
} else if (dim(corMatrix) == 2) {
ranks <- c(1.5, 1.5)
names(ranks) <- colnames(corMatrix)
return(ranks)
}
if(is.null(startNames)) {
corSums <- rowSums(corMatrix)
corRanks <- rank(corSums)
startNames <- names(corRanks)[corRanks <= stepSize]
}
nameList <- list()
nameList[] <- startNames
rankList <- list()
rankCount <- 1
rankList[] <- rep(rankCount, length(startNames))
rankedNames <- do.call(c, nameList)

while(length(rankedNames) < nrow(corMatrix)) {
rankCount <- rankCount+1
subsetCor <- corMatrix[, rankedNames]
if(class(subsetCor) != "numeric") {
subsetCor <- subsetCor[!rownames(corMatrix) %in% rankedNames,]
if(class(subsetCor) != "numeric") {
corSums <- rowSums(subsetCor)
corSumRank <- rank(corSums)
lowestCorNames <- names(corSumRank)[corSumRank <= stepSize]
nameList[[rankCount]] <- lowestCorNames
rankList[[rankCount]] <- rep(rankCount, min(stepSize, length(lowestCorNames)))
} else { #1 name remaining
nameList[[rankCount]] <- rownames(corMatrix)[!rownames(corMatrix) %in% names(subsetCor)]
rankList[[rankCount]] <- rankCount
}
} else {  #first iteration, subset on first name
subsetCorRank <- rank(subsetCor)
lowestCorNames <- names(subsetCorRank)[subsetCorRank <= stepSize]
nameList[[rankCount]] <- lowestCorNames
rankList[[rankCount]] <- rep(rankCount, min(stepSize, length(lowestCorNames)))
}
rankedNames <- do.call(c, nameList)
}

ranks <- do.call(c, rankList)
names(ranks) <- rankedNames
if(bestHighestRank) {
ranks <- 1+length(ranks)-ranks
}
return(ranks)
}
```

So the way the function works is that it takes in a correlation matrix, a starting name (if provided), and a step size (that is, how many assets to select per step, so that the process doesn’t become extremely long when dealing with larger amounts of assets/features). Then, it iterates–subset the correlation matrix on the starting name, and find the minimum value, and add it to a list of already-selected names. Next, subset the correlation matrix columns on the selected names, and the rows on the not selected names, and repeat, until all names have been accounted for. Due to R’s little habit of wiping out labels when a matrix becomes a vector, I had to write some special case code, which is the reason for two nested if/else statements (the first one being for the first column subset, and the second being for when there’s only one row remaining).

Also, if there’s an edge case (1 or 2 securities), then there is some functionality to handle those trivial cases.

Here’s a test script I wrote to test this function out:

```require(PerformanceAnalytics)
require(quantmod)

#mid 1997 to end of 2012
getSymbols(mutualFunds, from="1997-06-30", to="2012-12-31")
tmp <- list()
for(fund in mutualFunds) {
}

#always use a list hwne intending to cbind/rbind large quantities of objects
adPrices <- do.call(cbind, args = tmp)

corMat <- cor(subset)

tmp <- list()
for(i in 1:length(mutualFunds)) {
rankRow <- stepwiseCorRank(corMat, startNames=mutualFunds[i])
tmp[[i]] <- rankRow
}
rankDemo <- do.call(rbind, tmp)
rownames(rankDemo) <- mutualFunds
origRank <- rank(rowSums(corMat))
rankDemo <- rbind(rankDemo, origRank)
rownames(rankDemo) <- "Original (VBMFX)"

heatmap(-rankDemo, Rowv=NA, Colv=NA, col=heat.colors(8), margins=c(6,6))
```

Essentially, using the 2012 year of returns for the 7 FAA mutual funds, I compared how different starting securities changed the correlation ranking sequence.

Here are the results:

```               VTSMX FDIVX VEIEX VFISX VBMFX QRAAX VGSIX
VTSMX              1     6     7     4     2     3     5
FDIVX              6     1     7     4     2     5     3
VEIEX              6     7     1     4     2     3     5
VFISX              2     6     7     1     3     4     5
VBMFX              2     6     7     4     1     3     5
QRAAX              5     6     7     4     2     1     3
VGSIX              5     6     7     4     2     3     1
Non-Sequential     5     6     7     2     1     3     4
```

In short, the algorithm is rather robust to starting security selection, at least judging by this small example. However, comparing VBMFX start to the non-sequential ranking, we see that VFISX changes from rank 2 in the non-sequential to rank 4, with VTSMX going from rank 5 to rank 2. From an intuitive perspective, this makes sense, as both VBMFX and VFISX are bond funds, which have a low correlation with the other 5 equity-based mutual funds, but a higher correlation with each other, thus signifying that the algorithm seems to be working as intended, at least insofar as this small example demonstrates. Here’s a heatmap to demonstrate this in visual form. The ranking order (starting security) is the vertical axis, and the horizontal are the ranks, from white being first, to red being last. Notice once again that the ranking orders are robust in general (consider each column of colors descending), but each particular ranking order is unique.

So far, this code still has to be tested in terms of its applications to portfolio management and asset allocation, but for those interested in such an idea, it’s my hope that this provides a good reference point.

# An Attempt At Replicating Flexible Asset Allocation (FAA)

Since the people at Alpha Architect were so kind as to feature my blog in a post, I figured I’d investigate an idea that I first found out about from their site–namely, flexible asset allocation. Here’s the SSRN, and the corresponding Alpha Architect post.

Here’s the script I used for this replication, which is completely self-contained.

```require(PerformanceAnalytics)
require(quantmod)

mutualFunds <- c("VTSMX", #Vanguard Total Stock Market Index
"FDIVX", #Fidelity Diversified International Fund
"VEIEX", #Vanguard Emerging Markets Stock Index Fund
"VFISX", #Vanguard Short-Term Treasury Fund
"VBMFX", #Vanguard Total Bond Market Index Fund
"QRAAX", #Oppenheimer Commodity Strategy Total Return
"VGSIX" #Vanguard REIT Index Fund
)

#mid 1997 to end of 2012
getSymbols(mutualFunds, from="1997-06-30", to="2012-12-31")
tmp <- list()
for(fund in mutualFunds) {
}

#always use a list hwne intending to cbind/rbind large quantities of objects
adPrices <- do.call(cbind, args = tmp)

FAAreturns <- function(prices, monthLookback = 4,
weightMom=1, weightVol=.5, weightCor=.5,
riskFreeName="VFISX", bestN=3) {

returns <- Return.calculate(prices)
monthlyEps <- endpoints(prices, on = "months")
riskFreeCol <- grep(riskFreeName, colnames(prices))
tmp <- list()
dates <- list()

for(i in 2:(length(monthlyEps) - monthLookback)) {

#subset data
priceData <- prices[monthlyEps[i]:monthlyEps[i+monthLookback],]
returnsData <- returns[monthlyEps[i]:monthlyEps[i+monthLookback],]

#perform computations
momentum <- data.frame(t(t(priceData[nrow(priceData),])/t(priceData[1,]) - 1))
priceData <- priceData[, momentum > 0] #remove securities with momentum < 0
returnsData <- returnsData[, momentum > 0]
momentum <- momentum[momentum > 0]
names(momentum) <- colnames(returnsData)

vol <- as.numeric(-sd.annualized(returnsData))
#sumCors <- -colSums(cor(priceData[endpoints(priceData, on="months")]))
sumCors <- -colSums(cor(returnsData, use="complete.obs"))
stats <- data.frame(cbind(momentum, vol, sumCors))

if(nrow(stats) > 1) {

#perform ranking
ranks <- data.frame(apply(stats, 2, rank))
weightRankSum <- weightMom*ranks\$momentum + weightVol*ranks\$vol + weightCor*ranks\$sumCors
totalRank <- rank(weightRankSum)

#find top N values, from http://stackoverflow.com/questions/2453326/fastest-way-to-find-second-third-highest-lowest-value-in-vector-or-column
#thanks to Dr. Rob J. Hyndman
upper <- length(names(returnsData))
lower <- max(upper-bestN+1, 1)
topNvals <- sort(totalRank, partial=seq(from=upper, to=lower))[c(upper:lower)]

#compute weights
longs <- totalRank %in% topNvals #invest in ranks length - bestN or higher (in R, rank 1 is lowest)
longs <- longs/sum(longs) #equal weight all candidates
longs[longs > 1/bestN] <- 1/bestN #in the event that we have fewer than top N invested into, lower weights to 1/top N
names(longs) <- rownames(ranks)

} else if(nrow(stats) == 1) { #only one security had positive momentum
longs <- 1/bestN
names(longs) <- rownames(stats)
} else { #no securities had positive momentum
longs <- 1
names(longs) <- riskFreeName
}

#append removed names (those with momentum < 0)
removedZeroes <- rep(0, ncol(returns)-length(longs))
names(removedZeroes) <- names(returns)[!names(returns) %in% names(longs)]
longs <- c(longs, removedZeroes)

#reorder to be in the same column order as original returns/prices
longs <- data.frame(t(longs))
longs <- longs[, names(returns)]

#append lists
tmp[[i]] <- longs
dates[[i]] <- index(returnsData)[nrow(returnsData)]
}

weights <- do.call(rbind, tmp)
dates <- do.call(c, dates)
weights <- xts(weights, order.by=as.Date(dates))
weights[, riskFreeCol] <- weights[, riskFreeCol] + 1-rowSums(weights)
strategyReturns <- Return.rebalancing(R = returns, weights = weights, geometric = FALSE)
return(strategyReturns)
}

N3vol1cor1 <- FAAreturns(adPrices, weightVol = 1, weightCor = 1)
minRisk <- FAAreturns(adPrices, weightMom = 0, weightVol=1, weightCor=1)
pureMomentum <- FAAreturns(adPrices, weightMom=1, weightVol=0, weightCor=0)
maxDecor <- FAAreturns(adPrices, weightMom=0, weightVol=0, weightCor=1)
momDecor <- FAAreturns(adPrices, weightMom=1, weightVol=0, weightCor=1)

all <- cbind(replicaAttempt, bestN4, N3vol1cor1, minRisk, pureMomentum, maxDecor, momDecor)
colnames(all) <- c("Replica Attempt", "N4", "vol_1_cor_1", "minRisk", "pureMomentum", "maxDecor", "momDecor")
charts.PerformanceSummary(all, colorset=c("black", "red", "blue", "green", "darkgrey", "purple", "orange"))

stats <- data.frame(t(rbind(Return.annualized(all)*100,
maxDrawdown(all)*100,
SharpeRatio.annualized(all))))
stats\$Return_To_Drawdown <- stats[,1]/stats[,2]
```

Here’s the formal procedure:

Using the monthly endpoint functionality in R, every month, looking over the past four months, I computed momentum as the most recent price over the first price in the observed set (that is, the price four months ago) minus one, and instantly removed any funds with a momentum less than zero (this was a suggestion from Mr. David Varadi of CSS Analytics, with whom I’ll be collaborating in the near future). Next, with the pared down universe, I ranked the funds by momentum, by annualized volatility (the results are identical with just standard deviation), and by the sum of the correlations with each other. Since volatility and correlation are worse at higher values, I multiplied each by negative one. Next, I invested in the top N funds every period, or if there were fewer than N funds with positive momentum, each remaining fund received a weight of 1/N, with the rest eventually being placed into the “risk-free” asset, in this case, VFISX. All price and return data were daily adjusted (as per the SSRN paper) data.

However, my results do not match the paper’s (or Alpha Architect’s) in that I don’t see the annualized returns breaking 20%, nor, most importantly, do I see the single-digit drawdowns. I hope my code is clear for every step as to what the discrepancy may be, but that aside, let me explain what the idea is.

The idea is, from those that are familiar with trend following, that in addition to seeking return through the momentum anomaly (stacks of literature available on the simple idea that what goes up will keep going up to an extent), that there is also a place for risk management. This comes in the form of ranking correlation and volatility, and giving different weights to each individual component rank (that is, momentum has a weight of 1, correlation .5, and volatility .5). Next, the weighted sum of the ranks is then also ranked (so two layers of ranking) for a final aggregate rank.

Unfortunately, when it comes to the implementation, the code has to be cluttered with some data munging and edge-case checking, which takes a little bit away from the readability. To hammer a slight technical tangent home, in R, whenever one plans on doing iterated appending (E.G. one table that’s repeatedly appended), due to R copying an object on assignment when doing repeated rbinding or cbinding, but simply appending the last iteration onto a list object, outside of tiny data frames, it’s always better to use a list and only call rbind/cbind once at the end. The upside to data frames is that they’re much easier to print out to a console and to do vectorized operations on. However, lists are more efficient when it comes to iteration.

In any case, here’s an examination of some variations of this strategy.

The first is a simple attempt at replication (3 of 7 securities, 1 weight to momentum, .5 to volatility and correlation each). The second is that same setting, just with the top four securities instead of the top three. A third one is with three securities, but double the weighting to the two risk metrics (vol & cor). The next several are conceptual permutations–a risk minimization profile that puts no weight on the actual nature of momentum (analogous to what the Smart Beta folks would call min-vol), a pure momentum strategy (disregard vol and cor), a max decorrelation strategy (all weight on correlation), and finally, a hybrid of momentum and max decorrelation.

Here is the performance chart: Overall, this looks like evidence of robustness, given that I fundamentally changed the nature of the strategies in quite a few cases, rather than simply tweaked the weights here or there. The momentum/decorrelation hybrid is a bit difficult to see, so here’s a clearer image for how it compared with the original strategy. Overall, a slightly smoother ride, though slightly lower in terms of returns. Here’s the table comparing all seven variations:

```> stats
Annualized.Return Worst.Drawdown Annualized.Sharpe.Ratio..Rf.0.. Return_To_Drawdown
Replica Attempt          14.43802      13.156252                        1.489724          1.0974268
N4                       12.48541      10.212778                        1.492447          1.2225281
vol_1_cor_1              12.86459      12.254390                        1.608721          1.0497944
minRisk                  11.26158       9.223409                        1.504654          1.2209786
pureMomentum             13.88501      14.401121                        1.135252          0.9641619
maxDecor                 11.89159      11.685492                        1.434220          1.0176368
momDecor                 14.03615      10.951574                        1.489358          1.2816563
```

Overall, there doesn’t seem to be any objectively best variant, though pure momentum is definitely the worst (as may be expected, otherwise the original paper wouldn’t be as meaningful). If one is looking for return to max drawdown, then the momentum/max decorrelation hybrid stands out, though the 4-security variant and minimum risk variants also work (though they’d have to be leveraged a tiny bit to get the annualized returns to the same spot). On Sharpe Ratio, the variant with double the original weighting on volatility and correlation stands out, though its return to drawdown ratio isn’t the greatest.

However, the one aspect that I take away from this endeavor is that the number of assets were relatively tiny, and the following statistic:

```> SharpeRatio.annualized(Return.calculate(adPrices))
VTSMX     FDIVX     VEIEX    VFISX    VBMFX       QRAAX     VGSIX
Annualized Sharpe Ratio (Rf=0%) 0.2520994 0.3569858 0.2829207 1.794041 1.357554 -0.01184516 0.3062336
```

Aside from the two bond market funds, which are notorious for lower returns for lower risk, the Sharpe ratios of the individual securities are far below 1. The strategy itself, on the other hand, has very respectable Sharpe ratios, working with some rather sub-par components.

Simply put, consider running this asset allocation heuristic on your own set of strategies, as opposed to pre-set funds. Furthermore, it is highly likely that the actual details of the ranking algorithm can be improved, from different ranking metrics (add drawdown?) to more novel concepts such as stepwise correlation ranking/selection.