The Problem With Depmix For Online Regime Prediction

This post will be about attempting to use the Depmix package for online state prediction. While the depmix package performs admirably when it comes to describing the states of the past, when used for one-step-ahead prediction, under the assumption that tomorrow’s state will be identical to today’s, the hidden markov model process found within the package does not perform to expectations.

So, to start off, this post was motivated by Michael Halls-Moore, who recently posted some R code about using the depmixS4 library to use hidden markov models. Generally, I am loath to create posts on topics I don’t feel I have an absolutely front-to-back understanding of, but I’m doing this in the hope of learning from others on how to appropriately do online state-space prediction, or “regime switching” detection, as it may be called in more financial parlance.

Here’s Dr. Halls-Moore’s post.

While I’ve seen the usual theory of hidden markov models (that is, it can rain or it can be sunny, but you can only infer the weather judging by the clothes you see people wearing outside your window when you wake up), and have worked with toy examples in MOOCs (Udacity’s self-driving car course deals with them, if I recall correctly–or maybe it was the AI course), at the end of the day, theory is only as good as how well an implementation can work on real data.

For this experiment, I decided to take SPY data since inception, and do a full in-sample “backtest” on the data. That is, given that the HMM algorithm from depmix sees the whole history of returns, with this “god’s eye” view of the data, does the algorithm correctly classify the regimes, if the backtest results are any indication?

Here’s the code to do so, inspired by Dr. Halls-Moore’s.

getSymbols('SPY', from = '1990-01-01', src='yahoo', adjust = TRUE)
spyRets <- na.omit(Return.calculate(Ad(SPY)))


hmm <- depmix(SPY.Adjusted ~ 1, family = gaussian(), nstates = 3, data=spyRets)
hmmfit <- fit(hmm, verbose = FALSE)
post_probs <- posterior(hmmfit)
post_probs <- xts(post_probs,
summaryMat <- data.frame(summary(hmmfit))
colnames(summaryMat) <- c("Intercept", "SD")
bullState <- which(summaryMat$Intercept > 0)
bearState <- which(summaryMat$Intercept < 0)

hmmRets <- spyRets * lag(post_probs$state == bullState) - spyRets * lag(post_probs$state == bearState)

Essentially, while I did select three states, I noted that anything with an intercept above zero is a bull state, and below zero is a bear state, so essentially, it reduces to two states.

With the result:

Annualized Return               0.1355
Annualized Std Dev              0.1434
Annualized Sharpe (Rf=0%)       0.9448

So, not particularly terrible. The algorithm works, kind of, sort of, right?

Well, let’s try online prediction now.


dailyHMM <- function(data, nPoints) {
  subRets <- data[1:nPoints,]
  hmm <- depmix(SPY.Adjusted ~ 1, family = gaussian(), nstates = 3, data = subRets)
  hmmfit <- fit(hmm, verbose = FALSE)
  post_probs <- posterior(hmmfit)
  summaryMat <- data.frame(summary(hmmfit))
  colnames(summaryMat) <- c("Intercept", "SD")
  bullState <- which(summaryMat$Intercept > 0)
  bearState <- which(summaryMat$Intercept < 0)
  if(last(post_probs$state) %in% bullState) {
    state <- xts(1,
  } else if (last(post_probs$state) %in% bearState) {
    state <- xts(-1,
  } else {
    state <- xts(0,
  colnames(state) <- "State"

# took 3 hours in parallel
t1 <- Sys.time()
registerDoMC((detectCores() - 1))
states <- foreach(i = 500:nrow(spyRets), .combine=rbind) %dopar% {
  dailyHMM(data = spyRets, nPoints = i)
t2 <- Sys.time()

So what I did here was I took an expanding window, starting from 500 days since SPY’s inception, and kept increasing it, by one day at a time. My prediction, was, trivially enough, the most recent day, using a 1 for a bull state, and a -1 for a bear state. I ran this process in parallel (on a linux cluster, because windows’s doParallel library seems to not even know that certain packages are loaded, and it’s more messy), and the first big issue is that this process took about three hours on seven cores for about 23 years of data. Not exactly encouraging, but computing time isn’t expensive these days.

So let’s see if this process actually works.

First, let’s test if the algorithm does what it’s actually supposed to do and use one day of look-ahead bias (that is, the algorithm tells us the state at the end of the day–how correct is it even for that day?).

onlineRets <- spyRets * states 

With the result:

> table.AnnualizedReturns(onlineRets)
Annualized Return               0.2216
Annualized Std Dev              0.1934
Annualized Sharpe (Rf=0%)       1.1456

So, allegedly, the algorithm seems to do what it was designed to do, which is to classify a state for a given data set. Now, the most pertinent question: how well do these predictions do even one day ahead? You’d think that state space predictions would be parsimonious from day to day, given the long history, correct?

onlineRets <- spyRets * lag(states)

With the result:

> table.AnnualizedReturns(onlineRets)
Annualized Return               0.0172
Annualized Std Dev              0.1939
Annualized Sharpe (Rf=0%)       0.0888

That is, without the lookahead bias, the state space prediction algorithm is atrocious. Why is that?

Well, here’s the plot of the states:

In short, the online hmm algorithm in the depmix package seems to change its mind very easily, with obvious (negative) implications for actual trading strategies.

So, that wraps it up for this post. Essentially, the main message here is this: there’s a vast difference between loading doing descriptive analysis (AKA “where have you been, why did things happen”) vs. predictive analysis (that is, “if I correctly predict the future, I get a positive payoff”). In my opinion, while descriptive statistics have their purpose in terms of explaining why a strategy may have performed how it did, ultimately, we’re always looking for better prediction tools. In this case, depmix, at least in this “out-of-the-box” demonstration does not seem to be the tool for that.

If anyone has had success with using depmix (or other regime-switching algorithm in R) for prediction, I would love to see work that details the procedure taken, as it’s an area I’m looking to expand my toolbox into, but don’t have any particular good leads. Essentially, I’d like to think of this post as me describing my own experiences with the package.

Thanks for reading.

NOTE: On Oct. 5th, I will be in New York City. On Oct. 6th, I will be presenting at The Trading Show on the Programming Wars panel.

NOTE: My current analytics contract is up for review at the end of the year, so I am officially looking for other offers as well. If you have a full-time role which may benefit from the skills you see on my blog, please get in touch with me. My linkedin profile can be found here.

35 thoughts on “The Problem With Depmix For Online Regime Prediction

  1. Hi Ilya: I’ve never done any work with HMM’s but my guess is that you have to somehow tell it that you want regimes to “linger”. I don’t know what the R package that you used actually does but there are
    many ways to implement regime switching. Check out Hamilton’s seminal “regime switching” paper because he probably does a similar thing but in the context of economics which is more along the lines of what you would want to use in your application. It’s free ( see link below ) which is surprising for such a well known paper. good luck.

    Click to access Hamilton1989.pdf

  2. Pingback: The Problem With Depmix For Online Regime Prediction | A bunch of data

  3. Pingback: The Problem With Depmix For Online Regime Prediction – Mubashir Qasim

  4. You’re assessing the hmm’s suitability on whether it can predict day ahead asset returns.

    There’s no reason to think it would that the model would be able to do this.

    There are many useful ways to implement hmm’s in online trading (usually in risk mgmt / position sizing) – but you’re not going to generate short term directional alpha with one. That’s a real stretch.

    • Well, sort of. The way I was looking at it is “hey, we only want to make long trades when our regime detection says things are on the up and up, and only want to make short trades when our regime detection says things are going down. How well does it do this?”

      The answer is “well, not well at all”.

      That said, I’d like to see an example of hmms used for risk mgmt/position sizing. It’d be intriguing for me.

      • You might find some interesting relationship if you use your regime prediction as a feature in other machine learning algorithms. It could be that your regime is good, but only in context of other market features.

      • Oh hey, Kyle! How much of this blog have you read anyway ?=P

        Good suggestion, but what I wonder is if the regime prediction is far worse than even some basic SMA filter, what good is it? Feels like garbage features would hurt performance, no?

  5. I understand what you mean.

    I can’t share details, but we use hmms with a lot of success to identify of likelihood of order flows being toxic (I.e. Trending rather than mean reverting at very high frequency). We use this regime prediction to turn off market making algos or lean on the order book on the side of toxic flow.

  6. I have tried with Python packages before long time ago , I got the same result like you. That is the in-sample test result is good but the out-of-sample test is not as good as the in-sample test. I tried to expand the in-sample data set by add 1 bar and redo the test continuously and finally found that the previous in-sample results may changes after add the new bar data in. I think it’s caused by look-ahead bias .

  7. Pingback: Quantocracy's Daily Wrap for 10/05/2016 | Quantocracy

  8. May be I didn’t look the code comprehensively enough, but I see to caveats to the correctness of your algorithm:
    1) If you want stablilty in predictions may be you should use priors for the new initializations of hmm
    2) To make prediction to the next day you should not only take the prediction of the previous day, but also multiply it at the probability to change the state and choose the maximum from results.

    • 1) I wonder if there’s a way to do that with depmix

      2) Usually, with every new iteration, it seems to me that the algorithm is very sure of itself, (EG probability of a state change is very low, yet that’s not borne out by results).

  9. As stated previously, HMM are prone to “backpainting” where state assignations are reviewed when new data is added. This makes it impractical for real time regime identification out right. As Kyle Benton proposed, it is possible to use the output of the historical HMM as a classification target variable. From personal experience, this works to a certain extent but the ML approach tends to lag the HMM by a number of time periods. In terms of return distribution the HMM does identify different “regimes”, i.e. different means, variances and higher moments. The predicted regimes by the HMM + ML approach don’t, in my experience give such a good split (in part because of the lag). The HMM tends to place a regime switch before a large shock (because of look forward bias). The ML trained on the HMM tends to wait after such a shock to change its prediction. Finally the most important “features” or observable that could predict regimes were not the mean returns but the variance and higher moments.

    P.S. This is regarding the application to daily returns of the SP500 over the period from 1928 to 2016. Such is the context of these remarks they are not necessarily applicable to other data sets.

  10. first of all thanks for sharing your code. Here are some suggestions from my expereince on HMM and how i solved some problems with it.
    1. fit an HMM model as usual.
    2. refit the model using the following function i wrote by myself:
    for(i in 1:40){
    where fm2 is the fitted model and mod2 is the model itself.

    3. once you have new data or realtime data you have to create a new model than fit it setting the parameters by the older ones:
    mod_realtime<-setpars(mod_realtime, getpars(fm2))

    Hope this helps

      • you are right, i don’t know why but if statement is lost.
        for(i in 1:40){

        i'm not so good in writing posts, but if i can help in someway than i'm more tha happy to do that.

  11. Should it be lag(post_probs$state == bullState,-1) ? You calculate the posterior probability of the state using the close of day t, so you want to lag it by a day and use it to get the return on day t+1.

    lag() will actually go the other way and introduce look-forward bias, no? Apologies if I’m mistaken.

      • > post_probs[1:5,]
        state S1 S2 S3
        2004-01-05 1 1.0000000 0.000000000 0.00000000
        2004-01-06 3 0.9099963 0.002004599 0.08799906
        2004-01-07 3 0.8227678 0.001836454 0.17539575
        2004-01-08 3 0.7183967 0.001708458 0.27989481
        2004-01-09 3 0.8398295 0.002389283 0.15778123
        > lag(post_probs[1:5,],1)
        state S1 S2 S3
        2004-01-05 3 0.9099963 0.002004599 0.08799906
        2004-01-06 3 0.8227678 0.001836454 0.17539575
        2004-01-07 3 0.7183967 0.001708458 0.27989481
        2004-01-08 3 0.8398295 0.002389283 0.15778123
        > lag(post_probs[1:5,],-1)
        state S1 S2 S3
        2004-01-06 1 1.0000000 0.000000000 0.00000000
        2004-01-07 3 0.9099963 0.002004599 0.08799906
        2004-01-08 3 0.8227678 0.001836454 0.17539575
        2004-01-09 3 0.7183967 0.001708458 0.27989481

        My original state of 1 on 2004-01-05 (top row) is now appearing at the bottom (with lag of -1) with time stamp 2004-01-06, because I’ve lagged it by one day to multiply it by the returns from 2004-01-06. This reflects the fact that I am conservatively saying I can’t act on the information until the next day.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s