2014 Ebola Epidemic

Introduction

About this document.

This is a living document, as the Ebola epidemic is rapidly evolving. Presented below is a preliminary modeling approach to the crisis. It was originally conceived as an illustration of the spatial SEIR model family in general, and of the capabilities of the rapidly developing (and totally unfinished) libspatialSEIR software library in particular. This is not yet peer-reviewed research, or even a particularly complete analysis. Nevertheless, we hope that the initial exploration given below is instructive and useful.

Past versions of this analysis are cached in a repository on GitHub and remain available.

Additional analyses are underway. An informal look at prediction performance over time is also available.

Summary of Recent Changes

Oct 2 There have been a lot of changes for this release. The temporal component was dramatically simplified, so we're fitting a baseline SEIR model with a different intensity parameter for each nation. In addition, the spatial structure was generalized to allow a separate mixing parameter for each pair of countries. Finally, three different methods of calculating R0 are compared, and a lot of behind the scenes changes to libspatialSEIR and the R API took place. Upcoming changes include the introduction of "hybrid" samplers which may help deal with some of the autocorrelation, seen for example in the E to I transition probability chains. Despite the model changes, we continue to see a dramatic and prolonged predicted increase in cases.
Sep 8 Fixed R0 calculations, which were showing the per-report R0 components, but didn't account for infection length. They have now been appropriately integrated over time.
Sep 4 The situation continues to worsen, which is especially worrying given that the data used are almost certainly undercounts. Removed the polynomial analysis in favor of the spline basis analysis to save on computation time.
Aug 31 Basic reproductive number clearly very heterogeneous between countries. Other interpretations TBD.
Aug 28 With the addition of Nigeria and new data, the situation appears dire across the board, especially in Liberia. The basic reproductive number calculations no longer look reasonable, so more must be done to estimate them separately for each country. The changes may also be due to recent changes to the R0 estimation code, which must now be reevaluated. This work is underway.
Aug 12 Liberia continues to worsen, though Sierra Leone appears to have leveled off. Guinea might be worsening slightly.
Aug 10 Predictions remain much the same, though perhaps not so immediately catastrophic as the August 5 predictions.
Aug 5 Models begin to show catastrophic predictions. Epidemic is changing too quickly for these simple models.

Shorten prediction window
Increase intensity process flexibility (quartic rather than cubic)
Recall that these models can't account for changing interventions, of which there are many. The predictions are therefore made under the assumption that the epidemic will continue to evolve according to the same process it has so far.

Aug 3 Predicted epidemic curves start to strongly favor a uniformly worsening situation.
July 30 Previous predictions of mid-fall resolution of epidemic begin to shift.

The Outbreak

The 2014 Ebola outbreak in West Africa is an ongoing public health crisis, which has killed thousands of people so far. The cross-border nature of this epidemic, which emerged in Guinea, Liberia, and Sierra Leone, has complicated mitigation efforts, as has the poor health infrastructure in the region. This document explores a simple spatial SEIR model to make some initial predictions.

The Data

A summary of the WHO case reports is very helpfully compiled on Wikipedia. It can be easily read into R with the XML library:

library(knitr)
library(coda) # Load the coda library for MCMC convergence diagnosis

## Loading required package: lattice

library(spatialSEIR)

## Loading required package: Rcpp
## 
## Attaching package: 'spatialSEIR'
## 
## The following object is masked _by_ '.GlobalEnv':
## 
##     dataModel

library(XML) # Load the XML library to read in data from Wikipedia
library(parallel) # Load the parallel library to enable multiple chains to be run simultaneously. 

## Define Document Compilation Parameters

documentCompilationMode = "release"
#documentCompilationMode = "debug"
modelDF = 0
pred.days = 30


## Compute number of samples/batches
numBurnInBatches =      ifelse(documentCompilationMode == "release", 1000,  1)
numConvergenceBatches = ifelse(documentCompilationMode == "release", 3000,  10)
convergenceBatchSize =  ifelse(documentCompilationMode == "release", 1000, 100)
extraR0Iterations =     ifelse(documentCompilationMode == "release", 200,   10)
iterationStride =       ifelse(documentCompilationMode == "release", 750,   50)
targetDaysPerRecord = 6

## Read in the data
url = 'http://en.wikipedia.org/wiki/West_Africa_Ebola_virus_outbreak'
tbls = readHTMLTable(url)
# These lines change depending on the page formatting.
table1 = tbls[[8]]
table2 = tbls[[9]]

dat = rbind(table1[2:nrow(table1),c(1, 4, 6, 8, 10,12)],
            table2[2:nrow(table2),c(1, 4, 6, 8, 10,12)])

# One date is now (sometimes) duplicated on the Wikipedia page, due to using different sources. 
# Clean that up first. 
dup.indices = which(as.Date(dat[,1], "%d %b %Y") == as.Date("2014-06-05"))

if (length(dup.indices) > 1)
{
  dat[dup.indices[1],4] = dat[dup.indices[2],4]
  dat = rbind(dat[1:dup.indices[1],], dat[(dup.indices[2]+1):nrow(dat),])
}

# as.Date()'s %b format expects "Sep", but some rows use "Sept"
charDate = gsub("Sept", "Sep", as.character(dat[2:nrow(dat),1]), fixed = TRUE)

rptDate = as.Date(charDate, "%d %b %Y")
numDays = max(rptDate) - min(rptDate) + 1
numDays.pred = numDays + pred.days

original.rptDate = rptDate
ascendingOrder = order(rptDate)
rptDate = rptDate[ascendingOrder][2:length(rptDate)]
original.rptDate = original.rptDate[ascendingOrder]


cleanData = function(dataColumn, ascendingOrder)
{
    # Remove commas
    dataColumn = gsub(",", "", dataColumn, fixed = TRUE)
    # Keep only the first line of each cell, dropping the "+N"/"-N" change annotations
    charCol = as.character(
      lapply(
        strsplit(
          as.character(
            dataColumn)[ascendingOrder], "\n"), function(x){ifelse(length(x) == 0, "—", x[[1]])}
        )
      )
    if (is.na(charCol[1]) || charCol[1] == "—")
    {
      charCol[1] = "0"
    }
    charCol = as.numeric(ifelse(charCol == "—", "", charCol))
    for (i in 2:length(charCol))
    {
      if (is.na(charCol[i]))
      {
        charCol[i] = charCol[i-1]
      }
    }
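    # Correct for downward revisions: walking backwards, cap each cumulative
    # count at the next report's value so that the series is non-decreasing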
    for (i in seq(length(charCol), 2))
    {
        if (charCol[i-1] > charCol[i])
        {
            charCol[i-1] = charCol[i]
        }
    }
    charCol
}

Guinea = cleanData(dat[,2], ascendingOrder)
Liberia = cleanData(dat[,3], ascendingOrder)
Sierra.Leone = cleanData(dat[,4], ascendingOrder)

## Warning: NAs introduced by coercion

Nigeria = cleanData(dat[,5], ascendingOrder)
rawData = cbind(Guinea, Sierra.Leone, Liberia, Nigeria)
rownames(rawData) = as.character(original.rptDate)
# Column labels must follow the cbind() order above
colnames(rawData) = paste(paste("&nbsp;&nbsp;", c("Guinea", "Sierra Leone", "Liberia", "Nigeria")), "&nbsp;&nbsp;")

# The data need to be aggregated: there's some error in the measurements,
# and we're not actually observing infection times. The data are therefore
# recorded at an artificially fine time resolution.
uncumulate = function(x)
{
    out = c(x[2:length(x)]-x[1:(length(x)-1)])
    ifelse(out >= 0, out, 0)
}
nDays = uncumulate(original.rptDate)

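# Choose which reports to keep so that kept reports are roughly minDays apart:
# walk backwards from the most recent report, accumulating the days between
# reports (weights), and keep a report once at least minDays have accumulated.
thinIndices = function(minDays, weights)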
{
    keepIdx = c(length(weights))
    currentWeight = 0
    lastIdx = -1
    for (i in seq(length(weights)-1, 1))
    {
      currentWeight = currentWeight + weights[i]
      if (currentWeight >= minDays)
      {
          currentWeight = 0
          keepIdx = c(keepIdx, i)
          lastIdx = i
      }
    }
    if (currentWeight != 0)
    {
      keepIdx = c(keepIdx, lastIdx-1)
    }
    keepIdx
}

keepIdx = thinIndices(targetDaysPerRecord, c(1,nDays))
keepIdx = keepIdx[order(keepIdx)]



# Define the plot for the next section
ylim = c(min(c(Guinea, Sierra.Leone, Liberia, Nigeria)),
             max(c(Guinea, Sierra.Leone, Liberia, Nigeria)))
figure1 = function()
{
      plot(original.rptDate, Guinea, type = "l",
           main = "Raw Data: Case Counts From Wikipedia",
           xlab = "Date",
           ylab = "Total Cases",
           ylim = ylim, lwd = 3)
      abline(h = seq(0,100000, 100), lty = 2, col = "lightgrey")
      lines(original.rptDate, Liberia, lwd = 3, col = "blue", lty = 2)
      lines(original.rptDate, Sierra.Leone, lwd = 3, col = "red", lty = 3)
      lines(original.rptDate, Nigeria, lwd = 3, col = "green", lty = 4)
      # Four series, so four line types (Guinea uses the default lty = 1)
      legend(x = original.rptDate[1], y = max(ylim), legend =
               c("Guinea", "Liberia", "Sierra Leone", "Nigeria"),
             lty = 1:4, col = c("black", "blue","red", "green"), bg="white", cex = 1.1)
}


Guinea = Guinea[keepIdx]
Sierra.Leone = Sierra.Leone[keepIdx]
Liberia = Liberia[keepIdx]
Nigeria = Nigeria[keepIdx]
original.rptDate = original.rptDate[keepIdx]
rptDate = original.rptDate[2:length(original.rptDate)]

With data in hand, let's begin where every analysis should begin: graphs.

In addition to the above graph, the raw data is archived for posterity in the code block below. Wikipedia changes a lot, so it's important to record the context of the results presented here.

kable(rawData)

Date	Guinea	Sierra Leone	Liberia	Nigeria
2014-03-22	86	0	0	0
2014-03-24	86	0	0	0
2014-03-25	86	0	0	0
2014-03-26	103	0	2	0
2014-03-27	112	0	2	0
2014-03-28	112	0	2	0
2014-03-29	122	0	8	0
2014-03-31	127	0	8	0
2014-04-01	151	0	12	0
2014-04-07	158	0	12	0
2014-04-09	158	0	12	0
2014-04-10	158	0	12	0
2014-04-11	168	0	12	0
2014-04-14	197	0	12	0
2014-04-16	203	0	12	0
2014-04-17	208	0	12	0
2014-04-20	208	0	12	0
2014-04-21	218	0	12	0
2014-04-23	218	0	12	0
2014-04-24	221	0	12	0
2014-04-30	226	0	12	0
2014-05-01	226	0	12	0
2014-05-02	231	0	12	0
2014-05-03	233	0	12	0
2014-05-07	233	0	12	0
2014-05-10	248	0	12	0
2014-05-12	253	0	12	0
2014-05-18	258	0	12	0
2014-05-23	281	16	12	0
2014-05-27	291	16	12	0
2014-05-28	291	50	12	0
2014-05-29	328	79	12	0
2014-06-01	344	79	12	0
2014-06-03	351	81	12	0
2014-06-05	351	89	12	0
2014-06-06	372	89	12	0
2014-06-10	390	95	33	0
2014-06-15	390	95	33	0
2014-06-16	390	97	33	0
2014-06-17	390	158	33	0
2014-06-20	390	158	51	0
2014-06-22	406	239	107	0
2014-06-30	406	252	115	0
2014-07-02	406	305	131	0
2014-07-06	406	337	142	0
2014-07-08	406	386	172	0
2014-07-12	410	397	174	0
2014-07-14	410	442	196	0
2014-07-17	415	454	224	0
2014-07-20	427	525	249	0
2014-07-23	460	533	329	1
2014-07-27	472	574	391	3
2014-07-30	485	646	468	4
2014-08-01	495	691	516	9
2014-08-04	495	717	554	12
2014-08-06	506	730	599	12
2014-08-09	510	783	670	12
2014-08-11	519	810	786	12
2014-08-13	543	848	834	15
2014-08-16	579	907	972	15
2014-08-18	607	910	1082	16
2014-08-20	648	1026	1378	19
2014-08-25	771	1216	1698	20
2014-08-31	823	1292	1863	20
2014-09-03	861	1424	2081	20
2014-09-07	899	1509	2415	20
2014-09-10	942	1655	2720	20
2014-09-14	965	1753	3022	20
2014-09-17	1022	1940	3280	20
2014-09-21	1074	2021	3458	20
2014-09-23	1103	2120	3564	20
2014-09-25	1157	2317	3696	20

This graph and the corresponding table represent cumulative counts, but because case reports can be revised downward (for example, when suspected cases turn out to be non-Ebola illnesses), the raw reports are not strictly monotone. A quick but effective solution to this problem is to simply "un-cumulate" the data and bound it at zero to get a rough estimate of new case counts over time.

For better graphical representation, the "un-cumulated" counts are scaled to represent average number of infections per day, and linearly interpolated. The process is a bit noisier from this perspective when compared to the original cumulative counts.
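As a rough sketch of that scaling step, here are four consecutive Guinea reports from the table above. The helper name `avgDailyCases` is hypothetical, and attaching each average to the later report date of its interval is an assumption made for illustration, not necessarily what the plotting code for this document does.

```r
# Hypothetical helper: un-cumulate a cumulative series (bounded at zero, as in
# uncumulate() above) and divide by the number of days between reports.
avgDailyCases = function(dates, cumulative)
{
    newCases = pmax(diff(cumulative), 0)  # new cases in each reporting interval
    gapDays = as.numeric(diff(dates))     # days between consecutive reports
    newCases / gapDays
}

# Four consecutive Guinea reports from the archived table
dates = as.Date(c("2014-08-13", "2014-08-16", "2014-08-18", "2014-08-20"))
guinea = c(543, 579, 607, 648)

rate = avgDailyCases(dates, guinea)
# Attach each average to the later date of its interval, then linearly
# interpolate onto a daily grid with base R's approx().
daily = approx(x = as.numeric(dates[-1]), y = rate,
               xout = seq(as.numeric(dates[2]), as.numeric(dates[4])))
print(rate)      # 12.0 14.0 20.5
print(daily$y)   # 12.00 13.00 14.00 17.25 20.50
```

Dividing by the gap between reports matters here because the reports are unevenly spaced: 36 new cases over three days and 41 over two days describe quite different daily rates.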

One can also represent this data geographically to get an idea of the spatial epidemic pattern, and to place the problem in a more relatable context.

[Interactive map: Average Number of Infections Per Day, by date]

Compartmental Models

Now that the data is read in (and now that we have several plots to suggest that we haven't done anything too terribly stupid with it), let's do some compartmental epidemic modeling. Not only has Ebola been well modeled in the past using compartmental modeling techniques, but this author happens to be working on a software library designed to fit compartmental models in the spatial SEIRS family. What a strange coincidence! Specifically, we'll be using hierarchical Bayesian estimation methods to fit a spatial SEIR model to the data.

While a full treatment of this field of epidemic modeling is (far) beyond the scope of this writing, the basic idea is pretty intuitive. In order to come up with a simplified model of a disease process, discrete disease states (a.k.a. compartments) are defined. The most common of these are S, E, I, and R, which stand for:

Susceptible to a particular disease
Exposed and infected, but not yet infectious
Infectious and capable of transmitting the disease
Removed or recovered
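To make the flow between compartments concrete, here is a minimal, non-spatial sketch of one time step of a chain-binomial SEIR process, in which each transition count is drawn as a binomial from its source compartment. The function name `seirStep` and the example numbers are hypothetical; this only illustrates the model structure and is not how libspatialSEIR samples.

```r
# Hypothetical one-step chain-binomial SEIR update (no spatial structure).
# pSE, pEI and pIR are the transition probabilities for this time step.
seirStep = function(S, E, I, R, pSE, pEI, pIR)
{
    newE = rbinom(1, S, pSE)  # S -> E: new exposures
    newI = rbinom(1, E, pEI)  # E -> I: become infectious
    newR = rbinom(1, I, pIR)  # I -> R: removed or recovered
    c(S = S - newE,
      E = E + newE - newI,
      I = I + newI - newR,
      R = R + newR)
}

set.seed(1)
state = seirStep(S = 10000, E = 50, I = 86, R = 0,
                 pSE = 0.002, pEI = 0.25, pIR = 0.14)
print(state)
sum(state)  # the population is conserved: 10136
```

Because every individual who leaves one compartment enters the next, the total population is unchanged by each step; only the S to E probability depends on the current number of infectious individuals.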

This sequence, traversed by members of a population (S to E to I to R), forms what we might call the temporal process model of our analysis. This analysis belongs to the stochastic branch of the compartmental modeling family, which has its roots in deterministic systems of ordinary and partial differential equations. In the stochastic framework, transitions between the compartments occur according to unknown probabilities. It is the S to E probability, which captures infection activity, into which we introduce spatial structure. Some details of this are given as comments to the code below, and more information than you probably want on the statistical particulars is available in this PDF document. For now, suffice it to say that we'll place a simple spatial structure on the epidemic process which allows disease to spread between each pair of the four nations involved, and we'll try to estimate the strength of each of those relationships. Many other potential structures are possible, limited primarily by the amount of additional research and data compilation one is willing to do.
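One common way to put spatial structure into the S to E probability is sketched below. The exact form is an assumption for illustration, not necessarily the formula libspatialSEIR uses: a location-specific intensity `eta` (for example `exp(X %*% beta)`) multiplies the local infectious fraction, and each 0/1 pairwise matrix (like the DM matrices in the set-up code below) adds a `rho`-weighted contribution from linked locations. The rate is then turned into a probability over a reporting interval of `h` days.

```r
# Hypothetical spatial S -> E probability for one time point.
#   I      : infectious counts by location
#   N      : population sizes by location
#   eta    : per-location intensity, e.g. exp(X %*% beta)
#   dmList : list of 0/1 pairwise "distance" matrices
#   rho    : one mixing parameter per matrix in dmList
#   h      : length of the reporting interval (offset), in days
spatialPSE = function(I, N, eta, dmList, rho, h = 1)
{
    local = eta * I / N                    # within-location force of infection
    spill = rep(0, length(I))
    for (k in seq_along(dmList))
    {
        # contribution from locations linked by the k-th matrix
        spill = spill + rho[k] * as.vector(dmList[[k]] %*% local)
    }
    1 - exp(-h * (local + spill))          # rate -> probability over h days
}

# Two-location toy example with a single link between them
dm = list(matrix(c(0, 1,
                   1, 0), nrow = 2, byrow = TRUE))
p = spatialPSE(I = c(100, 0), N = c(1e6, 1e6), eta = c(2, 2),
               dmList = dm, rho = 0.2, h = 6)
print(p)
```

In the toy example the second location has no infectious individuals of its own, yet its exposure probability is positive because of the link to the first; estimating `rho` is what quantifies the strength of that cross-border spread.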

For the purposes of this analysis, we will not do anything fancy with demographic information or public health intervention dates. Demographic parameters are relatively difficult to estimate here, as there are only four spatial units, all from the same region. Intervention dates are more promising, but their inclusion requires much more background research than we have time for here. In the interest of simplicity and estimability, we'll just fit a different disease intensity parameter for each of the four countries to capture aggregate differences in Ebola susceptibility. A set of basis functions can optionally be added to capture the temporal trend, but the current analysis does not use one.

Analysis

Set Up

There are some things we need to define before we can start fitting models and making predictions.

The population sizes need to be determined.
Initial values for the four compartments must be determined.
The time points are not evenly spaced, so we need to define appropriate offset values to capture the amount of aggregation performed (time between reports).
We must define the spatial correlation structure.
(optionally) A set of basis functions to capture the temporal trend.
Prior parameters and parameter starting values must be specified for each chain.
A whole bunch of bookkeeping stuff for which I haven't yet programmed sensible default behavior needs to be set up.

Compartment starting values follow the usual convention of letting the entire initial population be divided into susceptibles and infectious individuals. The starting value for the number of infectious individuals was 86 for Guinea and zero for the other nations. Temporal offsets are actually calculated in the first code block (above) as the differences between the report times. For temporal basis functions, several options have been explored over the life of this project, including orthogonal polynomials and natural splines. The current analysis uses no temporal basis function, instead choosing to increase the granularity of the spatial correlation structure. Future work will introduce time-varying intervention terms to quantify the effects of the early efforts of Médecins Sans Frontières (MSF) and the ongoing international response. Prior parameters for the E to I and I to R transitions were chosen based on well-documented values for the average latent and infectious times, and the rest of the prior parameters were left vague. These decisions are addressed in more detail as comments to the code below.
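Linking transition probabilities to average latent and infectious times takes a little arithmetic. The sketch below moves between per-day probabilities and the rates (`gamma_ei`, `gamma_ir`) that the sampler works with, using the same `gamma = -log(1 - p)` and `p = 1 - exp(-gamma)` transformations that appear in `proposeParameters` and in the parameter summaries. It assumes a one-day time unit and roughly geometric dwell times, so that a per-day exit probability `p` implies a mean stay of about `1/p` days; those assumptions and the helper names are illustrative only.

```r
# Hypothetical helpers mirroring the transformations used in the analysis code.
probToRate = function(p) -log(1 - p)                      # gamma = -log(1 - p)
rateToProb = function(gamma, h = 1) 1 - exp(-gamma * h)   # over an offset of h days

# Starting-value centres from proposeParameters()
p_ei = 0.25
p_ir = 0.14
gamma_ei = probToRate(p_ei)
gamma_ir = probToRate(p_ir)

# Under a geometric dwell-time assumption, the mean stay is 1/p days
meanLatentDays = 1 / p_ei        # 4 days
meanInfectiousDays = 1 / p_ir    # about 7.1 days

# Probability of leaving E within a 6-day reporting interval
pEI6 = rateToProb(gamma_ei, h = 6)
round(c(gamma_ei = gamma_ei, gamma_ir = gamma_ir, pEI6 = pEI6), 3)
```

Working on the rate scale is convenient because rates scale linearly with the offset: a six-day interval simply multiplies `gamma` by six before converting back to a probability.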

library(splines)
# Guinea, Liberia, Sierra Leone, Nigeria
N = matrix(c(10057975, 4128572, 6190280,174507539), nrow = nrow(I_star),ncol = 4,
           byrow=TRUE)
X = diag(ncol(N))
X.predict = X

daysSinceJan = as.numeric(rptDate - as.Date("2014-01-01"))
daysSinceJan.predict = c(max(daysSinceJan) + 1, max(daysSinceJan)
                         + seq(2,pred.days-2,2))
{
if (modelDF != 0)
{
  splineBasis = ns(daysSinceJan, df = modelDF)
  splineBasis.predict = predict(splineBasis, daysSinceJan.predict)
  Z = matrix(splineBasis, ncol = modelDF)
  Z.predict = splineBasis.predict
  # These co-variates are the same for each spatial location, 
  # so duplicate them row-wise. 
  Z = Z[rep(1:nrow(Z), nrow(X)),,drop=FALSE]
  Z.predict = Z.predict[rep(1:nrow(Z.predict), nrow(X)),,drop=FALSE]

  # For convenience, let's combine X and Z for prediction.
  X.pred = cbind(X.predict[rep(1:nrow(X.predict),
                               each = nrow(Z.predict)/nrow(X)),], Z.predict)
}
else
{
  splineBasis = c()
  splineBasis.predict = c()
  Z = NA
  Z.predict = NA
  X.pred = cbind(X.predict[rep(1:nrow(X.predict), each = length(daysSinceJan.predict)),])
}
}
DM1 = matrix(c(0,1,0,0,
               1,0,0,0,
               0,0,0,0,
               0,0,0,0), nrow = 4, ncol = 4, byrow = TRUE)
DM2 = matrix(c(0,0,1,0,
               0,0,0,0,
               1,0,0,0,
               0,0,0,0), nrow = 4, ncol = 4, byrow = TRUE)
DM3 = matrix(c(0,0,0,1,
               0,0,0,0,
               0,0,0,0,
               1,0,0,0), nrow = 4, ncol = 4, byrow = TRUE)
DM4 = matrix(c(0,0,0,0,
               0,0,1,0,
               0,1,0,0,
               0,0,0,0), nrow = 4, ncol = 4, byrow = TRUE)
DM5 = matrix(c(0,0,0,0,
               0,0,0,1,
               0,0,0,0,
               0,1,0,0), nrow = 4, ncol = 4, byrow = TRUE)
DM6 = matrix(c(0,0,0,0,
               0,0,0,0,
               0,0,0,1,
               0,0,1,0), nrow = 4, ncol = 4, byrow = TRUE)
spatialExplanations = c("Guinea <-> Liberia Spread",
                        "Guinea <-> Sierra Leone Spread",
                        "Guinea <-> Nigeria Spread",
                        "Liberia <-> Sierra Leone Spread",
                        "Liberia <-> Nigeria Spread",
                        "Sierra Leone <-> Nigeria Spread")

dmList = list(DM1,DM2,DM3,DM4,DM5,DM6)


# Population sizes for the four countries of interest (N, defined above) are
# also from Wikipedia.

# Define prediction offsets. 
offset.pred = c(1,seq(2,pred.days-2,2))

# There's no reinfection process for Ebola, but we still need to provide dummy
# values for the reinfection terms. This will change (along with most of
# the R-level API). Dummy covariate matrix:
X_p_rs = matrix(0)

# Dummy value for reinfection params
beta_p_rs = rep(0, ncol(X_p_rs))
# Dummy value for reinfection params prior precision
betaPrsPriorPrecision = 0.5



# Declare prior parameters for the E to I and I to R probabilities. 
priorAlpha_gammaEI = 250;
priorBeta_gammaEI = 1000;
priorAlpha_gammaIR = 140;
priorBeta_gammaIR = 1000;
# Declare prior parameters for the overdispersion precision
priorAlpha_phi = 10
priorBeta_phi = 0.1

# Declare prior precision for exposure model parameters
betaPriorPrecision = 0.1

# Declare a function which can come up with several different starting values 
# for the model parameters. This will allow us to assess convergence. 
proposeParameters = function(seedVal, chainNumber)
{
    set.seed(seedVal)

    # 2 to 21 day incubation period according to the WHO
    p_ei = 0.25 + rnorm(1, 0, 0.02)
    # Up to 7 weeks even after recovery
    p_ir = 0.14 + rnorm(1, 0, 0.01)
    gamma_ei=-log(1-p_ei)
    gamma_ir=-log(1-p_ir)

    # Starting value for exposure regression parameters
    # Z is NA when no temporal basis is used, otherwise a matrix of covariates
    beta = rep(0, ncol(X) + if (is.matrix(Z)) ncol(Z) else 0)
    beta[1] = 2.5 + rnorm(1,0,0.5)

    phi = 0.01 # Overdispersion precision

    outFileName = paste("./chain_output_ebola_", chainNumber ,".txt", sep = "")

    # Make a crude guess as to the true compartments:
    # S_star, E_star, R_star, and thus S,E,I and R
    DataModel = buildDataModel(I_star, type = "overdispersion",
                               params=c(priorAlpha_phi,priorBeta_phi))
    ExposureModel = buildExposureModel(X, Z, beta, betaPriorPrecision, offsets, nTpt = nrow(I_star))
    ReinfectionModel = buildReinfectionModel("SEIR")
    SamplingControl = buildSamplingControl(iterationStride=iterationStride,
                                           sliceWidths=c(1, # S_star
                                                         1, # E_star
                                                         1,  # R_star
                                                         1,  # S_0
                                                         1,  # I_0
                                                         0.05,  # beta
                                                         0.0,  # beta_p_rs, fixed in this case
                                                         0.01, # rho
                                                         0.01, # gamma_ei
                                                         0.01,  # gamma_ir
                                                         0.01)) # phi

    InitContainer = buildInitialValueContainer(I_star, N,
                                               S0 = N[1,]-I_star[1,] - I0,
                                               I0 = I0,
                                               p_ir = 0.2,
                                               p_rs = 0.00)
    DistanceModel = buildDistanceModel(dmList)
    TransitionPriors = buildTransitionPriorsManually(priorAlpha_gammaEI,priorBeta_gammaEI,
                                                     priorAlpha_gammaIR,priorBeta_gammaIR)
    return(list(DataModel=DataModel,
                ExposureModel=ExposureModel,
                ReinfectionModel=ReinfectionModel,
                SamplingControl=SamplingControl,
                InitContainer=InitContainer,
                DistanceModel=DistanceModel,
                TransitionPriors=TransitionPriors,
                outFileName=outFileName))
}
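The six pairwise matrices DM1 through DM6 above follow a simple pattern: one symmetric 0/1 matrix per pair of countries, in the order given by `spatialExplanations`. A compact way to build the same list is sketched below with `combn`; the function name `pairwiseMatrices` is hypothetical, and this is an alternative sketch rather than the code used to produce these results.

```r
# Build one symmetric 0/1 matrix per unordered pair of locations.
# combn(4, 2) enumerates pairs in the order (1,2), (1,3), (1,4), (2,3), (2,4), (3,4),
# which matches DM1..DM6 above.
pairwiseMatrices = function(nLoc)
{
    idx = combn(nLoc, 2)
    lapply(seq_len(ncol(idx)), function(k)
    {
        m = matrix(0, nrow = nLoc, ncol = nLoc)
        m[idx[1, k], idx[2, k]] = 1
        m[idx[2, k], idx[1, k]] = 1
        m
    })
}

countries = c("Guinea", "Liberia", "Sierra Leone", "Nigeria")
dmList2 = pairwiseMatrices(length(countries))

# Matching labels, in the same format as spatialExplanations
pairNames = combn(countries, 2)
labels = paste(pairNames[1, ], "<->", pairNames[2, ], "Spread")
print(labels)
```

Generating the matrices this way scales to more locations without hand-writing a matrix per pair, and keeps the labels and matrices in the same order by construction.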

With the set-up out of the way, we can finally build and run the models. The code presented below was recently (8/27/2014) set up to use the parallel package, which is included with the R statistical software. While this considerably reduces the time spent waiting for the document to compile, the code may be more difficult to understand for those unfamiliar with this useful parallelization library. For such readers, the previous analyses, linked above, may be a better guide to basic use of the spatialSEIR library.
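For readers unfamiliar with the parallel package, the pattern used below reduces to four calls: `makeCluster` starts worker R processes, `clusterExport` copies named objects from the master session to each worker (workers start with empty workspaces, which is why the list of exported names below is so long), `parLapply` applies a function to each element of a list on the workers, and `stopCluster` shuts them down. Here is a toy sketch in which a hypothetical `runChain` function stands in for `buildAndRunModel`:

```r
library(parallel)

# Objects the worker function needs must be exported explicitly.
chainLength = 1000

# Hypothetical stand-in for buildAndRunModel(): one "chain" per parameter list.
runChain = function(params)
{
    set.seed(params[["seedVal"]])
    draws = rnorm(chainLength)   # chainLength is found via clusterExport()
    c(chainNumber = params[["chainNumber"]], mean = mean(draws))
}

toyParams = list(list(seedVal = 133,   chainNumber = 4),
                 list(seedVal = 1224,  chainNumber = 5),
                 list(seedVal = 12325, chainNumber = 6))

cl = makeCluster(2)
clusterExport(cl, "chainLength")
results = parLapply(cl, toyParams, runChain)
stopCluster(cl)

print(do.call(rbind, results))
```

Seeding inside the worker function, as `buildAndRunModel` does with `seedVal`, makes each chain reproducible regardless of which worker happens to run it.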

paramsList = list(list("estimateR0"=FALSE, "traceCompartments"=TRUE, "seedVal"=133,"chainNumber"=4),
                  list("estimateR0"=TRUE, "traceCompartments"=FALSE, "seedVal"=1224,"chainNumber"=5),
                  list("estimateR0"=FALSE,"traceCompartments"=FALSE, "seedVal"=12325,"chainNumber"=6))

buildAndRunModel = function(params)
{
  library(spatialSEIR)
  proposal = proposeParameters(params[["seedVal"]], params[["chainNumber"]])
  SEIRmodel =  buildSEIRModel(proposal$outFileName,
                              proposal$DataModel,
                              proposal$ExposureModel,
                              proposal$ReinfectionModel,
                              proposal$DistanceModel,
                              proposal$TransitionPriors,
                              proposal$InitContainer,
                              proposal$SamplingControl)

  SEIRmodel$setRandomSeed(params[["seedVal"]])

  # Do we need to keep track of compartment values for prediction? 
  # No sense doing this for all of the chains.
  if (params[["traceCompartments"]])
  {
    SEIRmodel$setTrace(0) #Guinea 
    SEIRmodel$setTrace(1) #Liberia
    SEIRmodel$setTrace(2) #Sierra Leone
    SEIRmodel$setTrace(3) #Nigeria
  }

  # Make a helper function to run each chain, as well as update the Metropolis
  # tuning parameters. 
  runSimulation = function(modelObject,
                           numBatches=500,
                           batchSize=20,
                           targetAcceptanceRatio=0.2,
                           tolerance=0.05,
                           proportionChange = 0.1
                          )
  {
      for (batch in 1:numBatches)
      {
          modelObject$simulate(batchSize)
          modelObject$updateSamplingParameters(targetAcceptanceRatio,
                                               tolerance,
                                               proportionChange)
      }
  }

  # Burn in tuning parameters
  runSimulation(SEIRmodel, numBatches = numBurnInBatches)
  SEIRmodel$compartmentSamplingMode = 16
  SEIRmodel$parameterSamplingMode = 7
  if (modelDF > 0)
  {
      SEIRmodel$useDecorrelation = 500
  }
  # Run Simulation
  cat(paste("Running chain ", params[["chainNumber"]], "\n", sep =""))

  tm = 0


  tm = tm + system.time(runSimulation(SEIRmodel,
            numBatches=numConvergenceBatches,
            batchSize=convergenceBatchSize,
            targetAcceptanceRatio=0.2,
            tolerance=0.025,
            proportionChange = 0.05))


  cat(paste("Time elapsed for ", proposal$outFileName," : ", round(tm[3]/60,3),
              " minutes\n", sep = ""))
  dat = read.csv(proposal$outFileName)

  ## Do we need to estimate R0 for this chain?
  if (params[["estimateR0"]])
  {
    R0 = array(0, dim = c(nrow(I_star), ncol(I_star), extraR0Iterations))
    effectiveR0 = array(0, dim = c(nrow(I_star), ncol(I_star), extraR0Iterations))
    empiricalR0 = array(0, dim = c(nrow(I_star), ncol(I_star), extraR0Iterations))
    for (i in 1:extraR0Iterations)
    {
        SEIRmodel$simulate(iterationStride)
        for (j in 0:(nrow(I_star)-1))
        {
            # The spatialSEIR methods take a 0-based time index; R arrays are 1-based
            R0[j+1,,i] = SEIRmodel$estimateR0(j)
            effectiveR0[j+1,,i] = SEIRmodel$estimateEffectiveR0(j)
            empiricalR0[j+1,,i] = apply(SEIRmodel$getIntegratedGenerationMatrix(j), 1, sum)
        }
    }

    R0Mean = apply(R0, 1:2, mean)
    R0LB = apply(R0, 1:2, quantile, probs = 0.05)
    R0UB = apply(R0, 1:2, quantile, probs = 0.95)
    effectiveR0Mean = apply(effectiveR0, 1:2, mean)
    effectiveR0LB = apply(effectiveR0, 1:2, quantile, probs = 0.05)
    effectiveR0UB = apply(effectiveR0, 1:2, quantile, probs = 0.95)
    empiricalR0Mean = apply(empiricalR0, 1:2, mean)
    empiricalR0LB = apply(empiricalR0, 1:2, quantile, probs = 0.05)
    empiricalR0UB = apply(empiricalR0, 1:2, quantile, probs = 0.95)
    orig.R0 = R0
    R0 = list("R0" = list("mean"=R0Mean, "LB" = R0LB, "UB" = R0UB),
              "effectiveR0" = list("mean"=effectiveR0Mean, "LB" = effectiveR0LB,
                                   "UB" = effectiveR0UB),
              "empiricalR0" = list("mean"=empiricalR0Mean, "LB" = empiricalR0LB,
                                   "UB" = empiricalR0UB))
  } else
  {
     R0 = NULL
     orig.R0 = NULL
  }

  return(list("chainOutput" = dat, "R0" = R0, "rawSamples" = orig.R0))
}


cl = makeCluster(3, outfile = "err.txt")
clusterExport(cl, c( "offsets",
                     "X",
                     "Z",
                     "I0",
                     "X_p_rs",
                     "priorAlpha_gammaEI",
                     "priorBeta_gammaEI",
                     "priorAlpha_gammaIR",
                     "priorBeta_gammaIR",
                     "priorAlpha_phi",
                     "priorBeta_phi",
                     "betaPriorPrecision",
                     "beta_p_rs",
                     "betaPrsPriorPrecision",
                     "N",
                     "dmList",
                     "iterationStride",
                     "proposeParameters",
                     "generateCompartmentProposal",
                     "numConvergenceBatches",
                     "spatialExplanations",
                     "convergenceBatchSize",
                     "extraR0Iterations",
                     "I_star",
                     "modelDF",
                     "numBurnInBatches"))


chains = parLapply(cl, paramsList, buildAndRunModel)
stopCluster(cl)

chain1 = chains[[1]]$chainOutput
chain2 = chains[[2]]$chainOutput
chain3 = chains[[3]]$chainOutput


plotChains = function(c1, c2, c3, main)
{
    idx = floor(length(c1)/2):length(c1)
    mcl = mcmc.list(as.mcmc(c1),
                    as.mcmc(c2),
                    as.mcmc(c3))
    g.d = gelman.diag(mcl)
    main = paste(main, "\n", "Gelman Convergence Diagnostic and UL: \n",
                 round(g.d[[1]][1],2), ", ", round(g.d[[1]][2],2))

    plot(chain1$Iteration[idx], c1[idx], type = "l", main = main,
         xlab = "Iteration", ylab = "value")
    lines(chain2$Iteration[idx],c2[idx], col = "red", lty=2)
    lines(chain3$Iteration[idx],c3[idx], col = "green", lty=3)
}


figure8 = function()
{
  par(mfrow = c(2,2))
  plotChains(chain1$BetaP_SE_0,
             chain2$BetaP_SE_0,
             chain3$BetaP_SE_0,
             "Guinea Exposure Intercept")
  plotChains(chain1$BetaP_SE_3,
             chain2$BetaP_SE_3,
             chain3$BetaP_SE_3,
             "Nigeria Exposure Intercept")
  plotChains(chain1$BetaP_SE_1,
             chain2$BetaP_SE_1,
             chain3$BetaP_SE_1,
             "Liberia Exposure Intercept")
  plotChains(chain1$BetaP_SE_2,
             chain2$BetaP_SE_2,
             chain3$BetaP_SE_2,
             "Sierra Leone Exposure Intercept")
}
figure9 = function()
{
  par(mfrow = c(2,1))
  plotChains(1-exp(-chain1$gamma_ei),
             1-exp(-chain2$gamma_ei),
             1-exp(-chain3$gamma_ei)
             , "E to I Transition Probability")
  plotChains(1-exp(-chain1$gamma_ir),
             1-exp(-chain2$gamma_ir),
             1-exp(-chain3$gamma_ir)
             , "I to R Transition Probability")
}

figure9_5 = function()
{
  #par(mfrow = c(length(dmList), 1))
  for (i in 1:length(dmList))
  {
      plotChains(chain1[[paste("rho_", i-1, sep = "")]],
                 chain2[[paste("rho_", i-1, sep = "")]],
                 chain3[[paste("rho_", i-1, sep = "")]],
                 spatialExplanations[i]
                 )
  }
}

## Parameter Estimates 

# Z is NA when no temporal basis is used, otherwise a matrix of covariates
nbeta = ncol(X) + if (is.matrix(Z)) ncol(Z) else 0
# Keep the second half of each chain: the regression parameters, the
# overdispersion precision, the spatial parameters and the two transition
# rates (nbeta + length(dmList) + 3 columns in total).
nKeep = nbeta + length(dmList) + 3
c1 = chain1[floor(nrow(chain1)/2):nrow(chain1), 1:nKeep]
c2 = chain2[floor(nrow(chain2)/2):nrow(chain2), 1:nKeep]
c3 = chain3[floor(nrow(chain3)/2):nrow(chain3), 1:nKeep]

c1$gamma_ei = 1-exp(-c1$gamma_ei)
c1$gamma_ir = 1-exp(-c1$gamma_ir)
c2$gamma_ei = 1-exp(-c2$gamma_ei)
c2$gamma_ir = 1-exp(-c2$gamma_ir)
c3$gamma_ei = 1-exp(-c3$gamma_ei)
c3$gamma_ir = 1-exp(-c3$gamma_ir)
# Time-component coefficients are only present when modelDF != 0
timeNames = if (modelDF != 0) paste("Time component ", 1:(nbeta - ncol(X)), sep = "") else character(0)
colnames(c1) = c("Guinea Intercept", "Liberia Intercept",
                 "Sierra Leone Intercept", "Nigeria Intercept",
                 timeNames,
                 "Overdispersion Precision",
                 paste("Spatial Dependence Parameter", 1:length(dmList)),
                 "E to I probability",
                 "I to R probability")
colnames(c2) = colnames(c1)
colnames(c3) = colnames(c1)

mcl = mcmc.list(as.mcmc(c1),
                as.mcmc(c2),
                as.mcmc(c3))
summary(mcl)

## 
## Iterations = 1:2015
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 2015 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                                     Mean       SD Naive SE Time-series SE
## Guinea Intercept               -2.85e+00 6.48e-02 8.34e-04       9.53e-04
## Liberia Intercept              -2.42e+00 2.89e-02 3.71e-04       8.48e-04
## Sierra Leone Intercept         -2.78e+00 6.20e-02 7.98e-04       9.61e-04
## Nigeria Intercept              -4.49e+00 9.51e-01 1.22e-02       1.22e-02
## Overdispersion Precision        1.60e-03 1.52e-03 1.96e-05       1.96e-05
## Spatial Dependence Parameter 1  1.84e-01 1.84e-02 2.36e-04       2.58e-04
## Spatial Dependence Parameter 2  1.51e-04 1.50e-04 1.92e-06       1.91e-06
## Spatial Dependence Parameter 3  1.44e-01 1.63e-02 2.09e-04       2.08e-04
## Spatial Dependence Parameter 4  2.36e-05 2.32e-05 2.98e-07       2.98e-07
## Spatial Dependence Parameter 5  3.46e-04 1.29e-04 1.66e-06       1.66e-06
## Spatial Dependence Parameter 6  1.00e+02 3.22e+01 4.14e-01       4.14e-01
## E to I probability              2.59e-01 9.31e-03 1.20e-04       9.82e-04
## I to R probability              7.99e-02 3.62e-03 4.65e-05       1.38e-04
## 
## 2. Quantiles for each variable:
## 
##                                     2.5%       25%       50%       75%
## Guinea Intercept               -2.98e+00 -2.89e+00 -2.85e+00 -2.80e+00
## Liberia Intercept              -2.47e+00 -2.44e+00 -2.42e+00 -2.40e+00
## Sierra Leone Intercept         -2.91e+00 -2.82e+00 -2.78e+00 -2.74e+00
## Nigeria Intercept              -6.79e+00 -5.01e+00 -4.34e+00 -3.81e+00
## Overdispersion Precision        4.31e-05  4.77e-04  1.16e-03  2.23e-03
## Spatial Dependence Parameter 1  1.50e-01  1.72e-01  1.84e-01  1.96e-01
## Spatial Dependence Parameter 2  4.44e-06  4.35e-05  1.04e-04  2.12e-04
## Spatial Dependence Parameter 3  1.12e-01  1.33e-01  1.44e-01  1.55e-01
## Spatial Dependence Parameter 4  6.12e-07  7.10e-06  1.65e-05  3.29e-05
## Spatial Dependence Parameter 5  1.09e-04  2.55e-04  3.39e-04  4.29e-04
## Spatial Dependence Parameter 6  4.76e+01  7.75e+01  9.68e+01  1.19e+02
## E to I probability              2.42e-01  2.53e-01  2.59e-01  2.65e-01
## I to R probability              7.31e-02  7.74e-02  7.99e-02  8.24e-02
##                                    97.5%
## Guinea Intercept               -2.72e+00
## Liberia Intercept              -2.36e+00
## Sierra Leone Intercept         -2.66e+00
## Nigeria Intercept              -3.06e+00
## Overdispersion Precision        5.57e-03
## Spatial Dependence Parameter 1  2.22e-01
## Spatial Dependence Parameter 2  5.49e-04
## Spatial Dependence Parameter 3  1.75e-01
## Spatial Dependence Parameter 4  8.67e-05
## Spatial Dependence Parameter 5  6.17e-04
## Spatial Dependence Parameter 6  1.71e+02
## E to I probability              2.78e-01
## I to R probability              8.73e-02

## Basic reproductive number (R0) estimates

R0_list = chains[[2]]$R0




figure10 = function(R0_list, type)
{
  r0.ylim = c(min(R0_list$LB), max(R0_list$UB))
  par(mfrow = c(2,2))
  plotR0 = function(main, idx)
  {
    plot(rptDate[1:(length(rptDate)-1)], R0_list$mean[1:(length(rptDate)-1),idx] , type = "l", xlab = "Date",
         ylab = expression('R'[0]),
         main = main,
         ylim = r0.ylim, lwd = 2)
    lines(rptDate[1:(length(rptDate)-1)], R0_list$LB[1:(length(rptDate)-1),idx], lty = 2)
    lines(rptDate[1:(length(rptDate)-1)], R0_list$UB[1:(length(rptDate)-1),idx], lty = 2)
    abline(h=seq(0, 50, 0.5), lty=2, col="lightgrey")
    abline(h = 1.0, col = "blue", lwd = 1.5, lty = 2)
  }
  plotR0(paste("Guinea ",type, "R0",sep=""), 1)
  plotR0(paste("Liberia ", type, "R0",sep=""),2)
  plotR0(paste("Sierra Leone ", type,"R0",sep=""), 3)
  plotR0(paste("Nigeria ",type,"R0",sep=""), 4)
}

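# Posterior mean and 90% credible interval of the infectious counts for
# Guinea, Liberia, Sierra Leone and Nigeria (locations 0-3)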

getMeanAndCI = function(loc,tpt,baseStr="I_")
{
    vec = chain1[[paste(baseStr, loc, "_", tpt, sep = "")]]
    vec = vec[floor(length(vec)/2):length(vec)]
    return(c(mean(vec), quantile(vec, probs = c(0.05, 0.95))))
}

Guinea.I.Est = sapply(0:(nrow(I_star)- 1), getMeanAndCI, loc=0)
Liberia.I.Est = sapply(0:(nrow(I_star)- 1), getMeanAndCI, loc=1)
SierraLeone.I.Est = sapply(0:(nrow(I_star)- 1), getMeanAndCI, loc=2)
Nigeria.I.Est = sapply(0:(nrow(I_star)- 1), getMeanAndCI, loc=3)

# Declare prediction functions
predictEpidemic = function(beta.pred,
                           X.pred,
                           gamma.ei,
                           gamma.ir,
                           S0,
                           E0,
                           I0,
                           R0,
                           rho,
                           offsets.pred)
{
    N = (S0+E0+I0+R0)
    p_se_components = matrix(exp(X.pred %*% beta.pred), ncol=length(S0))
    p_se = matrix(0, ncol = length(S0), nrow = nrow(p_se_components))
    p_ei = 1-exp(-gamma.ei*offsets.pred)
    p_ir = 1-exp(-gamma.ir*offsets.pred)
    S_star = matrix(0, ncol=length(S0),nrow = nrow(p_se_components))
    E_star = matrix(0, ncol=length(S0),nrow = nrow(p_se_components))
    I_star = matrix(0, ncol=length(S0),nrow = nrow(p_se_components))
    R_star = matrix(0, ncol=length(S0),nrow = nrow(p_se_components))
    S = matrix(0, ncol=length(S0),nrow = nrow(p_se_components))
    E = matrix(0, ncol=length(S0),nrow = nrow(p_se_components))
    I = matrix(0, ncol=length(S0),nrow = nrow(p_se_components))
    R = matrix(0, ncol=length(S0),nrow = nrow(p_se_components))
    S[1,] = S0
    E[1,] = E0
    I[1,] = I0
    R[1,] = R0
    S_star[1,] = rbinom(rep(1, length(S0)), R0, 0)
    p_se[1,] = I[1,]/N*p_se_components[1,]
    for (i in 1:length(dmList))
    {
      p_se[1,] = p_se[1,] + rho[i]*(dmList[[i]] %*% (I[1,]/N*p_se_components[1,]))
    }
    p_se[1,] = 1-exp(-offsets.pred[1]*(p_se[1,]))

    E_star[1,] = rbinom(rep(1, length(S0)), S0, p_se[1,])
    I_star[1,] = rbinom(rep(1, length(S0)), E0, p_ei[1])
    R_star[1,] = rbinom(rep(1, length(S0)), I0, p_ir[1])

    for (i in 2:nrow(S))
    {

      S[i,] = S[i-1,] + S_star[i-1,] - E_star[i-1,]
      E[i,] = E[i-1,] + E_star[i-1,] - I_star[i-1,]
      I[i,] = I[i-1,] + I_star[i-1,] - R_star[i-1,]
      R[i,] = R[i-1,] + R_star[i-1,] - S_star[i-1,]

      p_se[i,] = I[i,]/N*p_se_components[i,]
      for (j in 1:length(dmList))
      {
        p_se[i,] = p_se[i,] + rho[j]*(dmList[[j]] %*% (I[i,]/N*p_se_components[i,]))
      }
      p_se[i,] = 1-exp(-offsets.pred[i]*(p_se[i,]))


      S_star[i,] = rbinom(rep(1, length(S0)), R[i,], 0)
      E_star[i,] = rbinom(rep(1, length(S0)), S[i,], p_se[i,])
      I_star[i,] = rbinom(rep(1, length(S0)), E[i,], p_ei[i])
      R_star[i,] = rbinom(rep(1, length(S0)), I[i,], p_ir[i])
    }
    return(list(S=S,E=E,I=I,R=R,
                S_star=S_star,E_star=E_star,
                I_star=I_star,R_star=R_star,
                p_se=p_se,p_ei=p_ei,p_ir=p_ir))
}


predict.i = function(i)
{
  # Simulate one forward trajectory from the i-th posterior draw in chain1
  dataRow = chain1[i,]

  # Spatial dependence parameters rho_0, rho_1, ...
  rho = sapply(seq_along(dmList),
               function(j) dataRow[[paste("rho_", j-1, sep = "")]])

  # Intensity coefficients BetaP_SE_0, ...: one per column of X plus the
  # modelDF time components (the old loop hard-coded ncol(X) = 4)
  nBeta = modelDF + ncol(X)
  beta = sapply(seq_len(nBeta),
                function(j) dataRow[[paste("BetaP_SE_", j-1, sep = "")]])

  # Compartment sizes at the last observed time point (index maxIdx-1)
  # for locations 0-3 (Guinea, Liberia, Sierra Leone, Nigeria)
  lastState = function(comp)
  {
    sapply(0:3, function(loc)
      dataRow[[paste(comp, "_", loc, "_", maxIdx-1, sep = "")]])
  }
  S0 = lastState("S")
  E0 = lastState("E")
  I0 = lastState("I")
  R0 = lastState("R")

  return(predictEpidemic(beta,
                         X.pred,
                         dataRow$gamma_ei,
                         dataRow$gamma_ir,
                         S0,
                         E0,
                         I0,
                         R0,
                         rho,
                         offset.pred
                         ))
}

preds = lapply((nrow(chain1) - floor(nrow(chain1)/2)):
                  nrow(chain1), predict.i)


pred.dates = c(rptDate[(which.max(rptDate))],
               rptDate[(which.max(rptDate))] + seq(2,pred.days-2,2))
pred.xlim = c(min(rptDate), max(pred.dates))
lastIdx = nrow(I_star)
# Stack the simulated infectious counts: one row per posterior draw,
# one column per predicted time point
Guinea.Pred = t(sapply(preds, function(p) p$I[,1]))
Liberia.Pred = t(sapply(preds, function(p) p$I[,2]))
SierraLeone.Pred = t(sapply(preds, function(p) p$I[,3]))
Nigeria.Pred = t(sapply(preds, function(p) p$I[,4]))

breakpoint = mean(c(max(rptDate), min(pred.dates)))

Guinea.mean = apply(Guinea.Pred, 2, mean)
Liberia.mean = apply(Liberia.Pred, 2, mean)
SierraLeone.mean = apply(SierraLeone.Pred, 2, mean)
Nigeria.mean = apply(Nigeria.Pred, 2, mean)

Guinea.LB = apply(Guinea.Pred, 2, quantile, probs = c(0.05))
Guinea.UB = apply(Guinea.Pred, 2, quantile, probs = c(0.95))

Liberia.LB = apply(Liberia.Pred, 2, quantile, probs = c(0.05))
Liberia.UB = apply(Liberia.Pred, 2, quantile, probs = c(0.95))

SierraLeone.LB = apply(SierraLeone.Pred, 2, quantile, probs = c(0.05))
SierraLeone.UB = apply(SierraLeone.Pred, 2, quantile, probs = c(0.95))

Nigeria.LB = apply(Nigeria.Pred, 2, quantile, probs = c(0.05))
Nigeria.UB = apply(Nigeria.Pred, 2, quantile, probs = c(0.95))

maxI = max(c(max(c(Guinea.I.Est, Liberia.I.Est, SierraLeone.I.Est, Nigeria.I.Est)), Guinea.UB, Liberia.UB, SierraLeone.UB, Nigeria.UB))

est.idx = seq(1, length(Guinea.I.Est[1,]), 2)
pred.table1 = cbind(Guinea.I.Est[1,],
                    Liberia.I.Est[1,],
                    SierraLeone.I.Est[1,],
                    Nigeria.I.Est[1,]
                    )[est.idx,]
pred.table2 = cbind(Guinea.mean,
                    Liberia.mean,
                    SierraLeone.mean,
                    Nigeria.mean)
pred.table = rbind(pred.table1, pred.table2)
rownames(pred.table) = paste("&nbsp;", c(as.character(rptDate)[est.idx], as.character(pred.dates)),
                                          sep = "")
rownames(pred.table) = paste(rownames(pred.table), "&nbsp;", sep = "")
colnames(pred.table) = c("&nbsp;&nbsp;&nbsp;Guinea &nbsp;&nbsp;&nbsp;",
                         "&nbsp;&nbsp;&nbsp; Liberia &nbsp;&nbsp;&nbsp;",
                         "&nbsp;&nbsp;&nbsp; Sierra Leone &nbsp;&nbsp;&nbsp;",
                         "&nbsp;&nbsp;&nbsp; Nigeria &nbsp;&nbsp;&nbsp;")


figure11 = function()
{
  # One panel per country: estimated infectious counts (posterior mean, thick
  # line; mean and 90% credible interval, dashed), followed by the mean
  # (solid) and 90% interval (dashed) of the predicted counts.
  est  = list(Guinea.I.Est, Liberia.I.Est, SierraLeone.I.Est, Nigeria.I.Est)
  pm   = list(Guinea.mean, Liberia.mean, SierraLeone.mean, Nigeria.mean)
  lb   = list(Guinea.LB, Liberia.LB, SierraLeone.LB, Nigeria.LB)
  ub   = list(Guinea.UB, Liberia.UB, SierraLeone.UB, Nigeria.UB)
  nms  = c("Guinea", "Liberia", "Sierra Leone", "Nigeria")
  cols = c("black", "blue", "red", "green")

  par(mfrow = c(2,2))
  for (k in seq_along(nms))
  {
    plot(rptDate, est[[k]][1,], ylim = c(0, maxI), xlim = pred.xlim,
         main = paste(nms[k], "Estimated Epidemic Size\n 90% Credible Interval"),
         type = "l", lwd = 2, col = cols[k], ylab = "Infectious Count",
         xlab = "Date")
    abline(h = seq(0,1000000,5000), lty = 2, col = "lightgrey")
    for (r in 1:3)
    {
      lines(rptDate, est[[k]][r,], lty = 2, col = cols[k])
    }
    lines(pred.dates, pm[[k]], lty = 1, col = cols[k], lwd = 1)
    lines(pred.dates, lb[[k]], lty = 2, col = cols[k], lwd = 1)
    lines(pred.dates, ub[[k]], lty = 2, col = cols[k], lwd = 1)
    abline(v = breakpoint, lty = 3, col = "lightgrey")
  }
}


figure11_5 = function()
{
  # Same panels as figure11 on a log scale; 1 is added to every count so
  # that zero counts can be drawn.
  est  = list(Guinea.I.Est, Liberia.I.Est, SierraLeone.I.Est, Nigeria.I.Est)
  pm   = list(Guinea.mean, Liberia.mean, SierraLeone.mean, Nigeria.mean)
  lb   = list(Guinea.LB, Liberia.LB, SierraLeone.LB, Nigeria.LB)
  ub   = list(Guinea.UB, Liberia.UB, SierraLeone.UB, Nigeria.UB)
  nms  = c("Guinea", "Liberia", "Sierra Leone", "Nigeria")
  cols = c("black", "blue", "red", "green")

  par(mfrow = c(2,2))
  for (k in seq_along(nms))
  {
    plot(rptDate, est[[k]][1,] + 1, ylim = c(1, maxI), xlim = pred.xlim,
         main = paste(nms[k], "Estimated Epidemic Size\n 90% Credible Interval (log scale)"),
         type = "l", lwd = 2, col = cols[k], ylab = "Infectious Count",
         xlab = "Date", log = "y")
    abline(h = 10^seq(0,100), lty = 2, col = "lightgrey")
    for (r in 1:3)
    {
      lines(rptDate, est[[k]][r,] + 1, lty = 2, col = cols[k])
    }
    lines(pred.dates, pm[[k]] + 1, lty = 1, col = cols[k], lwd = 1)
    lines(pred.dates, lb[[k]] + 1, lty = 2, col = cols[k], lwd = 1)
    lines(pred.dates, ub[[k]] + 1, lty = 2, col = cols[k], lwd = 1)
    abline(v = breakpoint, lty = 3, col = "lightgrey")
  }
}

Convergence

As this is a Bayesian analysis in which the posterior distribution is sampled using MCMC, we need evidence that the samplers have converged to the posterior distribution before drawing any inferences about the problem at hand. In the code below, we read in the MCMC output files created so far, plot the three chains for each of several important parameters, and examine the Gelman and Rubin convergence diagnostic, which should be close to 1 if the chains have converged.
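For readers unfamiliar with the diagnostic, here is a minimal sketch of the basic Gelman and Rubin potential scale reduction factor for a single parameter, written in base R. It is a simplified illustration, not the exact computation in coda's `gelman.diag`, which applies additional degrees-of-freedom corrections; the function name `psrf` and the simulated chains are hypothetical.

```r
# Sketch: basic (uncorrected) Gelman-Rubin potential scale reduction factor.
# draws: matrix with one row per iteration and one column per chain.
psrf = function(draws)
{
  n = nrow(draws)                  # iterations per chain
  W = mean(apply(draws, 2, var))   # mean within-chain variance
  B = n * var(colMeans(draws))     # between-chain variance
  varHat = (n - 1) / n * W + B / n # pooled estimate of the posterior variance
  sqrt(varHat / W)
}

set.seed(1)
# Three chains sampling the same target: PSRF should be close to 1
mixed = matrix(rnorm(3000), ncol = 3)
# One chain stuck in a different region: PSRF well above 1
stuck = cbind(rnorm(1000), rnorm(1000), rnorm(1000, mean = 3))
psrf(mixed)
psrf(stuck)
```

In this analysis the corresponding check would be `gelman.diag(mcl)` from the coda package, applied to the `mcmc.list` of the three chains; values noticeably above 1 (1.1 is a commonly used threshold) suggest the chains have not yet mixed.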

Basic Reproductive Number Calculation

A common tool for describing the evolution of an epidemic is a quantity known as the basic reproductive number, the basic reproductive ratio, or one of several other variants on that theme. The basic idea is to quantify how many secondary infections a single infectious individual is expected to cause in a large, fully susceptible population. When this number exceeds one, we expect the epidemic to spread; when it is less than one, the pathogen is more likely to die out.

In truth, there is a lot more to "basic reproductive number" calculation, especially given that different authors use different names for the same quantity, and the same name for different quantities. Here we present three versions. First, we introduce a fairly standard "time varying basic reproductive number", which is based on the intensity process parameter estimates over time and is described in Lekone and Finkenstadt (2006). That work also describes the "effective" reproductive number, which is the same quantity scaled by the fraction of the population that remains susceptible (almost identical to the basic version here, because only a small fraction of the population has been infected). Finally, we introduce our own, tentatively titled "empirical reproductive number". This measure follows the actual transmission probabilities through time and computes the average number of secondary cases produced by an infectious individual in a particular location at a particular time point. While still influenced by the chosen parametric form, this alternative approach may more closely reflect population dynamics.
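The relationship between the basic and effective versions can be sketched for a single location. The sketch assumes, in the spirit of Lekone and Finkenstadt (2006), that each infectious individual exposes each susceptible at a per-day rate of `exp(eta)/N` (eta being the linear predictor of the intensity) and leaves the infectious class at rate `gammaIR` per day, so the mean infectious period is `1/gammaIR` days. It ignores the spatial dependence terms, and the function name and input values are hypothetical; it is not the code used to produce the figures in this document.

```r
# Sketch: time-varying basic and effective reproductive numbers for one
# location, ignoring spatial coupling (an assumption).
#   eta:     log intensity at each report time (linear predictor)
#   gammaIR: I -> R rate per day; mean infectious period is 1/gammaIR days
#   S, N:    susceptible count and population size at each report time
reproductiveNumbers = function(eta, gammaIR, S, N)
{
  R0 = exp(eta) / gammaIR  # expected secondary cases, fully susceptible
  Reff = R0 * S / N        # scaled by the susceptible fraction
  data.frame(R0 = R0, Reff = Reff)
}

# Hypothetical inputs, chosen only for illustration
out = reproductiveNumbers(eta = log(c(0.1, 0.2)), gammaIR = 0.1,
                          S = c(1e6, 5e5), N = 1e6)
out
```

Here the first time point gives R0 = 1, and at the second time point R0 = 2 but only half the population is susceptible, so the effective number is again 1.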

While the basic reproductive number is a useful quantity to know, it does not directly make any predictions about future epidemic behavior. In order to do that, we need to simulate epidemics based on the MCMC samples we have obtained and summarize their variability over time.
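The mechanics of that simulation can be illustrated with a stripped-down version of the `predictEpidemic` function above: a single location, a constant intensity, no spatial coupling, and daily steps. All parameter values below are made up for illustration, and `simulateSEIR` is a hypothetical name; in the actual analysis each trajectory starts from a different posterior draw rather than from fixed values.

```r
# Sketch: single-location stochastic chain-binomial SEIR simulation with
# assumed, fixed parameters. Returns the infectious count at each step.
simulateSEIR = function(nSteps, S0, E0, I0, Rinit,
                        intensity, gammaEI, gammaIR, dt = 1)
{
  N = S0 + E0 + I0 + Rinit
  S = E = I = R = numeric(nSteps)
  S[1] = S0; E[1] = E0; I[1] = I0; R[1] = Rinit
  pEI = 1 - exp(-gammaEI * dt)   # probability of E -> I within one step
  pIR = 1 - exp(-gammaIR * dt)   # probability of I -> R within one step
  for (t in 2:nSteps)
  {
    # Exposure probability depends on the current infectious fraction
    pSE = 1 - exp(-dt * intensity * I[t-1] / N)
    newE = rbinom(1, S[t-1], pSE)
    newI = rbinom(1, E[t-1], pEI)
    newR = rbinom(1, I[t-1], pIR)
    S[t] = S[t-1] - newE
    E[t] = E[t-1] + newE - newI
    I[t] = I[t-1] + newI - newR
    R[t] = R[t-1] + newR
  }
  I
}

set.seed(42)
# 200 simulated epidemics, one per column
sims = replicate(200, simulateSEIR(60, S0 = 1e5, E0 = 10, I0 = 10, Rinit = 0,
                                   intensity = 0.3, gammaEI = 0.3, gammaIR = 0.1))
# Pointwise mean and 90% interval across simulations
I.mean = rowMeans(sims)
I.band = apply(sims, 1, quantile, probs = c(0.05, 0.95))
```

Summarizing pointwise quantiles across trajectories, as here, mirrors how the prediction intervals in this document are computed from the simulated epidemics.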

Epidemic Prediction

Below, we attempt to predict the course of the epidemic over the coming weeks. We must be cautious when making predictions about a chaotic process this far into the future. The intensity function that drives the exposure process is estimated from the case data alone, rather than from any external information such as interventions or behavioral change. While this may be adequate for estimation, it may or may not provide good prediction performance.

It is sometimes helpful to visualize exponential growth on the log scale.
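On the log scale, exponential growth appears as a straight line whose slope is the growth rate. As a rough illustration, the sketch below fits a log-linear trend to the predicted mean infectious counts for Liberia from the table below (2014-10-13 through 2014-10-23, two days apart) and converts the slope into a doubling time. This summary is not part of the original analysis.

```r
# Predicted mean infectious counts for Liberia, 2014-10-13 to 2014-10-23
days = seq(0, 10, by = 2)
liberia = c(2461, 3352, 4735, 6955, 10552, 16565)

# Under exponential growth, log(count) is linear in time: the slope r is
# the per-day growth rate and the doubling time is log(2) / r
fit = lm(log(liberia) ~ days)
r = unname(coef(fit)["days"])
doublingTime = log(2) / r
doublingTime
```

The two-day growth ratios in these predictions increase from about 1.36 to 1.57, so a single constant rate is only an approximate summary.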

It can also be helpful to take a look at the data in tabular form.

Estimated and Predicted Number of Infectious Individuals

	Guinea	Liberia	Sierra Leone	Nigeria
2014-03-27	5	1	0	0
2014-04-14	50	10	0	0
2014-04-30	49	4	0	0
2014-05-18	45	4	0	0
2014-05-29	57	3	16	0
2014-06-15	54	3	68	0
2014-06-30	39	20	105	0
2014-07-12	17	88	166	0
2014-07-23	15	141	174	0
2014-08-06	59	305	224	4
2014-08-20	78	479	257	8
2014-09-07	224	784	374	4
2014-09-21	173	1050	477	1
2014-09-25	207	1026	536	1
2014-09-27	224	1031	601	1
2014-09-29	243	1027	674	1
2014-10-01	254	1001	707	2
2014-10-03	273	1023	722	4
2014-10-05	316	1110	778	5
2014-10-07	379	1263	882	7
2014-10-09	474	1507	1050	10
2014-10-11	614	1888	1312	13
2014-10-13	827	2461	1712	17
2014-10-15	1153	3352	2331	24
2014-10-17	1665	4735	3300	34
2014-10-19	2484	6955	4853	51
2014-10-21	3831	10552	7383	78
2014-10-23	6089	16565	11618	122

Such data can also be visualized in map form, though the recent surge in predicted cases has the effect of swamping the earlier dynamics:

Total Infection Size - Estimated and Predicted:


Conclusions

This epidemic is evolving extremely rapidly. As of 8/12, it looked like the situation in Liberia was set to continue worsening, and that Guinea was at risk of the same (though not to nearly the same degree). On the other hand, the epidemic in Sierra Leone appeared to be leveling off (though not disappearing). As of 8/28, these looked like reasonable predictions; however, the models appear to have resumed predicting a fairly catastrophic continued spread, especially in Liberia. In particular, the models predict that the epidemic will take off in Nigeria, as the countries are assumed in this case to share several intensity parameters. We may hope that this particular simplifying assumption is invalid; however, it is not a hopeful sign that WHO predictions are also becoming catastrophic. These models cannot anticipate public health interventions or sudden changes in governmental policy and individual behavior, but recent news from the region gives little reason to hope for a swift end to the epidemic. It is more important now than ever to support the efforts of involved governmental and non-governmental organizations like the WHO and MSF.

That wraps up the analyses for now. This document will continue to be updated as the epidemic progresses, reflecting new data and perhaps additional analysis techniques. As the document is tracked via source control, it will be easy to see how well past predictions held up and how they change in response to new information. Questions and comments can be shared here.

Estimating and Predicting Epidemic Behavior for the 2014 West African Ebola Outbreak

A Quick Stochastic Spatial SEIR Modeling Approach

Grant Brown

Jacob Oleson

Last Updated: 2014-10-02
