cross-validation and out-of-sample mean squared error #206

Open
phelps-sg opened this issue May 10, 2020 · 14 comments

@phelps-sg
Contributor

Related to #194, I could not find any description of how the model output is validated against out-of-sample data in the model overview or any of the cited papers (apologies if I missed it). Do we need some code to perform e.g. k-fold cross-validation using the latest time-series data on numbers of infected etc.? It would be good to see some out-of-sample MSE calculations to check for over-fitting or model bias.

@bbolker

bbolker commented May 10, 2020

This is a fine idea, but if you're not familiar with time series modeling (sorry if you are and I'm telling you things you already know) you should note that cross-validating time series fits can be tricky, e.g. see here. Naive cross-validation doesn't in general work; instead you need to set up some way to do 'n-step-ahead' prediction (1-step-ahead is often overoptimistic) on unused data and compare the fitted and observed values.
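To make the n-step-ahead idea concrete, here is a minimal sketch (not tied to this codebase) of a rolling-origin evaluation: the model is repeatedly refit on an expanding window of observations and scored only on the points immediately following that window. The `fit_and_forecast` callable is a hypothetical stand-in for whatever calibrate-and-simulate step the model actually uses.

```python
import numpy as np

def rolling_origin_mse(series, fit_and_forecast, min_train=30, horizon=7):
    """Out-of-sample MSE of n-step-ahead forecasts from an expanding training window.

    fit_and_forecast(train, horizon) is a hypothetical callable that calibrates
    the model to `train` and returns `horizon` forecast values.
    """
    series = np.asarray(series, dtype=float)
    errors = []
    for origin in range(min_train, len(series) - horizon + 1):
        train = series[:origin]                      # everything up to the forecast origin
        future = series[origin:origin + horizon]     # held-out observations
        forecast = np.asarray(fit_and_forecast(train, horizon))
        errors.append(np.mean((forecast - future) ** 2))
    return np.array(errors)                          # one out-of-sample MSE per forecast origin
```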

@phelps-sg
Contributor Author

Yes, I have some familiarity with time-series modelling, but I am not very familiar with this particular model, particularly how it is calibrated. I think the appropriate cross-validation methodology will depend on which variables in the model we consider as predictors, which as responses, and whether the model is auto-regressive (I am not sure I understand #194, but it seems to suggest the use of an auto-regressive or recurrent approach). Also, if we include spatially-distributed data in the validation, then we should also consider spatial auto-correlation.

I think the most important starting point might be to get at least some out-of-sample MSE calculations, so just calculating the MSE for a single block of held-back validation data would be a good start. Does anybody know if this has been done already?
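For that single held-back block, something along the lines of the sketch below would do. Here `calibrate_and_simulate` is a hypothetical stand-in for the real calibration and model run (it simply carries the last observed value forward so the snippet is self-contained), and the CSV path is a placeholder for whatever daily series we validate against.

```python
import numpy as np

def calibrate_and_simulate(train, n_days):
    # Hypothetical placeholder for calibrating the simulator to `train` and
    # running it forward `n_days`; here it simply carries the last value forward.
    return np.full(n_days, train[-1])

observed_deaths = np.loadtxt("observed_deaths.csv")  # placeholder path; one value per day, oldest first
cutoff = int(0.8 * len(observed_deaths))             # hold back the final 20% of days

train, validation = observed_deaths[:cutoff], observed_deaths[cutoff:]
predicted = calibrate_and_simulate(train, n_days=len(validation))

out_of_sample_mse = np.mean((predicted - validation) ** 2)
print(f"out-of-sample MSE over {len(validation)} held-back days: {out_of_sample_mse:.2f}")
```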

@insidedctm
Contributor

The cited papers are in the top-level README. The code is for a mathematical model that simulates the evolution of a pandemic. It is not a statistical or machine learning model although some of the input parameters will have been derived from statistical modelling.

@bbolker

bbolker commented May 10, 2020

+100 to @insidedctm's comment. That said, it would be interesting and valuable to formalize the calibration procedure and quantify how reliable it is (for example, first by running the calibration procedure as described in the various papers on data simulated from the model itself, then by running it on subsets of the time-series data and quantifying forecast accuracies).
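A rough sketch of the first step suggested above, assuming hypothetical wrappers `simulate_epidemic(params, n_days, rng)` around the model and `calibrate(observed)` around the fitting procedure (neither exists in the repo under these names): simulate from known parameters, recalibrate, and summarise how well the known values are recovered.

```python
import numpy as np

def check_parameter_recovery(simulate_epidemic, calibrate, true_params,
                             n_reps=100, n_days=120, seed=0):
    """Simulate epidemics with known parameters, recalibrate, and summarise recovery.

    simulate_epidemic(params, n_days, rng) and calibrate(observed) are hypothetical
    wrappers around the actual model run and calibration procedure.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_reps):
        observed = simulate_epidemic(true_params, n_days, rng)   # synthetic "data"
        estimates.append(calibrate(observed))                    # re-run the calibration on it
    estimates = np.asarray(estimates)
    bias = estimates.mean(axis=0) - np.asarray(true_params)      # systematic error per parameter
    spread = estimates.std(axis=0)                               # variability per parameter
    return bias, spread
```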

@phelps-sg
Contributor Author

phelps-sg commented May 10, 2020

In reply to @insidedctm, I do not understand what you mean. Any mathematical model (simulation or otherwise) that takes input parameters (predictors) and has corresponding dependent variables (responses) which represent things in the real world is, by definition, a statistical model; you provide some inputs to the model, and you get some corresponding quantitative outputs. These quantitative outputs correspond to measurable things in the real world and can be compared against actual data. In common with many other statistical models, your dependent variables are stochastic. Just because the mapping between predictors and responses is implemented in C++ instead of an equation, and just because it is an individual-based model, does not mean that it no longer meets the definition of a statistical model.

Moreover, as I understand it, the simulation is being used to make quantitative predictions about what will happen to e.g. infection rates if certain parameters are changed. If that is the case, then it is vitally important to validate the model by comparing its predictions with reality. Or are you claiming that this model should not be used to make quantitative predictions - is this what you mean by "it is not a statistical model"?

There is plenty of literature on calibration, estimation and validation of agent-based models ("agent-based model" is another term for "individual-based simulation model"). I don't see any reason why this ABM in particular can't or shouldn't be validated against empirical data.

@insidedctm
Contributor

No, you are quite right, and that was slightly sloppy language. However, I took from your use of terms like k-fold cross-validation that you believed this to be some sort of statistical regression model; there is no data to hold out, so cross-validation isn't applicable. Something along the lines of @bbolker's suggestion would be interesting.

@phelps-sg
Contributor Author

Ok, I'm confused. As I understand it, the model is initially calibrated so that "it reproduced the observed cumulative number of deaths in GB or the US seen by 14th March 2020". So we use some training data (cumulative deaths D_{t_0} at t_0 = 14/3/20), and then we can predict the number of deaths \hat{D}_{t_1} at some later date t_1 by running the model? We can then validate the model by computing (\hat{D}_{t_1} - D_{t_1})^2. I would bet that if you carry on running the model forward in time, the squared errors will start to increase a lot at later dates. Therefore it might make sense to regularly recalibrate the model at later points in time as per #194. If such a methodology is followed, it will be necessary to think carefully about how to divide the data D_t into training and validation sets.
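A short sketch of that check, with a hypothetical `run_model_from` wrapper that returns predicted cumulative deaths for the days after the calibration date; plotting the returned squared errors against horizon would show how quickly the fit degrades and how often recalibration is needed.

```python
import numpy as np

def squared_error_by_horizon(D, t0, run_model_from, max_horizon=28):
    """Squared forecast error (D_hat_t - D_t)^2 at each horizon after calibration.

    D is observed cumulative deaths by day, t0 the index of the calibration
    date (e.g. 14 March 2020), and run_model_from(D[:t0 + 1], h) a hypothetical
    wrapper returning predicted cumulative deaths for the next h days.
    """
    D = np.asarray(D, dtype=float)
    D_hat = np.asarray(run_model_from(D[:t0 + 1], max_horizon))
    horizons = np.arange(1, max_horizon + 1)
    sq_err = (D_hat - D[t0 + 1:t0 + 1 + max_horizon]) ** 2
    return horizons, sq_err
```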

@bbolker

bbolker commented May 10, 2020

Yes, exactly. That's the point of the link I posted above. The model is almost certainly not "autoregressive" in the sense that would make naive splits into training and validation subsets suitable. See here for some starting points in thinking about validating/calibrating epidemic models during the course of an epidemic.

@phelps-sg
Contributor Author

phelps-sg commented May 10, 2020

The Funk et al. paper looks like an excellent basis for a validation methodology. I also like their suggestion to compare out-of-sample errors against those obtained from a simpler null model.
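One way that comparison could be operationalised (a sketch under my own assumptions, not something from the repository or the paper): fit a deliberately simple null model, here constant exponential growth over the last week of the training window, and report the model's out-of-sample MSE as a ratio of the null model's.

```python
import numpy as np

def exponential_growth_null(train, horizon, window=7):
    """Null model: continue the average daily growth factor of the last `window` days."""
    train = np.asarray(train, dtype=float)
    rate = (train[-1] / train[-window]) ** (1.0 / (window - 1))      # mean daily growth factor
    return train[-1] * rate ** np.arange(1, horizon + 1)

def relative_mse(observed_future, model_forecast, train):
    """MSE of the model forecast divided by MSE of the null model; < 1 beats the null."""
    observed_future = np.asarray(observed_future, dtype=float)
    horizon = len(observed_future)
    null_mse = np.mean((exponential_growth_null(train, horizon) - observed_future) ** 2)
    model_mse = np.mean((np.asarray(model_forecast) - observed_future) ** 2)
    return model_mse / null_mse
```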

@insidedctm
Contributor

> See here for some starting points in thinking about validating/calibrating epidemic models during the course of an epidemic.

Really interesting paper, thank you. One further complication that would need to be dealt with is how to validate forecasts when, days after the forecast is made, large-scale social changes are enacted. What you have with, for example, Report 9 is a whole variety of forecasts for different input parameters and different NPIs. At most one of them can be the valid forecast - how do you select which one?

@bbolker

bbolker commented May 11, 2020

You have to distinguish between scenarios (expected outcome if ...) and forecasts. You raise some important issues, but these are all standard (and basically intractable) problems in epidemic forecasting. The best you can ever do is show that the machine would work if you correctly specified the inputs ...

@phelps-sg
Contributor Author

phelps-sg commented May 11, 2020

@bbolker, regarding the auto-regressive form or otherwise of the model, is it not the case that the model is being used to predict deaths D_t as a function of previous deaths D_{t-i}, and could do so on a rolling basis, e.g.
D_t = f(D_{t-i}) + \epsilon ?

I know that in this model f() is non-linear, so strictly speaking it is not an autoregressive model, but if the model has this form, would time-series cross-validation still be a valid approach?

@bbolker

bbolker commented May 11, 2020

It's a nice idea, but I doubt it's that simple. Here's another paper on testing calibration of epidemic models (albeit in a much simpler framework); most of the emphasis is on calibration of parameter estimates rather than of forecasts, but the same ideas apply. With actual data I don't think you're going to be able to do better than calibrating to progressively longer subsets and evaluating the accuracy of n-step-ahead forecasts for each subset. For more general purposes you can simulate as many epidemics from the model as you like and see how well the calibration procedure works for the simulated dynamics.
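Tying those two suggestions together, a sketch (again with hypothetical `simulate_epidemic` and `fit_and_forecast` wrappers) of scoring n-step-ahead forecasts across many epidemics simulated from the model itself, calibrating to progressively longer subsets of each simulated series:

```python
import numpy as np

def forecast_accuracy_on_simulations(simulate_epidemic, fit_and_forecast, true_params,
                                     n_epidemics=50, n_days=120,
                                     min_train=30, horizon=7, step=7, seed=1):
    """Mean and spread of n-step-ahead forecast MSE over epidemics simulated from the model.

    simulate_epidemic(params, n_days, rng) and fit_and_forecast(train, horizon)
    are hypothetical wrappers around the model and its calibration procedure.
    """
    rng = np.random.default_rng(seed)
    per_epidemic_mse = []
    for _ in range(n_epidemics):
        series = np.asarray(simulate_epidemic(true_params, n_days, rng), dtype=float)
        errors = []
        for origin in range(min_train, n_days - horizon + 1, step):   # progressively longer subsets
            forecast = np.asarray(fit_and_forecast(series[:origin], horizon))
            errors.append(np.mean((forecast - series[origin:origin + horizon]) ** 2))
        per_epidemic_mse.append(np.mean(errors))
    return np.mean(per_epidemic_mse), np.std(per_epidemic_mse)
```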

@NeilFerguson
Collaborator

An interesting discussion. We have included simple calibration (to approximately match a defined cumulative number of deaths by a particular time) within the code, but more statistically rigorous calibration is arguably best done in two ways: (a) estimating input parameters (e.g. transmission rates in schools) from epidemiological data using simpler models more suited to inference; (b) using wrapper code to run thousands of model runs across a parameter grid (e.g. spanning R0 and intervention effectiveness) and then evaluating the fit of each model run against available data using a likelihood function. We are doing both. For the latter, we are fitting to UK hospitalizations, ICU demand and deaths, which is computationally intensive. For most purposes it is better to use simpler (e.g. age-structured SEIR compartmental) models for such things. We are doing it because there is a need to match current transmission patterns quite closely when looking at the logistical implications of exit strategies (e.g. track and trace). We will add the input files for those runs to the repo in coming weeks.
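For what it's worth, a stripped-down sketch of that grid-plus-likelihood idea, with everything model-specific hidden behind a hypothetical `run_model(r0, effectiveness)` wrapper that returns expected daily deaths over the observation period; the Poisson likelihood here is purely illustrative and not necessarily the likelihood actually being used.

```python
import itertools
import numpy as np
from scipy.stats import poisson

def grid_log_likelihoods(run_model, observed_deaths, r0_grid, effectiveness_grid):
    """Poisson log-likelihood of observed daily deaths at each point on a parameter grid.

    run_model(r0, effectiveness) is a hypothetical wrapper that runs the simulator
    and returns expected daily deaths aligned with observed_deaths.
    """
    results = []
    for r0, eff in itertools.product(r0_grid, effectiveness_grid):
        expected = np.maximum(run_model(r0, eff), 1e-9)          # guard against log(0)
        log_lik = poisson.logpmf(observed_deaths, expected).sum()
        results.append((r0, eff, log_lik))
    return sorted(results, key=lambda r: r[2], reverse=True)     # best-fitting parameter sets first
```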
