Merge pull request #15 from lanl/lauren_branch
add troubleshooting example
lbeesleyBIOSTAT committed Apr 24, 2024
2 parents fc53f9a + c522107 commit 2f6edeb
Showing 1 changed file with 153 additions and 0 deletions.
153 changes: 153 additions & 0 deletions examples/troubleshooting_convergence.ipynb
@@ -0,0 +1,153 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "73a88bcb-ffcb-4b96-90e4-b8422c627f3c",
"metadata": {},
"source": [
"# Troubleshooting Convergence and Other Problems\n",
"For some particularly tough calibration problems, even the sophisticated parallel tempering MCMC algorithm used by IMPALA can run into problems. In this document, we provided some insight and guidance to diagnose and resolve these problems. \n",
"\n",
"IMPALA has a lot of knobs and buttons that more experienced users can adjust to tune the algorithm performance to support their desired goals. First, it is worth clarifying what those goals commonly are and how our goal may impact our evaluation of the model fit:\n",
"\n",
"1. Point Prediction: For many applications, we are not very interested in the full posterior distribution of theta; rather, we want to obtain a single \"good\" specification of theta with respect to prediction in the training data. We have found that pooled calibration can often produce good results on this front (for example, by identifying the posterior draw of theta that best predicts the observed data) even if the parameter mixing and other MCMC diagnostics are concerning. \n",
"\n",
"2. Predictions with Uncertainty: Alternatively, we may be interested in predicting new experimental outcomes with uncertainty. Unless the experimental data is collected with very little in the way of batch effects and the computer model is very well-specified with a shared parameter value across experiments, a standard pooled calibration without accounting for model discrepancy may often produce sub-par predictive distributions with too-narrow credible intervals. Users can address model discrepancy in several ways. Firstly, discrepancy can be explicitly modeled as a linear combinations of user-defined basis functions. In many applications, however, we would like to generate a posterior distribution of theta which absorbs the model discrepancy through additional uncertainty. In this case, hierarchical calibration can be used to generate a broader predictive distribution that better captures the observed variation in the data. In practice, however, hierarchical calibration can sometimes produce overly wide and unrealistic credible intervals. In this case, we might recommend a clustered calibration, which tends to be a middle-ground between a fully flexible hierarhical calibration and a restrictive pooled analysis. \n",
"\n",
"3. Inference on theta and/or discrepancy: For this goal, the sampler mixing properties may be particularly crucial and should be carefully evaluated. Pooled calibration (with or without discrepancy as needed) may provide a useful approach to estimation of a posterior distribution for theta. However, we also recommend that users explore hierarchical/clustered calibration as a sensitivity analysis to ensure that the pooled calibration analysis has not produced unrealistically narrow credible intervals for theta. \n",
"\n",
"With all of these things in mind, we turn to some common problems and how we can approach addressing them. "
]
},
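{
"cell_type": "markdown",
"id": "point-prediction-sketch",
"metadata": {},
"source": [
"As a concrete illustration of goal 1, the sketch below picks out the posterior draw of theta that best predicts the training data by minimizing the sum of squared prediction errors across draws. This is a minimal sketch assuming you have already assembled a matrix of posterior predictive draws; the names pred_samples and y_obs are placeholders, not IMPALA objects.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Placeholder inputs: pred_samples holds one row of model predictions per\n",
"# posterior draw of theta; y_obs holds the observed experimental data.\n",
"pred_samples = np.random.rand(1000, 50)  # (n_draws, n_observations)\n",
"y_obs = np.random.rand(50)\n",
"\n",
"# Sum of squared prediction errors for each posterior draw\n",
"sse = np.sum((pred_samples - y_obs) ** 2, axis=1)\n",
"\n",
"# Index of the single 'good' theta draw for point prediction\n",
"best_draw = np.argmin(sse)\n",
"print('best posterior draw:', best_draw, 'SSE:', sse[best_draw])\n",
"```"
]
},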
{
"cell_type": "markdown",
"id": "4687f99a-abf9-4d39-b5b5-27030ee6e9e1",
"metadata": {},
"source": [
"### My trace plots look \"weird\"!\n",
"\n",
"The ideal trace plots look like fuzzy caterpillars laying flat after an initial chunk of iterations, as illustrated below:\n",
"\n",
"![trace](./images/trace_good.png)\n",
"\n",
" In real world calibrations, however, trouble with the MCMC sampler can manifest as strong autocorrelation between MCMC iterations, posterior samples that wander without convergence, etc. An example of a very concerning trace plot is provided below:\n",
"\n",
"![trace](./images/trace_bad.png)\n",
"\n",
"This type of poor MCMC behavior can be explained by many different issues, and it is difficult to provide a general set of solutions for mixing and convergence problems. Below, we highlight several common culprits and provide some ideas for how to improve thingg:\n",
"\n",
"1. The IMPALA MCMC algorithm uses parallel tempering to explore the posterior surface. While this algorithm is often well-suited for many complicated posterior surfaces, we can still sometimes run into issues. The IMPALA output objects include information about the parallel tempering and how often the tempering chains crossed. These outputs can be plotted and used to pinpoint issues with the tempering. In general, however, issues with the parallel tempering can often be address by specifying a finer grid of temperatures in setTemperatureLadder and/or increasing the range of the temperatures. \n",
"\n",
"2. Calibrations that use observational data of very diverse sizes (e.g., one dataset with 10 points and another with 10K) can sometimes result in posterior surfaces that are hard to explore. One simple solution is to either sub-sample the observed data to ensure similar dataset sizes across experiments or to interpolate more sparsely sampled data to artificially inflate the sample size. \n",
"\n",
"3. Consider your modeling assumptions. A pooled analysis assumes that the same theta parameter value is shared across experiments, while a hierarchical analysis allows the theta values to differ between experiments. A clustered analysis assumes that there are discrete subpopulations of experiments with shared values. Does the model structure make sense for your data and for your analytical goals? Do you need to consider model discrepancy? Do the prior distribution hyperparameters make sense for your data? Below, we list some tunable hyperparameters that experienced users can modify:\n",
"\n",
"* In the model call (e.g., ModelF, ModelMaterialStrength, etc.): \n",
" * s2: this option allows users to specify how and whether the experimental noise level should be estimated. MCMC sampler issues can sometimes be mitigated by fixing s2 to some user-specified value. If s2 = \"fix\", then the sd_est value specified in addVecExperiments is fixed for all MCMC iterations. \n",
"\n",
"* In addVecExperiments: \n",
" * sd_est: The initial values for the observation noise standard deviation\n",
" * s2_df: the initial values for the s2 Inverse Gamma prior degrees of freedom.\n",
" * discrep_tau: the fixed prior variance for the discrepancy basis coefficients controlling the amount of shrinkage toward zero\n",
"\n",
"* In setMCMC:\n",
" * decor: Number of iterations between each decorrelation step. Reducing this number may help hierarhical and clustered calibration to move around the parameter space at a steep computational cost\n",
" * start_var_theta: initial variance of adaptive MCMC proposal distributions for theta. Can be increased from default if posterior samples of theta are stuck at a single value across many iterations\n",
" * start_tau_theta: np.exp(start_tau_theta) is the initial scaling factor for the adaptive MCMC proposal covariance for theta. This can be tuned to modify acceptance rates, e.g., by making this smaller for samplers that get stuck at a single value across many iterations\n",
" * start_var_ls2: initial variance of adaptive MCMC proposal distributions for log(s2), i.e. the log of the observation error/noise standard deviation. Can be increased from default if posterior samples of theta are stuck at a single value across many iterations\n",
" * start_tau_ls2: np.exp(start_tau_ls2) is the initial scaling factor for the adaptive MCMC proposal covariance for log(s2). \n",
" * start_adapt_iter: MCMC iteration at which to start adapting the MCMC proposal distributions. \n",
"\n",
"* In setHierPriors: (Hierarchical model assumes experiment-specific thetai ~ Normal(theta0, Sigma0))\n",
" * theta0_prior_mean: the prior mean for the calibration parameter theta0. This can usually be left as default. \n",
" * theta0_prior_cov: the prior covariance for theta0, usually np.eye(self.p)*user_defined_prior_variance. This could be adjusted as needed. \n",
" * Sigma0_prior_df: prior degrees of freedom for the Inverse Wishart prior for Sigma0, at least 1 + self.p, where larger values generally indicate theta_i values closer to theta_0\n",
" * Sigma0_prior_scale: prior scale for the Inverse Wishart prior for Sigma0, where smaller values generally indicate theta_i values closer to theta_0\n",
"\n",
"* In setClusterPriors (Clustered model assumes experiment-speific thetai ~ DirichletProcess(G0, eta) where G0 = Normal(theta0, Sigma0)): \n",
" * nclustmax: the maximum number of unique theta values to estimate (i.e., maximum number of clusters). If this is too small, this can create sampling problems\n",
" * eta_prior_shape : prior shape for the Gamma prior for eta, the concentration parameter for the Dirichlet Process. Lower values indicate a higher propensity for experiments to cluster together. \n",
" * eta_prior_rate : prior rate for the Gamma prior for eta, the concentration parameter for the Dirichlet Process. Lower values indicate a higher propensity for experiments to cluster together. \n"
]
},
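{
"cell_type": "markdown",
"id": "mcmc-tuning-sketch",
"metadata": {},
"source": [
"The sketch below shows how a few of these knobs might be adjusted. It is a minimal sketch assuming an existing IMPALA calibration setup object named setup; the temperature ladder values and setMCMC settings are illustrative assumptions rather than recommended defaults, and exact argument signatures should be checked against the IMPALA documentation.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Assumes 'setup' is an existing calibration setup object to which\n",
"# experiments have already been added (e.g., via setup.addVecExperiments).\n",
"\n",
"# A finer, wider geometric grid of temperatures can improve how often the\n",
"# parallel tempering chains cross; the values below are illustrative.\n",
"setup.setTemperatureLadder(1.05 ** np.arange(50))\n",
"\n",
"# Illustrative tuning: more frequent decorrelation steps and a larger\n",
"# initial proposal variance for theta samplers stuck at a single value.\n",
"setup.setMCMC(\n",
"    nmcmc=20000,           # total MCMC iterations (assumed argument name)\n",
"    decor=50,              # fewer iterations between decorrelation steps\n",
"    start_var_theta=1e-4,  # larger initial proposal variance for theta\n",
"    start_adapt_iter=500,  # begin adapting proposals at iteration 500\n",
")\n",
"```"
]
},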
{
"cell_type": "markdown",
"id": "55db94be",
"metadata": {},
"source": [
"### My calibrated model isn't predicting my training data well!\n",
"\n",
"This problem can be caused by several underlying issues. \n",
"\n",
"1. One reason for poor fit is simply that your computer model does not fully capture the real data dynamics. This is known as model discrepancy, and this deviation between real world data and computer model outputs can be explicitly modeled through model discrepancy within the pooled calibration. See ex_bayarri_discrepancy for an example. Note: Experimental batch effects (e.g., vertical shifts) can also be viewed in terms of discrepancy and explicitly modeled. \n",
"\n",
"2. Another common problem relates to computer model emulation. If you calibrated using an emulated computer model, make sure to check that the emulation model is working as expected. Lack of fit in the emulation modeling may translate into problems with the calibartion. \n",
"\n",
"3. Poor prediction can also be a result of a poorly-fit or poorly-converged model. Check the parameter convergence and the performance of the parallel tempering as described in the previous section.\n",
"\n",
"4. Consider the IMPALA modeling assumptions you have made. Did you assume a constant noise level across experiments that enhibit differing noise? Do the prior distributions and corresponding hyperparameter specifications make sense for your data? \n",
"\n",
"5. Are you using a burn in? MCMC samplers can take many iterations to converge to the final stationary distribution. By examining trace plots, users can determine a reasonable cutoff before which all MCMC iterations are just tossed out. Note: the nburn option in IMPALA is deprecated and does not implement an MCMC burn in. \n"
]
},
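{
"cell_type": "markdown",
"id": "manual-burnin-sketch",
"metadata": {},
"source": [
"Because the nburn option is deprecated, a burn-in can be applied manually by discarding early iterations from the posterior samples, as in the minimal sketch below. The attribute name out.theta is an assumption standing in for wherever your calibration output stores the posterior draws of theta.\n",
"\n",
"```python\n",
"# Assumes 'out' is the object returned by an IMPALA calibration and that\n",
"# the posterior draws of theta are stored with one row per MCMC iteration;\n",
"# 'out.theta' is a placeholder for your actual output structure.\n",
"burn = 5000                    # cutoff chosen by inspecting trace plots\n",
"theta_post = out.theta[burn:]  # keep only post-burn-in draws\n",
"\n",
"# Compute all downstream summaries (posterior means, credible intervals,\n",
"# predictions) from theta_post rather than from the full chain.\n",
"```"
]
},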
{
"cell_type": "markdown",
"id": "71da68e9",
"metadata": {},
"source": [
"### My predictive intervals are so wide! \n",
"\n",
"When you examine the posterior predictive distribution (i.e., predictions from our computer model and/or discrepancy evaluated at each sample from our posterior parent distribution), you may find that the resulting posterior intervals can sometimes be very (unreasonably?) wide. If this is a result of a hierarchical analysis, one approach to mitigate this behaviour is to perform a clustered analysis, where experiments are strongly encouraged to cluster (and share theta values) with other experiments. \n",
"\n",
"It is worth considering, however, whether the calibration is giving you the result you asked for; do your predictive intervals need to be wide to capture the variation in the data. In hierachical and clustered calibrations where model discrepancy is absorbed into variation in the input parameter theta, it is often not unreasonable to have large predictive uncertainty. "
]
},
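{
"cell_type": "markdown",
"id": "clustered-calibration-sketch",
"metadata": {},
"source": [
"A minimal sketch of switching from a hierarchical to a clustered calibration is given below. It reuses the setup object from the hierarchical run and the calibClust and setClusterPriors names that appear elsewhere in this document; the nclustmax value and the exact call signatures are illustrative assumptions.\n",
"\n",
"```python\n",
"# Assumes 'setup' is the calibration setup object from the hierarchical run\n",
"# and that calibClust has been imported from IMPALA.\n",
"setup.setClusterPriors(nclustmax=10)  # illustrative cap on unique theta values\n",
"\n",
"# Clustered calibration encourages experiments to share theta values, which\n",
"# typically narrows predictive intervals relative to a hierarchical run.\n",
"out_clust = calibClust(setup)\n",
"```"
]
},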
{
"cell_type": "markdown",
"id": "6a48989d",
"metadata": {},
"source": [
"### My posterior distribution for theta is hitting up against the 0/1 boundaries\n",
"\n",
"This indicates that the bounds you set for your calibration inputs may be too narrow. Consider making them wider. "
]
},
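{
"cell_type": "markdown",
"id": "wider-bounds-sketch",
"metadata": {},
"source": [
"For example, widening the input bounds might look like the minimal sketch below. The dictionary format and the idea of rebuilding the setup object are assumptions about the calibration interface; use whatever mechanism your own script uses to define input bounds.\n",
"\n",
"```python\n",
"# Hypothetical original bounds: suppose the posterior for 'theta1' piles up\n",
"# against its upper limit, suggesting the range [0.5, 0.7] is too narrow.\n",
"bounds = {'theta0': [0.0, 1.0], 'theta1': [0.5, 0.7]}\n",
"\n",
"# Widened bounds that give the sampler room away from the boundary.\n",
"bounds = {'theta0': [0.0, 1.0], 'theta1': [0.2, 1.5]}\n",
"\n",
"# Rebuild the calibration setup with the wider bounds before re-running.\n",
"```"
]
},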
{
"cell_type": "markdown",
"id": "0905427a",
"metadata": {},
"source": [
"### My calibration is taking a really long time!\n",
"\n",
"Calibrations with my addVecExperiments calls to set up the calibration object can often run much more slowly. This is because this casts the observed data into a structure that does not leverage vectorized model evaluations. If possible, we recommend recasting multiple data vectors into a single long data vector entered into the model using a single addVecExperiments call. The IMPALA model object used to obtain computer model predictions would also have to be modified to provide predictions for the longer observation vector (e.g., by using ModelF to write a custom computer model prediction function). \n",
"\n",
"If you are calibrating using a custom computer model evaluated using ModelF, consider whether your computer model is too slow and whether an emulator is needed to speed up the computation.\n",
"\n",
"Calibrations using calibClust tend to be substantially slower than their hierarchical and pooled counterparts. This is a topic of continued code development. \n",
"\n",
"For very large observed datasets, the current specification of ModelF can be slow. See ModelF_bigdata for a version tailored to the bigger data setting. "
]
}
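,
{
"cell_type": "markdown",
"id": "vectorized-experiments-sketch",
"metadata": {},
"source": [
"The minimal sketch below illustrates recasting several observation vectors into one long vector entered through a single addVecExperiments call. The sd_est argument follows the options listed earlier in this document, but the exact addVecExperiments signature and the form of the ModelF-based prediction object (model_all below) are assumptions to adapt to your own setup.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical per-experiment observation vectors, previously added through\n",
"# three separate addVecExperiments calls.\n",
"y1, y2, y3 = np.random.rand(100), np.random.rand(250), np.random.rand(80)\n",
"\n",
"# Stack them into one long vector so IMPALA can evaluate the model once.\n",
"y_all = np.concatenate([y1, y2, y3])\n",
"sd_all = 0.01 * np.ones(len(y_all))  # illustrative noise standard deviations\n",
"\n",
"# 'model_all' is assumed to be a ModelF-style object whose prediction\n",
"# function returns concatenated predictions aligned with y_all.\n",
"setup.addVecExperiments(y_all, model_all, sd_est=sd_all)\n",
"```"
]
}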
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
