From 4556bd0d0867def6db22a424a0fbbbee7d882b3c Mon Sep 17 00:00:00 2001 From: PaulJonasJost Date: Tue, 5 Mar 2024 14:21:04 +0100 Subject: [PATCH] removed workflow_comparison.ipynb --- doc/example.rst | 1 - doc/example/workflow_comparison.ipynb | 876 -------------------------- test/run_notebook.sh | 1 - 3 files changed, 878 deletions(-) delete mode 100644 doc/example/workflow_comparison.ipynb diff --git a/doc/example.rst b/doc/example.rst index b1fbbae5e..a6f694110 100644 --- a/doc/example.rst +++ b/doc/example.rst @@ -27,7 +27,6 @@ Getting started example/getting_started.ipynb example/custom_objective_function.ipynb - example/workflow_comparison.ipynb PEtab and AMICI --------------- diff --git a/doc/example/workflow_comparison.ipynb b/doc/example/workflow_comparison.ipynb deleted file mode 100644 index 7e48aaaee..000000000 --- a/doc/example/workflow_comparison.ipynb +++ /dev/null @@ -1,876 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# pyPESTO vs no pyPESTO\n", - "\n", - "The objectives of this notebook are twofold:\n", - "\n", - "1. **General Workflow:** We walk step-by-step through a process to estimate parameters of dynamical models. By following this workflow, you will gain a clear understanding of the essential steps involved and how they contribute to the overall outcome.\n", - "\n", - "2. **Benefits of pyPESTO:** Throughout the notebook, we highlight the key advantages of using pyPESTO in each step of the workflow, compared to \"doing things manually\". By leveraging its capabilities, you can significantly increase efficiency and effectiveness when solving your parameter optimization tasks.\n", - "\n", - "This notebook is divided into several sections, each focusing on a specific aspect of the parameter estimation workflow. Here's an overview of what you find in each section:\n", - "\n", - "**Contents**\n", - "\n", - "1. **Objective Function:** We discuss the creation of an objective function that quantifies the goodness-of-fit between a model and observed data. We will demonstrate how pyPESTO simplifies this potentially cumbersome process and provides various options for objective function definition.\n", - "\n", - "2. **Optimization:** We show how to find optimal model parameters. We illustrate the general workflow and how pyPESTO allows to flexibly use different optimizers and to analyze and interpret results.\n", - "\n", - "3. **Profiling:** After a successful parameter optimization, we show how pyPESTO provides profile likelihood functionality to assess uncertainty and identifiability of selected parameters.\n", - "\n", - "4. **Sampling:** In addition to profiles, we use MCMC to sample from the Bayesian posterior distribution. We show how pyPESTO facilitates the use of different sampling methods.\n", - "\n", - "5. **Result Storage:** This section focuses on storing and organizing the results obtained from the parameter optimization workflow, which is necessary to keep results for later processing and sharing.\n", - "\n", - "By the end of this notebook, you'll have gained valuable insights into the parameter estimation workflow for dynamical models. Moreover, you'll have a clear understanding of the benefits pyPESTO brings to each step of this workflow. This tutorial will equip you with the knowledge and tools necessary to streamline your parameter estimation tasks and obtain accurate and reliable results." 
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# install dependencies\n",
- "#!pip install pypesto[amici,petab]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "jupyter": {
- "outputs_hidden": false
- },
- "tags": []
- },
- "outputs": [],
- "source": [
- "# imports\n",
- "import logging\n",
- "import os\n",
- "import random\n",
- "from pprint import pprint\n",
- "\n",
- "import amici\n",
- "import matplotlib as mpl\n",
- "import matplotlib.pyplot as plt\n",
- "import numpy as np\n",
- "import petab\n",
- "import scipy.optimize\n",
- "from IPython.display import Markdown, display\n",
- "\n",
- "import pypesto.optimize as optimize\n",
- "import pypesto.petab\n",
- "import pypesto.profile as profile\n",
- "import pypesto.sample as sample\n",
- "import pypesto.store as store\n",
- "import pypesto.visualize as visualize\n",
- "import pypesto.visualize.model_fit as model_fit\n",
- "\n",
- "mpl.rcParams['figure.dpi'] = 100\n",
- "mpl.rcParams['font.size'] = 18\n",
- "\n",
- "# for reproducibility\n",
- "random.seed(1912)\n",
- "np.random.seed(1912)\n",
- "\n",
- "# name of the model\n",
- "model_name = \"boehm_JProteomeRes2014\"\n",
- "\n",
- "# output directory\n",
- "model_output_dir = \"tmp/\" + model_name"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 1. Create an objective function\n",
- "\n",
- "As an application problem, we consider the model by [Böhm et al., JProteomeRes 2014](https://pubs.acs.org/doi/abs/10.1021/pr5006923), which describes the dimerization of phosphorylated STAT5A and STAT5B, important transducers of activation signals from cytokine receptors to the nucleus, and was trained on quantitative mass spectrometry data. The model is available via the [PEtab benchmark collection](https://github.com/Benchmarking-Initiative/Benchmark-Models-PEtab). For simulation, we use [AMICI](https://github.com/AMICI-dev/AMICI), an efficient ODE simulation and sensitivity calculation routine. [PEtab](https://github.com/PEtab-dev/PEtab) is a data format specification that standardises parameter estimation problems in systems biology."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Without pyPESTO\n",
- "\n",
- "To fit an (ODE) model to data, the model needs to be implemented in a simulation program as a function mapping parameters to simulated data. Simulations must then be mapped to experimentally observed data via the formulation of a single-valued cost function (e.g. squared or absolute differences, corresponding to a normal or Laplace measurement noise model).\n",
- "Loading the model via PEtab and AMICI already simplifies these steps substantially compared to encoding the model and the objective function manually:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "%%capture\n",
- "# PEtab problem loading\n",
- "petab_yaml = f\"./{model_name}/{model_name}.yaml\"\n",
- "\n",
- "petab_problem = petab.Problem.from_yaml(petab_yaml)\n",
- "\n",
- "# AMICI model compilation\n",
- "amici_model = amici.petab_import.import_petab_problem(\n",
- " petab_problem, force_compile=True\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "AMICI allows us to construct an objective function from the PEtab problem, already considering the noise distribution assumed for this model.
With this simple setup, we can also simulate the model for a given parameter vector."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "# Simulation with PEtab nominal parameter values\n",
- "print(\"PEtab benchmark parameters\")\n",
- "pprint(amici.petab_objective.simulate_petab(petab_problem, amici_model))\n",
- "\n",
- "# Simulation with specified parameter values\n",
- "parameters = np.array([-1.5, -5.0, -2.2, -1.7, 5.0, 4.2, 0.5, 0.8, 0.5])\n",
- "ids = list(amici_model.getParameterIds())\n",
- "ids[6:] = [\"sd_pSTAT5A_rel\", \"sd_pSTAT5B_rel\", \"sd_rSTAT5A_rel\"]\n",
- "\n",
- "print(\"Individualized parameters\")\n",
- "pprint(\n",
- " amici.petab_objective.simulate_petab(\n",
- " petab_problem,\n",
- " amici_model,\n",
- " problem_parameters={x_id: x_i for x_id, x_i in zip(ids, parameters)},\n",
- " scaled_parameters=True,\n",
- " )\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We see that to call the objective function, we need to supply the parameters in a dictionary format. This is not really suitable for parameter estimation, as e.g. optimization packages usually work with (numpy) arrays. Therefore we need to create some kind of parameter mapping."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "class Objective:\n",
- " \"\"\"\n",
- " A very basic implementation of an objective function for AMICI that can be called based on the parameters alone.\n",
- " \"\"\"\n",
- "\n",
- " def __init__(self, petab_problem: petab.Problem, model: amici.Model):\n",
- " \"\"\"Constructor for objective.\"\"\"\n",
- " self.petab_problem = petab_problem\n",
- " self.model = model\n",
- " self.x_ids = list(self.model.getParameterIds())\n",
- " # need to change the names for the last ones\n",
- " self.x_ids[6:] = [\"sd_pSTAT5A_rel\", \"sd_pSTAT5B_rel\", \"sd_rSTAT5A_rel\"]\n",
- "\n",
- " def x_dct(self, x: np.ndarray):\n",
- " \"\"\"\n",
- " Turn array of parameters to dictionary usable for objective call.\n",
- " \"\"\"\n",
- " return {x_id: x_i for x_id, x_i in zip(self.x_ids, x)}\n",
- "\n",
- " def __call__(self, x: np.ndarray):\n",
- " \"\"\"Call the objective function\"\"\"\n",
- " return -amici.petab_objective.simulate_petab(\n",
- " self.petab_problem,\n",
- " self.model,\n",
- " problem_parameters=self.x_dct(x),\n",
- " scaled_parameters=True,\n",
- " )[\"llh\"]\n",
- "\n",
- "\n",
- "# Test it out\n",
- "obj = Objective(petab_problem, amici_model)\n",
- "pprint(obj(parameters))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### Summary\n",
- "\n",
- "We have constructed a very basic functioning objective function for this specific problem. AMICI and PEtab already simplify the workflow compared to coding the objective from scratch; however, there are still some shortcomings. Some things that we have not yet considered and that would require additional code are:\n",
- "\n",
- "* What if we have **multiple simulation conditions**? We would have to adjust the parameter mapping to use the correct parameters and constants for each condition, and aggregate the results.\n",
- "* What if our system starts in a **steady state**?
In this case we would have to preequilibrate, something that AMICI can do, but we would need to account for that additionally (and it would most likely also include an additional simulation condition).\n",
- "* For later analysis we would like to be able to obtain not only the objective function value but also the **residuals**, something that we can change in AMICI but would have to account for additionally.\n",
- "* If we **fix a parameter** (i.e. keeping it constant while optimizing the remaining parameters), we would have to create a different parameter mapping (same for unfixing a parameter).\n",
- "* What if we want to include **prior knowledge** of parameters?\n",
- "* AMICI can also calculate **sensitivities** (`sllh` in the above output). During parameter estimation, the inference (optimization/sampling/...) algorithm typically calls the objective function many times both with and without sensitivities. Thus, we need to implement the ability to request e.g. the function value, gradient and Hessian matrix (or an approximation thereof), as well as combinations of these for efficiency.\n",
- "\n",
- "This is most likely not the complete list, but it already amounts to quite a substantial coding effort. While each problem can be tackled, it is many lines of code, and we would need to rewrite them each time we want to change something (or invest even more work and make the design of the objective function flexible).\n",
- "\n",
- "In short: **There is a need for a tool that can account for all these variables in the objective function formulation**."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### With pyPESTO\n",
- "\n",
- "All the above is easily addressed by using pyPESTO's objective function implementation. We support a multitude of objective functions (JAX, Aesara, AMICI, Julia, self-written). For PEtab models with AMICI, we take care of the parameter mapping, multiple simulation conditions (including preequilibration), changing between residuals and objective function, fixing parameters, and sensitivity calculation.\n",
- "\n",
- "While there is a lot of room for customization, in its most basic form creating an objective from a PEtab problem that accounts for all of the above boils down to just a few lines in pyPESTO:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "jupyter": {
- "outputs_hidden": false
- }
- },
- "outputs": [],
- "source": [
- "petab_yaml = f\"./{model_name}/{model_name}.yaml\"\n",
- "\n",
- "petab_problem = petab.Problem.from_yaml(petab_yaml)\n",
- "\n",
- "# import the petab problem and generate a pypesto problem\n",
- "importer = pypesto.petab.PetabImporter(petab_problem)\n",
- "problem = importer.create_problem()\n",
- "\n",
- "# call the objective to get the objective function value and (additionally) the gradient\n",
- "problem.objective(parameters, sensi_orders=(0, 1))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 2. Optimization\n",
- "\n",
- "After creating our objective function, we can now set up an optimization problem to find model parameters that best describe the data.
For this we will need\n", - "* parameter bounds (to restrict the search area; these are usually based on domain knowledge)\n", - "* startpoints for the multistart local optimization (as dynamical system constrained optimization problems are in most cases non-convex)\n", - "* an optimizer" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Without pyPESTO" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [], - "source": [ - "# bounds\n", - "ub = 5 * np.ones(len(parameters))\n", - "lb = -5 * np.ones(len(parameters))\n", - "\n", - "# number of starts\n", - "n_starts = 25\n", - "\n", - "# draw uniformly distributed parameters within these bounds\n", - "x_guesses = np.random.random((n_starts, len(lb))) * (ub - lb) + lb\n", - "\n", - "# optimize\n", - "results = []\n", - "for x0 in x_guesses:\n", - " results.append(\n", - " scipy.optimize.minimize(\n", - " obj,\n", - " x0,\n", - " bounds=zip(lb, ub),\n", - " tol=1e-12,\n", - " options={\"maxfun\": 500},\n", - " method=\"L-BFGS-B\",\n", - " )\n", - " )\n", - "pprint(results)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We might want to change the optimizer, like e.g. [NLopt](https://nlopt.readthedocs.io/en/latest/)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import nlopt\n", - "\n", - "opt = nlopt.opt(\n", - " nlopt.LD_LBFGS, len(parameters)\n", - ") # only one of many possible options\n", - "\n", - "opt.set_lower_bounds(lb)\n", - "opt.set_upper_bounds(ub)\n", - "\n", - "\n", - "def nlopt_objective(x, grad):\n", - " \"\"\"We need a wrapper function of the kind f(x,grad) for nlopt.\"\"\"\n", - " r = obj(x)\n", - " return r\n", - "\n", - "\n", - "opt.set_min_objective(nlopt_objective)\n", - "\n", - "\n", - "results = []\n", - "for x0 in x_guesses:\n", - " try:\n", - " result = opt.optimize(x0)\n", - " except (\n", - " nlopt.RoundoffLimited,\n", - " nlopt.ForcedStop,\n", - " ValueError,\n", - " RuntimeError,\n", - " MemoryError,\n", - " ) as e:\n", - " result = None\n", - " results.append(result)\n", - "\n", - "pprint(results)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can already see that the NLopt library takes different arguments and has a different result output than scipy. In order to be able to compare them, we need to modify the code again. We would at the very least like the end objective function value, our starting value and some kind of exit message." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "results = []\n", - "for x0 in x_guesses:\n", - " try:\n", - " result = opt.optimize(x0)\n", - " msg = 'Finished Successfully.'\n", - " except (\n", - " nlopt.RoundoffLimited,\n", - " nlopt.ForcedStop,\n", - " ValueError,\n", - " RuntimeError,\n", - " MemoryError,\n", - " ) as e:\n", - " result = None\n", - " msg = str(e)\n", - " res_complete = {\n", - " \"x\": result,\n", - " \"x0\": x0,\n", - " \"fun\": opt.last_optimum_value(),\n", - " \"message\": msg,\n", - " \"exitflag\": opt.last_optimize_result(),\n", - " }\n", - " results.append(res_complete)\n", - "\n", - "pprint(results)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is smoothly running and does not take too much code. 
But again, we did not consider quite a few things regarding this problem:\n",
- "\n",
- "* The optimizer internally uses **finite differences**, which is both inefficient and inaccurate, especially for very stiff models. Constructing sensitivities and integrating them into the optimization can be quite tedious.\n",
- "* There is no **tracking of the history**; we only get the end points. If we want to analyze the optimization in more detail, we need to implement history tracking in the objective function.\n",
- "* Many times, especially for larger models, we might want to **change the optimizer** depending on the performance. For example, for some systems other optimizers might perform better, e.g. a global vs a local one. A detailed analysis of this would require some setup, and each optimizer takes arguments in a different form.\n",
- "* For bigger models and more starts, **parallelization** becomes a key component to ensure efficiency.\n",
- "* Especially when considering multiple optimizers, the lack of a **quasi-standardised result format** becomes apparent, and thus one would either have to write a proper result class or individualize all downstream analysis for each optimizer."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### With pyPESTO\n",
- "\n",
- "Using pyPESTO, all the above is easily possible. A `pypesto.engine.MultiProcessEngine` enables parallelization, and `optimize.ScipyOptimizer` selects a scipy-based optimizer. Alternatively, try e.g. `optimize.FidesOptimizer` or `optimize.NLoptOptimizer`, all with consistent calls and output formats. The results of the single optimizer runs are filled into a unified pyPESTO result object."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "results_pypesto = optimize.minimize(\n",
- " problem=problem,\n",
- " optimizer=optimize.ScipyOptimizer(),\n",
- " n_starts=n_starts,\n",
- " engine=pypesto.engine.MultiProcessEngine(),\n",
- ")\n",
- "# a summary of the results\n",
- "display(Markdown(results_pypesto.summary()))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Beyond the optimization itself, pyPESTO naturally provides various analysis and visualization routines:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "_, axes = plt.subplots(ncols=2, figsize=(12, 6), constrained_layout=True)\n",
- "visualize.waterfall(results_pypesto, ax=axes[0])\n",
- "visualize.parameters(results_pypesto, ax=axes[1]);"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 3. Profiling\n",
- "\n",
- "Profile likelihood analysis allows us to systematically assess parameter uncertainty and identifiability by tracing a maximum profile through the likelihood landscape, starting from the optimal parameter value.\n",
- "\n",
- "### Without pyPESTO\n",
- "\n",
- "When it comes to profiling, we have the main apparatus already prepared with a working optimizer and our objective function. We still need a wrapper around the objective function as well as the general setup for the profiling, which includes selecting startpoints and cutoffs. For the sake of computation time, we will limit the maximum number of steps the scipy optimizer takes to 50."
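The optimization section above states that optimizers can be exchanged with consistent calls and that, unlike the manual loops, pyPESTO can record an optimization history. Before turning to the profiling code below, here is a minimal sketch of both points, reusing `problem` and `n_starts` from the notebook. The `history_options` argument and the optional Fides/NLopt backends are assumptions based on the pyPESTO API; check them against the documentation of the installed version, and note that the `fides` and `nlopt` packages must be installed separately.

```python
import pypesto
import pypesto.optimize as optimize

# record the objective function trace for each optimizer run
history_options = pypesto.HistoryOptions(trace_record=True)

# the same minimize() call is used for every backend
optimizers = {
    "scipy L-BFGS-B": optimize.ScipyOptimizer(method="L-BFGS-B"),
    "fides": optimize.FidesOptimizer(),  # requires the fides package
    "nlopt": optimize.NLoptOptimizer(),  # requires the nlopt package
}

results_by_optimizer = {
    name: optimize.minimize(
        problem=problem,
        optimizer=opt_,
        n_starts=n_starts,
        history_options=history_options,
        engine=pypesto.engine.MultiProcessEngine(),
    )
    for name, opt_ in optimizers.items()
}

# all results share the same structure and can be compared directly
for name, res in results_by_optimizer.items():
    print(name, res.optimize_result[0].fval)
```

Because every run fills the same result structure, the recorded traces and best values can afterwards be compared with the usual visualization routines, e.g. `visualize.waterfall` or `visualize.optimizer_history`.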
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# sort the results\n", - "results_sorted = sorted(results, key=lambda a: a[\"fun\"])\n", - "results_sorted" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "results_pypesto.optimize_result[0][\"x\"][problem.x_free_indices][1:]" - ] - }, - { - "cell_type": "markdown", - "source": [ - "Below is a very basic implementation of a profile likelihood analysis. We start from the optimal parameter value and profile in both directions. We use a simple stepsize and a ratio to stop the profiling. To run this, uncomment the code below." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# x_start = results_pypesto.optimize_result[0][\"x\"][problem.x_free_indices][1:]\n", - "# x_fixed = results_pypesto.optimize_result[0][\"x\"][problem.x_free_indices][0]\n", - "# fval_min = results_pypesto.optimize_result[0][\"fval\"]\n", - "#\n", - "# # determine stepsize, ratios\n", - "# stepsize = 0.05\n", - "# ratio_min = 0.145\n", - "# x_profile = [results_pypesto.optimize_result[0][\"x\"][problem.x_free_indices]]\n", - "# fval_profile = [results_pypesto.optimize_result[0][\"fval\"]]\n", - "#\n", - "# # set up for nlopt optimizer\n", - "# opt = nlopt.opt(\n", - "# nlopt.LD_LBFGS, len(parameters) - 1\n", - "# ) # only one of many possible options\n", - "#\n", - "# opt.set_lower_bounds(lb[:-1])\n", - "# opt.set_upper_bounds(ub[:-1])\n", - "#\n", - "#\n", - "# for direction, bound in zip([-1, 1], (-5, 3)): # profile in both directions\n", - "# print(f\"direction: {direction}\")\n", - "# x0_curr = x_fixed\n", - "# x_rest = x_start\n", - "# run = True\n", - "# while direction * (x0_curr - bound) < 0 and run:\n", - "# x0_curr += stepsize * direction\n", - "#\n", - "# # define objective for fixed parameter\n", - "# def fix_obj(x: np.ndarray):\n", - "# x = np.insert(x, 0, x0_curr)\n", - "# return obj(x)\n", - "#\n", - "# # define nlopt objective\n", - "# def nlopt_objective(x, grad):\n", - "# \"\"\"We need a wrapper function of the kind f(x,grad) for nlopt.\"\"\"\n", - "# r = fix_obj(x)\n", - "# return r\n", - "#\n", - "# opt.set_min_objective(nlopt_objective)\n", - "# result = opt.optimize(x_rest)\n", - "#\n", - "# # update profiles\n", - "# if direction == 1:\n", - "# x_profile.append(np.insert(result, 0, x0_curr))\n", - "# fval_profile.append(opt.last_optimum_value())\n", - "# if np.exp(fval_min - fval_profile[-1]) <= ratio_min:\n", - "# run = False\n", - "# if direction == -1:\n", - "# x_profile.insert(0, np.insert(result, 0, x0_curr))\n", - "# fval_profile.insert(0, opt.last_optimum_value())\n", - "# if np.exp(fval_min - fval_profile[0]) <= ratio_min:\n", - "# run = False\n", - "# x_rest = result\n", - "# plt.plot(\n", - "# [x[0] for x in x_profile], np.exp(np.min(fval_profile) - fval_profile)\n", - "# );" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is a very basic implementation that still lacks a few things:\n", - "* If we want to profile all parameters, we will want to **parallelize** this to save time.\n", - "* We chose a very unflexible stepsize, in general we would want to be able to automatically **adjust the stepsize** during each profile calculation.\n", - "* As this approach requires (multiple) optimizations under the hood, the things discussed in the last step also apply here mostly." 
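One magic number in the manual sketch above deserves a short explanation: the cutoff `ratio_min = 0.145`. Profiles are usually truncated where the likelihood ratio relative to the optimum drops below a threshold derived from the likelihood-ratio test, exp(-chi2(1 - alpha, df=1) / 2) for a single profiled parameter. The calculation below uses only `scipy.stats`; reading 0.145 as an approximate 95% threshold is our interpretation, not stated in the original text.

```python
import numpy as np
from scipy.stats import chi2

# likelihood-ratio threshold for one profiled parameter:
#   ratio_min = exp(-chi2.ppf(1 - alpha, df=1) / 2)
alpha = 0.05
ratio_threshold = np.exp(-chi2.ppf(1 - alpha, df=1) / 2)
print(ratio_threshold)  # ~0.1465, close to the 0.145 used above
```

pyPESTO exposes a configurable cutoff of this kind (together with adaptive stepsize control) through the options of `profile.parameter_profile`; see the API documentation for the exact option names.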
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### With pyPESTO\n", - "\n", - "pyPESTO takes care of those things and integrates the profiling directly into the Result object" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "result_pypesto = profile.parameter_profile(\n", - " problem=problem,\n", - " result=results_pypesto,\n", - " optimizer=optimize.ScipyOptimizer(),\n", - " engine=pypesto.engine.MultiProcessEngine(),\n", - " profile_index=[0],\n", - ")\n", - "\n", - "visualize.profiles(result_pypesto);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 4. Sampling\n", - "\n", - "pyPESTO also supports Bayesian sampling methods. These are used to retrieve posterior distributions and measure uncertainty globally.\n", - "\n", - "### Without pyPESTO\n", - "\n", - "While there are many available sampling methods, setting them up for a more complex objective function can be time intensive, and comparing different ones even more so." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import emcee\n", - "\n", - "n_samples = 500\n", - "\n", - "\n", - "# set up the sampler\n", - "# rewrite nll to llh\n", - "def log_prob(x):\n", - " \"\"\"Log-probability density function.\"\"\"\n", - " # check if parameter lies within bounds\n", - " if any(x < lb) or any(x > ub):\n", - " return -np.inf\n", - " # invert sign\n", - " return -1.0 * obj(x)\n", - "\n", - "\n", - "# def a function to get multiple startpoints for walkers\n", - "def get_epsilon_ball_initial_state(\n", - " center: np.ndarray,\n", - " lb: np.ndarray,\n", - " ub: np.ndarray,\n", - " nwalkers: int = 20,\n", - " epsilon: float = 1e-3,\n", - "):\n", - " \"\"\"Get walker initial positions as samples from an epsilon ball.\n", - "\n", - " The ball is scaled in each direction according to the magnitude of the\n", - " center in that direction.\n", - "\n", - " It is assumed that, because vectors are generated near a good point,\n", - " all generated vectors are evaluable, so evaluability is not checked.\n", - "\n", - " Points that are generated outside the problem bounds will get shifted\n", - " to lie on the edge of the problem bounds.\n", - "\n", - " Parameters\n", - " ----------\n", - " center:\n", - " The center of the epsilon ball. The dimension should match the full\n", - " dimension of the pyPESTO problem. This will be returned as the\n", - " first position.\n", - " lb, ub:\n", - " Upper and lower bounds of the objective.\n", - " nwalkers:\n", - " Number of emcee walkers.\n", - " epsilon:\n", - " The relative radius of the ball. 
e.g., if `epsilon=0.5`\n", - " and the center of the first dimension is at 100, then the upper\n", - " and lower bounds of the epsilon ball in the first dimension will\n", - " be 150 and 50, respectively.\n", - " \"\"\"\n", - " # Epsilon ball\n", - " lb = center * (1 - epsilon)\n", - " ub = center * (1 + epsilon)\n", - "\n", - " # Sample initial positions\n", - " dim = lb.size\n", - " lb = lb.reshape((1, -1))\n", - " ub = ub.reshape((1, -1))\n", - "\n", - " # create uniform points in [0, 1]\n", - " xs = np.random.random((nwalkers - 1, dim))\n", - "\n", - " # re-scale\n", - " xs = xs * (ub - lb) + lb\n", - "\n", - " initial_state_after_first = xs\n", - "\n", - " # Include `center` in initial positions\n", - " initial_state = np.row_stack(\n", - " (\n", - " center,\n", - " initial_state_after_first,\n", - " )\n", - " )\n", - "\n", - " return initial_state\n", - "\n", - "\n", - "sampler = emcee.EnsembleSampler(\n", - " nwalkers=18, ndim=len(ub), log_prob_fn=log_prob\n", - ")\n", - "sampler.run_mcmc(\n", - " initial_state=get_epsilon_ball_initial_state(\n", - " results_sorted[0][\"x\"], lb, ub, 18\n", - " ),\n", - " nsteps=n_samples,\n", - " skip_initial_state_check=True,\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "trace_x = np.array([sampler.get_chain(flat=True)])\n", - "trace_neglogpost = np.array([-sampler.get_log_prob(flat=True)])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "plt.plot(\n", - " trace_neglogpost.reshape(\n", - " 9000,\n", - " ),\n", - " \"o\",\n", - " alpha=0.05,\n", - ")\n", - "plt.ylim([240, 300]);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### With pyPESTO\n", - "\n", - "pyPESTO supports a number of samplers and unifies their usage, making a change of sampler comparatively easy. Instead of the below `sample.AdaptiveMetropolisSampler`, try e.g. also a `sample.EmceeSampler` (like above) or a `sample.AdaptiveParallelTemperingSampler`. It also unifies the result object to a certain extent to allow visualizations across samplers." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Sampling\n", - "sampler = sample.AdaptiveMetropolisSampler()\n", - "result_pypesto = sample.sample(\n", - " problem=problem,\n", - " sampler=sampler,\n", - " n_samples=1000,\n", - " result=result_pypesto,\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# plot objective function trace\n", - "visualize.sampling_fval_traces(result_pypesto);" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "visualize.sampling_1d_marginals(result_pypesto);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 5. Storage\n", - "\n", - "As the analysis itself is time consuming, it is neccesary to save the results for later usage. 
Here it becomes even more apparent why one needs a **unified result object**: otherwise, the saving routine would have to be adjusted each time one changes the optimizer, sampler, profile startpoint or other commonly changed settings, or one would have to resort to an unsafe format such as pickling.\n",
- "\n",
- "pyPESTO offers a unified result object and a very easy-to-use saving and loading scheme:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import tempfile\n",
- "\n",
- "# create temporary file\n",
- "fn = tempfile.mktemp(\".h5\")\n",
- "\n",
- "# write result with write_result function.\n",
- "# Choose which parts of the result object to save with\n",
- "# corresponding booleans.\n",
- "store.write_result(\n",
- " result=result_pypesto,\n",
- " filename=fn,\n",
- " problem=True,\n",
- " optimize=True,\n",
- " sample=True,\n",
- " profile=True,\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Read result\n",
- "result2 = store.read_result(fn, problem=True)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# plot 1D marginals of the loaded result\n",
- "visualize.sampling_1d_marginals(result2);"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "This concludes our brief rundown of a typical pyPESTO workflow and manual alternatives. In addition to what was shown here, pyPESTO provides a lot more functionality, including but not limited to visualization routines, diagnostics, model selection and hierarchical optimization. For further information, see the other example notebooks and the API documentation."
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.10.2"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
diff --git a/test/run_notebook.sh b/test/run_notebook.sh
index 417971a00..c885350c5 100755
--- a/test/run_notebook.sh
+++ b/test/run_notebook.sh
@@ -29,7 +29,6 @@ nbs_1=(
 'censored_data.ipynb'
 'semiquantitative_data.ipynb'
 'getting_started.ipynb'
- 'workflow_comparison.ipynb'
 )
 # Sampling notebooks
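As a small supplement to the diagnostics mentioned in the closing paragraph of the notebook: before interpreting sampling marginals, one usually checks convergence and discards a burn-in. A minimal sketch follows, reusing `result_pypesto` from above and assuming the diagnostic helpers `geweke_test` and `effective_sample_size` in `pypesto.sample`; verify these names against the API documentation of the installed version.

```python
import pypesto.sample as sample

# estimate the burn-in index via the Geweke test;
# the estimate is stored in result_pypesto.sample_result.burn_in
sample.geweke_test(result=result_pypesto)
print(result_pypesto.sample_result.burn_in)

# effective sample size of the chain after burn-in
sample.effective_sample_size(result=result_pypesto)
print(result_pypesto.sample_result.effective_sample_size)
```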