diff --git a/_freeze/materials/checking-assumptions/execute-results/html.json b/_freeze/materials/checking-assumptions/execute-results/html.json index c91b84e..ee5ee14 100644 --- a/_freeze/materials/checking-assumptions/execute-results/html.json +++ b/_freeze/materials/checking-assumptions/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "91e9cff06ed366cdae4b93eba9c4050a", + "hash": "f65efeea6d026670c0ce9e5527fbb292", "result": { - "markdown": "---\ntitle: \"Checking assumptions\"\noutput: html_document\n---\n\n\n\n::: {.cell}\n\n:::\n\n\nAs with all statistical models, mixed effects models make certain assumptions about the dataset and the population it's drawn from. If these assumptions are not well met, then any results we get from our model must be taken with a huge grain of salt.\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\nWe'll be using the `performance` package in R to visually check assumptions.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# install and load the package\ninstall.packages(\"performance\")\nlibrary(performance)\n```\n:::\n\n:::\n\n## What are the assumptions?\n\nThe assumptions of a linear mixed effects model - which we've been dealing with so far in the course - are very similar to the assumptions of a standard linear model, and include all the things you're likely used to seeing:\n\n- Continuous response variable\n- Independence of data points (beyond the non-independence that we have accounted for with our random effects)\n- Linearity in the relationship between the predictor(s) and the response\n- Residuals are normally distributed\n- Residuals have equality of variance\n\nAnd, though it isn't a \"formal\" assumption in the strictest sense, we also want to ensure that there aren't any overly influential data points.\n\nBecause we now have random effects in our model, there are a couple of additional assumptions that we make:\n\n- The coefficients of the random effects are normally distributed\n- Random effects are not influenced by any of the other predictors\n\n## Testing these assumptions\n\nThe first two of our assumptions - continuous response variable and independence - can't be tested just by examining the dataset or residuals. These two assumptions fit within a broader idea of \"choose the right model\", which requires you as a researcher to think carefully about your experimental design.\n\nThe rest of our assumptions can be assessed using the same method that we use for a standard linear regression analysis: visualisation via diagnostic plots.\n\nLet's look at our `sleepstudy` dataset again. Here is the full model that we fitted to those data:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"sleepstudy\")\n\nlme_sleep <- lmer(Reaction ~ Days + (1 + Days|Subject),\n data = sleepstudy)\n```\n:::\n\n:::\n\nNow, let's visualise it. We could create each of the plots by hand if we wished (using the `broom.mixed` package to augment our dataset), but thankfully there exists a much quicker method, using an R package called `performance`.\n\n::: {.callout-tip}\nThe `performance` package contains a bunch of functions that allow us to test the quality of our model. 
For the purposes of visualisation, we'll use `check_model`, but I encourage you to explore this package in more detail as there's a lot more to it (it's super helpful for evaluating the performance of generalised linear models and Bayesian models, as well as mixed models).\n\nNote that you might also need to install and/or load the `see` package in order to use the `performance` package.\n:::\n\n### The usual suspects\n\nWe'll start by looking at the diagnostic plots that we're used to seeing from standard linear models.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_sleep, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"outliers\"))\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n:::\n\n:::\n\nThe plot for influential observations might look a bit different to the Cook's distance plot that you might be used to. On this plot, there are 4 data points labelled in red which fall really far outside our dashed contour lines (8, 57, 60 and 131). This tells us that we might want to re-examine these points, perhaps by excluding them from the dataset, fitting a new linear mixed model, and seeing whether our conclusions are still the same.\n\nThe linearity and homogeneity of variance plots look alright, overall, although there's some indication that our influential points might be causing a bit of drama there too. There's some snaking in the Q-Q plot that suggests our residuals have a \"heavy-tailed\", or leptokurtic, distribution.\n\n### Normality of random effects\n\nThe other important assumption to check via visualisation is the normality of our random effects estimates.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_sleep, check = \"reqq\")\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n:::\n\nWe have two random effects in our model - a set of random intercepts by `Subject`, and a set of random slopes of `Days` on `Subject`.\n\nFor each of these, a separate normal Q-Q plot has been constructed. If you look closely, you'll see that there are 18 points in each of our Q-Q plots here, which correspond to the 18 subjects in the dataset.\n\nThis lets us evaluate whether our set of coefficients for these random effects are normally distributed. In other words - do the set of y-intercepts and the set of gradients that were generated appear to have been sampled from a normal underlying distribution? Here, it looks like they do, which is excellent news.\n\n### Posterior predictive check\n\nOne of the other plots that is offered as part of `check_model` is called the posterior predictive check. It's quite a nice option to include, as it can give you an overall idea of how good a job your model does in predicting your data.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_sleep, check = \"pp_check\")\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-6-1.png){width=672}\n:::\n:::\n\n:::\n\nHere, the function has run a set of simulations for us, using the linear mixed model we created. Each of these simulated datasets, created from our model, is shown on the plot by a thin blue line (as you can see, many simulations have been run).\n\nThe green line then shows us our current dataset. 
If the green line shows the same sort of pattern as all the thinner blue lines, this indicates good overall model fit.\n\nFor this dataset, it really isn't bad at all for the most part! However, our dataset (the green line) does have a bit of a \"dip\" or \"dent\" that doesn't seem to be occurring in very many of our blue lines. This could potentially indicate that our model is a bit too simple, i.e., there is some other important variable that we've not factored in here; or it could simply be a result of random noise.\n\n::: {.callout-tip collapse=\"true\"}\n### Changing plotting colours in check_model\n\nIf you find the green, blue and red default colours in `check_model` to be a little too similar to each other for your liking, there is an optional `colours` argument in the function that you can add. For instance, you could change the green to a yellow, by adding this to the `check_model` function: `colors = c(\"#fada5e\", \"#1b6ca8\", \"#cd201f\")`.\n:::\n\n## Exercises\n\n### Exercise 1 - Dragons revisited (again)\n\n\n{{< level 1 >}}\n\n\n\nLet's once again revisit the `dragons` dataset, and the minimal model that we chose in the previous section based on significance testing:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndragons <- read_csv(\"data/dragons.csv\")\n\nlme_dragons_dropx <- lmer(intelligence ~ wingspan + scales + \n (1 + wingspan|mountain), \n data=dragons)\n```\n:::\n\n:::\n\nFit diagnostic plots for this model using the code given above. What do they show?\n\n::: {.callout-note collapse=\"true\"}\n#### Worked answer\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_dragons_dropx, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"outliers\"))\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-8-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_model(lme_dragons_dropx, \n check = c(\"reqq\", \"pp_check\"))\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-8-2.png){width=672}\n:::\n:::\n\n:::\n\n:::\n\nTry comparing these diagnostic plots to the diagnostic plots for the full model, `intelligence ~ wingspan*scales + (1 + wingspan|mountain)`. Are the assumptions better met? Why/why not?\n\n### Exercise 2 - Arabidopsis\n\n\n{{< level 2 >}}\n\n\n\nFor this second exercise, we'll use another internal dataset from `lme4`, called `Arabidopsis`. 
These data are about genetic variation in a plant genus *arabidopsis* (rockcress), in response to fertilisation and \"simulated herbivory\" (some of the plants' stems were damaged/clipped to simulate animal grazing).\n\n![They look like this - quite pretty!](images_mixed-effects/arabidopsis.webp){width=40% fig-alt=\"Close-up of an arabidopsis plant, with delicate white flowers\"}\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"Arabidopsis\")\n```\n:::\n\n:::\n\nIn this dataset, there are eight variables:\n\n- `total.fruits`, an integer variable measuring the total fruits produced per plant\n- `amd`, a variable measuring whether the plant underwent simulated herbivory (clipped or unclipped)\n- `nutrient`, a variable measuring which type of fertiliser/treatment the plant received (1, minimal or 8, added)\n- `reg`, or region, a variable with three categories (NL Netherlands, SP Spain, SW Sweden)\n- `popu`, or population, a variable representing groups within the regions\n- `gen`, or genotype, a variable with 24 categories\n- `rack`, a \"nuisance\" or confounding factor, representing which of two greenhouse racks the plant was grown on\n- `status`, another nuisance factor, representing the plant's germination method (Normal, Petri.Plate or Transplant)\n\nWe're interested in finding out whether the fruit yield can be predicted based on the type of fertiliser and whether the plant underwent simulated herbivory, across different genotypes and populations.\n\nFit the following mixed effects model: \n\n`total.fruits ~ nutrient + rack + status + amd + reg + (1|popu) + (1|gen)` \n\nand check its assumptions. What can you conclude about the suitability of a linear mixed effects model for this dataset?\n\n::: {.callout-note collapse=\"true\"}\n#### Worked answer\n\n#### Fitting the model\n\nOur research question tell us that `total.fruits` is the response variable, and that `nutrient` and `amd` are fixed predictors of interest. The rest of our variables are confounds that we'd like to control for.\n\nOnly some of these additional variables have sufficient levels/categories to be treated as random effects, but both `popu` and `gen` do qualify. So far we've only talked about having one clustering variable at a time within a dataset; we'll talk more about this in subsequent sessions, and for now, we've shown you how to correctly do it for the current dataset:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_arabidopsis <- lmer(total.fruits ~ nutrient + rack + status + amd + reg + \n (1|popu) + (1|gen), data=Arabidopsis)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nboundary (singular) fit: see help('isSingular')\n```\n:::\n:::\n\n:::\n\nThe nuisance variables `rack` and `status` are included, along with `reg` (which could potentially be an effect of interest depending on the research question). Because they have less than 5 levels, they're best fitted as fixed effects.\n\n#### Is this a suitable model?\n\nProbably not, for multiple reasons.\n\nFirstly, we get a warning message telling us that our model has a \"singular fit\". 
This is usually a sign that your dataset isn't large enough to support all of the different parameters, fixed or random, that you've asked R to estimate.\n\nSecondly, if we look at the diagnostic plots, we can see some real issues emerging.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_arabidopsis, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"outliers\"))\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-11-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_model(lme_arabidopsis, \n check = c(\"reqq\", \"pp_check\"))\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-11-2.png){width=672}\n:::\n:::\n\n:::\n\nMany of these plots look bad. There's evidence for non-linearity, for heterogeneity of variance and non-normality in the residuals, and the posterior predictive check looks terrible.\n\n::: {.callout-caution icon=\"false\"}\n#### Bonus questions: can we fix any of this?\n\n\n{{< level 3 >}}\n\n\n\nWhen you have a singular fit, i.e., you're asking for too much from your dataset, a good first step is usually to try reducing the complexity of your model. Try performing some model comparison, or fitting simpler models, and see what happens.\n\nIf you check the assumptions for each of these simpler models, however, you'll probably notice that many of the issues persist.\n\nTo figure out why, and whether it's fixable, think about the types of variables we have, and how R is treating them. You might find the `as.factor` function useful in places; but does that fix everything?\n:::\n\nChat about these bonus questions with a neighbour, or a trainer. Understanding why these diagnostic plots look bad, and why we might need to take a closer look at the dataset before we fit things, will serve you really well when working with your own data.\n\n:::\n\n## Summary\n\n::: {.callout-tip}\n#### Key Points\n\n- Linear mixed effects models have the same assumptions as standard linear models\n- Mixed models also make assumptions about the distribution of random effects\n- The `performance` package in R can be used to assess whether these assumptions are met using diagnostic plots\n:::\n\n", + "markdown": "---\ntitle: \"Checking assumptions\"\noutput: html_document\n---\n\n\n\n::: {.cell}\n\n:::\n\n\nAs with all statistical models, mixed effects models make certain assumptions about the dataset and the population it's drawn from. 
If these assumptions are not well met, then any results we get from our model must be taken with a huge grain of salt.\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\nWe'll be using the `performance` package in R to visually check assumptions.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# install and load the package\ninstall.packages(\"performance\")\nlibrary(performance)\n```\n:::\n\n:::\n\n## What are the assumptions?\n\nThe assumptions of a linear mixed effects model - which we've been dealing with so far in the course - are very similar to the assumptions of a standard linear model, and include all the things you're likely used to seeing:\n\n- Continuous response variable\n- Independence of data points (beyond the non-independence that we have accounted for with our random effects)\n- Linearity in the relationship between the predictor(s) and the response\n- Residuals are normally distributed\n- Residuals have equality of variance\n\nAnd, though it isn't a \"formal\" assumption in the strictest sense, we also want to ensure that there aren't any overly influential data points.\n\nBecause we now have random effects in our model, there are a couple of additional assumptions that we make:\n\n- The coefficients of the random effects are normally distributed\n- Random effects are not influenced by any of the other predictors\n\n## Testing these assumptions\n\nThe first two of our assumptions - continuous response variable and independence - can't be tested just by examining the dataset or residuals. These two assumptions fit within a broader idea of \"choose the right model\", which requires you as a researcher to think carefully about your experimental design.\n\nThe rest of our assumptions can be assessed using the same method that we use for a standard linear regression analysis: visualisation via diagnostic plots.\n\nLet's look at our `sleepstudy` dataset again. Here is the full model that we fitted to those data:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"sleepstudy\")\n\nlme_sleep <- lmer(Reaction ~ Days + (1 + Days|Subject),\n data = sleepstudy)\n```\n:::\n\n:::\n\nNow, let's visualise it. We could create each of the plots by hand if we wished (using the `broom.mixed` package to augment our dataset), but thankfully there exists a much quicker method, using an R package called `performance`.\n\n::: {.callout-tip}\nThe `performance` package contains a bunch of functions that allow us to test the quality of our model. For the purposes of visualisation, we'll use `check_model`, but I encourage you to explore this package in more detail as there's a lot more to it (it's super helpful for evaluating the performance of generalised linear models and Bayesian models, as well as mixed models).\n\nNote that you might also need to install and/or load the `see` package in order to use the `performance` package.\n:::\n\n### The usual suspects\n\nWe'll start by looking at the diagnostic plots that we're used to seeing from standard linear models.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_sleep, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"outliers\"))\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n:::\n\n:::\n\nThe plot for influential observations might look a bit different to the Cook's distance plot that you might be used to. 
On this plot, there are 4 data points labelled in red which fall really far outside our dashed contour lines (8, 57, 60 and 131). This tells us that we might want to re-examine these points, perhaps by excluding them from the dataset, fitting a new linear mixed model, and seeing whether our conclusions are still the same.\n\nThe linearity and homogeneity of variance plots look alright, overall, although there's some indication that our influential points might be causing a bit of drama there too. There's some snaking in the Q-Q plot that suggests our residuals have a \"heavy-tailed\", or leptokurtic, distribution.\n\n### Normality of random effects\n\nThe other important assumption to check via visualisation is the normality of our random effects estimates.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_sleep, check = \"reqq\")\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n:::\n\nWe have two random effects in our model - a set of random intercepts by `Subject`, and a set of random slopes of `Days` on `Subject`.\n\nFor each of these, a separate normal Q-Q plot has been constructed. If you look closely, you'll see that there are 18 points in each of our Q-Q plots here, which correspond to the 18 subjects in the dataset.\n\nThis lets us evaluate whether our set of coefficients for these random effects are normally distributed. In other words - do the set of y-intercepts and the set of gradients that were generated appear to have been sampled from a normal underlying distribution? Here, it looks like they do, which is excellent news.\n\n### Posterior predictive check\n\nOne of the other plots that is offered as part of `check_model` is called the posterior predictive check. It's quite a nice option to include, as it can give you an overall idea of how good a job your model does in predicting your data.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_sleep, check = \"pp_check\")\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-6-1.png){width=672}\n:::\n:::\n\n:::\n\nHere, the function has run a set of simulations for us, using the linear mixed model we created. Each of these simulated datasets, created from our model, is shown on the plot by a thin blue line (as you can see, many simulations have been run).\n\nThe green line then shows us our current dataset. If the green line shows the same sort of pattern as all the thinner blue lines, this indicates good overall model fit.\n\nFor this dataset, it really isn't bad at all for the most part! However, our dataset (the green line) does have a bit of a \"dip\" or \"dent\" that doesn't seem to be occurring in very many of our blue lines. This could potentially indicate that our model is a bit too simple, i.e., there is some other important variable that we've not factored in here; or it could simply be a result of random noise.\n\n::: {.callout-tip collapse=\"true\"}\n### Changing plotting colours in check_model\n\nIf you find the green, blue and red default colours in `check_model` to be a little too similar to each other for your liking, there is an optional `colours` argument in the function that you can add. 
For instance, you could change the green to a yellow, by adding this to the `check_model` function: `colors = c(\"#fada5e\", \"#1b6ca8\", \"#cd201f\")`.\n:::\n\n## Exercises\n\n### Dragons revisited (again) {#sec-exr_dragons3}\n\n::: {.callout-exercise}\n\n\n{{< level 1 >}}\n\n\n\nLet's once again revisit the `dragons` dataset, and the minimal model that we chose in [Exercise -@sec-exr_dragons2] based on significance testing:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndragons <- read_csv(\"data/dragons.csv\")\n\nlme_dragons_dropx <- lmer(intelligence ~ wingspan + scales + \n (1 + wingspan|mountain), \n data=dragons)\n```\n:::\n\n:::\n\nFit diagnostic plots for this model using the code given above. What do they show?\n\n::: {.callout-tip collapse=\"true\"}\n#### Worked answer\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_dragons_dropx, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"outliers\"))\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-8-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_model(lme_dragons_dropx, \n check = c(\"reqq\", \"pp_check\"))\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-8-2.png){width=672}\n:::\n:::\n\n:::\n\n:::\n\nTry comparing these diagnostic plots to the diagnostic plots for the full model, `intelligence ~ wingspan*scales + (1 + wingspan|mountain)`. Are the assumptions better met? Why/why not?\n\n:::\n\n### Arabidopsis {#sec-exr_arabidopsis}\n\n::: {.callout-exercise}\n\n\n{{< level 2 >}}\n\n\n\nFor this second exercise, we'll use another internal dataset from `lme4`, called `Arabidopsis`. These data are about genetic variation in the plant genus *Arabidopsis* (rockcress), in response to fertilisation and \"simulated herbivory\" (some of the plants' stems were damaged/clipped to simulate animal grazing).\n\n![They look like this - quite pretty!](images_mixed-effects/arabidopsis.webp){width=40% fig-alt=\"Close-up of an Arabidopsis plant, with delicate white flowers\"}\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"Arabidopsis\")\n```\n:::\n\n:::\n\nIn this dataset, there are eight variables:\n\n- `total.fruits`, an integer variable measuring the total fruits produced per plant\n- `amd`, a variable measuring whether the plant underwent simulated herbivory (clipped or unclipped)\n- `nutrient`, a variable measuring which type of fertiliser/treatment the plant received (1, minimal or 8, added)\n- `reg`, or region, a variable with three categories (NL Netherlands, SP Spain, SW Sweden)\n- `popu`, or population, a variable representing groups within the regions\n- `gen`, or genotype, a variable with 24 categories\n- `rack`, a \"nuisance\" or confounding factor, representing which of two greenhouse racks the plant was grown on\n- `status`, another nuisance factor, representing the plant's germination method (Normal, Petri.Plate or Transplant)\n\nWe're interested in finding out whether the fruit yield can be predicted based on the type of fertiliser and whether the plant underwent simulated herbivory, across different genotypes and populations.\n\nFit the following mixed effects model: \n\n`total.fruits ~ nutrient + rack + status + amd + reg + (1|popu) + (1|gen)` \n\nand check its assumptions. 
What can you conclude about the suitability of a linear mixed effects model for this dataset?\n\n::: {.callout-tip collapse=\"true\"}\n#### Worked answer\n\n#### Fitting the model\n\nOur research question tells us that `total.fruits` is the response variable, and that `nutrient` and `amd` are fixed predictors of interest. The rest of our variables are confounds that we'd like to control for.\n\nOnly some of these additional variables have sufficient levels/categories to be treated as random effects, but both `popu` and `gen` do qualify. So far we've only talked about having one clustering variable at a time within a dataset; we'll talk more about this in subsequent sessions, and for now, we've shown you how to correctly do it for the current dataset:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_arabidopsis <- lmer(total.fruits ~ nutrient + rack + status + amd + reg + \n (1|popu) + (1|gen), data=Arabidopsis)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nboundary (singular) fit: see help('isSingular')\n```\n:::\n:::\n\n:::\n\nThe nuisance variables `rack` and `status` are included, along with `reg` (which could potentially be an effect of interest depending on the research question). Because they have fewer than 5 levels, they're best fitted as fixed effects.\n\n#### Is this a suitable model?\n\nProbably not, for multiple reasons.\n\nFirstly, we get a warning message telling us that our model has a \"singular fit\". This is usually a sign that your dataset isn't large enough to support all of the different parameters, fixed or random, that you've asked R to estimate.\n\nSecondly, if we look at the diagnostic plots, we can see some real issues emerging.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_arabidopsis, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"outliers\"))\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-11-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_model(lme_arabidopsis, \n check = c(\"reqq\", \"pp_check\"))\n```\n\n::: {.cell-output-display}\n![](checking-assumptions_files/figure-html/unnamed-chunk-11-2.png){width=672}\n:::\n:::\n\n:::\n\nMany of these plots look bad. There's evidence for non-linearity, for heterogeneity of variance and non-normality in the residuals, and the posterior predictive check looks terrible.\n\n::: {.callout-caution icon=\"false\"}\n#### Bonus questions: can we fix any of this?\n\n\n{{< level 3 >}}\n\n\n\nWhen you have a singular fit, i.e., you're asking for too much from your dataset, a good first step is usually to try reducing the complexity of your model. Try performing some model comparison, or fitting simpler models, and see what happens.\n\nIf you check the assumptions for each of these simpler models, however, you'll probably notice that many of the issues persist.\n\nTo figure out why, and whether it's fixable, think about the types of variables we have, and how R is treating them. You might find the `as.factor` function useful in places; but does that fix everything?\n:::\n\nChat about these bonus questions with a neighbour, or a trainer. 
Understanding why these diagnostic plots look bad, and why we might need to take a closer look at the dataset before we fit things, will serve you really well when working with your own data.\n\n:::\n:::\n\n## Summary\n\n::: {.callout-tip}\n#### Key Points\n\n- Linear mixed effects models have the same assumptions as standard linear models\n- Mixed models also make assumptions about the distribution of random effects\n- The `performance` package in R can be used to assess whether these assumptions are met using diagnostic plots\n:::\n\n", "supporting": [ "checking-assumptions_files" ], diff --git a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-11-1.png b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-11-1.png index 748cddc..dd29794 100644 Binary files a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-11-1.png and b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-11-1.png differ diff --git a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-11-2.png b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-11-2.png index 17fd98f..d4321a4 100644 Binary files a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-11-2.png and b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-11-2.png differ diff --git a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-4-1.png b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-4-1.png index bf7d440..2545d54 100644 Binary files a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-4-1.png and b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-4-1.png differ diff --git a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-5-1.png b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-5-1.png index 1753d55..bbcb087 100644 Binary files a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-5-1.png and b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-5-1.png differ diff --git a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-6-1.png b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-6-1.png index 0b20744..cc589b1 100644 Binary files a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-6-1.png and b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-6-1.png differ diff --git a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-8-1.png b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-8-1.png index dfbf33f..18e9c69 100644 Binary files a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-8-1.png and b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-8-1.png differ diff --git a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-8-2.png b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-8-2.png index fac9228..e9ab503 100644 Binary files a/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-8-2.png and b/_freeze/materials/checking-assumptions/figure-html/unnamed-chunk-8-2.png differ diff --git a/_freeze/materials/crossed-random-effects/execute-results/html.json b/_freeze/materials/crossed-random-effects/execute-results/html.json index 9d1f945..352f578 100644 --- a/_freeze/materials/crossed-random-effects/execute-results/html.json +++ b/_freeze/materials/crossed-random-effects/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "7cd9f11220adb18f4d7763d98a28a95e", + "hash": 
"0d8e6bed9f5e092535a97f78321bd4a3", "result": { - "markdown": "---\ntitle: \"Crossed random effects\"\noutput: html_document\n---\n\n::: {.cell}\n\n:::\n\n\nThe previous section of course materials discussed how to fit random effects in `lme4` when there are multiple clustering variables within the dataset/experimental design, with a focus on nested random effects. \n\nThis section similarly explains how to determine the random effects structure for more complex experimental designs, but deals with the situations where the clustering variables are not nested.\n\n## What are crossed random effects?\n\nWe describe two clustering variables as \"crossed\" if they can be combined in different ways to generate unique groupings, but one of them doesn't \"nest\" inside the other.\n\nThis concept is similar to the idea of a \"factorial\" design in regular linear modelling.\n\n### Fast food example\n\nFor instance, imagine a fast food franchise is looking to perform quality control checks across different branches. In 5 randomly selected branches, testers sample 6 different items of food from the menu. They sample the same 6 items in each branch, randomly selected from the wider menu.\n\nHere, both `branch` and `menu item` would be considered random effects, but one is not nested within the other. In this situation, item A in branch 1 and item A in branch 2 are not unconnected or unique; they are the same menu item. We would want to estimate a set of 6 random intercepts/slopes for `branch`, and separately, 5 random intercepts/slopes for `menu item`.\n\n![Branch and item as crossed effects](images_mixed-effects/fastfood_design.png){width=40%}\n\nA useful rule of thumb is that if the best way to draw out your experimental design is with a table or grid like this, rather than a tree-shaped diagram, then your effects are likely to be crossed rather than nested.\n\n## Fitting crossed random effects\n\nImplementing crossed random effects in your `lme4` model is very easy. You don't need to worry about additional syntax or explicit nesting.\n\nWe'll use a behavioural dataset from a cognitive psychology study, where the classic Stroop task was administered, as a test case.\n\n### The Stroop dataset\n\nIn the Stroop task, participants are asked to identify the colour of font that a word is written in. The words themselves, however, are the names of different colours. Typically, when the font colour does not match the word itself, people are slower to identify the font colour.\n\n![The Stroop task](images_mixed-effects/stroop.png){width=70%}\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncognitive <- read_csv(\"data/stroop.csv\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nRows: 432 Columns: 4\n── Column specification ────────────────────────────────────────────────────────\nDelimiter: \",\"\nchr (2): subject, congruency\ndbl (2): item, reaction_time\n\nℹ Use `spec()` to retrieve the full column specification for this data.\nℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.\n```\n:::\n:::\n\n:::\n\nThis dataset contains four variables:\n\n- `subject`, of which there are 12\n- `item`, referring to task item, of which there are 36 in total\n- `congruency`, whether the colour of the font matched the word or not (congruent vs incongruent)\n- `reaction_time`, how long it took the participant to give a response (ms)\n\nOf the 36 items, 18 are congruent, and 18 are incongruent. 
Each subject in the study saw and responded to all 36 items, in a randomised (counterbalanced) order.\n\nOur fixed predictor is `congruency`, and we can treat both `subject` and `item` as clustering variables that create non-independent clusters amongst the 432 total observations of `reaction_time`.\n\nTherefore, we fit the following model:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_cognitive <- lmer(reaction_time ~ congruency + (1|item) +\n (1+congruency|subject), data=cognitive)\n\nsummary(lme_cognitive)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: reaction_time ~ congruency + (1 | item) + (1 + congruency | subject)\n Data: cognitive\n\nREML criterion at convergence: 4041.3\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-3.4839 -0.6472 0.0008 0.5949 3.2836 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr \n item (Intercept) 42.2 6.496 \n subject (Intercept) 140.4 11.847 \n congruencyincongruent 159.4 12.626 -0.68\n Residual 609.9 24.696 \nNumber of obs: 432, groups: item, 36; subject, 12\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 247.995 4.107 14.240 60.39 < 2e-16 ***\ncongruencyincongruent 58.222 4.860 15.581 11.98 2.87e-09 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\ncngrncyncng -0.686\n```\n:::\n:::\n\n:::\n\nIn this model, we've included a fixed effect of congruency, as well as three random effects:\n\n- random intercepts for `item`\n- random intercepts for `subject`\n- random slopes for `congruency` on `subject`\n\nWe do not fit random slopes for `congruency` on `item`, as `congruency` does not vary within individual task items.\n\nCrucially, `item` is not nested within `subject`. Item 4 for subject A is exactly the same as item 4 for subject E - we haven't given each subject their own set of items. You can see from the model output that we have therefore fitted 12 random intercepts/slopes for `subject`, and 36 random intercepts for `item`.\n\nThis allows us to capture the fixed relationship between `congruency` and `reaction_time`, with both `subject` and `item` accounted for.\n\n## Partially crossed random effects\n\nIn the example above, each participant in the study experienced each of the task items. We'd call this a fully-crossed design (or perhaps, a full factorial design). But, if each participant had only responded to a randomised subset of the task items, then we would instead say that the `item` and `subject` random effects are *partially* crossed.\n\nPartially crossed designs are common in research, such as when using the classic Latin square design. We'll look at an example of that, using the `abrasion` dataset.\n\n### The abrasion dataset\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nabrasion <- read_csv(\"data/abrasion.csv\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nRows: 16 Columns: 4\n── Column specification ────────────────────────────────────────────────────────\nDelimiter: \",\"\nchr (1): material\ndbl (3): run, position, wear\n\nℹ Use `spec()` to retrieve the full column specification for this data.\nℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.\n```\n:::\n:::\n\n:::\n\nIn this experiment, four different types of `material` are being tested (A, B, C and D) for their wear, by feeding them into a wear-testing machine. 
\n\nThe machine could process four material samples at a time in each `run`, and it's believed that there are differences between runs. There is also evidence that the `position` within the machine might also generate some differences in wear. Therefore, four runs were made in total, with each `material` placed at each different `position` across the `run`. For each of the 16 samples, the response variable `wear` is assessed by measuring the loss of weight in 0.1mm of material over the testing period.\n\nOn first read, it might sound as if `position` and `run` are somehow nested effects, but actually, they represent a Latin square design:\n\n![Latin square design of abrasion experiment](images_mixed-effects/latin_square.png){width=30%}\n\nA Latin square is a particular type of randomised design, in which each experimental condition (in this case, materials A through D) appear once and only once in each column and row of the design matrix. This sort of randomisation might be used to randomise the layout of plants in greenhouses, or samples in wells on plates.\n\nIn the `abrasion` example, this design matrix is actually stored within the structure of the dataset itself. You can reconstruct it by looking at the raw data, or by using the following code:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nmatrix(abrasion$material, 4, 4)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [,1] [,2] [,3] [,4]\n[1,] \"C\" \"A\" \"D\" \"B\" \n[2,] \"D\" \"B\" \"C\" \"A\" \n[3,] \"B\" \"D\" \"A\" \"C\" \n[4,] \"A\" \"C\" \"B\" \"D\" \n```\n:::\n:::\n\n:::\n\nThe four possible positions are the same across each run, meaning that `position` is not nested within `run`, but is instead crossed. Position 1 in run 1 is linked to position 1 in run 3, for instance - we wouldn't consider these to be \"unique\" positions, but would like to group them together when estimating variance in our model.\n\nBut, because it's impossible for each `material` to experience each `position` in each `run`, this is a partially crossed design rather than a fully crossed one.\n\n### Fitting partially crossed random effects\n\nThe good news is that fitting this in `lme4` doesn't require any extra knowledge or special syntax. So long as the dataset is properly coded and accurately represents the structure of the experimental design, the code is identical to fully crossed random effects.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_abrasion <- lmer(wear ~ material + (1|run) + (1|position), data = abrasion)\n\nsummary(lme_abrasion)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: wear ~ material + (1 | run) + (1 | position)\n Data: abrasion\n\nREML criterion at convergence: 100.3\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.08973 -0.30231 0.02697 0.42254 1.21052 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n run (Intercept) 66.90 8.179 \n position (Intercept) 107.06 10.347 \n Residual 61.25 7.826 \nNumber of obs: 16, groups: run, 4; position, 4\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 265.750 7.668 7.475 34.656 1.57e-09 ***\nmaterialB -45.750 5.534 6.000 -8.267 0.000169 ***\nmaterialC -24.000 5.534 6.000 -4.337 0.004892 ** \nmaterialD -35.250 5.534 6.000 -6.370 0.000703 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) matrlB matrlC\nmaterialB -0.361 \nmaterialC -0.361 0.500 \nmaterialD -0.361 0.500 0.500\n```\n:::\n:::\n\n:::\n\nIf you check the output, you can see that we do indeed have 4 groups each for `run` and `position`, which is correct. The model has done what we intended, and we could now go on to look at the differences between `material`, with the nuisance effects of `run` and `position` having been accounted for.\n\n## Exercises\n\n### Exercise 1 - Penicillin\n\n\n{{< level 2 >}}\n\n\n\nFor this exercise, we'll use the internal `Penicillin` dataset from `lme4`.\n\nThese data are taken from a study that assessed the concentration of a penicillin solution, by measuring how it inhibits the growth of organisms on a plate of agar. \n\nSix samples of the penicillin solution were taken. On each plate of agar, a few droplets of each of the six samples were allowed to diffuse into the medium. The diameter of the inhibition zones created could be measured, and is related in a known way to the concentration of the penicillin.\n\nThere are three variables:\n\n- `sample`, the penicillin sample (A through F, 6 total)\n- `plate`, the assay plate (a through x, 24 total)\n- `diameter`, of the zone of inhibition (measured in mm)\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"Penicillin\")\n```\n:::\n\n:::\n\nFor this exercise:\n\n1. Fit a sensible model to the data\n2. Perform significance testing/model comparison\n3. Check the model assumptions\n4. Visualise the model\n\n::: {.callout-note collapse=\"true\"}\n#### Worked answer\n\nThis is quite a simple dataset, in that there are only two variables besides the response. But, given the research question, we likely want to consider both of these two variables as random effects.\n\nHow does that work? This is the first random-effects-only model that we've come across. (Well, technically there are still fixed effects - every time you estimate a random effect, a fixed effect will always be estimated as part of that.)\n\n#### Consider the experimental design\n\nWe have two variables for which we'd like to estimate random effects, and with no explicit fixed predictors, all that's available to us is random intercepts.\n\nThe two variables, `plate` and `sample`, are crossed in a factorial design (each of the six samples is included on each of the 24 plates). So, we want to fit these as crossed random effects.\n\n#### Fit the model\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_penicillin <- lmer(diameter ~ (1|sample) + (1|plate), data = Penicillin)\n\nsummary(lme_penicillin)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: diameter ~ (1 | sample) + (1 | plate)\n Data: Penicillin\n\nREML criterion at convergence: 330.9\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.07923 -0.67140 0.06292 0.58377 2.97959 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n plate (Intercept) 0.7169 0.8467 \n sample (Intercept) 3.7311 1.9316 \n Residual 0.3024 0.5499 \nNumber of obs: 144, groups: plate, 24; sample, 6\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 22.9722 0.8086 5.4866 28.41 3.62e-07 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThis shows us that the average diameter of the inhibition zone is around 23mm. 
Looking at the random effects, there's more variance due to `sample` than there is to `plate`.\n\n#### Visualise the model\n\nWe can see these different variances by visualising the model. Here, a jagged line of best fit is drawn for each of the samples; the overall shape of the lines are the same, since we have random intercepts only. You can see that the spread within each of the lines (which represents variance for `plate`) is overall less than the spread of the lines themselves (which represents the variance for `sample`).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_penicillin), aes(x = plate, y = diameter, colour = sample)) + \n geom_jitter(width = 0.2, height = 0) +\n geom_line(aes(y = .fitted, group = sample))\n```\n\n::: {.cell-output-display}\n![](crossed-random-effects_files/figure-html/unnamed-chunk-9-1.png){width=672}\n:::\n:::\n\n:::\n\n:::\n\n### Exercise 2 - Politeness\n\n\n{{< level 2 >}}\n\n\n\nFor this exercise, we'll use a real dataset called `politeness`, taken from a paper by Winter & Grawunder ([2012](https://doi.org/10.1016/j.wocn.2012.08.006)).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\npoliteness <- read_csv(\"data/politeness.csv\")\n```\n:::\n\n:::\n\nThe study was designed to investigate whether voice pitch is higher in polite contexts than in informal ones, and whether this effect is consistent between male and female speakers.\n\nThere are five variables in this dataset:\n\n- `subject`, the participant (6 total)\n- `gender`, treated here as a binary categorical variable (male vs female)\n- `sentence`, the sentence that was spoken (7 total)\n- `context`, whether the speaker was in a polite or informal setting\n- `pitch`, the measured voice pitch across the sentence\n\nEach participant in the study spoke each of the seven sentences twice, once in each of the two contexts.\n\nIs there a difference between vocal pitch in different contexts? Is this effect consistent for male and female speakers?\n\nTo answer this question:\n\n1. Consider which variables you want to treat as fixed and random effects\n2. Try drawing out the structure of the dataset, and think about what levels the different variables are varying at\n3. You may want to assess the quality and significance of the model to help you draw your final conclusions\n\n::: {.callout-note collapse=\"true\"}\n#### Worked answer\n\n#### Consider the experimental design\n\nIn this dataset, there are two variables for which we might want to fit random effects: `subject` and `sentence`. The particular sets of participants and sentences have been chosen at random from the larger population of participants/speakers and possible sentences that exist.\n\nThe other two variables, `gender` and `context`, are fixed effects of interest.\n\nLet's sketch out the design of this experiment. You could choose to visualise/sketch out this design in a couple of ways:\n\n![Experimental design for voice pitch experiment #1](images_mixed-effects/politeness_design.png){width=60%}\n\n![Experimental design for voice pitch experiment #2](images_mixed-effects/politeness_design2.png){width=60%}\n\nThe `subject` and `sentence` variables are not nested within one another - they're crossed. There are 42 combinations of `subject` and `sentence`.\n\nEach of those combinations then happens twice: once for each `context`, for a total of 84 possible unique utterances. 
(Note that there is actually one instance of missing data, so we only have 83.)\n\nNow, `context` varies within both `subject` and `sentence` - because each subject-sentence combination is spoken twice. But `gender` does not vary within `subject` in this instance; each participant is labelled as either male or female.\n\n#### Fit a full model\n\nSo, the full possible model we could fit is the following:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_polite <- lmer(pitch ~ gender*context + (1 + gender*context|sentence)\n + (1 + context|subject), data = politeness)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nboundary (singular) fit: see help('isSingular')\n```\n:::\n\n```{.r .cell-code}\nsummary(lme_polite)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: pitch ~ gender * context + (1 + gender * context | sentence) + \n (1 + context | subject)\n Data: politeness\n\nREML criterion at convergence: 762.5\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.5176 -0.6586 -0.0519 0.5299 3.5191 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr \n sentence (Intercept) 399.04 19.98 \n genderM 116.01 10.77 -0.97 \n contextpol 217.07 14.73 -0.09 -0.06 \n genderM:contextpol 291.53 17.07 0.11 -0.10 -0.78\n subject (Intercept) 597.51 24.44 \n contextpol 1.21 1.10 1.00 \n Residual 548.80 23.43 \nNumber of obs: 83, groups: sentence, 7; subject, 6\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 260.686 16.802 5.825 15.515 5.85e-06 ***\ngenderM -116.195 21.614 4.308 -5.376 0.00467 ** \ncontextpol -27.400 9.148 6.763 -2.995 0.02091 * \ngenderM:contextpol 15.892 12.188 8.169 1.304 0.22779 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) gendrM cntxtp\ngenderM -0.703 \ncontextpol -0.137 0.080 \ngndrM:cntxt 0.111 -0.140 -0.723\noptimizer (nloptwrap) convergence code: 0 (OK)\nboundary (singular) fit: see help('isSingular')\n```\n:::\n:::\n\n:::\n\nThis full model has a singular fit, almost certainly because we don't have a sufficient sample size for 6 random effects plus fixed effects.\n\n#### Alternative (better) models\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_polite_red <- lmer(pitch ~ gender*context + (1|sentence) + (1|subject), \n data = politeness)\n\nsummary(lme_polite_red)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: pitch ~ gender * context + (1 | sentence) + (1 | subject)\n Data: politeness\n\nREML criterion at convergence: 766.8\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.1191 -0.5604 -0.0768 0.5111 3.3352 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n sentence (Intercept) 218.3 14.77 \n subject (Intercept) 617.1 24.84 \n Residual 637.4 25.25 \nNumber of obs: 83, groups: sentence, 7; subject, 6\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 260.686 16.348 5.737 15.946 5.7e-06 ***\ngenderM -116.195 21.728 4.566 -5.348 0.004023 ** \ncontextpol -27.400 7.791 69.017 -3.517 0.000777 ***\ngenderM:contextpol 15.572 11.095 69.056 1.403 0.164958 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) gendrM cntxtp\ngenderM -0.665 \ncontextpol -0.238 0.179 \ngndrM:cntxt 0.167 -0.252 -0.702\n```\n:::\n\n```{.r .cell-code}\nanova(lme_polite, lme_polite_red)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: politeness\nModels:\nlme_polite_red: pitch ~ gender * context + (1 | sentence) + (1 | subject)\nlme_polite: pitch ~ gender * context + (1 + gender * context | sentence) + (1 + context | subject)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_polite_red 7 807.11 824.04 -396.55 793.11 \nlme_polite 18 825.15 868.69 -394.58 789.15 3.952 11 0.9713\n```\n:::\n:::\n\n:::\n\nFitting a simpler model that contains only random intercepts, and comparing this to our more complicated model, shows no difference between the two - i.e., the simpler model is better.\n\nYou can keep comparing different models with different random effects structures, if you like, for practice - this dataset is a good sandbox for it!\n\n#### Check assumptions\n\nFor now, we're going to quickly check the assumptions of this simpler, intercepts-only model:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_polite_red, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"outliers\"))\n```\n\n::: {.cell-output-display}\n![](crossed-random-effects_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_model(lme_polite_red, \n check = c(\"reqq\", \"pp_check\"))\n```\n\n::: {.cell-output-display}\n![](crossed-random-effects_files/figure-html/unnamed-chunk-13-2.png){width=672}\n:::\n:::\n\n:::\n\nNot bad! Maybe one overly influential point (31) that deserves testing - you can try refitting the model without it, and seeing whether that changes the overall conclusions. The Q-Q plot veers off a tiny bit on the right hand side, but it's only really 3 residuals, so probably not worth worrying about.\n\nThe random intercepts look nicely normally distributed, and the posterior predictive check is quite convincing.\n\n#### Visualise the model\n\nLast but not least, let's visualise the model:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_polite_red), aes(x = paste(gender, context), y = pitch, colour = gender)) +\n geom_point(alpha = 0.7) +\n stat_summary(fun = mean, geom = \"point\", size = 4) +\n geom_line(aes(y = .fitted, group = paste(sentence, subject)))\n```\n\n::: {.cell-output-display}\n![](crossed-random-effects_files/figure-html/unnamed-chunk-14-1.png){width=672}\n:::\n:::\n\n:::\n\nBased on the model output and the visualisation, we might therefore conclude that on average, speakers do use higher pitch for polite sentences compared to informal ones. Although there is a difference in pitch between male and female speakers overall, the effect of context is similar across genders.\n\nIn the final line of code for the plot, we've included the lines of best fit for each subject-sentence combination, which have fixed gradients but random intercepts. You can view sentence-wise lines of best fit (summarised across all 6 subjects) by writing `group = sentence`, or subject-wise lines of best fit (summarised across all 7 sentences) by writing `group = subject`. 
These tell you a little bit more about how much variation there is between the subjects and sentences.\n\n:::\n\n## Summary\n\nThis section has addressed how to fit models with multiple clustering variables, in scenarios where those clustering variables are not nested with one another.\n\nThis, along with the previous section on nested random effects, helps to extend the basic linear mixed effects model that was introduced earlier in the course. It emphasises the need to understand your variables and experimental design, in order to fit a suitable model.\n\n::: {.callout-tip}\n#### Key points\n- Two random effects are \"crossed\" if they interact to create multiple unique groups/combinations (as we see in factorial experimental designs), and are not nested\n- Random effects can be fully or partially crossed\n- Crossed random effects are fitted in `lme4` by creating multiple distinct random effects structures within the model formula\n:::\n\n", + "markdown": "---\ntitle: \"Crossed random effects\"\noutput: html_document\n---\n\n::: {.cell}\n\n:::\n\n\nThe previous section of course materials discussed how to fit random effects in `lme4` when there are multiple clustering variables within the dataset/experimental design, with a focus on nested random effects. \n\nThis section similarly explains how to determine the random effects structure for more complex experimental designs, but deals with the situations where the clustering variables are not nested.\n\n## What are crossed random effects?\n\nWe describe two clustering variables as \"crossed\" if they can be combined in different ways to generate unique groupings, but one of them doesn't \"nest\" inside the other.\n\nThis concept is similar to the idea of a \"factorial\" design in regular linear modelling.\n\n### Fast food example\n\nFor instance, imagine a fast food franchise is looking to perform quality control checks across different branches. In 5 randomly selected branches, testers sample 6 different items of food from the menu. They sample the same 6 items in each branch, randomly selected from the wider menu.\n\nHere, both `branch` and `menu item` would be considered random effects, but one is not nested within the other. In this situation, item A in branch 1 and item A in branch 2 are not unconnected or unique; they are the same menu item. We would want to estimate a set of 5 random intercepts/slopes for `branch`, and separately, 6 random intercepts/slopes for `menu item`.\n\n![Branch and item as crossed effects](images_mixed-effects/fastfood_design.png){width=40%}\n\nA useful rule of thumb is that if the best way to draw out your experimental design is with a table or grid like this, rather than a tree-shaped diagram, then your effects are likely to be crossed rather than nested.\n\n## Fitting crossed random effects\n\nImplementing crossed random effects in your `lme4` model is very easy. You don't need to worry about additional syntax or explicit nesting.\n\nWe'll use a behavioural dataset from a cognitive psychology study, where the classic Stroop task was administered, as a test case.\n\n### The Stroop dataset\n\nIn the Stroop task, participants are asked to identify the colour of font that a word is written in. The words themselves, however, are the names of different colours. 
Typically, when the font colour does not match the word itself, people are slower to identify the font colour.\n\n![The Stroop task](images_mixed-effects/stroop.png){width=70%}\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncognitive <- read_csv(\"data/stroop.csv\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nRows: 432 Columns: 4\n── Column specification ────────────────────────────────────────────────────────\nDelimiter: \",\"\nchr (2): subject, congruency\ndbl (2): item, reaction_time\n\nℹ Use `spec()` to retrieve the full column specification for this data.\nℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.\n```\n:::\n:::\n\n:::\n\nThis dataset contains four variables:\n\n- `subject`, of which there are 12\n- `item`, referring to task item, of which there are 36 in total\n- `congruency`, whether the colour of the font matched the word or not (congruent vs incongruent)\n- `reaction_time`, how long it took the participant to give a response (ms)\n\nOf the 36 items, 18 are congruent, and 18 are incongruent. Each subject in the study saw and responded to all 36 items, in a randomised (counterbalanced) order.\n\nOur fixed predictor is `congruency`, and we can treat both `subject` and `item` as clustering variables that create non-independent clusters amongst the 432 total observations of `reaction_time`.\n\nTherefore, we fit the following model:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_cognitive <- lmer(reaction_time ~ congruency + (1|item) +\n (1+congruency|subject), data=cognitive)\n\nsummary(lme_cognitive)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: reaction_time ~ congruency + (1 | item) + (1 + congruency | subject)\n Data: cognitive\n\nREML criterion at convergence: 4041.3\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-3.4839 -0.6472 0.0008 0.5949 3.2836 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr \n item (Intercept) 42.2 6.496 \n subject (Intercept) 140.4 11.847 \n congruencyincongruent 159.4 12.626 -0.68\n Residual 609.9 24.696 \nNumber of obs: 432, groups: item, 36; subject, 12\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 247.995 4.107 14.240 60.39 < 2e-16 ***\ncongruencyincongruent 58.222 4.860 15.581 11.98 2.87e-09 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\ncngrncyncng -0.686\n```\n:::\n:::\n\n:::\n\nIn this model, we've included a fixed effect of congruency, as well as three random effects:\n\n- random intercepts for `item`\n- random intercepts for `subject`\n- random slopes for `congruency` on `subject`\n\nWe do not fit random slopes for `congruency` on `item`, as `congruency` does not vary within individual task items.\n\nCrucially, `item` is not nested within `subject`. Item 4 for subject A is exactly the same as item 4 for subject E - we haven't given each subject their own set of items. You can see from the model output that we have therefore fitted 12 random intercepts/slopes for `subject`, and 36 random intercepts for `item`.\n\nThis allows us to capture the fixed relationship between `congruency` and `reaction_time`, with both `subject` and `item` accounted for.\n\n## Partially crossed random effects\n\nIn the example above, each participant in the study experienced each of the task items. 
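\n\nIf you want to double-check this for yourself, one quick optional step (using base R's `table` function) is to cross-tabulate the two clustering variables and confirm that every subject-item combination really does appear:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# cross-tabulate subjects against task items;\n# each cell should contain 1, since every subject responded to every item once\ntable(cognitive$subject, cognitive$item)\n```\n:::\n\n:::\n\n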
We'd call this a fully-crossed design (or perhaps, a full factorial design). But, if each participant had only responded to a randomised subset of the task items, then we would instead say that the `item` and `subject` random effects are *partially* crossed.\n\nPartially crossed designs are common in research, such as when using the classic Latin square design. We'll look at an example of that, using the `abrasion` dataset.\n\n### The abrasion dataset\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nabrasion <- read_csv(\"data/abrasion.csv\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nRows: 16 Columns: 4\n── Column specification ────────────────────────────────────────────────────────\nDelimiter: \",\"\nchr (1): material\ndbl (3): run, position, wear\n\nℹ Use `spec()` to retrieve the full column specification for this data.\nℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.\n```\n:::\n:::\n\n:::\n\nIn this experiment, four different types of `material` are being tested (A, B, C and D) for their wear, by feeding them into a wear-testing machine. \n\nThe machine could process four material samples at a time in each `run`, and it's believed that there are differences between runs. There is also evidence that the `position` within the machine might generate some differences in wear. Therefore, four runs were made in total, with each `material` placed at each different `position` across the `run`. For each of the 16 samples, the response variable `wear` is assessed by measuring the weight lost by the material (in units of 0.1mg) over the testing period.\n\nOn first read, it might sound as if `position` and `run` are somehow nested effects, but actually, they represent a Latin square design:\n\n![Latin square design of abrasion experiment](images_mixed-effects/latin_square.png){width=30%}\n\nA Latin square is a particular type of randomised design, in which each experimental condition (in this case, materials A through D) appears once and only once in each column and row of the design matrix. This sort of randomisation might be used to randomise the layout of plants in greenhouses, or samples in wells on plates.\n\nIn the `abrasion` example, this design matrix is actually stored within the structure of the dataset itself. You can reconstruct it by looking at the raw data, or by using the following code:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nmatrix(abrasion$material, 4, 4)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [,1] [,2] [,3] [,4]\n[1,] \"C\" \"A\" \"D\" \"B\" \n[2,] \"D\" \"B\" \"C\" \"A\" \n[3,] \"B\" \"D\" \"A\" \"C\" \n[4,] \"A\" \"C\" \"B\" \"D\" \n```\n:::\n:::\n\n:::\n\nThe four possible positions are the same across each run, meaning that `position` is not nested within `run`, but is instead crossed. Position 1 in run 1 is linked to position 1 in run 3, for instance - we wouldn't consider these to be \"unique\" positions, but would like to group them together when estimating variance in our model.\n\nBut, because it's impossible for each `material` to experience each `position` in each `run`, this is a partially crossed design rather than a fully crossed one.\n\n### Fitting partially crossed random effects\n\nThe good news is that fitting this in `lme4` doesn't require any extra knowledge or special syntax. 
So long as the dataset is properly coded and accurately represents the structure of the experimental design, the code is identical to fully crossed random effects.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_abrasion <- lmer(wear ~ material + (1|run) + (1|position), data = abrasion)\n\nsummary(lme_abrasion)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: wear ~ material + (1 | run) + (1 | position)\n Data: abrasion\n\nREML criterion at convergence: 100.3\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.08973 -0.30231 0.02697 0.42254 1.21052 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n run (Intercept) 66.90 8.179 \n position (Intercept) 107.06 10.347 \n Residual 61.25 7.826 \nNumber of obs: 16, groups: run, 4; position, 4\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 265.750 7.668 7.475 34.656 1.57e-09 ***\nmaterialB -45.750 5.534 6.000 -8.267 0.000169 ***\nmaterialC -24.000 5.534 6.000 -4.337 0.004892 ** \nmaterialD -35.250 5.534 6.000 -6.370 0.000703 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) matrlB matrlC\nmaterialB -0.361 \nmaterialC -0.361 0.500 \nmaterialD -0.361 0.500 0.500\n```\n:::\n:::\n\n:::\n\nIf you check the output, you can see that we do indeed have 4 groups each for `run` and `position`, which is correct. The model has done what we intended, and we could now go on to look at the differences between `material`, with the nuisance effects of `run` and `position` having been accounted for.\n\n## Exercises\n\n### Penicillin {#sec-exr_penicillin}\n\n::: {.callout-exercise}\n\n\n{{< level 2 >}}\n\n\n\nFor this exercise, we'll use the internal `Penicillin` dataset from `lme4`.\n\nThese data are taken from a study that assessed the concentration of a penicillin solution, by measuring how it inhibits the growth of organisms on a plate of agar. \n\nSix samples of the penicillin solution were taken. On each plate of agar, a few droplets of each of the six samples were allowed to diffuse into the medium. The diameter of the inhibition zones created could be measured, and is related in a known way to the concentration of the penicillin.\n\nThere are three variables:\n\n- `sample`, the penicillin sample (A through F, 6 total)\n- `plate`, the assay plate (a through x, 24 total)\n- `diameter`, of the zone of inhibition (measured in mm)\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"Penicillin\")\n```\n:::\n\n:::\n\nFor this exercise:\n\n1. Fit a sensible model to the data\n2. Perform significance testing/model comparison\n3. Check the model assumptions\n4. Visualise the model\n\n::: {.callout-tip collapse=\"true\"}\n#### Worked answer\n\nThis is quite a simple dataset, in that there are only two variables besides the response. But, given the research question, we likely want to consider both of these two variables as random effects.\n\nHow does that work? This is the first random-effects-only model that we've come across. 
(Well, technically there are still fixed effects - every time you estimate a random effect, a fixed effect will always be estimated as part of that.)\n\n#### Consider the experimental design\n\nWe have two variables for which we'd like to estimate random effects, and with no explicit fixed predictors, all that's available to us is random intercepts.\n\nThe two variables, `plate` and `sample`, are crossed in a factorial design (each of the six samples is included on each of the 24 plates). So, we want to fit these as crossed random effects.\n\n#### Fit the model\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_penicillin <- lmer(diameter ~ (1|sample) + (1|plate), data = Penicillin)\n\nsummary(lme_penicillin)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: diameter ~ (1 | sample) + (1 | plate)\n Data: Penicillin\n\nREML criterion at convergence: 330.9\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.07923 -0.67140 0.06292 0.58377 2.97959 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n plate (Intercept) 0.7169 0.8467 \n sample (Intercept) 3.7311 1.9316 \n Residual 0.3024 0.5499 \nNumber of obs: 144, groups: plate, 24; sample, 6\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 22.9722 0.8086 5.4866 28.41 3.62e-07 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThis shows us that the average diameter of the inhibition zone is around 23mm. Looking at the random effects, there's more variance due to `sample` than there is due to `plate`.\n\n#### Visualise the model\n\nWe can see these different variances by visualising the model. Here, a jagged line of best fit is drawn for each of the samples; the overall shape of the lines is the same, since we have random intercepts only. 
You can see that the spread within each of the lines (which represents variance for `plate`) is overall less than the spread of the lines themselves (which represents the variance for `sample`).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_penicillin), aes(x = plate, y = diameter, colour = sample)) + \n geom_jitter(width = 0.2, height = 0) +\n geom_line(aes(y = .fitted, group = sample))\n```\n\n::: {.cell-output-display}\n![](crossed-random-effects_files/figure-html/unnamed-chunk-9-1.png){width=672}\n:::\n:::\n\n:::\n\n:::\n\n:::\n\n### Politeness {#sec-exr_solutions}\n\n::: {.callout-exercise}\n\n\n{{< level 2 >}}\n\n\n\nFor this exercise, we'll use a real dataset called `politeness`, taken from a paper by Winter & Grawunder ([2012](https://doi.org/10.1016/j.wocn.2012.08.006)).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\npoliteness <- read_csv(\"data/politeness.csv\")\n```\n:::\n\n:::\n\nThe study was designed to investigate whether voice pitch is higher in polite contexts than in informal ones, and whether this effect is consistent between male and female speakers.\n\nThere are five variables in this dataset:\n\n- `subject`, the participant (6 total)\n- `gender`, treated here as a binary categorical variable (male vs female)\n- `sentence`, the sentence that was spoken (7 total)\n- `context`, whether the speaker was in a polite or informal setting\n- `pitch`, the measured voice pitch across the sentence\n\nEach participant in the study spoke each of the seven sentences twice, once in each of the two contexts.\n\nIs there a difference between vocal pitch in different contexts? Is this effect consistent for male and female speakers?\n\nTo answer this question:\n\n1. Consider which variables you want to treat as fixed and random effects\n2. Try drawing out the structure of the dataset, and think about what levels the different variables are varying at\n3. You may want to assess the quality and significance of the model to help you draw your final conclusions\n\n::: {.callout-tip collapse=\"true\"}\n#### Worked answer\n\n#### Consider the experimental design\n\nIn this dataset, there are two variables for which we might want to fit random effects: `subject` and `sentence`. The particular sets of participants and sentences have been chosen at random from the larger population of participants/speakers and possible sentences that exist.\n\nThe other two variables, `gender` and `context`, are fixed effects of interest.\n\nLet's sketch out the design of this experiment. You could choose to visualise/sketch out this design in a couple of ways:\n\n![Experimental design for voice pitch experiment #1](images_mixed-effects/politeness_design.png){width=60%}\n\n![Experimental design for voice pitch experiment #2](images_mixed-effects/politeness_design2.png){width=60%}\n\nThe `subject` and `sentence` variables are not nested within one another - they're crossed. There are 42 combinations of `subject` and `sentence`.\n\nEach of those combinations then happens twice: once for each `context`, for a total of 84 possible unique utterances. (Note that there is actually one instance of missing data, so we only have 83.)\n\nNow, `context` varies within both `subject` and `sentence` - because each subject-sentence combination is spoken twice. 
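\n\nYou can check this sort of thing directly from the data if you like - an optional extra step, using base R's `table` function - by tabulating the variables against one another:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# every subject was recorded in both contexts...\ntable(politeness$subject, politeness$context)\n\n# ...but each subject only ever appears under one gender\ntable(politeness$subject, politeness$gender)\n```\n:::\n\n:::\n\n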
But `gender` does not vary within `subject` in this instance; each participant is labelled as either male or female.\n\n#### Fit a full model\n\nSo, the full possible model we could fit is the following:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_polite <- lmer(pitch ~ gender*context + (1 + gender*context|sentence)\n + (1 + context|subject), data = politeness)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nboundary (singular) fit: see help('isSingular')\n```\n:::\n\n```{.r .cell-code}\nsummary(lme_polite)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: pitch ~ gender * context + (1 + gender * context | sentence) + \n (1 + context | subject)\n Data: politeness\n\nREML criterion at convergence: 762.5\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.5177 -0.6587 -0.0521 0.5299 3.5192 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr \n sentence (Intercept) 398.96 19.97 \n genderM 116.03 10.77 -0.97 \n contextpol 217.26 14.74 -0.09 -0.06 \n genderM:contextpol 292.04 17.09 0.11 -0.10 -0.78\n subject (Intercept) 597.74 24.45 \n contextpol 1.21 1.10 1.00 \n Residual 548.79 23.43 \nNumber of obs: 83, groups: sentence, 7; subject, 6\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 260.686 16.804 5.822 15.513 5.88e-06 ***\ngenderM -116.195 21.618 4.306 -5.375 0.00468 ** \ncontextpol -27.400 9.149 6.760 -2.995 0.02094 * \ngenderM:contextpol 15.892 12.191 8.183 1.304 0.22783 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) gendrM cntxtp\ngenderM -0.703 \ncontextpol -0.137 0.080 \ngndrM:cntxt 0.111 -0.141 -0.723\noptimizer (nloptwrap) convergence code: 0 (OK)\nboundary (singular) fit: see help('isSingular')\n```\n:::\n:::\n\n:::\n\nThis full model has a singular fit, almost certainly because we don't have a sufficient sample size for 6 random effects plus fixed effects.\n\n#### Alternative (better) models\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_polite_red <- lmer(pitch ~ gender*context + (1|sentence) + (1|subject), \n data = politeness)\n\nsummary(lme_polite_red)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: pitch ~ gender * context + (1 | sentence) + (1 | subject)\n Data: politeness\n\nREML criterion at convergence: 766.8\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.1191 -0.5604 -0.0768 0.5111 3.3352 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n sentence (Intercept) 218.3 14.77 \n subject (Intercept) 617.1 24.84 \n Residual 637.4 25.25 \nNumber of obs: 83, groups: sentence, 7; subject, 6\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 260.686 16.348 5.737 15.946 5.7e-06 ***\ngenderM -116.195 21.728 4.566 -5.348 0.004023 ** \ncontextpol -27.400 7.791 69.017 -3.517 0.000777 ***\ngenderM:contextpol 15.572 11.095 69.056 1.403 0.164958 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) gendrM cntxtp\ngenderM -0.665 \ncontextpol -0.238 0.179 \ngndrM:cntxt 0.167 -0.252 -0.702\n```\n:::\n\n```{.r .cell-code}\nanova(lme_polite, lme_polite_red)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: politeness\nModels:\nlme_polite_red: pitch ~ gender * context + (1 | sentence) + (1 | subject)\nlme_polite: pitch ~ gender * context + (1 + gender * context | sentence) + (1 + context | subject)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_polite_red 7 807.11 824.04 -396.55 793.11 \nlme_polite 18 825.15 868.69 -394.58 789.15 3.952 11 0.9713\n```\n:::\n:::\n\n:::\n\nFitting a simpler model that contains only random intercepts, and comparing this to our more complicated model, shows no difference between the two - i.e., the simpler model is better.\n\nYou can keep comparing different models with different random effects structures, if you like, for practice - this dataset is a good sandbox for it!\n\n#### Check assumptions\n\nFor now, we're going to quickly check the assumptions of this simpler, intercepts-only model:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_polite_red, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"outliers\"))\n```\n\n::: {.cell-output-display}\n![](crossed-random-effects_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_model(lme_polite_red, \n check = c(\"reqq\", \"pp_check\"))\n```\n\n::: {.cell-output-display}\n![](crossed-random-effects_files/figure-html/unnamed-chunk-13-2.png){width=672}\n:::\n:::\n\n:::\n\nNot bad! Maybe one overly influential point (31) that deserves testing - you can try refitting the model without it, and seeing whether that changes the overall conclusions. The Q-Q plot veers off a tiny bit on the right hand side, but it's only really 3 residuals, so probably not worth worrying about.\n\nThe random intercepts look nicely normally distributed, and the posterior predictive check is quite convincing.\n\n#### Visualise the model\n\nLast but not least, let's visualise the model:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_polite_red), aes(x = paste(gender, context), y = pitch, colour = gender)) +\n geom_point(alpha = 0.7) +\n stat_summary(fun = mean, geom = \"point\", size = 4) +\n geom_line(aes(y = .fitted, group = paste(sentence, subject)))\n```\n\n::: {.cell-output-display}\n![](crossed-random-effects_files/figure-html/unnamed-chunk-14-1.png){width=672}\n:::\n:::\n\n:::\n\nBased on the model output and the visualisation, we might therefore conclude that on average, speakers do use higher pitch for polite sentences compared to informal ones. Although there is a difference in pitch between male and female speakers overall, the effect of context is similar across genders.\n\nIn the final line of code for the plot, we've included the lines of best fit for each subject-sentence combination, which have fixed gradients but random intercepts. You can view sentence-wise lines of best fit (summarised across all 6 subjects) by writing `group = sentence`, or subject-wise lines of best fit (summarised across all 7 sentences) by writing `group = subject`. 
These tell you a little bit more about how much variation there is between the subjects and sentences.\n\n:::\n\n:::\n\n## Summary\n\nThis section has addressed how to fit models with multiple clustering variables, in scenarios where those clustering variables are not nested with one another.\n\nThis, along with the previous section on nested random effects, helps to extend the basic linear mixed effects model that was introduced earlier in the course. It emphasises the need to understand your variables and experimental design, in order to fit a suitable model.\n\n::: {.callout-tip}\n#### Key points\n- Two random effects are \"crossed\" if they interact to create multiple unique groups/combinations (as we see in factorial experimental designs), and are not nested\n- Random effects can be fully or partially crossed\n- Crossed random effects are fitted in `lme4` by creating multiple distinct random effects structures within the model formula\n:::\n\n", "supporting": [ "crossed-random-effects_files" ], diff --git a/_freeze/materials/crossed-random-effects/figure-html/unnamed-chunk-13-1.png b/_freeze/materials/crossed-random-effects/figure-html/unnamed-chunk-13-1.png index 5d96ff6..94a8b9e 100644 Binary files a/_freeze/materials/crossed-random-effects/figure-html/unnamed-chunk-13-1.png and b/_freeze/materials/crossed-random-effects/figure-html/unnamed-chunk-13-1.png differ diff --git a/_freeze/materials/crossed-random-effects/figure-html/unnamed-chunk-13-2.png b/_freeze/materials/crossed-random-effects/figure-html/unnamed-chunk-13-2.png index 182099f..d9baac3 100644 Binary files a/_freeze/materials/crossed-random-effects/figure-html/unnamed-chunk-13-2.png and b/_freeze/materials/crossed-random-effects/figure-html/unnamed-chunk-13-2.png differ diff --git a/_freeze/materials/crossed-random-effects/figure-html/unnamed-chunk-9-1.png b/_freeze/materials/crossed-random-effects/figure-html/unnamed-chunk-9-1.png index dd13328..e3ece8c 100644 Binary files a/_freeze/materials/crossed-random-effects/figure-html/unnamed-chunk-9-1.png and b/_freeze/materials/crossed-random-effects/figure-html/unnamed-chunk-9-1.png differ diff --git a/_freeze/materials/fitting-mixed-models/execute-results/html.json b/_freeze/materials/fitting-mixed-models/execute-results/html.json index 8b90686..83a23a9 100644 --- a/_freeze/materials/fitting-mixed-models/execute-results/html.json +++ b/_freeze/materials/fitting-mixed-models/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "5201d4f31b76c310df90dd4f3f4b24c7", + "hash": "c13895c5eae1494e9770bb4934d1ad9f", "result": { - "markdown": "---\ntitle: \"Fitting mixed models\"\noutput: html_document\n---\n\n::: {.cell}\n\n:::\n\n\nThe course materials so far have discussed the motivation behind mixed effects models, and why we might choose to include random effects.\n\nIn this section, we will learn how to fit these models in R, and how to visualise the results.\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\nWe'll be using the `lme4` package in R, which is by far the most common and best choice of package for this type of model. (It's an update of the older package `nlme`, which you might also see people using.) The syntax is nice and simple and extends what we've been doing so far with the `lm()` function in (hopefully!) a very intuitive way. 
\n\nThe package also contains functions for fitting non-linear mixed effects and generalised mixed effects models - though we won't be focusing on those here, it's nice to know that the package can handle them in case you ever choose to explore them in future!\n\nFor Python users, the `pymer4` package in Python allows you to \"borrow\" most of the functionality of R's `lme4`, though it still has many bugs that make it difficult to run on any system except Linux. There is also some functionality for fitting mixed models using `statsmodels` in Python. We won't be using those packages here, but you may wish to explore them if you are a die-hard Python user!\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# load the required packages for fitting & visualising\nlibrary(tidyverse)\nlibrary(lme4)\nlibrary(broom)\nlibrary(broom.mixed)\nlibrary(patchwork)\n```\n:::\n\n:::\n\n## The sleepstudy data\n\nWe'll be using the internal `sleepstudy` dataset from the `lme4` package in R as an example (this dataset is also provided as a `.csv` file, if you'd prefer to read it in or are using Python).\n\nThis is a simple dataset taken from a real study that investigated the effects of sleep deprivation on reaction times in 18 subjects, and has just three variables: \n\n- `Reaction`, reaction time in milliseconds\n- `Days`, number of days of sleep deprivation\n- `Subject`, subject ID\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"sleepstudy\")\n```\n:::\n\n:::\n\nHave a look at the data more closely. You'll notice that for each subject, we've got 10 measurements, one for each day of sleep deprivation. This repeated measurement means that our data are not independent of one another; for each subject in the study we would expect measurements of reaction times to be more similar to one another than they are to reaction times of another subject.\n\nLet's start by doing something that we know is wrong, and ignoring this dependence for now. We'll begin by visualising the data with a simple scatterplot.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n geom_point() +\n geom_smooth(method = \"lm\")\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n:::\n\n:::\n\nThis gives the overall impression that we might expect - reaction time does seem to slow as people become more sleep deprived.\n\nBut, as we've already pointed out, each subject's reaction times will be more similar to one another than they are to another subject's, so we should make a point of accounting for this rather than ignoring it.\n\n## Adding random intercepts\n\nIn this dataset, we want to treat `Subject` as a random effect, which means fitting a mixed effects model. Why `Subject`? There are two things at play here that make us want to treat this as a random effect:\n\n1. `Subject` is a *grouping* variable within our dataset, and is causing us problems with independence.\n2. It's not these specific 18 subjects that we're interested in - they instead represent 18 random selections from a broader distribution/population of subjects that we could have tested. We would like to generalise our findings to this broader population.\n\nTo fit the model, we use a different function to what we've used so far, but the syntax looks very similar. 
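\n\nAs a rough side-by-side sketch of what that looks like (just for comparison - we'll fit and explore the real model properly below):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# a standard linear model, which ignores the grouping by Subject\nlm(Reaction ~ Days, data = sleepstudy)\n\n# the equivalent mixed effects model, with a random effect for Subject\nlmer(Reaction ~ Days + (1|Subject), data = sleepstudy)\n```\n:::\n\n:::\n\n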
The difference is the addition of a new term `(1|Subject)`, which represents our random effect.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# construct a linear mixed effects model with Subject\n# as a random effect\nlme_sleep1 <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)\n\n# summarise the model\nsummary(lme_sleep1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: Reaction ~ Days + (1 | Subject)\n Data: sleepstudy\n\nREML criterion at convergence: 1786.5\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-3.2257 -0.5529 0.0109 0.5188 4.2506 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n Subject (Intercept) 1378.2 37.12 \n Residual 960.5 30.99 \nNumber of obs: 180, groups: Subject, 18\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 251.4051 9.7467 22.8102 25.79 <2e-16 ***\nDays 10.4673 0.8042 161.0000 13.02 <2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\nDays -0.371\n```\n:::\n:::\n\n:::\n\nOkay. The syntax might have looked similar to a standard linear model, but the output does not.\n\nIn later sections of the course, we'll discuss how to test significance based on this sort of output. In the meantime, however, to help get our head around the model we've fitted, we're going to visualise it.\n\nHere, we'll make use of the `broom` and `broom.mixed` packages to extract fitted values from the models - the `augment` function essentially creates a dataframe that contains both the raw data and the fitted values (along with residuals and other useful values), which helps a lot in plotting.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# create a linear model - we'll use this in our graph\nlm_sleep <- lm(Reaction ~ Days, data = sleepstudy)\n\n# set up our basic plot\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n \n # create separate plots for each subject in the sample\n # and add the data points\n facet_wrap(facets = vars(Subject), nrow = 3) +\n geom_point() +\n \n # this adds the line of best fit for the whole sample\n # (without the random effect), using coefficients\n # from our simple linear model object\n geom_line(data = augment(lm_sleep), aes(y = .fitted)) + \n \n # and finally, this will add different lines of best fit\n # for each subject as calculated in our mixed model object\n geom_line(data = augment(lme_sleep1), aes(y = .fitted), colour = \"blue\")\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-6-1.png){width=672}\n:::\n:::\n\n:::\n\nEach plot represents a different subject's data. On each plot, we've added the following:\n\n* in black we have the same overall line of best fit from our original (incorrect) linear model.\n* in blue are the individual lines of best fit for each subject. These lines move up and down the plot relative to the global line of best fit. This reflects the fact that, though all subjects are declining as they become more sleep deprived, some of them started with slower baseline reaction times, with different y-intercepts to match. Subject 310, for instance, seems to have pretty good reflexes relative to everyone else, while subject 337 isn't quite as quick on the trigger.\n\nWe can visualise the same model slightly differently, to allow us to look at the set of lines of best fit together. 
Here, we will create a plot that doesn't have facets (but still shows us the same model predictions):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n geom_point() +\n \n # include the global line of best fit\n geom_line(data = augment(lm_sleep), aes(y = .fitted)) +\n \n # include individual lines of best fit\n geom_line(data = augment(lme_sleep1), aes(y = .fitted, group = Subject), \n colour = \"blue\")\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-7-1.png){width=672}\n:::\n:::\n\n:::\n\nThe global line of best fit is in the middle (in black), with each of the individual subject lines of best fit around it.\n\nFrom this plot, we can see that the *gradient* of each of these blue lines is still the same as the overall line of best fit. This is because we've added a random intercept in our model, but have **kept the same slope**. \n\nThis reflects an underlying assumption that the relationship between sleep deprivation and reaction time is the same - i.e. that people get worse at the same rate - even if their starting baselines differ.\n\nWe might not think that this assumption is a good one, however. And that's where random slopes come in.\n\n## Adding random slopes\n\nTo add a random slope as well as a random intercept, we need to alter the syntax slightly for our random effect. Now, instead of `(1|Subject)`, we'll instead use `(1 + Days|Subject)`. This allows the relationship between `Days` and `Reaction` to vary between subjects.\n\nLet's fit that new model and summarise it.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_sleep2 <- lmer(Reaction ~ Days + (1 + Days|Subject), data = sleepstudy)\n\nsummary(lme_sleep2)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: Reaction ~ Days + (1 + Days | Subject)\n Data: sleepstudy\n\nREML criterion at convergence: 1743.6\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-3.9536 -0.4634 0.0231 0.4634 5.1793 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr\n Subject (Intercept) 612.10 24.741 \n Days 35.07 5.922 0.07\n Residual 654.94 25.592 \nNumber of obs: 180, groups: Subject, 18\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 251.405 6.825 17.000 36.838 < 2e-16 ***\nDays 10.467 1.546 17.000 6.771 3.26e-06 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\nDays -0.138\n```\n:::\n:::\n\n:::\n\nWe can go ahead and add our new lines (in red) to our earlier facet plot. 
Only the last line of code is new here:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n facet_wrap(facets = vars(Subject), nrow = 3) +\n geom_point() +\n \n # the global line of best fit\n geom_line(data = augment(lm_sleep), aes(y = .fitted)) + \n \n # our previous lines of best fit, with random intercepts\n # but constant slope\n geom_line(data = augment(lme_sleep1), aes(y = .fitted), colour = \"blue\") +\n \n # our lines of best fit with random intercepts and random slopes\n geom_line(data = augment(lme_sleep2), aes(y = .fitted), colour = \"red\") \n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-9-1.png){width=672}\n:::\n:::\n\n:::\n\nWhile for some of our subjects, the red, blue and black lines look quite similar, for others they diverge a fair amount. Subjects 309 and 335, for instance, are displaying a remarkably flat trend that suggests they're not really suffering delays in reaction time from their sleep deprivation very much at all, while subject 308 definitely seems to struggle without their eight hours.\n\nLet's compare those different red lines, representing our random intercepts & slopes model, on a single plot. This is the same code as we used a couple of plots ago, except the last line is now different:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n geom_point() +\n \n # include the global line of best fit\n geom_line(data = augment(lm_sleep), aes(y = .fitted)) +\n\n # include individual lines of best fit\n geom_line(data = augment(lme_sleep2), aes(y = .fitted, group = Subject), \n colour = \"red\")\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-10-1.png){width=672}\n:::\n:::\n\n:::\n\nVisualising all of our lines of best fit simultaneously like this makes it clearer what it means to have both random intercepts and random slopes. Each line of best fit starts in a slightly different place, and also has a different gradient.\n\n### Fitting random slopes without random intercepts\n\nIt's quite unusual to fit a model with random slopes but without random intercepts - but it's absolutely possible.\n\nThe `lme4` package includes \"implicit random intercepts\", meaning that we don't actually need to specify the 1 in our random effects structure for random intercepts to be fitted. \n\nTry running the following, and compare the two outputs - these models are identical:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_explicit <- lmer(Reaction ~ Days + (1 + Days|Subject), data = sleepstudy)\nlme_implicit <- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy)\n\nsummary(lme_explicit)\nsummary(lme_implicit)\n```\n:::\n\n:::\n\nIf we were determined to remove the random intercepts, we would have to explicitly tell `lme4` not to fit them, like this:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_slopesonly <- lmer(Reaction ~ Days + (0 + Days|Subject), data = sleepstudy)\n\nsummary(lme_slopesonly)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. 
t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: Reaction ~ Days + (0 + Days | Subject)\n Data: sleepstudy\n\nREML criterion at convergence: 1766.5\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-3.5104 -0.5588 0.0541 0.6244 4.6022 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n Subject Days 52.71 7.26 \n Residual 842.03 29.02 \nNumber of obs: 180, groups: Subject, 18\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 251.41 4.02 161.00 62.539 < 2e-16 ***\nDays 10.47 1.87 21.68 5.599 1.32e-05 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\nDays -0.340\n```\n:::\n:::\n\n:::\n\nYou should see that the random intercepts have now disappeared from the output.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n geom_point() +\n \n # include the global line of best fit\n geom_line(data = augment(lm_sleep), aes(y = .fitted)) +\n \n # include individual lines of best fit\n geom_line(data = augment(lme_slopesonly), aes(y = .fitted, group = Subject), \n colour = \"purple\")\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n:::\n\n:::\n\nIndeed, looking at each of the lines of best fit here, we can see that they all have the same intercept (i.e., the same value of y when x = 0), but with differing slopes.\n\n## Two-level models\n\nAll of the mixed models we've fitted to these data so far (with random slopes and/or random intercepts) can be described as two-level models.\n\nA standard linear model would be a one-level model, where we have true independence and no clustering/grouping variables.\n\nBut for this dataset, the `Subject` variable creates clusters, so we have a different set of `Reaction` times for each `Subject`. Whether we choose to fit random intercepts, slopes, or both, this overall structure between the variables remains the same, creating a hierarchy with two levels. Hence, a two-level model!\n\nLater in the course, we will look at more complicated models, where we have multiple clustering variables that we want to generate random effects for, due to more complex experimental designs.\n\n### Equations & notation\n\nFor those who are interested in notation and equations, the drop-down box below gives a little more detail on how this works for a linear mixed effects model. \n\nThis subsection skews a bit more in the maths direction, and won't be needed by everyone who uses mixed models in their research. But, it's included here as bonus material for anyone who finds equations helpful, or for those who might need this for reporting on and reading about mixed models!\n\n::: {.callout-note collapse=\"true\"}\n#### Linear mixed models notation\n\nFor the `sleepstudy` dataset, a standard linear model `Reaction ~ Days` would be written in the format:\n\n$$\ny = \\beta_{0} + \\beta_{1}x_{1} + \\epsilon\n$$\n\nThe $x$ variable here is, of course, `Days`, and $y$ is our response variable `Reaction`.\n\nIn this equation, $\\beta_{0}$ represents the intercept, and $\\beta_{1}$ represents the slope or gradient. Each of these is either a single fixed number, or, in the case of a categorical predictor, a set of fixed means for the groups. \n\nThe $\\epsilon$ at the end represents our error, or noise. In the case of linear model, we measure this by calculating the residuals. 
As you already know from standard linear models, we assume that these residuals are random and normally distributed. So, we could additionally note that:\n\n$$\n\\epsilon ∼ N(0, \\sigma^2)\n$$\n\nThis is just fancy shorthand for: \"the errors are drawn from a normal distribution, which has a mean of 0 and variance $\\sigma^2$\". This variance is something we need to estimate, in order to perform our regression analysis.\n\n#### Random intercepts model\n\nWhen we add random effects to deal with the clustering variable `Subject`, however, we are doing more than just estimating a fixed mean or coefficient.\n\nThat's because we're actually estimating a *distribution* of coefficients whenever we estimate a random effect. \n\nSo, when we include random intercepts in our model `Reaction ~ Days + (1|Subject)`, we are not just estimating three numbers. We estimate an intercept for each `Subject` in the dataset. And, we are assuming that those intercepts have been drawn from a normal distribution with mean 0 - this is a baked-in assumption of a linear mixed model (more on assumptions in a later section).\n\nFor this model, the equation for our model is now written like this:\n\n$$\ny_{ij} = \\beta_{0j} + \\beta_{1}x_{ij} + \\epsilon_{ij}\n$$\n\nWhere have these extra subscript letters come from?\n\nWell, previously we didn't bother with this, because a standard linear model only has one level. Now, we have a two-level model, so we use $i$ and $j$ to refer to those different levels.\n\nHere, $j$ would represent the different levels of our clustering variable `Subject`. The letter $i$ then represents the set of values within each cluster $j$. So, $ij$ in our subscripts refers to our entire set of response/outcome values `Reaction`, which here are measured at the level of individual `Days` within each `Subject`.\n\nThe term $\\beta_{0j}$ tells us that we have random intercepts. For each of our $j$ clusters, there is a separate $\\beta_{0}$. You will sometimes see a random effect broken down further, like this:\n\n$$\n\\beta_{0j} = \\gamma_{00} + U_{0j}\n$$\n\nHere, the $\\gamma_{00}$ refers to the \"grand intercept\", i.e., the average intercept across all groups. This is a fixed effect, one single value that doesn't change, and we need to estimate it in order to be able to then estimate $U_{0j}$. It's conventional - though not compulsory - to use $\\gamma$ to represent fixed/global coefficients like this.\n\nThe $U_{0j}$ bit then refers to the set of deviations from that grand intercept, one for each of your clusters/groups. These deviations should be normally distributed with mean 0 and variance $\\tau^2_{00}$. Again, it's conventional to use $\\tau^2$ to refer to the variance of random effects specifically (rather than $\\sigma^2$, which we used for the variance of our residuals). You will sometimes see people use letters other than $U$ to refer to the set of deviations/coefficients, especially when there are more than two levels in the model (more on that in a later section.)\n\n$$\nU_{0j} ∼ N(0, \\tau^2_{00})\n$$\n\nOnce again, we also assume that our errors $\\epsilon_{ij}$ are normally distributed around 0 as well, just as we did with the standard linear model.\n\n#### Random intercepts & random slopes model\n\nNow let's look at what happens when we add a second random effect, as in the model `Reaction ~ Days + (1 + Days|Subject)`. 
The equations now look like this.\n\nLevel 1:\n\n$$\ny_{ij} = \\beta_{0j} + \\beta_{1j}x_{ij} + \\epsilon_{ij}\n$$\n\nLevel 2:\n\n$$\n\\beta_{0j} = \\gamma_{00} + U_{0j}\n$$\n$$\n\\beta_{1j} = \\gamma_{10} + U_{1j}\n$$\n\nwhere,\n\n$$\n\\left( \\begin{array}{c} U_{0j} \\\\ U_{1j} \\end{array} \\right) ∼ N \\left( \\begin{array}{c} 0 \\\\ 0 \\end{array} , \\begin{array}{cc} \\tau^2_{00} & \\rho_{01} \\\\ \\rho_{01} & \\tau^2_{10} \\end{array} \\right)\n$$\n\nWe now have two random effects instead of one. We can tell this because we're now writing $\\beta_{1j}$ and specifying an additional equation for it, instead of just writing $\\beta_{1}$ for a single fixed value of the slope.\n\nAdmittedly, that last bit looks more complicated than before. We won't go into too much detail, but what's happening on the right is known as a \"variance-covariance\" matrix. When you include multiple random effects in a mixed model, the correlations between those random effects are also estimated. So we actually make assumptions about the joint distribution that all of the random effects are being drawn from. If this statement alone doesn't satisfy your curiosity, you might find [this link](https://rpubs.com/yjunechoe/correlationsLMEM) a useful resource with some handy visualisations of how this works!\n\nIf that's a bit more complicated than you're interested in, don't worry. You don't need to understand all that maths to be able to use a mixed effects model. It boils down to the same thing: that random effects are a set of coefficients with some variance, and we make assumptions about their distribution(s).\n\n#### A helpful summary\n\nThis table summarises and defines each of the terms included in the equation(s) above.\n\n| Parameter | Description |\n|:-|:-----|\n|$y_{ij}$|Response/outcome; value of `Reaction` for subject $j$ on day $i$|\n|$x_{ij}$|Predictor; value of `Days` for subject $j$ on day $i$|\n|$\\beta_{0j}$|Level 1 intercept parameter, containing a fixed and a random effect|\n|$\\gamma_{00}$|Fixed effect; grand (average) intercept|\n|$U_{0j}$|Random effect; deviation from grand intercept for subject $j$|\n|$\\beta_{1j}$|Level 1 slope parameter, containing a fixed and a random effect|\n|$\\gamma_{10}$|Fixed effect; grand (average) slope|\n|$U_{1j}$|Random effect; deviation from grand slope for subject $j$|\n|$\\epsilon_{ij}$|Error/residual (difference between real value and predicted value) of `Reaction` for subject $j$ on day $i$|\n|$\\tau^2_{00}$|Variance of random intercepts $U_{0j}$|\n|$\\tau^2_{10}$|Variance of random slopes $U_{1j}$|\n|$\\rho_{01}$|Covariance between random effects $U_{0j}$ and $U_{1j}$|\n\n:::\n\n### Sharing information\n\nFinally, while we're working with the `sleepstudy` dataset, let's take the opportunity to visualise something else that's special about random effects (which we'll discuss more later in the course): sharing information between levels.\n\nAs an extra observation, let's use `geom_smooth` to add the lines of best fit that we would see if we fitted each subject with their own individual regression:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n facet_wrap(facets = vars(Subject), nrow = 3) +\n geom_point() +\n \n # the global line of best fit\n geom_line(data = augment(lm_sleep), aes(y = .fitted)) + \n \n # random intercepts only\n geom_line(data = augment(lme_sleep1), aes(y = .fitted), colour = \"blue\") +\n \n # random intercepts and random slopes\n geom_line(data = 
augment(lme_sleep2), aes(y = .fitted), colour = \"red\") +\n \n # individual regression lines for each individual\n geom_smooth(method = \"lm\", se = FALSE, colour = \"green\", linewidth = 0.5)\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-14-1.png){width=672}\n:::\n:::\n\n:::\n\nHere, the black line (which is the same on every plot) represents a global line of best fit - this is what we would see if we ignored the `Subject` variable entirely and just did a simple linear regression. This is called **complete pooling**.\n\nThe green lines, meanwhile, represent what happens when we split our dataset into separate groups by `Subject`, and fit individual regressions between `Reaction` and `Days` that are completely independent of each other. This is called **no pooling**, i.e., treating `Subject` as a fixed effect.\n\nThe blue and red lines represent our mixed effects models - the difference between the two is whether we allowed the slope to vary randomly, as well as the random intercept. In both cases, we are using something called **partial pooling**. \n\nComparing the green and red lines in particular allows us to see the phenomenon of \"shrinkage\", which occurs because of partial pooling. \n\nThe red lines are all closer to the black line than the green line is. In other words, the predictions for our mixed effects model are more similar to the global line of best fit, than the individual regression lines are to that global line. We say that the red lines (our mixed model) are showing some shrinkage towards the global line; Subjects 330, 335 and 370 perhaps show this best. \n\nThis happens because, when random effects are estimated, information is shared between the different levels of the random effect (in this case, between subjects). Though we still estimate separate slopes and/or intercepts for each subject, we take into account the global average, and this pulls the individual lines of best fit towards the global one.\n\nThis idea of taking into account the global average when calculating our set of random slopes or intercepts is another key element that helps us decide whether we want to treat a variable as a random effect. Do you want to share information between your categories, or is it better for your research question to keep them separate?\n\n## Exercises\n\n### Exercise 1 - Irrigation\n\n\n{{< level 1 >}}\n\n\n\nThis example uses the `irrigation` dataset. The study is a split-plot design, used for an agricultural trial aimed at maximising crop yield.\n\nTwo crop varieties and four different irrigation methods were tested across eight fields available for the experiment. Only one type of irrigation method can be applied to each field, but the fields are divided into two halves with a different variety of crop planted in each half.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nirrigation <- read_csv(\"data/irrigation.csv\")\n```\n:::\n\n:::\n\nThere are four variables in total:\n\n- `field` ID, f1 through f8\n- `irrigation` method used, i1 through i4\n- `variety` of crop, v1 or v2\n- `yield`, the total crop yield per field\n\nFor this exercise: \n\n1. Visualise the data\n2. Fit a mixed model\n\nDoes it look as if `irrigation` method or crop `variety` are likely to affect `yield`?\n\n::: {.callout-note collapse=\"true\"}\n#### Worked answer\n\n#### Visualise the data\n\nThis is quite a small dataset, with only 16 data points. 
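\n\nIf you'd like to confirm that for yourself before plotting (an optional extra step), you can take a quick look at the data first:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# a quick overview of the 16 rows: field, irrigation, variety and yield\nhead(irrigation)\nsummary(irrigation)\n```\n:::\n\n:::\n\n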
We want to know whether `irrigation`, on the x axis, and/or `variety`, split by colour, affect `yield`; so let's put all of those variables on the same plot:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(irrigation, aes(x = irrigation, y = yield, colour = variety)) +\n geom_point()\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-16-1.png){width=672}\n:::\n:::\n\n:::\n\nIt looks as if there could be some differences between `irrigation` levels, but the effect of `variety` looks less clear.\n\nOur data points do all appear to be paired together, and this is almost certainly related to our `field` variable, which we can see if we alter the plot above:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(irrigation, aes(x = irrigation, y = yield, colour = field)) +\n geom_point()\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-17-1.png){width=672}\n:::\n:::\n\n:::\n\nThe effect of `field`, then, seems quite strong.\n\n#### Fit the model\n\nWe can see from the plots above that we need to consider `field` as an important grouping variable. We'd like to account for variance between fields in our model, but we're not interested in this specific set of fields: so, we'll treat it as a random effect.\n\nWe'll also include fixed effects of `irrigation` and `variety`, as well as their interaction, since these are our predictors of interest.\n\nWe don't have enough observations in this dataset to add random slopes, so we only have random intercepts by field. (If you're curious, have a look at the error message that occurs if you try to fit random slopes for `variety` by `field`; feel free to ask a trainer about it.)\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_yield <- lmer(yield ~ irrigation*variety + (1|field), data = irrigation)\n\nsummary(lme_yield)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: yield ~ irrigation * variety + (1 | field)\n Data: irrigation\n\nREML criterion at convergence: 45.4\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-0.7448 -0.5509 0.0000 0.5509 0.7448 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n field (Intercept) 16.200 4.025 \n Residual 2.107 1.452 \nNumber of obs: 16, groups: field, 8\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 38.500 3.026 4.487 12.725 0.000109 ***\nirrigationi2 1.200 4.279 4.487 0.280 0.791591 \nirrigationi3 0.700 4.279 4.487 0.164 0.877156 \nirrigationi4 3.500 4.279 4.487 0.818 0.454584 \nvarietyv2 0.600 1.452 4.000 0.413 0.700582 \nirrigationi2:varietyv2 -0.400 2.053 4.000 -0.195 0.855020 \nirrigationi3:varietyv2 -0.200 2.053 4.000 -0.097 0.927082 \nirrigationi4:varietyv2 1.200 2.053 4.000 0.584 0.590265 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) irrgt2 irrgt3 irrgt4 vrtyv2 irr2:2 irr3:2\nirrigation2 -0.707 \nirrigation3 -0.707 0.500 \nirrigation4 -0.707 0.500 0.500 \nvarietyv2 -0.240 0.170 0.170 0.170 \nirrgtn2:vr2 0.170 -0.240 -0.120 -0.120 -0.707 \nirrgtn3:vr2 0.170 -0.120 -0.240 -0.120 -0.707 0.500 \nirrgtn4:vr2 0.170 -0.120 -0.120 -0.240 -0.707 0.500 0.500\n```\n:::\n:::\n\n:::\n\nThis output shows us that the estimated average yield for our reference group - irrigation method i1 with variety v1 - is 38.5 (the Intercept line for the fixed effects results). 
Relative to this, the variance of our `field` random effect is reasonably big at 16.2. Meanwhile, the differences for each of the different varieties and irrigation methods are all quite small.\n\n#### Visualise the model\n\nSince we're not comparing multiple different models in the same plot, we can be more efficient by putting the augmented model object directly into the first line of our `ggplot` function. Because both of our fixed predictors are categorical variables, we can more easily visualise the model with boxplots than with lines of best fit. \n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_yield), aes(x = irrigation, y = yield, colour = variety)) +\n geom_point() +\n geom_boxplot(aes(y = .fitted, group = paste(variety, irrigation)))\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-19-1.png){width=672}\n:::\n:::\n\n:::\n\nYou might be looking at the above graph, and wonder what impact the random effect of `field` has had on these model predictions. Well, if we tweak the graph a little bit and add the individual predicted values by `variety`, `irrigation` and `field` all at once, we can get a sense of how the predicted values have actually moved closer, or \"shrunk\", towards one another.\n\nAnother way to think about this is: some of the variance in the `yield` response variable, which in a simple linear model would be attributed entirely to our fixed predictors, is being captured instead by the differences between our random fields. So, the final effects of `irrigation * variety` are lessened.\n\nIn the next session of the course, we'll talk about how to check whether these results are significant.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_yield), aes(x = irrigation, y = yield, shape = variety)) +\n geom_point() +\n geom_point(aes(y = .fitted, group = paste(field, variety, irrigation), colour = field))\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-20-1.png){width=672}\n:::\n:::\n\n:::\n\n:::\n\n### Exercise 2 - Solutions\n\n\n{{< level 2 >}}\n\n\n\nA lab technician wants to test the purity of their stock of several common solutes. They take multiple samples of each solute, and dissolve them into six common solvents four times each (for a total of 72 solutions).\n\nThe technician wants to know the average dissolving time of each solute across replicates and across solvents, which they can compare against known figures to check the quality of each solute.\n\nRead in the `solutions.csv` dataset. Explore the data and experimental design (including by visualising), and then fit at least one appropriate mixed effects model.\n\n::: {.callout-note collapse=\"true\"}\n#### Hints\n\nThere is no worked answer provided for this exercise, in order to challenge you a little. 
If, however, you are looking for guidance on what steps to take and which functions to use, you can use the `irrigation` example above as a scaffold.\n\nNote: if you encounter the `boundary (singular) fit: see help('isSingular')` error, this doesn't mean that you've used the `lme4` syntax incorrectly; as we'll discuss later in the course, it means that the model you've fitted is too complex to be supported by the size of the dataset.\n:::\n\n### Exercise 3 - Dragons\n\n\n{{< level 2 >}}\n\n\n\n*The inspiration for this example dataset is taken from an [online tutorial](https://ourcodingclub.github.io/tutorials/mixed-models/) by Gabriela K Hadjuk.*\n\nRead in the `dragons.csv` file, explore these data, then fit, summarise and visualise at least one mixed effects model.\n\nThis is a slightly more complicated dataset, with five different variables:\n\n- `dragon`, which is simply an ID number for each dragon measured; here, each dragon is unique\n- `wingspan`, a measure of the size of the dragon\n- `scales`, a categorical (binary) variable for what colour scales the dragon has\n- `mountain`, a categorical variable representing which mountain range the dragon was found on\n- `intelligence`, our continuous response variable\n\nWe're interested in the relationships between `wingspan`, the colour of `scales` and `intelligence`, but we want to factor in the fact that we have measured these variables across 5 different mountain ranges.\n\nWith more variables, there are more possible models that could be fitted. Think about: what different structures might the fixed and random effects take? How does that change our visualisation?\n\nTry to work through this yourself, before expanding the answer below.\n\n::: {.callout-note collapse=\"true\"}\n#### Worked answer\n\nHere, we'll work through how to fit and visualise one possible mixed effects model that could be fitted to these data.\n\nBut, if you fitted models with other sets of fixed/random effects and explored those, well done. We'll talk in the next section of the course about how you can decide between these models to determine which is the best at explaining the data. Right now, it's just the process that matters.\n\n#### Visualise the data\n\nBefore we do anything else, let's have a look at what we're working with:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndragons <- read_csv(\"data/dragons.csv\")\n```\n:::\n\n:::\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(dragons, aes(x = wingspan, y = intelligence, colour = scales)) +\n geom_point()\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-22-1.png){width=672}\n:::\n\n```{.r .cell-code}\nggplot(dragons, aes(x = scales, y = intelligence)) +\n geom_boxplot()\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-22-2.png){width=672}\n:::\n:::\n\n:::\n\nAs a whole, we get the impression that as wingspan increases, so does intelligence. It also looks as if intelligence is slightly higher on average in metallic dragons than in chromatic dragons.\n\nMight there be an interaction between `wingspan` and `scales`? It's hard to tell from our first plot, but it's not impossible. 
(You could try using the `geom_smooth` function to fit a basic grouped linear regression, if you wanted a clearer idea at this stage.)\n\nNow, let's produce the same plots, but faceted/split by mountain range:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(dragons, aes(x = wingspan, y = intelligence, colour = scales)) +\n facet_wrap(vars(mountain)) +\n geom_point()\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-23-1.png){width=672}\n:::\n\n```{.r .cell-code}\nggplot(dragons, aes(x = scales, y = intelligence)) +\n facet_wrap(vars(mountain)) +\n geom_boxplot()\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-23-2.png){width=672}\n:::\n:::\n\n:::\n\nThe broad impression remains the same, but for one thing: the strength of the relationship between `wingspan` and `intelligence` seems to vary across our different facets, i.e. between mountain ranges. \n\nIt's hard to tell whether the relationship between `scales` and `intelligence` also differs across mountain ranges, as this effect is subtler overall.\n\n#### Consider the fixed effects \n\nWe have four options for our fixed effects structure:\n\n- No fixed effects (a random effects only model)\n- A single effect, of either `wingspan` or `scales`\n- An additive model\n- Including both main effects and an interaction\n\nWe'll talk in the next section of the course about how we can compare between different models and determine whether individual predictors are significant or not.\n\nHowever, in this case we want to fit at least an additive fixed effects structure, as the exercise summary indicated that we are interested in whether `scales` and `wingspan` have a bearing on `intelligence`. For this walkthrough, we'll include the interaction term as well.\n\n#### Consider the random effects\n\nThere is only one variable in this dataset that it would be suitable to consider \"random\": `mountain`. And, given how the plots look when we split them by mountain range, it would seem that this is very much something we want to take into account.\n\n(The `wingspan` variable is continuous, and the categorical `scales` variable only contains two levels, making both of these inappropriate/impossible to treat as random variables.)\n\nHowever, as we learned by looking at the `sleepstudy` dataset, we can fit multiple separate random effects, meaning that even with just `mountain` as a clustering variable, we have options!\n\n- Random intercepts, by mountain; `(1|mountain)`\n- Random slopes for `wingspan`, by mountain; `(0 + wingspan|mountain)`\n- Random slopes for `scales`, by mountain; `(0 + scales|mountain)`\n- Random slopes for `wingspan:scales`, by mountain; `(0 + wingspan:scales|mountain)`\n\n::: {.callout-tip}\nThis last option is worth taking a moment to unpack. \n\nAllowing `wingspan:scales` to vary by mountain means that we are asking the model to assume that the strength of the interaction between `wingspan` and `scales` varies between mountain ranges such that the different coefficients for that interaction are drawn from a random distribution.\n\nOr, phrased differently: the strength of the relationship between `wingspan` and `intelligence` depends on `scales` colour, but the degree to which it is dependent on `scales` colour also varies between `mountain` ranges.\n\nThis is biologically plausible! 
Though, we're dealing with imaginary creatures, so one could facetiously claim that *anything* is biologically plausible...\n:::\n\nAgain, the next section of the course will talk about how we can compare models to decide which predictors (including random effects) are making useful contributions to our model.\n\nIt would be perfectly allowable for you to fit all four of these random effects if you wanted to. The syntax to include them all would be `(1 + wingspan*scales|mountain)`, or written out in full, `(1 + wingspan + scales + wingspan:colour|mountain)`.\n\nFor now, though, we'll just fit the first two random effects (random intercepts, and random slopes for `wingspan`, by `mountain`), to keep things a little simpler.\n\n#### Fit the model\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_dragons <- lmer(intelligence ~ wingspan*scales + (1 + wingspan|mountain), \n data=dragons)\nsummary(lme_dragons)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: intelligence ~ wingspan * scales + (1 + wingspan | mountain)\n Data: dragons\n\nREML criterion at convergence: 1629.3\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.56346 -0.66381 0.04359 0.69979 2.56843 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr\n mountain (Intercept) 10.4730 3.2362 \n wingspan 0.2629 0.5127 0.09\n Residual 181.4417 13.4700 \nNumber of obs: 200, groups: mountain, 5\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 89.28519 3.73223 10.69540 23.923 1.24e-10 ***\nwingspan 1.00255 0.23620 4.22265 4.244 0.01177 * \nscalesmetallic 15.67710 4.81498 188.76548 3.256 0.00134 ** \nwingspan:scalesmetallic -0.09228 0.07976 188.37980 -1.157 0.24878 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) wngspn sclsmt\nwingspan -0.168 \nscalesmtllc -0.649 0.155 \nwngspn:scls 0.590 -0.167 -0.918\n```\n:::\n:::\n\n:::\n\nThis output looks very similar to what we saw before. The main difference here is that our fixed effect structure is more complex than for the `sleepstudy` dataset - hence, we have two additional rows, for our second main effect and our interaction. (The correlation matrix for our fixed effects, right at the bottom, has also become more complicated.)\n\n#### Visualise the model\n\nWe'll start by building a plot that's faceted by `mountain`, since we know this is a crucial clustering variable. 
To add our mixed model to the plot, we use the `augment` function from the `broom.mixed` package.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(dragons, aes(x = wingspan, y = intelligence, colour = scales)) +\n facet_wrap(vars(mountain)) +\n geom_point() +\n \n # use augment so that we can plot our mixed model\n geom_line(data = augment(lme_dragons), aes(y = .fitted))\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-25-1.png){width=672}\n:::\n:::\n\n:::\n\nAlternatively (or additionally) we can view all of these lines on a single plot, with a black line representing the global average:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(dragons, aes(x = wingspan, y = intelligence, colour = mountain)) +\n geom_point() +\n \n # plot the mixed model\n geom_line(data = augment(lme_dragons), aes(y = .fitted, \n linetype = scales, group = paste(mountain, scales))) +\n \n # add the global average line\n geom_smooth(method = \"lm\", se = FALSE, colour = \"black\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using formula = 'y ~ x'\n```\n:::\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-26-1.png){width=672}\n:::\n:::\n\n:::\n\n#### Alternative models\n\nWhat happens if we do fit the more complex random effects structures that were mentioned above?\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_dragons_complex <- lmer(intelligence ~ wingspan*scales + \n (1 + wingspan*scales|mountain), data=dragons)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nboundary (singular) fit: see help('isSingular')\n```\n:::\n\n```{.r .cell-code}\nsummary(lme_dragons_complex)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: intelligence ~ wingspan * scales + (1 + wingspan * scales | mountain)\n Data: dragons\n\nREML criterion at convergence: 1624.5\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.45624 -0.60041 -0.02585 0.68853 2.57259 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr \n mountain (Intercept) 122.56574 11.0709 \n wingspan 0.29770 0.5456 -0.23 \n scalesmetallic 156.14278 12.4957 -1.00 0.25 \n wingspan:scalesmetallic 0.04728 0.2174 0.99 -0.36 -0.99\n Residual 173.46583 13.1706 \nNumber of obs: 200, groups: mountain, 5\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 89.14563 5.99808 4.06220 14.862 0.000108 ***\nwingspan 1.00264 0.25025 4.00432 4.006 0.016009 * \nscalesmetallic 16.02787 7.31986 4.48351 2.190 0.086395 . \nwingspan:scalesmetallic -0.09562 0.12475 4.46561 -0.766 0.481936 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) wngspn sclsmt\nwingspan -0.303 \nscalesmtllc -0.887 0.281 \nwngspn:scls 0.861 -0.372 -0.962\noptimizer (nloptwrap) convergence code: 0 (OK)\nboundary (singular) fit: see help('isSingular')\n```\n:::\n:::\n\n:::\n\nThis is the most complex model we could fit, with all the possible fixed and random effects included. 
You'll notice that you encounter an error, telling you that you have singular fit.\n\nOur dataset is likely too small to support so many random effects; 200 might sound large, but in the context of a mixed effects model, it unfortunately is not.\n\nYou might also notice in the model summary that the estimated variance for the random slopes of `wingspan:scales` is also very small. This is a decent indication that this random effect probably isn't useful in this model, probably because this effect isn't actually occurring in our underlying dragon population.\n\n:::\n\n::: {.callout-tip appearance=\"minimal\"}\n#### Bonus questions\n\n\n{{< level 3 >}}\n\n\n\nFor those who want to push their understanding a bit further, here's a few additional things to think about. We won't give the answers here, but if you're interested, call a trainer over to chat about them more.\n\n- How could you adapt the code above to visualise a mixed effects model that did not include `scales` as a fixed predictor?\n- How much shrinkage do you observe for the lines of best fit in the `dragons` dataset? Is this more or less than in the `sleepstudy` dataset? Why might this be?\n- What syntax would you use in `lme4` to fit a model with the following equation to the dragons dataset?\n\n::: {.callout-note collapse=\"true\"}\n#### Model equation\n\nLevel 1:\n\n$$\ny_{ij} = \\beta_{0j} + \\beta_{1j}x_{1ij} + \\beta_{2j}x_{2ij} + \\beta_3x_{1ij}x_{2ij} + \\epsilon_{ij}\n$$\n\nLevel 2:\n\n$$\n\\beta_{0j} = \\gamma_{00} + U_{0j}\n$$\n$$\n\\beta_{1j} = \\gamma_{10} + U_{1j}\n$$\n$$\n\\beta_{2j} = \\gamma_{20} + U_{2j}\n$$\n\nand,\n\n$$\n\\left( \\begin{array}{c} U_{0j} \\\\ U_{1j} \\\\ U_{2j} \\end{array} \\right) ∼ N \\left( \\begin{array}{c} 0 \\\\ 0 \\\\ 0 \\end{array} , \\begin{array}{cc} \\tau^2_{00} & \\rho_{01} & \\rho_{02} \\\\ \\rho_{01} & \\tau^2_{10} & \\rho_{12} \\\\ \\rho_{02} & \\rho_{12} & \\tau^2_{20} \\end{array} \\right)\n$$\n\nWhere $y$ is `intelligence`, $x_1$ is `wingspan`, $x_2$ is `scales`, $j$ represents mountain ranges and $i$ represents individual dragons within those mountain ranges.\n\n:::\n\n:::\n\n## Summary\n\nThis section of the course is designed to introduce the syntax required for fitting two-level mixed models in R, including both random intercepts and random slopes, and how we can visualise the resulting models.\n\nLater sections will address significance testing and assumption checking, as well as how to fit more complex mixed models.\n\n::: {.callout-tip}\n#### Key points\n- Mixed effects models can be fitted using the `lme4` package in R, which extends the linear model by introducing specialised syntax for random effects\n- For random intercepts, we use the format `(1|B)`, where B is our grouping variable\n- For random intercepts with random slopes, we use the format `(1 + A|B)`, where we allow the slope of A as well as the intercept to vary between levels of B\n- For random slopes only, we use `(0 + A|B)`, which gives random slopes for A without random intercepts\n- Random effects are fitted using partial pooling, which results in the phenomenon of \"shrinkage\"\n:::\n\n", + "markdown": "---\ntitle: \"Fitting mixed models\"\noutput: html_document\n---\n\n::: {.cell}\n\n:::\n\n\nThe course materials so far have discussed the motivation behind mixed effects models, and why we might choose to include random effects.\n\nIn this section, we will learn how to fit these models in R, and how to visualise the results.\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## 
Click to expand\n\nWe'll be using the `lme4` package in R, which is by far the most common and best choice of package for this type of model. (It's the successor to the older package `nlme`, which you might also see people using.) The syntax is nice and simple and extends what we've been doing so far with the `lm()` function in (hopefully!) a very intuitive way. \n\nThe package also contains functions for fitting non-linear mixed effects and generalised mixed effects models - though we won't be focusing on those here, it's nice to know that the package can handle them in case you ever choose to explore them in future!\n\nFor Python users, the `pymer4` package in Python allows you to \"borrow\" most of the functionality of R's `lme4`, though it still has many bugs that make it difficult to run on any system except Linux. There is also some functionality for fitting mixed models using `statsmodels` in Python. We won't be using those packages here, but you may wish to explore them if you are a die-hard Python user!\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# load the required packages for fitting & visualising\nlibrary(tidyverse)\nlibrary(lme4)\nlibrary(broom)\nlibrary(broom.mixed)\nlibrary(patchwork)\n```\n:::\n\n:::\n\n## The sleepstudy data\n\nWe'll be using the internal `sleepstudy` dataset from the `lme4` package in R as an example (this dataset is also provided as a `.csv` file, if you'd prefer to read it in or are using Python).\n\nThis is a simple dataset taken from a real study that investigated the effects of sleep deprivation on reaction times in 18 subjects, and has just three variables: \n\n- `Reaction`, reaction time in milliseconds\n- `Days`, number of days of sleep deprivation\n- `Subject`, subject ID\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"sleepstudy\")\n```\n:::\n\n:::\n\nHave a look at the data more closely. You'll notice that for each subject, we've got 10 measurements, one for each day of sleep deprivation. This repeated measurement means that our data are not independent of one another; for each subject in the study we would expect measurements of reaction times to be more similar to one another than they are to reaction times of another subject.\n\nLet's start by doing something that we know is wrong, and ignoring this dependence for now. We'll begin by visualising the data with a simple scatterplot.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n geom_point() +\n geom_smooth(method = \"lm\")\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n:::\n\n:::\n\nThis gives the overall impression that we might expect - reaction time does seem to slow as people become more sleep deprived.\n\nBut, as we've already pointed out, this ignores the fact that each subject's reaction times will be more similar to one another than to another subject's - and we should make a point of accounting for that.\n\n## Adding random intercepts\n\nIn this dataset, we want to treat `Subject` as a random effect, which means fitting a mixed effects model. Why `Subject`? There are two things at play here that make us want to treat this as a random effect:\n\n1. `Subject` is a *grouping* variable within our dataset, and is causing us problems with independence.\n2. 
It's not these specific 18 subjects that we're interested in - they instead represent 18 random selections from a broader distribution/population of subjects that we could have tested. We would like to generalise our findings to this broader population.\n\nTo fit the model, we use a different function to what we've used so far, but the syntax looks very similar. The difference is the addition of a new term `(1|Subject)`, which represents our random effect.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# construct a linear mixed effects model with Subject\n# as a random effect\nlme_sleep1 <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)\n\n# summarise the model\nsummary(lme_sleep1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: Reaction ~ Days + (1 | Subject)\n Data: sleepstudy\n\nREML criterion at convergence: 1786.5\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-3.2257 -0.5529 0.0109 0.5188 4.2506 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n Subject (Intercept) 1378.2 37.12 \n Residual 960.5 30.99 \nNumber of obs: 180, groups: Subject, 18\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 251.4051 9.7467 22.8102 25.79 <2e-16 ***\nDays 10.4673 0.8042 161.0000 13.02 <2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\nDays -0.371\n```\n:::\n:::\n\n:::\n\nOkay. The syntax might have looked similar to a standard linear model, but the output does not.\n\nIn later sections of the course, we'll discuss how to test significance based on this sort of output. In the meantime, however, to help get our head around the model we've fitted, we're going to visualise it.\n\nHere, we'll make use of the `broom` and `broom.mixed` packages to extract fitted values from the models - the `augment` function essentially creates a dataframe that contains both the raw data and the fitted values (along with residuals and other useful values), which helps a lot in plotting.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# create a linear model - we'll use this in our graph\nlm_sleep <- lm(Reaction ~ Days, data = sleepstudy)\n\n# set up our basic plot\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n \n # create separate plots for each subject in the sample\n # and add the data points\n facet_wrap(facets = vars(Subject), nrow = 3) +\n geom_point() +\n \n # this adds the line of best fit for the whole sample\n # (without the random effect), using coefficients\n # from our simple linear model object\n geom_line(data = augment(lm_sleep), aes(y = .fitted)) + \n \n # and finally, this will add different lines of best fit\n # for each subject as calculated in our mixed model object\n geom_line(data = augment(lme_sleep1), aes(y = .fitted), colour = \"blue\")\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-6-1.png){width=672}\n:::\n:::\n\n:::\n\nEach plot represents a different subject's data. On each plot, we've added the following:\n\n* in black we have the same overall line of best fit from our original (incorrect) linear model.\n* in blue are the individual lines of best fit for each subject. These lines move up and down the plot relative to the global line of best fit. 
This reflects the fact that, though all subjects are declining as they become more sleep deprived, some of them started with slower baseline reaction times, with different y-intercepts to match. Subject 310, for instance, seems to have pretty good reflexes relative to everyone else, while subject 337 isn't quite as quick on the trigger.\n\nWe can visualise the same model slightly differently, to allow us to look at the set of lines of best fit together. Here, we will create a plot that doesn't have facets (but still shows us the same model predictions):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n geom_point() +\n \n # include the global line of best fit\n geom_line(data = augment(lm_sleep), aes(y = .fitted)) +\n \n # include individual lines of best fit\n geom_line(data = augment(lme_sleep1), aes(y = .fitted, group = Subject), \n colour = \"blue\")\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-7-1.png){width=672}\n:::\n:::\n\n:::\n\nThe global line of best fit is in the middle (in black), with each of the individual subject lines of best fit around it.\n\nFrom this plot, we can see that the *gradient* of each of these blue lines is still the same as the overall line of best fit. This is because we've added a random intercept in our model, but have **kept the same slope**. \n\nThis reflects an underlying assumption that the relationship between sleep deprivation and reaction time is the same - i.e. that people get worse at the same rate - even if their starting baselines differ.\n\nWe might not think that this assumption is a good one, however. And that's where random slopes come in.\n\n## Adding random slopes\n\nTo add a random slope as well as a random intercept, we need to alter the syntax slightly for our random effect. Now, instead of `(1|Subject)`, we'll instead use `(1 + Days|Subject)`. This allows the relationship between `Days` and `Reaction` to vary between subjects.\n\nLet's fit that new model and summarise it.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_sleep2 <- lmer(Reaction ~ Days + (1 + Days|Subject), data = sleepstudy)\n\nsummary(lme_sleep2)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: Reaction ~ Days + (1 + Days | Subject)\n Data: sleepstudy\n\nREML criterion at convergence: 1743.6\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-3.9536 -0.4634 0.0231 0.4634 5.1793 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr\n Subject (Intercept) 612.10 24.741 \n Days 35.07 5.922 0.07\n Residual 654.94 25.592 \nNumber of obs: 180, groups: Subject, 18\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 251.405 6.825 17.000 36.838 < 2e-16 ***\nDays 10.467 1.546 17.000 6.771 3.26e-06 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\nDays -0.138\n```\n:::\n:::\n\n:::\n\nWe can go ahead and add our new lines (in red) to our earlier facet plot. 
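\n\nBefore we do, it can be helpful to put some numbers on those per-subject lines. This step isn't part of the original walkthrough, but the `coef()` function - which works on `lmer` models just as it does on `lm` ones - returns the fitted intercept and slope for every subject, combining the fixed estimates with each subject's random deviations (`fixef()` and `ranef()` give you those two pieces separately). A quick, optional check:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# per-subject intercepts and slopes from the random slopes model\n# (fixed effects plus each subject's random deviations)\ncoef(lme_sleep2)\n\n# the two components separately:\nfixef(lme_sleep2) # global (fixed) intercept and slope\nranef(lme_sleep2) # subject-level deviations from them\n```\n:::\n\n:::\n\nNow, back to the plot. 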
Only the last line of code is new here:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n facet_wrap(facets = vars(Subject), nrow = 3) +\n geom_point() +\n \n # the global line of best fit\n geom_line(data = augment(lm_sleep), aes(y = .fitted)) + \n \n # our previous lines of best fit, with random intercepts\n # but constant slope\n geom_line(data = augment(lme_sleep1), aes(y = .fitted), colour = \"blue\") +\n \n # our lines of best with random intercepts and random slopes\n geom_line(data = augment(lme_sleep2), aes(y = .fitted), colour = \"red\") \n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-9-1.png){width=672}\n:::\n:::\n\n:::\n\nWhile for some of our subjects, the red, blue and black lines look quite similar, for others they diverge a fair amount. Subjects 309 and 335, for instance, are displaying a remarkably flat trend that suggests they're not really suffering delays in reaction time from their sleep deprivation very much at all, while subject 308 definitely seems to struggle without their eight hours.\n\nLet's compare those different red lines, representing our random intercepts & slopes model, on a single plot. This is the same code as we used a couple of plots ago, except the last line is now different:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n geom_point() +\n \n # include the global line of best fit\n geom_line(data = augment(lm_sleep), aes(y = .fitted)) +\n\n # include individual lines of best fit\n geom_line(data = augment(lme_sleep2), aes(y = .fitted, group = Subject), \n colour = \"red\")\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-10-1.png){width=672}\n:::\n:::\n\n:::\n\nVisualising all of our lines of best fit simultaneously like this makes it clearer what it means to have both random intercepts and random slopes. Each line of best fit starts in a slightly different place, and also has a different gradient.\n\n### Fitting random slopes without random intercepts\n\nIt's quite unusual to fit a model with random slopes but without random intercepts - but it's absolutely possible.\n\nThe `lme4` package includes \"implicit random intercepts\", meaning that we don't actually need to specify the 1 in our random effects structure for random intercepts to be fitted. \n\nTry running the following, and compare the two outputs - these models are identical:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_explicit <- lmer(Reaction ~ Days + (1 + Days|Subject), data = sleepstudy)\nlme_implicit <- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy)\n\nsummary(lme_explicit)\nsummary(lme_implicit)\n```\n:::\n\n:::\n\nIf we were determined to remove the random intercepts, we have to explicitly tell `lme4` not to fit them, like this:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_slopesonly <- lmer(Reaction ~ Days + (0 + Days|Subject), data = sleepstudy)\n\nsummary(lme_slopesonly)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. 
t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: Reaction ~ Days + (0 + Days | Subject)\n Data: sleepstudy\n\nREML criterion at convergence: 1766.5\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-3.5104 -0.5588 0.0541 0.6244 4.6022 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n Subject Days 52.71 7.26 \n Residual 842.03 29.02 \nNumber of obs: 180, groups: Subject, 18\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 251.41 4.02 161.00 62.539 < 2e-16 ***\nDays 10.47 1.87 21.68 5.599 1.32e-05 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\nDays -0.340\n```\n:::\n:::\n\n:::\n\nYou should see that the random intercepts have now disappeared from the output.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n geom_point() +\n \n # include the global line of best fit\n geom_line(data = augment(lm_sleep), aes(y = .fitted)) +\n \n # include individual lines of best fit\n geom_line(data = augment(lme_slopesonly), aes(y = .fitted, group = Subject), \n colour = \"purple\")\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n:::\n\n:::\n\nIndeed, looking at each of the lines of best fit here, we can see that they all have the same intercept (i.e., the same value of y when x = 0), but with differing slopes.\n\n## Two-level models\n\nAll of the mixed models we've fitted to these data so far (with random slopes and/or random intercepts) can be described as two-level models.\n\nA standard linear model would be a one-level model, where we have true independence and no clustering/grouping variables.\n\nBut for this dataset, the `Subject` variable creates clusters, so we have a different set of `Reaction` times for each `Subject`. Whether we choose to fit random intercepts, slopes, or both, this overall structure between the variables remains the same, creating a hierarchy with two levels. Hence, a two-level model!\n\nLater in the course, we will look at more complicated models, where we have multiple clustering variables that we want to generate random effects for, due to more complex experimental designs.\n\n### Equations & notation\n\nFor those who are interested in notation and equations, the drop-down box below gives a little more detail on how this works for a linear mixed effects model. \n\nThis subsection skews a bit more in the maths direction, and won't be needed by everyone who uses mixed models in their research. But, it's included here as bonus material for anyone who finds equations helpful, or for those who might need this for reporting on and reading about mixed models!\n\n::: {.callout-note collapse=\"true\"}\n#### Linear mixed models notation\n\nFor the `sleepstudy` dataset, a standard linear model `Reaction ~ Days` would be written in the format:\n\n$$\ny = \\beta_{0} + \\beta_{1}x_{1} + \\epsilon\n$$\n\nThe $x$ variable here is, of course, `Days`, and $y$ is our response variable `Reaction`.\n\nIn this equation, $\\beta_{0}$ represents the intercept, and $\\beta_{1}$ represents the slope or gradient. Each of these is either a single fixed number, or, in the case of a categorical predictor, a set of fixed means for the groups. \n\nThe $\\epsilon$ at the end represents our error, or noise. In the case of linear model, we measure this by calculating the residuals. 
As you already know from standard linear models, we assume that these residuals are random and normally distributed. So, we could additionally note that:\n\n$$\n\\epsilon ∼ N(0, \\sigma^2)\n$$\n\nThis is just fancy shorthand for: \"the errors are drawn from a normal distribution, which has a mean of 0 and variance $\\sigma^2$\". This variance is something we need to estimate, in order to perform our regression analysis.\n\n#### Random intercepts model\n\nWhen we add random effects to deal with the clustering variable `Subject`, however, we are doing more than just estimating a fixed mean or coefficient.\n\nThat's because we're actually estimating a *distribution* of coefficients whenever we estimate a random effect. \n\nSo, when we include random intercepts in our model `Reaction ~ Days + (1|Subject)`, we are not just estimating three numbers. We estimate an intercept for each `Subject` in the dataset. And, we are assuming that those intercepts have been drawn from a normal distribution with mean 0 - this is a baked-in assumption of a linear mixed model (more on assumptions in a later section).\n\nFor this model, the equation for our model is now written like this:\n\n$$\ny_{ij} = \\beta_{0j} + \\beta_{1}x_{ij} + \\epsilon_{ij}\n$$\n\nWhere have these extra subscript letters come from?\n\nWell, previously we didn't bother with this, because a standard linear model only has one level. Now, we have a two-level model, so we use $i$ and $j$ to refer to those different levels.\n\nHere, $j$ would represent the different levels of our clustering variable `Subject`. The letter $i$ then represents the set of values within each cluster $j$. So, $ij$ in our subscripts refers to our entire set of response/outcome values `Reaction`, which here are measured at the level of individual `Days` within each `Subject`.\n\nThe term $\\beta_{0j}$ tells us that we have random intercepts. For each of our $j$ clusters, there is a separate $\\beta_{0}$. You will sometimes see a random effect broken down further, like this:\n\n$$\n\\beta_{0j} = \\gamma_{00} + U_{0j}\n$$\n\nHere, the $\\gamma_{00}$ refers to the \"grand intercept\", i.e., the average intercept across all groups. This is a fixed effect, one single value that doesn't change, and we need to estimate it in order to be able to then estimate $U_{0j}$. It's conventional - though not compulsory - to use $\\gamma$ to represent fixed/global coefficients like this.\n\nThe $U_{0j}$ bit then refers to the set of deviations from that grand intercept, one for each of your clusters/groups. These deviations should be normally distributed with mean 0 and variance $\\tau^2_{00}$. Again, it's conventional to use $\\tau^2$ to refer to the variance of random effects specifically (rather than $\\sigma^2$, which we used for the variance of our residuals). You will sometimes see people use letters other than $U$ to refer to the set of deviations/coefficients, especially when there are more than two levels in the model (more on that in a later section.)\n\n$$\nU_{0j} ∼ N(0, \\tau^2_{00})\n$$\n\nOnce again, we also assume that our errors $\\epsilon_{ij}$ are normally distributed around 0 as well, just as we did with the standard linear model.\n\n#### Random intercepts & random slopes model\n\nNow let's look at what happens when we add a second random effect, as in the model `Reaction ~ Days + (1 + Days|Subject)`. 
The equation now looks like this.\n\nLevel 1:\n\n$$\ny_{ij} = \\beta_{0j} + \\beta_{1j}x_{ij} + \\epsilon_{ij}\n$$\n\nLevel 2:\n\n$$\n\\beta_{0j} = \\gamma_{00} + U_{0j}\n$$\n$$\n\\beta_{1j} = \\gamma_{10} + U_{1j}\n$$\n\nwhere,\n\n$$\n\\left( \\begin{array}{c} U_{0j} \\\\ U_{1j} \\end{array} \\right) ∼ N \\left( \\begin{array}{c} 0 \\\\ 0 \\end{array} , \\begin{array}{cc} \\tau^2_{00} & \\rho_{01} \\\\ \\rho_{01} & \\tau^2_{10} \\end{array} \\right)\n$$\n\nWe now have two random effects instead of one. We can tell this because we're now writing $\\beta_{1j}$ and specifying an additional equation for it, instead of just writing $\\beta_{1}$ for a single fixed value of the slope.\n\nAdmittedly, that last bit looks more complicated than before. We won't go into too much detail, but what's happening on the right is known as a \"variance-covariance\" matrix. When you include multiple random effects in a mixed model, the correlations between those random effects are also estimated. So we actually make assumptions about the joint distribution that all of the random effects are being drawn from. If this statement alone doesn't satisfy your curiosity, you might find [this link](https://rpubs.com/yjunechoe/correlationsLMEM) a useful resource with some handy visualisations of how this works!\n\nIf that's a bit more complicated than you're interested in, don't worry. You don't need to understand all that maths to be able to used a mixed effects model. It boils down to the same thing: that random effects are a set of coefficients with some variance, and we make assumptions about their distribution(s).\n\n#### A helpful summary\n\nThis table summarises and defines each of the terms included in the equation(s) above.\n\n| Parameter | Description |\n|:-|:-----|\n|$y_{ij}$|Response/outcome; value of `Reaction` for subject $j$ on day $i$|\n|$x_{ij}$|Predictor; value of `Days` for subject $j$ on day $i$|\n|$\\beta_{0j}$|Level 1 intercept parameter, containing a fixed and a random effect|\n|$\\gamma_{00}$|Fixed effect; grand (average) intercept|\n|$U_{0j}$|Random effect; deviation from grand intercept for subject $j$|\n|$\\beta_{1j}$|Level 1 slope parameter, containing a fixed and a random effect|\n|$\\gamma_{10}$|Fixed effect; grand (average) slope|\n|$U_{1j}$|Random effect; deviation from grand slope for subject $j$|\n|$\\epsilon_{ij}$|Error/residual (difference between real value and predicted value) of `Reaction` for subject $j$ on day $i$|\n|$\\tau^2_{00}$|Variance of random intercepts $U_{0j}$|\n|$\\tau^2_{10}$|Variance of random slopes $U_{1j}$|\n|$\\rho_{01}$|Covariance between random effects $U_{0j}$ and $U_{1j}$|\n\n:::\n\n### Sharing information\n\nFinally, while we're working with the `sleepstudy` dataset, let's take the opportunity to visualise something else that's special about random effects (which we'll discuss more later in the course): sharing information between levels.\n\nAs an extra observation, let's use `geom_smooth` to add the lines of best fit that we would see if we fitted each subject with their own individual regression:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(sleepstudy, aes(x = Days, y = Reaction)) +\n facet_wrap(facets = vars(Subject), nrow = 3) +\n geom_point() +\n \n # the global line of best fit\n geom_line(data = augment(lm_sleep), aes(y = .fitted)) + \n \n # random intercepts only\n geom_line(data = augment(lme_sleep1), aes(y = .fitted), colour = \"blue\") +\n \n # random intercepts and random slopes\n geom_line(data = 
augment(lme_sleep2), aes(y = .fitted), colour = \"red\") +\n \n # individual regression lines for each individual\n geom_smooth(method = \"lm\", se = FALSE, colour = \"green\", linewidth = 0.5)\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-14-1.png){width=672}\n:::\n:::\n\n:::\n\nHere, the black line (which is the same on every plot) represents a global line of best fit - this is what we would see if we ignored the `Subject` variable entirely and just did a simple linear regression. This is called **complete pooling**.\n\nThe green lines, meanwhile, represent what happens when we split our dataset into separate groups by `Subject`, and fit individual regressions between `Reaction` and `Days` that are completely independent of each other. This is called **no pooling**, i.e., treating `Subject` as a fixed effect.\n\nThe blue and red lines represent our mixed effects models - the difference between the two is whether we allowed the slope to vary randomly, as well as the random intercept. In both cases, we are using something called **partial pooling**. \n\nComparing the green and red lines in particular allows us to see the phenomenon of \"shrinkage\", which occurs because of partial pooling. \n\nThe red lines are all closer to the black line than the green line is. In other words, the predictions for our mixed effects model are more similar to the global line of best fit, than the individual regression lines are to that global line. We say that the red lines (our mixed model) are showing some shrinkage towards the global line; Subjects 330, 335 and 370 perhaps show this best. \n\nThis happens because, when random effects are estimated, information is shared between the different levels of the random effect (in this case, between subjects). Though we still estimate separate slopes and/or intercepts for each subject, we take into account the global average, and this pulls the individual lines of best fit towards the global one.\n\nThis idea of taking into account the global average when calculating our set of random slopes or intercepts is another key element that helps us decide whether we want to treat a variable as a random effect. Do you want to share information between your categories, or is it better for your research question to keep them separate?\n\n## Exercises\n\n### Irrigation {#sec-exr_irrigation}\n\n::: {.callout-exercise}\n\n\n{{< level 1 >}}\n\n\n\nThis example uses the `irrigation` dataset. The study is a split-plot design, used for an agricultural trial aimed at maximising crop yield.\n\nTwo crop varieties and four different irrigation methods were tested across eight fields available for the experiment. Only one type of irrigation method can be applied to each field, but the fields are divided into two halves with a different variety of crop planted in each half.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nirrigation <- read_csv(\"data/irrigation.csv\")\n```\n:::\n\n:::\n\nThere are four variables in total:\n\n- `field` ID, f1 through f8\n- `irrigation` method used, i1 through i4\n- `variety` of crop, v1 or v2\n- `yield`, the total crop yield per field\n\nFor this exercise: \n\n1. Visualise the data\n2. Fit a mixed model\n\nDoes it look as if `irrigation` method or crop `variety` are likely to affect `yield`?\n\n::: {.callout-tip collapse=\"true\"}\n#### Worked answer\n\n#### Visualise the data\n\nThis is quite a small dataset, with only 16 data points. 
We want to know whether `irrigation`, on the x axis, and/or `variety`, split by colour, affect `yield`; so let's put all of those variables on the same plot:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(irrigation, aes(x = irrigation, y = yield, colour = variety)) +\n geom_point()\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-16-1.png){width=672}\n:::\n:::\n\n:::\n\nIt looks as if there could be some differences between `irrigation` levels, but the effect of `variety` looks less clear.\n\nOur data points do all appear to be paired together, and this is almost certainly related to our `field` variable, which we can see if we alter the plot above:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(irrigation, aes(x = irrigation, y = yield, colour = field)) +\n geom_point()\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-17-1.png){width=672}\n:::\n:::\n\n:::\n\nThe effect of `field`, then, seems quite strong.\n\n#### Fit the model\n\nWe can see from the plots above that we need to consider `field` as an important grouping variable. We'd like to account for variance between fields in our model, but we're not interested in this specific set of fields: so, we'll treat it as a random effect.\n\nWe'll also include fixed effects of `irrigation` and `variety`, as well as their interaction, since these are our predictors of interest.\n\nWe don't have enough observations in this dataset to add random slopes, so we only have random intercepts by field. (If you're curious, have a look at the error message that occurs if you try to fit random slopes for `variety` by `field`; feel free to ask a trainer about it.)\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_yield <- lmer(yield ~ irrigation*variety + (1|field), data = irrigation)\n\nsummary(lme_yield)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: yield ~ irrigation * variety + (1 | field)\n Data: irrigation\n\nREML criterion at convergence: 45.4\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-0.7448 -0.5509 0.0000 0.5509 0.7448 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n field (Intercept) 16.200 4.025 \n Residual 2.107 1.452 \nNumber of obs: 16, groups: field, 8\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 38.500 3.026 4.487 12.725 0.000109 ***\nirrigationi2 1.200 4.279 4.487 0.280 0.791591 \nirrigationi3 0.700 4.279 4.487 0.164 0.877156 \nirrigationi4 3.500 4.279 4.487 0.818 0.454584 \nvarietyv2 0.600 1.452 4.000 0.413 0.700582 \nirrigationi2:varietyv2 -0.400 2.053 4.000 -0.195 0.855020 \nirrigationi3:varietyv2 -0.200 2.053 4.000 -0.097 0.927082 \nirrigationi4:varietyv2 1.200 2.053 4.000 0.584 0.590265 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) irrgt2 irrgt3 irrgt4 vrtyv2 irr2:2 irr3:2\nirrigation2 -0.707 \nirrigation3 -0.707 0.500 \nirrigation4 -0.707 0.500 0.500 \nvarietyv2 -0.240 0.170 0.170 0.170 \nirrgtn2:vr2 0.170 -0.240 -0.120 -0.120 -0.707 \nirrgtn3:vr2 0.170 -0.120 -0.240 -0.120 -0.707 0.500 \nirrgtn4:vr2 0.170 -0.120 -0.120 -0.240 -0.707 0.500 0.500\n```\n:::\n:::\n\n:::\n\nThis output shows us that our global average yield is 38.5 (the Intercept line for the fixed effects results). 
Relative to this, the variance of our `field` random effect is reasonably big at 16.2. Meanwhile, the differences for each of the different varieties and irrigation methods are all quite small.\n\n#### Visualise the model\n\nSince we're not comparing multiple different models in the same plot, we can be more efficient by putting the augmented model object directly into the first line of our `ggplot` function. Because both of our fixed predictors are categorical variables, we can more easily visualise the model with boxplots than with lines of best fit. \n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_yield), aes(x = irrigation, y = yield, colour = variety)) +\n geom_point() +\n geom_boxplot(aes(y = .fitted, group = paste(variety, irrigation)))\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-19-1.png){width=672}\n:::\n:::\n\n:::\n\nYou might be looking at the above graph, and wonder what impact the random effect of `field` has had on these model predictions. Well, if we tweak the graph a little bit and add the individual predicted values by `variety`, `irrigation` and `field` all at once, we can get a sense of how the predicted values have actually moved closer, or \"shrunk\", towards one another.\n\nAnother way to think about this is: some of the variance in the `yield` response variable, which in a simple linear model would be attributed entirely to our fixed predictors, is being captured instead by the differences between our random fields. So, the final effects of `irrigation * variety` are lessened.\n\nIn the next session of the course, we'll talk about how to check whether these results are significant.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_yield), aes(x = irrigation, y = yield, shape = variety)) +\n geom_point() +\n geom_point(aes(y = .fitted, group = paste(field, variety, irrigation), colour = field))\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-20-1.png){width=672}\n:::\n:::\n\n:::\n\n:::\n\n:::\n\n### Solutions {#sec-exr_solutions}\n\n::: {.callout-exercise}\n\n\n{{< level 2 >}}\n\n\n\nA lab technician wants to test the purity of their stock of several common solutes. They take multiple samples of each solute, and dissolve them into six common solvents four times each (for a total of 72 solutions).\n\nThe technician wants to know the average dissolving time of each solute across replicates and across solvents, which they can compare against known figures to check the quality of each solute.\n\nRead in the `solutions.csv` dataset. Explore the data and experimental design (including by visualising), and then fit at least one appropriate mixed effects model.\n\n::: {.callout-note collapse=\"true\"}\n#### Hints\n\nThere is no worked answer provided for this exercise, in order to challenge you a little. 
If, however, you are looking for guidance on what steps to take and which functions to use, you can use the `irrigation` example above as a scaffold.\n\nNote: if you encounter the `boundary (singular) fit: see help('isSingular')` error, this doesn't mean that you've used the `lme4` syntax incorrectly; as we'll discuss later in the course, it means that the model you've fitted is too complex to be supported by the size of the dataset.\n:::\n\n:::\n\n### Dragons {#sec-exr_dragons}\n\n::: {.callout-exercise}\n\n\n{{< level 2 >}}\n\n\n\n*The inspiration for this example dataset is taken from an [online tutorial](https://ourcodingclub.github.io/tutorials/mixed-models/) by Gabriela K Hadjuk.*\n\nRead in the `dragons.csv` file, explore these data, then fit, summarise and visualise at least one mixed effects model.\n\nThis is a slightly more complicated dataset, with five different variables:\n\n- `dragon`, which is simply an ID number for each dragon measured; here, each dragon is unique\n- `wingspan`, a measure of the size of the dragon\n- `scales`, a categorical (binary) variable for what colour scales the dragon has\n- `mountain`, a categorical variable representing which mountain range the dragon was found on\n- `intelligence`, our continuous response variable\n\nWe're interested in the relationships between `wingspan`, the colour of `scales` and `intelligence`, but we want to factor in the fact that we have measured these variables across 5 different mountain ranges.\n\nWith more variables, there are more possible models that could be fitted. Think about: what different structures might the fixed and random effects take? How does that change our visualisation?\n\nTry to work through this yourself, before expanding the answer below.\n\n::: {.callout-tip collapse=\"true\"}\n#### Worked answer\n\nHere, we'll work through how to fit and visualise one possible mixed effects model that could be fitted to these data.\n\nBut, if you fitted models with other sets of fixed/random effects and explored those, well done. We'll talk in the next section of the course about how you can decide between these models to determine which is the best at explaining the data. Right now, it's just the process that matters.\n\n#### Visualise the data\n\nBefore we do anything else, let's have a look at what we're working with:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndragons <- read_csv(\"data/dragons.csv\")\n```\n:::\n\n:::\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(dragons, aes(x = wingspan, y = intelligence, colour = scales)) +\n geom_point()\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-22-1.png){width=672}\n:::\n\n```{.r .cell-code}\nggplot(dragons, aes(x = scales, y = intelligence)) +\n geom_boxplot()\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-22-2.png){width=672}\n:::\n:::\n\n:::\n\nAs a whole, we get the impression that as wingspan increases, so does intelligence. It also looks as if intelligence is slightly higher on average in metallic dragons than in chromatic dragons.\n\nMight there be an interaction between `wingspan` and `scales`? It's hard to tell from our first plot, but it's not impossible. 
(You could try using the `geom_smooth` function to fit a basic grouped linear regression, if you wanted a clearer idea at this stage.)\n\nNow, let's produce the same plots, but faceted/split by mountain range:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(dragons, aes(x = wingspan, y = intelligence, colour = scales)) +\n facet_wrap(vars(mountain)) +\n geom_point()\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-23-1.png){width=672}\n:::\n\n```{.r .cell-code}\nggplot(dragons, aes(x = scales, y = intelligence)) +\n facet_wrap(vars(mountain)) +\n geom_boxplot()\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-23-2.png){width=672}\n:::\n:::\n\n:::\n\nThe broad impression remains the same, but for one thing: the strength of the relationship between `wingspan` and `intelligence` seems to vary across our different facets, i.e. between mountain ranges. \n\nIt's hard to tell whether the relationship between `scales` and `intelligence` also differs across mountain ranges, as this effect is subtler overall.\n\n#### Consider the fixed effects \n\nWe have four options for our fixed effects structure:\n\n- No fixed effects (a random effects only model)\n- A single effect, of either `wingspan` or `scales`\n- An additive model\n- Including both main effects and an interaction\n\nWe'll talk in the next section of the course about how we can compare between different models and determine whether individual predictors are significant or not.\n\nHowever, in this case we want to fit at least an additive fixed effects structure, as the exercise summary indicated that we are interested in whether `scales` and `wingspan` have a bearing on `intelligence`. For this walkthrough, we'll include the interaction term as well.\n\n#### Consider the random effects\n\nThere is only one variable in this dataset that it would be suitable to consider \"random\": `mountain`. And, given how the plots look when we split them by mountain range, it would seem that this is very much something we want to take into account.\n\n(The `wingspan` variable is continuous, and the categorical `scales` variable only contains two levels, making both of these inappropriate/impossible to treat as random variables.)\n\nHowever, as we learned by looking at the `sleepstudy` dataset, we can fit multiple separate random effects, meaning that even with just `mountain` as a clustering variable, we have options!\n\n- Random intercepts, by mountain; `(1|mountain)`\n- Random slopes for `wingspan`, by mountain; `(0 + wingspan|mountain)`\n- Random slopes for `scales`, by mountain; `(0 + scales|mountain)`\n- Random slopes for `wingspan:scales`, by mountain; `(0 + wingspan:scales|mountain)`\n\n::: {.callout-tip}\nThis last option is worth taking a moment to unpack. \n\nAllowing `wingspan:scales` to vary by mountain means that we are asking the model to assume that the strength of the interaction between `wingspan` and `scales` varies between mountain ranges such that the different coefficients for that interaction are drawn from a random distribution.\n\nOr, phrased differently: the strength of the relationship between `wingspan` and `intelligence` depends on `scales` colour, but the degree to which it is dependent on `scales` colour also varies between `mountain` ranges.\n\nThis is biologically plausible! 
Though, we're dealing with imaginary creatures, so one could facetiously claim that *anything* is biologically plausible...\n:::\n\nAgain, the next section of the course will talk about how we can compare models to decide which predictors (including random effects) are making useful contributions to our model.\n\nIt would be perfectly allowable for you to fit all four of these random effects if you wanted to. The syntax to include them all would be `(1 + wingspan*scales|mountain)`, or written out in full, `(1 + wingspan + scales + wingspan:scales|mountain)`.\n\nFor now, though, we'll just fit the first two random effects (random intercepts, and random slopes for `wingspan`, by `mountain`), to keep things a little simpler.\n\n#### Fit the model\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_dragons <- lmer(intelligence ~ wingspan*scales + (1 + wingspan|mountain), \n data=dragons)\nsummary(lme_dragons)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: intelligence ~ wingspan * scales + (1 + wingspan | mountain)\n Data: dragons\n\nREML criterion at convergence: 1629.3\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.5634 -0.6638 0.0436 0.6998 2.5684 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr\n mountain (Intercept) 10.4742 3.2364 \n wingspan 0.2629 0.5127 0.09\n Residual 181.4419 13.4700 \nNumber of obs: 200, groups: mountain, 5\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 89.28523 3.73227 10.69604 23.923 1.24e-10 ***\nwingspan 1.00255 0.23619 4.22302 4.245 0.01176 * \nscalesmetallic 15.67707 4.81498 188.76573 3.256 0.00134 ** \nwingspan:scalesmetallic -0.09228 0.07976 188.38010 -1.157 0.24878 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) wngspn sclsmt\nwingspan -0.168 \nscalesmtllc -0.649 0.155 \nwngspn:scls 0.590 -0.167 -0.918\n```\n:::\n:::\n\n:::\n\nThis output looks very similar to what we saw before. The main difference here is that our fixed effect structure is more complex than for the `sleepstudy` dataset - hence, we have two additional rows, for our second main effect and our interaction. (The correlation matrix for our fixed effects, right at the bottom, has also become more complicated.)\n\n#### Visualise the model\n\nWe'll start by building a plot that's faceted by `mountain`, since we know this is a crucial clustering variable. 
To add our mixed model to the plot, we use the `augment` function from the `broom.mixed` package.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(dragons, aes(x = wingspan, y = intelligence, colour = scales)) +\n facet_wrap(vars(mountain)) +\n geom_point() +\n \n # use augment so that we can plot our mixed model\n geom_line(data = augment(lme_dragons), aes(y = .fitted))\n```\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-25-1.png){width=672}\n:::\n:::\n\n:::\n\nAlternatively (or additionally) we can view all of these lines on a single plot, with a black line representing the global average:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(dragons, aes(x = wingspan, y = intelligence, colour = mountain)) +\n geom_point() +\n \n # plot the mixed model\n geom_line(data = augment(lme_dragons), aes(y = .fitted, \n linetype = scales, group = paste(mountain, scales))) +\n \n # add the global average line\n geom_smooth(method = \"lm\", se = FALSE, colour = \"black\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n`geom_smooth()` using formula = 'y ~ x'\n```\n:::\n\n::: {.cell-output-display}\n![](fitting-mixed-models_files/figure-html/unnamed-chunk-26-1.png){width=672}\n:::\n:::\n\n:::\n\n#### Alternative models\n\nWhat happens if we do fit the more complex random effects structures that were mentioned above?\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_dragons_complex <- lmer(intelligence ~ wingspan*scales + \n (1 + wingspan*scales|mountain), data=dragons)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nboundary (singular) fit: see help('isSingular')\n```\n:::\n\n```{.r .cell-code}\nsummary(lme_dragons_complex)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: intelligence ~ wingspan * scales + (1 + wingspan * scales | mountain)\n Data: dragons\n\nREML criterion at convergence: 1624.5\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.45873 -0.60031 -0.02174 0.68955 2.57233 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr \n mountain (Intercept) 123.98353 11.1348 \n wingspan 0.29806 0.5460 -0.23 \n scalesmetallic 161.17549 12.6955 -1.00 0.24 \n wingspan:scalesmetallic 0.04834 0.2199 0.99 -0.35 -0.99\n Residual 173.47387 13.1709 \nNumber of obs: 200, groups: mountain, 5\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 89.15207 6.02151 4.02067 14.806 0.000117 ***\nwingspan 1.00250 0.25040 4.00271 4.004 0.016060 * \nscalesmetallic 16.03649 7.38831 4.38555 2.171 0.089747 . \nwingspan:scalesmetallic -0.09559 0.12559 4.38484 -0.761 0.485488 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) wngspn sclsmt\nwingspan -0.303 \nscalesmtllc -0.890 0.272 \nwngspn:scls 0.866 -0.362 -0.962\noptimizer (nloptwrap) convergence code: 0 (OK)\nboundary (singular) fit: see help('isSingular')\n```\n:::\n:::\n\n:::\n\nThis is the most complex model we could fit, with all the possible fixed and random effects included. 
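\n\nSingular fits come up often enough with complex random effects structures that `lme4` gives you a way of detecting them programmatically, via the `isSingular()` helper mentioned in the output above. A minimal sketch, using the model we just fitted:\n\n```r\n# returns TRUE when one or more variance components sit on the boundary of\n# their allowed range (a variance near 0, or a correlation of +/- 1)\nisSingular(lme_dragons_complex)\n```\n\n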
You'll notice that you get a message here, telling you that the fit is singular.\n\nOur dataset is likely too small to support so many random effects; 200 observations might sound large, but in the context of a mixed effects model, it unfortunately is not.\n\nYou might also notice in the model summary that the estimated variance for the random slopes of `wingspan:scales` is very small. This is a decent indication that this random effect isn't useful in this model, probably because the effect isn't actually present in our underlying dragon population.\n\n:::\n\n:::\n\n::: {.callout-exercise}\n#### Bonus questions\n\n\n{{< level 3 >}}\n\n\n\nFor those who want to push their understanding a bit further, here are a few additional things to think about. We won't give the answers here, but if you're interested, call a trainer over to chat about them more.\n\n- How could you adapt the code above to visualise a mixed effects model that did not include `scales` as a fixed predictor?\n- How much shrinkage do you observe for the lines of best fit in the `dragons` dataset? Is this more or less than in the `sleepstudy` dataset? Why might this be?\n- What syntax would you use in `lme4` to fit a model with the following equation to the dragons dataset?\n\n::: {.callout-note collapse=\"true\"}\n#### Model equation\n\nLevel 1:\n\n$$\ny_{ij} = \\beta_{0j} + \\beta_{1j}x_{1ij} + \\beta_{2j}x_{2ij} + \\beta_3x_{1ij}x_{2ij} + \\epsilon_{ij}\n$$\n\nLevel 2:\n\n$$\n\\beta_{0j} = \\gamma_{00} + U_{0j}\n$$\n$$\n\\beta_{1j} = \\gamma_{10} + U_{1j}\n$$\n$$\n\\beta_{2j} = \\gamma_{20} + U_{2j}\n$$\n\nand,\n\n$$\n\\left( \\begin{array}{c} U_{0j} \\\\ U_{1j} \\\\ U_{2j} \\end{array} \\right) \\sim N \\left( \\begin{array}{c} 0 \\\\ 0 \\\\ 0 \\end{array} , \\begin{array}{ccc} \\tau^2_{00} & \\rho_{01} & \\rho_{02} \\\\ \\rho_{01} & \\tau^2_{10} & \\rho_{12} \\\\ \\rho_{02} & \\rho_{12} & \\tau^2_{20} \\end{array} \\right)\n$$\n\nWhere $y$ is `intelligence`, $x_1$ is `wingspan`, $x_2$ is `scales`, $j$ represents mountain ranges and $i$ represents individual dragons within those mountain ranges.\n\n:::\n\n:::\n\n:::\n\n## Summary\n\nThis section of the course is designed to introduce the syntax required for fitting two-level mixed models in R, including both random intercepts and random slopes, and how we can visualise the resulting models.\n\nLater sections will address significance testing and assumption checking, as well as how to fit more complex mixed models.\n\n::: {.callout-tip}\n#### Key points\n- Mixed effects models can be fitted using the `lme4` package in R, which extends the linear model by introducing specialised syntax for random effects\n- For random intercepts, we use the format `(1|B)`, where B is our grouping variable\n- For random intercepts with random slopes, we use the format `(1 + A|B)`, where we allow the slope of A as well as the intercept to vary between levels of B\n- For random slopes only, we use `(0 + A|B)`, which gives random slopes for A without random intercepts\n- Random effects are fitted using partial pooling, which results in the phenomenon of \"shrinkage\"\n:::\n\n", "supporting": [ "fitting-mixed-models_files" ], diff --git a/_freeze/materials/generalised-mixed-models/execute-results/html.json b/_freeze/materials/generalised-mixed-models/execute-results/html.json index b375129..8db8a13 100644 --- a/_freeze/materials/generalised-mixed-models/execute-results/html.json +++ b/_freeze/materials/generalised-mixed-models/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash":
"5f048f2d8fc53a9854549ee7d91dc876", + "hash": "0a89e360eeedac253330cc64154c0192", "result": { - "markdown": "---\ntitle: \"Generalised mixed models\"\noutput: html_document\n---\n\n::: {.cell}\n\n:::\n\n\nThis page contains some information, along with a worked example, explaining how to fit and interpret generalised mixed effects models in `lme4`. \n\nThere are no exercises, but we will work through a dataset you'll recognise from earlier in the course as an example of the code.\n\n::: {.callout-tip}\n#### Prior knowledge\n\nThese bonus materials are intended to follow on from the materials and concepts introduced in our sister course on [generalised linear modelling](https://cambiotraining.github.io/stats-glm/), and will assume knowledge and familiarity with generalised linear models.\n:::\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\nWe'll need several packages, including the new `glmmTMB`, to explore fitting generalised linear mixed models.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(performance)\nlibrary(glmmTMB)\n```\n:::\n\n:::\n\n## Generalising linear models\n\nOne of the assumptions of a linear model is that the response variable is continuous. But in many real experiments, the response variable might be one of the following:\n\n- binary (yes/no or success/fail)\n- proportional (number of successes out of all trials)\n- fractional (percentage of a quantity)\n- count (integers with a lower limit at 0)\n\nor might follow a strongly non-normal distribution, e.g., time or income often follow an exponential distribution.\n\nIn these cases, a linear model may not be appropriate, and/or a generalised linear model can provide a better fit. GLMs \"extend\" the standard linear model by wrapping the linear equation inside a non-linear link function. 
\n\n### Extending linear mixed effects models\n\nVery usefully, the procedure that we apply to generalise a standard linear model - namely, adding a link function - also works to generalise linear mixed effects models.\n\nBy including both a link function and one or more random effects, we can combine two extensions to the linear model to create generalised linear mixed effects models (GLMMs).\n\nThe assumptions of a GLMM are an amalgamation of the assumptions of a GLM and a linear mixed model:\n\n- Independent observations (after random effects)\n- Response variable follows distribution from exponential family (binomial, Poisson, beta, gamma, etc.)\n- Correct link function; there is a linear relationship between the linearised model\n- Normally distributed random effects\n\n## Revisiting arabidopsis\n\nTo give an illustration of how we fit and assess generalised linear mixed effects models (GLMMs), we'll look at the internal dataset `Arabidopsis` from `lme4`.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"Arabidopsis\")\n```\n:::\n\n:::\n\nIn this dataset, there are eight variables:\n\n- `total.fruits`, an integer variable measuring the total fruits produced per plant\n- `amd`, a variable measuring whether the plant underwent simulated herbivory (clipped or unclipped)\n- `nutrient`, a variable measuring which type of fertiliser/treatment the plant received (1, minimal or 8, added)\n- `reg`, or region, a variable with three categories (NL Netherlands, SP Spain, SW Sweden)\n- `popu`, or population, a variable representing groups within the regions\n- `gen`, or genotype, a variable with 24 categories\n- `rack`, a \"nuisance\" or confounding factor, representing which of two greenhouse racks the plant was grown on\n- `status`, another nuisance factor, representing the plant's germination method (Normal, Petri.Plate or Transplant)\n\nWe're interested in finding out whether the fruit yield can be predicted based on the type of fertiliser and whether the plant underwent simulated herbivory, across different genotypes and populations.\n\nIn the previous section of the course on checking assumptions, we fitted a standard linear mixed model to these data. Here, we'll fit a slightly simplified version:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_arabidopsis <- lmer(total.fruits ~ nutrient + amd + (1|popu) + (1|gen), \n data = Arabidopsis)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nboundary (singular) fit: see help('isSingular')\n```\n:::\n\n```{.r .cell-code}\nsummary(lme_arabidopsis)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: total.fruits ~ nutrient + amd + (1 | popu) + (1 | gen)\n Data: Arabidopsis\n\nREML criterion at convergence: 6245.2\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.7839 -0.6391 -0.2043 0.2621 5.3628 \n\nRandom effects:\n Groups Name Variance Std.Dev. \n gen (Intercept) 5.498e-13 7.415e-07\n popu (Intercept) 1.517e+02 1.232e+01\n Residual 1.264e+03 3.555e+01\nNumber of obs: 625, groups: gen, 24; popu, 9\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 8.697 4.981 14.233 1.746 0.102 \nnutrient 4.578 0.407 614.918 11.248 <2e-16 ***\namdunclipped 4.540 2.847 614.662 1.595 0.111 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) nutrnt\nnutrient -0.374 \namdunclippd -0.299 0.016\noptimizer (nloptwrap) convergence code: 0 (OK)\nboundary (singular) fit: see help('isSingular')\n```\n:::\n:::\n\n:::\n\nBut we found that the diagnostic plots for this model did not look good, in particular the residual vs fitted, location-scale, normal Q-Q and posterior predictive check plots:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_arabidopsis, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"pp_check\"))\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n:::\n\nYou may have spotted the reason for this when you completed the exercise in section 7 of this course: `total.fruits` is not a continuous response variable, but instead a count variable.\n\nWe want to improve the way that we're modelling this variable by including a link function.\n\n### The glmer function\n\nSince `total.fruits` is a count variable, there's a decent chance it follows a Poisson distribution. So as a first step in trying to improve our model, let's try specifying the log link function.\n\nWe do this in `lme4` using the `glmer` function. It combines the syntax that you're already used to from `lmer`, with the syntax from the `glm` function. In other words, we keep all the same syntax for random effects, and we include the `family` argument to determine which link function we're using.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nglmm_arabidopsis <- glmer(total.fruits ~ nutrient + amd + (1|popu) + (1|gen), \n data = Arabidopsis, family = \"poisson\")\n\nsummary(glmm_arabidopsis)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nGeneralized linear mixed model fit by maximum likelihood (Laplace\n Approximation) [glmerMod]\n Family: poisson ( log )\nFormula: total.fruits ~ nutrient + amd + (1 | popu) + (1 | gen)\n Data: Arabidopsis\n\n AIC BIC logLik deviance df.resid \n 20985.0 21007.2 -10487.5 20975.0 620 \n\nScaled residuals: \n Min 1Q Median 3Q Max \n-8.571 -3.648 -2.069 1.774 42.407 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n gen (Intercept) 0.06356 0.2521 \n popu (Intercept) 0.25746 0.5074 \nNumber of obs: 625, groups: gen, 24; popu, 9\n\nFixed effects:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) 2.321461 0.178166 13.030 <2e-16 ***\nnutrient 0.170799 0.002493 68.508 <2e-16 ***\namdunclipped 0.139879 0.014719 9.503 <2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) nutrnt\nnutrient -0.090 \namdunclippd -0.045 0.006\n```\n:::\n:::\n\n:::\n\nSome brief points of comparison between this model summary, and the summary for `lme_arabidopsis` above. \n\nFirstly, you'll see the GLMM has been fitted using maximum likelihood estimation rather than ReML. 
Secondly, you'll also see that there are some p-values provided as standard in the GLMM output; these are called Wald tests, which test whether the coefficient value is significantly different from zero (this is subtly different from testing whether the individual predictor itself is significant).\n\nLet's have a look at the diagnostic plots, and see if we've made any improvements on our standard linear mixed model.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(glmm_arabidopsis, residual_type = \"normal\",\n check = c(\"pp_check\", \"outliers\", \"reqq\"))\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-7-1.png){width=672}\n:::\n:::\n\n:::\n\nWe have one potentially influential point we might want to investigate, which has a Cook's distance > 0.8. (Note that you can also use the `check_outliers` function if you find the plot above a little difficult to interpret, or if you want to change the threshold.)\n\nOur random effects do appear to be nicely normally distributed.\n\nThe posterior predictive check, however, raises some concerns. The blue simulated values don't really appear to be following the pattern of the data (green), especially on the left hand side of the plot.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(glmm_arabidopsis, residual_type = \"normal\",\n check = c(\"vif\", \"overdispersion\"))\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-8-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_overdispersion(glmm_arabidopsis)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# Overdispersion test\n\n dispersion ratio = 37.821\n Pearson's Chi-Squared = 23449.159\n p-value = < 0.001\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nOverdispersion detected.\n```\n:::\n:::\n\n:::\n\nWell, we're fine on collinearity, but overdispersion/zero-inflation seems a huge problem, especially when we use the `check_overdispersion` function to investigate in more detail. It seems that the Poisson distribution actually isn't representative of our response variable.\n\n### Negative binomial regression\n\nWe can, instead of Poisson regression, try fitting a negative binomial regression instead. As with standard GLMs, this requires a slightly different function - `glmer.nb` rather than `glmer`.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nglmmnb_arabidopsis <- glmer.nb(total.fruits ~ nutrient + amd + (1|popu) + (1|gen), \n data = Arabidopsis)\n\nsummary(glmmnb_arabidopsis)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nGeneralized linear mixed model fit by maximum likelihood (Laplace\n Approximation) [glmerMod]\n Family: Negative Binomial(0.536) ( log )\nFormula: total.fruits ~ nutrient + amd + (1 | popu) + (1 | gen)\n Data: Arabidopsis\n\n AIC BIC logLik deviance df.resid \n 5051.1 5077.8 -2519.6 5039.1 619 \n\nScaled residuals: \n Min 1Q Median 3Q Max \n-0.7286 -0.6592 -0.3517 0.2440 12.2435 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n gen (Intercept) 0.0461 0.2147 \n popu (Intercept) 0.2466 0.4966 \nNumber of obs: 625, groups: gen, 24; popu, 9\n\nFixed effects:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) 2.21630 0.20928 10.59 <2e-16 ***\nnutrient 0.17569 0.01646 10.68 <2e-16 ***\namdunclipped 0.27879 0.11426 2.44 0.0147 * \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) nutrnt\nnutrient -0.410 \namdunclippd -0.323 0.126\n```\n:::\n:::\n\n:::\n\nIf we check the diagnostic plots, we can see a bit of improvement - the posterior predictive check in particular looks much better.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(glmmnb_arabidopsis, residual_type = \"normal\",\n check = c(\"pp_check\", \"outliers\", \"reqq\"))\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-10-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_model(glmmnb_arabidopsis, residual_type = \"normal\",\n check = c(\"vif\", \"overdispersion\"))\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-10-2.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_overdispersion(glmmnb_arabidopsis)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# Overdispersion test\n\n dispersion ratio = 0.386\n p-value = 0.04\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nUnderdispersion detected.\n```\n:::\n:::\n\n:::\n\nIt could still be better; there's evidence now for underdispersion. \n\nThe lingering issues might be because of zero-inflation. If we look at the distribution of the data via a histogram, this certainly looks plausible.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(data = Arabidopsis, aes(x = total.fruits)) +\n geom_histogram()\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.\n```\n:::\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-11-1.png){width=672}\n:::\n:::\n\n:::\n\nWhat are the next steps in improving this analysis?\n\nWell, we could fit a zero-inflated model to these data. Because zero-inflated models are a bit more complex - you're actually fitting two different models or distributions simultaneously to the same dataset - `lme4` unfortunately doesn't contain a function that allows us to do this.\n\nIf you need to go beyond the standard array of distributions that are offered in `glm` and `glmer`, such as fitting a zero-inflated model, you have to explore other R packages. To help guide you, there is a brief description in the next session of some possible options. \n\n## Alternative packages\n\nThough we have focused heavily on `lme4` in this course, and for this section on GLMMs, it's important to flag to you that this is not the *only* package for fitting generalised mixed effects models (or linear mixed effects models, as it happens).\n\n### The glmmTMB package\n\nThis package is designed explicitly for generalised mixed effects modelling in R (and somewhat as an extension to `lme4`, so the syntax isn't too unfamiliar). 
\n\nYou can find a manual for the `glmmTMB` package written by the author [here](https://cran.r-project.org/web/packages/glmmTMB/vignettes/glmmTMB.pdf) that contains more information and code examples.\n\nHow might we use the package to fit a zero-inflated Poisson model for the `Arabidopsis` dataset?\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nglmmzip_arabidopsis <- glmmTMB(total.fruits ~ nutrient + rack + status + amd + reg + \n (1|popu) + (1|gen), data = Arabidopsis,\n family = \"poisson\", ziformula = ~1)\n\nsummary(glmmzip_arabidopsis)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n Family: poisson ( log )\nFormula: \ntotal.fruits ~ nutrient + rack + status + amd + reg + (1 | popu) + \n (1 | gen)\nZero inflation: ~1\nData: Arabidopsis\n\n AIC BIC logLik deviance df.resid \n 16065.3 16114.1 -8021.6 16043.3 614 \n\nRandom effects:\n\nConditional model:\n Groups Name Variance Std.Dev.\n popu (Intercept) 0.02115 0.1454 \n gen (Intercept) 0.02795 0.1672 \nNumber of obs: 625, groups: popu, 9; gen, 24\n\nConditional model:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) 3.412538 0.132959 25.67 < 2e-16 ***\nnutrient 0.156741 0.002507 62.52 < 2e-16 ***\nrack -0.668821 0.016042 -41.69 < 2e-16 ***\nstatusPetri.Plate -0.161421 0.022427 -7.20 6.13e-13 ***\nstatusTransplant -0.184060 0.020137 -9.14 < 2e-16 ***\namdunclipped 0.059388 0.014770 4.02 5.80e-05 ***\nregSP 0.448013 0.156767 2.86 0.00427 ** \nregSW -0.073457 0.168100 -0.44 0.66212 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nZero-inflation model:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) -1.3808 0.1001 -13.8 <2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThe new bit of syntax is the `ziformula` argument. If you set this equal to `~0`, you are asking R to estimate the model *excluding* zero-inflation (which is also the default). So, to model the zero-inflation, you must set this argument equal to `~1`.\n\nWe could look at all the diagnostic plots (and in a real analysis situation, you would), but let's focus on the posterior predictive check.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(glmmzip_arabidopsis, residual_type = \"normal\", check = \"pp_check\")\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n:::\n\n:::\n\nIt's doing a much, much better job now of estimating those zeroes (top left of the plot). However, it's suffering from similar problems to our original Poisson model in the range around 1-15.\n\nPerhaps a zero-inflated negative binomial model might do the trick for the `Arabidopsis` dataset? We can fit that in `glmmTMB` by updating the `family` argument.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nglmmzinb_arabidopsis <- glmmTMB(total.fruits ~ nutrient + rack + status + amd + reg + \n (1|popu) + (1|gen), data = Arabidopsis,\n family = \"nbinom2\", ziformula = ~1)\n\ncheck_model(glmmzinb_arabidopsis, residual_type = \"normal\", check = \"pp_check\")\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-14-1.png){width=672}\n:::\n:::\n\n:::\n\nNot perfect - but perhaps better?\n\n### Even more packages\n\nEven `glmmTMB` is not the end of the road. 
There are others one could use, including packages such as `brms` and `GLMMadaptive`, or the `glmmPQL` function from `MASS`, and you may see these cropping up in online tutorials or even papers.\n\nFor a detailed list of packages, [this resource](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#which-r-packages-functions-fit-glmms) from Bolker et al. is a great starting point.\n\nA note of caution: not all packages will implement exactly the same computational methods \"under the hood\" as `lme4`, because fitting and assessing mixed effects models, especially non-linear and generalised ones, is difficult to do and therefore is still an area of active research and discussion in statistics. \n\nSo, if you notice that you get different estimates and numbers when fitting models in different packages, don't panic. What matters more than anything is the conclusion you draw from your data overall, and how confident you are in that conclusion.\n\nFor those of you with an interest in the computational side of things, you might find resources such as [this blog post](https://rpubs.com/kaz_yos/glmm1) to be a useful starting place.\n\n## Summary\n\nLinear mixed effects models can be generalised in the same way that standard linear models are: by wrapping the linear equation inside a non-linear link function. The link function is chosen based on the distribution of the response variable.\n\nAlternatively, you might prefer to think of it the other way around: that GLMs can be extended to cope with non-independence by adding random effects to them. In either case, the result is the same. Both random effects and link functions can be used simultaneously, to cope with the (quite common!) situation where a dataset is both hierarchical and has a non-continuous response variable.\n\n::: {.callout-tip}\n#### Key points\n- By including both a link function to linearise the model, and random effects, we can fit generalised linear mixed effects models in R\n- We can do this by using the `glmer` or `glmer.nb` functions from `lme4` for most of the \"common\" GLMMs\n- Other packages such as `glmmTMB` are needed for zero-inflated models and other extensions\n- Evaluating and assessing GLMMs can be done using the same methods as for standard GLMs/linear mixed effects models\n:::\n\n", + "markdown": "---\ntitle: \"Generalised mixed models\"\noutput: html_document\n---\n\n::: {.cell}\n\n:::\n\n\nThis page contains some information, along with a worked example, explaining how to fit and interpret generalised mixed effects models in `lme4`. \n\nThere are no exercises, but we will work through a dataset you'll recognise from earlier in the course as an example of the code.\n\n::: {.callout-tip}\n#### Prior knowledge\n\nThese bonus materials are intended to follow on from the materials and concepts introduced in our sister course on [generalised linear modelling](https://cambiotraining.github.io/stats-glm/), and will assume knowledge and familiarity with generalised linear models.\n:::\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\nWe'll need several packages, including the new `glmmTMB`, to explore fitting generalised linear mixed models.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(performance)\nlibrary(glmmTMB)\n```\n:::\n\n:::\n\n## Generalising linear models\n\nOne of the assumptions of a linear model is that the response variable is continuous. 
But in many real experiments, the response variable might be one of the following:\n\n- binary (yes/no or success/fail)\n- proportional (number of successes out of all trials)\n- fractional (percentage of a quantity)\n- count (integers with a lower limit at 0)\n\nor might follow a strongly non-normal distribution, e.g., time or income often follow an exponential distribution.\n\nIn these cases, a linear model may not be appropriate, and/or a generalised linear model can provide a better fit. GLMs \"extend\" the standard linear model by wrapping the linear equation inside a non-linear link function. \n\n### Extending linear mixed effects models\n\nVery usefully, the procedure that we apply to generalise a standard linear model - namely, adding a link function - also works to generalise linear mixed effects models.\n\nBy including both a link function and one or more random effects, we can combine two extensions to the linear model to create generalised linear mixed effects models (GLMMs).\n\nThe assumptions of a GLMM are an amalgamation of the assumptions of a GLM and a linear mixed model:\n\n- Independent observations (after random effects)\n- Response variable follows distribution from exponential family (binomial, Poisson, beta, gamma, etc.)\n- Correct link function; there is a linear relationship between the linearised model\n- Normally distributed random effects\n\n## Revisiting Arabidopsis\n\nTo give an illustration of how we fit and assess generalised linear mixed effects models (GLMMs), we'll look at the internal dataset `Arabidopsis`, which we investigated earlier in the course in [Exercise -@sec-exr_arabidopsis].\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"Arabidopsis\")\n```\n:::\n\n:::\n\nIn this dataset, there are eight variables:\n\n- `total.fruits`, an integer variable measuring the total fruits produced per plant\n- `amd`, a variable measuring whether the plant underwent simulated herbivory (clipped or unclipped)\n- `nutrient`, a variable measuring which type of fertiliser/treatment the plant received (1, minimal or 8, added)\n- `reg`, or region, a variable with three categories (NL Netherlands, SP Spain, SW Sweden)\n- `popu`, or population, a variable representing groups within the regions\n- `gen`, or genotype, a variable with 24 categories\n- `rack`, a \"nuisance\" or confounding factor, representing which of two greenhouse racks the plant was grown on\n- `status`, another nuisance factor, representing the plant's germination method (Normal, Petri.Plate or Transplant)\n\nWe're interested in finding out whether the fruit yield can be predicted based on the type of fertiliser and whether the plant underwent simulated herbivory, across different genotypes and populations.\n\nIn the previous section of the course on checking assumptions, we fitted a standard linear mixed model to these data. Here, we'll fit a slightly simplified version:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_arabidopsis <- lmer(total.fruits ~ nutrient + amd + (1|popu) + (1|gen), \n data = Arabidopsis)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nboundary (singular) fit: see help('isSingular')\n```\n:::\n\n```{.r .cell-code}\nsummary(lme_arabidopsis)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. 
t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: total.fruits ~ nutrient + amd + (1 | popu) + (1 | gen)\n Data: Arabidopsis\n\nREML criterion at convergence: 6245.2\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.7839 -0.6391 -0.2043 0.2621 5.3628 \n\nRandom effects:\n Groups Name Variance Std.Dev. \n gen (Intercept) 5.498e-13 7.415e-07\n popu (Intercept) 1.517e+02 1.232e+01\n Residual 1.264e+03 3.555e+01\nNumber of obs: 625, groups: gen, 24; popu, 9\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 8.697 4.981 14.233 1.746 0.102 \nnutrient 4.578 0.407 614.918 11.248 <2e-16 ***\namdunclipped 4.540 2.847 614.662 1.595 0.111 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) nutrnt\nnutrient -0.374 \namdunclippd -0.299 0.016\noptimizer (nloptwrap) convergence code: 0 (OK)\nboundary (singular) fit: see help('isSingular')\n```\n:::\n:::\n\n:::\n\nBut we found that the diagnostic plots for this model did not look good, in particular the residual vs fitted, location-scale, normal Q-Q and posterior predictive check plots:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_arabidopsis, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"pp_check\"))\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n:::\n\nYou may have spotted the reason for this when you completed the exercise in section 7 of this course: `total.fruits` is not a continuous response variable, but instead a count variable.\n\nWe want to improve the way that we're modelling this variable by including a link function.\n\n### The glmer function\n\nSince `total.fruits` is a count variable, there's a decent chance it follows a Poisson distribution. So as a first step in trying to improve our model, let's try specifying the log link function.\n\nWe do this in `lme4` using the `glmer` function. It combines the syntax that you're already used to from `lmer`, with the syntax from the `glm` function. In other words, we keep all the same syntax for random effects, and we include the `family` argument to determine which link function we're using.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nglmm_arabidopsis <- glmer(total.fruits ~ nutrient + amd + (1|popu) + (1|gen), \n data = Arabidopsis, family = \"poisson\")\n\nsummary(glmm_arabidopsis)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nGeneralized linear mixed model fit by maximum likelihood (Laplace\n Approximation) [glmerMod]\n Family: poisson ( log )\nFormula: total.fruits ~ nutrient + amd + (1 | popu) + (1 | gen)\n Data: Arabidopsis\n\n AIC BIC logLik deviance df.resid \n 20985.0 21007.2 -10487.5 20975.0 620 \n\nScaled residuals: \n Min 1Q Median 3Q Max \n-8.571 -3.648 -2.069 1.774 42.407 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n gen (Intercept) 0.06356 0.2521 \n popu (Intercept) 0.25746 0.5074 \nNumber of obs: 625, groups: gen, 24; popu, 9\n\nFixed effects:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) 2.321461 0.178166 13.030 <2e-16 ***\nnutrient 0.170799 0.002493 68.508 <2e-16 ***\namdunclipped 0.139879 0.014719 9.503 <2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) nutrnt\nnutrient -0.090 \namdunclippd -0.045 0.006\n```\n:::\n:::\n\n:::\n\nSome brief points of comparison between this model summary, and the summary for `lme_arabidopsis` above. \n\nFirstly, you'll see the GLMM has been fitted using maximum likelihood estimation rather than ReML. Secondly, you'll also see that there are some p-values provided as standard in the GLMM output; these are called Wald tests, which test whether the coefficient value is significantly different from zero (this is subtly different from testing whether the individual predictor itself is significant).\n\nLet's have a look at the diagnostic plots, and see if we've made any improvements on our standard linear mixed model.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(glmm_arabidopsis, residual_type = \"normal\",\n check = c(\"pp_check\", \"outliers\", \"reqq\"))\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-7-1.png){width=672}\n:::\n:::\n\n:::\n\nWe have one potentially influential point we might want to investigate, which has a Cook's distance > 0.8. (Note that you can also use the `check_outliers` function if you find the plot above a little difficult to interpret, or if you want to change the threshold.)\n\nOur random effects do appear to be nicely normally distributed.\n\nThe posterior predictive check, however, raises some concerns. The blue simulated values don't really appear to be following the pattern of the data (green), especially on the left hand side of the plot.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(glmm_arabidopsis, residual_type = \"normal\",\n check = c(\"vif\", \"overdispersion\"))\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-8-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_overdispersion(glmm_arabidopsis)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# Overdispersion test\n\n dispersion ratio = 37.821\n Pearson's Chi-Squared = 23449.159\n p-value = < 0.001\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nOverdispersion detected.\n```\n:::\n:::\n\n:::\n\nWell, we're fine on collinearity, but overdispersion/zero-inflation seems a huge problem, especially when we use the `check_overdispersion` function to investigate in more detail. It seems that the Poisson distribution actually isn't representative of our response variable.\n\n### Negative binomial regression\n\nWe can, instead of Poisson regression, try fitting a negative binomial regression instead. 
As with standard GLMs, this requires a slightly different function - `glmer.nb` rather than `glmer`.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nglmmnb_arabidopsis <- glmer.nb(total.fruits ~ nutrient + amd + (1|popu) + (1|gen), \n data = Arabidopsis)\n\nsummary(glmmnb_arabidopsis)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nGeneralized linear mixed model fit by maximum likelihood (Laplace\n Approximation) [glmerMod]\n Family: Negative Binomial(0.536) ( log )\nFormula: total.fruits ~ nutrient + amd + (1 | popu) + (1 | gen)\n Data: Arabidopsis\n\n AIC BIC logLik deviance df.resid \n 5051.1 5077.8 -2519.6 5039.1 619 \n\nScaled residuals: \n Min 1Q Median 3Q Max \n-0.7286 -0.6592 -0.3517 0.2440 12.2435 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n gen (Intercept) 0.0461 0.2147 \n popu (Intercept) 0.2466 0.4966 \nNumber of obs: 625, groups: gen, 24; popu, 9\n\nFixed effects:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) 2.21630 0.20928 10.59 <2e-16 ***\nnutrient 0.17569 0.01646 10.68 <2e-16 ***\namdunclipped 0.27879 0.11426 2.44 0.0147 * \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr) nutrnt\nnutrient -0.410 \namdunclippd -0.323 0.126\n```\n:::\n:::\n\n:::\n\nIf we check the diagnostic plots, we can see a bit of improvement - the posterior predictive check in particular looks much better.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(glmmnb_arabidopsis, residual_type = \"normal\",\n check = c(\"pp_check\", \"outliers\", \"reqq\"))\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-10-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_model(glmmnb_arabidopsis, residual_type = \"normal\",\n check = c(\"vif\", \"overdispersion\"))\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-10-2.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_overdispersion(glmmnb_arabidopsis)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# Overdispersion test\n\n dispersion ratio = 0.386\n p-value = 0.04\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nUnderdispersion detected.\n```\n:::\n:::\n\n:::\n\nIt could still be better; there's evidence now for underdispersion. \n\nThe lingering issues might be because of zero-inflation. If we look at the distribution of the data via a histogram, this certainly looks plausible.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(data = Arabidopsis, aes(x = total.fruits)) +\n geom_histogram()\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.\n```\n:::\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-11-1.png){width=672}\n:::\n:::\n\n:::\n\nWhat are the next steps in improving this analysis?\n\nWell, we could fit a zero-inflated model to these data. Because zero-inflated models are a bit more complex - you're actually fitting two different models or distributions simultaneously to the same dataset - `lme4` unfortunately doesn't contain a function that allows us to do this.\n\nIf you need to go beyond the standard array of distributions that are offered in `glm` and `glmer`, such as fitting a zero-inflated model, you have to explore other R packages. To help guide you, there is a brief description in the next session of some possible options. 
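\n\nBefore reaching for a different package, it can be worth quantifying how badly the zeroes are being modelled. The `performance` package we've been using for diagnostics also offers a `check_zeroinflation()` function; a minimal sketch, applied to the Poisson model from earlier:\n\n```r\n# compares the number of observed zeroes in total.fruits with the number of\n# zeroes predicted by the fitted Poisson GLMM\ncheck_zeroinflation(glmm_arabidopsis)\n```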
\n\n## Alternative packages\n\nThough we have focused heavily on `lme4` in this course, and for this section on GLMMs, it's important to flag to you that this is not the *only* package for fitting generalised mixed effects models (or linear mixed effects models, as it happens).\n\n### The glmmTMB package\n\nThis package is designed explicitly for generalised mixed effects modelling in R (and somewhat as an extension to `lme4`, so the syntax isn't too unfamiliar). \n\nYou can find a manual for the `glmmTMB` package written by the author [here](https://cran.r-project.org/web/packages/glmmTMB/vignettes/glmmTMB.pdf) that contains more information and code examples.\n\nHow might we use the package to fit a zero-inflated Poisson model for the `Arabidopsis` dataset?\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nglmmzip_arabidopsis <- glmmTMB(total.fruits ~ nutrient + rack + status + amd + reg + \n (1|popu) + (1|gen), data = Arabidopsis,\n family = \"poisson\", ziformula = ~1)\n\nsummary(glmmzip_arabidopsis)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n Family: poisson ( log )\nFormula: \ntotal.fruits ~ nutrient + rack + status + amd + reg + (1 | popu) + \n (1 | gen)\nZero inflation: ~1\nData: Arabidopsis\n\n AIC BIC logLik deviance df.resid \n 16065.3 16114.1 -8021.6 16043.3 614 \n\nRandom effects:\n\nConditional model:\n Groups Name Variance Std.Dev.\n popu (Intercept) 0.02115 0.1454 \n gen (Intercept) 0.02795 0.1672 \nNumber of obs: 625, groups: popu, 9; gen, 24\n\nConditional model:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) 3.412538 0.132959 25.67 < 2e-16 ***\nnutrient 0.156741 0.002507 62.52 < 2e-16 ***\nrack -0.668821 0.016042 -41.69 < 2e-16 ***\nstatusPetri.Plate -0.161421 0.022427 -7.20 6.13e-13 ***\nstatusTransplant -0.184060 0.020137 -9.14 < 2e-16 ***\namdunclipped 0.059388 0.014770 4.02 5.80e-05 ***\nregSP 0.448013 0.156767 2.86 0.00427 ** \nregSW -0.073457 0.168100 -0.44 0.66212 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nZero-inflation model:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) -1.3808 0.1001 -13.8 <2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThe new bit of syntax is the `ziformula` argument. If you set this equal to `~0`, you are asking R to estimate the model *excluding* zero-inflation (which is also the default). So, to model the zero-inflation, you must set this argument equal to `~1`.\n\nWe could look at all the diagnostic plots (and in a real analysis situation, you would), but let's focus on the posterior predictive check.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(glmmzip_arabidopsis, residual_type = \"normal\", check = \"pp_check\")\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n:::\n\n:::\n\nIt's doing a much, much better job now of estimating those zeroes (top left of the plot). However, it's suffering from similar problems to our original Poisson model in the range around 1-15.\n\nPerhaps a zero-inflated negative binomial model might do the trick for the `Arabidopsis` dataset? 
We can fit that in `glmmTMB` by updating the `family` argument.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nglmmzinb_arabidopsis <- glmmTMB(total.fruits ~ nutrient + rack + status + amd + reg + \n (1|popu) + (1|gen), data = Arabidopsis,\n family = \"nbinom2\", ziformula = ~1)\n\ncheck_model(glmmzinb_arabidopsis, residual_type = \"normal\", check = \"pp_check\")\n```\n\n::: {.cell-output-display}\n![](generalised-mixed-models_files/figure-html/unnamed-chunk-14-1.png){width=672}\n:::\n:::\n\n:::\n\nNot perfect - but perhaps better?\n\n### Even more packages\n\nEven `glmmTMB` is not the end of the road. There are others one could use, including packages such as `brms` and `GLMMadaptive`, or the `glmmPQL` function from `MASS`, and you may see these cropping up in online tutorials or even papers.\n\nFor a detailed list of packages, [this resource](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#which-r-packages-functions-fit-glmms) from Bolker et al. is a great starting point.\n\nA note of caution: not all packages will implement exactly the same computational methods \"under the hood\" as `lme4`, because fitting and assessing mixed effects models, especially non-linear and generalised ones, is difficult to do and therefore is still an area of active research and discussion in statistics. \n\nSo, if you notice that you get different estimates and numbers when fitting models in different packages, don't panic. What matters more than anything is the conclusion you draw from your data overall, and how confident you are in that conclusion.\n\nFor those of you with an interest in the computational side of things, you might find resources such as [this blog post](https://rpubs.com/kaz_yos/glmm1) to be a useful starting place.\n\n## Summary\n\nLinear mixed effects models can be generalised in the same way that standard linear models are: by wrapping the linear equation inside a non-linear link function. The link function is chosen based on the distribution of the response variable.\n\nAlternatively, you might prefer to think of it the other way around: that GLMs can be extended to cope with non-independence by adding random effects to them. In either case, the result is the same. Both random effects and link functions can be used simultaneously, to cope with the (quite common!) 
situation where a dataset is both hierarchical and has a non-continuous response variable.\n\n::: {.callout-tip}\n#### Key points\n- By including both a link function to linearise the model, and random effects, we can fit generalised linear mixed effects models in R\n- We can do this by using the `glmer` or `glmer.nb` functions from `lme4` for most of the \"common\" GLMMs\n- Other packages such as `glmmTMB` are needed for zero-inflated models and other extensions\n- Evaluating and assessing GLMMs can be done using the same methods as for standard GLMs/linear mixed effects models\n:::\n\n", "supporting": [ "generalised-mixed-models_files" ], diff --git a/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-10-1.png b/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-10-1.png index 3ee760b..f5046eb 100644 Binary files a/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-10-1.png and b/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-10-1.png differ diff --git a/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-13-1.png b/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-13-1.png index 0f18742..8f4525a 100644 Binary files a/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-13-1.png and b/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-13-1.png differ diff --git a/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-14-1.png b/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-14-1.png index 5a823b4..c537f1a 100644 Binary files a/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-14-1.png and b/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-14-1.png differ diff --git a/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-5-1.png b/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-5-1.png index 3675858..dfda467 100644 Binary files a/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-5-1.png and b/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-5-1.png differ diff --git a/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-7-1.png b/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-7-1.png index 0650a4c..726f3e9 100644 Binary files a/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-7-1.png and b/_freeze/materials/generalised-mixed-models/figure-html/unnamed-chunk-7-1.png differ diff --git a/_freeze/materials/nested-random-effects/execute-results/html.json b/_freeze/materials/nested-random-effects/execute-results/html.json index 029a2e5..e5f82f1 100644 --- a/_freeze/materials/nested-random-effects/execute-results/html.json +++ b/_freeze/materials/nested-random-effects/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "741c01046942d153f2bec0d4c0299b74", + "hash": "476ac352878f5df7819b8ff73cf36db4", "result": { - "markdown": "---\ntitle: \"Nested random effects\"\noutput: html_document\n---\n\n::: {.cell}\n\n:::\n\n\nMixed effects models are also sometimes referred to as \"hierarchical\" or \"multi-level\" models. So far in these materials, we've only fitted two-level models, containing a single clustering variable or effect. 
Sometimes, however, there are random effects nested *inside* others.\n\n## What is a nested random effect?\n\nOnce we are at the stage of having multiple variables in our sample that create clusters or groups, it becomes relevant to consider the relationship that those clustering variables have to one another, to ensure that we're fitting a model that properly represents our experimental design.\n\nWe describe factor B as being nested inside factor A, if each group/category of B only occurs within one group/category of factor A.\n\nFor instance, data on academic performance may be structured as children grouped within classrooms, with classrooms grouped within schools. A histology experiment might measure individual cells grouped within slices, with slices grouped within larger samples. Air pollution data might be measured at observation stations grouped within a particular city, with multiple cities per country.\n\n## Fitting a three-level model\n\nAnother classic example of nested random effects that would prompt a three-level model can be found in a clinical setting: within each hospital, there are multiple doctors, each of whom treats multiple patients. (Here, we will assume that each doctor only works at a single hospital, and that each patient is only treated by a single doctor.)\n\nHere's an image of how that experimental design looks. Level 1 is the patients, level 2 is the doctors, and level 3 is the hospitals. \n\nThis is, of course, a simplified version - we would hope that there are more than two hospitals, four doctors and eight patients in the full sample!\n\n![Experimental design](images_mixed-effects/nested-patients1.png){width=70%}\n\nWe have a single fixed predictor of interest `treatment` (for which there are two possible treatments, A or B), and some continuous response variable `outcome`.\n\nWhat model would we fit to these data? Well, it gets a touch more complex now that we have multiple levels in this dataset.\n\n### A three-level random intercepts model\n\nLet's put random slopes to one side, since they take a bit more thought, and think about how we would fit just some random intercepts for now.\n\nIt would be appropriate to fit two sets of random intercepts in this model, one for each set of clusters we have. In this case, that means a set of intercepts for the doctors, and a set of intercepts for the hospital.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nhealth <- read_csv(\"data/health.csv\")\n```\n:::\n\n:::\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_health_intercepts <- lmer(outcome ~ treatment + (1|doctor) + (1|hospital),\n data = health)\n\nsummary(lme_health_intercepts)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: outcome ~ treatment + (1 | doctor) + (1 | hospital)\n Data: health\n\nREML criterion at convergence: 1848.5\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.56989 -0.66086 0.06162 0.67602 2.84690 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n doctor (Intercept) 2.1416 1.4634 \n hospital (Intercept) 0.1688 0.4108 \n Residual 26.3425 5.1325 \nNumber of obs: 300, groups: doctor, 30; hospital, 5\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 26.6155 0.5299 8.4431 50.23 9.34e-12 ***\ntreatmentsurgery 6.2396 0.5926 269.0000 10.53 < 2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\ntrtmntsrgry -0.559\n```\n:::\n:::\n\n:::\n\nThis produces a model with two random effects, namely, two sets of random intercepts.\n\n### Where to include random slopes?\n\nDeciding what level(s) we want to fit random slopes at, requires us to think about what level of our hierarchy we've applied our `Treatment` variable at. We'll get to that in a moment.\n\n#### Predictor varies at level 1\n\nLet's start by imagining the following scenario: the `Treatment` variable is varying at our lowest level. Each patient receives only one type of treatment (A or B), but both treatment types are represented \"within\" each doctor and within each hospital:\n\n![Scenario 1: predictor varies at level 1 (between patients, within doctors)](images_mixed-effects/nested-patients2.png){width=70%}\n\nAs a result, it would be inappropriate to ask `lme4` to fit random slopes for the `treatment` variable at the patient level. Instead, the \"full\" model (i.e., a model containing all of the possible fixed and random effects) would be the following:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_health_slopes <- lmer(outcome ~ treatment + (1 + treatment|doctor) + \n (1 + treatment|hospital), data = health)\n\nsummary(lme_health_slopes)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: outcome ~ treatment + (1 + treatment | doctor) + (1 + treatment | \n hospital)\n Data: health\n\nREML criterion at convergence: 1834.1\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.38341 -0.60702 -0.02763 0.71956 2.88951 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr \n doctor (Intercept) 7.8846 2.8080 \n treatmentsurgery 7.3157 2.7047 -0.96\n hospital (Intercept) 0.5085 0.7131 \n treatmentsurgery 3.5595 1.8867 -0.91\n Residual 23.5775 4.8557 \nNumber of obs: 300, groups: doctor, 30; hospital, 5\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 26.6155 0.7223 4.0083 36.849 3.17e-06 ***\ntreatmentsurgery 6.2396 1.1270 3.9992 5.537 0.00521 ** \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\ntrtmntsrgry -0.794\n```\n:::\n:::\n\n:::\n\nThis produces a model with four sets of random effects: two sets of random intercepts, and two sets of random slopes.\n\n#### Predictor varies at level 2\n\nLet's now imagine a (perhaps more realistic) scenario. Each doctor is in fact a specialist in a certain type of treatment, but cannot deliver both. For this, we will need to read in the second version of our dataset.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nhealth2 <- read_csv(\"data/health2.csv\")\n```\n:::\n\n:::\n\nIf you look closely at the dataset, you can see that `treatment` does not vary within `doctor`; instead, it only varies within `hospital`. \n\n![Scenario 2: predictor varies at level 2 (between doctors, within hospitals)](images_mixed-effects/nested-patients3.png){width=70%}\n\nThis means we cannot fit random slopes for treatment at the second level any more. 
We have to drop our random slopes for `treatment` by `doctor`, like this:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_health_slopes2 <- lmer(outcome ~ treatment + (1|doctor) + \n (1 + treatment|hospital), data = health2)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nboundary (singular) fit: see help('isSingular')\n```\n:::\n\n```{.r .cell-code}\nsummary(lme_health_slopes2)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: outcome ~ treatment + (1 | doctor) + (1 + treatment | hospital)\n Data: health2\n\nREML criterion at convergence: 1845.7\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.46342 -0.69900 -0.05043 0.69559 3.09601 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr \n doctor (Intercept) 6.318 2.514 \n hospital (Intercept) 1.372 1.171 \n treatmentsurgery 3.635 1.907 -1.00\n Residual 24.417 4.941 \nNumber of obs: 300, groups: doctor, 30; hospital, 5\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 25.7019 0.9265 4.3745 27.740 4.38e-06 ***\ntreatmentsurgery 4.5061 1.3766 4.0456 3.273 0.0302 * \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\ntrtmntsrgry -0.808\noptimizer (nloptwrap) convergence code: 0 (OK)\nboundary (singular) fit: see help('isSingular')\n```\n:::\n:::\n\n:::\n\nThis gives us three sets of random effects, as opposed to four.\n\n#### Predictor varies at level 3\n\nFinally, let's imagine a scenario where each hospital is only equipped to offer one type of treatment (hopefully, not a realistic scenario!). Here, all doctors and patients within each hospital use exactly the same method. \n\n![Scenario 3: predictor varies at level 3 (between hospitals)](images_mixed-effects/nested-patients4.png){width=70%}\n\nAt this stage, we can no longer include random slopes for the treatment predictor anywhere in our model. Each `hospital`, `doctor` and `patient` only experiences one of the two treatments, not both, so we have no variation between the treatments to estimate at any of these levels.\n\nSo, here, we would go back to our random intercepts only model.\n\n## Implicit vs explicit nesting\n\nThe different `health` datasets that have been explored above all have an important thing in common: the variables have been **implicitly nested**. Each new hospital, doctor and patient is given a unique identifier, to make it clear that doctors do not reoccur between hospitals, and patients do not reoccur between doctors. \n\nIn other words, all the information about the nesting is captured implicitly in the way that the data are coded.\n\nHowever, you might sometimes be working with a dataset that has not been coded this way. So how do you deal with those situations? 
You have a few options:\n\n- Recode your dataset so it is implicitly nested\n- Use explicit nesting in your `lme4` model formula\n- Use the `A/B` syntax in your `lme4` model formula\n\n### The Pastes dataset\n\nWe'll use another internal `lme4` dataset, the `Pastes` dataset, to show you what these three options look like in practice.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"Pastes\")\n\nhead(Pastes)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n strength batch cask sample\n1 62.8 A a A:a\n2 62.6 A a A:a\n3 60.1 A b A:b\n4 62.3 A b A:b\n5 62.7 A c A:c\n6 63.1 A c A:c\n```\n:::\n:::\n\n:::\n\nThis dataset is about measuring the strength of a chemical paste product, which is delivered in batches, each batch consisting of several casks. From ten random deliveries of the product, three casks were chosen at random (for a total of 30 casks). A sample was taken from each cask; from each sample, there were two assays, for a total of 60 assays.\n\nThere are four variables: \n\n- `strength`, paste strength, a continuous response variable; measured for each assay\n- `batch`, delivery batch from which the sample was chosen (10 groups, A to J)\n- `cask`, cask within the deliver batch from which the sample was chosen (3 groups, a to c)\n- `sample`, batch & cask combination (30 groups, A:a to J:c)\n\nThe experimental design, when drawn out, looks like this:\n\n![Pastes dataset design](images_mixed-effects/pastes_design.png){width=70%}\n\nAt first glance, this might look as if it's a four-level model: assays within samples within casks within deliveries. However, that's a bit of a overcomplication. There is only one `sample` collected per `cask`, meaning that we can really just think about assays being nested within casks directly.\n\nThere is no fixed predictor in this dataset, only a response variable. This means we won't include any fixed effects in the model - instead, we simply write `1` for the fixed portion of our model. We also won't have any random slopes, only random intercepts. \n\nIf we follow the same procedure we did above for the `health` example, we might try something like this:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_paste <- lmer(strength ~ 1 + (1|batch) + (1|cask), data = Pastes)\n\nsummary(lme_paste)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: strength ~ 1 + (1 | batch) + (1 | cask)\n Data: Pastes\n\nREML criterion at convergence: 301.5\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.49025 -0.90096 -0.01247 0.62911 1.82246 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n batch (Intercept) 3.3639 1.8341 \n cask (Intercept) 0.1487 0.3856 \n Residual 7.3060 2.7030 \nNumber of obs: 60, groups: batch, 10; cask, 3\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 60.0533 0.7125 6.7290 84.28 1.99e-11 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nSomething is wrong with this model. To spot it, we have to look carefully at the bottom of the random effects section, where it says `Number of obs` (short for observations). \n\n`lme4` has correctly identified that there are 10 delivery batches, and has fitted a set of 10 random intercepts for those batches - all good so far. 
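\n\nBefore reading on, it's worth counting the distinct labels in each clustering variable; the quick sketch below (assuming the `tidyverse` is loaded) points at where the problem lies.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# how many distinct labels does each clustering variable contain?\nPastes %>%\n  summarise(n_batches = n_distinct(batch),\n            n_cask_labels = n_distinct(cask))\n```\n:::\n\n:::\n\n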
However, R believes that we only have 3 casks, because the `cask` variable is implicitly nested, and so has only fitted a set of 3 random intercepts for that variable. \n\nBut this isn't what we want. There is no link between cask A in batch A, and cask A in batch D - they have no reason to be more similar to each other than they are to other casks. We actually have 30 unique casks, and would like for each of them to have its own random intercept.\n\n### Recoding for implicit nesting\n\nAs shown above, the formula `strength ~ 1 + (1|batch) + (1|cask)` does not produce the model we want, because we don't have implicit coding in the `cask` variable.\n\nSo, let's create a new variable that gives unique values to each of the casks in our dataset. We'll do this using the `mutate` function and the `:` syntax, which you might recognise from generating interaction terms in standard linear models.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nPastes <- Pastes %>% mutate(unique_cask = batch:cask)\n\nhead(Pastes)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n strength batch cask sample unique_cask\n1 62.8 A a A:a A:a\n2 62.6 A a A:a A:a\n3 60.1 A b A:b A:b\n4 62.3 A b A:b A:b\n5 62.7 A c A:c A:c\n6 63.1 A c A:c A:c\n```\n:::\n:::\n\n:::\n\nThis generates 30 unique IDs, one for each of our unique casks. (We then have two observations of `strength` for each `unique_cask`.)\n\nNow, we can go ahead and fit our desired model:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_paste_implicit <- lmer(strength ~ 1 + (1|batch) + (1|unique_cask),\n data = Pastes)\n\nsummary(lme_paste_implicit)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: strength ~ 1 + (1 | batch) + (1 | unique_cask)\n Data: Pastes\n\nREML criterion at convergence: 247\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.4798 -0.5156 0.0095 0.4720 1.3897 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n unique_cask (Intercept) 8.434 2.9041 \n batch (Intercept) 1.657 1.2874 \n Residual 0.678 0.8234 \nNumber of obs: 60, groups: unique_cask, 30; batch, 10\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 60.0533 0.6769 9.0000 88.72 1.49e-14 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nNo error message this time, and it has correctly identified that there are 30 unique casks, from 10 different batches. We've solved the problem!\n\nIncidentally, and which you may have already noticed, the recoding that we did above also perfectly replicates the existing `sample` variable. This means we would get an identical result if we fitted the model `strength ~ 1 + (1|batch) + (1|sample)` instead. 
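\n\nIf you want to convince yourself of that, a one-line check does the job; this is just a sketch comparing our recoded `unique_cask` variable against the existing `sample` column.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# TRUE if every recoded cask ID matches the corresponding sample label\nall(as.character(Pastes$unique_cask) == as.character(Pastes$sample))\n```\n:::\n\n:::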
\n\n### Fitting a model with explicit nesting\n\nIf we're in a situation like the above where we don't have nice, neat implicitly coded variables, but we don't really want to spend loads of time recoding a bunch of variables, we can instead fit our model using explicit nesting in `lme4`.\n\nThat essentially means combining the recoding and model fitting steps, so that you don't have to save a new variable.\n\nFor the `Pastes` dataset, it would look like this:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_paste_explicit <- lmer(strength ~ 1 + (1|batch) + (1|batch:cask), data = Pastes)\n\nsummary(lme_paste_explicit)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: strength ~ 1 + (1 | batch) + (1 | batch:cask)\n Data: Pastes\n\nREML criterion at convergence: 247\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.4798 -0.5156 0.0095 0.4720 1.3897 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n batch:cask (Intercept) 8.434 2.9041 \n batch (Intercept) 1.657 1.2874 \n Residual 0.678 0.8234 \nNumber of obs: 60, groups: batch:cask, 30; batch, 10\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 60.0533 0.6769 9.0000 88.72 1.49e-14 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nNotice that the output for this model and for the one right above are the same - it's because they are exactly equivalent to each other! We used `batch:cask` to create `unique_cask` earlier, so we've just directly inserted that code into our formula.\n\n### An alternative approach: the A/B syntax\n\nThere is another way to deal with nested random effects that haven't been implicitly coded into the dataset. It's not necessarily the way that we would recommend - recoding your variables and/or using explicit nesting has far less potential to trip you up and go wrong - but we'll introduce it briefly here, since it's something you're likely to see if you start working with mixed models a lot.\n\nWe could fit the same model to the `Pastes` dataset, and achieve a set of 30 intercepts for `cask` and 10 for `batch`, without making use of the `sample` variable or using the `A:B` notation.\n\nIt works like this: when random effect B is nested inside random effect A, you can simply write `A/B` on the right hand side of the `|`.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_paste_shorthand <- lmer(strength ~ 1 + (1|batch/cask), data = Pastes)\n\nsummary(lme_paste_shorthand)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: strength ~ 1 + (1 | batch/cask)\n Data: Pastes\n\nREML criterion at convergence: 247\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.4798 -0.5156 0.0095 0.4720 1.3897 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n cask:batch (Intercept) 8.434 2.9041 \n batch (Intercept) 1.657 1.2874 \n Residual 0.678 0.8234 \nNumber of obs: 60, groups: cask:batch, 30; batch, 10\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 60.0533 0.6769 9.0000 88.72 1.49e-14 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThis gives you an identical model output to both of the other two models we've tried. 
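\n\nYou can verify this equivalence directly from the fitted objects, for example by comparing their log-likelihoods; a small sketch using the three models fitted above is shown here, and all three values should be identical.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# the three fits should report exactly the same (REML) log-likelihood\nlogLik(lme_paste_implicit)\nlogLik(lme_paste_explicit)\nlogLik(lme_paste_shorthand)\n```\n:::\n\n:::\n\n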
That's because `(1|batch/cask)` is actually shorthand for `(1|batch) + (1|batch:cask)`, and in this dataset as we've seen above, `batch:cask` is the same as `sample` and `unique_cask`. In other words, we've fitted exactly the same model each time.\n\n### Which option is best?\n\n::: {.callout-note appearance=\"minimal\"}\nThe `A/B` syntax is popular, and you will see it used, because it's quick to write. But there are a couple of reasons that we would recommend you steer away from it:\n\n- **It has more potential to go wrong.** `A/B` is order-dependent, meaning that `A/B` $\\neq$ `B/A`, and if you get that wrong, your model won't work as intended. In contrast, explicit coding is order-invariant (`A:B + A` = `A + B:A`). Likewise, if you've implicitly coded your dataset, you can write your random effects in any order.\n\n- **It's harder to interpret.** Using implicit or explicit nesting gives you a separate `(1 + x|y)` structure in the formula for each clustering variable, which is more transparent when figuring out how many random effects you've fitted. Separate structures can also be more flexible, as you'll see in at least one example later in the course.\n:::\n\nOverall: implicit coding of your dataset is best, and it's best to do this implicit coding during data collection itself. It prevents mistakes, because the experimental design is clear from the data structure. It's also the easiest and most flexible in terms of coding in `lme4`.\n\nIn the absence of an implicitly coded dataset, we strongly recommend sticking with explicit coding rather than the shorthand syntax - yes, it requires more typing, but it's somewhat safer.\n\nAnd, no matter which method you choose, always check the model output to see that the number of groups per clustering variables matches what you expect to see.\n\n## Exercises\n\n### Exercise 1 - Cake\n\n\n{{< level 2 >}}\n\n\n\nFor this exercise, we'll use the most delicious of the internal `lme4` datasets: `cake`.\n\nThis is a real dataset, taken from a thesis by Cook ([1983](https://www.sumsar.net/blog/source-of-the-cake-dataset/cook_1938_chocolate_cake.pdf)), and is all about measuring cake quality of chocolate cakes baked using different recipes at particular temperatures.\n\nCook's experiment worked as follows: for each recipe, she prepared 15 batches of cake mixture. Each batch was divided into 6 cakes, and each of those cakes were baked at one of the 6 temperatures.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"cake\")\n```\n:::\n\n:::\n\nThere are five variables in this dataset:\n\n- `recipe`, the exact recipe used for the cake (3 categories)\n- `replicate`, the batter batch number for each recipe (15 replicates per recipe)\n- `temperature`, a factor indicating which of 6 temperatures the cake was baked at\n- `temp`, the numeric value of the baking temperature\n- `angle`, the angle at which the cake broke (used as a measure of cake \"tenderness\")\n\nFor this exercise:\n\n1. Sketch a graphic/diagram that captures the experimental design\n2. Figure out what level of the dataset your variables of interest are varying at\n3. Consider how you might recode the dataset to reflect implicit nesting\n4. Fit and test at least one appropriate model\n\n::: {.callout-note collapse=\"true\"}\n#### Worked answer\n\n#### Consider the experimental design\n\nThe first thing to do is to draw out a diagram that captures the experimental design. 
That might look something like this:\n\n![Experimental design for Cook's cakes](images_mixed-effects/cake_design.png){width=70%}\n\nWe've got a 3-level dataset here: individual cakes within batches (replicates) within recipes. There are two fixed effects of interest - `recipe` and `temperature`, both categorical.\n\nThe next thing to think about is whether the coding within the dataset accurately reflects what's going on.\n\nThere are 3 recipes, and 15 replicates of each recipe are mixed, for a total of 45 unique batches or mixtures. In the dataset as we've got it, the numbering has been repeated, so the nesting is not implicitly coded; but of course, replicate 1 for recipe A doesn't have anything more in common with replicate 1 for recipe C.\n\n#### Recode the dataset\n\nSo, let's code up a new variable that captures unique replicates, and we'll call it `batch`:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncake <- cake %>%\n mutate(batch = recipe:replicate)\n```\n:::\n\n:::\n\nAs you should be able to see in your global environment, the new `batch` variable is a factor with 45 levels. \n\nEach batch is then split into individual cakes, which undergo one of the 6 `temperature` treatments, for a total of 270 measurements.\n\n#### Fit a model\n\nNow, we can try fitting a model. We know that we want `recipe` and `temperature` as fixed effects, and probably also their interaction, at least to start with. We know we want to treat `batch` as a random effect (`replicate` nested within `recipe`), so we'll include random intercepts.\n\nWe don't, however, want to treat `recipe` as a random effect itself. It only has three levels, so it wouldn't work well even if we wanted to. Plus, we're specifically interested in those three levels and the differences between them.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_cake <- lmer(angle ~ recipe*temperature + (1|batch), data = cake)\n\nsummary(lme_cake)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: angle ~ recipe * temperature + (1 | batch)\n Data: cake\n\nREML criterion at convergence: 1638.6\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.64661 -0.61082 -0.05207 0.56985 2.75374 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n batch (Intercept) 41.84 6.468 \n Residual 20.47 4.524 \nNumber of obs: 270, groups: batch, 45\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 33.12222 1.73683 42.00000 19.070 < 2e-16 ***\nrecipeB -1.47778 2.45625 42.00000 -0.602 0.55065 \nrecipeC -1.52222 2.45625 42.00000 -0.620 0.53878 \ntemperature.L 6.43033 1.16822 210.00000 5.504 1.07e-07 ***\ntemperature.Q -0.71285 1.16822 210.00000 -0.610 0.54239 \ntemperature.C -2.32551 1.16822 210.00000 -1.991 0.04782 * \ntemperature^4 -3.35128 1.16822 210.00000 -2.869 0.00454 ** \ntemperature^5 -0.15119 1.16822 210.00000 -0.129 0.89715 \nrecipeB:temperature.L 0.45419 1.65211 210.00000 0.275 0.78365 \nrecipeC:temperature.L 0.08765 1.65211 210.00000 0.053 0.95774 \nrecipeB:temperature.Q -0.23277 1.65211 210.00000 -0.141 0.88809 \nrecipeC:temperature.Q 1.21475 1.65211 210.00000 0.735 0.46299 \nrecipeB:temperature.C 2.69322 1.65211 210.00000 1.630 0.10456 \nrecipeC:temperature.C 2.63856 1.65211 210.00000 1.597 0.11175 \nrecipeB:temperature^4 3.02372 1.65211 210.00000 1.830 0.06863 . \nrecipeC:temperature^4 3.13711 1.65211 210.00000 1.899 0.05895 . 
\nrecipeB:temperature^5 -0.66354 1.65211 210.00000 -0.402 0.68836 \nrecipeC:temperature^5 -1.62525 1.65211 210.00000 -0.984 0.32637 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n\nCorrelation matrix not shown by default, as p = 18 > 12.\nUse print(x, correlation=TRUE) or\n vcov(x) if you need it\n```\n:::\n:::\n\n:::\n\n#### Alternative models\n\nIf you want to do a bit of significance testing, you can try a few other versions of the model with different structures:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_cake2 <- lmer(angle ~ recipe + temperature + (1|batch), data = cake)\nlme_cake3 <- lmer(angle ~ recipe + (1|batch), data = cake)\nlme_cake4 <- lmer(angle ~ temperature + (1|batch), data = cake)\nlm_cake <- lm(angle ~ recipe*temperature, data = cake)\n\nanova(lme_cake, lm_cake) # random effects dropped\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: cake\nModels:\nlm_cake: angle ~ recipe * temperature\nlme_cake: angle ~ recipe * temperature + (1 | batch)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) \nlm_cake 19 1901.3 1969.6 -931.63 1863.3 \nlme_cake 20 1719.0 1791.0 -839.53 1679.0 184.21 1 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n\n```{.r .cell-code}\nanova(lme_cake, lme_cake2) # recipe:temperature dropped\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: cake\nModels:\nlme_cake2: angle ~ recipe + temperature + (1 | batch)\nlme_cake: angle ~ recipe * temperature + (1 | batch)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_cake2 10 1709.6 1745.6 -844.79 1689.6 \nlme_cake 20 1719.0 1791.0 -839.53 1679.0 10.53 10 0.3953\n```\n:::\n\n```{.r .cell-code}\nanova(lme_cake, lme_cake3) # temperature & recipe:temperature dropped\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: cake\nModels:\nlme_cake3: angle ~ recipe + (1 | batch)\nlme_cake: angle ~ recipe * temperature + (1 | batch)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) \nlme_cake3 5 1785.7 1803.7 -887.84 1775.7 \nlme_cake 20 1719.0 1791.0 -839.53 1679.0 96.636 15 5.642e-14 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n\n```{.r .cell-code}\nanova(lme_cake, lme_cake4) # recipe & recipe:temperature dropped\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: cake\nModels:\nlme_cake4: angle ~ temperature + (1 | batch)\nlme_cake: angle ~ recipe * temperature + (1 | batch)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_cake4 8 1706.1 1734.9 -845.06 1690.1 \nlme_cake 20 1719.0 1791.0 -839.53 1679.0 11.06 12 0.5238\n```\n:::\n:::\n\n:::\n\nWe see that when we drop the random intercepts for `batch`, and when we drop the `temperature` predictor, our chi-square values are significant. This indicates that these predictors are important. 
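\n\nIf you want to isolate the main effect of `recipe` on its own, one extra comparison you could run is the additive model against the temperature-only model; this is just a sketch, and we haven't shown its output here.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# temperature-only model vs recipe + temperature model:\n# tests the main effect of recipe, given temperature\nanova(lme_cake4, lme_cake2)\n```\n:::\n\n:::\n\n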
But we can drop `recipe` and `recipe:temperature` without a particularly big change in deviance.\n\n(This is borne out somewhat if you've used the `lmerTest` package to perform degrees of freedom approximation and extract p-values as part of the `lme_cake` model summary.)\n\nSo, our final model is probably `lme_cake4 = angle ~ temperature + (1|batch)`. \n\n#### Check assumptions\n\nFor completeness, we'll check the assumptions of `lme_cake4` and visualise it for the sake of aiding our interpretation.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_cake4, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"outliers\"))\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-17-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_model(lme_cake4, \n check = c(\"reqq\", \"pp_check\"))\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-17-2.png){width=672}\n:::\n:::\n\n:::\n\nMost of the assumptions look okay, with the exception of the normal Q-Q plot for the random intercepts. The set of intercepts doesn't really look like it's nicely normally distributed here. Maybe a more complicated mixed effects model (something beyond the linear type we're going with here) would help. Or, maybe this just means we should be a little less decisive in our overall conclusions.\n\n#### Visualise the model\n\nLast but not least, we can visualise the model:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_cake4), aes(x = temperature, y = angle)) +\n geom_point(alpha = 0.7) +\n geom_line(aes(y = .fitted, group = batch), alpha = 0.5)\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-18-1.png){width=672}\n:::\n:::\n\n:::\n\nOverall, `angle` increases with `temperature`. (From what I understand of reading the thesis, this is a good thing from the perspective of the cake quality, as it suggests the cake is more tender. Scientific *and* delicious.)\n\nWe can see visually that the `recipe` and `recipe:temperature` terms don't have much explanatory power by visualising the full model (commenting out the `facet_wrap` may also help you to see this):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_cake), aes(x = temperature, y = angle, colour = recipe)) +\n facet_wrap(~ recipe) +\n geom_point(alpha = 0.7) +\n geom_line(aes(y = .fitted, group = batch))\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-19-1.png){width=672}\n:::\n:::\n\n:::\n\n\n\n:::\n\n::: {.callout-tip appearance=\"minimal\"}\n#### Follow-up questions\n\n\n{{< level 2 >}}\n\n\n\nIf you want to think a bit harder about this dataset, consider these additional questions. Feel free to chat about them with a neighbour or with a trainer.\n\n- Why doesn't it work if you try to fit random slopes for `temperature` on `batch`? Have a look at the warning message that R gives you in this situation.\n- What happens if you use the numerical `temp` variable instead of the categorical `temperature`? Does it change your conclusions? Why might you prefer to use the numerical/continuous version?\n- Could `temperature` be treated as a random effect, under certain interpretations of the original research question? 
Is it possible or sensible to do that with the current dataset?\n:::\n\nFor more information on the very best way to bake a chocolate cake (and a lovely demonstration at the end about the dangers of extrapolating from a linear model), [this blog post](https://www.sumsar.net/blog/source-of-the-cake-dataset/) is a nice source. It's written by a data scientist who was so curious about the quirky `cake` dataset that he contacted Iowa State University, who helped him unearth Cook's original thesis.\n\n### Exercise 2 - Parallel fibres\n\n\n{{< level 2 >}}\n\n\n\nFor this exercise, we'll be using a neurohistology dataset that focuses on a particular type of neuron found in the cerebellum, known as a parallel fibre. Parallel fibres are found in the uppermost layer of cerebellar cortex, and are known for being long; this experiment was designed to test whether the depth at which the fibre was found, had a bearing on its length.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nparallel <- read_csv(\"data/parallel.csv\")\n```\n:::\n\n:::\n\nTo measure the length of the fibres (which are <10mm long - big for a neuron!), slices were taken from the cerebella of six cats, at different depths. The depth of each slice was recorded. These slices were then stained, so that individual parallel fibres could be identified and measured.\n\n![An example of a stained slice from cerebellar cortex](images_mixed-effects/cerebellum_histology.jpg){width=40%}\n\nThe dataset contains five variables:\n\n- `length` of the fibre (in micrometres)\n- `depth` of the slice (in micrometres)\n- `fibre`, individual IDs for all of the fibres\n- `slice` ID number (maximum 10 slices per cat)\n- `cat` ID number (1 through 6)\n\nFor this exercise:\n\n1. Sketch a graphic/diagram that captures the experimental design\n2. Determine whether the dataset requires recoding or explicit nesting\n3. Fit and test at least one appropriate model\n\n::: {.callout-note collapse=\"true\"}\n#### Worked answer\n\n#### Visualise the design\n\nThis is a nested design with three levels: `fibre` within `slice` within `cat`. But the fixed predictor `depth` varies at level 2, between slices (not between fibres).\n\n![Experimental design](images_mixed-effects/neurohist_design.png){width=70%}\n\n#### Recoding\n\nIf we look at the structure of the dataset, we can see that the numbering for the `slice` variable starts again for each `cat` at 1, which is not what we want.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nparallel %>% slice(1:8, 46)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 9 × 5\n fibre length depth slice cat\n \n1 1 5780 260 1 1\n2 2 5730 260 1 1\n3 3 5790 260 1 1\n4 4 5860 260 1 1\n5 5 5690 260 1 1\n6 6 5940 260 1 1\n7 7 5950 260 1 1\n8 8 5940 290 2 1\n9 46 4670 300 1 2\n```\n:::\n:::\n\n:::\n\nSo, we need to recode a new variable, or be prepared to use explicit nesting in our model formula. 
Note that for this recoding to work, we also need to ask R to treat `slice` and `cat` as factors rather than numeric variables.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nparallel <- parallel %>%\n mutate(cat = as.factor(cat)) %>%\n mutate(slice = as.factor(slice)) %>%\n mutate(unique_slice = slice:cat)\n```\n:::\n\n:::\n\n#### Fit a model\n\nThe full model that we could fit to these data contains three random effects: random intercepts for `unique_slice` (or `slice:cat` if you're explicitly coding), random intercepts for `cat`, and random slopes for `depth` on `cat`.\n\n(Since `depth` doesn't vary within `slice`, i.e., each `slice` has only one `depth`, we can't fit random slopes at level 2.)\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_parallel <- lmer(length ~ depth + (1|slice:cat) + (1 + depth|cat), \n data = parallel)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning in checkConv(attr(opt, \"derivs\"), opt$par, ctrl = control$checkConv, :\nModel failed to converge with max|grad| = 1.2981 (tol = 0.002, component 1)\n```\n:::\n:::\n\n:::\n\nYou may notice that you get an error - the model fails to converge. There are a couple of fixes we could try that involve tweaking the settings in the estimation procedure (e.g., increasing the maximum number of iterations allowed).\n\nHowever, most errors like this just mean that we're being too ambitious. So, the approach we'll take here is to make the model simpler. \n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_parallel_int <- lmer(length ~ depth + (1|slice:cat) + (1|cat), \n data = parallel)\n```\n:::\n\n:::\n\n#### Check the assumptions\n\nNext, we check the assumptions of our intercepts-only nested model:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_parallel_int, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"outliers\"))\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-25-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_model(lme_parallel_int, \n check = c(\"reqq\", \"pp_check\"))\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-25-2.png){width=672}\n:::\n:::\n\n:::\n\nNot bad at all. There are no obvious errors cropping up in these plots.\n\n#### Visualise the model\n\nLast but not least, we should have a look at our model predictions visually.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_parallel_int), aes(x = depth, y = length, colour = cat)) +\n geom_point(alpha = 0.6) +\n geom_line(aes(y = .fitted, group = cat))\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-26-1.png){width=672}\n:::\n:::\n\n:::\n\nDespite the fact that `depth` is a continuous variable, this plot still has jagged, rather than straight, lines of best fit. 
This is because the plot is also taking into account the multiple sets of random intercepts for `slice` that are contained within each `cat` cluster.\n\nWe can, however, extract just the set of intercepts by `cat`, and with a bit more fuss, use this to add lines to the plot with `geom_abline`:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# use the coef function to extract the coefficients\ncat_coefs <- coef(lme_parallel_int)$cat\n\n# use geom_abline to add individual lines for each cat\nggplot(augment(lme_parallel_int), aes(x = depth, y = length, colour = cat)) +\n geom_point(alpha = 0.6) +\n geom_abline(intercept = cat_coefs[1,1], slope = cat_coefs[1,2]) +\n geom_abline(intercept = cat_coefs[2,1], slope = cat_coefs[2,2]) +\n geom_abline(intercept = cat_coefs[3,1], slope = cat_coefs[3,2]) +\n geom_abline(intercept = cat_coefs[4,1], slope = cat_coefs[4,2]) +\n geom_abline(intercept = cat_coefs[5,1], slope = cat_coefs[5,2]) +\n geom_abline(intercept = cat_coefs[6,1], slope = cat_coefs[6,2])\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-27-1.png){width=672}\n:::\n:::\n\n:::\n\n#### Is this a good model?\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(parallel, aes(x = depth, y = length, colour = slice)) +\n facet_wrap(~cat) +\n geom_point(alpha = 0.6)\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-28-1.png){width=672}\n:::\n:::\n\n:::\n\nIf we plot the data faceted by `cat`, it suggests that the relationship between `depth` and `length` varies between `cat`. But we were forced to drop the random slopes for `depth|cat` due to lack of model convergence.\n\nOur diagnostic plots look pretty good for our simpler, intercepts-only model, but the raw data indicate we might be missing something. Do you still trust the model? 
If no, what might a researcher do to improve this analysis?\n\n:::\n\n::: {.callout-tip appearance=\"minimal\"}\n#### Optional follow-up question: notation\n\n\n{{< level 3 >}}\n\n\n\nThink back to the brief introduction to linear mixed effects models notation given in section 5.5.1 of the course materials.\n\nWhat would the equation of a three level model fitted to the `parallel` dataset look like?\n\nHint: you'll need more subscript letters than you did for a two-level model!\n\n::: {.callout-note collapse=\"true\"}\n#### Answer: three-level intercepts-only\n\nE.g., `length ~ depth + (1|slice:cat) + (1|cat)`\n\nLevel 1:\n\n$$\ny_{ijk} = \\beta_{0jk} + \\beta_{1}x_{1ijk} + \\epsilon_{ijk}\n$$\n\nLevel 2: \n\n$$\n\\beta_{0jk} = \\delta_{00k} + U_{0jk}\n$$\n\nLevel 3: \n\n$$\n\\delta_{00k} = \\gamma_{000} + V_{00k}\n$$\n\nwhere,\n\n$$\n\\left( \\begin{array}{c} U_{0jk} \\end{array} \\right) ∼ N \\left( \\begin{array}{c} 0 \\end{array} , \\begin{array}{cc} \\tau^2_{00} \\end{array} \\right)\n$$\n\nand,\n\n$$\n\\left( \\begin{array}{c} V_{00k} \\end{array} \\right) ∼ N \\left( \\begin{array}{c} 0 \\end{array} , \\begin{array}{cc} \\tau^2_{00} \\end{array} \\right)\n$$\n\n:::\n\n::: {.callout-note collapse=\"true\"}\n#### Answer: three-level intercepts & slopes\n\nE.g., `length ~ depth + (1|slice:cat) + (1 + depth|cat)`\n\nLevel 1:\n\n$$\ny_{ijk} = \\beta_{0jk} + \\beta_{1k}x_{1ijk} + \\epsilon_{ijk}\n$$\n\nLevel 2: \n\n$$\n\\beta_{0jk} = \\beta_{00k} + U_{0jk}\n$$\n\nLevel 3: \n\n$$\n\\beta_{00k} = \\gamma_{000} + V_{00k}\n$$\n$$\n\\beta_{1k} = \\gamma_{100} + V_{10k}\n$$\n\nWhere,\n\n$$\n\\left( \\begin{array}{c} U_{0jk} \\end{array} \\right) ∼ N \\left( \\begin{array}{c} 0 \\end{array} , \\begin{array}{cc} \\tau^2_{00} \\end{array} \\right)\n$$\n\nand,\n\n$$\n\\left( \\begin{array}{c} V_{00k} \\\\ V_{10k} \\end{array} \\right) ∼ N \\left( \\begin{array}{c} 0 \\\\ 0 \\end{array} , \\begin{array}{cc} \\tau^2_{00} & \\rho_{01} \\\\ \\rho_{01} & \\tau^2_{10} \\end{array} \\right)\n$$\n\n:::\n\n:::\n\n\n## Summary\n\nSometimes, a dataset contains multiple clustering variables. When one of those clustering variables is nested inside the other, we can model this effectively by estimating random effects at multiple levels. \n\nAdding additional levels can create some complications, e.g., determining which level of the dataset your predictor variables are varying at. But it can also allow us to deal with real-life hierarchical data structures, which are common in research.\n\n::: {.callout-tip}\n#### Key points\n- Random effect B is nested inside random effect A, if each category of B occurs uniquely within only one category of A\n- It's important to figure out what level of the hierarchy or model a predictor variable is varying at, to determine where random slopes are appropriate\n- Nested random effects can be implicitly or explicitly coded in a dataframe, which determines how the model should be specified in `lme4`\n:::\n\n", + "markdown": "---\ntitle: \"Nested random effects\"\noutput: html_document\n---\n\n::: {.cell}\n\n:::\n\n\nMixed effects models are also sometimes referred to as \"hierarchical\" or \"multi-level\" models. So far in these materials, we've only fitted two-level models, containing a single clustering variable or effect. 
Sometimes, however, there are random effects nested *inside* others.\n\n## What is a nested random effect?\n\nOnce we are at the stage of having multiple variables in our sample that create clusters or groups, it becomes relevant to consider the relationship that those clustering variables have to one another, to ensure that we're fitting a model that properly represents our experimental design.\n\nWe describe factor B as being nested inside factor A, if each group/category of B only occurs within one group/category of factor A.\n\nFor instance, data on academic performance may be structured as children grouped within classrooms, with classrooms grouped within schools. A histology experiment might measure individual cells grouped within slices, with slices grouped within larger samples. Air pollution data might be measured at observation stations grouped within a particular city, with multiple cities per country.\n\n## Fitting a three-level model\n\nAnother classic example of nested random effects that would prompt a three-level model can be found in a clinical setting: within each hospital, there are multiple doctors, each of whom treats multiple patients. (Here, we will assume that each doctor only works at a single hospital, and that each patient is only treated by a single doctor.)\n\nHere's an image of how that experimental design looks. Level 1 is the patients, level 2 is the doctors, and level 3 is the hospitals. \n\nThis is, of course, a simplified version - we would hope that there are more than two hospitals, four doctors and eight patients in the full sample!\n\n![Experimental design](images_mixed-effects/nested-patients1.png){width=70%}\n\nWe have a single fixed predictor of interest `treatment` (for which there are two possible treatments, A or B), and some continuous response variable `outcome`.\n\nWhat model would we fit to these data? Well, it gets a touch more complex now that we have multiple levels in this dataset.\n\n### A three-level random intercepts model\n\nLet's put random slopes to one side, since they take a bit more thought, and think about how we would fit just some random intercepts for now.\n\nIt would be appropriate to fit two sets of random intercepts in this model, one for each set of clusters we have. In this case, that means a set of intercepts for the doctors, and a set of intercepts for the hospital.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nhealth <- read_csv(\"data/health.csv\")\n```\n:::\n\n:::\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_health_intercepts <- lmer(outcome ~ treatment + (1|doctor) + (1|hospital),\n data = health)\n\nsummary(lme_health_intercepts)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: outcome ~ treatment + (1 | doctor) + (1 | hospital)\n Data: health\n\nREML criterion at convergence: 1848.5\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.56989 -0.66086 0.06162 0.67602 2.84690 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n doctor (Intercept) 2.1416 1.4634 \n hospital (Intercept) 0.1688 0.4108 \n Residual 26.3425 5.1325 \nNumber of obs: 300, groups: doctor, 30; hospital, 5\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 26.6155 0.5299 8.4431 50.23 9.34e-12 ***\ntreatmentsurgery 6.2396 0.5926 269.0000 10.53 < 2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\ntrtmntsrgry -0.559\n```\n:::\n:::\n\n:::\n\nThis produces a model with two random effects, namely, two sets of random intercepts.\n\n### Where to include random slopes?\n\nDeciding what level(s) we want to fit random slopes at, requires us to think about what level of our hierarchy we've applied our `Treatment` variable at. We'll get to that in a moment.\n\n#### Predictor varies at level 1\n\nLet's start by imagining the following scenario: the `Treatment` variable is varying at our lowest level. Each patient receives only one type of treatment (A or B), but both treatment types are represented \"within\" each doctor and within each hospital:\n\n![Scenario 1: predictor varies at level 1 (between patients, within doctors)](images_mixed-effects/nested-patients2.png){width=70%}\n\nAs a result, it would be inappropriate to ask `lme4` to fit random slopes for the `treatment` variable at the patient level. Instead, the \"full\" model (i.e., a model containing all of the possible fixed and random effects) would be the following:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_health_slopes <- lmer(outcome ~ treatment + (1 + treatment|doctor) + \n (1 + treatment|hospital), data = health)\n\nsummary(lme_health_slopes)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: outcome ~ treatment + (1 + treatment | doctor) + (1 + treatment | \n hospital)\n Data: health\n\nREML criterion at convergence: 1834.1\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.38341 -0.60702 -0.02763 0.71956 2.88951 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr \n doctor (Intercept) 7.8846 2.8080 \n treatmentsurgery 7.3157 2.7047 -0.96\n hospital (Intercept) 0.5085 0.7131 \n treatmentsurgery 3.5595 1.8867 -0.91\n Residual 23.5775 4.8557 \nNumber of obs: 300, groups: doctor, 30; hospital, 5\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 26.6155 0.7223 4.0083 36.849 3.17e-06 ***\ntreatmentsurgery 6.2396 1.1270 3.9992 5.537 0.00521 ** \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\ntrtmntsrgry -0.794\n```\n:::\n:::\n\n:::\n\nThis produces a model with four sets of random effects: two sets of random intercepts, and two sets of random slopes.\n\n#### Predictor varies at level 2\n\nLet's now imagine a (perhaps more realistic) scenario. Each doctor is in fact a specialist in a certain type of treatment, but cannot deliver both. For this, we will need to read in the second version of our dataset.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nhealth2 <- read_csv(\"data/health2.csv\")\n```\n:::\n\n:::\n\nIf you look closely at the dataset, you can see that `treatment` does not vary within `doctor`; instead, it only varies within `hospital`. \n\n![Scenario 2: predictor varies at level 2 (between doctors, within hospitals)](images_mixed-effects/nested-patients3.png){width=70%}\n\nThis means we cannot fit random slopes for treatment at the second level any more. 
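\n\nA quick cross-tabulation is one way to confirm this from the data themselves. This is just a sketch using base R's `xtabs` on the `doctor` and `treatment` columns; each doctor's row should have counts in only one treatment column.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# cross-tabulate doctors against treatments;\n# one non-zero column per row means treatment does not vary within doctor\nxtabs(~ doctor + treatment, data = health2)\n```\n:::\n\n:::\n\n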
We have to drop our random slopes for `treatment` by `doctor`, like this:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_health_slopes2 <- lmer(outcome ~ treatment + (1|doctor) + \n (1 + treatment|hospital), data = health2)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nboundary (singular) fit: see help('isSingular')\n```\n:::\n\n```{.r .cell-code}\nsummary(lme_health_slopes2)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: outcome ~ treatment + (1 | doctor) + (1 + treatment | hospital)\n Data: health2\n\nREML criterion at convergence: 1845.7\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.46342 -0.69900 -0.05043 0.69559 3.09601 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr \n doctor (Intercept) 6.318 2.514 \n hospital (Intercept) 1.372 1.171 \n treatmentsurgery 3.635 1.907 -1.00\n Residual 24.417 4.941 \nNumber of obs: 300, groups: doctor, 30; hospital, 5\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 25.7019 0.9265 4.3745 27.740 4.38e-06 ***\ntreatmentsurgery 4.5061 1.3766 4.0456 3.273 0.0302 * \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\ntrtmntsrgry -0.808\noptimizer (nloptwrap) convergence code: 0 (OK)\nboundary (singular) fit: see help('isSingular')\n```\n:::\n:::\n\n:::\n\nThis gives us three sets of random effects, as opposed to four.\n\n#### Predictor varies at level 3\n\nFinally, let's imagine a scenario where each hospital is only equipped to offer one type of treatment (hopefully, not a realistic scenario!). Here, all doctors and patients within each hospital use exactly the same method. \n\n![Scenario 3: predictor varies at level 3 (between hospitals)](images_mixed-effects/nested-patients4.png){width=70%}\n\nAt this stage, we can no longer include random slopes for the treatment predictor anywhere in our model. Each `hospital`, `doctor` and `patient` only experiences one of the two treatments, not both, so we have no variation between the treatments to estimate at any of these levels.\n\nSo, here, we would go back to our random intercepts only model.\n\n## Implicit vs explicit nesting\n\nThe different `health` datasets that have been explored above all have an important thing in common: the variables have been **implicitly nested**. Each new hospital, doctor and patient is given a unique identifier, to make it clear that doctors do not reoccur between hospitals, and patients do not reoccur between doctors. \n\nIn other words, all the information about the nesting is captured implicitly in the way that the data are coded.\n\nHowever, you might sometimes be working with a dataset that has not been coded this way. So how do you deal with those situations? 
You have a few options:\n\n- Recode your dataset so it is implicitly nested\n- Use explicit nesting in your `lme4` model formula\n- Use the `A/B` syntax in your `lme4` model formula\n\n### The Pastes dataset\n\nWe'll use another internal `lme4` dataset, the `Pastes` dataset, to show you what these three options look like in practice.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"Pastes\")\n\nhead(Pastes)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n strength batch cask sample\n1 62.8 A a A:a\n2 62.6 A a A:a\n3 60.1 A b A:b\n4 62.3 A b A:b\n5 62.7 A c A:c\n6 63.1 A c A:c\n```\n:::\n:::\n\n:::\n\nThis dataset is about measuring the strength of a chemical paste product, which is delivered in batches, each batch consisting of several casks. From ten random deliveries of the product, three casks were chosen at random (for a total of 30 casks). A sample was taken from each cask; from each sample, there were two assays, for a total of 60 assays.\n\nThere are four variables: \n\n- `strength`, paste strength, a continuous response variable; measured for each assay\n- `batch`, delivery batch from which the sample was chosen (10 groups, A to J)\n- `cask`, cask within the deliver batch from which the sample was chosen (3 groups, a to c)\n- `sample`, batch & cask combination (30 groups, A:a to J:c)\n\nThe experimental design, when drawn out, looks like this:\n\n![Pastes dataset design](images_mixed-effects/pastes_design.png){width=70%}\n\nAt first glance, this might look as if it's a four-level model: assays within samples within casks within deliveries. However, that's a bit of a overcomplication. There is only one `sample` collected per `cask`, meaning that we can really just think about assays being nested within casks directly.\n\nThere is no fixed predictor in this dataset, only a response variable. This means we won't include any fixed effects in the model - instead, we simply write `1` for the fixed portion of our model. We also won't have any random slopes, only random intercepts. \n\nIf we follow the same procedure we did above for the `health` example, we might try something like this:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_paste <- lmer(strength ~ 1 + (1|batch) + (1|cask), data = Pastes)\n\nsummary(lme_paste)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: strength ~ 1 + (1 | batch) + (1 | cask)\n Data: Pastes\n\nREML criterion at convergence: 301.5\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.49025 -0.90096 -0.01247 0.62911 1.82246 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n batch (Intercept) 3.3639 1.8341 \n cask (Intercept) 0.1487 0.3856 \n Residual 7.3060 2.7030 \nNumber of obs: 60, groups: batch, 10; cask, 3\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 60.0533 0.7125 6.7290 84.28 1.99e-11 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nSomething is wrong with this model. To spot it, we have to look carefully at the bottom of the random effects section, where it says `Number of obs` (short for observations). \n\n`lme4` has correctly identified that there are 10 delivery batches, and has fitted a set of 10 random intercepts for those batches - all good so far. 
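\n\nIt's also worth checking how the cask labels are distributed across batches; the sketch below (assuming the `tidyverse` is loaded) counts how many batches each cask label appears in.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# how many batches does each cask label appear in?\nPastes %>%\n  distinct(batch, cask) %>%\n  count(cask)\n```\n:::\n\n:::\n\n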
However, R believes that we only have 3 casks, because the `cask` variable is implicitly nested, and so has only fitted a set of 3 random intercepts for that variable. \n\nBut this isn't what we want. There is no link between cask A in batch A, and cask A in batch D - they have no reason to be more similar to each other than they are to other casks. We actually have 30 unique casks, and would like for each of them to have its own random intercept.\n\n### Recoding for implicit nesting\n\nAs shown above, the formula `strength ~ 1 + (1|batch) + (1|cask)` does not produce the model we want, because we don't have implicit coding in the `cask` variable.\n\nSo, let's create a new variable that gives unique values to each of the casks in our dataset. We'll do this using the `mutate` function and the `:` syntax, which you might recognise from generating interaction terms in standard linear models.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nPastes <- Pastes %>% mutate(unique_cask = batch:cask)\n\nhead(Pastes)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n strength batch cask sample unique_cask\n1 62.8 A a A:a A:a\n2 62.6 A a A:a A:a\n3 60.1 A b A:b A:b\n4 62.3 A b A:b A:b\n5 62.7 A c A:c A:c\n6 63.1 A c A:c A:c\n```\n:::\n:::\n\n:::\n\nThis generates 30 unique IDs, one for each of our unique casks. (We then have two observations of `strength` for each `unique_cask`.)\n\nNow, we can go ahead and fit our desired model:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_paste_implicit <- lmer(strength ~ 1 + (1|batch) + (1|unique_cask),\n data = Pastes)\n\nsummary(lme_paste_implicit)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: strength ~ 1 + (1 | batch) + (1 | unique_cask)\n Data: Pastes\n\nREML criterion at convergence: 247\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.4798 -0.5156 0.0095 0.4720 1.3897 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n unique_cask (Intercept) 8.434 2.9041 \n batch (Intercept) 1.657 1.2874 \n Residual 0.678 0.8234 \nNumber of obs: 60, groups: unique_cask, 30; batch, 10\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 60.0533 0.6769 9.0000 88.72 1.49e-14 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nNo error message this time, and it has correctly identified that there are 30 unique casks, from 10 different batches. We've solved the problem!\n\nIncidentally, and which you may have already noticed, the recoding that we did above also perfectly replicates the existing `sample` variable. This means we would get an identical result if we fitted the model `strength ~ 1 + (1|batch) + (1|sample)` instead. 
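\n\nAs a quick sanity check on the recoding, we can count the groups and compare them against `sample` directly (just a sketch):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# number of unique casks after recoding (should be 30)\nPastes %>% count(unique_cask) %>% nrow()\n\n# do the recoded IDs line up exactly with the existing sample column?\nidentical(as.character(Pastes$unique_cask), as.character(Pastes$sample))\n```\n:::\n\n:::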
\n\n### Fitting a model with explicit nesting\n\nIf we're in a situation like the above where we don't have nice, neat implicitly coded variables, but we don't really want to spend loads of time recoding a bunch of variables, we can instead fit our model using explicit nesting in `lme4`.\n\nThat essentially means combining the recoding and model fitting steps, so that you don't have to save a new variable.\n\nFor the `Pastes` dataset, it would look like this:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_paste_explicit <- lmer(strength ~ 1 + (1|batch) + (1|batch:cask), data = Pastes)\n\nsummary(lme_paste_explicit)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: strength ~ 1 + (1 | batch) + (1 | batch:cask)\n Data: Pastes\n\nREML criterion at convergence: 247\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.4798 -0.5156 0.0095 0.4720 1.3897 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n batch:cask (Intercept) 8.434 2.9041 \n batch (Intercept) 1.657 1.2874 \n Residual 0.678 0.8234 \nNumber of obs: 60, groups: batch:cask, 30; batch, 10\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 60.0533 0.6769 9.0000 88.72 1.49e-14 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nNotice that the output for this model and for the one right above are the same - it's because they are exactly equivalent to each other! We used `batch:cask` to create `unique_cask` earlier, so we've just directly inserted that code into our formula.\n\n### An alternative approach: the A/B syntax\n\nThere is another way to deal with nested random effects that haven't been implicitly coded into the dataset. It's not necessarily the way that we would recommend - recoding your variables and/or using explicit nesting has far less potential to trip you up and go wrong - but we'll introduce it briefly here, since it's something you're likely to see if you start working with mixed models a lot.\n\nWe could fit the same model to the `Pastes` dataset, and achieve a set of 30 intercepts for `cask` and 10 for `batch`, without making use of the `sample` variable or using the `A:B` notation.\n\nIt works like this: when random effect B is nested inside random effect A, you can simply write `A/B` on the right hand side of the `|`.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_paste_shorthand <- lmer(strength ~ 1 + (1|batch/cask), data = Pastes)\n\nsummary(lme_paste_shorthand)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: strength ~ 1 + (1 | batch/cask)\n Data: Pastes\n\nREML criterion at convergence: 247\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-1.4798 -0.5156 0.0095 0.4720 1.3897 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n cask:batch (Intercept) 8.434 2.9041 \n batch (Intercept) 1.657 1.2874 \n Residual 0.678 0.8234 \nNumber of obs: 60, groups: cask:batch, 30; batch, 10\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 60.0533 0.6769 9.0000 88.72 1.49e-14 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThis gives you an identical model output to both of the other two models we've tried. 
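\n\nIf you'd like to check the equivalence for yourself, comparing the estimated variance components of the fits is one option. This is just a sketch using the three models fitted in this section; the numbers should match exactly, with only the group labels differing.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# variance components should be identical across the equivalent fits\nVarCorr(lme_paste_implicit)\nVarCorr(lme_paste_explicit)\nVarCorr(lme_paste_shorthand)\n```\n:::\n\n:::\n\n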
That's because `(1|batch/cask)` is actually shorthand for `(1|batch) + (1|batch:cask)`, and in this dataset, as we've seen above, `batch:cask` is the same as `sample` and `unique_cask`. In other words, we've fitted exactly the same model each time.\n\n### Which option is best?\n\n::: {.callout-note appearance=\"minimal\"}\nThe `A/B` syntax is popular, and you will see it used, because it's quick to write. But there are a couple of reasons why we would recommend steering away from it:\n\n- **It has more potential to go wrong.** `A/B` is order-dependent, meaning that `A/B` $\\neq$ `B/A`, and if you get that wrong, your model won't work as intended. In contrast, explicit coding is order-invariant (`A:B + A` = `A + B:A`). Likewise, if you've implicitly coded your dataset, you can write your random effects in any order.\n\n- **It's harder to interpret.** Using implicit or explicit nesting gives you a separate `(1 + x|y)` structure in the formula for each clustering variable, which is more transparent when figuring out how many random effects you've fitted. Separate structures can also be more flexible, as you'll see in at least one example later in the course.\n:::\n\nOverall, implicit coding of your dataset is the best option, and ideally it's set up during data collection itself. It prevents mistakes, because the experimental design is clear from the data structure. It's also the easiest and most flexible in terms of coding in `lme4`.\n\nIn the absence of an implicitly coded dataset, we strongly recommend sticking with explicit coding rather than the shorthand syntax - yes, it requires more typing, but it's somewhat safer.\n\nAnd, no matter which method you choose, always check the model output to see that the number of groups per clustering variable matches what you expect to see.\n\n## Exercises\n\n### Cake {#sec-exr_cake}\n\n::: {.callout-exercise}\n\n\n{{< level 2 >}}\n\n\n\nFor this exercise, we'll use the most delicious of the internal `lme4` datasets: `cake`.\n\nThis is a real dataset, taken from a thesis by Cook ([1983](https://www.sumsar.net/blog/source-of-the-cake-dataset/cook_1938_chocolate_cake.pdf)), and is all about measuring the quality of chocolate cakes baked using different recipes at particular temperatures.\n\nCook's experiment worked as follows: for each recipe, she prepared 15 batches of cake mixture. Each batch was divided into 6 cakes, and each of those cakes was baked at one of the 6 temperatures.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"cake\")\n```\n:::\n\n:::\n\nThere are five variables in this dataset:\n\n- `recipe`, the exact recipe used for the cake (3 categories)\n- `replicate`, the batter batch number for each recipe (15 replicates per recipe)\n- `temperature`, a factor indicating which of 6 temperatures the cake was baked at\n- `temp`, the numeric value of the baking temperature\n- `angle`, the angle at which the cake broke (used as a measure of cake \"tenderness\")\n\nFor this exercise:\n\n1. Sketch a graphic/diagram that captures the experimental design\n2. Figure out what level of the dataset your variables of interest are varying at\n3. Consider how you might recode the dataset to reflect implicit nesting\n4. Fit and test at least one appropriate model\n\n::: {.callout-tip collapse=\"true\"}\n#### Worked answer\n\n#### Consider the experimental design\n\nThe first thing to do is to draw out a diagram that captures the experimental design. 
That might look something like this:\n\n![Experimental design for Cook's cakes](images_mixed-effects/cake_design.png){width=70%}\n\nWe've got a 3-level dataset here: individual cakes within batches (replicates) within recipes. There are two fixed effects of interest - `recipe` and `temperature`, both categorical.\n\nThe next thing to think about is whether the coding within the dataset accurately reflects what's going on.\n\nThere are 3 recipes, and 15 replicates of each recipe are mixed, for a total of 45 unique batches or mixtures. In the dataset as we've got it, the numbering has been repeated, so the nesting is not implicitly coded; but of course, replicate 1 for recipe A doesn't have anything more in common with replicate 1 for recipe C.\n\n#### Recode the dataset\n\nSo, let's code up a new variable that captures unique replicates, and we'll call it `batch`:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncake <- cake %>%\n mutate(batch = recipe:replicate)\n```\n:::\n\n:::\n\nAs you should be able to see in your global environment, the new `batch` variable is a factor with 45 levels. \n\nEach batch is then split into individual cakes, which undergo one of the 6 `temperature` treatments, for a total of 270 measurements.\n\n#### Fit a model\n\nNow, we can try fitting a model. We know that we want `recipe` and `temperature` as fixed effects, and probably also their interaction, at least to start with. We know we want to treat `batch` as a random effect (`replicate` nested within `recipe`), so we'll include random intercepts.\n\nWe don't, however, want to treat `recipe` as a random effect itself. It only has three levels, so it wouldn't work well even if we wanted to. Plus, we're specifically interested in those three levels and the differences between them.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_cake <- lmer(angle ~ recipe*temperature + (1|batch), data = cake)\n\nsummary(lme_cake)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: angle ~ recipe * temperature + (1 | batch)\n Data: cake\n\nREML criterion at convergence: 1638.6\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-2.64661 -0.61082 -0.05207 0.56985 2.75374 \n\nRandom effects:\n Groups Name Variance Std.Dev.\n batch (Intercept) 41.84 6.468 \n Residual 20.47 4.524 \nNumber of obs: 270, groups: batch, 45\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 33.12222 1.73683 42.00000 19.070 < 2e-16 ***\nrecipeB -1.47778 2.45625 42.00000 -0.602 0.55065 \nrecipeC -1.52222 2.45625 42.00000 -0.620 0.53878 \ntemperature.L 6.43033 1.16822 210.00000 5.504 1.07e-07 ***\ntemperature.Q -0.71285 1.16822 210.00000 -0.610 0.54239 \ntemperature.C -2.32551 1.16822 210.00000 -1.991 0.04782 * \ntemperature^4 -3.35128 1.16822 210.00000 -2.869 0.00454 ** \ntemperature^5 -0.15119 1.16822 210.00000 -0.129 0.89715 \nrecipeB:temperature.L 0.45419 1.65211 210.00000 0.275 0.78365 \nrecipeC:temperature.L 0.08765 1.65211 210.00000 0.053 0.95774 \nrecipeB:temperature.Q -0.23277 1.65211 210.00000 -0.141 0.88809 \nrecipeC:temperature.Q 1.21475 1.65211 210.00000 0.735 0.46299 \nrecipeB:temperature.C 2.69322 1.65211 210.00000 1.630 0.10456 \nrecipeC:temperature.C 2.63856 1.65211 210.00000 1.597 0.11175 \nrecipeB:temperature^4 3.02372 1.65211 210.00000 1.830 0.06863 . \nrecipeC:temperature^4 3.13711 1.65211 210.00000 1.899 0.05895 . 
\nrecipeB:temperature^5 -0.66354 1.65211 210.00000 -0.402 0.68836 \nrecipeC:temperature^5 -1.62525 1.65211 210.00000 -0.984 0.32637 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n\nCorrelation matrix not shown by default, as p = 18 > 12.\nUse print(x, correlation=TRUE) or\n vcov(x) if you need it\n```\n:::\n:::\n\n:::\n\n#### Alternative models\n\nIf you want to do a bit of significance testing, you can try a few other versions of the model with different structures:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_cake2 <- lmer(angle ~ recipe + temperature + (1|batch), data = cake)\nlme_cake3 <- lmer(angle ~ recipe + (1|batch), data = cake)\nlme_cake4 <- lmer(angle ~ temperature + (1|batch), data = cake)\nlm_cake <- lm(angle ~ recipe*temperature, data = cake)\n\nanova(lme_cake, lm_cake) # random effects dropped\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: cake\nModels:\nlm_cake: angle ~ recipe * temperature\nlme_cake: angle ~ recipe * temperature + (1 | batch)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) \nlm_cake 19 1901.3 1969.6 -931.63 1863.3 \nlme_cake 20 1719.0 1791.0 -839.53 1679.0 184.21 1 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n\n```{.r .cell-code}\nanova(lme_cake, lme_cake2) # recipe:temperature dropped\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: cake\nModels:\nlme_cake2: angle ~ recipe + temperature + (1 | batch)\nlme_cake: angle ~ recipe * temperature + (1 | batch)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_cake2 10 1709.6 1745.6 -844.79 1689.6 \nlme_cake 20 1719.0 1791.0 -839.53 1679.0 10.53 10 0.3953\n```\n:::\n\n```{.r .cell-code}\nanova(lme_cake, lme_cake3) # temperature & recipe:temperature dropped\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: cake\nModels:\nlme_cake3: angle ~ recipe + (1 | batch)\nlme_cake: angle ~ recipe * temperature + (1 | batch)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) \nlme_cake3 5 1785.7 1803.7 -887.84 1775.7 \nlme_cake 20 1719.0 1791.0 -839.53 1679.0 96.636 15 5.642e-14 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n\n```{.r .cell-code}\nanova(lme_cake, lme_cake4) # recipe & recipe:temperature dropped\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: cake\nModels:\nlme_cake4: angle ~ temperature + (1 | batch)\nlme_cake: angle ~ recipe * temperature + (1 | batch)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_cake4 8 1706.1 1734.9 -845.06 1690.1 \nlme_cake 20 1719.0 1791.0 -839.53 1679.0 11.06 12 0.5238\n```\n:::\n:::\n\n:::\n\nWe see that when we drop the random intercepts for `batch`, and when we drop the `temperature` predictor, our chi-square values are significant. This indicates that these predictors are important. 
But we can drop `recipe` and `recipe:temperature` without a particularly big change in deviance.\n\n(This is borne out somewhat if you've used the `lmerTest` package to perform degrees of freedom approximation and extract p-values as part of the `lme_cake` model summary.)\n\nSo, our final model is probably `lme_cake4 = angle ~ temperature + (1|batch)`. \n\n#### Check assumptions\n\nFor completeness, we'll check the assumptions of `lme_cake4` and visualise it for the sake of aiding our interpretation.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_cake4, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"outliers\"))\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-17-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_model(lme_cake4, \n check = c(\"reqq\", \"pp_check\"))\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-17-2.png){width=672}\n:::\n:::\n\n:::\n\nMost of the assumptions look okay, with the exception of the normal Q-Q plot for the random intercepts. The set of intercepts doesn't really look like it's nicely normally distributed here. Maybe a more complicated mixed effects model (something beyond the linear type we're going with here) would help. Or, maybe this just means we should be a little less decisive in our overall conclusions.\n\n#### Visualise the model\n\nLast but not least, we can visualise the model:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_cake4), aes(x = temperature, y = angle)) +\n geom_point(alpha = 0.7) +\n geom_line(aes(y = .fitted, group = batch), alpha = 0.5)\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-18-1.png){width=672}\n:::\n:::\n\n:::\n\nOverall, `angle` increases with `temperature`. (From what I understand of reading the thesis, this is a good thing from the perspective of the cake quality, as it suggests the cake is more tender. Scientific *and* delicious.)\n\nWe can see visually that the `recipe` and `recipe:temperature` terms don't have much explanatory power by visualising the full model (commenting out the `facet_wrap` may also help you to see this):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_cake), aes(x = temperature, y = angle, colour = recipe)) +\n facet_wrap(~ recipe) +\n geom_point(alpha = 0.7) +\n geom_line(aes(y = .fitted, group = batch))\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-19-1.png){width=672}\n:::\n:::\n\n:::\n\n:::\n\n:::\n\n::: {.callout-exercise}\n#### Bonus questions\n\n\n{{< level 2 >}}\n\n\n\nIf you want to think a bit harder about this dataset, consider these additional questions. Feel free to chat about them with a neighbour or with a trainer.\n\n- Why doesn't it work if you try to fit random slopes for `temperature` on `batch`? Have a look at the warning message that R gives you in this situation.\n- What happens if you use the numerical `temp` variable instead of the categorical `temperature`? Does it change your conclusions? Why might you prefer to use the numerical/continuous version?\n- Could `temperature` be treated as a random effect, under certain interpretations of the original research question? 
Is it possible or sensible to do that with the current dataset?\n:::\n\nFor more information on the very best way to bake a chocolate cake (and a lovely demonstration at the end about the dangers of extrapolating from a linear model), [this blog post](https://www.sumsar.net/blog/source-of-the-cake-dataset/) is a nice source. It's written by a data scientist who was so curious about the quirky `cake` dataset that he contacted Iowa State University, who helped him unearth Cook's original thesis.\n\n### Parallel fibres {#sec-exr_parallel}\n\n::: {.callout-exercise}\n\n\n{{< level 2 >}}\n\n\n\nFor this exercise, we'll be using a neurohistology dataset that focuses on a particular type of neuron found in the cerebellum, known as a parallel fibre. Parallel fibres are found in the uppermost layer of cerebellar cortex, and are known for being long; this experiment was designed to test whether the depth at which the fibre was found, had a bearing on its length.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nparallel <- read_csv(\"data/parallel.csv\")\n```\n:::\n\n:::\n\nTo measure the length of the fibres (which are <10mm long - big for a neuron!), slices were taken from the cerebella of six cats, at different depths. The depth of each slice was recorded. These slices were then stained, so that individual parallel fibres could be identified and measured.\n\n![An example of a stained slice from cerebellar cortex](images_mixed-effects/cerebellum_histology.jpg){width=40%}\n\nThe dataset contains five variables:\n\n- `length` of the fibre (in micrometres)\n- `depth` of the slice (in micrometres)\n- `fibre`, individual IDs for all of the fibres\n- `slice` ID number (maximum 10 slices per cat)\n- `cat` ID number (1 through 6)\n\nFor this exercise:\n\n1. Sketch a graphic/diagram that captures the experimental design\n2. Determine whether the dataset requires recoding or explicit nesting\n3. Fit and test at least one appropriate model\n\n::: {.callout-tip collapse=\"true\"}\n#### Worked answer\n\n#### Visualise the design\n\nThis is a nested design with three levels: `fibre` within `slice` within `cat`. But the fixed predictor `depth` varies at level 2, between slices (not between fibres).\n\n![Experimental design](images_mixed-effects/neurohist_design.png){width=70%}\n\n#### Recoding\n\nIf we look at the structure of the dataset, we can see that the numbering for the `slice` variable starts again for each `cat` at 1, which is not what we want.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nparallel %>% slice(1:8, 46)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 9 × 5\n fibre length depth slice cat\n \n1 1 5780 260 1 1\n2 2 5730 260 1 1\n3 3 5790 260 1 1\n4 4 5860 260 1 1\n5 5 5690 260 1 1\n6 6 5940 260 1 1\n7 7 5950 260 1 1\n8 8 5940 290 2 1\n9 46 4670 300 1 2\n```\n:::\n:::\n\n:::\n\nSo, we need to recode a new variable, or be prepared to use explicit nesting in our model formula. 
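\n\nIf you'd like to see this repeated numbering across the whole dataset rather than just a handful of rows, a quick tally makes the structure obvious (a minimal sketch, assuming the data have been read in as above):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# count the fibres measured in each cat-slice combination;\n# the same slice numbers crop up again within every cat\nparallel %>% count(cat, slice)\n```\n:::\n\n:::\n\n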
Note that for this recoding to work, we also need to ask R to treat `slice` and `cat` as factors rather than numeric variables.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nparallel <- parallel %>%\n mutate(cat = as.factor(cat)) %>%\n mutate(slice = as.factor(slice)) %>%\n mutate(unique_slice = slice:cat)\n```\n:::\n\n:::\n\n#### Fit a model\n\nThe full model that we could fit to these data contains three random effects: random intercepts for `unique_slice` (or `slice:cat` if you're explicitly coding), random intercepts for `cat`, and random slopes for `depth` on `cat`.\n\n(Since `depth` doesn't vary within `slice`, i.e., each `slice` has only one `depth`, we can't fit random slopes at level 2.)\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_parallel <- lmer(length ~ depth + (1|slice:cat) + (1 + depth|cat), \n data = parallel)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning in checkConv(attr(opt, \"derivs\"), opt$par, ctrl = control$checkConv, :\nModel failed to converge with max|grad| = 1.2981 (tol = 0.002, component 1)\n```\n:::\n:::\n\n:::\n\nYou may notice that you get an error - the model fails to converge. There are a couple of fixes we could try that involve tweaking the settings in the estimation procedure (e.g., increasing the maximum number of iterations allowed).\n\nHowever, most errors like this just mean that we're being too ambitious. So, the approach we'll take here is to make the model simpler. \n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_parallel_int <- lmer(length ~ depth + (1|slice:cat) + (1|cat), \n data = parallel)\n```\n:::\n\n:::\n\n#### Check the assumptions\n\nNext, we check the assumptions of our intercepts-only nested model:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_model(lme_parallel_int, \n check = c(\"linearity\", \"homogeneity\", \"qq\", \"outliers\"))\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-25-1.png){width=672}\n:::\n\n```{.r .cell-code}\ncheck_model(lme_parallel_int, \n check = c(\"reqq\", \"pp_check\"))\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-25-2.png){width=672}\n:::\n:::\n\n:::\n\nNot bad at all. There are no obvious errors cropping up in these plots.\n\n#### Visualise the model\n\nLast but not least, we should have a look at our model predictions visually.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(augment(lme_parallel_int), aes(x = depth, y = length, colour = cat)) +\n geom_point(alpha = 0.6) +\n geom_line(aes(y = .fitted, group = cat))\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-26-1.png){width=672}\n:::\n:::\n\n:::\n\nDespite the fact that `depth` is a continuous variable, this plot still has jagged, rather than straight, lines of best fit. 
This is because the plot is also taking into account the multiple sets of random intercepts for `slice` that are contained within each `cat` cluster.\n\nWe can, however, extract just the set of intercepts by `cat`, and with a bit more fuss, use this to add lines to the plot with `geom_abline`:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# use the coef function to extract the coefficients\ncat_coefs <- coef(lme_parallel_int)$cat\n\n# use geom_abline to add individual lines for each cat\nggplot(augment(lme_parallel_int), aes(x = depth, y = length, colour = cat)) +\n geom_point(alpha = 0.6) +\n geom_abline(intercept = cat_coefs[1,1], slope = cat_coefs[1,2]) +\n geom_abline(intercept = cat_coefs[2,1], slope = cat_coefs[2,2]) +\n geom_abline(intercept = cat_coefs[3,1], slope = cat_coefs[3,2]) +\n geom_abline(intercept = cat_coefs[4,1], slope = cat_coefs[4,2]) +\n geom_abline(intercept = cat_coefs[5,1], slope = cat_coefs[5,2]) +\n geom_abline(intercept = cat_coefs[6,1], slope = cat_coefs[6,2])\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-27-1.png){width=672}\n:::\n:::\n\n:::\n\n#### Is this a good model?\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(parallel, aes(x = depth, y = length, colour = slice)) +\n facet_wrap(~cat) +\n geom_point(alpha = 0.6)\n```\n\n::: {.cell-output-display}\n![](nested-random-effects_files/figure-html/unnamed-chunk-28-1.png){width=672}\n:::\n:::\n\n:::\n\nIf we plot the data faceted by `cat`, it suggests that the relationship between `depth` and `length` varies between `cat`. But we were forced to drop the random slopes for `depth|cat` due to lack of model convergence.\n\nOur diagnostic plots look pretty good for our simpler, intercepts-only model, but the raw data indicate we might be missing something. Do you still trust the model? 
If no, what might a researcher do to improve this analysis?\n\n:::\n\n:::\n\n::: {.callout-exercise}\n#### Bonus question: notation\n\n\n{{< level 3 >}}\n\n\n\nThink back to the brief introduction to linear mixed effects models notation given in section 5.5.1 of the course materials.\n\nWhat would the equation of a three level model fitted to the `parallel` dataset look like?\n\nHint: you'll need more subscript letters than you did for a two-level model!\n\n::: {.callout-tip collapse=\"true\"}\n#### Answer: three-level intercepts-only\n\nE.g., `length ~ depth + (1|slice:cat) + (1|cat)`\n\nLevel 1:\n\n$$\ny_{ijk} = \\beta_{0jk} + \\beta_{1}x_{1ijk} + \\epsilon_{ijk}\n$$\n\nLevel 2: \n\n$$\n\\beta_{0jk} = \\delta_{00k} + U_{0jk}\n$$\n\nLevel 3: \n\n$$\n\\delta_{00k} = \\gamma_{000} + V_{00k}\n$$\n\nwhere,\n\n$$\n\\left( \\begin{array}{c} U_{0jk} \\end{array} \\right) ∼ N \\left( \\begin{array}{c} 0 \\end{array} , \\begin{array}{cc} \\tau^2_{00} \\end{array} \\right)\n$$\n\nand,\n\n$$\n\\left( \\begin{array}{c} V_{00k} \\end{array} \\right) ∼ N \\left( \\begin{array}{c} 0 \\end{array} , \\begin{array}{cc} \\tau^2_{00} \\end{array} \\right)\n$$\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n#### Answer: three-level intercepts & slopes\n\nE.g., `length ~ depth + (1|slice:cat) + (1 + depth|cat)`\n\nLevel 1:\n\n$$\ny_{ijk} = \\beta_{0jk} + \\beta_{1k}x_{1ijk} + \\epsilon_{ijk}\n$$\n\nLevel 2: \n\n$$\n\\beta_{0jk} = \\beta_{00k} + U_{0jk}\n$$\n\nLevel 3: \n\n$$\n\\beta_{00k} = \\gamma_{000} + V_{00k}\n$$\n$$\n\\beta_{1k} = \\gamma_{100} + V_{10k}\n$$\n\nWhere,\n\n$$\n\\left( \\begin{array}{c} U_{0jk} \\end{array} \\right) ∼ N \\left( \\begin{array}{c} 0 \\end{array} , \\begin{array}{cc} \\tau^2_{00} \\end{array} \\right)\n$$\n\nand,\n\n$$\n\\left( \\begin{array}{c} V_{00k} \\\\ V_{10k} \\end{array} \\right) ∼ N \\left( \\begin{array}{c} 0 \\\\ 0 \\end{array} , \\begin{array}{cc} \\tau^2_{00} & \\rho_{01} \\\\ \\rho_{01} & \\tau^2_{10} \\end{array} \\right)\n$$\n\n:::\n\n:::\n\n\n## Summary\n\nSometimes, a dataset contains multiple clustering variables. When one of those clustering variables is nested inside the other, we can model this effectively by estimating random effects at multiple levels. \n\nAdding additional levels can create some complications, e.g., determining which level of the dataset your predictor variables are varying at. 
But it can also allow us to deal with real-life hierarchical data structures, which are common in research.\n\n::: {.callout-tip}\n#### Key points\n- Random effect B is nested inside random effect A, if each category of B occurs uniquely within only one category of A\n- It's important to figure out what level of the hierarchy or model a predictor variable is varying at, to determine where random slopes are appropriate\n- Nested random effects can be implicitly or explicitly coded in a dataframe, which determines how the model should be specified in `lme4`\n:::\n\n", "supporting": [ "nested-random-effects_files" ], diff --git a/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-17-1.png b/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-17-1.png index 873d9d6..8f5e368 100644 Binary files a/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-17-1.png and b/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-17-1.png differ diff --git a/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-17-2.png b/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-17-2.png index 6f3f8f5..1e142f4 100644 Binary files a/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-17-2.png and b/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-17-2.png differ diff --git a/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-25-1.png b/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-25-1.png index 2c97704..f9a4178 100644 Binary files a/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-25-1.png and b/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-25-1.png differ diff --git a/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-25-2.png b/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-25-2.png index 5e812fb..a713441 100644 Binary files a/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-25-2.png and b/_freeze/materials/nested-random-effects/figure-html/unnamed-chunk-25-2.png differ diff --git a/_freeze/materials/random-effects/execute-results/html.json b/_freeze/materials/random-effects/execute-results/html.json index 182ee55..dd14d09 100644 --- a/_freeze/materials/random-effects/execute-results/html.json +++ b/_freeze/materials/random-effects/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "9860de5acdecd2db5fadb30894a50add", + "hash": "347c082b41fb0218fd545514c345bc1a", "result": { - "markdown": "---\ntitle: \"Introducing random effects\"\noutput: html_document\n---\n\n\n\n::: {.cell}\n\n:::\n\n\nMixed effects models are particularly useful in biological and clinical sciences, where we commonly have innate clusters or groups within our datasets. This is because mixed effects models contain **random effects** in addition to **fixed effects** (hence the name, \"mixed effects\"). 
\n\nRather than incorrectly assuming independence between observations, random effects allow us to take into account the natural clusters or structures within datasets, without requiring us to calculate separate coefficients for each group - in other words, solving the problem of psuedoreplication, without sacrificing as much statistical power.\n\n## What is a random effect?\n\nThere are a few things that characterise a random effect:\n\n- All random effects are **categorical** variables or factors\n- They create clusters or groups within your dataset (i.e., **non-independence**)\n- The levels/groups of that factor have been chosen \"at random\" from a larger set of possible levels/groups - this is called **exchangeability**\n- Usually, we are **not interested in the random effect as a predictor**; instead, we are trying to account for it in our analysis\n- We expect to have **5 or more distinct levels/groups** to be able to treat a variable as a random effect\n\n### An example of a random effect\n\nLet's put the features listed above in context, with an example. \n\nImagine that you're conducting a study to investigate whether temperature predicts the number of tourists that go to beaches. You study ten different beaches, and for each Saturday in the summer season that year, you record `peak temperature` and `number of tourists`.\n\nHere, the relationship we're interested in is `number of tourists ~ peak temperature`. We've created replicates by measuring these variables on an number of different Saturdays across the summer period. We've *also* replicated, however, by looking at 10 beaches.\n\nThis has created an additional clustering variable, and non-independence within our data: `beach`. Week to week, we would expect both `temperature` and `tourists` to be more similar to other values recorded on the same beach, compared to other beaches - perhaps due to factors like location, size, popularity, cleanliness and so on.\n\nSo, `beach` is both a **categorical** variable, and one that **creates non-independence** in our dataset.\n\nHowever, we are not really interested in these specific 10 beaches. We want to know about the relationship between `tourists` and `temperature` across all beaches, ideally, and we just happen to have tested these ones. In other words, the precise beaches we tested are **exchangeable** with any other beaches that we might've investigated instead.\n\nWe want to make sure that our analysis **accounts for the non-independence** that the `beach` variable has generated, so that we can get at the actual effect of interest. Treating `beach` as a random effect here allows us to quantify how much of the variance in the `tourists ~ temperature` relationship was down to factors that are beach-specific (location and size of beach, local weather differences etc.), and how much of the variance is actually due to our overall effect of interest.\n\nThankfully, we estimated from **5 or more** different beaches, and so we're able to treat this variable as a random effect. 
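\n\nTo make this example concrete, here is roughly what such a model would look like in `lme4` syntax. This is purely a sketch with made-up object and column names, since we don't actually have these data; fitting models like this is covered properly elsewhere in the course:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(lme4)\n\n# beach_data, tourists, temperature and beach are all hypothetical names;\n# tourists ~ temperature is the fixed (interesting) part of the model,\n# while (1|beach) requests a separate random intercept for each beach\nlme_beach <- lmer(tourists ~ temperature + (1|beach), data = beach_data)\n```\n:::\n\n:::\n\n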
Having enough levels/groups within your random effect is important for the underlying maths that goes on when you fit the model, so that you can adequately share information across the different levels.\n\n### Sharing information across levels\n\nWe can also define random effects in the way that they are estimated/fitted in our model.\n\nLet's start by thinking about what happens when we fit a fixed categorical predictor in a standard model - say, if you are interested in finding out whether degree subject predicts number of contact hours per week. \n\nFor each different degree subject in your dataset, you would calculate a separate mean number of contact hours, treating each of your subjects as a distinct sub-set of the data. This is referred to as a \"fixed\" effect, as you've calculated a specific, fixed estimate for each group.\n\nIn a standard linear model, every predictor that you fit will be estimated as a fixed effect.\n\nIn contrast, when we fit a random effect, we include information about the global average within our estimates for each group. In other words, the estimates for the group means are not just based on a distinct sub-set of the data - we have incorporated information about all the *other* levels of that variable, as well, when making our group estimates. The group estimates in a random effect therefore are \"pulled\" or skewed closer to the global mean than they would be if estimated separately from one another.\n\nThere'll be more about the maths of fitting random effects later in the course, but this concept of \"sharing information\" across levels is key to understanding what a random effect is and to deciding whether variables should be treated as such, so it's really useful to know about it now.\n\n## Exercises\n\n### Exercise 1 - Primary schools\n\n\n{{< level 1 >}}\n\n\n\nAn education researcher is interested in the impact of socio-economic status (SES) and gender on students' primary school achievement.\n\nFor twelve schools across the UK, she records the following variables for each child in their final year of school (age 11):\n\n- Standardised academic test scores\n- Whether the child is male or female\n- Standardised SES score, based on parental income, occupation and education\n\nThe response variable in this example is the standardised academic test scores, and we have two predictors: `gender` and `SES score`. Note that we have also tested these variables across three schools.\n\n![Predictor variables](images_mixed-effects/example1_1.png)\n\nWhich of these predictors should be treated as fixed versus random effects? Are there any other \"hidden\" grouping variables that we should consider, based on the description of the experiment?\n\n::: {.callout-note collapse=\"true\"}\n#### Answer\n\nWe care about the effects of `gender` and `SES score`. We might also be interested in testing for the interaction between them, like so: `academic test scores ~ SES + gender + SES:gender`.\n\nThis helps us to determine straight away that both `gender` and `SES score` are fixed effects - we're interested in them directly. Supporting this is the fact that we have restricted `gender` here to a categorical variable with only two levels, while `SES score` is continuous - neither of these could be treated as random effects.\n\nHowever, `school` should be treated as a random effect. We collected data from 12 different schools, but we are not particularly interested in differences between these specific schools. 
In fact we'd prefer to generalise our results to students across all UK primary schools, and so it makes sense to share information across the levels. But we can't neglect `school` as a variable in this case, as it does create natural clusters in our dataset.\n\n![Fixed versus random effects](images_mixed-effects/example1_2.png)\nWe also have two possible \"hidden\" random effects in this dataset, however.\n\nThe first is `classroom`. If the final year students are grouped into more than one class within each school, then they have been further \"clustered\". Students from the same class share a teacher, and thus will be more similar to one another than to students in another class, even within the same school.\n\nThe `classroom` variable would in fact be \"nested\" inside the `school` variable - more on nested variables in later sections of this course.\n\nOur other possible hidden variable is `family`. If siblings have been included in the study, they will share an identical SES score, because this has been derived from the parent(s) rather than the students themselves. Siblings are, in this context, technical replicates! One way to deal with this is to simply remove siblings from the study; or, if there are enough sibling pairs to warrant it, we could also treat `family` as a random effect.\n:::\n\n### Exercise 2 - Ferns\n\n\n{{< level 1 >}}\n\n\n\nA plant scientist is investigating how light intensity affects the growth rate of young fern seedlings.\n\nHe cultivates 240 seedlings in total in the greenhouse, split across ten trays (24 seedlings in each). Each tray receives one of three different light intensities, which can be varied by changing the settings on purpose-built growlights.\n\nThe height of each seedling is then measured repeatedly at five different time points (days 1, 3, 5, 7 and 9).\n\nWhat are our variables? What's the relationship we're interested in, and which of the variables (if any) should be treated as random effects?\n\n![Predictor variables](images_mixed-effects/example2_1.png){fig-alt=\"Graphic with three variables listed: Tray, Itensity and Timepoint\"}\n\n::: {.callout-note collapse=\"true\"}\n#### Answer\n\nThere are four things here that vary: `tray`, `light intensity`, `timepoint` and `height`. \n\nWe're interested in the relationship between growth rate and light intensity. This makes our first two predictor variables easier to decide about:\n\n![Fixed versus random effects](images_mixed-effects/example2_2.png){fig-alt=\"Graphic with three variables listed: Tray, Itensity and Timepoint. Tray is now identified as a random effect, while Intensity and Timepoint are identified as fixed effects.\"}\nThe variable `tray` is a random effect here. We are not interested in differences between these 10 particular trays that we've grouped our seedlings into, but we do need to recognise the non-independence created by these natural clusters - particularly because we've applied the \"treatment\" (changes in light intensity) to entire trays, rather than to individual seedlings.\n\nIn contrast, `light intensity` - though a categorical variable - is a fixed effect. We are specifically interested in comparing across the three light intensity levels, so we don't want to share information between them; we want fixed estimates of the differences between the group means here.\n\nPerhaps the trickiest variable to decide about is `time`. Sometimes, we will want to treat time as a random effect in mixed effects models. 
And we have enough timepoints to satisfy the requirement for 5+ levels in this dataset. \n\nBut in this instance, where we are looking at growth rate, we have a good reason to believe that `time` is an important predictor variable, that may have an interesting interaction with `light intensity`. \n\nFurther, our particular levels of `time` - the specific days that we have measured - are not necessarily exchangeable, nor do we necessarily want to share information between these levels.\n\nIn this case, then, `time` would probably be best treated as a fixed rather than random effect. \n\nHowever, if we were not measuring a response variable that changes over time (like growth), that might change. If, for instance, we were investigating the relationship between light intensity and chlorophyll production in adult plants, then measuring across different time points would be a case of technical replication instead, and `time` would be best treated as a random effect. **The research question is key in making this decision.**\n:::\n\n### Exercise 3 - Wolves\n\n\n{{< level 1 >}}\n\n\n\nAn ecologist is conducting a study to demonstrate how the presence of wolves in US national parks predicts the likelihood of flooding. For six different national parks across the country that contain rivers, they record the estimated wolf population, and the average number of centimetres by which the major river in the park overflows its banks, for the last 10 years - for a total of 60 observations.\n\nWhat's the relationship of interest? Is our total *n* really 60?\n\n![Predictor variables](images_mixed-effects/example3_1.png){fig-alt=\"Graphic with three variables listed: Wolf population, National park and Year.\"}\n\n::: {.callout-note collapse=\"true\"}\n#### Answer\n\nThough we have 60 observations, it would of course be a case of pseudoreplication if we failed to understand the clustering within these data.\n\nWe have four things that vary: `wolf population`, `flood depth`, `national park` and `year`.\n\nWith `flood depth` as our response variable, we already know how to treat that. And by now, you've hopefully got the pattern that our continuous effect of interest `wolf population` will always have to be a fixed effect. \n\n![Fixed versus random effects](images_mixed-effects/example3_2.png){fig-alt=\"Graphic with three variables listed: Wolf population, National park and Year. Wolf population is now identified as a fixed effect, while National park and Year are identified as random effects.\"}\nBut there's also `year` and `national park` to contend with, and here, we likely want to treat both as random effects.\n\nWe have measured across several national parks, and over a 10 year period, in order to give us a large enough dataset for sufficient statistical power - these are technical replicates. But from a theoretical standpoint, the exact years and the exact parks that we've measured from, probably aren't that relevant. It's fine if we share information across these different levels.\n\nOf course, you might know more about ecology than me, and have a good reason to believe that the exact years *do* matter - that perhaps something fundamental in the relationship between `flood depth ~ wolf population` really does vary with year in a meaningful way. 
But given that our research question does not focus on change over time, both `year` and `national park` would be best treated as random effects given the information we currently have.\n:::\n\n## Summary\n\n::: {.callout-tip}\n#### Key points\n\n- A model with both fixed and random effects is referred to as a mixed effects model\n- Random effects are categorical variables, with 5+ levels, that represent non-independent \"clusters\" or \"groups\" within the data\n- Random effects are estimated by sharing information across levels/groups, which are typically chosen \"at random\" from a larger set of exchangeable levels\n- Whether a variable should be treated as a random effect depends both on the nature of the variable, and also the research question\n:::\n\n", + "markdown": "---\ntitle: \"Introducing random effects\"\noutput: html_document\n---\n\n\n\n::: {.cell}\n\n:::\n\n\nMixed effects models are particularly useful in biological and clinical sciences, where we commonly have innate clusters or groups within our datasets. This is because mixed effects models contain **random effects** in addition to **fixed effects** (hence the name, \"mixed effects\"). \n\nRather than incorrectly assuming independence between observations, random effects allow us to take into account the natural clusters or structures within datasets, without requiring us to calculate separate coefficients for each group - in other words, solving the problem of psuedoreplication, without sacrificing as much statistical power.\n\n## What is a random effect?\n\nThere are a few things that characterise a random effect:\n\n- All random effects are **categorical** variables or factors\n- They create clusters or groups within your dataset (i.e., **non-independence**)\n- The levels/groups of that factor have been chosen \"at random\" from a larger set of possible levels/groups - this is called **exchangeability**\n- Usually, we are **not interested in the random effect as a predictor**; instead, we are trying to account for it in our analysis\n- We expect to have **5 or more distinct levels/groups** to be able to treat a variable as a random effect\n\n### An example of a random effect\n\nLet's put the features listed above in context, with an example. \n\nImagine that you're conducting a study to investigate whether temperature predicts the number of tourists that go to beaches. You study ten different beaches, and for each Saturday in the summer season that year, you record `peak temperature` and `number of tourists`.\n\nHere, the relationship we're interested in is `number of tourists ~ peak temperature`. We've created replicates by measuring these variables on an number of different Saturdays across the summer period. We've *also* replicated, however, by looking at 10 beaches.\n\nThis has created an additional clustering variable, and non-independence within our data: `beach`. Week to week, we would expect both `temperature` and `tourists` to be more similar to other values recorded on the same beach, compared to other beaches - perhaps due to factors like location, size, popularity, cleanliness and so on.\n\nSo, `beach` is both a **categorical** variable, and one that **creates non-independence** in our dataset.\n\nHowever, we are not really interested in these specific 10 beaches. We want to know about the relationship between `tourists` and `temperature` across all beaches, ideally, and we just happen to have tested these ones. 
In other words, the precise beaches we tested are **exchangeable** with any other beaches that we might've investigated instead.\n\nWe want to make sure that our analysis **accounts for the non-independence** that the `beach` variable has generated, so that we can get at the actual effect of interest. Treating `beach` as a random effect here allows us to quantify how much of the variance in the `tourists ~ temperature` relationship was down to factors that are beach-specific (location and size of beach, local weather differences etc.), and how much of the variance is actually due to our overall effect of interest.\n\nThankfully, we estimated from **5 or more** different beaches, and so we're able to treat this variable as a random effect. Having enough levels/groups within your random effect is important for the underlying maths that goes on when you fit the model, so that you can adequately share information across the different levels.\n\n### Sharing information across levels\n\nWe can also define random effects in the way that they are estimated/fitted in our model.\n\nLet's start by thinking about what happens when we fit a fixed categorical predictor in a standard model - say, if you are interested in finding out whether degree subject predicts number of contact hours per week. \n\nFor each different degree subject in your dataset, you would calculate a separate mean number of contact hours, treating each of your subjects as a distinct sub-set of the data. This is referred to as a \"fixed\" effect, as you've calculated a specific, fixed estimate for each group.\n\nIn a standard linear model, every predictor that you fit will be estimated as a fixed effect.\n\nIn contrast, when we fit a random effect, we include information about the global average within our estimates for each group. In other words, the estimates for the group means are not just based on a distinct sub-set of the data - we have incorporated information about all the *other* levels of that variable, as well, when making our group estimates. The group estimates in a random effect therefore are \"pulled\" or skewed closer to the global mean than they would be if estimated separately from one another.\n\nThere'll be more about the maths of fitting random effects later in the course, but this concept of \"sharing information\" across levels is key to understanding what a random effect is and to deciding whether variables should be treated as such, so it's really useful to know about it now.\n\n## Exercises\n\n### Primary schools {#sec-exr_primaryschools}\n\n::: {.callout-exercise}\n\n\n{{< level 1 >}}\n\n\n\nAn education researcher is interested in the impact of socio-economic status (SES) and gender on students' primary school achievement.\n\nFor twelve schools across the UK, she records the following variables for each child in their final year of school (age 11):\n\n- Standardised academic test scores\n- Whether the child is male or female\n- Standardised SES score, based on parental income, occupation and education\n\nThe response variable in this example is the standardised academic test scores, and we have two predictors: `gender` and `SES score`. Note that we have also tested these variables across three schools.\n\n![Predictor variables](images_mixed-effects/example1_1.png)\n\nWhich of these predictors should be treated as fixed versus random effects? 
Are there any other \"hidden\" grouping variables that we should consider, based on the description of the experiment?\n\n::: {.callout-tip collapse=\"true\"}\n#### Answer\n\nWe care about the effects of `gender` and `SES score`. We might also be interested in testing for the interaction between them, like so: `academic test scores ~ SES + gender + SES:gender`.\n\nThis helps us to determine straight away that both `gender` and `SES score` are fixed effects - we're interested in them directly. Supporting this is the fact that we have restricted `gender` here to a categorical variable with only two levels, while `SES score` is continuous - neither of these could be treated as random effects.\n\nHowever, `school` should be treated as a random effect. We collected data from 12 different schools, but we are not particularly interested in differences between these specific schools. In fact we'd prefer to generalise our results to students across all UK primary schools, and so it makes sense to share information across the levels. But we can't neglect `school` as a variable in this case, as it does create natural clusters in our dataset.\n\n![Fixed versus random effects](images_mixed-effects/example1_2.png)\nWe also have two possible \"hidden\" random effects in this dataset, however.\n\nThe first is `classroom`. If the final year students are grouped into more than one class within each school, then they have been further \"clustered\". Students from the same class share a teacher, and thus will be more similar to one another than to students in another class, even within the same school.\n\nThe `classroom` variable would in fact be \"nested\" inside the `school` variable - more on nested variables in later sections of this course.\n\nOur other possible hidden variable is `family`. If siblings have been included in the study, they will share an identical SES score, because this has been derived from the parent(s) rather than the students themselves. Siblings are, in this context, technical replicates! One way to deal with this is to simply remove siblings from the study; or, if there are enough sibling pairs to warrant it, we could also treat `family` as a random effect.\n:::\n\n:::\n\n### Ferns {#sec-exr_ferns}\n\n::: {.callout-exercise}\n\n\n{{< level 1 >}}\n\n\n\nA plant scientist is investigating how light intensity affects the growth rate of young fern seedlings.\n\nHe cultivates 240 seedlings in total in the greenhouse, split across ten trays (24 seedlings in each). Each tray receives one of three different light intensities, which can be varied by changing the settings on purpose-built growlights.\n\nThe height of each seedling is then measured repeatedly at five different time points (days 1, 3, 5, 7 and 9).\n\nWhat are our variables? What's the relationship we're interested in, and which of the variables (if any) should be treated as random effects?\n\n![Predictor variables](images_mixed-effects/example2_1.png){fig-alt=\"Graphic with three variables listed: Tray, Itensity and Timepoint\"}\n\n::: {.callout-tip collapse=\"true\"}\n#### Answer\n\nThere are four things here that vary: `tray`, `light intensity`, `timepoint` and `height`. \n\nWe're interested in the relationship between growth rate and light intensity. This makes our first two predictor variables easier to decide about:\n\n![Fixed versus random effects](images_mixed-effects/example2_2.png){fig-alt=\"Graphic with three variables listed: Tray, Itensity and Timepoint. 
Tray is now identified as a random effect, while Intensity and Timepoint are identified as fixed effects.\"}\nThe variable `tray` is a random effect here. We are not interested in differences between these 10 particular trays that we've grouped our seedlings into, but we do need to recognise the non-independence created by these natural clusters - particularly because we've applied the \"treatment\" (changes in light intensity) to entire trays, rather than to individual seedlings.\n\nIn contrast, `light intensity` - though a categorical variable - is a fixed effect. We are specifically interested in comparing across the three light intensity levels, so we don't want to share information between them; we want fixed estimates of the differences between the group means here.\n\nPerhaps the trickiest variable to decide about is `time`. Sometimes, we will want to treat time as a random effect in mixed effects models. And we have enough timepoints to satisfy the requirement for 5+ levels in this dataset. \n\nBut in this instance, where we are looking at growth rate, we have a good reason to believe that `time` is an important predictor variable, that may have an interesting interaction with `light intensity`. \n\nFurther, our particular levels of `time` - the specific days that we have measured - are not necessarily exchangeable, nor do we necessarily want to share information between these levels.\n\nIn this case, then, `time` would probably be best treated as a fixed rather than random effect. \n\nHowever, if we were not measuring a response variable that changes over time (like growth), that might change. If, for instance, we were investigating the relationship between light intensity and chlorophyll production in adult plants, then measuring across different time points would be a case of technical replication instead, and `time` would be best treated as a random effect. **The research question is key in making this decision.**\n:::\n\n:::\n\n### Wolves {#sec-exr_wolves}\n\n::: {.callout-exercise}\n\n\n{{< level 1 >}}\n\n\n\nAn ecologist is conducting a study to demonstrate how the presence of wolves in US national parks predicts the likelihood of flooding. For six different national parks across the country that contain rivers, they record the estimated wolf population, and the average number of centimetres by which the major river in the park overflows its banks, for the last 10 years - for a total of 60 observations.\n\nWhat's the relationship of interest? Is our total *n* really 60?\n\n![Predictor variables](images_mixed-effects/example3_1.png){fig-alt=\"Graphic with three variables listed: Wolf population, National park and Year.\"}\n\n::: {.callout-tip collapse=\"true\"}\n#### Answer\n\nThough we have 60 observations, it would of course be a case of pseudoreplication if we failed to understand the clustering within these data.\n\nWe have four things that vary: `wolf population`, `flood depth`, `national park` and `year`.\n\nWith `flood depth` as our response variable, we already know how to treat that. And by now, you've hopefully got the pattern that our continuous effect of interest `wolf population` will always have to be a fixed effect. \n\n![Fixed versus random effects](images_mixed-effects/example3_2.png){fig-alt=\"Graphic with three variables listed: Wolf population, National park and Year. 
Wolf population is now identified as a fixed effect, while National park and Year are identified as random effects.\"}\nBut there's also `year` and `national park` to contend with, and here, we likely want to treat both as random effects.\n\nWe have measured across several national parks, and over a 10 year period, in order to give us a large enough dataset for sufficient statistical power - these are technical replicates. But from a theoretical standpoint, the exact years and the exact parks that we've measured from, probably aren't that relevant. It's fine if we share information across these different levels.\n\nOf course, you might know more about ecology than me, and have a good reason to believe that the exact years *do* matter - that perhaps something fundamental in the relationship between `flood depth ~ wolf population` really does vary with year in a meaningful way. But given that our research question does not focus on change over time, both `year` and `national park` would be best treated as random effects given the information we currently have.\n:::\n\n:::\n\n## Summary\n\n::: {.callout-tip}\n#### Key points\n\n- A model with both fixed and random effects is referred to as a mixed effects model\n- Random effects are categorical variables, with 5+ levels, that represent non-independent \"clusters\" or \"groups\" within the data\n- Random effects are estimated by sharing information across levels/groups, which are typically chosen \"at random\" from a larger set of exchangeable levels\n- Whether a variable should be treated as a random effect depends both on the nature of the variable, and also the research question\n:::\n\n", "supporting": [ "random-effects_files" ], diff --git a/_freeze/materials/significance-and-model-comparison/execute-results/html.json b/_freeze/materials/significance-and-model-comparison/execute-results/html.json index 4f42461..eb136f4 100644 --- a/_freeze/materials/significance-and-model-comparison/execute-results/html.json +++ b/_freeze/materials/significance-and-model-comparison/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "95741e88625a8a900d618895aeda0ac9", + "hash": "1ad00af1da2563c23f80957866efc3d8", "result": { - "markdown": "---\ntitle: \"Significance & model comparison\"\noutput: html_document\n---\n\n\n\n::: {.cell}\n\n:::\n\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\nWe'll primarily be using the `lmerTest` package for performing certain types of significance tests. The `pbkrtest` package is also introduced.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(lmerTest)\nlibrary(pbkrtest)\n```\n:::\n\n:::\n\n## The problem \n\nUnlike standard linear models, p-values are not calculated automatically for a mixed effects model in `lme4`, as you may have noticed in the previous section of the materials. There is a little extra work and thought that goes into testing significance for these models.\n\nThe reason for this is the inclusion of random effects, and the way that random effects are estimated. When using partial pooling to estimate the random effects, there is no way to precisely determine the number of **degrees of freedom**. 
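\n\nYou can see this for yourself in the model output. As a quick sketch, here's the familiar `sleepstudy` model fitted with plain `lme4` (calling `lme4::lmer()` explicitly, so that `lmerTest` isn't involved; the object name is just for illustration):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(lme4)\n\ndata(\"sleepstudy\")\n\nlme_sleep_plain <- lme4::lmer(Reaction ~ Days + (1 + Days|Subject),\n data = sleepstudy)\n\n# the fixed effects table contains estimates, standard errors and t-values,\n# but no p-value column\nsummary(lme_sleep_plain)$coefficients\n```\n:::\n\n:::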
\n\nThis matters, because we need to know the degrees of freedom to calculate p-values in the way we usually do for a linear model (see the drop-down box below if you want a more detailed explanation for this).\n\n::: {.callout-note collapse=\"true\"}\n#### Degrees of freedom & p-values\n\nThe degrees of freedom in a statistical analysis refers to the number of observations in the dataset that are free to vary (i.e., free to take any value) once the necessary parameters have been estimated. This means that the degrees of freedom varies with both the sample size, and the complexity of the model you've fitted.\n\nWhy does this matter? Well, each test statistic (such as F, t, chi-square etc.) has its own distribution, from which we can derive the probability of that statistic taking a certain value. That's precisely what a p-value is: the probability of having collected a sample with this particular test statistic, if the null hypothesis were true. \n\nCrucially, the exact shape of this distribution is determined by the number of degrees of freedom. This means we need to know the degrees of freedom in order to calculate the correct p-value for each of our test statistics.\n:::\n\nHowever, when we fit a mixed effects model, we may still want to be able to discuss significance of a) our overall model and b) individual predictors within our model.\n\n## Overall model significance\n\nLikelihood ratio tests (LRTs) are used to compare goodness-of-fit, or deviance, between two models in order to produce p-values. They don't require us to know the degrees of freedom of those models.\n\nOne use of an LRT is to check the significance of our model as a whole, although we'll revisit the LRT in later sections of this page as well.\n\n::: {.callout-note collapse=\"true\"}\n#### What makes this test a \"likelihood ratio\"? \n\nRemember that mixed effects models are fitted by maximising their likelihood, which is defined as the joint probability of the sample given a particular set of parameters (i.e., how likely is it that this particular set of data points would occur, given a model with this equation?).\n\nEach distinct mixed model that is fitted to a given dataset therefore has its own value of likelihood. It will also, therefore, have its own value of deviance. Deviance is defined as the difference in log-likelihoods between a candidate model, and the hypothetical perfect \"saturated\" model for that dataset.\n\nSo, when we want to compare two models, we can calculate the ratio of their individual likelihoods (which is mathematically equivalent to the difference of their deviances, because of how logarithms work). This ratio can be thought of as a statistic in its own right, and approximately follows a chi-square distribution. \n\nTo determine whether this ratio is significantly different from 1, we calculate the degrees of freedom for the analysis - which is equal to the difference in the number of parameters between the two models we're comparing - to find the corresponding chi-square distribution, from which we can then calculate a p-value.\n:::\n\nLet's try this out on the trusty `sleepstudy` dataset. 
We create both our candidate model, `lme_sleep`, and a null model, `lm_null` (note that the null model contains no random effects, so it has to be fitted with the `lm` function rather than `lmer`).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"sleepstudy\")\n\nlme_sleep <- lmer(Reaction ~ Days + (1 + Days|Subject),\n data = sleepstudy)\n\nlm_null <- lm(Reaction ~ 1, data = sleepstudy)\n```\n:::\n\n:::\n\nThen, we use the old faithful `anova` function to compare our candidate model to the null model, by calling them one after the other. Note that we have to call our candidate model first; if you list the null model first, you'll get an error.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nanova(lme_sleep, lm_null)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: sleepstudy\nModels:\nlm_null: Reaction ~ 1\nlme_sleep: Reaction ~ Days + (1 + Days | Subject)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) \nlm_null 2 1965.0 1971.4 -980.52 1961.0 \nlme_sleep 6 1763.9 1783.1 -875.97 1751.9 209.11 4 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThis table gives us the $\\chi^2$ statistic (i.e., the likelihood ratio) and an associated p-value. Here, the $\\chi^2$ is large and the p-value small, meaning that our model is significantly better than the null.\n\nA helpful, intuitive way to think about this test is: for the increase in complexity of my candidate model (vs the null model), has the deviance of the model decreased significantly? Or: given the number of predictors in my model, has the goodness-of-fit improved significantly from the null?\n\n::: {.callout-note collapse=\"true\"}\n#### Refitting using ML (instead of ReML)\n\nNote the warning/information message R provides when we use the `anova` function this way: \"refitting model(s) with ML (instead of REML)\".\n\nR, or more specifically the `anova` function, has done something helpful for us here. For reasons that we won't go into too much (though, feel free to ask if you're curious!), we cannot use LRTs to compare models that have been fitted with the ReML method, even though this is the standard method for the `lme4` package. So we must refit the model with ML.\n\n(Incidentally, we could have chosen to fit the models manually with ML, if we'd wanted to. The `lmer` function takes an optional `REML` argument that we can set to FALSE - it's set to TRUE by default. But letting the `anova` function do it for us is much easier!)\n:::\n\n## Fixed effects\n\nIn addition to asking about the model as a whole, we often want to know about individual predictors. Because it's simpler, we'll talk about fixed predictors first.\n\nThere are multiple methods for doing this. We'll step through some of the most popular in a bit of detail:\n\n- Likelihood ratio tests\n- F-tests using approximations of degrees of freedom\n- t-to-z approximations (Wald tests)\n- Bootstrapping\n\n### Method 1: Likelihood ratio tests (LRTs)\n\nAs we mentioned above, LRTs are useful for comparing the model as a whole to the null - but they can also be used to investigate individual predictors.\n\nCrucially, we are only able to use this sort of test when one of the two models that we are comparing is a \"simpler\" version of the other, i.e., one model has a subset of the parameters of the other model. 
\n\nSo while we could perform an LRT just fine between two models `Y ~ A + B + C` and `Y ~ A + B + C + D`, to investigate the effect of `D`, or between any model and the null (`Y ~ 1`), we would not be able to use this test to compare `Y ~ A + B + C` and `Y ~ A + B + D`.\n\n![Two ways to use likelihood ratio tests](images_mixed-effects/LRT_schematic.png){width=70%}\n\nLet's use an LRT to test the fixed effect of `Days` in our `sleepstudy` example. First, we'll fit a random-effects-only model (we do this by replacing `Days` with `1`, to indicate no fixed effects).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_sleep_random <- lmer(Reaction ~ 1 + (1 + Days|Subject),\n data = sleepstudy)\n```\n:::\n\n:::\n\nThen we use `anova` to compare them, again putting our more complex model first.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nanova(lme_sleep, lme_sleep_random)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: sleepstudy\nModels:\nlme_sleep_random: Reaction ~ 1 + (1 + Days | Subject)\nlme_sleep: Reaction ~ Days + (1 + Days | Subject)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) \nlme_sleep_random 5 1785.5 1801.4 -887.74 1775.5 \nlme_sleep 6 1763.9 1783.1 -875.97 1751.9 23.537 1 1.226e-06 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThis output tells us that, for the reduction in the number of parameters (i.e., removing `Days`), the difference in deviances is significantly big. In other words, a fixed effect of `Days` is meaningful and useful when predicting reaction times.\n\n### Method 2: Approximation of the degrees of freedom\n\nThis method is perhaps the most intuitive for those coming from a linear modelling background. Put simply, it involves making an educated guess about the degrees of freedom with some formulae, and then deriving a p-value as we usually would. \n\nThis lets us obtain p-values for any t- and F-values that are calculated, with just the one extra step compared to what we're used to with linear models.\n\nFor this approach, we will use the companion package to `lme4`, a package called `lmerTest`.\n\n::: {.callout-note collapse=\"true\"}\n#### lmerTest\n\nThe package provides an \"updated\" version of the `lmer()` function, one that can approximate the number of degrees of freedom, and thus provide estimated p-values.\n\nIf you have `lmerTest` loaded, R will automatically default to its updated version of the `lmer()` function, and perform the degrees of freedom approximation as standard. (You can prevent it from doing so by typing `lme4::lmer()` instead.)\n:::\n\nLet's look again at our random slopes & intercepts model for the `sleepstudy` dataset as a test case. We'll refit the model once we've loaded the new package.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(lmerTest)\n\nlme_sleep <- lmer(Reaction ~ Days + (1 + Days|Subject),\n data = sleepstudy)\n```\n:::\n\n:::\n\nThe new version of the `lmer` function fits a very similar model object to before, except now it contains the outputs of a number of calculations that are required for the degrees of freedom approximation. 
By default, `lmerTest` uses the Satterthwaite approximation, which is appropriate for mixed models that are fitted using either MLE or ReML, making it pretty flexible.\n\nWe'll use the `anova` function from the `lmerTest` package to produce an analysis of variance table (R will default to using this version of the function unless told otherwise). This gives us an estimate for the F-statistic and associated p-value for our fixed effect of `Days`:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nanova(lme_sleep)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nType III Analysis of Variance Table with Satterthwaite's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \nDays 30031 30031 1 17 45.853 3.264e-06 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\n::: {.callout-note collapse=\"true\"}\n#### F-statistics vs t-statistics\n\nIf you were to look at the summary for our new `lme_sleep` model, you'd notice some t-statistics and p-values appearing next to the fixed effects. These are **not quite the same** as the F-statistics and p-values that we've extracted using the `anova` function.\n\nIn fact, this odd distinction between t-statistics and F-statistics is not unique to mixed models; you might remember it from linear modelling. The t-statistics are what we call \"Wald tests\" (more coming up on those in the next section) and test the null hypothesis that the coefficient $\\beta = 0$ for that predictor. This might not sound *too* dissimilar from what an analysis of variance F-test is assessing - and for continuous predictors, the result is usually very similar. But for a categorical predictor, you will see separate Wald tests for each pairwise comparison against the reference group, while you would only see a single F-statistic for the lot.\n:::\n\n#### Using the Kenward-Roger approximation\n\nAlthough the Satterthwaite approximation is the `lmerTest` default, another option called the Kenward-Roger approximation also exists. It's less popular than Satterthwaite because it's a bit less flexible (it can only be applied to models fitted with ReML). \n\nIf you wanted to switch to the Kenward-Roger approximation, you can do it easily by specifying the `ddf` argument:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nanova(lme_sleep, ddf = \"Kenward-Roger\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nType III Analysis of Variance Table with Kenward-Roger's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \nDays 30031 30031 1 17 45.853 3.264e-06 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nIn reality, though, chances are that you'll just stick with the Satterthwaite default if you plan to use approximations for your own analyses. Statisticians have debated the relative merits of Satterthwaite versus Kenward-Roger, but the differences only really tend to emerge under specific conditions. Here, it's given us the same result.\n\n### Method 3: t-to-z approximations\n\nThis is a more unusual method, and another form of approximation. You'll see this less often, but it's included here for completeness. \n\nThis method involves making use of the Wald t-values, which are reported as standard in the `lme4` output.\n\nSpecifically, we can choose to treat these t-values as if they were z-scores instead, if our sample size is considered large enough. 
And, because z-scores are standardised, we don't need any degrees of freedom information to derive a p-value - we can just read them directly out of a table (or get R to do it for us).\n\n::: {.callout-note collapse=\"true\"}\n#### The logic of using z-scores instead\n\nA z-score is different from a statistic such as t or F. They're standardised, because they're measured in standard deviations - i.e., a z-score of 1.3 tells you that you are 1.3 standard deviations away from the mean. \n\nThis is helpful for deriving a p-value without degrees of freedom, but it raises the question: why is it okay to treat t-values as z-scores? \n\nThe logic here is that the t distribution actually begins to approximate (i.e., match up with) the z distribution as the sample size increases. Officially, when the sample size is infinite, the two distributions are identical. So, with a sufficiently large sample size, we can \"pretend\" or \"imagine\" that the Wald t-values are actually z-distributed, giving us p-values. \n:::\n\nUnfortunately, there are no formal guidelines to tell you whether your dataset is \"large enough\" to do this. It will depend on the number and type of predictors in your model. Plus, the t-to-z approximation is considered to be \"anti-conservative\" - in other words, there's a higher chance of false positives than with other methods.\n\nSome researchers adapt the t-to-z approximation approach a little to help with this; instead of explicitly calculating p-values, they instead use a rule of thumb that any Wald t-value greater than 2 is large enough to be considered significant. This is quite a strict threshold, so it can help to filter out some of the false positives or less convincing results.\n\nCalculating the p-value for a z-score can be done quickly in R using the `pnorm` function. We include the z-score (or, here, the t-value that we are treating as a z-score) as the value for our argument `q`. To make this a two-tailed test, we have to set `lower.tail` to FALSE, and multiply the answer by 2.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nsummary(lme_sleep)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: Reaction ~ Days + (1 + Days | Subject)\n Data: sleepstudy\n\nREML criterion at convergence: 1743.6\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-3.9536 -0.4634 0.0231 0.4634 5.1793 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr\n Subject (Intercept) 612.10 24.741 \n Days 35.07 5.922 0.07\n Residual 654.94 25.592 \nNumber of obs: 180, groups: Subject, 18\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 251.405 6.825 17.000 36.838 < 2e-16 ***\nDays 10.467 1.546 17.000 6.771 3.26e-06 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\nDays -0.138\n```\n:::\n\n```{.r .cell-code}\n2*pnorm(q = 6.771, lower.tail = FALSE)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1.278953e-11\n```\n:::\n:::\n\n:::\n\nIf we input the t-value for our `Days` fixed effect, we can see that it gives us a very small p-value. This p-value of 1.28 x 10^-11^ is quite a bit smaller than the one that our Satterthwaite degrees of freedom approximation provided (3.26 x 10^-6^) - an example of how this t-to-z approximation is more generous. 
However, in this case it's very clear that the `Days` effect definitely is significant, whichever way we test it, so it's perhaps not a concern.\n\n### Method 4: Bootstrapping\n\nNow, we get a little bit more technical. \n\nEntire pages of course materials could be dedicated to bootstrapping and simulation methods. These ideas go well beyond linear mixed models. But, now is not the time for all that.\n\nWe're going to look at one implementation of bootstrapping for mixed models, as an example, but if you're curious then a good place to start follow-up reading is [this excellent resource](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#testing-significance-of-random-effects).\n\nThe specific option we'll look at is performing parametric bootstrapping via the `PBmodcomp` function from the `pbkrtest` package.\n\nThis method involves:\n\n1. Simulating a bunch of datasets (specifically, based on the \"reduced\" or less complex model)\n2. For each simulated dataset, fit both models\n3. For each simulated dataset, compute the difference in deviances between the two models, to provide a distribution of differences in deviances\n4. Compare this distribution to the actual/observed difference in deviances\n\nThe syntax is very similar to the `anova` function, but you also set a seed. \n\n(This is something that's often done when simulating in general; it ensures that each time you run the code, you'll get the same set of numbers, so long as you use the same seed. You can choose whatever number you like.)\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_sleep, lme_sleep_random, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 35.46 sec; samples: 1000; extremes: 0;\nRequested samples: 1000 Used samples: 998 Extremes: 0\nlarge : Reaction ~ Days + (1 + Days | Subject)\nReaction ~ 1 + (1 + Days | Subject)\n stat df p.value \nLRT 23.537 1 1.226e-06 ***\nPBtest 23.537 0.001001 ** \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nIt takes several seconds, because running 1000 simulations and fitting 2000 models isn't instantaneous. You may also get a bunch of warnings (they've been suppressed here for these course materials, but don't be alarmed if they appear for you when running this example).\n\nBut, as you can see, the p-value it produces is not necessarily the same as the one produced by a standard LRT.\n\n### Choosing the \"right\" method\n\nSeveral methods have been discussed here. Lots of researchers favour either the F-tests by degrees of freedom approximation, or the likelihood ratio test (LRT) for fixed effects, because they're relatively easy to implement - hence why we've spent slightly more time on them. \n\nIf we had to choose, we personally favour the LRT, because it's generalisable to any type of model that's fitted with maximum likelihood estimation, making it a very useful addition to a researcher's statistical toolkit.\n\nThose with more coding or theoretical background, however, might feel strongly that bootstrapping is always a more appropriate method for deriving p-values. And they might well be right. There are no strict answers once we get this far beyond the standard linear model.\n\nIt's worth noting that there's nothing stopping you from using more than one approach when it comes to testing your own models, and \"triangulating\" the results to help you determine how robust your conclusions are. 
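\n\nIf you're curious what the `anova` function is actually doing when it performs an LRT, here's a minimal sketch that reproduces the test by hand for the `Days` fixed effect. This is purely illustrative: the `_ml` object names are made up for this example, and we refit both models with maximum likelihood ourselves (via the `REML = FALSE` argument mentioned earlier) rather than letting `anova` do it.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# refit both models with ML (illustrative object names)\nlme_sleep_ml <- lmer(Reaction ~ Days + (1 + Days|Subject),\n data = sleepstudy, REML = FALSE)\nlme_sleep_random_ml <- lmer(Reaction ~ 1 + (1 + Days|Subject),\n data = sleepstudy, REML = FALSE)\n\n# the likelihood ratio statistic: twice the difference in log-likelihoods\nchisq_stat <- as.numeric(2 * (logLik(lme_sleep_ml) - logLik(lme_sleep_random_ml)))\n\n# degrees of freedom: the difference in the number of parameters\ndf_diff <- attr(logLik(lme_sleep_ml), \"df\") - attr(logLik(lme_sleep_random_ml), \"df\")\n\n# p-value from the corresponding chi-square distribution\npchisq(chisq_stat, df = df_diff, lower.tail = FALSE)\n```\n:::\n\n:::\n\nThe statistic, degrees of freedom and p-value should match (up to rounding) the `Chisq`, `Df` and `Pr(>Chisq)` columns in the `anova` output above.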
\n\n## Random effects\n\nWith fixed effects under our belt, let's now move to thinking about random effects.\n\nThere is a broader philosophical question to be asked here: what does it even mean for a random effect to be \"significant\"?\n\nRemember that a random effect is not a single coefficient. It's a measure of the distribution across a set of clusters or groups. Quite often, we include a random effect simply to account for it, to better represent our design, not because we want to treat it as a \"predictor\" in the traditional sense.\n\nPerhaps a better way to think about it is: **is my model better with or without this random effect?**\n\nOr even: **is there a need to test significance at all?**\n\nWe'll talk through a few different approaches:\n\n- Using LRTs (with caveats)\n- Using AIC/BIC (also with caveats)\n- Bootstrapping\n- Not testing at all\n\n### Method 1: Using LRTs\n\nThe most common method that you'll see used for judging whether random effects improve a model is the trusty LRT.\n\n::: {.callout-warning collapse=\"true\"}\n#### The major caveat with LRTs for random effects\n\nThough you'll see LRTs used often for random effects, *technically* this doesn't provide great estimates.\n\nWhen we run such a test, we're essentially asking whether the variance of our chosen random effect is equal to zero (i.e., our null hypothesis is $\\sigma^2 = 0$). But, as a statistician might point out, 0 is \"on the boundary of the feasible space\" - in other words, 0 is the lowest possible value that the variance could ever be.\n\nBecause of this, the various approximations to distributions that we rely on for the maths of an LRT to work, kind of break down. The result is that the p-values calculated for LRTs are very conservative, i.e., too large/strict.\n\nIn the simplest case, testing simple random effects one at a time, the p-value is approximately twice as large as it should be. And the problem gets worse when testing multiple correlated random effects (see bonus materials for more info on these correlations).\n\nThis doesn't stop people using them for this purpose, and it doesn't have to stop you. But it's something you should really be aware of if you choose this method.\n:::\n\nThe approach is much the same as for fixed effects: construct two nested models, with and without the effect of interest. \n\nThen, use the `anova` function to perform the LRT.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_sleep_intercepts <- lmer(Reaction ~ Days + (1|Subject),\n data = sleepstudy)\n\nanova(lme_sleep, lme_sleep_intercepts)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: sleepstudy\nModels:\nlme_sleep_intercepts: Reaction ~ Days + (1 | Subject)\nlme_sleep: Reaction ~ Days + (1 + Days | Subject)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_sleep_intercepts 4 1802.1 1814.8 -897.04 1794.1 \nlme_sleep 6 1763.9 1783.1 -875.97 1751.9 42.139 2 7.072e-10\n \nlme_sleep_intercepts \nlme_sleep ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nOnce again, there is a significant difference between the two models, as seen by our small p-value. 
This tells us that the random slopes of `Days|Subject` are meaningful, and make a difference in our model.\n\nYou can even use the `anova` function to compare models with and without random effects entirely, i.e., compare a linear mixed model to a linear model. \n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlm_sleep <- lm(Reaction ~ Days, data = sleepstudy)\n\nanova(lme_sleep, lm_sleep)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: sleepstudy\nModels:\nlm_sleep: Reaction ~ Days\nlme_sleep: Reaction ~ Days + (1 + Days | Subject)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) \nlm_sleep 3 1906.3 1915.9 -950.15 1900.3 \nlme_sleep 6 1763.9 1783.1 -875.97 1751.9 148.35 3 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nMake sure you call the linear mixed model (i.e., the more complex model) first, because you will get an error if you put the two models the wrong way around here.\n\n### Method 2: AIC/BIC values\n\nSome researchers use model comparison procedures, such as stepwise elimination, to decide whether to keep or drop certain random effects from their models.\n\nAs you may have noticed in the outputs from all of the LRTs above, the `anova` function automatically provides Akaike information criterion (AIC) and Bayesian information criterion (BIC) values for the different nested models.\n\nFor instance, when comparing `lm_sleep` and `lme_sleep` above, we can see that the linear model has larger AIC/BIC values (and greater deviance, i.e., worse goodness-of-fit) than the linear mixed model with our random slopes & intercepts in it.\n\n::: {.callout-warning collapse=\"true\"}\n#### The same caveat as with LRTs\n\nUsing AIC/BIC to make decisions about random effects is subject to **the same caveat as for LRTs**: the values you get for these information criteria end up being overly conservative.\n\nIn other words, AIC/BIC values can give an underestimation of the importance or use of a random effect in a linear mixed model, perhaps leading you to drop it even if it's helpful.\n:::\n\n### Method 3: Bootstrapping\n\nAs we did above for the fixed effects, we can use parametric bootstrapping to investigate random effects.\n\nIt works in exactly the same way: feed in two models, one with and one without the random effect that you're interested in testing, and don't forget to pick a value to set the seed.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_sleep, lme_sleep_intercepts, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 28.75 sec; samples: 1000; extremes: 0;\nlarge : Reaction ~ Days + (1 + Days | Subject)\nReaction ~ Days + (1 | Subject)\n stat df p.value \nLRT 42.139 2 7.072e-10 ***\nPBtest 42.139 0.000999 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nOnce again, you may get a long list of warnings as it simulates and fits models to a bunch of different datasets.\n\n### Method 4: Not testing at all\n\nThis might seem like a bit of an odd concept, especially placed where it is at the end of a page all about significance testing.\n\nAnd of course, we're not advocating for throwing all the possible random effects into an overly complicated model and just accepting whatever numbers fall out. 
You're still aiming for parsimony, and your model should still represent what's actually going on in your experimental design.\n\nBut, many people - including those with far more experience in mixed models than us - argue that you shouldn't drop a random effect simply because a p-value or AIC/BIC value tells you so. If that random effect is truly important in representing the design and structure of your dataset, then your model is better served by containing it. \n\nIn other words, it's meaningful because of the experimental design, not because of the numbers that come out of your model.\n\nThis philosophical stance is particularly applicable in situations where you're including random effects simply to account for the hierarchical, non-independent structure in your data, because you're interested in the overall or average trends.\n\n::: {.callout-note}\n#### A final thing to add...\n\nSome of the people who take this stance (including authors of some of the packages we've used) might argue that significance is no more important, or is even less important, than the *uncertainty* of the random effects. How confident are we that we've estimated the variance correctly? What are the confidence intervals within which the variance falls?\n\nNow, that really is a can of worms we're not going to open here, but you might be interested to know that packages exist for computing these confidence intervals; `lme4` even comes with a function for it.\n\nIf you're curious, you could start some follow-up reading [here](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#inference-and-confidence-intervals).\n:::\n\n## Exercises\n\n### Exercise 1 - Dragons revisited\n\n\n{{< level 2 >}}\n\n\n\nLet's return to the dataset from our previous example, our dragons dataset.\n\nPreviously, we fitted a mixed model to this dataset that included response variable `intelligence`, fixed effects of `wingspan`, `scales` and `wingspan:scales`, and two random effects: random intercepts `1|mountain`, and random slopes for `wingspan|mountain`.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndragons <- read_csv(\"data/dragons.csv\")\n\nlme_dragons <- lmer(intelligence ~ wingspan*scales + (1 + wingspan|mountain), \n data = dragons)\n```\n:::\n\n:::\n\nUse likelihood ratio tests to assess:\n\n- whether the model above is significant versus the null model\n- whether the fixed effects are significant\n\nIf you're feeling adventurous, you can also:\n\n- use LRTs, AIC and/or bootstrapping to assess the random effects, and compare the results\n- use other methods to assess the significance of the fixed effects, and compare the results\n\n::: {.callout-note collapse=\"true\"}\n#### Worked answer\n\nLet's start by using an LRT to test the overall significance of our model. We'll construct a null model, and then use `anova` to compare it to our model.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_dragons_null <- lm(intelligence ~ 1, data = dragons)\n\nanova(lme_dragons, lme_dragons_null)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: dragons\nModels:\nlme_dragons_null: intelligence ~ 1\nlme_dragons: intelligence ~ wingspan * scales + (1 + wingspan | mountain)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) \nlme_dragons_null 2 1997.0 2003.6 -996.51 1993.0 \nlme_dragons 8 1647.8 1674.2 -815.92 1631.8 361.18 6 < 2.2e-16 ***\n---\nSignif. 
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nIt's significant. Something in our model is doing something helpful. A really good start!\n\nNext, we'll use LRTs to test the significance of our individual fixed effects. \n\nWe'll start with the interaction. To test this, we'll build an additive model, and compare it to our original full model. For the models to be comparable, we'll keep the random effects structure the same.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_dragons_dropx <- lmer(intelligence ~ wingspan + scales + (1 + wingspan|mountain), \n data = dragons)\n\nanova(lme_dragons, lme_dragons_dropx)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: dragons\nModels:\nlme_dragons_dropx: intelligence ~ wingspan + scales + (1 + wingspan | mountain)\nlme_dragons: intelligence ~ wingspan * scales + (1 + wingspan | mountain)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_dragons_dropx 7 1647.2 1670.3 -816.60 1633.2 \nlme_dragons 8 1647.8 1674.2 -815.92 1631.8 1.3648 1 0.2427\n```\n:::\n:::\n\n:::\n\nThe test isn't significant. This tells us that the `wingspan:scales` interaction wasn't doing anything meaningful in this model.\n\nNow, we're going to test the main effects of `scales` and `wingspan` by constructing two new models and comparing them to our additive model. (In this way, we're performing something a little bit like a stepwise elimination procedure.)\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nFirst, we'll test the main effect of `scales`, by comparing our additive model to a model with `scales` dropped; we'll then do the same for `wingspan`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_dragons_dropscale <- lmer(intelligence ~ wingspan + (1 + wingspan|mountain), \n data = dragons)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning in checkConv(attr(opt, \"derivs\"), opt$par, ctrl = control$checkConv, :\nunable to evaluate scaled gradient\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning in checkConv(attr(opt, \"derivs\"), opt$par, ctrl = control$checkConv, :\nModel failed to converge: degenerate Hessian with 1 negative eigenvalues\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Model failed to converge with 1 negative eigenvalue: -2.6e+00\n```\n:::\n\n```{.r .cell-code}\nanova(lme_dragons_dropx, lme_dragons_dropscale)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: dragons\nModels:\nlme_dragons_dropscale: intelligence ~ wingspan + (1 + wingspan | mountain)\nlme_dragons_dropx: intelligence ~ wingspan + scales + (1 + wingspan | mountain)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_dragons_dropscale 6 1673.6 1693.3 -830.78 1661.6 \nlme_dragons_dropx 7 1647.2 1670.3 -816.60 1633.2 28.359 1 1.008e-07\n \nlme_dragons_dropscale \nlme_dragons_dropx ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n```\n:::\n\n```{.r .cell-code}\nlme_dragons_dropwing <- lmer(intelligence ~ scales + (1 + wingspan|mountain), \n data = dragons)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning in checkConv(attr(opt, \"derivs\"), opt$par, ctrl = control$checkConv, :\nModel failed to converge with max|grad| = 0.003579 (tol = 0.002, component 1)\n```\n:::\n\n```{.r .cell-code}\nanova(lme_dragons_dropx, lme_dragons_dropwing)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: dragons\nModels:\nlme_dragons_dropwing: intelligence ~ scales + (1 + wingspan | mountain)\nlme_dragons_dropx: intelligence ~ wingspan + scales + (1 + wingspan | mountain)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_dragons_dropwing 6 1653.5 1673.2 -820.73 1641.5 \nlme_dragons_dropx 7 1647.2 1670.3 -816.60 1633.2 8.2604 1 0.004052\n \nlme_dragons_dropwing \nlme_dragons_dropx **\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nBoth of these tests come out as significant. This suggests that both fixed effects for `wingspan` and `scales` are making meaningful contributions to our model.\n\nComfortingly, this aligns with what we see in an analysis of variance table using a Satterthwaite degrees of freedom approximation, which shows overall that there seem to be main effects though no significant interaction. The p-values are not the same - we wouldn't expect them to be, they're calculated very differently - but it's a relief that the overall effect is robust across methods:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nanova(lme_dragons)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nType III Analysis of Variance Table with Satterthwaite's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \nwingspan 3059.90 3059.90 1 3.992 16.8644 0.01483 * \nscales 1923.44 1923.44 1 188.766 10.6008 0.00134 **\nwingspan:scales 242.84 242.84 1 188.380 1.3384 0.24878 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nWe would draw the same overall conclusion using t-to-z approximations as well (using the t-values, extracted from the output of the `summary` function). 
Excellent news.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nThe interaction term:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n2*pnorm(q = abs(-1.157), lower.tail = FALSE) # interaction term\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.2472724\n```\n:::\n:::\n\n\nThe main effect of scales:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n2*pnorm(q = 3.256, lower.tail = FALSE) # scales main effect\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.001129938\n```\n:::\n:::\n\n\nThe main effect of wingspan:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n2*pnorm(q = 4.244, lower.tail = FALSE) # wingspan main effect\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2.195704e-05\n```\n:::\n:::\n\n:::\n\nAnd finally, you can check the results from a parametric bootstrap (once again, the warnings have been suppressed here), which yet again agree with the prior tests:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_dragons, lme_dragons_dropx, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 35.74 sec; samples: 1000; extremes: 264;\nRequested samples: 1000 Used samples: 980 Extremes: 264\nlarge : intelligence ~ wingspan * scales + (1 + wingspan | mountain)\nintelligence ~ wingspan + scales + (1 + wingspan | mountain)\n stat df p.value\nLRT 1.3653 1 0.2426\nPBtest 1.3653 0.2701\n```\n:::\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_dragons_dropx, lme_dragons_dropscale, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 33.89 sec; samples: 1000; extremes: 0;\nRequested samples: 1000 Used samples: 989 Extremes: 0\nlarge : intelligence ~ wingspan + scales + (1 + wingspan | mountain)\nintelligence ~ wingspan + (1 + wingspan | mountain)\n stat df p.value \nLRT 28.364 1 1.005e-07 ***\nPBtest 28.364 0.00101 ** \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_dragons_dropx, lme_dragons_dropwing, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 33.89 sec; samples: 1000; extremes: 17;\nRequested samples: 1000 Used samples: 829 Extremes: 17\nlarge : intelligence ~ wingspan + scales + (1 + wingspan | mountain)\nintelligence ~ scales + (1 + wingspan | mountain)\n stat df p.value \nLRT 8.2598 1 0.004053 **\nPBtest 8.2598 0.021687 * \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nOn the basis of all of these results, you might choose to refine your model slightly, eliminating the unhelpful `wingspan:scales` interaction and making `lme_dragons_dropx` the working minimal model.\n\nWe can visualise that like so:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(dragons, aes(x = wingspan, y = intelligence, colour = scales)) +\n facet_wrap(vars(mountain)) +\n geom_point() +\n geom_line(data = augment(lme_dragons_dropx), aes(y = .fitted))\n```\n\n::: {.cell-output-display}\n![](significance-and-model-comparison_files/figure-html/unnamed-chunk-24-1.png){width=672}\n:::\n:::\n\n:::\n\nWhat about the random effects, then?\n\nLet's test them first with LRTs (and AIC/BIC).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe construct two new models, one with each of the random effects dropped. 
We keep the fixed effects structure the same, so that the models are otherwise comparable.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_dragons_dropslope <- lmer(intelligence ~ wingspan*scales + (1|mountain), \n data = dragons)\n\nlme_dragons_dropint <- lmer(intelligence ~ wingspan*scales + (0 + wingspan|mountain), \n data = dragons)\n```\n:::\n\n\nThen, we use the `anova` function to compare:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nanova(lme_dragons, lme_dragons_dropint)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: dragons\nModels:\nlme_dragons_dropint: intelligence ~ wingspan * scales + (0 + wingspan | mountain)\nlme_dragons: intelligence ~ wingspan * scales + (1 + wingspan | mountain)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_dragons_dropint 6 1643.9 1663.7 -815.95 1631.9 \nlme_dragons 8 1647.8 1674.2 -815.92 1631.8 0.0691 2 0.9661\n```\n:::\n\n```{.r .cell-code}\nanova(lme_dragons, lme_dragons_dropslope)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: dragons\nModels:\nlme_dragons_dropslope: intelligence ~ wingspan * scales + (1 | mountain)\nlme_dragons: intelligence ~ wingspan * scales + (1 + wingspan | mountain)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_dragons_dropslope 6 1737.9 1757.7 -862.95 1725.9 \nlme_dragons 8 1647.8 1674.2 -815.92 1631.8 94.057 2 < 2.2e-16\n \nlme_dragons_dropslope \nlme_dragons ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThese results would seem to suggest that the random slopes are significant, but the random intercepts are not. \n\nThis is borne out by the change in information criteria values. When we remove `1|mountain`, both AIC and BIC decrease (by 3.9 and 10.5 respectively), suggesting improvement in the model quality - remember that lower values are better for these criteria. In contrast, when we remove `wingspan|mountain`, both AIC and BIC increase by a large amount (by 90.1 and 83.5 respectively), suggesting we have worsened the quality of the model.\n\nBut, we know that the LRT p-values and AIC/BIC values for random effects aren't always great, so let's compare to a parametric bootstrap just to be sure.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_dragons, lme_dragons_dropint, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 29.92 sec; samples: 1000; extremes: 638;\nRequested samples: 1000 Used samples: 737 Extremes: 638\nlarge : intelligence ~ wingspan * scales + (1 + wingspan | mountain)\nintelligence ~ wingspan * scales + (0 + wingspan | mountain)\n stat df p.value\nLRT 0.0691 2 0.9661\nPBtest 0.0691 0.8659\n```\n:::\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_dragons, lme_dragons_dropslope, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 29.09 sec; samples: 1000; extremes: 0;\nRequested samples: 1000 Used samples: 998 Extremes: 0\nlarge : intelligence ~ wingspan * scales + (1 + wingspan | mountain)\nintelligence ~ wingspan * scales + (1 | mountain)\n stat df p.value \nLRT 94.057 2 < 2.2e-16 ***\nPBtest 94.057 0.001001 ** \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThe p-values are indeed different, but not different enough to change our conclusions. 
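\n\nBefore making a decision based purely on p-values or AIC/BIC, it can also be worth looking at the estimated variance components themselves. A minimal sketch, using functions that come with `lme4` (the commented-out `confint` line computes profile confidence intervals, which can be slow and may produce warnings for a model like this one):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\n# estimated standard deviations/correlations for the random effects and residual\nVarCorr(lme_dragons)\n\n# the same information as a data frame, if that's easier to work with\nas.data.frame(VarCorr(lme_dragons))\n\n# optional: profile confidence intervals for all parameters\n# confint(lme_dragons)\n```\n:::\n\n:::\n\nIf the intercept variance looks small relative to the slope and residual variances, that's consistent with the tests above - but, as discussed earlier on this page, the numbers don't have to be the only deciding factor.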
\n\nHowever: we would likely want to be cautious about dropping the random intercepts from this model. What does a random slopes-only model mean in biological terms? In this instance, it would suggest that all `mountain` ranges have the same baseline `intelligence` level when `wingspan` is very small/zero, but the rate of change based on their size (`intelligence ~ wingspan`) does vary between ranges. \n\nIs this biologically plausible? We're not tracking dragons across multiple time points here, so we can't say for sure, but this could reflect dragons in some mountain ranges learning more quickly as they grow than dragons elsewhere due to better schools, in which case it might be plausible that they're all born with the same baseline `intelligence`. But it could also reflect different species of dragon living in each mountain range, in which case, it's very plausible that `intelligence` on average could vary between ranges (even if we're not observing it in this particular dataset).\n\nDo we need to reduce the number of random parameters in our model? Our dataset is not huge, for the number of variables we're testing. But our additive `lme_dragons_dropx` model with both random effects is converging sensibly. It might not be necessary.\n\n:::\n\n### Exercise 2 - Irrigation revisited\n\n\n{{< level 2 >}}\n\n\n\nOnce again, we'll return to a dataset from the previous section of the course, and the model we fitted to it.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nirrigation <- read_csv(\"data/irrigation.csv\")\n\nlme_yield <- lmer(yield ~ irrigation*variety + (1|field), data = irrigation)\n```\n:::\n\n:::\n\nCompare and contrast the results from likelihood ratio tests and other methods, to assess:\n\n- the significance of the model overall\n- the significance/usefulness of the fixed predictors\n\nThere's no worked answer for this exercise, but you can use the code from the `sleepstudy` and `dragons` examples to scaffold your work.\n\nConsider also the random intercepts. If an LRT or bootstrap indicated that the random effect wasn't significant, would you drop the intercepts from the model? Why/why not? 
Feel free to chat to a neighbour or trainer to help make your decision.\n\n## Summary\n\nThis section showcases multiple methods of performing significance testing and model comparison for mixed effects models - but also introduces a broader debate as to when and how significance testing is actually useful for this type of model.\n\nIf you're interested in doing further reading on the different methods for significance testing, then [this article](https://link.springer.com/article/10.3758/s13428-016-0809-y) has a nice comparison of the methods discussed above, including how they perform in terms of type I (false positive) error rates.\n\n::: {.callout-tip}\n#### Key Points\n\n- Calculating p-values for mixed effects models is tricky, and must be done differently to a standard linear model, because there is no precise number of degrees of freedom\n- For fixed effects, p-values can be calculated using F-tests with approximations of degrees of freedom, likelihood ratio tests, t-to-z approximations or bootstrapping\n- For random effects, options are more limited to likelihood ratio tests or bootstrapping methods\n- AIC/BIC values and stepwise elimination procedures can also be used to provide information about fixed and/or random effects in a linear mixed model, and to aid with model comparison\n- Likelihood ratio tests and AIC/BIC values in particular rely heavily on the concept of deviance (goodness-of-fit)\n:::\n\n", + "markdown": "---\ntitle: \"Significance & model comparison\"\noutput: html_document\n---\n\n\n\n::: {.cell}\n\n:::\n\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\nWe'll primarily be using the `lmerTest` package for performing certain types of significance tests. The `pbkrtest` package is also introduced.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(lmerTest)\nlibrary(pbkrtest)\n```\n:::\n\n:::\n\n## The problem \n\nUnlike standard linear models, p-values are not calculated automatically for a mixed effects model in `lme4`, as you may have noticed in the previous section of the materials. There is a little extra work and thought that goes into testing significance for these models.\n\nThe reason for this is the inclusion of random effects, and the way that random effects are estimated. When using partial pooling to estimate the random effects, there is no way to precisely determine the number of **degrees of freedom**. \n\nThis matters, because we need to know the degrees of freedom to calculate p-values in the way we usually do for a linear model (see the drop-down box below if you want a more detailed explanation for this).\n\n::: {.callout-note collapse=\"true\"}\n#### Degrees of freedom & p-values\n\nThe degrees of freedom in a statistical analysis refers to the number of observations in the dataset that are free to vary (i.e., free to take any value) once the necessary parameters have been estimated. This means that the degrees of freedom varies with both the sample size, and the complexity of the model you've fitted.\n\nWhy does this matter? Well, each test statistic (such as F, t, chi-square etc.) has its own distribution, from which we can derive the probability of that statistic taking a certain value. That's precisely what a p-value is: the probability of having collected a sample with this particular test statistic, if the null hypothesis were true. \n\nCrucially, the exact shape of this distribution is determined by the number of degrees of freedom. 
This means we need to know the degrees of freedom in order to calculate the correct p-value for each of our test statistics.\n:::\n\nHowever, when we fit a mixed effects model, we may still want to be able to discuss significance of a) our overall model and b) individual predictors within our model.\n\n## Overall model significance\n\nLikelihood ratio tests (LRTs) are used to compare goodness-of-fit, or deviance, between two models in order to produce p-values. They don't require us to know the degrees of freedom of those models.\n\nOne use of an LRT is to check the significance of our model as a whole, although we'll revisit the LRT in later sections of this page as well.\n\n::: {.callout-note collapse=\"true\"}\n#### What makes this test a \"likelihood ratio\"? \n\nRemember that mixed effects models are fitted by maximising their likelihood, which is defined as the joint probability of the sample given a particular set of parameters (i.e., how likely is it that this particular set of data points would occur, given a model with this equation?).\n\nEach distinct mixed model that is fitted to a given dataset therefore has its own value of likelihood. It will also, therefore, have its own value of deviance. Deviance is defined as the difference in log-likelihoods between a candidate model, and the hypothetical perfect \"saturated\" model for that dataset.\n\nSo, when we want to compare two models, we can calculate the ratio of their individual likelihoods (which is mathematically equivalent to the difference of their deviances, because of how logarithms work). This ratio can be thought of as a statistic in its own right, and approximately follows a chi-square distribution. \n\nTo determine whether this ratio is significantly different from 1, we calculate the degrees of freedom for the analysis - which is equal to the difference in the number of parameters between the two models we're comparing - to find the corresponding chi-square distribution, from which we can then calculate a p-value.\n:::\n\nLet's try this out on the trusty `sleepstudy` dataset. We create both our candidate model, `lme_sleep`, and a null model, `lm_null` (note that the null model, which contains no random effects, has to be fitted with the `lm` function rather than `lmer`).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(\"sleepstudy\")\n\nlme_sleep <- lmer(Reaction ~ Days + (1 + Days|Subject),\n data = sleepstudy)\n\nlm_null <- lm(Reaction ~ 1, data = sleepstudy)\n```\n:::\n\n:::\n\nThen, we use the old faithful `anova` function to compare our candidate model to the null model, by calling them one after the other. Note that we have to call our candidate model first; if you list the null model first, you'll get an error.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nanova(lme_sleep, lm_null)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: sleepstudy\nModels:\nlm_null: Reaction ~ 1\nlme_sleep: Reaction ~ Days + (1 + Days | Subject)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) \nlm_null 2 1965.0 1971.4 -980.52 1961.0 \nlme_sleep 6 1763.9 1783.1 -875.97 1751.9 209.11 4 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThis table gives us the $\\chi^2$ statistic (i.e., the likelihood ratio) and an associated p-value. 
Here, the $\\chi^2$ is large and the p-value small, meaning that our model is significantly better than the null.\n\nA helpful, intuitive way to think about this test is: for the increase in complexity of my candidate model (vs the null model), has the deviance of the model decreased significantly? Or: given the number of predictors in my model, has the goodness-of-fit improved significantly from the null?\n\n::: {.callout-note collapse=\"true\"}\n#### Refitting using ML (instead of ReML)\n\nNote the warning/information message R provides when we use the `anova` function this way: \"refitting model(s) with ML (instead of REML)\".\n\nR, or more specifically the `anova` function, has done something helpful for us here. For reasons that we won't go into too much (though, feel free to ask if you're curious!), we cannot use LRTs to compare models that have been fitted with the ReML method, even though this is the standard method for the `lme4` package. So we must refit the model with ML.\n\n(Incidentally, we could have chosen to fit the models manually with ML, if we'd wanted to. The `lmer` function takes an optional `REML` argument that we can set to FALSE - it's set to TRUE by default. But letting the `anova` function do it for us is much easier!)\n:::\n\n## Fixed effects\n\nIn addition to asking about the model as a whole, we often want to know about individual predictors. Because it's simpler, we'll talk about fixed predictors first.\n\nThere are multiple methods for doing this. We'll step through some of the most popular in a bit of detail:\n\n- Likelihood ratio tests\n- F-tests using approximations of degrees of freedom\n- t-to-z approximations (Wald tests)\n- Bootstrapping\n\n### Method 1: Likelihood ratio tests (LRTs)\n\nAs we mentioned above, LRTs are useful for comparing the model as a whole to the null - but they can also be used to investigate individual predictors.\n\nCrucially, we are only able to use this sort of test when one of the two models that we are comparing is a \"simpler\" version of the other, i.e., one model has a subset of the parameters of the other model. \n\nSo while we could perform an LRT just fine between two models `Y ~ A + B + C` and `Y ~ A + B + C + D`, to investigate the effect of `D`, or between any model and the null (`Y ~ 1`), we would not be able to use this test to compare `Y ~ A + B + C` and `Y ~ A + B + D`.\n\n![Two ways to use likelihood ratio tests](images_mixed-effects/LRT_schematic.png){width=70%}\n\nLet's use an LRT to test the fixed effect of `Days` in our `sleepstudy` example. First, we'll fit a random-effects-only model (we do this by replacing `Days` with `1`, to indicate no fixed effects).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_sleep_random <- lmer(Reaction ~ 1 + (1 + Days|Subject),\n data = sleepstudy)\n```\n:::\n\n:::\n\nThen we use `anova` to compare them, again putting our more complex model first.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nanova(lme_sleep, lme_sleep_random)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: sleepstudy\nModels:\nlme_sleep_random: Reaction ~ 1 + (1 + Days | Subject)\nlme_sleep: Reaction ~ Days + (1 + Days | Subject)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) \nlme_sleep_random 5 1785.5 1801.4 -887.74 1775.5 \nlme_sleep 6 1763.9 1783.1 -875.97 1751.9 23.537 1 1.226e-06 ***\n---\nSignif. 
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThis output tells us that, for the reduction in the number of parameters (i.e., removing `Days`), the difference in deviances is significantly big. In other words, a fixed effect of `Days` is meaningful and useful when predicting reaction times.\n\n### Method 2: Approximation of the degrees of freedom\n\nThis method is perhaps the most intuitive for those coming from a linear modelling background. Put simply, it involves making an educated guess about the degrees of freedom with some formulae, and then deriving a p-value as we usually would. \n\nThis lets us obtain p-values for any t- and F-values that are calculated, with just the one extra step compared to what we're used to with linear models.\n\nFor this approach, we will use the companion package to `lme4`, a package called `lmerTest`.\n\n::: {.callout-note collapse=\"true\"}\n#### lmerTest\n\nThe package provides an \"updated\" version of the `lmer()` function, one that can approximate the number of degrees of freedom, and thus provide estimated p-values.\n\nIf you have `lmerTest` loaded, R will automatically default to its updated version of the `lmer()` function, and perform the degrees of freedom approximation as standard. (You can prevent it from doing so by typing `lme4::lmer()` instead.)\n:::\n\nLet's look again at our random slopes & intercepts model for the `sleepstudy` dataset as a test case. We'll refit the model once we've loaded the new package.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(lmerTest)\n\nlme_sleep <- lmer(Reaction ~ Days + (1 + Days|Subject),\n data = sleepstudy)\n```\n:::\n\n:::\n\nThe new version of the `lmer` function fits a very similar model object to before, except now it contains the outputs of a number of calculations that are required for the degrees of freedom approximation. By default, `lmerTest` uses the Satterthwaite approximation, which is appropriate for mixed models that are fitted using either MLE or ReML, making it pretty flexible.\n\nWe'll use the `anova` function from the `lmerTest` package to produce an analysis of variance table (R will default to using this version of the function unless told otherwise). This gives us an estimate for the F-statistic and associated p-value for our fixed effect of `Days`:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nanova(lme_sleep)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nType III Analysis of Variance Table with Satterthwaite's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \nDays 30031 30031 1 17 45.853 3.264e-06 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\n::: {.callout-note collapse=\"true\"}\n#### F-statistics vs t-statistics\n\nIf you were to look at the summary for our new `lme_sleep` model, you'd notice some t-statistics and p-values appearing next to the fixed effects. These are **not quite the same** as the F-statistics and p-values that we've extracted using the `anova` function.\n\nIn fact, this odd distinction between t-statistics and F-statistics is not unique to mixed models; you might remember it from linear modelling. The t-statistics are what we call \"Wald tests\" (more coming up on those in the next section) and test the null hypothesis that the coefficient $\\beta = 0$ for that predictor. 
This might not sound *too* dissimilar from what an analysis of variance F-test is assessing - and for continuous predictors, the result is usually very similar. But for a categorical predictor, you will see separate Wald tests for each pairwise comparison against the reference group, while you would only see a single F-statistic for the lot.\n:::\n\n#### Using the Kenward-Roger approximation\n\nAlthough the Satterthwaite approximation is the `lmerTest` default, another option called the Kenward-Roger approximation also exists. It's less popular than Satterthwaite because it's a bit less flexible (it can only be applied to models fitted with ReML). \n\nIf you wanted to switch to the Kenward-Roger approximation, you can do it easily by specifying the `ddf` argument:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nanova(lme_sleep, ddf = \"Kenward-Roger\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nType III Analysis of Variance Table with Kenward-Roger's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \nDays 30031 30031 1 17 45.853 3.264e-06 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nIn reality, though, chances are that you'll just stick with the Satterthwaite default if you plan to use approximations for your own analyses. Statisticians have debated the relative merits of Satterthwaite versus Kenward-Roger, but the differences only really tend to emerge under specific conditions. Here, it's given us the same result.\n\n### Method 3: t-to-z approximations\n\nThis is a more unusual method, and another form of approximation. You'll see this less often, but it's included here for completeness. \n\nThis method involves making use of the Wald t-values, which are reported as standard in the `lme4` output.\n\nSpecifically, we can choose to treat these t-values as if they were z-scores instead, if our sample size is considered large enough. And, because z-scores are standardised, we don't need any degrees of freedom information to derive a p-value - we can just read them directly out of a table (or get R to do it for us).\n\n::: {.callout-note collapse=\"true\"}\n#### The logic of using z-scores instead\n\nA z-score is different from a statistic such as t or F. They're standardised, because they're measured in standard deviations - i.e., a z-score of 1.3 tells you that you are 1.3 standard deviations away from the mean. \n\nThis is helpful for deriving a p-value without degrees of freedom, but it raises the question: why is it okay to treat t-values as z-scores? \n\nThe logic here is that the t distribution actually begins to approximate (i.e., match up with) the z distribution as the sample size increases. Officially, when the sample size is infinite, the two distributions are identical. So, with a sufficiently large sample size, we can \"pretend\" or \"imagine\" that the Wald t-values are actually z-distributed, giving us p-values. \n:::\n\nUnfortunately, there are no formal guidelines to tell you whether your dataset is \"large enough\" to do this. It will depend on the number and type of predictors in your model. 
Plus, the t-to-z approximation is considered to be \"anti-conservative\" - in other words, there's a higher chance of false positives than with other methods.\n\nSome researchers adapt the t-to-z approximation approach a little to help with this; instead of explicitly calculating p-values, they instead use a rule of thumb that any Wald t-value greater than 2 is large enough to be considered significant. This is quite a strict threshold, so it can help to filter out some of the false positives or less convincing results.\n\nCalculating the p-value for a z-score can be done quickly in R using the `pnorm` function. We include the z-score (or, here, the t-value that we are treating as a z-score) as the value for our argument `q`. To make this a two-tailed test, we have to set `lower.tail` to FALSE, and multiply the answer by 2.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nsummary(lme_sleep)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLinear mixed model fit by REML. t-tests use Satterthwaite's method [\nlmerModLmerTest]\nFormula: Reaction ~ Days + (1 + Days | Subject)\n Data: sleepstudy\n\nREML criterion at convergence: 1743.6\n\nScaled residuals: \n Min 1Q Median 3Q Max \n-3.9536 -0.4634 0.0231 0.4634 5.1793 \n\nRandom effects:\n Groups Name Variance Std.Dev. Corr\n Subject (Intercept) 612.10 24.741 \n Days 35.07 5.922 0.07\n Residual 654.94 25.592 \nNumber of obs: 180, groups: Subject, 18\n\nFixed effects:\n Estimate Std. Error df t value Pr(>|t|) \n(Intercept) 251.405 6.825 17.000 36.838 < 2e-16 ***\nDays 10.467 1.546 17.000 6.771 3.26e-06 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nCorrelation of Fixed Effects:\n (Intr)\nDays -0.138\n```\n:::\n\n```{.r .cell-code}\n2*pnorm(q = 6.771, lower.tail = FALSE)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 1.278953e-11\n```\n:::\n:::\n\n:::\n\nIf we input the t-value for our `Days` fixed effect, we can see that it gives us a very small p-value. This p-value of 1.28 x 10^-11^ is quite a bit smaller than the one that our Satterthwaite degrees of freedom approximation provided (3.26 x 10^-6^) - an example of how this t-to-z approximation is more generous. However, in this case it's very clear that the `Days` effect definitely is significant, whichever way we test it, so it's perhaps not a concern.\n\n### Method 4: Bootstrapping\n\nNow, we get a little bit more technical. \n\nEntire pages of course materials could be dedicated to bootstrapping and simulation methods. These ideas go well beyond linear mixed models. But, now is not the time for all that.\n\nWe're going to look at one implementation of bootstrapping for mixed models, as an example, but if you're curious then a good place to start follow-up reading is [this excellent resource](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#testing-significance-of-random-effects).\n\nThe specific option we'll look at is performing parametric bootstrapping via the `PBmodcomp` function from the `pbkrtest` package.\n\nThis method involves:\n\n1. Simulating a bunch of datasets (specifically, based on the \"reduced\" or less complex model)\n2. For each simulated dataset, fit both models\n3. For each simulated dataset, compute the difference in deviances between the two models, to provide a distribution of differences in deviances\n4. Compare this distribution to the actual/observed difference in deviances\n\nThe syntax is very similar to the `anova` function, but you also set a seed. 
\n\n(This is something that's often done when simulating in general; it ensures that each time you run the code, you'll get the same set of numbers, so long as you use the same seed. You can choose whatever number you like.)\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_sleep, lme_sleep_random, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 32.61 sec; samples: 1000; extremes: 0;\nRequested samples: 1000 Used samples: 998 Extremes: 0\nlarge : Reaction ~ Days + (1 + Days | Subject)\nReaction ~ 1 + (1 + Days | Subject)\n stat df p.value \nLRT 23.537 1 1.226e-06 ***\nPBtest 23.537 0.001001 ** \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nIt takes several seconds, because running 1000 simulations and fitting 2000 models isn't instantaneous. You may also get a bunch of warnings (they've been suppressed here for these course materials, but don't be alarmed if they appear for you when running this example).\n\nBut, as you can see, the p-value it produces is not necessarily the same as the one produced by a standard LRT.\n\n### Choosing the \"right\" method\n\nSeveral methods have been discussed here. Lots of researchers favour either F-tests with approximated degrees of freedom, or the likelihood ratio test (LRT) for fixed effects, because they're relatively easy to implement - which is why we've spent slightly more time on them. \n\nIf we had to choose, we personally favour the LRT, because it's generalisable to any type of model that's fitted with maximum likelihood estimation, making it a very useful addition to a researcher's statistical toolkit.\n\nThose with more coding or theoretical background, however, might feel strongly that bootstrapping is always a more appropriate method for deriving p-values. And they might well be right. There are no strict answers once we get this far beyond the standard linear model.\n\nIt's worth noting that there's nothing stopping you using more than one approach when it comes to testing your own models, and \"triangulating\" the results to help you determine how robust your conclusions are. \n\n## Random effects\n\nWith fixed effects under our belt, let's now move to thinking about random effects.\n\nThere is a broader philosophical question to be asked here: what does it even mean for a random effect to be \"significant\"?\n\nRemember that a random effect is not a single coefficient. It's a measure of the distribution across a set of clusters or groups. 
Quite often, we include a random effect simply to account for the structure of our data and better represent our design, not because we want to treat it as a \"predictor\" in the traditional sense.\n\nPerhaps a better way to think about it is: **is my model better with or without this random effect?**\n\nOr even: **is there a need to test significance at all?**\n\nWe'll talk through a few different approaches:\n\n- Using LRTs (with caveats)\n- Using AIC/BIC (also with caveats)\n- Bootstrapping\n- Not testing at all\n\n### Method 1: Using LRTs\n\nThe most common method that you'll see used for judging whether random effects improve a model is the trusty LRT.\n\n::: {.callout-warning collapse=\"true\"}\n#### The major caveat with LRTs for random effects\n\nThough you'll see LRTs used often for random effects, *technically* this doesn't provide great estimates.\n\nWhen we run such a test, we're essentially asking whether the variance of our chosen random effect is equal to zero (i.e., our null hypothesis is $\\sigma^2 = 0$). But, as a statistician might point out, 0 is \"on the boundary of the feasible space\" - in other words, 0 is the lowest possible value that the variance could ever be.\n\nBecause of this, the distributional approximations that we rely on for the maths of an LRT to work start to break down. The result is that the p-values calculated for LRTs are very conservative, i.e., too large/strict.\n\nIn the simplest case, testing simple random effects one at a time, the p-value is approximately twice as large as it should be. And the problem gets worse when testing multiple correlated random effects (see bonus materials for more info on these correlations).\n\nThis doesn't stop people using them for this purpose, and it doesn't have to stop you. But it's something you should really be aware of if you choose this method.\n:::\n\nThe approach is much the same as for fixed effects: construct two nested models, with and without the effect of interest. \n\nThen, use the `anova` function to perform the LRT.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_sleep_intercepts <- lmer(Reaction ~ Days + (1|Subject),\n data = sleepstudy)\n\nanova(lme_sleep, lme_sleep_intercepts)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: sleepstudy\nModels:\nlme_sleep_intercepts: Reaction ~ Days + (1 | Subject)\nlme_sleep: Reaction ~ Days + (1 + Days | Subject)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_sleep_intercepts 4 1802.1 1814.8 -897.04 1794.1 \nlme_sleep 6 1763.9 1783.1 -875.97 1751.9 42.139 2 7.072e-10\n \nlme_sleep_intercepts \nlme_sleep ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nOnce again, there is a significant difference between the two models, as seen by our small p-value. This tells us that the random slopes for `Days` by `Subject` are meaningful, and make a difference in our model.\n\nYou can even use the `anova` function to compare models with and without random effects entirely, i.e., compare a linear mixed model to a linear model. 
\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlm_sleep <- lm(Reaction ~ Days, data = sleepstudy)\n\nanova(lme_sleep, lm_sleep)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: sleepstudy\nModels:\nlm_sleep: Reaction ~ Days\nlme_sleep: Reaction ~ Days + (1 + Days | Subject)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) \nlm_sleep 3 1906.3 1915.9 -950.15 1900.3 \nlme_sleep 6 1763.9 1783.1 -875.97 1751.9 148.35 3 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nMake sure you call the linear mixed model (i.e., the more complex model) first, because you will get an error if you put the two models the wrong way around here.\n\n### Method 2: AIC/BIC values\n\nSome researchers use model comparison procedures, such as stepwise elimination, to decide whether or not to keep or drop certain random effects from their models.\n\nAs you may have noticed in the outputs from all of the LRTs above, the `anova` function automatically provides Akaike information criterion (AIC) and Bayesian information criterion (BIC) values for the different nested models.\n\nFor instance, when comparing `lm_sleep` and `lme_sleep` above, we can see that the linear model has larger AIC/BIC values (and greater deviance, i.e., worse goodness-of-fit) than the linear mixed model with our random slopes & intercepts in it.\n\n::: {.callout-warning collapse=\"true\"}\n#### The same caveat as with LRTs\n\nUsing AIC/BIC to make decisions about random effects is subject to **the same caveat as for LRTs**: the values you get for these information criteria end up being overly conservative.\n\nIn other words, AIC/BIC values can give an underestimation of the importance or use of a random effect in a linear mixed model, perhaps leading you to drop it even if it's helpful.\n:::\n\n### Method 3: Bootstrapping\n\nAs we did above for the fixed effects, we can use parametric bootstrapping to investigate random effects.\n\nIt works in exactly the same way: feed in two models, one with and one without the random effect that you're interested in testing, and don't forget to pick a value to set the seed.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_sleep, lme_sleep_intercepts, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 25.19 sec; samples: 1000; extremes: 0;\nlarge : Reaction ~ Days + (1 + Days | Subject)\nReaction ~ Days + (1 | Subject)\n stat df p.value \nLRT 42.139 2 7.072e-10 ***\nPBtest 42.139 0.000999 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nOnce again, you may get a long list of warnings as it simulates and fits models to a bunch of different datasets.\n\n### Method 4: Not testing at all\n\nThis might seem like a bit of an odd concept, especially placed where it is at the end of a page all about significance testing.\n\nAnd of course, we're not advocating for throwing all the possible random effects into an overly complicated model and just accepting whatever numbers fall out. 
You're still aiming for parsimony, and your model should still represent what's actually going on in your experimental design.\n\nBut, many people - including those with far more experience in mixed models than us - argue that you shouldn't drop a random effect simply because a p-value or AIC/BIC value tells you so. If that random effect is truly important in representing the design and structure of your dataset, then your model is better served by containing it. \n\nIn other words, it's meaningful because of the experimental design, not because of the numbers that come out of your model.\n\nThis philosophical stance is particularly applicable in situations where you're including random effects simply to account for the hierarchical, non-independent structure in your data, because you're interested in the overall or average trends.\n\n::: {.callout-note}\n#### A final thing to add...\n\nSome of the people who take this stance (including authors of some of the packages we've used) might argue that significance is no more important, or is even less important, than the *uncertainty* of the random effects. How confident are we that we've estimated the variance correctly? What are the confidence intervals within which the variance falls?\n\nNow, that really is a can of worms we're not going to open here, but you might be interested to know that packages exist for computing these confidence intervals; `lme4` even comes with a function for it.\n\nIf you're curious, you could start some follow-up reading [here](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#inference-and-confidence-intervals).\n:::\n\n## Exercises\n\n### Dragons revisited {#sec-exr_dragons2}\n\n::: {.callout-exercise}\n\n\n{{< level 2 >}}\n\n\n\nLet's return to the dataset from a previous exercise, [Exercise -@sec-exr_dragons].\n\nPreviously, we fit a mixed model to this dataset that included response variable `intelligence`, fixed effects of `wingspan`, `scales` and `wingspan:colour`, and two random effects: random intercepts `1|mountain`, and random slopes for `wingspan|mountain`.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\ndragons <- read_csv(\"data/dragons.csv\")\n\nlme_dragons <- lmer(intelligence ~ wingspan*scales + (1 + wingspan|mountain), \n data = dragons)\n```\n:::\n\n:::\n\nUse likelihood ratio tests to assess:\n\n- whether the model above is significant versus the null model\n- whether the fixed effects are significant\n\nIf you're feeling adventurous, you can also:\n\n- use LRTs, AIC and/or bootstrapping to assess the random effects, and compare the results\n- use other methods to assess the significance of the fixed effects, and compare the results\n\n::: {.callout-tip collapse=\"true\"}\n#### Worked answer\n\nLet's start by using an LRT to test the overall significance of our model. 
We'll construct a null model, and then use `anova` to compare it to our model.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_dragons_null <- lm(intelligence ~ 1, data = dragons)\n\nanova(lme_dragons, lme_dragons_null)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: dragons\nModels:\nlme_dragons_null: intelligence ~ 1\nlme_dragons: intelligence ~ wingspan * scales + (1 + wingspan | mountain)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) \nlme_dragons_null 2 1997.0 2003.6 -996.51 1993.0 \nlme_dragons 8 1647.8 1674.2 -815.92 1631.8 361.18 6 < 2.2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nIt's significant. Something in our model is doing something helpful. A really good start!\n\nNext, we'll use LRTs to test the significance of our individual fixed effects. \n\nWe'll start with the interaction. To test this, we'll build an additive model, and compare it to our original full model. For the models to be comparable, we'll keep the random effects structure the same.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_dragons_dropx <- lmer(intelligence ~ wingspan + scales + (1 + wingspan|mountain), \n data = dragons)\n\nanova(lme_dragons, lme_dragons_dropx)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: dragons\nModels:\nlme_dragons_dropx: intelligence ~ wingspan + scales + (1 + wingspan | mountain)\nlme_dragons: intelligence ~ wingspan * scales + (1 + wingspan | mountain)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_dragons_dropx 7 1647.2 1670.3 -816.60 1633.2 \nlme_dragons 8 1647.8 1674.2 -815.92 1631.8 1.3648 1 0.2427\n```\n:::\n:::\n\n:::\n\nThe test isn't significant. This tells us that the `wingspan:scales` interaction wasn't doing anything meaningful in this model.\n\nNow, we're going to test the main effects of `scales` and `wingspan` by constructing two new models and comparing them to our additive model. 
(In this way, we're performing something a little bit like a stepwise elimination procedure.)\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nFirst, we'll test the main effect of `scales`, by comparing our additive model against a model with `scales` dropped; then we'll do the same for `wingspan`. (Both of these reduced models produce convergence warnings - we'll come back to whether our random effects structure is too complex for this dataset at the end of this answer.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_dragons_dropscale <- lmer(intelligence ~ wingspan + (1 + wingspan|mountain), \n data = dragons)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning in checkConv(attr(opt, \"derivs\"), opt$par, ctrl = control$checkConv, :\nunable to evaluate scaled gradient\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning in checkConv(attr(opt, \"derivs\"), opt$par, ctrl = control$checkConv, :\nModel failed to converge: degenerate Hessian with 1 negative eigenvalues\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: Model failed to converge with 1 negative eigenvalue: -2.6e+00\n```\n:::\n\n```{.r .cell-code}\nanova(lme_dragons_dropx, lme_dragons_dropscale)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: dragons\nModels:\nlme_dragons_dropscale: intelligence ~ wingspan + (1 + wingspan | mountain)\nlme_dragons_dropx: intelligence ~ wingspan + scales + (1 + wingspan | mountain)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_dragons_dropscale 6 1673.6 1693.3 -830.78 1661.6 \nlme_dragons_dropx 7 1647.2 1670.3 -816.60 1633.2 28.359 1 1.008e-07\n \nlme_dragons_dropscale \nlme_dragons_dropx ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n\n```{.r .cell-code}\nlme_dragons_dropwing <- lmer(intelligence ~ scales + (1 + wingspan|mountain), \n data = dragons)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning in checkConv(attr(opt, \"derivs\"), opt$par, ctrl = control$checkConv, :\nModel failed to converge with max|grad| = 0.003579 (tol = 0.002, component 1)\n```\n:::\n\n```{.r .cell-code}\nanova(lme_dragons_dropx, lme_dragons_dropwing)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: dragons\nModels:\nlme_dragons_dropwing: intelligence ~ scales + (1 + wingspan | mountain)\nlme_dragons_dropx: intelligence ~ wingspan + scales + (1 + wingspan | mountain)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_dragons_dropwing 6 1653.5 1673.2 -820.73 1641.5 \nlme_dragons_dropx 7 1647.2 1670.3 -816.60 1633.2 8.2604 1 0.004052\n \nlme_dragons_dropwing \nlme_dragons_dropx **\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nBoth of these tests come out as significant. This suggests that both fixed effects for `wingspan` and `scales` are making meaningful contributions to our model.\n\nComfortingly, this aligns with what we see in an analysis of variance table using a Satterthwaite degrees of freedom approximation, which shows overall that there seem to be main effects though no significant interaction. 
The p-values are not the same - we wouldn't expect them to be, they're calculated very differently - but it's a relief that the overall effect is robust across methods:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nanova(lme_dragons)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nType III Analysis of Variance Table with Satterthwaite's method\n Sum Sq Mean Sq NumDF DenDF F value Pr(>F) \nwingspan 3059.90 3059.90 1 3.992 16.8644 0.01483 * \nscales 1923.44 1923.44 1 188.766 10.6008 0.00134 **\nwingspan:scales 242.84 242.84 1 188.380 1.3384 0.24878 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nWe would draw the same overall conclusion using t-to-z approximations as well (using the t-values, extracted from the output of the `summary` function). Excellent news.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nThe interaction term (its t-value is negative, so we take the absolute value before finding the upper-tail probability):\n\n\n::: {.cell}\n\n```{.r .cell-code}\n2*pnorm(q = abs(-1.157), lower.tail = FALSE) # interaction term\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.2472724\n```\n:::\n:::\n\n\nThe main effect of scales:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n2*pnorm(q = 3.256, lower.tail = FALSE) # scales main effect\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.001129938\n```\n:::\n:::\n\n\nThe main effect of wingspan:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n2*pnorm(q = 4.244, lower.tail = FALSE) # wingspan main effect\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2.195704e-05\n```\n:::\n:::\n\n:::\n\nAnd finally, you can check the results from a parametric bootstrap (once again, the warnings have been suppressed here), which yet again agree with the prior tests:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_dragons, lme_dragons_dropx, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 33.54 sec; samples: 1000; extremes: 223;\nRequested samples: 1000 Used samples: 979 Extremes: 223\nlarge : intelligence ~ wingspan * scales + (1 + wingspan | mountain)\nintelligence ~ wingspan + scales + (1 + wingspan | mountain)\n stat df p.value\nLRT 1.3653 1 0.2426\nPBtest 1.3653 0.2286\n```\n:::\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_dragons_dropx, lme_dragons_dropscale, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 31.09 sec; samples: 1000; extremes: 0;\nRequested samples: 1000 Used samples: 986 Extremes: 0\nlarge : intelligence ~ wingspan + scales + (1 + wingspan | mountain)\nintelligence ~ wingspan + (1 + wingspan | mountain)\n stat df p.value \nLRT 28.364 1 1.005e-07 ***\nPBtest 28.364 0.001013 ** \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_dragons_dropx, lme_dragons_dropwing, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 30.47 sec; samples: 1000; extremes: 13;\nRequested samples: 1000 Used samples: 855 Extremes: 13\nlarge : intelligence ~ wingspan + scales + (1 + wingspan | mountain)\nintelligence ~ scales + (1 + wingspan | mountain)\n stat df p.value \nLRT 8.2598 1 0.004053 **\nPBtest 8.2598 0.016355 * \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nOn the basis of all of these results, you might choose to refine your model slightly, eliminating the unhelpful `wingspan:scales` interaction and making `lme_dragons_dropx` the working minimal model.\n\nWe can visualise that like so:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(dragons, aes(x = wingspan, y = intelligence, colour = scales)) +\n facet_wrap(vars(mountain)) +\n geom_point() +\n geom_line(data = augment(lme_dragons_dropx), aes(y = .fitted))\n```\n\n::: {.cell-output-display}\n![](significance-and-model-comparison_files/figure-html/unnamed-chunk-24-1.png){width=672}\n:::\n:::\n\n:::\n\nWhat about the random effects, then?\n\nLet's test them first with LRTs (and AIC/BIC).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe construct two new models, one with each of the random effects dropped. We keep the fixed effects structure the same, so that the models are otherwise comparable.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlme_dragons_dropslope <- lmer(intelligence ~ wingspan*scales + (1|mountain), \n data = dragons)\n\nlme_dragons_dropint <- lmer(intelligence ~ wingspan*scales + (0 + wingspan|mountain), \n data = dragons)\n```\n:::\n\n\nThen, we use the `anova` function to compare:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nanova(lme_dragons, lme_dragons_dropint)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: dragons\nModels:\nlme_dragons_dropint: intelligence ~ wingspan * scales + (0 + wingspan | mountain)\nlme_dragons: intelligence ~ wingspan * scales + (1 + wingspan | mountain)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_dragons_dropint 6 1643.9 1663.7 -815.95 1631.9 \nlme_dragons 8 1647.8 1674.2 -815.92 1631.8 0.0691 2 0.9661\n```\n:::\n\n```{.r .cell-code}\nanova(lme_dragons, lme_dragons_dropslope)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nrefitting model(s) with ML (instead of REML)\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nData: dragons\nModels:\nlme_dragons_dropslope: intelligence ~ wingspan * scales + (1 | mountain)\nlme_dragons: intelligence ~ wingspan * scales + (1 + wingspan | mountain)\n npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)\nlme_dragons_dropslope 6 1737.9 1757.7 -862.95 1725.9 \nlme_dragons 8 1647.8 1674.2 -815.92 1631.8 94.057 2 < 2.2e-16\n \nlme_dragons_dropslope \nlme_dragons ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThese results would seem to suggest that the random slopes are significant, but the random intercepts are not. \n\nThis is borne out by the change in information criteria values. When we remove `1|mountain`, both AIC and BIC decrease (by 3.9 and 10.5 respectively), suggesting improvement in the model quality - remember that lower values are better for these criteria. 
In contrast, when we remove `wingspan|mountain`, both AIC and BIC increase by a large amount (by 90.1 and 83.5 respectively), suggesting we have worsened the quality of the model.\n\nBut, we know that the LRT p-values and AIC/BIC values for random effects aren't always great, so let's compare to a parametric bootstrap just to be sure.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_dragons, lme_dragons_dropint, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 21.87 sec; samples: 1000; extremes: 628;\nRequested samples: 1000 Used samples: 739 Extremes: 628\nlarge : intelligence ~ wingspan * scales + (1 + wingspan | mountain)\nintelligence ~ wingspan * scales + (0 + wingspan | mountain)\n stat df p.value\nLRT 0.0691 2 0.9661\nPBtest 0.0691 0.8500\n```\n:::\n\n```{.r .cell-code}\npbkrtest::PBmodcomp(lme_dragons, lme_dragons_dropslope, seed = 20)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBootstrap test; time: 23.14 sec; samples: 1000; extremes: 0;\nlarge : intelligence ~ wingspan * scales + (1 + wingspan | mountain)\nintelligence ~ wingspan * scales + (1 | mountain)\n stat df p.value \nLRT 94.057 2 < 2.2e-16 ***\nPBtest 94.057 0.000999 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n```\n:::\n:::\n\n:::\n\nThe p-values are indeed different, but not different enough to change our conclusions. \n\nHowever: we would likely want to be cautious about dropping the random intercepts from this model. What does a random slopes-only model mean in biological terms? In this instance, it would suggest that all `mountain` ranges have the same baseline `intelligence` level when `wingspan` is very small/zero, but the rate of change based on their size (`intelligence ~ wingspan`) does vary between ranges. \n\nIs this biologically plausible? We're not tracking dragons across multiple time points here, so we can't say for sure, but this could reflect dragons in some mountain ranges learning quicker as they grow than dragons elsewhere due to better schools, in which case it might be plausible that they're all born with the same baseline `intelligence`. But it could also reflect different species of dragon living in each mountain range, in which case, it's very plausible that `intelligence` on average could vary between ranges (even if we're not observing it in this particular dataset).\n\nDo we need to reduce the number of random parameters in our model? Our dataset is not huge, for the number of variables we're testing. But our additive `lme_dragons_dropx` model with both random effects is converging sensibly. 
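\n\nBefore we settle on an answer, a quick way to reassure yourself that the additive model really is behaving is to use `lme4`'s built-in `isSingular` and `VarCorr` functions - this is just a sanity check on the fit, rather than a formal test:\n\n::: {.cell}\n\n```{.r .cell-code}\n# isSingular() flags boundary (singular) fits, e.g. variance components estimated at zero\nisSingular(lme_dragons_dropx)\n\n# VarCorr() prints the estimated variances/SDs of the random effects,\n# so you can eyeball whether anything has collapsed towards zero\nVarCorr(lme_dragons_dropx)\n```\n:::\n\n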
It might not be necessary.\n\n:::\n\n:::\n\n### Irrigation revisited {#sec-exr_irrigation2}\n\n::: {.callout-exercise}\n\n\n{{< level 2 >}}\n\n\n\nOnce again, we'll return to a dataset from the previous section of the course, this time [Exercise -@sec-exr_irrigation], and the model we fitted to it.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n::: {.cell}\n\n```{.r .cell-code}\nirrigation <- read_csv(\"data/irrigation.csv\")\n\nlme_yield <- lmer(yield ~ irrigation*variety + (1|field), data = irrigation)\n```\n:::\n\n:::\n\nCompare and contrast the results from likelihood ratio test and other methods, to assess:\n\n- the significance of the model overall\n- the significance/usefulness of the fixed predictors\n\nThere's no worked answer for this exercise, but you can use the code from the `sleepstudy` and `dragons` examples to scaffold your work.\n\nConsider also the random intercepts. If an LRT or bootstrap indicated that the random effect wasn't significant, would you drop the intercepts from the model? Why/why not? Feel free to chat to a neighbour or trainer to help make your decision.\n\n:::\n\n## Summary\n\nThis section showcases multiple methods of performing significance testing and model comparison for mixed effects models - but also introduces a broader debate as to when and how significance testing is actually useful for this type of model.\n\nIf you're interested in doing further reading on the different methods for significance testing, then [this article](https://link.springer.com/article/10.3758/s13428-016-0809-y) has a nice comparison of the methods discussed above, including how they perform in terms of type I (false positive) error rates.\n\n::: {.callout-tip}\n#### Key Points\n\n- Calculating p-values for mixed effects models is tricky, and must be done differently to a standard linear model, because there is no precise number of degrees of freedom\n- For fixed effects, p-values can be calculated using F-tests with approximations of degrees of freedom, likelihood ratio tests, t-to-z approximations or bootstrapping\n- For random effects, options are more limited to likelihood ratio tests or bootstrapping methods\n- AIC/BIC values and stepwise elimination procedures can also be used to provide information about fixed and/or random effects in a linear mixed model, and to aid with model comparison\n- Likelihood ratio tests and AIC/BIC values in particular rely heavily on the concept of deviance (goodness-of-fit)\n:::\n\n", "supporting": [ "significance-and-model-comparison_files" ], diff --git a/index.md b/index.md index bfda8a9..321db80 100644 --- a/index.md +++ b/index.md @@ -1,6 +1,6 @@ --- title: "Mixed effects models" -author: "Vicki Hodgson" +author: "Vicki Hodgson*, Hugo Tavares, Paul Fannon, Martin van Rongen" date: today number-sections: false --- @@ -33,7 +33,7 @@ You should have a working knowledge of R/RStudio, and a grasp of core statistics Exercises in these materials are labelled according to their level of difficulty: | Level | Description | -| ----: | :---------- | +| :-: | :----------- | | {{< fa solid star >}} {{< fa regular star >}} {{< fa regular star >}} | Exercises in level 1 are simpler and designed to get you familiar with the concepts and syntax covered in the course. | | {{< fa solid star >}} {{< fa solid star >}} {{< fa regular star >}} | Exercises in level 2 combine different concepts together and apply it to a given task. 
| | {{< fa solid star >}} {{< fa solid star >}} {{< fa solid star >}} | Exercises in level 3 require going beyond the concepts and syntax introduced to solve new problems. | @@ -68,16 +68,18 @@ About the authors: ## References -Bolker, B. (2023, 8 October). *GLMM FAQ*. https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html +Baath, R. (2024, 28 January). *The source of the cake dataset*. -Choe, J. (2020). *The Correlation Parameter in the Random Effects of Mixed Effects Models.* https://rpubs.com/yjunechoe/correlationsLMEM +Bolker, B. (2023, 8 October). *GLMM FAQ*. + +Choe, J. (2020). *The Correlation Parameter in the Random Effects of Mixed Effects Models.* Cook, F. E. (1938). *Chocolate cake: I. Optimum baking temperature.* (Doctoral dissertation, Iowa State College). Faraway, J. J. (2016). *Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models.* Chapman and Hall/CRC. -Hadjuk, G. K. & Gallois E. (2022, 9 February). *Introduction to linear mixed models.* Our Coding Club. https://ourcodingclub.github.io/tutorials/mixed-models/ +Hadjuk, G. K. & Gallois E. (2022, 9 February). *Introduction to linear mixed models.* Our Coding Club. -Oehlert, G. W. (2010). *A first course in design and analysis of experiments.* https://conservancy.umn.edu/server/api/core/bitstreams/87e0734d-31ea-4596-8295-d87705271c07/content +Oehlert, G. W. (2010). *A first course in design and analysis of experiments.* Winter, B., & Grawunder, S. (2012). *The phonetic profile of Korean formal and informal speech registers.* Journal of Phonetics, 40(6), 808-815. diff --git a/materials/checking-assumptions.qmd b/materials/checking-assumptions.qmd index 127a82e..a97360d 100644 --- a/materials/checking-assumptions.qmd +++ b/materials/checking-assumptions.qmd @@ -138,11 +138,13 @@ If you find the green, blue and red default colours in `check_model` to be a lit ## Exercises -### Exercise 1 - Dragons revisited (again) +### Dragons revisited (again) {#sec-exr_dragons3} + +::: {.callout-exercise} {{< level 1 >}} -Let's once again revisit the `dragons` dataset, and the minimal model that we chose in the previous section based on significance testing: +Let's once again revisit the `dragons` dataset, and the minimal model that we chose in [Exercise -@sec-exr_dragons2] based on significance testing: ::: {.panel-tabset group="language"} ## R @@ -157,7 +159,7 @@ lme_dragons_dropx <- lmer(intelligence ~ wingspan + scales + Fit diagnostic plots for this model using the code given above. What do they show? -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Worked answer ::: {.panel-tabset group="language"} @@ -175,7 +177,11 @@ check_model(lme_dragons_dropx, Try comparing these diagnostic plots to the diagnostic plots for the full model, `intelligence ~ wingspan*scales + (1 + wingspan|mountain)`. Are the assumptions better met? Why/why not? -### Exercise 2 - Arabidopsis +::: + +### Arabidopsis {#sec-exr_arabidopsis} + +::: {.callout-exercise} {{< level 2 >}} @@ -209,7 +215,7 @@ Fit the following mixed effects model: and check its assumptions. What can you conclude about the suitability of a linear mixed effects model for this dataset? -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Worked answer #### Fitting the model @@ -263,6 +269,7 @@ To figure out why, and whether it's fixable, think about the types of variables Chat about these bonus questions with a neighbour, or a trainer. 
Understanding why these diagnostic plots look bad, and why we might need to take a closer look at the dataset before we fit things, will serve you really well when working with your own data. +::: ::: ## Summary diff --git a/materials/crossed-random-effects.qmd b/materials/crossed-random-effects.qmd index 34dcf93..9db18b7 100644 --- a/materials/crossed-random-effects.qmd +++ b/materials/crossed-random-effects.qmd @@ -139,7 +139,9 @@ If you check the output, you can see that we do indeed have 4 groups each for `r ## Exercises -### Exercise 1 - Penicillin +### Penicillin {#sec-exr_penicillin} + +::: {.callout-exercise} {{< level 2 >}} @@ -169,7 +171,7 @@ For this exercise: 3. Check the model assumptions 4. Visualise the model -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Worked answer This is quite a simple dataset, in that there are only two variables besides the response. But, given the research question, we likely want to consider both of these two variables as random effects. @@ -210,7 +212,11 @@ ggplot(augment(lme_penicillin), aes(x = plate, y = diameter, colour = sample)) + ::: -### Exercise 2 - Politeness +::: + +### Politeness {#sec-exr_solutions} + +::: {.callout-exercise} {{< level 2 >}} @@ -243,7 +249,7 @@ To answer this question: 2. Try drawing out the structure of the dataset, and think about what levels the different variables are varying at 3. You may want to assess the quality and significance of the model to help you draw your final conclusions -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Worked answer #### Consider the experimental design @@ -337,6 +343,8 @@ In the final line of code for the plot, we've included the lines of best fit for ::: +::: + ## Summary This section has addressed how to fit models with multiple clustering variables, in scenarios where those clustering variables are not nested with one another. diff --git a/materials/fitting-mixed-models.qmd b/materials/fitting-mixed-models.qmd index 799cce3..bae252d 100644 --- a/materials/fitting-mixed-models.qmd +++ b/materials/fitting-mixed-models.qmd @@ -429,7 +429,9 @@ This idea of taking into account the global average when calculating our set of ## Exercises -### Exercise 1 - Irrigation +### Irrigation {#sec-exr_irrigation} + +::: {.callout-exercise} {{< level 1 >}} @@ -458,7 +460,7 @@ For this exercise: Does it look as if `irrigation` method or crop `variety` are likely to affect `yield`? -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Worked answer #### Visualise the data @@ -536,7 +538,11 @@ ggplot(augment(lme_yield), aes(x = irrigation, y = yield, shape = variety)) + ::: -### Exercise 2 - Solutions +::: + +### Solutions {#sec-exr_solutions} + +::: {.callout-exercise} {{< level 2 >}} @@ -554,7 +560,11 @@ There is no worked answer provided for this exercise, in order to challenge you Note: if you encounter the `boundary (singular) fit: see help('isSingular')` error, this doesn't mean that you've used the `lme4` syntax incorrectly; as we'll discuss later in the course, it means that the model you've fitted is too complex to be supported by the size of the dataset. ::: -### Exercise 3 - Dragons +::: + +### Dragons {#sec-exr_dragons} + +::: {.callout-exercise} {{< level 2 >}} @@ -576,7 +586,7 @@ With more variables, there are more possible models that could be fitted. Think Try to work through this yourself, before expanding the answer below. 
-::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Worked answer Here, we'll work through how to fit and visualise one possible mixed effects model that could be fitted to these data. @@ -738,7 +748,9 @@ You might also notice in the model summary that the estimated variance for the r ::: -::: {.callout-tip appearance="minimal"} +::: + +::: {.callout-exercise} #### Bonus questions {{< level 3 >}} @@ -782,6 +794,8 @@ Where $y$ is `intelligence`, $x_1$ is `wingspan`, $x_2$ is `scales`, $j$ represe ::: +::: + ## Summary This section of the course is designed to introduce the syntax required for fitting two-level mixed models in R, including both random intercepts and random slopes, and how we can visualise the resulting models. diff --git a/materials/generalised-mixed-models.qmd b/materials/generalised-mixed-models.qmd index a836ec9..bdc6995 100644 --- a/materials/generalised-mixed-models.qmd +++ b/materials/generalised-mixed-models.qmd @@ -60,9 +60,9 @@ The assumptions of a GLMM are an amalgamation of the assumptions of a GLM and a - Correct link function; there is a linear relationship between the linearised model - Normally distributed random effects -## Revisiting arabidopsis +## Revisiting Arabidopsis -To give an illustration of how we fit and assess generalised linear mixed effects models (GLMMs), we'll look at the internal dataset `Arabidopsis` from `lme4`. +To give an illustration of how we fit and assess generalised linear mixed effects models (GLMMs), we'll look at the internal dataset `Arabidopsis`, which we investigated earlier in the course in [Exercise -@sec-exr_arabidopsis]. ::: {.panel-tabset group="language"} ## R diff --git a/materials/nested-random-effects.qmd b/materials/nested-random-effects.qmd index 222e688..e208b74 100644 --- a/materials/nested-random-effects.qmd +++ b/materials/nested-random-effects.qmd @@ -270,7 +270,9 @@ And, no matter which method you choose, always check the model output to see tha ## Exercises -### Exercise 1 - Cake +### Cake {#sec-exr_cake} + +::: {.callout-exercise} {{< level 2 >}} @@ -302,7 +304,7 @@ For this exercise: 3. Consider how you might recode the dataset to reflect implicit nesting 4. Fit and test at least one appropriate model -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Worked answer #### Consider the experimental design @@ -417,12 +419,12 @@ ggplot(augment(lme_cake), aes(x = temperature, y = angle, colour = recipe)) + ``` ::: - +::: ::: -::: {.callout-tip appearance="minimal"} -#### Follow-up questions +::: {.callout-exercise} +#### Bonus questions {{< level 2 >}} @@ -435,7 +437,9 @@ If you want to think a bit harder about this dataset, consider these additional For more information on the very best way to bake a chocolate cake (and a lovely demonstration at the end about the dangers of extrapolating from a linear model), [this blog post](https://www.sumsar.net/blog/source-of-the-cake-dataset/) is a nice source. It's written by a data scientist who was so curious about the quirky `cake` dataset that he contacted Iowa State University, who helped him unearth Cook's original thesis. -### Exercise 2 - Parallel fibres +### Parallel fibres {#sec-exr_parallel} + +::: {.callout-exercise} {{< level 2 >}} @@ -466,7 +470,7 @@ For this exercise: 2. Determine whether the dataset requires recoding or explicit nesting 3. 
Fit and test at least one appropriate model -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Worked answer #### Visualise the design @@ -593,8 +597,10 @@ Our diagnostic plots look pretty good for our simpler, intercepts-only model, bu ::: -::: {.callout-tip appearance="minimal"} -#### Optional follow-up question: notation +::: + +::: {.callout-exercise} +#### Bonus question: notation {{< level 3 >}} @@ -604,7 +610,7 @@ What would the equation of a three level model fitted to the `parallel` dataset Hint: you'll need more subscript letters than you did for a two-level model! -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Answer: three-level intercepts-only E.g., `length ~ depth + (1|slice:cat) + (1|cat)` @@ -641,7 +647,7 @@ $$ ::: -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Answer: three-level intercepts & slopes E.g., `length ~ depth + (1|slice:cat) + (1 + depth|cat)` diff --git a/materials/random-effects.qmd b/materials/random-effects.qmd index ac61ace..7e5f8db 100644 --- a/materials/random-effects.qmd +++ b/materials/random-effects.qmd @@ -64,7 +64,9 @@ There'll be more about the maths of fitting random effects later in the course, ## Exercises -### Exercise 1 - Primary schools +### Primary schools {#sec-exr_primaryschools} + +::: {.callout-exercise} {{< level 1 >}} @@ -82,7 +84,7 @@ The response variable in this example is the standardised academic test scores, Which of these predictors should be treated as fixed versus random effects? Are there any other "hidden" grouping variables that we should consider, based on the description of the experiment? -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Answer We care about the effects of `gender` and `SES score`. We might also be interested in testing for the interaction between them, like so: `academic test scores ~ SES + gender + SES:gender`. @@ -101,7 +103,11 @@ The `classroom` variable would in fact be "nested" inside the `school` variable Our other possible hidden variable is `family`. If siblings have been included in the study, they will share an identical SES score, because this has been derived from the parent(s) rather than the students themselves. Siblings are, in this context, technical replicates! One way to deal with this is to simply remove siblings from the study; or, if there are enough sibling pairs to warrant it, we could also treat `family` as a random effect. ::: -### Exercise 2 - Ferns +::: + +### Ferns {#sec-exr_ferns} + +::: {.callout-exercise} {{< level 1 >}} @@ -115,7 +121,7 @@ What are our variables? What's the relationship we're interested in, and which o ![Predictor variables](images_mixed-effects/example2_1.png){fig-alt="Graphic with three variables listed: Tray, Itensity and Timepoint"} -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Answer There are four things here that vary: `tray`, `light intensity`, `timepoint` and `height`. @@ -138,7 +144,11 @@ In this case, then, `time` would probably be best treated as a fixed rather than However, if we were not measuring a response variable that changes over time (like growth), that might change. If, for instance, we were investigating the relationship between light intensity and chlorophyll production in adult plants, then measuring across different time points would be a case of technical replication instead, and `time` would be best treated as a random effect. 
**The research question is key in making this decision.** ::: -### Exercise 3 - Wolves +::: + +### Wolves {#sec-exr_wolves} + +::: {.callout-exercise} {{< level 1 >}} @@ -148,7 +158,7 @@ What's the relationship of interest? Is our total *n* really 60? ![Predictor variables](images_mixed-effects/example3_1.png){fig-alt="Graphic with three variables listed: Wolf population, National park and Year."} -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Answer Though we have 60 observations, it would of course be a case of pseudoreplication if we failed to understand the clustering within these data. @@ -165,6 +175,8 @@ We have measured across several national parks, and over a 10 year period, in or Of course, you might know more about ecology than me, and have a good reason to believe that the exact years *do* matter - that perhaps something fundamental in the relationship between `flood depth ~ wolf population` really does vary with year in a meaningful way. But given that our research question does not focus on change over time, both `year` and `national park` would be best treated as random effects given the information we currently have. ::: +::: + ## Summary ::: {.callout-tip} diff --git a/materials/significance-and-model-comparison.qmd b/materials/significance-and-model-comparison.qmd index 29cad67..bb43858 100644 --- a/materials/significance-and-model-comparison.qmd +++ b/materials/significance-and-model-comparison.qmd @@ -410,11 +410,13 @@ If you're curious, you could start some follow-up reading [here](https://bbolker ## Exercises -### Exercise 1 - Dragons revisited +### Dragons revisited {#sec-exr_dragons2} + +::: {.callout-exercise} {{< level 2 >}} -Let's return to the dataset from our previous example, our dragons dataset. +Let's return to the dataset from a previous exercise, [Exercise -@sec-exr_dragons]. Previously, we fit a mixed model to this dataset that included response variable `intelligence`, fixed effects of `wingspan`, `scales` and `wingspan:colour`, and two random effects: random intercepts `1|mountain`, and random slopes for `wingspan|mountain`. @@ -438,7 +440,7 @@ If you're feeling adventurous, you can also: - use LRTs, AIC and/or bootstrapping to assess the random effects, and compare the results - use other methods to assess the significance of the fixed effects, and compare the results -::: {.callout-note collapse="true"} +::: {.callout-tip collapse="true"} #### Worked answer Let's start by using an LRT to test the overall significance of our model. We'll construct a null model, and then use `anova` to compare it to our model. @@ -606,11 +608,15 @@ Do we need to reduce the number of random parameters in our model? Our dataset i ::: -### Exercise 2 - Irrigation revisited +::: + +### Irrigation revisited {#sec-exr_irrigation2} + +::: {.callout-exercise} {{< level 2 >}} -Once again, we'll return to a dataset from the previous section of the course, and the model we fitted to it. +Once again, we'll return to a dataset from the previous section of the course, this time [Exercise -@sec-exr_irrigation], and the model we fitted to it. ::: {.panel-tabset group="language"} ## R @@ -630,6 +636,8 @@ There's no worked answer for this exercise, but you can use the code from the `s Consider also the random intercepts. If an LRT or bootstrap indicated that the random effect wasn't significant, would you drop the intercepts from the model? Why/why not? Feel free to chat to a neighbour or trainer to help make your decision. 
+::: + ## Summary This section showcases multiple methods of performing significance testing and model comparison for mixed effects models - but also introduces a broader debate as to when and how significance testing is actually useful for this type of model.