formatting updates 02/07/2024
Vicki-H committed Jul 2, 2024
1 parent 114f3f4 commit b3ad731
Showing 34 changed files with 116 additions and 59 deletions.

4 changes: 2 additions & 2 deletions _freeze/materials/random-effects/execute-results/html.json

Large diffs are not rendered by default.

14 changes: 8 additions & 6 deletions index.md
@@ -1,6 +1,6 @@
---
title: "Mixed effects models"
author: "Vicki Hodgson"
author: "Vicki Hodgson*, Hugo Tavares, Paul Fannon, Martin van Rongen"
date: today
number-sections: false
---
@@ -33,7 +33,7 @@ You should have a working knowledge of R/RStudio, and a grasp of core statistics
Exercises in these materials are labelled according to their level of difficulty:

| Level | Description |
| ----: | :---------- |
| :-: | :----------- |
| {{< fa solid star >}} {{< fa regular star >}} {{< fa regular star >}} | Exercises in level 1 are simpler and designed to get you familiar with the concepts and syntax covered in the course. |
| {{< fa solid star >}} {{< fa solid star >}} {{< fa regular star >}} | Exercises in level 2 combine different concepts and apply them to a given task. |
| {{< fa solid star >}} {{< fa solid star >}} {{< fa solid star >}} | Exercises in level 3 require going beyond the concepts and syntax introduced to solve new problems. |
@@ -68,16 +68,18 @@ About the authors:

## References

Bolker, B. (2023, 8 October). *GLMM FAQ*. https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html
Baath, R. (2024, 28 January). *The source of the cake dataset*. <https://www.sumsar.net/blog/source-of-the-cake-dataset/>

Choe, J. (2020). *The Correlation Parameter in the Random Effects of Mixed Effects Models.* https://rpubs.com/yjunechoe/correlationsLMEM
Bolker, B. (2023, 8 October). *GLMM FAQ*. <https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html>

Choe, J. (2020). *The Correlation Parameter in the Random Effects of Mixed Effects Models.* <https://rpubs.com/yjunechoe/correlationsLMEM>

Cook, F. E. (1938). *Chocolate cake: I. Optimum baking temperature.* (Doctoral dissertation, Iowa State College).

Faraway, J. J. (2016). *Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models.* Chapman and Hall/CRC.

Hadjuk, G. K. & Gallois E. (2022, 9 February). *Introduction to linear mixed models.* Our Coding Club. https://ourcodingclub.github.io/tutorials/mixed-models/
Hadjuk, G. K. & Gallois E. (2022, 9 February). *Introduction to linear mixed models.* Our Coding Club. <https://ourcodingclub.github.io/tutorials/mixed-models/>

Oehlert, G. W. (2010). *A first course in design and analysis of experiments.* https://conservancy.umn.edu/server/api/core/bitstreams/87e0734d-31ea-4596-8295-d87705271c07/content
Oehlert, G. W. (2010). *A first course in design and analysis of experiments.* <https://conservancy.umn.edu/server/api/core/bitstreams/87e0734d-31ea-4596-8295-d87705271c07/content>

Winter, B., & Grawunder, S. (2012). *The phonetic profile of Korean formal and informal speech registers.* Journal of Phonetics, 40(6), 808-815.
17 changes: 12 additions & 5 deletions materials/checking-assumptions.qmd
@@ -138,11 +138,13 @@ If you find the green, blue and red default colours in `check_model` to be a lit

## Exercises

### Exercise 1 - Dragons revisited (again)
### Dragons revisited (again) {#sec-exr_dragons3}

::: {.callout-exercise}

{{< level 1 >}}

Let's once again revisit the `dragons` dataset, and the minimal model that we chose in the previous section based on significance testing:
Let's once again revisit the `dragons` dataset, and the minimal model that we chose in [Exercise -@sec-exr_dragons2] based on significance testing:

::: {.panel-tabset group="language"}
## R
@@ -157,7 +159,7 @@ lme_dragons_dropx <- lmer(intelligence ~ wingspan + scales +

Fit diagnostic plots for this model using the code given above. What do they show?

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Worked answer

::: {.panel-tabset group="language"}
@@ -175,7 +177,11 @@ check_model(lme_dragons_dropx,

Try comparing these diagnostic plots to the diagnostic plots for the full model, `intelligence ~ wingspan*scales + (1 + wingspan|mountain)`. Are the assumptions better met? Why/why not?

### Exercise 2 - Arabidopsis
:::

### Arabidopsis {#sec-exr_arabidopsis}

::: {.callout-exercise}

{{< level 2 >}}

@@ -209,7 +215,7 @@ Fit the following mixed effects model:

and check its assumptions. What can you conclude about the suitability of a linear mixed effects model for this dataset?

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Worked answer

#### Fitting the model
@@ -263,6 +269,7 @@ To figure out why, and whether it's fixable, think about the types of variables

Chat about these bonus questions with a neighbour, or a trainer. Understanding why these diagnostic plots look bad, and why we might need to take a closer look at the dataset before we fit things, will serve you really well when working with your own data.

:::
:::

## Summary
16 changes: 12 additions & 4 deletions materials/crossed-random-effects.qmd
@@ -139,7 +139,9 @@ If you check the output, you can see that we do indeed have 4 groups each for `r

## Exercises

### Exercise 1 - Penicillin
### Penicillin {#sec-exr_penicillin}

::: {.callout-exercise}

{{< level 2 >}}

@@ -169,7 +171,7 @@ For this exercise:
3. Check the model assumptions
4. Visualise the model

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Worked answer

This is quite a simple dataset, in that there are only two variables besides the response. But, given the research question, we likely want to consider both of these two variables as random effects.
@@ -210,7 +212,11 @@ ggplot(augment(lme_penicillin), aes(x = plate, y = diameter, colour = sample)) +

:::

### Exercise 2 - Politeness
:::

### Politeness {#sec-exr_solutions}

::: {.callout-exercise}

{{< level 2 >}}

@@ -243,7 +249,7 @@ To answer this question:
2. Try drawing out the structure of the dataset, and think about what levels the different variables are varying at
3. You may want to assess the quality and significance of the model to help you draw your final conclusions

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Worked answer

#### Consider the experimental design
@@ -337,6 +343,8 @@ In the final line of code for the plot, we've included the lines of best fit for

:::

:::

## Summary

This section has addressed how to fit models with multiple clustering variables, in scenarios where those clustering variables are not nested with one another.
26 changes: 20 additions & 6 deletions materials/fitting-mixed-models.qmd
@@ -429,7 +429,9 @@ This idea of taking into account the global average when calculating our set of

## Exercises

### Exercise 1 - Irrigation
### Irrigation {#sec-exr_irrigation}

::: {.callout-exercise}

{{< level 1 >}}

@@ -458,7 +460,7 @@ For this exercise:

Does it look as if `irrigation` method or crop `variety` are likely to affect `yield`?

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Worked answer

#### Visualise the data
@@ -536,7 +538,11 @@ ggplot(augment(lme_yield), aes(x = irrigation, y = yield, shape = variety)) +

:::

### Exercise 2 - Solutions
:::

### Solutions {#sec-exr_solutions}

::: {.callout-exercise}

{{< level 2 >}}

@@ -554,7 +560,11 @@ There is no worked answer provided for this exercise, in order to challenge you
Note: if you encounter the `boundary (singular) fit: see help('isSingular')` error, this doesn't mean that you've used the `lme4` syntax incorrectly; as we'll discuss later in the course, it means that the model you've fitted is too complex to be supported by the size of the dataset.
:::

### Exercise 3 - Dragons
:::

### Dragons {#sec-exr_dragons}

::: {.callout-exercise}

{{< level 2 >}}

@@ -576,7 +586,7 @@ With more variables, there are more possible models that could be fitted. Think

Try to work through this yourself, before expanding the answer below.

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Worked answer

Here, we'll work through how to fit and visualise one possible mixed effects model that could be fitted to these data.
@@ -738,7 +748,9 @@ You might also notice in the model summary that the estimated variance for the r

:::

::: {.callout-tip appearance="minimal"}
:::

::: {.callout-exercise}
#### Bonus questions

{{< level 3 >}}
@@ -782,6 +794,8 @@ Where $y$ is `intelligence`, $x_1$ is `wingspan`, $x_2$ is `scales`, $j$ represe

:::

:::

## Summary

This section of the course is designed to introduce the syntax required for fitting two-level mixed models in R, including both random intercepts and random slopes, and how we can visualise the resulting models.
4 changes: 2 additions & 2 deletions materials/generalised-mixed-models.qmd
@@ -60,9 +60,9 @@ The assumptions of a GLMM are an amalgamation of the assumptions of a GLM and a
- Correct link function; there is a linear relationship between the linearised model
- Normally distributed random effects

## Revisiting arabidopsis
## Revisiting Arabidopsis

To give an illustration of how we fit and assess generalised linear mixed effects models (GLMMs), we'll look at the internal dataset `Arabidopsis` from `lme4`.
To give an illustration of how we fit and assess generalised linear mixed effects models (GLMMs), we'll look at the internal dataset `Arabidopsis`, which we investigated earlier in the course in [Exercise -@sec-exr_arabidopsis].

::: {.panel-tabset group="language"}
## R
28 changes: 17 additions & 11 deletions materials/nested-random-effects.qmd
@@ -270,7 +270,9 @@ And, no matter which method you choose, always check the model output to see tha

## Exercises

### Exercise 1 - Cake
### Cake {#sec-exr_cake}

::: {.callout-exercise}

{{< level 2 >}}

@@ -302,7 +304,7 @@ For this exercise:
3. Consider how you might recode the dataset to reflect implicit nesting
4. Fit and test at least one appropriate model

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Worked answer

#### Consider the experimental design
@@ -417,12 +419,12 @@ ggplot(augment(lme_cake), aes(x = temperature, y = angle, colour = recipe)) +
```
:::


:::

:::

::: {.callout-tip appearance="minimal"}
#### Follow-up questions
::: {.callout-exercise}
#### Bonus questions

{{< level 2 >}}

@@ -435,7 +437,9 @@ If you want to think a bit harder about this dataset, consider these additional

For more information on the very best way to bake a chocolate cake (and a lovely demonstration at the end about the dangers of extrapolating from a linear model), [this blog post](https://www.sumsar.net/blog/source-of-the-cake-dataset/) is a nice source. It's written by a data scientist who was so curious about the quirky `cake` dataset that he contacted Iowa State University, who helped him unearth Cook's original thesis.

### Exercise 2 - Parallel fibres
### Parallel fibres {#sec-exr_parallel}

::: {.callout-exercise}

{{< level 2 >}}

@@ -466,7 +470,7 @@ For this exercise:
2. Determine whether the dataset requires recoding or explicit nesting
3. Fit and test at least one appropriate model

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Worked answer

#### Visualise the design
@@ -593,8 +597,10 @@ Our diagnostic plots look pretty good for our simpler, intercepts-only model, bu

:::

::: {.callout-tip appearance="minimal"}
#### Optional follow-up question: notation
:::

::: {.callout-exercise}
#### Bonus question: notation

{{< level 3 >}}

@@ -604,7 +610,7 @@ What would the equation of a three level model fitted to the `parallel` dataset

Hint: you'll need more subscript letters than you did for a two-level model!

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Answer: three-level intercepts-only

E.g., `length ~ depth + (1|slice:cat) + (1|cat)`
@@ -641,7 +647,7 @@ $$

:::

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Answer: three-level intercepts & slopes

E.g., `length ~ depth + (1|slice:cat) + (1 + depth|cat)`
24 changes: 18 additions & 6 deletions materials/random-effects.qmd
@@ -64,7 +64,9 @@ There'll be more about the maths of fitting random effects later in the course,

## Exercises

### Exercise 1 - Primary schools
### Primary schools {#sec-exr_primaryschools}

::: {.callout-exercise}

{{< level 1 >}}

@@ -82,7 +84,7 @@ The response variable in this example is the standardised academic test scores,

Which of these predictors should be treated as fixed versus random effects? Are there any other "hidden" grouping variables that we should consider, based on the description of the experiment?

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Answer

We care about the effects of `gender` and `SES score`. We might also be interested in testing for the interaction between them, like so: `academic test scores ~ SES + gender + SES:gender`.
@@ -101,7 +103,11 @@ The `classroom` variable would in fact be "nested" inside the `school` variable
Our other possible hidden variable is `family`. If siblings have been included in the study, they will share an identical SES score, because this has been derived from the parent(s) rather than the students themselves. Siblings are, in this context, technical replicates! One way to deal with this is to simply remove siblings from the study; or, if there are enough sibling pairs to warrant it, we could also treat `family` as a random effect.
:::

### Exercise 2 - Ferns
:::

### Ferns {#sec-exr_ferns}

::: {.callout-exercise}

{{< level 1 >}}

@@ -115,7 +121,7 @@ What are our variables? What's the relationship we're interested in, and which o

![Predictor variables](images_mixed-effects/example2_1.png){fig-alt="Graphic with three variables listed: Tray, Intensity and Timepoint"}

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Answer

There are four things here that vary: `tray`, `light intensity`, `timepoint` and `height`.
@@ -138,7 +144,11 @@ In this case, then, `time` would probably be best treated as a fixed rather than
However, if we were not measuring a response variable that changes over time (like growth), that might change. If, for instance, we were investigating the relationship between light intensity and chlorophyll production in adult plants, then measuring across different time points would be a case of technical replication instead, and `time` would be best treated as a random effect. **The research question is key in making this decision.**
:::

### Exercise 3 - Wolves
:::

### Wolves {#sec-exr_wolves}

::: {.callout-exercise}

{{< level 1 >}}

@@ -148,7 +158,7 @@ What's the relationship of interest? Is our total *n* really 60?

![Predictor variables](images_mixed-effects/example3_1.png){fig-alt="Graphic with three variables listed: Wolf population, National park and Year."}

::: {.callout-note collapse="true"}
::: {.callout-tip collapse="true"}
#### Answer

Though we have 60 observations, it would of course be a case of pseudoreplication if we failed to understand the clustering within these data.
@@ -165,6 +175,8 @@ We have measured across several national parks, and over a 10 year period, in or
Of course, you might know more about ecology than me, and have a good reason to believe that the exact years *do* matter - that perhaps something fundamental in the relationship between `flood depth ~ wolf population` really does vary with year in a meaningful way. But given that our research question does not focus on change over time, both `year` and `national park` would be best treated as random effects given the information we currently have.
:::

:::

## Summary

::: {.callout-tip}