
General and Generalized Linear Mixed models


An Introduction for non-statisticians

Joris De Wolf

Highwoods for VIB


14 and 15 November 2024

1 / 48

Setting the scene

An experiment compares the effect of two treatments on leaf length in plants.

12 observations for each treatment

2 / 48


What is the aim of the analysis?

Is it "Which treatment is best in this experiment?"

The real aim is:

  • Can I say something more general about the difference between the treatments, beyond this experiment?

  • Would I find the same or a similar effect again when I redo the experiment?

    • under the same conditions
    • under similar conditions
3 / 48

Restricted to this experiment:

4 / 48

In order to generalize: average + spread around average

5 / 48
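The idea of "average + spread around the average" can be sketched in base R. The data below are simulated for illustration; the column names and the chosen means and standard deviations are hypothetical.

```r
# Hypothetical sketch: 12 observations per treatment, as in the example.
set.seed(1)
d <- data.frame(
  treatment   = rep(c("A", "B"), each = 12),
  leaf_length = c(rnorm(12, mean = 50, sd = 4), rnorm(12, mean = 55, sd = 4))
)
means <- tapply(d$leaf_length, d$treatment, mean)  # average per treatment
sds   <- tapply(d$leaf_length, d$treatment, sd)    # spread around the average
ses   <- sds / sqrt(12)                            # standard error of the mean
round(cbind(mean = means, sd = sds, se = ses), 2)
```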

...and of course number of observations plays a role

6 / 48

The Classic way: The simple linear model

[Figure: residual t-distributions for Trt1 — normal distribution; for small samples: the t-distribution]

7 / 48

A closer look: the observations are not independent

8 / 48

A closer look: observations may not be balanced

9 / 48

Requirements of the residuals in linear models

  1. need to belong to the same, normal distribution with mean = 0

  2. need to be independent from each other

Consequence: Simple linear models will not provide correct answers.

10 / 48

Respect the hierarchy

Possible solution...

  1. summaries by pot
  2. summary of the by-pot summaries
11 / 48
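A minimal base-R sketch of this two-stage approach, under an assumed (hypothetical) layout of 4 pots per treatment with 3 leaves each:

```r
# Simulate a hierarchy: leaves nested in pots, pots nested in treatments.
set.seed(2)
d <- expand.grid(leaf = 1:3, pot = 1:8)
d$treatment <- ifelse(d$pot <= 4, "A", "B")
pot_dev <- rnorm(8, 0, 2)                       # shared deviation per pot
d$leaf_length <- 50 + ifelse(d$treatment == "B", 5, 0) +
  pot_dev[d$pot] + rnorm(nrow(d), 0, 1)

# 1. summaries by pot
by_pot <- aggregate(leaf_length ~ pot + treatment, data = d, FUN = mean)
# 2. summary of the by-pot summaries
res <- t.test(leaf_length ~ treatment, data = by_pot)
res
```

Note that the t-test now runs on 8 pot means, not on 24 leaves, which respects the hierarchy but discards the leaf-level information.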

But not all dependency is the same

12 / 48

Hence:

  • too optimistic: all independent (too many degrees of freedom)

  • too strict: ignore individual observations at lowest level (too few degrees of freedom)

What we really need: a model that can find a right balance

13 / 48


Plenty of reasons why observations are not independent

  • experiments with study objects in blocks, or done on different days, assessed with different equipment, treated by different people,...

  • several observations on the same subject (same time, different time)

  • physical position of study objects in a heterogeneous environment

  • genetic relationship among study objects

  • ...

Some of these dependencies will later be described via groups or classes, whereas for others a continuous distance can be used to indicate the relatedness.

14 / 48


Intermezzo 1:

Consequences for design/execution of an experiment

  • Be aware of the hierarchy

  • Pots perhaps more important than individual plants or leaves

  • Role of a factor in an experiment (...)

15 / 48

Intermezzo 2:

Consequences for interpretation

  • If you don't include variability of pots: conclusion limited to these pots

  • If you include variability of pots: conclusion extended to a similar set of pots

16 / 48


Intermezzo 3:

Dependence, is it a curse or a blessing?

  • The reality is not independent (even the one we mimic/create in experiments)

  • Related observations can help/stabilize/improve a prediction of cases with imprecise observations

  • BUT: you have to realize there is dependence and find a proper solution for it (not (always) easy)

17 / 48

Solution: Linear mixed models

  • hierarchical models / crossed models: dependence in discrete groups
  • longitudinal - spatial - relatedness : continuously varying dependence


Models that explicitly use/model the

  • structure among the groups of observations (random effects) or
  • the variance-covariance among the individual observations.

[blocks vs position in experiment]

18 / 48
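As a sketch of the hierarchical case (assuming the lme4 package is available; the column names and layout are hypothetical), a random intercept per pot captures the dependence in discrete groups:

```r
# Simulated example: 8 pots, 3 leaves per pot, 2 treatments.
set.seed(3)
d <- expand.grid(leaf = 1:3, pot = 1:8)
d$treatment <- ifelse(d$pot <= 4, "A", "B")
d$pot <- factor(d$pot)
d$leaf_length <- 50 + ifelse(d$treatment == "B", 5, 0) +
  rnorm(8, 0, 2)[as.integer(d$pot)] + rnorm(nrow(d), 0, 1)

if (requireNamespace("lme4", quietly = TRUE)) {
  # fixed effect: treatment; random effect: pot (the grouping level)
  fit <- lme4::lmer(leaf_length ~ treatment + (1 | pot), data = d)
  print(summary(fit))
}
```

The `(1 | pot)` term is what turns the simple linear model into a mixed model: it estimates a variance component for pots instead of one coefficient per pot.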

Additional problem: non-normality

19 / 48


Additional problem: non-normality

Not a normal distribution:

  • concentrated at low values
  • long tail at the right
  • many low values but impossible to go below zero


  • approximations via normal distribution and t-distributions to judge the difference and spread will not work anymore


  • We need generalized models.
  • not easy... (choice of distributions, algorithms, interpretation)
20 / 48

General Linear Mixed Models: a closer look

21 / 48


Linear Mixed Model: why mixed?

  • Basic Linear model: fixed effects + simple residuals

  • Linear mixed model :

    • fixed effects + complex residuals
    • fixed effects + [random effects + residuals]

so: mixed models are models that include both fixed and random effects

LMMs apply a more complex representation of the residuals, one that is capable of dealing with the dependence structure.

22 / 48


Additional benefit of (grouping) random effects

  • 'efficient' representation of a factor
  • representing a larger, more general population from which the tested conditions are drawn
  • help in generalizing findings of an experiment

hence alternative name for random effect: Variance Component

23 / 48

Why consider a factor as a random effect?

  • Are we interested in the specific levels of the factor?
  • Are we interested in the factor as a specific source of variation that may have its impact on the observation?
  • Do we want to claim something generic outside the specific situation of the experiment?
  • Are we interested in improving our model without much interest in that factor?
  • How many levels of the factor do we have data for?
  • Is it truly random?
24 / 48


Some matrices...

A simple situation with 2 treatments A and B, with 4 observations each

$$y_1 = b_A + e_1 \qquad y_2 = b_A + e_2 \qquad y_3 = b_A + e_3 \qquad \ldots \qquad y_7 = b_B + e_7 \qquad y_8 = b_B + e_8$$

$$Y = X\beta + \varepsilon$$

with $\varepsilon = (e_1, e_2, \ldots, e_8)$

Every $e_i$ is drawn from a normal distribution, all with mean zero: $N(0,\sigma_1), N(0,\sigma_2), \ldots, N(0,\sigma_8)$

25 / 48

classic models: 'iid' distributed residuals

In linear models we assume 'residuals are identically and independently distributed'.

Identically:

$$\sigma_1 = \sigma_2 = \sigma_3 = \ldots = \sigma$$

so $N(0,\sigma_1), N(0,\sigma_2), \ldots, N(0,\sigma_8)$ becomes $N(0,\sigma), N(0,\sigma), \ldots, N(0,\sigma)$


Independently means the draw of $e_2$ does not depend on $e_1$ and vice versa.

Hence:

$$(e_1, e_2) \sim N(0, \Sigma) \quad \text{with} \quad
\Sigma = \begin{bmatrix} \sigma^2 & \mathrm{cov} \\ \mathrm{cov} & \sigma^2 \end{bmatrix}
       = \begin{bmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{bmatrix}$$

as cov = 0

26 / 48

remember bivariate normal distributions:

$$\Sigma_a = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad
  \Sigma_b = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \qquad
  \Sigma_c = \begin{bmatrix} 1 & 0.6 \\ 0.6 & 1 \end{bmatrix}$$

27 / 48
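A small base-R sketch helps build intuition for these matrices: drawing pairs from the correlated case $\Sigma_c$ via its Cholesky factor, the empirical covariance recovers the chosen matrix. The sample size and seed are arbitrary.

```r
# Simulate from a bivariate normal with covariance 0.6 using only base R.
set.seed(4)
Sigma <- matrix(c(1, 0.6, 0.6, 1), nrow = 2)  # the Sigma_c case
z <- matrix(rnorm(2 * 10000), ncol = 2)       # independent standard normals
x <- z %*% chol(Sigma)                        # induce the covariance structure
round(cov(x), 2)                              # empirical cov, close to Sigma
```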

Closer look at the distribution of the residuals

Suppose all 8 observations are independent

$\varepsilon$ is distributed $N(0, \Sigma)$ (= a normal distribution with mean 0 and variance-covariance matrix $\Sigma$)

with:

$$\Sigma = \begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix} \quad (8 \times 8)$$

$\Sigma$ contains only one $\sigma$, and only on the diagonal (covariance = 0),

i.e. the simple linear model.

28 / 48


Sigma in case of dependence

$$\Sigma = \begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}$$

The simple linear case

$$\Sigma = \begin{bmatrix}
\sigma^2_1 & \sigma^2_a & \sigma^2_b & \sigma^2_c & \cdots \\
\sigma^2_a & \sigma^2_2 & \sigma^2_k & \sigma^2_l & \cdots \\
\sigma^2_b & \sigma^2_k & \sigma^2_3 & \sigma^2_m & \cdots \\
\sigma^2_c & \sigma^2_l & \sigma^2_m & \sigma^2_4 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}$$

The ultimate but impossible case: every observation its own variance, every pair its own covariance
29 / 48

The practical situation: residual variance by group

$$\Sigma = \begin{bmatrix}
\sigma^2 & & & 0 \\
& \sigma^2 & & \\
& & \ddots & \\
0 & & & \sigma^2
\end{bmatrix}$$

$$Y = X\beta + \varepsilon$$

$$\Sigma = \begin{bmatrix}
\sigma^2_1 I_4 & 0 \\
0 & \sigma^2_2 I_4
\end{bmatrix}
\quad \text{i.e. } \operatorname{diag}(\sigma^2_1, \sigma^2_1, \sigma^2_1, \sigma^2_1,\; \sigma^2_2, \sigma^2_2, \sigma^2_2, \sigma^2_2)$$

$Y = X\beta + \varepsilon$ with a different variance for A and B (heteroscedastic model)

30 / 48

The practical situation: groups

$$\Sigma = \begin{bmatrix}
\sigma^2 & & & 0 \\
& \sigma^2 & & \\
& & \ddots & \\
0 & & & \sigma^2
\end{bmatrix}$$

$$Y = X\beta + \varepsilon$$

$$\Sigma = \begin{bmatrix}
\sigma^2 & \sigma^2_1 & & & & & & \\
\sigma^2_1 & \sigma^2 & & & & & & \\
& & \sigma^2 & \sigma^2_1 & & & & \\
& & \sigma^2_1 & \sigma^2 & & & & \\
& & & & \sigma^2 & \sigma^2_1 & & \\
& & & & \sigma^2_1 & \sigma^2 & & \\
& & & & & & \sigma^2 & \sigma^2_1 \\
& & & & & & \sigma^2_1 & \sigma^2
\end{bmatrix}$$

Split up $\Sigma$ into a part that deals with the shared within-group covariance ("the random effect") and a residual part:

$$Y = X\beta + Zu + \varepsilon$$

31 / 48
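The split $Y = X\beta + Zu + \varepsilon$ can be made concrete by building the two design matrices in base R; the layout below (8 observations in 4 groups of 2) is hypothetical and mirrors the paired structure of the matrix above.

```r
# Design matrices for a random-intercept model, built by hand.
d <- data.frame(
  treatment = rep(c("A", "B"), each = 4),
  pot = factor(rep(1:4, each = 2))       # pairs of observations share a pot
)
X <- model.matrix(~ treatment, data = d)  # fixed-effects design (intercept + B)
Z <- model.matrix(~ pot - 1, data = d)    # random-effects design: 1 column/pot
dim(X)  # 8 x 2
dim(Z)  # 8 x 4
```

Each column of `Z` picks out one group; multiplying by the random effects `u` gives every member of a pot the same shared deviation, which is exactly what produces the off-diagonal $\sigma^2_1$ blocks.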

The practical situation: functions of dependence

Another possibility is to apply a function that results in decreasing covariance over "distance":

$$\Sigma = \begin{bmatrix}
\sigma^2 & \sigma^2_1 & \sigma^2_{1/2} & & & & & \\
\sigma^2_1 & \sigma^2 & \sigma^2_1 & \sigma^2_{1/2} & & & & \\
\sigma^2_{1/2} & \sigma^2_1 & \sigma^2 & \sigma^2_1 & \sigma^2_{1/2} & & & \\
& \sigma^2_{1/2} & \sigma^2_1 & \sigma^2 & \sigma^2_1 & \sigma^2_{1/2} & & \\
& & \sigma^2_{1/2} & \sigma^2_1 & \sigma^2 & \sigma^2_1 & \sigma^2_{1/2} & \\
& & & \sigma^2_{1/2} & \sigma^2_1 & \sigma^2 & \sigma^2_1 & \sigma^2_{1/2} \\
& & & & \sigma^2_{1/2} & \sigma^2_1 & \sigma^2 & \sigma^2_1 \\
& & & & & \sigma^2_{1/2} & \sigma^2_1 & \sigma^2
\end{bmatrix}$$

32 / 48
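A short base-R sketch of such a decaying structure, here an AR(1)-type matrix where covariance shrinks geometrically with the distance $|i - j|$ (the value of `rho` is arbitrary):

```r
# Covariance as a function of "distance" between observations.
sigma2 <- 1
rho <- 0.5
n <- 8
D <- abs(outer(1:n, 1:n, "-"))  # distance |i - j| for every pair
Sigma <- sigma2 * rho^D         # sigma^2 on the diagonal, decaying off it
round(Sigma[1:4, 1:4], 3)
```

With `rho = 0.5` the first off-diagonal is half the variance and the second a quarter, matching the pattern of the matrix above.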

The practical situation: scaling a fixed (similarity) matrix

$$\Sigma = \sigma^2_s \begin{bmatrix}
1 & d_a & d_b & d_c & \cdots \\
d_a & 1 & d_g & d_h & \cdots \\
d_b & d_g & 1 & d_k & \cdots \\
d_c & d_h & d_k & 1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}$$

33 / 48

Longitudinal data and time series

Specific but very common case of dependency

Many different ways to handle, depending on the objective and setup of the experiment or study, and prior knowledge

  • repeated observations just to increase accuracy vs being interested in the change itself (e.g. growth curves)
  • if the latter: just general shape and differences in shape or specific coefficients of a known function
  • frequency and scale
34 / 48

Multivariate observations

Yet another case of dependency

As with longitudinal observations, but now multiple parameters measured on same objects

Relying on correlation between the multiple parameters to improve the estimation of the main one

35 / 48

Technical implications

no more ordinary least squares, but algorithms that have to converge to stable solutions

  • maximum likelihood (REML)
  • Markov chains
36 / 48

in R

Frequentist:

  • Maximum likelihood based (broad: lme4 and nlme; case-specific: sommer, gamm4, ...)

Bayesian:

  • Monte Carlo sampling from generating distributions (brms)
  • Integration of generating distributions (inla)

Hence: output will always differ a little

Let's dive into it: https://hw-appliedlinmixmodinr.netlify.app//

37 / 48

What does the document cover?

  • slow build up how to fit a linear (mixed) model in R
  • how to interpret the output
  • exercises
  • repetition of what is said already
  • a bit more background on the models
  • some side tracks
  • intro to Bayesian style of fitting mixed models
  • longitudinal models
  • model using relationship covariances

We will not cover everything in detail. It is a reference.

https://hw-appliedlinmixmodinr.netlify.app//

38 / 48

Generalized Linear Mixed models

39 / 48

Remember the additional problem: non-normality

Not a normal distribution:

  • concentrated at low values
  • long tail at the right
  • many low values but impossible to go below zero


  • approximations via normal distribution and t-distributions to judge the difference and spread will not work anymore


  • We need generalized models.
  • not easy... (choice of distributions, algorithms, interpretation)
40 / 48

Generalized Linear Mixed models

The generalization part deals with two issues:

  1. the response cannot change proportionally with the predictor(s)

  2. the response is not normally distributed

while remaining within the realm of linear models

41 / 48

How?

  • More complex algorithms that allow non-normal error distributions
  • Modelling happens in a transformed space, connected to the observation scale via a link function

42 / 48

But additional worries:

  • Choice of error distribution and link, on top of choice of fixed effects and random effects
  • Slow algorithms, more quickly issues with non-convergence
43 / 48

Main difficulty: choice of error distribution and link

Usually:

  • choice between a few classic solutions
  • go for a family : a combination of an error distribution with its default link
44 / 48

Classic solutions

  • count data: Poisson or negative binomial distribution with log link

  • yes/no - 0/1 data: binomial distribution with logit link

  • skewed continuous data strictly positive: gamma distribution with log link

  • observed continuous proportions: beta distribution with logit link

  • data with too many zeros: zero-inflated versions of the above

  • other (censored, zero-inflated, ratios of 2 observed,...): seek help

45 / 48
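For the count-data case, base R's `glm()` already shows the family/link mechanics that `glmer()` and friends later extend with random effects. The data below are simulated and the coefficients are arbitrary.

```r
# A Poisson GLM with log link (no random effects yet).
set.seed(5)
x <- rep(c(0, 1), each = 20)                    # two treatments coded 0/1
counts <- rpois(40, lambda = exp(1 + 0.5 * x))  # counts, log-linear in x
fit <- glm(counts ~ x, family = poisson(link = "log"))
coef(fit)               # estimates live on the log scale
exp(coef(fit)["x"])     # back-transformed: multiplicative treatment effect
```

The back-transformation illustrates the interpretation cost of the link: effects are multiplicative on the observation scale, not additive.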

Consequences for design/execution of the experiments

  • The closer to normal distribution the easier (more power)
  • The more extreme the more difficult it gets to fit a powerful model
  • Ultimately: binomial models (yes/no; diseased/healthy)
  • The less powerful, the more observations are needed to conclude something meaningful
  • Yet opportunity: shift from few precise observations to many rough observations
46 / 48

How practically?

  • indicating the family
  • more complex fitting algorithms (choice, computing time)
  • less simple checking whether assumptions are fulfilled
  • more complex interpretation
  • more complex calculation of contrasts
47 / 48

How practically in R?

https://hw-appliedgeneralizedlinmixmodinr.netlify.app/

Content of the document:

  • a bit of background
  • practical examples with lme4, brms and inla

    • lme4::glmer()
    • brms::brm()
    • INLA::inla()
  • how to interpret the output

  • examples of families and links

We will not cover everything in detail. It is a reference.

48 / 48
