class: center, middle, inverse, title-slide .title[ # General and Generalized Linear Mixed models ] .subtitle[ ##
An Introduction for non-statisticians ] .author[ ### Joris De Wolf ] .institute[ ### Highwoods for VIB ] .date[ ###
14 and 15 November 2024 ] --- class: main-slide, inverse #Setting the scene .Mleft-column[ <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-1-1.png" width="95%" /> ] .Mright-column[ An experiment compares the effect of two treatments on leaf length in plants. 12 observations for each treatment ] --- class: main-slide,inverse # What is the aim of the analysis? -- Is it "*Which treatment is best in this experiment?*" <br/> -- <br/> The **real aim** is: - Can I say something more general about the difference between the treatments? Beyond this experiment. - Would I find the same or a similar effect again when I redo the experiment? - under *the same* conditions - under *similar* conditions --- class: inverse,top # Restricted to this experiment: <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-2-1.png" width="50%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-2-2.png" width="50%" /> --- class: inverse,top # In order to generalize: average + spread around average <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-3-1.png" width="50%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-3-2.png" width="50%" /> --- class: inverse,top # ...and of course number of observations plays a role <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-4-1.png" width="50%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-4-2.png" width="50%" /> --- class: inverse,top # The Classic way: The simple linear model normal distribution -> for small samples: t-distribution .pull-right[t-distributions for residuals Trt1] <br/> <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-5-1.png" width="50%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-5-2.png" width="50%" /> --- class: inverse,top # A closer look : the observations are not independent 
<img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-6-1.png" width="50%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-6-2.png" width="50%" /> --- class: inverse,top # A closer look : observations may not be balanced <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-7-1.png" width="50%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-7-2.png" width="50%" /> --- class: main-slide, inverse # Requirements of the residuals in linear models .blockquote[ 1. need to belong to the same normal distribution with mean = 0 2. need to be independent of each other ] Consequence: when these requirements are violated, simple linear models will not provide correct answers. --- class: inverse,top # Respect the hierarchy .Mleft-column[ <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-8-1.png" width="95%" /> ] .Mright-column[ .HL[Possible solution...] 1. summaries by pot 2. summary of the by-pot summaries ] --- class: inverse,top # But not all dependency is the same <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-9-1.png" width="47%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-9-2.png" width="47%" /> --- class: main-slide,inverse # Hence: - too optimistic: treating all observations as independent (too many degrees of freedom) - too strict: ignoring individual observations at the lowest level (too few degrees of freedom) **What we really need:** a model that can find the right balance --- class: inverse,middle,top # Plenty of reasons why observations are not independent - experiments with study objects in blocks, or done on different days, assessed with different equipment, treated by different people,... - several observations on the same subject (at the same time, or at different times) - physical position of study objects in a heterogeneous environment - genetic relationship among study objects - ...
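A tiny simulation makes this concrete (a Python sketch with invented variance values, purely to illustrate; the course tooling itself is R): plants that share a pot inherit the pot's deviation, so their measurements are correlated even before any treatment is applied.

```python
import random

random.seed(42)

# Invented variance components, for illustration only:
SD_POT = 1.0    # spread of pot-level deviations
SD_PLANT = 0.5  # spread of plant-level residuals
N_POTS = 2000

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Two plants per pot: both inherit the same pot deviation.
pots = [random.normalvariate(0, SD_POT) for _ in range(N_POTS)]
plant_a = [p + random.normalvariate(0, SD_PLANT) for p in pots]
plant_b = [p + random.normalvariate(0, SD_PLANT) for p in pots]

within = corr(plant_a, plant_b)                     # same pot
between = corr(plant_a, plant_b[1:] + plant_b[:1])  # different pots

# Theory: intraclass correlation = SD_POT^2 / (SD_POT^2 + SD_PLANT^2) = 0.8
print(f"within-pot r = {within:.2f}, between-pot r = {between:.2f}")
```

Same-pot observations are far from independent (r near 0.8 here, near 0 across pots), which is exactly the structure the models in the rest of these slides are built to absorb.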
-- Later on, some of these dependencies will be described via .hl[groups or classes], whereas for others a .hl[continuous distance] could be used to indicate the relatedness. --- class: main-slide,inverse # Intermezzo 1: .HL[Consequences for design/execution of an experiment] - Be aware of the hierarchy - Pots perhaps more important than individual plants or leaves -- - Role of a factor in an experiment (...) --- class: main-slide,inverse # Intermezzo 2: .HL[Consequences for interpretation] - If you don't include the variability of pots: conclusions limited to these pots - If you include the variability of pots: conclusions extended to a similar set of pots --- class: main-slide, inverse # Intermezzo 3: .HL[Dependence, is it a curse or a blessing?] -- - Reality is not independent (even the reality we mimic/create in experiments) - Related observations can help stabilize/improve the prediction for cases with imprecise observations - **BUT**: you have to realize there is dependence and find a proper solution for it (not (always) easy) --- class: main-slide,inverse,top # Solution: **Linear mixed models** - hierarchical models / crossed models: dependence in discrete groups - longitudinal - spatial - relatedness : continuously varying dependence <br/> Models that explicitly use/model the - structure among the groups of observations (**random effects**) or - the **variance-covariance** among the individual observations.
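The random-effect idea can be sketched with toy numbers (pure Python, all values invented for illustration): an indicator matrix `Z` routes each observation to its group's deviation, giving the `Y = Xβ + Zu + ε` form that appears later in these slides.

```python
# Sketch of Y = X*beta + Z*u + e for 4 observations in 2 pots.
# All numbers are invented, purely for illustration.

beta = [10.0, 2.0]          # intercept, treatment effect (fixed effects)
u = [0.7, -0.4]             # pot deviations (random-effect draws)
e = [0.1, -0.2, 0.3, 0.0]   # plant-level residuals

# Design matrices as lists of rows:
X = [[1, 0], [1, 0], [1, 1], [1, 1]]   # obs 1-2 control, obs 3-4 treated
Z = [[1, 0], [1, 0], [0, 1], [0, 1]]   # obs 1-2 in pot 1, obs 3-4 in pot 2

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Observations in the same pot share that pot's deviation u:
Y = [xb + zu + ei for xb, zu, ei in zip(matvec(X, beta), matvec(Z, u), e)]
print([round(y, 1) for y in Y])
```

Swapping in a different set of pots only redraws `u`; the fixed part `X*beta` is what generalizes beyond this experiment.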
[blocks *vs* position in experiment] --- class: inverse,top # Additional problem : non-normality .Mleft-column[ <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-10-1.png" width="95%" /> ] -- .Mleft-column[ <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-11-1.png" width="95%" /> ] --- class: inverse,top # Additional problem : non-normality .Mleft-column[ <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-12-1.png" width="85%" height="25%" /> ] .Mleft-column[ **Not a normal distribution** - concentrated at low values - long tail to the right - many low values, but impossible to go below zero <br/> - approximations via the normal distribution and t-distributions to judge the difference and spread will no longer work <br/> - We need *generalized* models. - not easy... (choice of distributions, algorithms, interpretation) ] --- class: title-slide,middle,center # General Linear Mixed Models: a closer look --- class: main-slide,inverse,top # Linear Mixed Model: why mixed? - Basic linear model: fixed effects + simple residuals - Linear mixed model : - fixed effects + complex residuals - fixed effects + [random effects + residuals] .small[so: mixed models are models that include both fixed and random effects] -- LMMs apply a more complex representation of the residuals that is capable of dealing with the dependence structure. --- class: main-slide,inverse,top # Additional benefit of (grouping) random effects - 'efficient' representation of a factor - representing a larger, more general population from which the tested conditions are drawn - help in generalizing the findings of an experiment -- hence the alternative name for a random effect: .hl[*Variance Component*] --- class: main-slide,inverse,top # Why consider a factor as a random effect? - Are we interested in the specific levels of the factor?
- Are we interested in the factor as a specific source of variation that may have an impact on the observations? - Do we want to claim something generic outside the specific situation of the experiment? - Are we interested in improving our model without much interest in that factor? - How many levels of the factor do we have data for? - Is it truly random? --- class: inverse,middle,top # Some matrices... A simple situation with 2 treatments *A* and *B*, with 4 observations each .Mleft-column[ `$$y_1 = b_A + e_1 \\ y_2 = b_A + e_2 \\ y_3 = b_A + e_3 \\ \vdots \\ y_{7} = b_{B} + e_{7}\\ y_{8} = b_{B} + e_{8}$$` <br/> <br/> <br/> `$$Y = X\beta +\varepsilon$$` with `\(\varepsilon = e_1, e_2,..,e_8\)` ] -- .Mright-column[ every `\(e_i\)` is drawn from a normal distribution, all with mean zero. $$ N(0,\sigma_1), N(0,\sigma_2), ... , N(0,\sigma_8) $$ ] --- class: inverse,middle,top # Classic models: 'iid' residuals In linear models we assume 'residuals are identically and independently distributed'. .hl[Identically:] `\(\sigma_1 = \sigma_2 = \sigma_3 = ... = \sigma\)` so `\(N(0,\sigma_1), N(0,\sigma_2), ... , N(0,\sigma_8)\)` becomes `\(N(0,\sigma), N(0,\sigma), ... , N(0,\sigma)\)` <br/> .hl[Independently] means the draw of `\(e_2\)` does not depend on `\(e_1\)` and vice versa.
Hence: `$$(e_1,e_2) \sim N(0,\Sigma)$$` with `$$\Sigma = \begin{bmatrix} \sigma^2 & cov\\ cov & \sigma^2\\ \end{bmatrix} = \begin{bmatrix} \sigma^2 & 0\\ 0 & \sigma^2\\ \end{bmatrix}$$` since cov = 0 --- class: inverse,top # Remember bivariate normal distributions: `$$\Sigma_a = \begin{bmatrix} 1 & 0\\ 0 & 1\\ \end{bmatrix}\ \ \ \ \ \Sigma_b = \begin{bmatrix} 1 & 0\\ 0 & 2\\ \end{bmatrix} \ \ \ \ \ \Sigma_c = \begin{bmatrix} 1 & 0.6\\ 0.6 & 1\\ \end{bmatrix}$$` .center[ <!-- --> ] --- class: inverse,top # Closer look at the distribution of the residuals .HL[Suppose all 8 observations are independent] `\(\varepsilon\)` is distributed `\(N(0,\Sigma)\)` (= a normal distribution with mean 0 and variance-covariance matrix `\(\Sigma\)`) with: `$$\Sigma =\begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 &\sigma^2\\\end{bmatrix}$$` `\(\Sigma\)` only contains one `\(\sigma\)`, and only on the diagonal (covariance = 0), i.e. the simple linear model. --- class: inverse,top # Sigma in case of dependence .Mleft-column[ `$$\Sigma =\begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 &\sigma^2\\\end{bmatrix}$$` The simple linear case ] -- .Mright-column[ `$$\Sigma =\begin{bmatrix}\sigma^2_1 & \sigma^2_a & \sigma^2_b & \sigma^2_c & . & . & . & . \\ \sigma^2_a & \sigma^2_2 & \sigma^2_k & \sigma^2_l & . & . & . & . \\ \sigma^2_b & \sigma^2_k & \sigma^2_3 & \sigma^2_m & . & . & . & .\\ \sigma^2_c & \sigma^2_l & \sigma^2_m & \sigma^2_4 & . & . & . & .\\ . & . & . & .
& \sigma^2_5 & . & . & .\\ . & . & . & . & . & \sigma^2_6 & . & .\\ . & . & . & . & . & . & \sigma^2_7 & .\\ . & . & . & . & . & . & . &\sigma^2_8\\\end{bmatrix}$$` The ultimate but impossible case ] --- class: inverse,top # The practical situation: residual variance by group .Mleft-column[ `$$\Sigma =\begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 &\sigma^2\\\end{bmatrix}$$` <br/> `$$Y = X\beta + \varepsilon$$` ] .Mright-column[ `$$\Sigma =\begin{bmatrix} \color{yellow}{\sigma^2_1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \color{yellow}{\sigma^2_1} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \color{yellow}{\sigma^2_1} & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \color{yellow}{\sigma^2_1} & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \color{red}{\sigma^2_2} & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \color{red}{\sigma^2_2} & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \color{red}{\sigma^2_2} & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 &\color{red}{\sigma^2_2}\\\end{bmatrix}$$` <br/> `$$Y = X\beta + \varepsilon$$` with a different variance for *A* and *B* (heteroscedastic model) ] --- class: inverse,top # The practical situation: groups .Mleft-column[ `$$\Sigma =\begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 &\sigma^2\\\end{bmatrix}$$` <br/> `$$Y = X\beta + \varepsilon$$` ] .Mright-column[ `$$\Sigma =\begin{bmatrix} \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & 0 & 0 & 0 & 0 & 0 & 0 \\ \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \color{yellow}{\sigma^2} &
\color{red}{\sigma^2_1} & 0 & 0 & 0 & 0\\ 0 & 0 & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & 0 & 0\\ 0 & 0 & 0 & 0 & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1}\\ 0 & 0 & 0 & 0 & 0 & 0 & \color{red}{\sigma^2_1} &\color{yellow}{\sigma^2}\\\end{bmatrix}$$` <br/> split `\(\Sigma\)` up into a part that deals with the yellow entries and a part that deals with the red entries ("the random effect") `$$Y = X\beta + Zu + \varepsilon$$` ] --- class: inverse,top # The practical situation: functions of dependence Another possibility is to apply a function that results in covariance decreasing with "distance": `$$\Sigma =\begin{bmatrix} \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & \color{orange}{\sigma^2_1/2} & 0 & 0 & 0 & 0 & 0 \\ \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & \color{orange}{\sigma^2_1/2} & 0 & 0 & 0 & 0 \\ \color{orange}{\sigma^2_1/2} & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & \color{orange}{\sigma^2_1/2} & 0 & 0 & 0\\ 0 & \color{orange}{\sigma^2_1/2} & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & \color{orange}{\sigma^2_1/2} & 0 & 0\\ 0 & 0 & \color{orange}{\sigma^2_1/2} & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & \color{orange}{\sigma^2_1/2} & 0\\ 0 & 0 & 0 & \color{orange}{\sigma^2_1/2} & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & \color{orange}{\sigma^2_1/2}\\ 0 & 0 & 0 & 0 & \color{orange}{\sigma^2_1/2} & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1}\\ 0 & 0 & 0 & 0 & 0 & \color{orange}{\sigma^2_1/2} & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2}\\\end{bmatrix}$$` --- class: inverse,top # The practical situation: scaling a fixed (similarity) matrix `$$\Sigma =\sigma^2_s
\begin{bmatrix} 1 & d_a & d_b & d_c & . & . & . & . \\ d_a & 1 & d_g & d_h & . & . & . & . \\ d_b & d_g & 1 & d_k & . & . & . & .\\ d_c & d_h & d_k & 1 & . & . & . & .\\ . & . & . & . & 1 & . & . & .\\ . & . & . & . & . & 1 & . & .\\ . & . & . & . & . & . & 1 & .\\ . & . & . & . & . & . & . &1\\\end{bmatrix}$$` --- class: main-slide,inverse,top # Longitudinal data and time series A specific but very common case of dependency Many different ways to handle it, depending on the objective and setup of the experiment or study, and on prior knowledge - repeated observations just to increase accuracy *vs* interest in the change itself (e.g. growth curves) - if the latter: just the general shape and differences in shape, or specific coefficients of a known function - frequency and scale --- class: main-slide,inverse,top # Multivariate observations Yet another case of dependency As with longitudinal observations, but now multiple parameters measured on the same objects Relying on the correlation between the multiple parameters to improve the estimation of the main one --- class: main-slide, inverse, top # Technical implications No more ordinary least squares, but algorithms that have to converge to stable solutions - maximum likelihood (ML / REML) - Markov chains --- # In R Frequentist: - Maximum likelihood based (broad *lme4* and *nlme*, or case-specific *sommer*, *gamm4*, ...) Bayesian: - Monte Carlo sampling from generating distributions (*brms*) - Integration of generating distributions (*inla*) Hence: the output will always differ a little. Let's dive into it: https://hw-appliedlinmixmodinr.netlify.app// --- # What does the document cover? - a slow build-up of how to fit a linear (mixed) model in R - how to interpret the output - exercises - repetition of what has been said already - a bit more background on the models - some side tracks - intro to the Bayesian style of fitting mixed models - longitudinal models - models using relationship covariances We will not cover everything in detail. It is a reference.
https://hw-appliedlinmixmodinr.netlify.app// --- class: title-slide,middle,center # Generalized Linear Mixed models --- class: inverse,top # Remember the additional problem : non-normality .Mleft-column[ <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-14-1.png" width="85%" height="25%" /> ] .Mleft-column[ **Not a normal distribution** - concentrated at low values - long tail to the right - many low values, but impossible to go below zero <br/> - approximations via the normal distribution and t-distributions to judge the difference and spread will no longer work <br/> - We need *generalized* models. - not easy... (choice of distributions, algorithms, interpretation) ] --- class: main-slide,inverse,top # .HL[Generalized] Linear Mixed models The *generalization* part deals with two issues: .blockquote[ 1. the response cannot possibly change proportionally with the predictor(s) 2. the response is not normally distributed ] .small[while remaining within the realm of linear models] --- class: main-slide, inverse, top # How?
- More complex algorithms that allow a non-normal error distribution - Modelling happens in a transformed space --- class: main-slide, inverse, top # But: additional worries - Choice of error distribution and link, on top of the choice of fixed and random effects - Slow algorithms, and more frequent issues with non-convergence --- class: main-slide, inverse,top # Main difficulty : choice of error distribution and link Usually: - choice between a few classic solutions - go for a ***family*** : a combination of an error distribution with its default link --- class: main-slide, inverse,top # Classic solutions - .hl[count data]: Poisson or negative binomial distribution with *log* link - .hl[yes/no - 0/1 data]: binomial distribution with *logit* link - .hl[skewed continuous data, strictly positive]: gamma distribution with *log* link - .hl[observed continuous proportions]: beta distribution with *logit* link - .hl[data with too many zeros]: zero-inflated versions of the above - .hl[other (censored, zero-inflated, ratios of 2 observed,...)]: seek help --- class: main-slide,inverse # Consequences for design/execution of the experiments - The closer to a normal distribution, the easier (more power) - The more extreme, the more difficult it gets to fit a powerful model - Ultimately: *binomial* models (yes/no; diseased/healthy) - The less powerful, the more observations are needed to conclude something meaningful - Yet an opportunity: shifting from few precise observations to many rough observations --- class: main-slide, inverse,top # How practically? - indicating the family - more complex fitting algorithms (choice, computing time) - less straightforward checking of whether assumptions are fulfilled - more complex interpretation - more complex calculation of contrasts --- class: # How practically in R?
https://hw-appliedgeneralizedlinmixmodinr.netlify.app/ Content of the document: - a bit of background - practical examples with lme4, brms and inla - `lme4::glmer()` - `brms::brm()` - `INLA::inla()` - how to interpret the output - examples of families and links We will not cover everything in detail. It is a reference.
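As a final illustration of what a link function buys you, here is a minimal sketch (in Python, with invented coefficients rather than output from any fitted model): the linear predictor lives on the logit scale, and the inverse link maps it back to a probability that can never leave (0, 1).

```python
import math

def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Invented coefficients on the logit (linear) scale:
intercept = -1.2          # baseline log-odds of e.g. 'diseased'
treatment_effect = 0.9    # additive on the logit scale ...

p_control = inv_logit(intercept)
p_treated = inv_logit(intercept + treatment_effect)

# ... but NOT additive on the probability scale:
print(f"control: {p_control:.3f}, treated: {p_treated:.3f}")

# On the logit scale the effect is additive; on the natural scale it
# multiplies the *odds* by exp(treatment_effect):
odds_ratio = (p_treated / (1 - p_treated)) / (p_control / (1 - p_control))
print(f"odds ratio = {odds_ratio:.3f}")
```

This is why interpreting GLMM coefficients takes extra care: effects are additive only in the transformed space, and back-transformation turns them into ratios.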