class: center, middle, inverse, title-slide .title[ # General and Generalized Linear Mixed models ] .subtitle[ ##
An Introduction for non-statisticians ] .author[ ### Joris De Wolf ] .institute[ ### Highwoods for VIB ] .date[ ###
14 and 15 November 2024 ] --- class: main-slide, inverse #Setting the scene .Mleft-column[ <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-1-1.png" width="95%" /> ] .Mright-column[ An experiment compares the effect of two treatments on leaf length in plants. 12 observations for each treatment ] --- class: main-slide,inverse # What is the aim of the analysis? -- Is it "*Which treatment is best in this experiment?*" <br/> -- <br/> The **real aim** is: - Can I say something more general about the difference between the treatments? Beyond this experiment. - Would I find the same or a similar effect again when I redo the experiment? - under *the same* conditions - under *similar* conditions --- class: inverse,top # Restricted to this experiment: <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-2-1.png" width="50%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-2-2.png" width="50%" /> --- class: inverse,top # In order to generalize: average + spread around average <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-3-1.png" width="50%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-3-2.png" width="50%" /> --- class: inverse,top # ...and of course number of observations plays a role <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-4-1.png" width="50%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-4-2.png" width="50%" /> --- class: inverse,top # The Classic way: The simple linear model normal distribution -> for small samples: t-distribution .pull-right[t-distributions for residuals Trt1] <br/> <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-5-1.png" width="50%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-5-2.png" width="50%" /> --- class: inverse,top # A closer look : the observations are not independent 
<img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-6-1.png" width="50%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-6-2.png" width="50%" /> --- class: inverse,top # A closer look : observations may not be balanced <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-7-1.png" width="50%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-7-2.png" width="50%" /> --- class: main-slide, inverse # Requirements of the residuals in linear models .blockquote[ 1. need to belong to the same normal distribution with mean = 0 2. need to be independent of each other ] Consequence: when these requirements are violated, simple linear models will not provide correct answers. --- class: inverse,top # Respect the hierarchy .Mleft-column[ <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-8-1.png" width="95%" /> ] .Mright-column[ .HL[Possible solution...] 1. summaries by pot 2. summary of the by-pot summaries ] --- class: inverse,top # But not all dependency is the same <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-9-1.png" width="47%" /><img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-9-2.png" width="47%" /> --- class: main-slide,inverse # Hence: - too optimistic: treating all observations as independent (too many degrees of freedom) - too strict: ignoring individual observations at the lowest level (too few degrees of freedom) **What we really need:** a model that can find the right balance --- class: inverse,middle,top # Plenty of reasons why observations are not independent - experiments with study objects in blocks, or done on different days, assessed with different equipment, treated by different people,... - several observations on the same subject (at the same time, or at different times) - physical position of study objects in a heterogeneous environment - genetic relationship among study objects - ...
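A tiny simulation makes this concrete (a Python sketch with invented variance values, purely to illustrate; the course tooling itself is R): plants that share a pot inherit the pot's deviation, so their measurements are correlated even before any treatment is applied.

```python
import random

random.seed(42)

# Invented variance components, for illustration only:
SD_POT = 1.0    # spread of pot-level deviations
SD_PLANT = 0.5  # spread of plant-level residuals
N_POTS = 2000

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Two plants per pot: both inherit the same pot deviation.
pots = [random.normalvariate(0, SD_POT) for _ in range(N_POTS)]
plant_a = [p + random.normalvariate(0, SD_PLANT) for p in pots]
plant_b = [p + random.normalvariate(0, SD_PLANT) for p in pots]

within = corr(plant_a, plant_b)                     # same pot
between = corr(plant_a, plant_b[1:] + plant_b[:1])  # different pots

# Theory: intraclass correlation = SD_POT^2 / (SD_POT^2 + SD_PLANT^2) = 0.8
print(f"within-pot r = {within:.2f}, between-pot r = {between:.2f}")
```

Same-pot observations are far from independent (r near 0.8 here, near 0 across pots), which is exactly the structure the models in the rest of these slides are built to absorb.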
-- Later on, some of these dependencies will be described via .hl[groups or classes], whereas for others a .hl[continuous distance] could be used to indicate the relatedness. --- class: main-slide,inverse # Intermezzo 1: .HL[Consequences for design/execution of an experiment] - Be aware of the hierarchy - Pots perhaps more important than individual plants or leaves -- - Role of a factor in an experiment (...) --- class: main-slide,inverse # Intermezzo 2: .HL[Consequences for interpretation] - If you don't include the variability of pots: conclusions limited to these pots - If you include the variability of pots: conclusions extended to a similar set of pots --- class: main-slide, inverse # Intermezzo 3: .HL[Dependence, is it a curse or a blessing?] -- - Reality is not independent (even the reality we mimic/create in experiments) - Related observations can help stabilize/improve the prediction for cases with imprecise observations - **BUT**: you have to realize there is dependence and find a proper solution for it (not (always) easy) --- class: main-slide,inverse,top # Solution: **Linear mixed models** - hierarchical models / crossed models: dependence in discrete groups - longitudinal - spatial - relatedness : continuously varying dependence <br/> Models that explicitly use/model the - structure among the groups of observations (**random effects**) or - the **variance-covariance** among the individual observations.
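The random-effect idea can be sketched with toy numbers (pure Python, all values invented for illustration): an indicator matrix `Z` routes each observation to its group's deviation, giving the `Y = Xβ + Zu + ε` form that appears later in these slides.

```python
# Sketch of Y = X*beta + Z*u + e for 4 observations in 2 pots.
# All numbers are invented, purely for illustration.

beta = [10.0, 2.0]          # intercept, treatment effect (fixed effects)
u = [0.7, -0.4]             # pot deviations (random-effect draws)
e = [0.1, -0.2, 0.3, 0.0]   # plant-level residuals

# Design matrices as lists of rows:
X = [[1, 0], [1, 0], [1, 1], [1, 1]]   # obs 1-2 control, obs 3-4 treated
Z = [[1, 0], [1, 0], [0, 1], [0, 1]]   # obs 1-2 in pot 1, obs 3-4 in pot 2

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Observations in the same pot share that pot's deviation u:
Y = [xb + zu + ei for xb, zu, ei in zip(matvec(X, beta), matvec(Z, u), e)]
print([round(y, 1) for y in Y])
```

Swapping in a different set of pots only redraws `u`; the fixed part `X*beta` is what generalizes beyond this experiment.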
[blocks *vs* position in experiment] --- class: inverse,top # Additional problem : non-normality .Mleft-column[ <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-10-1.png" width="95%" /> ] -- .Mleft-column[ <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-11-1.png" width="95%" /> ] --- class: inverse,top # Additional problem : non-normality .Mleft-column[ <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-12-1.png" width="85%" height="25%" /> ] .Mleft-column[ **Not a normal distribution** - concentrated at low values - long tail to the right - many low values, but impossible to go below zero <br/> - approximations via the normal distribution and t-distributions to judge the difference and spread will no longer work <br/> - We need *generalized* models. - not easy... (choice of distributions, algorithms, interpretation) ] --- class: title-slide,middle,center # General Linear Mixed Models: a closer look --- class: main-slide,inverse,top # Linear Mixed Model: why mixed? - Basic linear model: fixed effects + simple residuals - Linear mixed model : - fixed effects + complex residuals - fixed effects + [random effects + residuals] .small[so: mixed models are models that include both fixed and random effects] -- LMMs apply a more complex representation of the residuals that is capable of dealing with the dependence structure. --- class: main-slide,inverse,top # Additional benefit of (grouping) random effects - 'efficient' representation of a factor - representing a larger, more general population from which the tested conditions are drawn - help in generalizing the findings of an experiment -- hence the alternative name for a random effect: .hl[*Variance Component*] --- class: main-slide,inverse,top # Why consider a factor as a random effect? - Are we interested in the specific levels of the factor?
- Are we interested in the factor as a specific source of variation that may have an impact on the observations? - Do we want to claim something generic outside the specific situation of the experiment? - Are we interested in improving our model without much interest in that factor? - How many levels of the factor do we have data for? - Is it truly random? --- class: inverse,middle,top # Some matrices... A simple situation with 2 treatments *A* and *B*, with 4 observations each .Mleft-column[ `$$y_1 = b_A + e_1 \\ y_2 = b_A + e_2 \\ y_3 = b_A + e_3 \\ \vdots \\ y_{7} = b_{B} + e_{7}\\ y_{8} = b_{B} + e_{8}$$` <br/> <br/> <br/> `$$Y = X\beta +\varepsilon$$` with `\(\varepsilon = e_1, e_2,..,e_8\)` ] -- .Mright-column[ every `\(e_i\)` is drawn from a normal distribution, all with mean zero. $$ N(0,\sigma_1), N(0,\sigma_2), ... , N(0,\sigma_8) $$ ] --- class: inverse,middle,top # Classic models: 'iid' residuals In linear models we assume 'residuals are identically and independently distributed'. .hl[Identically:] `\(\sigma_1 = \sigma_2 = \sigma_3 = ... = \sigma\)` so `\(N(0,\sigma_1), N(0,\sigma_2), ... , N(0,\sigma_8)\)` becomes `\(N(0,\sigma), N(0,\sigma), ... , N(0,\sigma)\)` <br/> .hl[Independently] means the draw of `\(e_2\)` does not depend on `\(e_1\)` and vice versa.
Hence: `$$(e_1,e_2) \sim N(0,\Sigma)$$` with `$$\Sigma = \begin{bmatrix} \sigma^2 & cov\\ cov & \sigma^2\\ \end{bmatrix} = \begin{bmatrix} \sigma^2 & 0\\ 0 & \sigma^2\\ \end{bmatrix}$$` since cov = 0 --- class: inverse,top # Remember bivariate normal distributions: `$$\Sigma_a = \begin{bmatrix} 1 & 0\\ 0 & 1\\ \end{bmatrix}\ \ \ \ \ \Sigma_b = \begin{bmatrix} 1 & 0\\ 0 & 2\\ \end{bmatrix} \ \ \ \ \ \Sigma_c = \begin{bmatrix} 1 & 0.6\\ 0.6 & 1\\ \end{bmatrix}$$` .center[ <!-- --> ] --- class: inverse,top # Closer look at the distribution of the residuals .HL[Suppose all 8 observations are independent] `\(\varepsilon\)` is distributed `\(N(0,\Sigma)\)` (= a normal distribution with mean 0 and variance-covariance matrix `\(\Sigma\)`) with: `$$\Sigma =\begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 &\sigma^2\\\end{bmatrix}$$` `\(\Sigma\)` only contains one `\(\sigma\)`, and only on the diagonal (covariance = 0), i.e. the simple linear model. --- class: inverse,top # Sigma in case of dependence .Mleft-column[ `$$\Sigma =\begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 &\sigma^2\\\end{bmatrix}$$` The simple linear case ] -- .Mright-column[ `$$\Sigma =\begin{bmatrix}\sigma^2_1 & \sigma^2_a & \sigma^2_b & \sigma^2_c & . & . & . & . \\ \sigma^2_a & \sigma^2_2 & \sigma^2_k & \sigma^2_l & . & . & . & . \\ \sigma^2_b & \sigma^2_k & \sigma^2_3 & \sigma^2_m & . & . & . & .\\ \sigma^2_c & \sigma^2_l & \sigma^2_m & \sigma^2_4 & . & . & . & .\\ . & . & . & .
& \sigma^2_5 & . & . & .\\ . & . & . & . & . & \sigma^2_6 & . & .\\ . & . & . & . & . & . & \sigma^2_7 & .\\ . & . & . & . & . & . & . &\sigma^2_8\\\end{bmatrix}$$` The ultimate but impossible case ] --- class: inverse,top # The practical situation: residual variance by group .Mleft-column[ `$$\Sigma =\begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 &\sigma^2\\\end{bmatrix}$$` <br/> `$$Y = X\beta + \varepsilon$$` ] .Mright-column[ `$$\Sigma =\begin{bmatrix} \color{yellow}{\sigma^2_1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \color{yellow}{\sigma^2_1} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \color{yellow}{\sigma^2_1} & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \color{yellow}{\sigma^2_1} & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \color{red}{\sigma^2_2} & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \color{red}{\sigma^2_2} & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \color{red}{\sigma^2_2} & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 &\color{red}{\sigma^2_2}\\\end{bmatrix}$$` <br/> `$$Y = X\beta + \varepsilon$$` with a different variance for *A* and *B* (heteroscedastic model) ] --- class: inverse,top # The practical situation: groups .Mleft-column[ `$$\Sigma =\begin{bmatrix} \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \sigma^2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 &\sigma^2\\\end{bmatrix}$$` <br/> `$$Y = X\beta + \varepsilon$$` ] .Mright-column[ `$$\Sigma =\begin{bmatrix} \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & 0 & 0 & 0 & 0 & 0 & 0 \\ \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \color{yellow}{\sigma^2} &
\color{red}{\sigma^2_1} & 0 & 0 & 0 & 0\\ 0 & 0 & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & 0 & 0\\ 0 & 0 & 0 & 0 & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1}\\ 0 & 0 & 0 & 0 & 0 & 0 & \color{red}{\sigma^2_1} &\color{yellow}{\sigma^2}\\\end{bmatrix}$$` <br/> split `\(\Sigma\)` up into a part that deals with the yellow entries and a part that deals with the red entries ("the random effect") `$$Y = X\beta + Zu + \varepsilon$$` ] --- class: inverse,top # The practical situation: functions of dependence Another possibility is to apply a function that results in covariance decreasing with "distance": `$$\Sigma =\begin{bmatrix} \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & \color{orange}{\sigma^2_1/2} & 0 & 0 & 0 & 0 & 0 \\ \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & \color{orange}{\sigma^2_1/2} & 0 & 0 & 0 & 0 \\ \color{orange}{\sigma^2_1/2} & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & \color{orange}{\sigma^2_1/2} & 0 & 0 & 0\\ 0 & \color{orange}{\sigma^2_1/2} & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & \color{orange}{\sigma^2_1/2} & 0 & 0\\ 0 & 0 & \color{orange}{\sigma^2_1/2} & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & \color{orange}{\sigma^2_1/2} & 0\\ 0 & 0 & 0 & \color{orange}{\sigma^2_1/2} & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1} & \color{orange}{\sigma^2_1/2}\\ 0 & 0 & 0 & 0 & \color{orange}{\sigma^2_1/2} & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2} & \color{red}{\sigma^2_1}\\ 0 & 0 & 0 & 0 & 0 & \color{orange}{\sigma^2_1/2} & \color{red}{\sigma^2_1} & \color{yellow}{\sigma^2}\\\end{bmatrix}$$` --- class: inverse,top # The practical situation: scaling a fixed (similarity) matrix `$$\Sigma =\sigma^2_s
\begin{bmatrix} 1 & d_a & d_b & d_c & . & . & . & . \\ d_a & 1 & d_g & d_h & . & . & . & . \\ d_b & d_g & 1 & d_k & . & . & . & .\\ d_c & d_h & d_k & 1 & . & . & . & .\\ . & . & . & . & 1 & . & . & .\\ . & . & . & . & . & 1 & . & .\\ . & . & . & . & . & . & 1 & .\\ . & . & . & . & . & . & . &1\\\end{bmatrix}$$` --- class: main-slide,inverse,top # Longitudinal data and time series A specific but very common case of dependency Many different ways to handle it, depending on the objective and setup of the experiment or study, and on prior knowledge - repeated observations just to increase accuracy *vs* interest in the change itself (e.g. growth curves) - if the latter: just the general shape and differences in shape, or specific coefficients of a known function - frequency and scale --- class: main-slide,inverse,top # Multivariate observations Yet another case of dependency As with longitudinal observations, but now multiple parameters measured on the same objects Relying on the correlation between the multiple parameters to improve the estimation of the main one --- class: main-slide, inverse, top # Technical implications No more ordinary least squares, but algorithms that have to converge to stable solutions - maximum likelihood (ML / REML) - Markov chains --- # In R Frequentist: - Maximum likelihood based (broad *lme4* and *nlme*, or case-specific *sommer*, *gamm4*, ...) Bayesian: - Monte Carlo sampling from generating distributions (*brms*) - Integration of generating distributions (*inla*) Hence: the output will always differ a little. Let's dive into it: https://hw-appliedlinmixmodinr.netlify.app// --- # What does the document cover? - a slow build-up of how to fit a linear (mixed) model in R - how to interpret the output - exercises - repetition of what has been said already - a bit more background on the models - some side tracks - intro to the Bayesian style of fitting mixed models - longitudinal models - models using relationship covariances We will not cover everything in detail. It is a reference.
https://hw-appliedlinmixmodinr.netlify.app// --- class: title-slide,middle,center # Generalized Linear Mixed models --- class: inverse,top # Remember the additional problem : non-normality .Mleft-column[ <img src="data:image/png;base64,#Slides_GLMM_files/figure-html/unnamed-chunk-14-1.png" width="85%" height="25%" /> ] .Mleft-column[ **Not a normal distribution** - concentrated at low values - long tail to the right - many low values, but impossible to go below zero <br/> - approximations via the normal distribution and t-distributions to judge the difference and spread will no longer work <br/> - We need *generalized* models. - not easy... (choice of distributions, algorithms, interpretation) ] --- class: main-slide,inverse,top # .HL[Generalized] Linear Mixed models The *generalization* part deals with two issues: .blockquote[ 1. the response cannot possibly change proportionally with the predictor(s) 2. the response is not normally distributed ] .small[while remaining within the realm of linear models] --- class: main-slide, inverse, top # How?
- More complex algorithms that allow a non-normal error distribution - Modelling happens in a transformed space --- class: main-slide, inverse, top # But: additional worries - Choice of error distribution and link, on top of the choice of fixed and random effects - Slow algorithms, and more frequent issues with non-convergence --- class: main-slide, inverse,top # Main difficulty : choice of error distribution and link Usually: - choice between a few classic solutions - go for a ***family*** : a combination of an error distribution with its default link --- class: main-slide, inverse,top # Classic solutions - .hl[count data]: Poisson or negative binomial distribution with *log* link - .hl[yes/no - 0/1 data]: binomial distribution with *logit* link - .hl[skewed continuous data, strictly positive]: gamma distribution with *log* link - .hl[observed continuous proportions]: beta distribution with *logit* link - .hl[data with too many zeros]: zero-inflated versions of the above - .hl[other (censored, zero-inflated, ratios of 2 observed,...)]: seek help --- class: main-slide,inverse # Consequences for design/execution of the experiments - The closer to a normal distribution, the easier (more power) - The more extreme, the more difficult it gets to fit a powerful model - Ultimately: *binomial* models (yes/no; diseased/healthy) - The less powerful, the more observations are needed to conclude something meaningful - Yet an opportunity: shifting from few precise observations to many rough observations --- class: main-slide, inverse,top # How practically? - indicating the family - more complex fitting algorithms (choice, computing time) - less straightforward checking of whether assumptions are fulfilled - more complex interpretation - more complex calculation of contrasts --- class: # How practically in R?
https://hw-appliedgeneralizedlinmixmodinr.netlify.app/ Content of the document: - a bit of background - practical examples with lme4, brms and inla - `lme4::glmer()` - `brms::brm()` - `INLA::inla()` - how to interpret the output - examples of families and links We will not cover everything in detail. It is a reference.
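As a final illustration of what a link function buys you, here is a minimal sketch (in Python, with invented coefficients rather than output from any fitted model): the linear predictor lives on the logit scale, and the inverse link maps it back to a probability that can never leave (0, 1).

```python
import math

def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Invented coefficients on the logit (linear) scale:
intercept = -1.2          # baseline log-odds of e.g. 'diseased'
treatment_effect = 0.9    # additive on the logit scale ...

p_control = inv_logit(intercept)
p_treated = inv_logit(intercept + treatment_effect)

# ... but NOT additive on the probability scale:
print(f"control: {p_control:.3f}, treated: {p_treated:.3f}")

# On the logit scale the effect is additive; on the natural scale it
# multiplies the *odds* by exp(treatment_effect):
odds_ratio = (p_treated / (1 - p_treated)) / (p_control / (1 - p_control))
print(f"odds ratio = {odds_ratio:.3f}")
```

This is why interpreting GLMM coefficients takes extra care: effects are additive only in the transformed space, and back-transformation turns them into ratios.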