Analysing Longitudinal Data: Multilevel Growth Models (II)

This is the third post in the longitudinal data series. Previously, we introduced what longitudinal data is, how we can convert between long and wide format data-sets, and a basic multilevel model for analysis. Apparently, the basic multilevel model is not quite enough to analyse our imaginary randomised controlled trial (RCT) data-set. This post is going to continue our analysis and introduce a proper way to handle treatment effects in multilevel models.

Generate a longitudinal dataset and convert it into long format

As usual, we start by generating our longitudinal data-set and convert it into long format.

library(MASS)

dat.tx.a <- mvrnorm(n=250, mu=c(30, 20, 28), 
                    Sigma=matrix(c(25.0, 17.5, 12.3, 
                                   17.5, 25.0, 17.5, 
                                   12.3, 17.5, 25.0), nrow=3, byrow=TRUE))

dat.tx.b <- mvrnorm(n=250, mu=c(30, 20, 22), 
                    Sigma=matrix(c(25.0, 17.5, 12.3, 
                                   17.5, 25.0, 17.5, 
                                   12.3, 17.5, 25.0), nrow=3, byrow=TRUE))

dat <- data.frame(rbind(dat.tx.a, dat.tx.b))
names(dat) <- c(‘measure.1’, ‘measure.2’, ‘measure.3’)

dat <- data.frame(subject.id = factor(1:500), tx = rep(c(‘A’, ‘B’), each = 250), dat)

rm(dat.tx.a, dat.tx.b)

dat <- reshape(dat, varying = c(‘measure.1’, ‘measure.2’, ‘measure.3’), 
               idvar = ‘subject.id’, direction = ‘long’)

A multilevel growth model considering treatment effect

This is a RCT data-set, implying that there should be some potential differences between the two treatment groups. Last time we ignored this heterogeneity and specified only a common time effect across the two groups. Intuitively, we could add the treatment variable as an fixed effect in the model to capture the between group difference.

library(lmerTest)
m1 <- lmer(measure ~ time + tx + (1 | subject.id), data=dat)

We still used the package lmerTest because it allows the test of fixed effect using approximate degrees of freedom. The model formula is the same as the one we used last time except we added the treatment variable tx as an independent variable.

summary(m1)

Linear mixed model fit by REML t-tests use Satterthwaite approximations to degrees of
  freedom [lmerMod]
Formula: measure ~ time + tx + (1 | subject.id)
   Data: dat

REML criterion at convergence: 9788.5

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.68035 -0.66131  0.07007  0.66151  2.87378 

Random effects:
 Groups     Name        Variance Std.Dev.
 subject.id (Intercept)  9.918   3.149   
 Residual               32.114   5.667   
Number of obs: 1500, groups:  subject.id, 500

Fixed effects:
             Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)   30.5932     0.4593 1474.4000  66.610  < 2e-16 ***
time          -2.2201     0.1792  999.0000 -12.389  < 2e-16 ***
txB           -2.3377     0.4062  498.0000  -5.755 1.51e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
     (Intr) time  
time -0.780       
txB  -0.442  0.000

It is clear from the model summary that both time and tx effects are statistically significant and participants in treatment B had a depression score 2.34 points lower than those in treatment A. Should we conclude that treatment B is more effective? Definitely no. The model m1 is not a proper way to compare group difference in the context of longitudinal data. The ‘treatment effect’ in m1 is in fact the the average treatment effect along time. In other words, the coefficient -2.34 is the difference in depression score between treatments A and B averaged in time 1, 2, and 3. This is an RCT, we therefore expect to see no (or very little) difference at time 1 (pre-intervention) between the participants in the two treatments. We are actually not interested in knowing the average difference between the two treatments but how the trajectories differ at time 2 and 3.

The concept may be a bit tricky for first-timers but the execution is fairly simple: we just add the interaction term between time and tx.

m2 <- lmer(measure ~ time * tx + (1 | subject.id), data=dat)

In R, pure interaction term is indicated by the operator : so we could specify the model by time + tx + time:tx. But we have a short hand instead: time * tx. The * operator asks R to include both main and treatment effects so we could just use the * operator and skip to indicate all effects separately. Please note that the * operator also works for higher dimension interactions. For instance, A * B * C asks R to include all the three main effects, the three two-way interaction effects, and the one three-way interaction effects.

summary(m2)

Linear mixed model fit by REML t-tests use Satterthwaite approximations to degrees of
  freedom [lmerMod]
Formula: measure ~ time * tx + (1 | subject.id)
   Data: dat

REML criterion at convergence: 9721.9

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.71431 -0.65906  0.08873  0.65358  2.63778 

Random effects:
 Groups     Name        Variance Std.Dev.
 subject.id (Intercept) 10.60    3.256   
 Residual               30.06    5.483   
Number of obs: 1500, groups:  subject.id, 500

Fixed effects:
             Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)   27.7106     0.5683 1456.6000  48.757  < 2e-16 ***
time          -0.7788     0.2452  998.0000  -3.176  0.00154 ** 
txB            3.4275     0.8037 1456.6000   4.264 2.13e-05 ***
time:txB      -2.8826     0.3468  998.0000  -8.312 4.44e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
         (Intr) time   txB   
time     -0.863              
txB      -0.707  0.610       
time:txB  0.610 -0.707 -0.863

So we now have three fixed effects in the model m2. In this model, the time effect is the time effect for treatment A (as it is conditional on both treatment and treatment-time effects) and the txB effect is the treatment effect at time 0 (an imaginary time point in this data-set). Both provide little information about the trajectory difference between the two treatments. The effect of interest is the time:txB interaction. This term is a little bit tricky to interpret: the coefficient value indicates the difference between the two treatments per unit time increment, conditional on average time and treatment effect. In words that human can understand, this indicates how the two treatment groups diverge with the progression of time. For each unit time increase (e.g. from time 1 to time 2), participants in treatment B could expect a 2.88 lower depression score than those in treatment A after accounting for the average treatment effect difference. So at time 2, participants in treatment B could expect a -2.88 x 2 + 3.43 = -2.33 score difference and at time 3, participants in treatment B could expect a -2.88 x 3 + 3.43 = -5.21 difference. We should always consider both treatment main effect and treatment-time interaction effect when comparing treatment differences at a given time. But why? Try to see what would be the treatment difference at time 1 if you omit the treatment main effect.

To choose the best model

I can tell you model m2 is better than m1 and m0 (the model in the last post) because the data-set is generated by me. But in real life, we could have a hard time deciding which model is better. One way to choose is to see the statistical significance of the additional variables using the summary() function as we did above. Other common approaches are Analysis of Variance (ANOVA) table (only for nested models) and model fit indices.

m0 <- lmer(measure ~ time + (1 | subject.id), data=dat)
anova(m0, m1, m2)

Data: dat
Models:
object: measure ~ time + (1 | subject.id)
..1: measure ~ time + tx + (1 | subject.id)
..2: measure ~ time * tx + (1 | subject.id)
       Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
object  4 9825.8 9847.0 -4908.9   9817.8                             
..1     5 9795.6 9822.2 -4892.8   9785.6 32.197      1  1.393e-08 ***
..2     6 9730.6 9762.5 -4859.3   9718.6 66.944      1  2.794e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Here we include both m0, m1, and m2 in the anova() function because m0 is nested within m1, which in turn nested within m2. A model is nested within another if it is a subset of another. In other words, a model with independent variables A and B is nested within another model with independent variables A, B, and C.

This ANOVA table should be quite familiar to you if you have experience with ANOVA analysis. The Df column indicates the degrees of freedom associated with the model, which simply means the number of parameters estimated in this case. For example, the 4 degrees of freedom in m0 indicates there were 4 parameters estimated in the model (1. Coefficient of the fixed intercept effect, 2. Coefficient of time effect, 3. Variance of the random intercept, 4. Variance of the residual).

The AIC and BIC are two commonly used fit indices. Both AIC and BIC consider two factors: how well the model fit the data and how simple the model is. The difference between AIC and BIC is just the penalisation on model complexity (BIC penalise complex model more). I personally find AIC easier to interpret because it asymptotically (when n approaches infinity) converge to the leave-one-out cross-validation (LOOCV) prediction performance. We would cover the concept of cross-validation and prediction performance later so just leave it if you do not understand what they are.

The logLik column is the logarithm of likelihood in estimating the models. Basically it is an indicator of how well the model fit the data. Deviance is simply \(-2 \times logLik\). Why do we need that? Because the difference in deviance between two nested model follows a Chi-squared distribution under null hypothesis (no difference in deviance between two nested model) which we could use as a test for model fit difference. The difference in deviance is listed under the Chisq column. The associated degrees of freedom of the Chi-squared statistics are listed under Chi Df (which is simply the number of additional parameter). The last column should be quite clear now: it indicates the p-values for testing the difference in model fit between two nested models. Nonetheless, please be cautious about the ANOVA tests results because type I error would inflate furiously if there are many candidate models.

As m2 is strictly better than the other two candidate models in AIC, BIC, and the ANOVA tests, we could now conclude it to be our best model so far. Of course there are still some ways to improve further. We will cover the possible improvements and the plotting of the predicted outcomes later. Stay tuned.

Linear Mixed ModelRepeated Measures