This is the third post in the longitudinal data series. Previously, we introduced what longitudinal data is, how we can convert between long and wide format data-sets, and a basic multilevel model for analysis. Apparently, the basic multilevel model is not quite enough to analyse our imaginary randomised controlled trial (RCT) data-set. This post is going to continue our analysis and introduce a proper way to handle treatment effects in multilevel models.
As usual, we start by generating our longitudinal data-set and convert it into long format.
library(MASS) dat.tx.a <- mvrnorm(n=250, mu=c(30, 20, 28), Sigma=matrix(c(25.0, 17.5, 12.3, 17.5, 25.0, 17.5, 12.3, 17.5, 25.0), nrow=3, byrow=TRUE)) dat.tx.b <- mvrnorm(n=250, mu=c(30, 20, 22), Sigma=matrix(c(25.0, 17.5, 12.3, 17.5, 25.0, 17.5, 12.3, 17.5, 25.0), nrow=3, byrow=TRUE)) dat <- data.frame(rbind(dat.tx.a, dat.tx.b)) names(dat) <- c(‘measure.1’, ‘measure.2’, ‘measure.3’) dat <- data.frame(subject.id = factor(1:500), tx = rep(c(‘A’, ‘B’), each = 250), dat) rm(dat.tx.a, dat.tx.b) dat <- reshape(dat, varying = c(‘measure.1’, ‘measure.2’, ‘measure.3’), idvar = ‘subject.id’, direction = ‘long’)
This is a RCT data-set, implying that there should be some potential differences between the two treatment groups. Last time we ignored this heterogeneity and specified only a common time effect across the two groups. Intuitively, we could add the treatment variable as an fixed effect in the model to capture the between group difference.
library(lmerTest) m1 <- lmer(measure ~ time + tx + (1 | subject.id), data=dat)
We still used the package
lmerTest because it allows the test of fixed effect using approximate degrees of freedom. The model formula is the same as the one we used last time except we added the treatment variable
tx as an independent variable.
summary(m1) Linear mixed model fit by REML t-tests use Satterthwaite approximations to degrees of freedom [lmerMod] Formula: measure ~ time + tx + (1 | subject.id) Data: dat REML criterion at convergence: 9788.5 Scaled residuals: Min 1Q Median 3Q Max -2.68035 -0.66131 0.07007 0.66151 2.87378 Random effects: Groups Name Variance Std.Dev. subject.id (Intercept) 9.918 3.149 Residual 32.114 5.667 Number of obs: 1500, groups: subject.id, 500 Fixed effects: Estimate Std. Error df t value Pr(>|t|) (Intercept) 30.5932 0.4593 1474.4000 66.610 < 2e-16 *** time -2.2201 0.1792 999.0000 -12.389 < 2e-16 *** txB -2.3377 0.4062 498.0000 -5.755 1.51e-08 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Correlation of Fixed Effects: (Intr) time time -0.780 txB -0.442 0.000
It is clear from the model summary that both
tx effects are statistically significant and participants in treatment B had a depression score 2.34 points lower than those in treatment A. Should we conclude that treatment B is more effective? Definitely no. The model
m1 is not a proper way to compare group difference in the context of longitudinal data. The ‘treatment effect’ in
m1 is in fact the the average treatment effect along time. In other words, the coefficient -2.34 is the difference in depression score between treatments A and B averaged in time 1, 2, and 3. This is an RCT, we therefore expect to see no (or very little) difference at time 1 (pre-intervention) between the participants in the two treatments. We are actually not interested in knowing the average difference between the two treatments but how the trajectories differ at time 2 and 3.
The concept may be a bit tricky for first-timers but the execution is fairly simple: we just add the interaction term between
m2 <- lmer(measure ~ time * tx + (1 | subject.id), data=dat)
In R, pure interaction term is indicated by the operator
: so we could specify the model by
time + tx + time:tx. But we have a short hand instead:
time * tx. The
* operator asks R to include both main and treatment effects so we could just use the
* operator and skip to indicate all effects separately. Please note that the
* operator also works for higher dimension interactions. For instance,
A * B * C asks R to include all the three main effects, the three two-way interaction effects, and the one three-way interaction effects.
summary(m2) Linear mixed model fit by REML t-tests use Satterthwaite approximations to degrees of freedom [lmerMod] Formula: measure ~ time * tx + (1 | subject.id) Data: dat REML criterion at convergence: 9721.9 Scaled residuals: Min 1Q Median 3Q Max -2.71431 -0.65906 0.08873 0.65358 2.63778 Random effects: Groups Name Variance Std.Dev. subject.id (Intercept) 10.60 3.256 Residual 30.06 5.483 Number of obs: 1500, groups: subject.id, 500 Fixed effects: Estimate Std. Error df t value Pr(>|t|) (Intercept) 27.7106 0.5683 1456.6000 48.757 < 2e-16 *** time -0.7788 0.2452 998.0000 -3.176 0.00154 ** txB 3.4275 0.8037 1456.6000 4.264 2.13e-05 *** time:txB -2.8826 0.3468 998.0000 -8.312 4.44e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Correlation of Fixed Effects: (Intr) time txB time -0.863 txB -0.707 0.610 time:txB 0.610 -0.707 -0.863
So we now have three fixed effects in the model
m2. In this model, the
time effect is the time effect for treatment A (as it is conditional on both treatment and treatment-time effects) and the
txB effect is the treatment effect at time 0 (an imaginary time point in this data-set). Both provide little information about the trajectory difference between the two treatments. The effect of interest is the
time:txB interaction. This term is a little bit tricky to interpret: the coefficient value indicates the difference between the two treatments per unit time increment, conditional on average time and treatment effect. In words that human can understand, this indicates how the two treatment groups diverge with the progression of time. For each unit time increase (e.g. from time 1 to time 2), participants in treatment B could expect a 2.88 lower depression score than those in treatment A after accounting for the average treatment effect difference. So at time 2, participants in treatment B could expect a -2.88 x 2 + 3.43 = -2.33 score difference and at time 3, participants in treatment B could expect a -2.88 x 3 + 3.43 = -5.21 difference. We should always consider both treatment main effect and treatment-time interaction effect when comparing treatment differences at a given time. But why? Try to see what would be the treatment difference at time 1 if you omit the treatment main effect.
I can tell you model
m2 is better than
m0 (the model in the last post) because the data-set is generated by me. But in real life, we could have a hard time deciding which model is better. One way to choose is to see the statistical significance of the additional variables using the
summary() function as we did above. Other common approaches are Analysis of Variance (ANOVA) table (only for nested models) and model fit indices.
m0 <- lmer(measure ~ time + (1 | subject.id), data=dat) anova(m0, m1, m2) Data: dat Models: object: measure ~ time + (1 | subject.id) ..1: measure ~ time + tx + (1 | subject.id) ..2: measure ~ time * tx + (1 | subject.id) Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq) object 4 9825.8 9847.0 -4908.9 9817.8 ..1 5 9795.6 9822.2 -4892.8 9785.6 32.197 1 1.393e-08 *** ..2 6 9730.6 9762.5 -4859.3 9718.6 66.944 1 2.794e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here we include both
m2 in the
anova() function because
m0 is nested within
m1, which in turn nested within
m2. A model is nested within another if it is a subset of another. In other words, a model with independent variables A and B is nested within another model with independent variables A, B, and C.
This ANOVA table should be quite familiar to you if you have experience with ANOVA analysis. The
Df column indicates the degrees of freedom associated with the model, which simply means the number of parameters estimated in this case. For example, the 4 degrees of freedom in
m0 indicates there were 4 parameters estimated in the model (1. Coefficient of the fixed intercept effect, 2. Coefficient of time effect, 3. Variance of the random intercept, 4. Variance of the residual).
The AIC and BIC are two commonly used fit indices. Both AIC and BIC consider two factors: how well the model fit the data and how simple the model is. The difference between AIC and BIC is just the penalisation on model complexity (BIC penalise complex model more). I personally find AIC easier to interpret because it asymptotically (when n approaches infinity) converge to the leave-one-out cross-validation (LOOCV) prediction performance. We would cover the concept of cross-validation and prediction performance later so just leave it if you do not understand what they are.
logLik column is the logarithm of likelihood in estimating the models. Basically it is an indicator of how well the model fit the data.
Deviance is simply \(-2 \times logLik\). Why do we need that? Because the difference in deviance between two nested model follows a Chi-squared distribution under null hypothesis (no difference in deviance between two nested model) which we could use as a test for model fit difference. The difference in deviance is listed under the
Chisq column. The associated degrees of freedom of the Chi-squared statistics are listed under
Chi Df (which is simply the number of additional parameter). The last column should be quite clear now: it indicates the p-values for testing the difference in model fit between two nested models. Nonetheless, please be cautious about the ANOVA tests results because type I error would inflate furiously if there are many candidate models.
m2 is strictly better than the other two candidate models in AIC, BIC, and the ANOVA tests, we could now conclude it to be our best model so far. Of course there are still some ways to improve further. We will cover the possible improvements and the plotting of the predicted outcomes later. Stay tuned.