Thursday, July 10, 2014

Within-Subject Error Bars

Psychologists and neuroscientists often plot error bars to show the variability in their obtained results. Very often, the error bar shows the standard error of the mean, or SEM. A quick rule of thumb when looking at graphs is to check whether the error bars between conditions overlap - if they do, you can conclude that the means are NOT significantly different at p < 0.05. Note that the converse is not true, that is, non-overlapping error bars ≠ significant difference. The means have to be approximately 3 SEM apart (assuming the two means have similar SEMs).
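The arithmetic behind that rule of thumb can be sketched in a few lines. This is a hypothetical example with two independent means and equal SEMs (all numbers here are made up for illustration): bars that just touch correspond to a z of about 1.41, well short of the 1.96 cutoff, whereas means ~3 SEM apart clear it.

```python
import math

# Hypothetical example: two independent group means with equal SEMs.
sem = 1.0

# Error bars (mean +/- 1 SEM) just touch when the means are 2 SEM apart.
# The test statistic for a difference of two means divides by the SE of
# the difference: sqrt(sem1**2 + sem2**2) = sem * sqrt(2) for equal SEMs.
diff_touching = 2 * sem
z_touching = diff_touching / (sem * math.sqrt(2))   # ~1.41, not significant

# Means ~3 SEM apart are needed before z clears the 1.96 cutoff.
diff_3sem = 3 * sem
z_3sem = diff_3sem / (sem * math.sqrt(2))           # ~2.12, p < .05

print(round(z_touching, 2), round(z_3sem, 2))
```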

This rule, however, only applies to between-subject designs. What about within-subject (i.e. repeated measures) designs? Conventional error bars would be too large, as they reflect both between-subject and within-subject variance (the variability in the paired differences between conditions within the same subject). In my opinion, plotting between-subject SEMs for within-subject tests (e.g. paired t-tests) is meaningless, because conflating between-subject and within-subject variance makes the error bars uninterpretable.

A solution was first proposed by Loftus and Masson (1994). Briefly, it involves running a repeated-measures ANOVA and taking the mean squared error (MSE) of the error term (i.e. the denominator of the appropriate F-test). SEMwithin can then be calculated by taking the square root of MSE divided by N. This procedure works because a repeated-measures ANOVA removes between-subject variance first, so the MSE captures only within-subject variability.
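Here is a minimal sketch of that computation on a made-up dataset (four subjects, three conditions), partitioning the sums of squares by hand rather than calling an ANOVA routine, so the error term is explicit:

```python
import numpy as np

# Toy within-subject data (made up for illustration): rows = subjects,
# columns = conditions.
data = np.array([
    [10.0, 12.0, 13.0],
    [ 8.0, 10.0, 12.0],
    [11.0, 14.0, 14.0],
    [ 9.0, 11.0, 13.0],
])
n_subj, n_cond = data.shape

grand_mean = data.mean()
subj_means = data.mean(axis=1, keepdims=True)
cond_means = data.mean(axis=0, keepdims=True)

# Partition the sums of squares as in a one-way repeated-measures ANOVA.
ss_total = ((data - grand_mean) ** 2).sum()
ss_subj  = n_cond * ((subj_means - grand_mean) ** 2).sum()
ss_cond  = n_subj * ((cond_means - grand_mean) ** 2).sum()
ss_error = ss_total - ss_subj - ss_cond   # subject-by-condition residual

df_error = (n_subj - 1) * (n_cond - 1)
mse = ss_error / df_error                 # denominator of the F-test

# Loftus & Masson within-subject SEM: one value shared by all conditions.
sem_within = np.sqrt(mse / n_subj)
print(round(float(sem_within), 3))        # 0.25 for this toy dataset
```

Note that because the between-subject sum of squares is removed before the MSE is formed, subjects with uniformly high or low scores do not inflate the error bar.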

If all you want to know is how to plot within-subject error bars, you can stop reading here. If you would like a more nuanced discussion, read on.

**************************

The Loftus and Masson method, however, yields a single SEM value for all conditions. The implicit assumption is that the variance of the pairwise differences between conditions is constant (i.e. the sphericity assumption). Of course, this assumption can be tested and corrected for (e.g., Greenhouse-Geisser or Huynh-Feldt). But wouldn't it be nice if we could just tell from the error bars (i.e. the same way we can tell if the homogeneity of variance is violated from the error bars of a between-subject design)? This prompted Cousineau (2005) to propose a different solution, which involves "normalizing" the data from all subjects such that the between-subject variance is removed. This is done by subtracting, for each condition and subject (i.e. each "cell"), the subject's average across conditions, and then adding the grand average of all cells. Hence all subjects will have the same average, but the within-subject effects are preserved. The SEM can then be calculated as per normal, and each condition will have its own error bar. Cousineau, unfortunately, did not take into account the fact that normalization induces correlations between the cells, so the resulting variances are biased downward and the error bars come out a little too small compared to those calculated by the Loftus and Masson method. This discrepancy was identified by Morey (2008), who proposed a simple solution: multiply the Cousineau variance (note: NOT the SEM) by M/(M-1), where M = number of conditions, before calculating the error bars**.
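The Cousineau normalization plus the Morey correction is only a few lines of code. A minimal sketch, reusing the same made-up four-subject, three-condition dataset as above:

```python
import numpy as np

# Toy within-subject data (made up): rows = subjects, columns = conditions.
data = np.array([
    [10.0, 12.0, 13.0],
    [ 8.0, 10.0, 12.0],
    [11.0, 14.0, 14.0],
    [ 9.0, 11.0, 13.0],
])
n_subj, n_cond = data.shape

# Cousineau (2005) normalization: subtract each subject's own mean across
# conditions, then add back the grand mean so all subjects share the same
# average while within-subject effects are preserved.
normalized = data - data.mean(axis=1, keepdims=True) + data.mean()

# Per-condition sample variance of the normalized scores, then Morey's
# correction: multiply the VARIANCE (not the SEM) by M / (M - 1).
var_cousineau = normalized.var(axis=0, ddof=1)
var_morey = var_cousineau * n_cond / (n_cond - 1)

# One SEM per condition, unlike the single Loftus & Masson value.
sem_per_condition = np.sqrt(var_morey / n_subj)
print(np.round(sem_per_condition, 3))
```

For this toy dataset the three corrected SEMs come out near 0.10, 0.29, and 0.31, bracketing the single Loftus & Masson value of 0.25 computed earlier.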

Ahh... but the story is not over yet. I just found out that Franz and Loftus (2012) published a study challenging the normalization method. They argue that the normalization method does not actually allow for checking the sphericity assumption: even though the method produces different error bars for each condition, any differences between those error bars are not interpretable. Specifically, sphericity concerns all the pairwise differences between conditions, and the only way to check it visually is to plot all the pairwise differences and see whether the variances of those pairwise differences are similar. Franz and Loftus argue that these pairwise differences should be plotted next to the plot of means for visual inspection.
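Computing those pairwise differences is straightforward; what Franz and Loftus are asking for is that their means and SEMs be plotted alongside the condition means. A sketch on the same made-up dataset (the plotting itself is left out; this just produces the values one would plot):

```python
import itertools
import numpy as np

# Toy within-subject data (made up): rows = subjects, columns = conditions.
data = np.array([
    [10.0, 12.0, 13.0],
    [ 8.0, 10.0, 12.0],
    [11.0, 14.0, 14.0],
    [ 9.0, 11.0, 13.0],
])
n_subj, n_cond = data.shape

# For every pair of conditions, take the per-subject difference scores.
# Roughly similar variances across pairs is what sphericity requires.
for i, j in itertools.combinations(range(n_cond), 2):
    diff = data[:, j] - data[:, i]                 # one score per subject
    sem_diff = diff.std(ddof=1) / np.sqrt(n_subj)
    print(f"cond {j} - cond {i}: mean = {diff.mean():.2f}, SEM = {sem_diff:.3f}")
```

With 3 conditions there are only 3 pairs; the number of pairs grows as M(M-1)/2, which is the practical objection raised below.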

My thoughts? I have to say, I still recommend Loftus and Masson + an appropriate test for sphericity. It isn't THAT difficult to compute - an actual charge by detractors of the method. I mean, it's not computationally intractable or anything - just run the ANOVA >.<. That said, I wouldn't write off a paper which uses Morey (2008). For the purpose of "eye-balling" significance... those error bars do the trick as well. And the condition-specific variance reflected in those error bars is still information (though I need to think a little harder about what it might be good for). At the end of the day, I see error bars as visual aids*. Authors ought to be clear and honest about how the error bars were obtained. Readers, however, should always defer to the results of the appropriate statistical test when evaluating findings.

*On that note, I don't really think plotting the variability of the pairwise differences is all that useful. If there are, say, 6 conditions... you'll need to plot 15 pairwise differences... and that's not really all that helpful for visualization, which defeats the purpose of graphing in the first place.

** An earlier version of this post stated the correction factor incorrectly. Thanks to Eric Garr for pointing out the mistake :)