# Chapter 20 Analysis of Variance

\(t\)-tests are the standard approach for comparing means between two groups. When you want to compare means between more than two groups a technique called “Analysis of Variance” (ANOVA) is used.

## 20.1 Hypotheses for ANOVA

When using ANOVA to compare means, the null and alternative hypotheses are:

- \(H_0\): The means of all the groups are equal
- \(H_A\): At least one of the means is different from the others

## 20.2 ANOVA, assumptions

ANOVA assumes:

- The measurements in every group represent a random sample from the corresponding population
- The varaible of interest is normally distributed
- The variance is approximately the same in all the groups

## 20.3 ANOVA, key idea

The key idea behind ANOVA is that:

- If the observations in each group are drawn from populations with equal means (and variances) then the variation
*between*group means should be similar to the inter-individual variation*within groups*.

## 20.4 Partioning of sum of squares

Another way to think about ANOVA is as a “partitioning of variance”. The total variance among all the individuals across groups can be decomposed into:

variance of the group means around the “grand mean”;

variance of individuals around the group means.

However, rather than using variance we use sums of square deviations around the respectives means (usually shortened to “sums of squares”).

This decomposition is represented visually in the figure below:

## 20.5 Mathematical partitioning of sums of squares

Variable \(X\) with a total sample of \(N\) observations, partitioned ito \(k\) groups. The sample size of the g-th group is \(n_g\), and thus \(N = \sum_{g=1}^{k} n_g\). Let \(\overline{X}\) indicate the grand mean of \(X\) and \(\overline{X}_g\) indicate the mean of \(X\) in the g-th group.

**Total sums of squares**

We call the sum of the squared deviations around the grand mean the “total sum of sqaures” (\(SS_\text{total}\)).

\[ SS_\text{total} = \sum_{i=1}^N (x_i-\overline{X})^2 \]

- The total degrees of freedom is: \(df_\text{total} = N - 1\)

**Group sum of squares and group mean square deviation**

The sum of squared deviations of the group means around the grand mean is called the “group sum of squares”: \[ SS_\text{group} = \sum_{g=1}^kn_g(\overline{X}_g - \overline{X})^2 \]

The degrees of freedom associated with the group sum of squares is: \(df_\text{group} = k - 1\)

Define the “group mean squared deviation” as: \[ MS_\text{group} = \frac{SS_\text{group}}{k-1} \]

**Error sum of squares and error mean square deviation**

The sum of squared deviations of the individual observations about their respective group means is called the “error sum of squares”: \[ SS_\text{error} = \sum_{g=1}^k\sum_{i=1}^{n_g} (x_{i,g} - \overline{X}_g)^2 \]

The degrees of freedom associated with the error sum of squares is: \(df_\text{error} = N - k\)

Define the “error mean squared deviation” as: \[ MS_\text{error} = \frac{SS_\text{error}}{N-k} \]

**Variation explained: \(R^2\)**

We can summarize the contribution of the group differences to the total variation in the data, using the \(R^2\) **value**.

To do this we note that the total sum of squares is the sum of the group and error sum of squares: \[ SS_\text{total} = SS_\text{group} + SS_\text{error} \]

The \(R^2\)-value is thus defined as: \[ R^2 = \frac{SS_\text{groups}}{SS_\text{total}} \]

## 20.6 ANOVA test statistic and sampling distribution

**F-statistic**

The test statistic used in ANOVA is designated \(F\), and is defined as follows:

\[ F = \frac{\text{group mean square}}{\text{error mean square}} = \frac{\text{MS}_\text{group}}{\text{MS}_\text{error}} \]

Under the null hypothesis, the between group and within group variances are similar and thus the \(F\) statistic should be approximately 1.

Large values of the \(F\)-statistic means that the between group variance exceeds the within group variance, indicating that at least one of the means is different from the others

**The F-distribution**

The sampling distribution of the \(F\)-statistic is called the \(F\)-distribution.

The \(F\)-distribution depends on two parameters:

the degrees of freedom associated with the group sum of squares, \(df_\text{group} = k - 1\);

the degrees of freedom associated with the error sum of squares, \(df_\text{error} = N - k\);

We designate a particular \(F\)-distribution as \(F_{k-1,N-k}\). We will illustrate what an F-distribution looks like for particular parameters below.

## 20.7 ANOVA tables

The results of an analysis of variance test are often presented in the form of a table organized as follows:

Source | \(SS\) | \(df\) | \(MS\) | \(F\) |
---|---|---|---|---|

Group | \(SS_\text{group}\) | \(k-1\) | \(MS_\text{group}\) | \(MS_\text{group}/MS_\text{error}\) |

Error | \(SS_\text{error}\) | \(N-k\) | \(MS_\text{error}\) | |

Total | \(SS_\text{total}\) | \(N-1\) |

## 20.8 The `aov()`

function

As you would suspect, there is a built in R function to carry out ANOVA. This function is designated `aov()`

. `aov`

takes a formula style argument where the variable of interest is on the left, and the grouping variable indicated on the right.

`aov(variable.of.interest ~ grouping.variable, data = df)`

## 20.9 Example, circadian rythm data

Your textbook describes an examplar data set from a study designed to test the effects of light treatment on circadian rhythms (see Whitlock & Schluter, Example 15.1).

- The investigators randomly assigned 22 individuals to one of three treatment groups and measured phase shifts in melatonin production. The treatment groups were:
- control group (8 indiviuals)
- light applied on the back of the knee (7 individuals)
- light applied to the eyes (7 individuals)

These data are available at: ABD-circadian-rythms.csv

The null and alternative hypotheses associated with the ANOVA of these data are:

- \(H_0\): the means of the treatments groups are the same
- \(H_1\): the mean of at least one of the treatment groups is different from the others

**Libraries**

**Load the data**

The data is very simple: just two columns indicating treatment group and the variable of interest:

```
head(circadian, n=3)
#> # A tibble: 3 x 2
#> treatment shift
#> <chr> <dbl>
#> 1 control 0.53
#> 2 control 0.36
#> 3 control 0.2
```

### 20.9.1 Visualizing the data

As is our standard practice, let’s start by visualizing the data. We’ll create a point plot depicting the observations colored by treatment group.

```
# we're going to re-use our jittering across plots so
# assign it to a variable
pd <- position_jitter(width=0.2, height=0)
point.plot <-
circadian %>%
ggplot(aes(x=treatment, y=shift,
color=treatment, group=row.names(circadian))) +
geom_point(position = pd) +
ylim(-3,1)+
labs(x = "Treatment", y = "Phase shift (h)")
point.plot
```

### 20.9.2 Carrying out the ANOVA

From our visualization it certainly seems like there may be differences among the group (treatment) means. Let’s test this formally using the `aov()`

function:

The `summary`

function applied to the `aov`

fit will print out a typical ANOVA table and calculate the associated P-value for the \(F\) test statistic:

```
summary(circadian.aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> treatment 2 7.224 3.612 7.289 0.00447 **
#> Residuals 19 9.415 0.496
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

If you want the ANOVA table in a form you can compute with, the `broom::tidy`

function we explored previously comes in handy:

```
circadian.aov.table <- tidy(circadian.aov)
circadian.aov.table
#> # A tibble: 2 x 6
#> term df sumsq meansq statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 treatment 2 7.224 3.612 7.289 0.004472
#> 2 Residuals 19 9.415 0.4955 NA NA
```

The table above tells us that the \(F\)-statistic for this ANOVA is ~7.29. The table also tells us that the P-value associated with this F-statistic, is quite small, P-value < 0.005.

The P-value can be calculated explicitly using the `pf()`

function (similar to the `pnorm()`

and `pt()`

functions we’ve seen previously):

```
circadian.F.stat <- circadian.aov.table$statistic[1]
pf(circadian.F.stat, df1 = 2, df2 = 19, lower.tail = FALSE)
#> [1] 0.004472271
```

Note that we set `lower.tail = FALSE`

to calculate the probability of getting an F-statistic this large or greater.

**Visualizing the F-distribution**

Let’s draw the corresponding F-distribution, \(F_{2,19}\), using the `df()`

function (parallel to `dnorm()`

and `dt()`

, with the region corresponding to an F-statistic greater than 7.29 shaded red. Because the area in the right tail we’re interested in is quite small, in a second plot we’ve zoomed in on this region.

```
fdist <- data_frame(f = seq(0, 10, length.out = 250),
density = df(f, df1=2, df2=19))
plot.a <-
ggplot(fdist, aes(f, density)) +
geom_line() +
geom_area(data = filter(fdist, f >= 7.29), fill='red', alpha=0.25) +
labs(x = "F", y = "Density", title="F(2,19)")
plot.b <-
plot.a + xlim(5,10) + ylim(0, 0.02) +
labs(title = "Zoomed in view of right tail of F(2,19)")
plot_grid(plot.a, plot.b, labels="AUTO")
```

**Critical values of the F-distribution**

If we wanted to know what the critical F value is for a corresponding type I error rate we can use the `qf()`

function:

## 20.10 ANOVA calculations: Step-by-step

The `aov()`

function carries out all the ANOVA calculations behind the scenes. It’s useful to pull back the curtain and see how the various quantities are calculated.

**Total SS**

```
grand.mean <- mean(circadian$shift)
# total sum of squares
total.table <-
circadian %>%
summarize(SS = sum((shift - grand.mean)**2),
df = n() - 1)
total.table
#> # A tibble: 1 x 2
#> SS df
#> <dbl> <dbl>
#> 1 16.64 21
```

**Group SS and MS**

We use `group_by`

and `summarize`

to calculates group means and the group deviates (the difference between the group means and the grand mean):

```
group.df <-
circadian %>%
group_by(treatment) %>%
summarize(n = n(),
group.mean = mean(shift),
grand.mean = grand.mean,
group.deviates = group.mean - grand.mean)
group.df
#> # A tibble: 3 x 5
#> treatment n group.mean grand.mean group.deviates
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 control 8 -0.3088 -0.7127 0.4040
#> 2 eyes 7 -1.551 -0.7127 -0.8387
#> 3 knee 7 -0.3357 -0.7127 0.3770
```

Having calculated the group deviates, we calculate the group sum of squares and related quantities:

```
group.table <-
group.df %>%
summarize(SS = sum(n * group.deviates**2),
k = n(),
df = k-1,
MS = SS/df)
group.table
#> # A tibble: 1 x 4
#> SS k df MS
#> <dbl> <int> <dbl> <dbl>
#> 1 7.224 3 2 3.612
```

**Error SS and MS**

Next we turn to variation of the individual observations around the group means, which is the basis of the error sum of squares and mean square. Again we calculate this in two steps:

```
error.df <-
circadian %>%
group_by(treatment) %>%
mutate(group.mean = mean(shift),
error.deviates = shift - group.mean) %>%
summarize(SS = sum(error.deviates**2),
n = n())
error.df
#> # A tibble: 3 x 3
#> treatment SS n
#> <chr> <dbl> <int>
#> 1 control 2.670 8
#> 2 eyes 2.993 7
#> 3 knee 3.752 7
```

Now we calculate the error sum of squares and related quantities:

```
error.table <-
error.df %>%
summarize(SS = sum(SS),
k = n(),
N = sum(n),
df = N - k,
MS = SS/df)
error.table
#> # A tibble: 1 x 5
#> SS k N df MS
#> <dbl> <int> <int> <int> <dbl>
#> 1 9.415 3 22 19 0.4955
```

**Calculating the F-statistic and \(R^2\)**

Having calculated our estimates of between group variance and within group variance (\(MS_\text{group}\) and $MS_) we’re now ready to calculate the \(F\) test statistic.

The variation “explained” by the group is:

Thus about 43% of the total sum of squared deviation among subjects, with respect to the phase shift variable, is explained by differences in light treatment.

## 20.11 Visualizing the partitioning of sum-of-squares

**Total SS**

```
total.plot <-
circadian %>%
ggplot(aes(x=treatment, y=shift,
color=treatment, group=row.names(circadian))) +
geom_linerange(aes(ymin = grand.mean, ymax = shift), position = pd) +
geom_hline(yintercept = grand.mean, linetype='dashed') +
ylim(-3,1)+
labs(x = "Treatment", y = "Phase shift (h)",
title = "Deviation of observations around the grand mean (dashed line)") +
theme(plot.title = element_text(size=9))
total.plot
```

**Group SS**

Let’s visualize the difference of the group means from the grand mean:

```
group.plot <-
group.df %>%
ggplot(aes(x = treatment, y = group.mean, color=treatment)) +
geom_linerange(aes(ymin = grand.mean, ymax = group.mean), size=2) +
geom_point(size = 3, alpha = 0.25) +
geom_hline(yintercept = grand.mean, linetype='dashed') +
ylim(-3,1) +
labs(x = "Treatment", y = "Phase shift (h)",
title = "Deviation of group means around the grand mean") +
theme(plot.title = element_text(size=9))
group.plot
```

**Error SS**

We can visualize the individual deviates around the group means as so:

```
error.plot <-
circadian %>%
group_by(treatment) %>%
mutate(group.mean = mean(shift)) %>%
ggplot(aes(x = treatment, y = shift, color = treatment)) +
geom_point(aes(y = group.mean),size=3,alpha=0.1) +
geom_linerange(aes(ymin = group.mean, ymax = shift), position = pd) +
ylim(-3,1) +
labs(x = "Treatment", y = "Phase shift (h)",
title = "Deviation of observations around the groups means") +
theme(plot.title = element_text(size=9))
error.plot
```

**Combined visualization**

We can combine our three plots created above into a single figure using `cowplot::plot_grid`

:

## 20.12 Which pairs of group means are different?

If an ANOVA indicates that at least one of the group means is different than the others, the next question is usually “which pairs are different?”. There are slighly different tests for what are called “planned” versus “unplanned” comparisons. Your textbook discusses the differences between these two types

Here we focus on a common test for unplanned comparisons, called the Tukey Honest Significant Differences test (referred to as the Tukey-Kramer test in your textbook). The Tukey HSD test controls for the “family-wise error rate”, meaning it tries to keep the overall false positive (Type I error) rate at a specified value.

### 20.12.1 Tukey-Kramer test

The function `TukeyHSD`

implements the Tukey-Kramer test. The input to `TukeyHSD`

is the fit from `aov`

:

```
TukeyHSD(circadian.aov)
#> Tukey multiple comparisons of means
#> 95% family-wise confidence level
#>
#> Fit: aov(formula = shift ~ treatment, data = circadian)
#>
#> $treatment
#> diff lwr upr p adj
#> eyes-control -1.24267857 -2.1682364 -0.3171207 0.0078656
#> knee-control -0.02696429 -0.9525222 0.8985936 0.9969851
#> knee-eyes 1.21571429 0.2598022 2.1716263 0.0116776
```

Here again, the `broom::tidy`

function comes in handy:

```
tidy(TukeyHSD(circadian.aov))
#> # A tibble: 3 x 6
#> term comparison estimate conf.low conf.high adj.p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 treatment eyes-control -1.243 -2.168 -0.3171 0.007866
#> 2 treatment knee-control -0.02696 -0.9525 0.8986 0.9970
#> 3 treatment knee-eyes 1.216 0.2598 2.172 0.01168
```

The Tukey HSD test by default give us 95% confidence intervals for the differences in means between each pair of groups, and an associated P-value for the null hypothesis of equal means between pairs. Interpretting the results above, we see that we fail to reject the null hypothesis of equal means for the knee and control treatment groups (i.e. we have no statistical support to conclude they are different). However, we reject the null hypothesis for equality of means between control and eye treatments and between knee and eye treatements. We have evidence that light treatments applied to the eye cause a mean negative shift in the phase of melatonin production relative to control and knee treatment groups.

## 20.13 Repeatability

Nearly every measure of a continuous measurement or assay we apply in biology (and other sicences) has associated with it some

**measurement error**.Measurement error is usually not “biological variation” of interest, but rather “technical variation” associated with our ability to measure quantities of interest precisely

**Example**: Have three people independently measure the length of a human femur to the nearest millimeter. Unless the values are aggressively rounded, there is a high likelihood that you’ll get three different values.

### Estimating Repeatability using ANOVA

ANOVA can be used to estimate the **Repeatability** of a measure, which provides a way to quantify how much of the variance we observe between individuals is due to measurement error.

**Estimating Repeatability: Experimental steps**

- Repeatedly, but independently, measure the same variable in the same individual or other unit of observation
- Carry out the same repeated measurements across individuals

**Estimating Repeatability: Statistical steps**

- Calculate \(MS_\text{groups}\) and \(MS_\text{error}\) where individuals are the grouping unit.
- Estimate the variance among groups, \(s_A^2\) as: \[ s_A^2 = \frac{MS_\text{groups} - MS_\text{error}}{n} \]

where \(n\) is the number of replicate measures per individual.

Estimate the repeatability as: \[ \text{Repeatability} = \frac{s_A^2}{s_A^2 + MS_\text{error}} \]

\(0 \leq \text{Repeatability} \leq 1\); values near zero indicate nearly all variance is due to measurement error, values near one indicate small fraction of variance due to measurement error

### Repeatability: Walking stick example

Nosil and Crespi measured various morphological features of walking stick insects from digital photographs. For each specimen they took two independent photographs and measured femur length (in cm) on each photograph.

```
by.specimen <-
walking.sticks %>%
group_by(specimen) %>%
summarize(min.fl = min(femurLength), max.fl = max(femurLength), avg.fl = 0.5*(min.fl + max.fl)) %>%
arrange(avg.fl) %>%
ungroup() %>%
mutate(rank.fl = rank(avg.fl, ties.method = "first"))
ggplot(by.specimen, aes(x = rank.fl)) +
geom_point(aes(y = min.fl), color="firebrick") +
geom_point(aes(y = max.fl + 0.0025), color="firebrick") +
geom_linerange(aes(ymin = min.fl, ymax = max.fl + 0.0025)) +
labs(x = "Individual walking sticks", y = "Femur length (cm)") +
theme_classic()+
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank())
```

First we carry out the ANOVA in the standard way, using the specimen variable as the grouping variable:

```
sticks.aov <- aov(femurLength ~ as.factor(specimen),
data = walking.sticks)
sticks.table <- tidy(sticks.aov)
sticks.table
#> # A tibble: 2 x 6
#> term df sumsq meansq statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 as.factor(specimen) 24 0.05913 0.002464 6.921 0.000004077
#> 2 Residuals 25 0.008900 0.0003560 NA NA
```

From the ANOVA table, use \(MS_\text{group}\) and \(MS_\text{error}\) to calculate \(s_A\):

```
repeatability.table <- data_frame(
MS.groups = sticks.table$meansq[1],
MS.error = sticks.table$meansq[2],
sA = (MS.groups - MS.error)/2,
repeatability = sA/(sA + MS.error))
repeatability.table
#> # A tibble: 1 x 4
#> MS.groups MS.error sA repeatability
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.002464 0.0003560 0.001054 0.7475
```

From the table we calculated:

- \(s_A^2 =\) 0.0010539
- \(\text{Repeatability} =\) 0.7475028

A repeatability of 0.75 indicates that we would estimate that ~75% of the total variance in femure length measurements in our population of walking stick insects is the result of true differences in femur length between individuals, while roughly 25% of the variance is attributable to measurement error.