Why not analysis-by-twos?
- Example: group 1 v. group 2, group 2 v. group 3, group 3 v. group 1
- Testing multiple pairs of means inflates the probability of committing at least one Type I error
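- Escalates quickly as the number of groups grows: $k$ groups require $k(k-1)/2$ pairwise tests (3 groups need 3 tests; 5 groups need 10)
- Known as the “multiple testing problem”
- At a significance level of $\alpha = 0.05$, each test still has a 5% chance of rejecting the null hypothesis even when it is true (a false positive)
- Assuming the tests are independent and every null hypothesis is true, 10 comparisons give a probability of $1-(1-0.05)^{10} \approx 0.40$ of at least one false rejection, and an expected $10 \times 0.05 = 0.5$ false rejections
- Genomic and bioinformatic analyses, for example, often run on the order of 10,000 separate tests; at $\alpha = 0.05$ that would be expected to produce about 500 false positives if every null hypothesis were true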
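The inflation can be quantified with a short Python sketch. It assumes the $m$ tests are independent and that every null hypothesis is true, in which case the probability of at least one Type I error is $1-(1-\alpha)^m$. The function names are illustrative, not from any library.

```python
from math import comb


def pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k groups, k(k-1)/2."""
    return comb(k, 2)


def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one Type I error) over m independent tests at level alpha,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


def expected_false_positives(alpha: float, m: int) -> float:
    """Expected number of false rejections when all m nulls are true."""
    return alpha * m


if __name__ == "__main__":
    for k in (3, 5, 10):
        m = pairwise_comparisons(k)
        print(f"{k} groups -> {m} pairwise tests, "
              f"P(at least one Type I error) = {familywise_error_rate(0.05, m):.3f}")
    print("expected false positives in 10,000 tests:",
          expected_false_positives(0.05, 10_000))
```

With $\alpha = 0.05$ this gives about 0.143 for the 3 tests needed with 3 groups, 0.401 for the 10 tests needed with 5 groups, and 0.901 for the 45 tests needed with 10 groups.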
Analysis of variance
- $H_0$: Population means $\mu_i$ are the same for all groups
- $H_1$: Population means $\mu_i$ are not the same for all groups (at least one is different from the others)
- Even if $H_0$ is true, the sample means $\bar Y_i$ still differ from each other because of sampling error alone
- Under $H_0$ (and ANOVA's assumption of equal variances), taking a random sample from each population is equivalent to taking the same number of random samples from a single population
- If $H_0$ is not true, variation in sample means attributable to chance still exists, but there is an additional component caused by real variation among population means
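The sampling-error argument can be illustrated with a small simulation. This is a sketch assuming normally distributed populations with a common standard deviation; the values $\mu = 10$, $\sigma = 2$, $n = 25$ and the function name `simulate_group_means` are arbitrary choices for illustration. When every group is drawn from the same population the sample means still differ, and their spread is close to the standard error $\sigma/\sqrt{n} = 0.4$; shifting one population mean adds real variation on top of that.

```python
import random
import statistics


def simulate_group_means(group_mus, n, sigma=2.0, seed=1):
    """Draw n normal observations per group (population mean taken from
    group_mus, common standard deviation sigma) and return the sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for mu in group_mus
    ]


if __name__ == "__main__":
    # H0 true: all three groups come from the same population (mean 10)
    print("H0 true: ", simulate_group_means([10, 10, 10], n=25))
    # H0 false: the third population mean is shifted by 3
    print("H0 false:", simulate_group_means([10, 10, 13], n=25))
    # Under H0 the spread of many sample means tracks sigma / sqrt(n)
    null_means = simulate_group_means([10] * 2000, n=25)
    print("sd of null means:", statistics.stdev(null_means),
          "vs sigma/sqrt(n):", 2.0 / 25 ** 0.5)
```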

Mean squares
- Group mean square ($\text{MS}_\text{groups}$) is proportional to the observed amount of variation among the group sample means
- Error mean square ($\text{MS}_\text{error}$) estimates the variance among individuals that belong to the same group
- Tested with an $F$-ratio: $F=\text{MS}_\text{groups}/\text{MS}_\text{error}$
- If $H_0$ is true, $\text{MS}_\text{groups}$ and $\text{MS}_\text{error}$ estimate the same variance, so they differ only by chance and $F \approx 1$
- If $H_0$ is false, $\text{MS}_\text{groups}$ is expected to exceed $\text{MS}_\text{error}$, so $F>1$; a sufficiently large $F$ leads to rejecting $H_0$
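A minimal from-scratch sketch of the $F$-ratio in Python. It assumes the standard one-way ANOVA formulas, dividing the sums of squares by $k-1$ and $N-k$ degrees of freedom ($k$ groups, $N$ observations in total); `one_way_anova` is a hypothetical helper name. If SciPy is available, `scipy.stats.f_oneway(*groups)` should report the same $F$ statistic and also a p-value.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (MS_groups, MS_error, F) for a one-way ANOVA.

    groups: a list of lists, one list of observations per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")

    grand_mean = fmean(y for g in groups for y in g)
    group_means = [fmean(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size
    ss_groups = sum(len(g) * (m - grand_mean) ** 2
                    for g, m in zip(groups, group_means))
    # Variation of individuals around their own group mean
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)

    ms_groups = ss_groups / (k - 1)
    ms_error = ss_error / (n_total - k)
    return ms_groups, ms_error, ms_groups / ms_error


if __name__ == "__main__":
    print(one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

For these three groups the group means are 2, 3 and 6, giving $\text{MS}_\text{groups} = 13$, $\text{MS}_\text{error} = 1$ and $F = 13$, far above 1.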
Partitioning the sum of squares
$$
Y_{ij}-\bar Y=(Y_{ij}-\bar Y_i)+(\bar Y_i-\bar Y)
$$
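Squaring both sides of this deviation identity and summing over every observation gives the sum-of-squares partition. The symbols $k$ (number of groups), $n_i$ (size of group $i$) and $N=\sum_i n_i$ are introduced here for the derivation:

$$
\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\bar Y)^2
=\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij}-\bar Y_i)^2
+\sum_{i=1}^{k}n_i(\bar Y_i-\bar Y)^2
+2\sum_{i=1}^{k}(\bar Y_i-\bar Y)\sum_{j=1}^{n_i}(Y_{ij}-\bar Y_i)
$$

The cross term is zero because deviations from a group mean sum to zero, $\sum_{j=1}^{n_i}(Y_{ij}-\bar Y_i)=0$, which leaves

$$
\text{SS}_\text{total}=\text{SS}_\text{error}+\text{SS}_\text{groups},\qquad
\text{MS}_\text{groups}=\frac{\text{SS}_\text{groups}}{k-1},\qquad
\text{MS}_\text{error}=\frac{\text{SS}_\text{error}}{N-k}
$$

with the degrees of freedom partitioned the same way, $N-1=(k-1)+(N-k)$.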