- Recall that for estimating and testing population means it is assumed that the variable has an approximately normal distribution
- Recall that the two-sample t-test also requires the two populations to have equal standard deviations (and hence equal variances)
Detecting deviations from normality
- Histograms (frequency distributions) can be a useful tool to quickly visualise the shape of the distribution

- A normal quantile plot (Q-Q plot) is sometimes more informative than a histogram
- Compares each observation in the sample with the corresponding quantile expected from the standard normal distribution
- If the sample is normally distributed, the points will fall roughly along a straight line
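
The Q-Q comparison described above can be sketched in a few lines of standard-library Python. This is only an illustration: the data values are hypothetical, the helper name `normal_qq_points` is made up for this example, and the plotting positions $(i - 0.5)/n$ are one common convention among several.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Pair each sorted observation with its expected standard normal quantile."""
    n = len(sample)
    observed = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


# Hypothetical measurements, used only to demonstrate the calculation.
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.5, 5.2, 4.4, 5.7]
points = normal_qq_points(sample)

for theoretical_q, observed_value in points:
    print(f"{theoretical_q:7.3f}  {observed_value:5.2f}")
```

Plotting the second column against the first gives the Q-Q plot; points that bend away from a straight line at either end suggest skew or heavy tails.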





Formal test of normality
- The Shapiro-Wilk test evaluates whether the data depart from normality
- $H_0$: Data are sampled from a population having a normal distribution
- $H_1$: Data are sampled from a population not having a normal distribution
- The test first estimates the population mean and standard deviation from the sample
- It then tests the goodness of fit to the data of a normal distribution with that mean and standard deviation
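
As a minimal sketch of running this test in practice, the example below assumes SciPy is installed and uses its `scipy.stats.shapiro` function. The data values and the 0.05 significance level are hypothetical choices made for illustration.

```python
from scipy import stats

# Hypothetical measurements, used only to demonstrate the call.
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.5, 5.2, 4.4, 5.7]

# shapiro returns the W statistic and the P-value for H0: the population is normal.
statistic, p_value = stats.shapiro(sample)

alpha = 0.05  # assumed significance level
reject_normality = p_value < alpha

print(f"W = {statistic:.3f}, P = {p_value:.3f}, reject H0: {reject_normality}")
```

A small P-value is evidence against normality. A large P-value does not prove the population is normal, particularly when the sample is small and the test has little power.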
Ignoring violations
- Feasible when the method for estimating and testing means is known to be robust, i.e. not highly sensitive to violations of the assumptions of normality and equal standard deviations
- This robustness usually comes from the central limit theorem: with large samples, sample means are approximately normally distributed even when the variable itself is not
- Two-sample methods are also robust when the two groups have similarly shaped distributions, even if both deviate from normality
- Violations cannot be ignored when the frequency distribution contains extreme outliers
- The two-sample t-test remains reliable even with a threefold difference in standard deviations, as long as the sample sizes are approximately equal
- Otherwise, Welch’s t-test, which does not assume equal standard deviations, is a much better alternative
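
To show how Welch's t-test avoids the equal-variance assumption, here is a rough standard-library sketch of its test statistic and approximate degrees of freedom (the Welch-Satterthwaite formula). The function name `welch_t` and the two samples are hypothetical; a statistics package would normally also supply the P-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    # Squared standard error of each sample mean, estimated separately
    # per group instead of pooling the two variances.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical groups with unequal sizes and clearly unequal spread.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10, 12, 14]

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The degrees of freedom are generally not an integer and fall between the smaller of $n_1 - 1$ and $n_2 - 1$ and the pooled value $n_1 + n_2 - 2$, reflecting the extra uncertainty from estimating two separate variances.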
Data transformation