t-distribution for sample means
- Recall that a normally distributed population variable $Y$, after standardisation, follows the standard normal distribution: $Z=\frac{Y-\mu}{\sigma}\sim\mathcal{N}(0,1^2)$
- Recall that $\bar Y\sim\mathcal{N}(\mu,(\frac{\sigma}{\sqrt n})^2)$ when samples of size $n$ are drawn repeatedly from $Y$
- Recall that the theoretical standard error of the sample mean is $\sigma_{\bar Y }=\frac{\sigma}{\sqrt n}$
- However, with real data, Z-standardisation of $\bar Y$ is rarely justified: $\sigma$, and therefore $\sigma_{\bar Y}$, is almost always unknown. The standard error is instead estimated as $SE_{\bar Y}=\frac{s}{\sqrt n}$, where $s$ is the sample standard deviation, an estimate of the population standard deviation $\sigma$
- $SE_{\bar Y}$ is itself a random variable, because $s$ varies by chance from sample to sample
- Therefore, standardising the sample mean $\bar Y$ with $SE_{\bar Y}$ does not give a standard normal distribution, but a Student’s t-distribution with $n-1$ degrees of freedom:
$$
t=\frac{\bar Y - \mu}{SE_{\bar Y}}
$$
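As a rough illustration of the formula above, the following sketch computes $SE_{\bar Y}$ and $t$ for a single sample using only Python's standard library. The sample values and the population mean `mu` are made up for the example; with real data $\mu$ would be unknown or a hypothesised value.

```python
import math
import statistics

# Hypothetical sample of n = 5 measurements (made-up values for illustration)
sample = [4.1, 5.3, 4.8, 5.9, 5.4]
mu = 5.0  # assumed population mean; normally unknown in practice

n = len(sample)
y_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)             # estimated standard error of the mean
t_stat = (y_bar - mu) / se        # t statistic with n - 1 degrees of freedom

print(f"mean = {y_bar:.3f}, s = {s:.3f}, SE = {se:.3f}, t = {t_stat:.3f}, df = {n - 1}")
```

Running this on a different sample from the same population would give a different $s$, and so a different $SE_{\bar Y}$ and $t$, which is why $t$ has its own sampling distribution.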

Plot of the density function for several members of the Student’s t-family characterised by different degrees of freedom
- The *t*-distribution resembles the standard normal distribution $Z$: it is symmetric around a mean of 0 and roughly bell-shaped. The important distinction is that its tails are fatter towards $\pm\infty$, more so for fewer degrees of freedom
- Confidence intervals and hypothesis tests rely on the tails!
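The fatter tails can be checked numerically. The sketch below is a minimal illustration, not a library implementation: `t_pdf` is a helper written here from the standard closed-form density of Student's t, and the chosen points and degrees of freedom are arbitrary.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Compare densities at the centre (x = 0) and further out in the tail
for x in (0.0, 2.0, 3.0, 4.0):
    row = ", ".join(f"df={df}: {t_pdf(x, df):.5f}" for df in (1, 4, 30))
    print(f"x={x:.1f}  {row}, normal: {z.pdf(x):.5f}")
```

At $x=0$ the t densities sit below the normal density, while in the tails they sit above it, and the gap narrows as the degrees of freedom grow.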
Confidence interval for the mean of a normal population
- Like the standard normal distribution, each t-distribution (one for each number of degrees of freedom) has its own PDF and CDF
- A confidence interval is a range of values, computed from a sample, that is expected to contain a population parameter in a stated proportion of repeated samples. That proportion, the confidence level, is the complement of the significance level $\alpha$ (i.e. $1-\alpha$).
- A 95% confidence interval for the mean $\mu$ of a normal population can be defined from the t-values which bound 95% of the area under the curve of the *t*-distribution for sample means $\bar Y$
- Those t-values are called the 5% critical t-values, $\alpha=0.05$
- Notated as $t_{\alpha(2),df}$, where the $(2)$ indicates that the 5% area is divided between the two tails of the t-distribution
- Confidence interval = $\bar Y\pm t_{\alpha(2),df}\,SE_{\bar Y}$
![$t_{0.05(2),4}=\pm2.78$, as 95% of the area under the curve for this *t-*distribution is bounded by [-2.78, 2.78].](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/789f9e08-3bd2-4fcd-b1d7-1429f51829f5/Untitled.png)
$t_{0.05(2),4}=\pm2.78$, as 95% of the area under the curve for this *t-*distribution is bounded by [-2.78, 2.78].
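The critical value in the figure can be reproduced numerically. The sketch below uses only the standard library: it integrates the t density with Simpson's rule and searches for the cut-off by bisection. The helper names (`t_pdf`, `central_area`, `critical_t`) are invented for this example; a statistics package with a t quantile function would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def central_area(t, df, steps=1000):
    """Area under the t density between -t and +t (Simpson's rule on [0, t], doubled)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def critical_t(alpha, df):
    """Two-tailed critical value: the t whose central area equals 1 - alpha."""
    lo, hi = 0.0, 1.0
    while central_area(hi, df) < 1 - alpha:  # widen the bracket until it contains the answer
        lo, hi = hi, hi * 2
    for _ in range(50):                      # bisection on the bracket [lo, hi]
        mid = (lo + hi) / 2
        if central_area(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_crit = critical_t(0.05, 4)
print(f"t_0.05(2),4 = {t_crit:.2f}")
```

For $\alpha=0.05$ and $df=4$ this should agree with the $\pm2.78$ shown in the figure, up to rounding.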
- $\alpha$ is usually 0.05, which defines the 95% confidence interval, but any value between 0 and 1 can be chosen
- It can never be known for certain whether $\mu$ falls within the interval computed from a particular sample
- It only implies that, for 95% of the samples drawn from the population, the confidence interval computed from the sample will contain $\mu$
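The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the population parameters `mu` and `sigma`, the sample size, and the number of trials are arbitrary choices, and the critical value $t_{0.05(2),4}$ is hard-coded as 2.776.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

mu, sigma = 10.0, 2.0   # hypothetical population parameters
n = 5                   # sample size, so df = n - 1 = 4
t_crit = 2.776          # t_{0.05(2),4}, rounded to three decimals
trials = 10_000

covered = 0
for _ in range(trials):
    # Draw one sample from the normal population and build its 95% interval
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    y_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    lower, upper = y_bar - t_crit * se, y_bar + t_crit * se
    if lower <= mu <= upper:
        covered += 1

coverage = covered / trials
print(f"Proportion of intervals containing mu: {coverage:.3f}")
```

Each interval either contains $\mu$ or it does not; the 95% describes the long-run proportion of intervals that do, which the simulated proportion should approximate.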