We already know that Student’s *t*-distribution and the normal distribution are similar, but the *t*-distribution has fatter tails. This has important implications for calculating confidence intervals (estimating the true mean) and for hypothesis testing.

  1. As with the normal distribution, R has built-in functions for the t-distribution: dt(), pt(), qt() and rt().
    1. By using the appropriate function(s), calculate the area under the curve of the t-distribution ($df=4$) between -2.78 and 2.78.
    2. By using the appropriate function(s), calculate the area under the curve of the normal distribution between -1.96 and 1.96.
    3. What do your answers to parts 1 and 2 imply about how the t-distribution, compared with the normal distribution, reflects uncertainty about the true standard error?
  2. Now we work the other way round. What are the values that bound 95% of the area under the curve of the t-distribution ($df=4$)? We will implement a trial-and-error approach using a for-loop.
    1. Compute the values from pt(0, 4), pt(1, 4), pt(2, 4), pt(3, 4) ... pt(+inf, 4) (which will approach 1, but please don’t go on indefinitely). From this approach, can you narrow down what the values that bound 95% of the area under the curve might be?
    2. This forms the logic behind the for-loop. Now, build a for-loop that evaluates pt(q, 4) for q (the t-value) from 0 to 3 in increments of 0.01. In each iteration, store the result in a variable v.
    3. Add an if statement inside the for-loop: if v is still smaller than our target value 0.975 (not 0.95; why?), copy the value of q in the current iteration to a new variable w.
    4. Add a second if statement inside the for-loop: if v is larger than our target value 0.975, copy the value of q in the current iteration to a new variable x, print the values of both w and x, and exit the for-loop using break.
    5. Do you get an estimate close to the values in question 1, part 1?
    6. By making the increment of the for-loop smaller, you can get an even more accurate estimate.
    7. **(Challenge)** When the increment becomes very small (e.g. 10 decimal places), the for-loop wastes a lot of time trying values of q that are far from the true value. How can we optimise this calculation?
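
The steps in questions 1 and 2 can be sketched as follows. This is one possible approach under stated assumptions, not the only valid answer: the names area_t, step, f, root and exact are illustrative, and using uniroot() for the challenge is just one option (a bisection loop written by hand would be another).

```r
# Question 1: area under the t-distribution (df = 4) between -2.78 and 2.78
area_t <- pt(2.78, df = 4) - pt(-2.78, df = 4)

# Question 2: trial-and-error search for the upper bound of the central 95%
step <- 0.01                      # increment; make it smaller for more precision
for (q in seq(0, 3, by = step)) {
  v <- pt(q, df = 4)              # cumulative probability up to q
  if (v < 0.975) {
    w <- q                        # last q that is still below the target
  }
  if (v > 0.975) {
    x <- q                        # first q that exceeds the target
    print(c(w = w, x = x))
    break                         # stop as soon as the target is bracketed
  }
}

# Challenge (one option): let a root finder home in on the value
# instead of scanning every candidate q
f <- function(q) pt(q, df = 4) - 0.975
root <- uniroot(f, interval = c(0, 3), tol = 1e-10)$root

# The built-in quantile function gives the same bound directly
exact <- qt(0.975, df = 4)
```

The loop brackets the bound between w and x, whereas uniroot() and qt() return a single value that should lie inside that bracket. By symmetry, the lower bound is the negative of the upper one.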

Next, we will move on to some real-life examples that are approximately normally distributed and investigate their parameters.

  1. We will generate a dummy sample containing the weights of 20 mice using the code below. We want to know whether the average weight of the mice differs from 25 g.

    1. The code to generate the dummy sample is:

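      # Fix the random seed so that everyone generates the same sample
      set.seed(1234)
      df <- data.frame(
        name = paste0("M_", 1:20),            # mouse IDs M_1 ... M_20
        weight = round(rnorm(20, 20, 2), 1)   # weights (g): mean 20, SD 2, 1 d.p.
      )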
      
    2. Try View(), head() and tail() on df. What do these functions do? What does our sample look like?

    3. Apply summary() on the weight vector in the data frame df.

    4. Visualise the data using boxplot(). Add an appropriate title and axis labels to the figure. Does the sample suggest that the population might deviate from normality?

    5. Now, conduct a one-sample, two-tailed t-test using the function t.test(). To set the hypothesised population mean, use the argument mu = 25. Store the result in an object named res.

    6. Run print(res) (or simply res). Look for the t-statistic, the degrees of freedom, the p-value, the 95% confidence interval for the population mean, and the value of the sample mean.

    7. These values are stored as components of res, which is a list. You can return them with res$statistic, res$parameter, res$p.value, res$conf.int and res$estimate.

  2. Without external cues such as the sun, people attempting to walk in a straight line tend to walk in circles. One idea is that most individuals have a tendency to turn in one direction because of internal physiological asymmetries, or because of differences between legs in length or strength. Researchers tested for a directional tendency by blindfolding 15 participants in a large field and asking them to walk in a straight line. The data contain the median change in direction (turning angle) of each of the 15 participants measured in degrees per second. A negative angle refers to a left turn, whereas a positive number indicates a right turn.

    P4Q4

    1. Draw a graph showing the frequency distribution of the data. Is a trend in the mean angle suggested?
    2. Do people tend to turn in one direction (e.g., left) more on average than the other direction (e.g., right)? Test whether the mean angle differs from zero.
    3. Based on your results in part 2, is the following statement justified? “People do not have a tendency to turn more in one direction, on average, than the other direction.” Explain.
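
For the mouse-weight exercise, the test and the extraction of its components might look like the sketch below. It regenerates the same dummy sample so that it runs on its own. The manual recomputation of the t-statistic (n, t_manual) is an addition for illustration and is not part of the exercise text.

```r
# Regenerate the dummy sample of mouse weights
set.seed(1234)
df <- data.frame(
  name = paste0("M_", 1:20),
  weight = round(rnorm(20, 20, 2), 1)
)

# One-sample, two-tailed t-test of H0: population mean = 25
res <- t.test(df$weight, mu = 25)
print(res)

# Individual components of the result
res$statistic   # t-statistic
res$parameter   # degrees of freedom (n - 1)
res$p.value     # two-tailed p-value
res$conf.int    # 95% confidence interval for the population mean
res$estimate    # sample mean

# Illustration only: recompute the t-statistic by hand
n <- length(df$weight)
t_manual <- (mean(df$weight) - 25) / (sd(df$weight) / sqrt(n))
```

The same call pattern would apply to the turning-angle data once the 15 values are entered as a numeric vector, with mu = 0 as the hypothesised mean.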