R's {stats} package, which is loaded by default, has four built-in functions for working with the normal distribution: dnorm(), pnorm(), qnorm(), and rnorm(). We will first use these functions to explore the properties of a normal distribution.

  1. The function dnorm() gives the height of the probability density at each point for a given mean and standard deviation, i.e. the PDF.
    1. By using seq(), create a vector called x which contains a sequence of numbers between -10 and 10 incrementing by 0.2.
    2. By using dnorm(), create a vector called y which contains a series of probability density values along the vector x, with a mean of 2.5 and a standard deviation of 0.5.
    3. Plot the PDF from x and y.
  2. The function pnorm() gives the probability that a normally distributed random number is less than a given value, i.e. the CDF.
    1. Create a vector called x which contains a sequence of numbers between -10 and 10 incrementing by 0.2.
    2. By using pnorm(), create a vector called y which contains the cumulative probabilities along the vector x, with a mean of 2.5 and a standard deviation of 0.5.
    3. Plot the CDF from x and y.
  3. The function qnorm() takes a probability value and gives the number whose cumulative probability matches it, i.e. the quantile function (the inverse of the CDF).
    1. What are the minimum and the maximum possible probability values? Create a vector called x which contains a sequence of values from the minimum to the maximum, incrementing by 0.02.
    2. By using qnorm() with a mean of 2 and a standard deviation of 3, compute the quantiles along the vector x and store them in a vector called y.
    3. Plot the graph from x and y.
    4. Now, what is the value $x$ that satisfies $P(X < x) = 0.85$, meaning that 85% of the values in this population lie below $x$? You can either read the answer off the plot from step 3 or compute it with qnorm(0.85, mean = 2, sd = 3).
  4. The function rnorm() can be used to generate random numbers whose distribution is normal.
    1. By using rnorm(), create a vector called x which contains a sample of 50 numbers which are normally distributed.
    2. Plot a histogram of x. Choose a smaller bin width (i.e. more bins) to visualise the distribution better.
    3. Use the Shapiro-Wilk test, shapiro.test(), to test whether this sample is normally distributed.
    4. Find the sample mean $\bar{X}$ and the sample standard deviation $s$ of x.
    5. What are the values of $\bar{X}-2s$ and $\bar{X}+2s$?
    6. By using density(), create an object called d which contains the estimated probability density of x. Plot d.
    7. What does $P(\bar{X}-2s < X < \bar{X}+2s)$ mean mathematically? How can we compute its value?
      • Answer:
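As a starting point for questions 1 to 3, the calls to dnorm(), pnorm(), and qnorm() might look like the sketch below. The names y_pdf, y_cdf, p, q, and q85 are only illustrative choices (the worksheet itself reuses x and y), and the plot titles and axis labels are assumptions rather than required wording.

```r
# Question 1: probability density (PDF) with dnorm()
x <- seq(-10, 10, by = 0.2)
y_pdf <- dnorm(x, mean = 2.5, sd = 0.5)
plot(x, y_pdf, type = "l", main = "Normal PDF", xlab = "x", ylab = "Density")

# Question 2: cumulative probability (CDF) with pnorm(), on the same grid
y_cdf <- pnorm(x, mean = 2.5, sd = 0.5)
plot(x, y_cdf, type = "l", main = "Normal CDF", xlab = "x", ylab = "P(X < x)")

# Question 3: quantiles with qnorm(); probabilities run from 0 to 1
p <- seq(0, 1, by = 0.02)
q <- qnorm(p, mean = 2, sd = 3)  # qnorm(0) is -Inf; plot() skips non-finite points
plot(p, q, type = "l", main = "Normal quantile function",
     xlab = "Probability", ylab = "Quantile")

# The value below which 85% of the population lies
q85 <- qnorm(0.85, mean = 2, sd = 3)
q85
```

Feeding q85 back into pnorm(q85, mean = 2, sd = 3) should return 0.85, which is a quick way to check that qnorm() and pnorm() are inverses of each other.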

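For question 4, one possible workflow is sketched below. It assumes a standard normal sample (the defaults of rnorm(), mean 0 and sd 1), an arbitrary seed so the result is repeatable, and 20 histogram breaks. The names x_bar, s, lower, upper, prop_in, and p_theory are illustrative only.

```r
set.seed(1)            # arbitrary seed, only so the sketch is repeatable
x <- rnorm(50)         # assumes the defaults: mean = 0, sd = 1

# Histogram with more (narrower) bins than the default
hist(x, breaks = 20, main = "Sample of 50 normal values", xlab = "x")

# Shapiro-Wilk test; a large p-value gives no evidence against normality
sw <- shapiro.test(x)
sw

# Sample mean and standard deviation, and the interval mean +/- 2 sd
x_bar <- mean(x)
s <- sd(x)
lower <- x_bar - 2 * s
upper <- x_bar + 2 * s

# Kernel density estimate of the sample
d <- density(x)
plot(d, main = "Kernel density estimate of x")

# Two ways to look at the probability of falling inside the interval:
prop_in <- mean(x > lower & x < upper)   # proportion of the sample inside
p_theory <- pnorm(upper, mean = x_bar, sd = s) -
  pnorm(lower, mean = x_bar, sd = s)     # area under the fitted normal curve
c(prop_in = prop_in, p_theory = p_theory)
```

The difference of two pnorm() calls is the area under the normal curve between the two bounds, which is one way to read the probability in step 7; the sample proportion prop_in is an empirical counterpart and will vary from sample to sample.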
We will now examine some real-world data that are roughly normally distributed to understand the properties of a normal distribution.

  1. The following data contain measurements of the diameter growth rate of the tropical tree Dipteryx panamensis from a long-term study at La Selva, Costa Rica. The data are log-transformed, with the original units in millimetres.

    P3Q5.csv

    1. Load the data into R using read.csv(), making sure the argument header is set to TRUE.
    2. By using plot(), plot the data. Add an appropriate title and axis labels.
    3. What are the mean, median, and mode of the data set?
    4. What are the standard deviation, variance, and standard error?
    5. Examine the graph visually. Do you think the hypothesis that “the data are sampled from a population having a normal distribution” is true or not?
    6. By using shapiro.test(), test the above hypothesis statistically.
  2. The crab spider sits on flowers and preys upon visiting honeybees. Do honeybees distinguish between flowers that have crab spiders and flowers that do not? To test this, researchers gave 33 bees a choice between two flowers: one had a crab spider and the other did not. In 24 of the 33 trials, the bees picked the flower that had the spider. In the remaining 9 trials, the bees picked the flower without the spider.

    P3Q6

    1. What do you think the distribution of the data set could be?
    2. By using rbinom(1000, 33, 24/33), simulate 1,000 repeats of the 33-trial experiment, and plot a histogram of the simulated counts using hist().
    3. By applying a normal approximation, what would be the mean and standard deviation?
    4. By using rnorm(), simulate 1,000 observations from a normal distribution, using the mean and standard deviation of the normal approximation from step 3.
    5. By using hist(), plot the simulated normal distribution. Do the histograms from steps 2 and 5 look similar?
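The crab spider question can be approached along the lines of the sketch below. It assumes the usual normal approximation to a binomial count, with mean $np$ and standard deviation $\sqrt{np(1-p)}$, and uses an arbitrary seed. The names n, p_hat, mu, sigma, sim_binom, and sim_norm are illustrative, not part of the exercise.

```r
set.seed(42)                 # arbitrary seed, only so the sketch is repeatable

n <- 33                      # trials per experiment
p_hat <- 24 / 33             # observed proportion choosing the spider flower

# 1,000 simulated experiments, each counting successes out of 33 trials
sim_binom <- rbinom(1000, size = n, prob = p_hat)

# Normal approximation to the binomial count (assumed form: np, sqrt(np(1-p)))
mu <- n * p_hat
sigma <- sqrt(n * p_hat * (1 - p_hat))

# 1,000 draws from the approximating normal distribution
sim_norm <- rnorm(1000, mean = mu, sd = sigma)

# Side-by-side histograms for a visual comparison
op <- par(mfrow = c(1, 2))
hist(sim_binom, main = "Binomial simulation", xlab = "Bees choosing spider flower")
hist(sim_norm, main = "Normal approximation", xlab = "Simulated value")
par(op)
```

Note that the binomial counts are whole numbers between 0 and 33, while the normal draws are continuous and unbounded, so the two histograms can only agree approximately.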