Getting started with R (and RStudio)

  1. On your operating system of choice (e.g. Linux, Windows, macOS), download and install the latest stable version of R from The R Project for Statistical Computing.
  2. If your operating system has a graphical user interface (e.g. Windows or macOS), download and install RStudio Desktop (free) from Posit.
  3. Open RStudio. Click File > New Project > New Directory, then choose the parent directory in which RStudio should create the new directory (and Project).

  1. You will enter a new session, with the working directory set to the R Project directory you specified above. The left panel is the Console, where you type commands and see R's responses. The bottom right panel is the Files panel, which shows the contents of your current working directory (it is empty for now).

  2. Look at the Console: the > symbol is called the prompt. It is R’s way of inviting you to give instructions. You can double-check that your current working directory is correct by typing getwd() and pressing Return. (Uh oh, I typed a wrong character on my first attempt and R showed an error.)
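
    As a small sketch of what you can do with the result, getwd() returns the path as a piece of text that can be stored and inspected. The object name wd is only an illustration, and the path you see depends on your own machine:

    ```r
    wd <- getwd()     # full path of the current working directory, as text
    print(wd)
    basename(wd)      # only the last folder name in that path
    list.files()      # names of the files in the working directory
    ```

    list.files() should show the same files that appear in the bottom right panel.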

  3. It is important to save your code so that you can reproduce, re-run, or modify your analysis easily. Click File > New File > R Script; you now have an unsaved R script file. Click File > Save As and save the script as hello.R. You can now see this file in the working directory.

  4. getwd(), which you ran above, is what we call a function: it takes an input and returns an output. getwd() is a special case because it needs no input from you (R already keeps track of the working directory). Next, try some simple functions:

    1. sqrt(4)
    2. (1/(sqrt(2 * pi * (3.1)^2))) * exp(-(12-10.7)^2/(2*3.1))
    3. log(4, base = 10)
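
    The inputs to a function are called arguments. As a minimal sketch (the object names on the left are only for illustration), the lines below show an argument passed by position, an argument passed by name, and an argument left at its default value:

    ```r
    root        <- sqrt(4)                   # one argument, passed by position
    log_base10  <- log(100, base = 10)       # second argument passed by name
    log_natural <- log(100)                  # base omitted: log() defaults to the natural logarithm
    log_same    <- log(100, base = exp(1))   # spelling out that default explicitly

    print(root)
    print(log_base10)
    print(log_natural)
    ```

    Naming arguments, as in base = 10, makes a call easier to read when a function accepts several inputs.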
  5. In your hello.R, type the following and click Source:

    # print("Hello Earth")
    # print("Hello Mars")
    print("Hello World")
    

    Only “Hello World” is printed in the Console. The # character marks a comment: R ignores everything from # to the end of the line, so any line starting with # is skipped entirely.
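
    A comment does not have to occupy a whole line. The sketch below (the object names greeting and hashtag are made up for illustration) shows a trailing comment, and a # inside quotes, which R treats as ordinary text:

    ```r
    # A whole-line comment: R skips this line
    greeting <- "Hello World"   # a trailing comment: the code before the # still runs
    print(greeting)

    hashtag <- "# inside quotes is just text"   # the first # here is part of the string
    print(hashtag)
    ```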

  6. Defining objects is important because we want to store values or data and retrieve them later. We assign with <- or =. I tend to use <- when assigning objects and = when specifying a value for an argument in a function call. For simple assignments like the ones below the two behave the same, but they are not interchangeable inside a function call, where = names an argument and <- creates an object.

    1. x <- 4
    2. print(x)
    3. x == 4
    4. x <- 32
    5. print(x)
    6. x == 4
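
    One place where <- and = behave differently is inside a function call. The sketch below uses a hypothetical helper, square(), with a hypothetical argument name, number_in, and assumes a fresh session in which no object called number_in exists yet:

    ```r
    square <- function(number_in) number_in^2   # hypothetical helper for illustration

    a <- square(number_in = 3)                  # = names the argument; no new object is created
    created_by_equals <- exists("number_in")    # FALSE in a fresh session

    b <- square(number_in <- 3)                 # <- creates number_in in your workspace, then passes 3
    created_by_arrow <- exists("number_in")     # TRUE

    print(c(a, b))
    print(c(created_by_equals, created_by_arrow))
    ```

    Both calls return 9, but only the second leaves an extra object behind, which is one reason to keep = for arguments and <- for assignment.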
  7. Next we will take on something slightly bizarre. Numbers with decimal places are stored as floating-point numbers, which is how computers approximate real numbers.

    1. x <- 0.1 + 0.2
    2. x == 0.3

    Either you have just made a breakthrough in mathematics, or something is intrinsically off with how the computer stores these numbers. Take a look:

    Basic Answers

    Now try all.equal(x, 0.3). What does it do? Type ?all.equal to open the help page for this function and read about it.
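
    A minimal sketch of what is going on: printing more digits reveals the tiny rounding error, and comparing within a tolerance avoids being tripped up by it. The object names are only for illustration; wrapping all.equal() in isTRUE() is a common idiom because all.equal() returns a text description rather than FALSE when the values differ.

    ```r
    x <- 0.1 + 0.2

    exact_match <- x == 0.3                    # FALSE: the stored values differ slightly
    shown <- sprintf("%.17f", x)               # "0.30000000000000004"
    near_match <- isTRUE(all.equal(x, 0.3))    # TRUE: equal within a small tolerance

    print(exact_match)
    print(shown)
    print(near_match)
    ```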

Vectors and data frames

  1. A vector is an ordered collection of values of the same type, often numbers. For example, c(63, 93, 27, 90, 18, 3, 48) is a vector containing 7 numbers. Many functions accept either a single number or a vector.

    1. multiples_of_three <- c(63, 93, 27, 90, 18, 3, 48)
    2. 63 / 3
    3. multiples_of_three / 3
    4. sqrt(multiples_of_three)
    5. print(multiples_of_three)
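
    As a short sketch of the same idea, some functions work element by element and return a vector of the same length, while others summarise the whole vector into a single value (the names thirds and all_divisible are only for illustration):

    ```r
    multiples_of_three <- c(63, 93, 27, 90, 18, 3, 48)

    thirds <- multiples_of_three / 3                     # element by element: 21 31 9 30 6 1 16
    all_divisible <- all(multiples_of_three %% 3 == 0)   # %% is the remainder; TRUE here

    length(multiples_of_three)   # 7
    sum(multiples_of_three)      # 342
    print(thirds)
    ```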
  2. If two vectors have the same length, they can usually be combined element by element.

    1. multiples_of_two <- c(10, 98, 46, 36, 22, 68, 2)
    2. multiples_of_three - multiples_of_two
    3. abs(multiples_of_three - multiples_of_two)
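
    The sketch below repeats the subtraction with the results written out, and then shows what R does when the lengths differ: the shorter vector is recycled, that is, reused from its start. The vectors long_vec and short_vec are made up for illustration.

    ```r
    multiples_of_three <- c(63, 93, 27, 90, 18, 3, 48)
    multiples_of_two   <- c(10, 98, 46, 36, 22, 68, 2)

    differences <- multiples_of_three - multiples_of_two   # 53 -5 -19 54 -4 -65 46
    distances   <- abs(differences)                        # 53 5 19 54 4 65 46

    long_vec  <- c(1, 2, 3, 4)
    short_vec <- c(10, 20)
    recycled  <- long_vec + short_vec   # short_vec is reused: 11 22 13 24

    print(distances)
    print(recycled)
    ```

    If the longer length is not a multiple of the shorter one, R still recycles but also prints a warning, which is usually a sign of a mistake.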
  3. Each member of a vector (not necessarily a number) is called an element. Elements are indexed by position, starting from 1, and can be retrieved with square brackets.

    1. multiples_of_three[3]
    2. multiples_of_three[3:7]
    3. multiples_of_three[c(3, 4, 5, 6, 7)]
    4. multiples_of_three[3:length(multiples_of_three)]
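
    Besides positions, square brackets also accept negative numbers (drop those positions) and logical conditions (keep the elements for which the condition is TRUE). A minimal sketch, with object names chosen only for illustration:

    ```r
    multiples_of_three <- c(63, 93, 27, 90, 18, 3, 48)

    third       <- multiples_of_three[3]                         # 27
    from_third  <- multiples_of_three[3:7]                       # 27 90 18 3 48
    drop_first  <- multiples_of_three[-1]                        # everything except the first element
    above_fifty <- multiples_of_three[multiples_of_three > 50]   # 63 93 90

    print(from_third)
    print(above_fifty)
    ```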
  4. A vector is a one-dimensional object, while a data frame is a two-dimensional object (a table with rows and columns). We can create a data frame by hand:

    1. name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
    2. age <- c(23, 41, 32, 58, 26)
    3. df <- data.frame(name, age)
    4. Using View(), head(), tail(), summary(), investigate this data frame df.
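
    A few more ways to look at the same data frame, as a sketch. str() and dim() are additional base R functions not mentioned above, and the names mean_age and over_thirty are only for illustration:

    ```r
    name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
    age  <- c(23, 41, 32, 58, 26)
    df   <- data.frame(name, age)

    str(df)    # compact overview: one line per column with its type
    dim(df)    # number of rows and columns: 5 2

    mean_age    <- mean(df$age)       # 36
    over_thirty <- df[df$age > 30, ]  # keep only the rows where age is above 30

    print(mean_age)
    print(over_thirty)
    ```

    In df[rows, columns], leaving the part after the comma empty keeps all columns.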
  5. You can also import a data frame from a comma-separated values (.csv) file:

    P1Q15

    1. Download and move this file to your current working directory. You should be able to see the file in the Files panel.
    2. titanic <- read.csv("P1Q15.csv")
    3. Investigate this data frame titanic using View(), head(), tail(), and summary().
    4. To list the ages of all the passengers on the Titanic, use titanic$age: the $ operator extracts the age column and returns it as a vector.
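
    To see read.csv() and $ work together without the downloaded file, the sketch below writes a tiny stand-in CSV to a temporary file and reads it back. The rows are invented for illustration and are not taken from P1Q15.csv; only the column name age follows the step above.

    ```r
    # Stand-in data: invented rows, NOT the contents of P1Q15.csv
    demo_path <- tempfile(fileext = ".csv")
    writeLines(c("name,age",
                 "Passenger A,29",
                 "Passenger B,NA",
                 "Passenger C,54"),
               demo_path)

    demo <- read.csv(demo_path)   # the first line of the file becomes the column names
    ages <- demo$age              # a vector: 29 NA 54

    mean_age <- mean(ages, na.rm = TRUE)   # na.rm = TRUE skips missing values: 41.5
    print(ages)
    print(mean_age)
    ```

    If a column contains missing values (NA), functions such as mean() return NA unless you pass na.rm = TRUE.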