Bayesian Statistics and Computing

Lecture 1: R Basics

Yanfei Kang | BSC | Beihang University


Objectives

  • Overview of R

  • R nuts and bolts

  • Getting data in and out of R

  • Subsetting R objects


Overview of R


What is R?

  • A freely available language and environment.

  • Statistical computing and graphics.

  • Linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.


Installation


Why RStudio?

  • Syntax highlighting

  • Able to evaluate R code

    • by line
    • by selection
    • entire file
  • Command auto-completion


Design of the R System

  • When you download R from CRAN, you get the "base" system - a substantial amount of functionality.

  • More than 10,000 packages on CRAN have been developed by users and programmers around the world.

  • People often make packages available on their personal websites.

  • A number of packages are also being developed on repositories such as GitHub and Bitbucket.


R Nuts and Bolts


Basic Operations

1 + 2 + 3
1 + 2 * 3
x <- 1
y <- 2
z <- c(x,y)
z
exp(1)
cos(3.141593)
log2(1)

R Objects

R has five basic classes of objects:

  1. character
  2. numeric (real numbers)
  3. integer
  4. complex
  5. logical (TRUE/FALSE)

Numbers

  • Numbers in R are generally treated as numeric objects.

  • What is the difference between 1 and 1L? The L suffix explicitly creates an integer; a plain 1 is a numeric object.

  • Special number Inf. Try 1/Inf.

  • NaN: an undefined value (not a number). Try 0/0. It can also be thought of as a missing value.
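The points above can be checked directly at the console. This is a minimal sketch using only base R; the particular expressions are illustrative choices, not part of the original slides.

```r
# A plain 1 is stored as a numeric (double); the L suffix requests an integer.
class(1)          # "numeric"
class(1L)         # "integer"
identical(1, 1L)  # FALSE: same value, different storage type
1 == 1L           # TRUE: the values compare equal

# Inf behaves like a number in arithmetic.
1 / Inf           # 0
1 / 0             # Inf

# NaN marks an undefined result and is also treated as missing.
0 / 0             # NaN
is.nan(0 / 0)     # TRUE
is.na(0 / 0)      # TRUE
is.nan(NA)        # FALSE: NA is missing, but it is not NaN
```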


Attributes

Attributes can be accessed by attributes(). Some examples of R object attributes are:

  • names, dimnames
  • dimensions (e.g. matrices, arrays)
  • class (e.g. integer, numeric)
  • length
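As a small illustration of the list above, the sketch below inspects a named vector and a matrix. The objects x and m are made up for this example. Note that attributes() only lists attributes that have been set, so class and length are usually queried with class() and length() instead.

```r
x <- c(a = 1, b = 2, c = 3)    # a named numeric vector
attributes(x)                  # a list with one component, $names

m <- matrix(1:6, nrow = 2, ncol = 3)
attributes(m)                  # a list with one component, $dim
attr(m, "dim")                 # 2 3: fetch a single attribute by name

attributes(1:3)                # NULL: a plain vector carries no attributes

class(x)                       # "numeric"
length(x)                      # 3
```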

Vectors

The c() function can be used to create vectors of objects by concatenating things together.

x <- c(0.5, 0.6) # numeric
x <- c(TRUE, FALSE) # logical
x <- c(T, F) # logical
x <- c("a", "b", "c") # character
x <- 9:29 # integer
x <- c(1+0i, 2+4i) # complex

You can also use the vector() function to initialize vectors.

x <- vector("numeric", length = 10)
x
#> [1] 0 0 0 0 0 0 0 0 0 0

Matrices

Matrices are vectors with a dimension attribute.

m <- matrix(c(1:6), 2, 3)
attributes(m)
dim(m)
t(m)
m[1, 2]
m[1, ]
n <- matrix(c(8:13), 2, 3)
cbind(m, n)
rbind(m, n)

Lists

  • A special data structure for data that a matrix cannot handle.
    • The elements may have different lengths.
    • The elements may have different types.
l <- list(a = c(1, 2), b = 'apple')
attributes(l)
#> $names
#> [1] "a" "b"

Factors

Factors are used to represent categorical data.

f <- factor(c("yes", "yes", "no", "yes", "no"))
attributes(f)
#> $levels
#> [1] "no" "yes"
#>
#> $class
#> [1] "factor"
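The output above shows the levels in alphabetical order, which is the default. As a hedged sketch, the code below looks at the same factor a little further and shows how the levels argument of factor() can set the order explicitly; the second factor g is an illustrative addition, not part of the original slides.

```r
f <- factor(c("yes", "yes", "no", "yes", "no"))
levels(f)        # "no" "yes": sorted alphabetically by default
table(f)         # counts per level: no 2, yes 3
as.integer(f)    # 2 2 1 2 1: the underlying integer codes

# Set the level order explicitly so that "yes" comes first.
g <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
levels(g)        # "yes" "no"
as.integer(g)    # 1 1 2 1 2
```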

Data Frames

  • A special type of list.
  • Unlike matrices, data frames can store a different class of object in each column.
  • They have column names and row names.
d <- data.frame(x = 1:10, y = letters[1:10])
attributes(d)
names(d)
row.names(d)

Getting Data in and out of R


Reading Data

There are a few principal functions for reading data into R.

  • read.table, read.csv, for reading tabular data
  • readLines, for reading lines of a text file
  • source, for reading in R code files (inverse of dump)
  • dget, for reading in R code files (inverse of dput)
  • load, for reading in saved workspaces
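A minimal sketch of the first two kinds of reader follows. To keep it self-contained it first writes a tiny CSV to a temporary file; the file contents and the names path, raw_lines, df1 and df2 are made up for illustration.

```r
# Create a small CSV in a temporary file (toy data, for illustration only).
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "ann,90", "bob,85"), path)

# readLines returns a character vector with one element per line.
raw_lines <- readLines(path)

# read.csv assumes a header row and comma separators by default.
df1 <- read.csv(path)

# read.table needs those two choices spelled out.
df2 <- read.table(path, header = TRUE, sep = ",")

str(df1)
unlink(path)   # remove the temporary file
```

For a simple file like this one, read.csv() and read.table() give the same data frame; read.csv() is essentially read.table() with defaults chosen for comma-separated files.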

Writing Data

There are analogous functions for writing data to files.

  • write.table, for writing tabular data to text files (e.g., CSV) or connections
  • writeLines, for writing character data line-by-line to a file or connection
  • dump, for dumping a textual representation of multiple R objects
  • dput, for outputting a textual representation of an R object
  • save, for saving an arbitrary number of R objects in binary format (possibly compressed) to a file

There are many R packages that have been developed to read in all kinds of other datasets (e.g., the readr package).
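Each writer pairs with a reader, so a round trip is a quick way to see how they fit together. The sketch below assumes a small toy data frame d and temporary files; all of these names are illustrative.

```r
d <- data.frame(x = 1:3, y = c("a", "b", "c"))   # toy data

csv_file <- tempfile(fileext = ".csv")
txt_file <- tempfile(fileext = ".R")
bin_file <- tempfile(fileext = ".RData")

# Tabular text: write.table pairs with read.table / read.csv.
write.table(d, csv_file, sep = ",", row.names = FALSE)
d_csv <- read.csv(csv_file)

# Textual R representation: dput pairs with dget.
dput(d, file = txt_file)
d_txt <- dget(txt_file)

# Binary format: save pairs with load, which restores the original name.
save(d, file = bin_file)
rm(d)
load(bin_file)   # d exists again

unlink(c(csv_file, txt_file, bin_file))
```

Note that dget() returns the object, so it can be assigned to any name, whereas load() recreates the object under the name it was saved with.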


Subsetting R objects


How to Subset?

There are three operators that can be used to extract subsets of R objects.

  • The [ operator always returns an object of the same class as the original. It can be used to select multiple elements of an object.

  • The [[ operator is used to extract elements of a list or a data frame. It can only be used to extract a single element and the class of the returned object will not necessarily be a list or data frame.

  • The $ operator is used to extract elements of a list or data frame by literal name. Its semantics are similar to those of [[.
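The three operators can be compared side by side on one small list. This is a minimal sketch; the list x and the variable name are illustrative. The last two lines show what "by literal name" means for $: it does not evaluate a variable holding the name, whereas [[ does.

```r
x <- list(foo = 1:4, bar = 0.6)

class(x[1])      # "list": [ keeps the container
class(x[[1]])    # "integer": [[ returns the element itself
class(x$foo)     # "integer": $ behaves like [[ with a literal name

name <- "foo"
x[[name]]        # 1 2 3 4: [[ evaluates the index
x$name           # NULL: $ looks for an element literally called "name"
```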


Subsetting a Vector

Vectors are basic objects in R and they can be subsetted using the [ operator.

x <- c("a", "b", "c", "c", "d", "a")
x[1] # Extract the first element
x[2] # Extract the second element

The [ operator can be used to extract multiple elements of a vector by passing the operator an integer sequence. Here we extract the first four elements of the vector.

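x[1:4] # Extract the first four elements
x[c(1, 3, 4)] # Extract the first, third, and fourth elements
x[x > "a"] # Extract the elements that sort after "a"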

Subsetting a Matrix

Matrices can be subsetted in the usual way with (i, j) type indices.

x <- matrix(1:6, 2, 3)
x

We can access the (1,2) or the (2,1) element of this matrix using the appropriate indices.

x[1, 2]
x[2, 1]

Indices can also be missing. This behavior is used to access entire rows or columns of a matrix.

x[1, ] # Extract the first row
x[, 2] # Extract the second column
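One detail worth checking when extracting a single row or column: by default the result is simplified to a plain vector rather than a one-row or one-column matrix. The sketch below uses the standard drop argument of [ to keep the matrix shape; this goes slightly beyond the slide and is offered as a supplementary illustration.

```r
x <- matrix(1:6, 2, 3)

x[1, ]                      # 1 3 5, returned as a plain vector
dim(x[1, ])                 # NULL: the dimension attribute was dropped

x[1, , drop = FALSE]        # the same values as a 1 x 3 matrix
dim(x[1, , drop = FALSE])   # 1 3

dim(x[, 2:3])               # 2 2: selecting several columns keeps a matrix
```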

Subsetting Lists

Lists in R can be subsetted using all three of the operators mentioned above, and all three are used for different purposes.

x <- list(foo = 1:4, bar = 0.6)

The [[ operator can be used to extract single elements from a list. Here we extract the first element of the list.

x[[1]]

The [[ operator can also use named indices so that you don't have to remember the exact ordering of every element of the list. You can also use the $ operator to extract elements by name.

x[["bar"]]
x$bar

Subsetting Nested Elements of a List

The [[ operator can take an integer sequence if you want to extract a nested element of a list.

x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
# Get the 3rd element of the 1st element
x[[c(1, 3)]]
# Same as above
x[[1]][[3]]
# 1st element of the 2nd element
x[[c(2, 1)]]

Extracting Multiple Elements of a List

The [ operator can be used to extract multiple elements from a list.

x <- list(foo = 1:4, bar = 0.6, baz = "hello")
x[c(1, 3)]

Note that x[c(1, 3)] is NOT the same as x[[c(1, 3)]].

Remember that the [ operator always returns an object of the same class as the original. Since the original object was a list, the [ operator returns a list. In the above code, we returned a list with two elements (the first and the third).
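To make the contrast concrete, here is a minimal sketch on the same list; the names a and b are illustrative. With [[, the vector c(1, 3) is read as a path into nested elements, not as a set of positions.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

a <- x[c(1, 3)]      # a list holding the first and third elements
class(a)             # "list"
names(a)             # "foo" "baz"

b <- x[[c(1, 3)]]    # the third element of the first element, x[[1]][[3]]
b                    # 3

x[c("foo", "baz")]   # [ also accepts a vector of names
```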


Removing NA Values

A common task in data analysis is removing missing values (NAs).

x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
print(bad)
x[!bad]

What if there are multiple R objects, or multiple columns of a data frame, and you want to take the subset with no missing values in any of them? The complete.cases() function handles both situations.

head(airquality)
good <- complete.cases(airquality)
head(airquality[good, ])
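The same function also works on separate vectors of equal length. In this sketch the vectors x and y are made-up examples; complete.cases() flags the positions where neither vector is missing.

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", NA, "c", "d", NA, "f")

good <- complete.cases(x, y)
good       # TRUE FALSE FALSE TRUE FALSE TRUE

x[good]    # 1 4 5
y[good]    # "a" "d" "f"
```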

Review of this lecture

  • Overview of R

  • R nuts and bolts

  • Getting data in and out of R

  • Subsetting R objects


Lab Session 1


Read and Write Data in R

You'll be working with swimming_pools.csv; it contains data on swimming pools in Brisbane, Australia (Source: data.gov.au). The file contains the column names in the first row. It uses a comma to separate values within rows.

  1. Try read.csv() and read.table() to import "swimming_pools.csv" as a data frame with the name pools.

  2. Try write.table(), dput(), and save() functions to write pools to files.

  3. Restart R and read your saved data back into R.

  4. Practice subsetting the pools data frame.

