- Overview of R
- R nuts and bolts
- Getting data in and out of R
- Subsetting R objects
School of Economics and Management
Beihang University
http://yanfei.site
Why RStudio?
There are 10,000 packages on CRAN that have been developed by users and programmers around the world.
1 + 2 + 3
## [1] 6
1 + 2 * 3
## [1] 7
x <- 1
y <- 2
z <- c(x, y)
z
## [1] 1 2
exp(1)
## [1] 2.718282
cos(3.141593)
## [1] -1
log2(1)
## [1] 0
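R has five basic classes of objects:
- character
- numeric (real numbers)
- integer
- complex
- logical (TRUE/FALSE)
Numbers in R are generally treated as numeric objects (double-precision real numbers). If you explicitly want an integer, add the L suffix: entering 1 gives a numeric object, while 1L gives an integer.
There is also a special number Inf, which represents infinity (see ?Inf). It can be used in ordinary calculations. Try 1/Inf.
NaN: an undefined value (not a number). Try 0/0. It can also be thought of as a missing value.
Attributes can be accessed by attributes(). Some examples of R object attributes are names and dimnames, dim (dimensions), and class; factors also carry levels, and data frames carry row.names.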
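The special values above can be checked directly at the console. A minimal sketch; it also attaches a user-defined attribute and reads it back with attributes() (the attribute name "units" is purely illustrative, not something R defines):

```r
# Numbers default to double-precision "numeric"; the L suffix requests an integer
class(1)        # "numeric"
class(1L)       # "integer"

# Inf represents infinity and can be used in ordinary arithmetic
1 / 0           # Inf
1 / Inf         # 0

# NaN is an undefined result ("not a number")
0 / 0           # NaN
is.nan(0 / 0)   # TRUE
is.na(0 / 0)    # TRUE: NaN also counts as missing
is.nan(NA)      # FALSE: but a plain NA is not NaN

# A user-defined attribute ("units" is a hypothetical name)
x <- 1:3
attr(x, "units") <- "cm"
attributes(x)   # a list with one element: $units = "cm"
```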
The c() function can be used to create vectors of objects by concatenating things together.
x <- c(0.5, 0.6)       # numeric
x <- c(TRUE, FALSE)    # logical
x <- c(T, F)           # logical
x <- c("a", "b", "c")  # character
x <- 9:29              # integer
x <- c(1+0i, 2+4i)     # complex
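A vector holds objects of a single class, so c() silently coerces its inputs when they differ. A short sketch of that behaviour, with explicit coercion via the as.* functions for comparison (the names y1, y2, y3 are illustrative):

```r
# Mixing types in c() coerces everything to one common class
y1 <- c(1.7, "a")   # numeric + character -> character
y1                  # "1.7" "a"
y2 <- c(TRUE, 2)    # logical + numeric -> numeric (TRUE becomes 1)
y2                  # 1 2
y3 <- c("a", TRUE)  # logical + character -> character
y3                  # "a" "TRUE"

# Explicit coercion with the as.* functions
as.numeric(c("1", "2.5"))  # 1.0 2.5
as.logical(c(0, 1, 2))     # FALSE TRUE TRUE (any non-zero number is TRUE)
```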
You can also use the vector() function to initialize vectors.
x <- vector("numeric", length = 10)
x
##  [1] 0 0 0 0 0 0 0 0 0 0
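vector() fills the new vector with the default value for its mode, which makes it handy for pre-allocating a result before filling it in a loop. A minimal sketch (the name sq is illustrative):

```r
# vector() creates a vector of the given mode, filled with that mode's default
vector("character", length = 2)  # "" ""
vector("logical", length = 2)    # FALSE FALSE

# A common use: allocate first, then fill element by element
sq <- vector("numeric", length = 5)
for (i in seq_along(sq)) {
  sq[i] <- i^2
}
sq  # 1 4 9 16 25
```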
m <- matrix(1:6, nrow = 2, ncol = 3)
attributes(m)
## $dim
## [1] 2 3
dim(m)
## [1] 2 3
t(m)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
m[1, 2]
## [1] 3
m[1, ]
## [1] 1 3 5
n <- matrix(8:13, nrow = 2, ncol = 3)
cbind(m, n)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    3    5    8   10   12
## [2,]    2    4    6    9   11   13
rbind(m, n)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
## [3,]    8   10   12
## [4,]    9   11   13
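The layout of m above follows from matrix() filling its values column by column. A short sketch contrasting that with byrow = TRUE, and showing that setting the dim attribute on a plain vector yields the same matrix (the name v is illustrative):

```r
# matrix() fills column by column by default
matrix(1:6, nrow = 2, ncol = 3)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

# byrow = TRUE fills row by row instead
matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

# Adding a dim attribute turns a plain vector into a matrix
v <- 1:6
dim(v) <- c(2, 3)
identical(v, matrix(1:6, nrow = 2, ncol = 3))
## [1] TRUE
```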
l <- list(a = c(1, 2), b = "apple")
attributes(l)
## $names
## [1] "a" "b"
Factors are used to represent categorical data.
f <- factor(c("yes", "yes", "no", "yes", "no"))
attributes(f)
## $levels
## [1] "no"  "yes"
##
## $class
## [1] "factor"
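Two things the attributes above imply: a factor is stored as integer codes pointing into its levels, and those levels are sorted alphabetically unless you say otherwise. A minimal sketch using table(), as.integer() and the levels argument of factor() (f2 is an illustrative name):

```r
f <- factor(c("yes", "yes", "no", "yes", "no"))

# table() counts the observations in each level
table(f)                 # no: 2, yes: 3

# Underneath, a factor stores integer codes that index into its levels
as.integer(f)            # 2 2 1 2 1 ("no" is level 1, "yes" is level 2)

# Levels are sorted alphabetically unless you supply an order
f2 <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
levels(f2)               # "yes" "no"
as.integer(f2)           # 1 1 2 1 2
```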
d <- data.frame(x = 1:10, y = letters[1:10])
attributes(d)
## $names
## [1] "x" "y"
##
## $class
## [1] "data.frame"
##
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10
names(d)
## [1] "x" "y"
row.names(d)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
Names are very useful for writing readable code and self-describing objects.
x <- 1:3
names(x)
## NULL
names(x) <- c("New York", "Seattle", "Los Angeles")
x
##    New York     Seattle Los Angeles
##           1           2           3
names(x)
## [1] "New York"    "Seattle"     "Los Angeles"
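One practical payoff of names is selecting elements by name rather than by position. A minimal sketch (y is an illustrative name):

```r
x <- 1:3
names(x) <- c("New York", "Seattle", "Los Angeles")

# Once named, elements can be selected by name instead of position
x["Seattle"]                    # the element named "Seattle" (value 2)
x[c("Los Angeles", "New York")] # values 3 and 1, names kept

# Names can also be supplied when the vector is created
y <- c(a = 1, b = 2)
names(y)                        # "a" "b"
```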
Lists can also have names, which is often very useful.
x <- list(`Los Angeles` = 1, Boston = 2, London = 3)
x
## $`Los Angeles`
## [1] 1
##
## $Boston
## [1] 2
##
## $London
## [1] 3
names(x)
## [1] "Los Angeles" "Boston"      "London"
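Matrices can have names too, stored in the dimnames attribute as a list of row names and column names. A minimal sketch with dimnames(), rownames() and colnames(); the specific labels are arbitrary:

```r
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))  # list(row names, column names)
m
##   c d
## a 1 3
## b 2 4

# rownames() and colnames() set each dimension separately
rownames(m) <- c("x", "z")
colnames(m) <- c("h", "f")
m["z", "h"]    # row "z", column "h": the value 2
```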
There are a few principal functions for reading data into R:
- read.table, read.csv, for reading tabular data
- readLines, for reading lines of a text file
- source, for reading in R code files (inverse of dump)
- dget, for reading in R code files (inverse of dput)
- load, for reading in saved workspaces
There are analogous functions for writing data to files.
- write.table, for writing tabular data to text files (e.g. CSV) or connections
- writeLines, for writing character data line-by-line to a file or connection
- dump, for dumping a textual representation of multiple R objects
- dput, for outputting a textual representation of an R object
- save, for saving an arbitrary number of R objects in binary format (possibly compressed) to a file
There are many R packages that have been developed to read in all kinds of other datasets (e.g., the readr package).
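A self-contained round trip through several of the reading and writing functions listed above. This is a sketch only: the two-row CSV is made-up toy data written to temporary files, and all variable names (path, df1, f_csv, ...) are illustrative.

```r
# Toy data written to temporary files so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("name,visits", "A,10", "B,25"), path)

# Reading: read.csv() assumes a header row and comma separators;
# read.table() needs those settings spelled out
df1 <- read.csv(path)
df2 <- read.table(path, header = TRUE, sep = ",")
identical(df1, df2)   # TRUE for this simple file
readLines(path)       # the raw lines of the file

# Writing tabular text: row.names = FALSE omits the row-name column
f_csv <- tempfile(fileext = ".csv")
write.table(df1, file = f_csv, sep = ",", row.names = FALSE)
df_csv <- read.csv(f_csv)

# dput() writes R code that recreates an object; dget() reads it back
f_txt <- tempfile(fileext = ".R")
dput(df1, file = f_txt)
df_dput <- dget(f_txt)

# save() stores objects in binary form; load() restores them under their original names
f_rda <- tempfile(fileext = ".RData")
save(df1, file = f_rda)
rm(df1)
load(f_rda)           # `df1` exists again
```

save() and load() keep object names, whereas dget() returns a value you assign yourself; that difference is why the two pairs suit different jobs.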
There are three operators that can be used to extract subsets of R objects.
- The [ operator generally returns an object of the same class as the original (the main exception is a matrix, where a result with a single row or column is dropped to a plain vector unless drop = FALSE is given). It can be used to select multiple elements of an object.
- The [[ operator is used to extract elements of a list or a data frame. It can only be used to extract a single element, and the class of the returned object will not necessarily be a list or data frame.
- The $ operator is used to extract elements of a list or data frame by literal name. Its semantics are similar to those of [[.
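The difference between the three operators is easiest to see by checking the class of what each returns. A minimal sketch on a small data frame (d is an illustrative name):

```r
d <- data.frame(x = 1:3, y = c("a", "b", "c"))

class(d["x"])      # "data.frame": [ keeps the class of the original
class(d[["x"]])    # "integer": [[ returns the column itself
class(d$x)         # "integer": $ behaves like [[ with a literal name
d[c("x", "y")]     # [ can select several columns at once
```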
Vectors are basic objects in R, and they can be subsetted using the [ operator.
x <- c("a", "b", "c", "c", "d", "a")
x[1]  # Extract the first element
## [1] "a"
x[2]  # Extract the second element
## [1] "b"
The [ operator can also extract multiple elements of a vector when given an integer sequence; here we extract the first four elements of the vector. A logical vector can be used as an index as well, in which case only the elements where it is TRUE are kept.
x[1:4]
## [1] "a" "b" "c" "c"
x[c(1, 3, 4)]
## [1] "a" "c" "c"
x[x > "a"]
## [1] "b" "c" "c" "d"
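Logical indexing can be split into two steps: build the logical vector, then use it as the index. A minimal sketch, also using which() to turn the logical vector into integer positions (u is an illustrative name; string comparison follows the locale's collation, which orders these single letters alphabetically in common locales):

```r
x <- c("a", "b", "c", "c", "d", "a")

# Build a logical vector first, then use it as the index
u <- x > "a"
u
## [1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE
x[u]
## [1] "b" "c" "c" "d"

# which() returns the positions where the logical vector is TRUE
which(u)
## [1] 2 3 4 5
```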
Matrices can be subsetted in the usual way with (i,j) type indices.
x <- matrix(1:6, 2, 3)
x
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
We can access the (1, 2) or the (2, 1) element of this matrix using the appropriate indices.
x[1, 2]
## [1] 3
x[2, 1]
## [1] 2
Indices can also be missing. This behavior is used to access entire rows or columns of a matrix.
x[1, ]  # Extract the first row
## [1] 1 3 5
x[, 2]  # Extract the second column
## [1] 3 4
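Note that extracting a single row or column this way returns a plain vector, not a 1-row or 1-column matrix. The drop argument of [ controls this. A minimal sketch:

```r
x <- matrix(1:6, 2, 3)

# By default a single row comes back as a plain vector (drop = TRUE) ...
x[1, ]
## [1] 1 3 5

# ... while drop = FALSE keeps the matrix structure (a 1 x 3 matrix)
x[1, , drop = FALSE]
##      [,1] [,2] [,3]
## [1,]    1    3    5
dim(x[1, , drop = FALSE])
## [1] 1 3
```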
Lists in R can be subsetted using all three of the operators mentioned above, and each serves a different purpose.
x <- list(foo = 1:4, bar = 0.6)
x
## $foo
## [1] 1 2 3 4
##
## $bar
## [1] 0.6
The [[ operator can be used to extract single elements from a list. Here we extract the first element of the list.
x[[1]]
## [1] 1 2 3 4
The [[ operator can also use named indices, so that you don't have to remember the exact ordering of every element of the list. You can also use the $ operator to extract elements by name.
x[["bar"]]
## [1] 0.6
x$bar
## [1] 0.6
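One practical difference between [[ and $: [[ evaluates its argument, so the name can come from a variable, while $ always takes the name literally. A minimal sketch (name is an illustrative variable):

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")
name <- "foo"

x[[name]]  # [[ evaluates `name`, so this is x[["foo"]]
## [1] 1 2 3 4
x$name     # $ looks for an element literally called "name"; there is none
## NULL
```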
The [[ operator can take an integer sequence if you want to extract a nested element of a list.
x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
# Get the 3rd element of the 1st element
x[[c(1, 3)]]
## [1] 14
# Same as above
x[[1]][[3]]
## [1] 14
# 1st element of the 2nd element
x[[c(2, 1)]]
## [1] 3.14
The [ operator can be used to extract multiple elements from a list. For example, if you wanted to extract the first and third elements of a list, you would do the following:
x <- list(foo = 1:4, bar = 0.6, baz = "hello")
x[c(1, 3)]
## $foo
## [1] 1 2 3 4
##
## $baz
## [1] "hello"
Note that x[c(1, 3)] is NOT the same as x[[c(1, 3)]]: the latter indexes recursively and returns the 3rd element of the 1st element.
Remember that, for a list, the [ operator always returns an object of the same class as the original. Since the original object was a list, the [ operator returns a list. In the above code, we returned a list with two elements (the first and the third).
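Checking the class of the result makes the [ versus [[ distinction concrete. A minimal sketch:

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

class(x[1])         # "list": [ returns a sub-list (here with one element)
class(x[[1]])       # "integer": [[ returns the element itself
length(x[c(1, 3)])  # 2: a list holding the first and third elements
```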
A common task in data analysis is removing missing values (NAs).
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
print(bad)
## [1] FALSE FALSE  TRUE FALSE  TRUE FALSE
x[!bad]
## [1] 1 2 4 5
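What if there are multiple R objects and you want to take the subset with no missing values in any of those objects? The complete.cases() function does this; applied to a data frame, it flags the rows that have no missing value in any column.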
head(airquality)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6
good <- complete.cases(airquality)
head(airquality[good, ])
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 7    23     299  8.6   65     5   7
## 8    19      99 13.8   59     5   8
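complete.cases() also accepts several separate vectors of the same length, which answers the question above when the objects are not columns of one data frame. A minimal sketch with two illustrative vectors x and y:

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, "d", NA, "f")

# TRUE where none of the supplied objects has an NA at that position
good <- complete.cases(x, y)
good
## [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
x[good]
## [1] 1 2 4 5
y[good]
## [1] "a" "b" "d" "f"
```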
You'll be working with swimming_pools.csv; it contains data on swimming pools in Brisbane, Australia (Source: data.gov.au). The file contains the column names in the first row. It uses a comma to separate values within rows.
- Use read.csv() and read.table() to import "swimming_pools.csv" as a data frame with the name pools.
- Use the write.table(), dput(), and save() functions to write pools to files.