Overview of R
R nuts and bolts
Getting data in and out of R
Subsetting R objects
A freely available language and environment.
Statistical computing and graphics.
Linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.
Syntax highlighting
Able to evaluate R code
Command auto-completion
When you download R from CRAN, you get the "base" system - a substantial amount of functionality.
10,000 packages on CRAN that have been developed by users and programmers around the world.
People often make packages available on their personal websites.
There are a number of packages being developed on repositories like GitHub and BitBucket.
1 + 2 + 3
1 + 2 * 3
x <- 1
y <- 2
z <- c(x, y)
z
exp(1)
cos(3.141593)
log2(1)
R has five basic classes of objects: character, numeric, integer, complex, and logical.
Numbers in R are generally treated as numeric objects.
What is the difference between 1 and 1L?
Special number Inf. Try 1/Inf.
NaN: an undefined value (not a number). Try 0/0. It can also be thought of as a missing value.
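The points above can be checked at the console. This is a minimal sketch using only base R; the expected results are shown as comments.

```r
# 1 is stored as a double ("numeric"); the L suffix requests an integer
class(1)          # "numeric"
class(1L)         # "integer"
identical(1, 1L)  # FALSE: same value, different storage type
1 == 1L           # TRUE: the values still compare equal

# Inf behaves like a number in arithmetic
1 / Inf           # 0
1 / 0             # Inf

# NaN marks an undefined result; is.na() also reports it as missing
0 / 0             # NaN
is.nan(0 / 0)     # TRUE
is.na(0 / 0)      # TRUE
```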
Attributes can be accessed by attributes(). Some examples of R object attributes are names, dim, levels, and class.
The c() function can be used to create vectors of objects by concatenating things together.
x <- c(0.5, 0.6)       # numeric
x <- c(TRUE, FALSE)    # logical
x <- c(T, F)           # logical
x <- c("a", "b", "c")  # character
x <- 9:29              # integer
x <- c(1+0i, 2+4i)     # complex
You can also use the vector() function to initialize vectors.
x <- vector("numeric", length = 10)
x
#> [1] 0 0 0 0 0 0 0 0 0 0
Matrices are vectors with a dimension attribute.
m <- matrix(c(1:6), 2, 3)
attributes(m)
dim(m)
t(m)
m[1, 2]
m[1, ]
n <- matrix(c(8:13), 2, 3)
cbind(m, n)
rbind(m, n)
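One way to see that a matrix is a vector with a dimension attribute is to add the attribute by hand. The sketch below uses only base R; the variable name v is illustrative.

```r
# Start with a plain integer vector
v <- 1:6
attributes(v)                    # NULL: no attributes yet

# Adding a dim attribute turns the same data into a 2 x 3 matrix
dim(v) <- c(2, 3)
is.matrix(v)                     # TRUE
identical(v, matrix(1:6, 2, 3))  # TRUE: both are filled column by column
```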
l <- list(a = c(1, 2), b = 'apple')
attributes(l)
#> $names
#> [1] "a" "b"
Factors are used to represent categorical data.
f <- factor(c("yes", "yes", "no", "yes", "no"))
attributes(f)
#> $levels
#> [1] "no" "yes"
#>
#> $class
#> [1] "factor"
d <- data.frame(x = 1:10, y = letters[1:10])
attributes(d)
names(d)
row.names(d)
There are a few principal functions for reading data into R.
read.table, read.csv, for reading tabular data
readLines, for reading lines of a text file
source, for reading in R code files (inverse of dump)
dget, for reading in R code files (inverse of dput)
load, for reading in saved workspaces
There are analogous functions for writing data to files.
write.table, for writing tabular data to text files (e.g., CSV) or connections
writeLines, for writing character data line-by-line to a file or connection
dump, for dumping a textual representation of multiple R objects
dput, for outputting a textual representation of an R object
save, for saving an arbitrary number of R objects in binary format (possibly compressed) to a file
There are many R packages that have been developed to read in all kinds of other datasets (e.g., the readr package).
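The read and write functions listed above pair up as round trips. The following is a minimal sketch under stated assumptions: the data frame d and the temporary file names are made up for illustration, and tempfile() is used so nothing is written to the working directory.

```r
# A small, made-up data frame to write and read back
d <- data.frame(id = 1:3, name = c("a", "b", "c"))

# Temporary files keep the example out of the working directory
csv_file  <- tempfile(fileext = ".csv")
dput_file <- tempfile(fileext = ".R")
rda_file  <- tempfile(fileext = ".rda")

# Text: write.table() with a comma separator, read back with read.csv()
write.table(d, csv_file, sep = ",", row.names = FALSE)
d_csv <- read.csv(csv_file)

# R code: dput() writes one object, dget() reads it back
dput(d, file = dput_file)
d_dget <- dget(dput_file)

# Binary: save() stores objects by name, load() restores those names
save(d, file = rda_file)
rm(d)
load(rda_file)  # recreates `d`
```

Note the difference in how objects come back: read.csv() and dget() return a value that you assign yourself, while load() restores objects under the names they were saved with.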
There are three operators that can be used to extract subsets of R objects.
The [ operator always returns an object of the same class as the original. It can be used to select multiple elements of an object.
The [[ operator is used to extract elements of a list or a data frame. It can only be used to extract a single element and the class of the returned object will not necessarily be a list or data frame.
The $ operator is used to extract elements of a list or data frame by literal name. Its semantics are similar to that of [[.
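The differences between the three operators show up in the class of what they return. This is a small sketch on a made-up list; the variable name is illustrative.

```r
x <- list(foo = 1:4, bar = 0.6)

class(x["foo"])    # "list": [ keeps the container
class(x[["foo"]])  # "integer": [[ returns the element itself
class(x$foo)       # "integer": $ behaves like [[ with a literal name

# [[ can use a name computed at run time; $ cannot
name <- "bar"
x[[name]]          # 0.6
x$name             # NULL: looks for an element literally called "name"
```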
Vectors are basic objects in R and they can be subsetted using the [ operator.
x <- c("a", "b", "c", "c", "d", "a")
x[1]  # Extract the first element
x[2]  # Extract the second element
The [ operator can be used to extract multiple elements of a vector by passing the operator an integer sequence. Here we extract the first four elements of the vector.
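x[1:4]         # Extract the first four elements
x[c(1, 3, 4)]  # Extract elements 1, 3, and 4
x[x > 2]       # Logical indexing; the number 2 is coerced to the string "2" before comparing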
Matrices can be subsetted in the usual way with (i, j) type indices.
x <- matrix(1:6, 2, 3)
x
We can access the (1,2) or the (2,1) element of this matrix using the appropriate indices.
x[1, 2]
x[2, 1]
Indices can also be missing. This behavior is used to access entire rows or columns of a matrix.
x[1, ]  # Extract the first row
x[, 2]  # Extract the second column
Lists in R can be subsetted using all three of the operators mentioned above, and all three are used for different purposes.
x <- list(foo = 1:4, bar = 0.6)
The [[ operator can be used to extract single elements from a list. Here we extract the first element of the list.
x[[1]]
The [[ operator can also use named indices so that you don't have to remember the exact ordering of every element of the list. You can also use the $ operator to extract elements by name.
x[["bar"]]
x$bar
The [[ operator can take an integer sequence if you want to extract a nested element of a list.
x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
# Get the 3rd element of the 1st element
x[[c(1, 3)]]
# Same as above
x[[1]][[3]]
# 1st element of the 2nd element
x[[c(2, 1)]]
The [ operator can be used to extract multiple elements from a list.
x <- list(foo = 1:4, bar = 0.6, baz = "hello")
x[c(1, 3)]
Note that x[c(1, 3)] is NOT the same as x[[c(1, 3)]].
Remember that the [ operator always returns an object of the same class as the original. Since the original object was a list, the [ operator returns a list. In the above code, we returned a list with two elements (the first and the third).
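To make the contrast concrete, here is a short sketch that runs both forms on the same list; the name sub is illustrative.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

# [ with c(1, 3) selects the 1st and 3rd elements and returns a list
sub <- x[c(1, 3)]
class(sub)    # "list"
names(sub)    # "foo" "baz"

# [[ with c(1, 3) indexes recursively: 3rd element of the 1st element
x[[c(1, 3)]]  # 3
x[[1]][[3]]   # 3, the same lookup written step by step
```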
A common task in data analysis is removing missing values (NAs).
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
print(bad)
x[!bad]
What if there are multiple R objects and you want to take the subset with no missing values in any of those objects?
head(airquality)
good <- complete.cases(airquality)
head(airquality[good, ])
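complete.cases() also accepts several separate vectors of the same length, which covers the case of multiple R objects directly. The vectors below are made up for illustration.

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", NA, "c", "d", NA, "f")

# TRUE only at positions where both x and y are observed
good <- complete.cases(x, y)
good     # TRUE FALSE FALSE TRUE FALSE TRUE
x[good]  # 1 4 5
y[good]  # "a" "d" "f"
```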
You'll be working with swimming_pools.csv; it contains data on swimming pools in Brisbane, Australia (Source: data.gov.au). The file contains the column names in the first row. It uses a comma to separate values within rows.
Try read.csv() and read.table() to import "swimming_pools.csv" as a data frame with the name pools.
Try the write.table(), dput(), and save() functions to write pools to files.
Restart R and read your saved data back into R.
Practice subsetting of a data frame.
Chapter 1 of my book in progress.
Chapters 3-10 of the book "R Programming for Data Science".