layout: true
---

class: inverse, center, middle
background-image: url(../figs/titlepage16-9.png)
background-size: cover

<br>
<br>
# Bayesian Statistics and Computing
## Lecture 1: R Basics

<img src="../figs/slides.png" width="150px"/>

#### *Yanfei Kang | BSC | Beihang University*

---

# Objectives

- Overview of R
- R nuts and bolts
- Getting data in and out of R
- Subsetting R objects

---
class: inverse, center, middle

# Overview of R

---

# What is R?

- A freely available language and environment.
- Designed for statistical computing and graphics.
- Supports linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.

---

# Installation

- [Install R](https://cran.r-project.org/)
- [Install RStudio](https://www.rstudio.com/products/rstudio/download/#download)

---

# Why RStudio?

- Syntax highlighting
- Able to evaluate R code
    + by line
    + by selection
    + entire file
- Command auto-completion

---

# Design of the R System

- When you download R from CRAN, you get the "base" system, which already provides a substantial amount of functionality.
- More than 10,000 packages on CRAN have been developed by users and programmers around the world.
- People often make packages available on their personal websites.
- A number of packages are also developed on repositories like GitHub and BitBucket.

---
class: inverse, center, middle

# R Nuts and Bolts

---

# Basic Operations

```r
1 + 2 + 3
1 + 2 * 3
x <- 1
y <- 2
z <- c(x, y)
z
exp(1)
cos(3.141593)
log2(1)
```

---

# R Objects

R has five basic classes of objects:

1. character
2. numeric (real numbers)
3. integer
4. complex
5. logical (`TRUE`/`FALSE`)

---

# Numbers

- Numbers in R are generally treated as numeric objects.
- What is the difference between `1` and `1L`?
- `Inf` is a special number that represents infinity. Try `1/Inf`.
- `NaN` is an undefined value (not a number). Try `0/0`. It can also be thought of as a missing value.

---

# Attributes

Attributes can be accessed with `attributes()`. Some examples of R object attributes are:

- names, dimnames
- dimensions (e.g.
matrices, arrays)
- class (e.g. integer, numeric)
- length

---

# Vectors

The `c()` function can be used to create vectors of objects by concatenating things together.

```r
x <- c(0.5, 0.6)       # numeric
x <- c(TRUE, FALSE)    # logical
x <- c(T, F)           # logical
x <- c("a", "b", "c")  # character
x <- 9:29              # integer
x <- c(1+0i, 2+4i)     # complex
```

You can also use the `vector()` function to initialize vectors.

```r
x <- vector("numeric", length = 10)
x
#> [1] 0 0 0 0 0 0 0 0 0 0
```

---

# Matrices

Matrices are vectors with a *dimension* attribute.

```r
m <- matrix(c(1:6), 2, 3)   # filled column by column
attributes(m)
dim(m)
t(m)                        # transpose
m[1, 2]
m[1, ]
n <- matrix(c(8:13), 2, 3)
cbind(m, n)                 # bind by columns
rbind(m, n)                 # bind by rows
```

---

# Lists

- A special data structure for data that a matrix cannot hold.
- Elements can have different lengths.
- Elements can have different types.

```r
l <- list(a = c(1, 2), b = 'apple')
attributes(l)
#> $names
#> [1] "a" "b"
```

---

# Factors

Factors are used to represent categorical data.

```r
f <- factor(c("yes", "yes", "no", "yes", "no"))
attributes(f)
#> $levels
#> [1] "no"  "yes"
#>
#> $class
#> [1] "factor"
```

---

# Data Frames

- A special type of list.
- Unlike matrices, data frames can store different classes of objects in each column.
- They have column names and row names.

```r
d <- data.frame(x = 1:10, y = letters[1:10])
attributes(d)
names(d)
row.names(d)
```

---
class: inverse, center, middle

# Getting Data in and out of R

---

# Reading Data

There are a few principal functions for reading data into R.

* `read.table`, `read.csv`, for reading tabular data
* `readLines`, for reading lines of a text file
* `source`, for reading in R code files (inverse of `dump`)
* `dget`, for reading in R code files (inverse of `dput`)
* `load`, for reading in saved workspaces

---

# Writing Data

There are analogous functions for writing data to files.

* `write.table`, for writing tabular data to text files (e.g.
CSV) or connections
* `writeLines`, for writing character data line-by-line to a file or connection
* `dump`, for dumping a textual representation of multiple R objects
* `dput`, for outputting a textual representation of an R object
* `save`, for saving an arbitrary number of R objects in binary format (possibly compressed) to a file

Many R packages have been developed to read in all kinds of other datasets (e.g., the `readr` package).

---
class: inverse, center, middle

# Subsetting R objects

---

# How to Subset?

There are three operators that can be used to extract subsets of R objects.

- The `[` operator always returns an object of the same class as the original. It can be used to select multiple elements of an object.
- The `[[` operator is used to extract elements of a list or a data frame. It can only be used to extract a single element, and the class of the returned object will not necessarily be a list or data frame.
- The `$` operator is used to extract elements of a list or data frame by literal name. Its semantics are similar to those of `[[`.

---

# Subsetting a Vector

Vectors are basic objects in R and they can be subsetted using the `[` operator.

```r
x <- c("a", "b", "c", "c", "d", "a")
x[1]    # Extract the first element
x[2]    # Extract the second element
```

The `[` operator can be used to extract multiple elements of a vector by passing the operator an integer sequence, an integer vector, or a logical condition. The first line below extracts the first four elements of the vector.

```r
x[1:4]          # Elements 1 to 4
x[c(1, 3, 4)]   # Elements 1, 3 and 4
x[x > "a"]      # Elements that sort after "a"
```

---

# Subsetting a Matrix

Matrices can be subsetted in the usual way with (i, j) type indices.

```r
x <- matrix(1:6, 2, 3)
x
```

We can access the (1,2) or the (2,1) element of this matrix using the appropriate indices.

```r
x[1, 2]
x[2, 1]
```

Indices can also be missing. This behavior is used to access entire rows or columns of a matrix.
```r
x[1, ]    # Extract the first row
x[, 2]    # Extract the second column
```

---

# Subsetting Lists

Lists in R can be subsetted using all three of the operators mentioned above, and all three are used for different purposes.

```r
x <- list(foo = 1:4, bar = 0.6)
```

The `[[` operator can be used to extract *single* elements from a list. Here we extract the first element of the list.

```r
x[[1]]
```

The `[[` operator can also use named indices so that you don't have to remember the exact ordering of every element of the list. You can also use the `$` operator to extract elements by name.

```r
x[["bar"]]
x$bar
```

---

# Subsetting Nested Elements of a List

The `[[` operator can take an integer sequence if you want to extract a nested element of a list.

```r
x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
# Get the 3rd element of the 1st element
x[[c(1, 3)]]
# Same as above
x[[1]][[3]]
# 1st element of the 2nd element
x[[c(2, 1)]]
```

---

# Extracting Multiple Elements of a List

The `[` operator can be used to extract *multiple* elements from a list.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")
x[c(1, 3)]
```

Note that `x[c(1, 3)]` is NOT the same as `x[[c(1, 3)]]`.

Remember that the `[` operator always returns an object of the same class as the original. Since the original object was a list, the `[` operator returns a list. In the above code, we returned a list with two elements (the first and the third).

---

# Removing NA Values

A common task in data analysis is removing missing values (`NA`s).

```r
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)   # TRUE where x is missing
print(bad)
x[!bad]           # Keep only the observed values
```

What if there are multiple R objects and you want to take the subset with no missing values in any of those objects?
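
One possible approach is `complete.cases()`, which flags the positions where every object is observed. The sketch below uses two hypothetical vectors, `x` and `y`, invented only for illustration; it is one way to do it, not the only one.

```r
# Hypothetical vectors, made up for this illustration
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", NA, NA, "d", "e", "f")

# TRUE only at positions where both x and y are non-missing
good <- complete.cases(x, y)
good

# Keep the positions that are complete in both vectors
x[good]
y[good]
```

The same function also works row-wise on a data frame, such as the built-in `airquality`: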
```r
head(airquality)
good <- complete.cases(airquality)   # TRUE for rows without any NA
head(airquality[good, ])
```

---

# Review of this lecture

- Overview of R
- R nuts and bolts
- Getting data in and out of R
- Subsetting R objects

---
class: inverse, center, middle

# Lab Session 1

---

# Read and Write Data in R

You'll be working with [swimming_pools.csv](http://s3.amazonaws.com/assets.datacamp.com/production/course_1477/datasets/swimming_pools.csv); it contains data on swimming pools in Brisbane, Australia (Source: [data.gov.au](https://data.gov.au/)). The file contains the column names in the first row. It uses a comma to separate values within rows.

1. Try `read.csv()` and `read.table()` to import "swimming_pools.csv" as a data frame with the name `pools`.
2. Try the `write.table()`, `dput()`, and `save()` functions to write `pools` to files.
3. Restart R and read your saved data back into R.
4. Practice subsetting the data frame.

---

# References

- [Chapter 1 of my book](https://yanfei.site/docs/statscompbook/R.html) in progress.
- Chapters 3-10 of the book *R Programming for Data Science*.
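
---

# Appendix: A Write/Read Round Trip

The write-then-read steps of the lab can be sketched as follows. This is a minimal sketch rather than a reference solution: the built-in `airquality` data stands in for `pools`, and the `tempfile()` paths and the object names `d`, `d_csv`, and `d_dput` are assumptions made for illustration.

```r
# Assumption: head(airquality) stands in for the lab's `pools` data frame,
# and temporary files stand in for file names of your choosing.
csv_path  <- tempfile(fileext = ".csv")
dput_path <- tempfile(fileext = ".R")
rda_path  <- tempfile(fileext = ".rda")

d <- head(airquality)

# Text formats: a comma-separated table and deparsed R code
write.table(d, file = csv_path, sep = ",", row.names = FALSE)
dput(d, file = dput_path)

# Binary format: save() stores the object together with its name
save(d, file = rda_path)

# Read everything back
d_csv  <- read.csv(csv_path)
d_dput <- dget(dput_path)
rm(d)
load(rda_path)   # restores an object called `d`

# Compare the copies that were read back
all.equal(d_csv, d_dput)
all.equal(d, d_dput)
```

Note the design difference: `dget()` and `read.csv()` return a value that you assign to a name yourself, whereas `load()` recreates the object under the name it had when it was saved.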