layout: true
---
class: inverse, center, middle
background-image: url(../figs/titlepage16-9.png)
background-size: cover

<br>
<br>
# Bayesian Statistics and Computing
## Lecture 3: Control structures and functions

<img src="../figs/slides.png" width="150px"/>

#### *Yanfei Kang | BSC | 2021 Spring*

---
class: inverse, center, middle
# Control structures

---
# Commonly used control structures

- `if` and `else`: testing a condition and acting on it
- `for`: execute a loop a fixed number of times
- `while`: execute a loop _while_ a condition is true
- `repeat`: execute an infinite loop (must `break` out of it to stop)
- `break`: break the execution of a loop
- `next`: skip an iteration of a loop

---
# `if`-`else`

The `if`-`else` combination is probably the most commonly used control structure in R (or perhaps any language). For starters, you can just use the `if` statement.

```r
if(<condition>) {
    ## do something
}
## Continue with rest of code
```

---
# `if`-`else`

If you have an action you want to execute when the condition is false, then you need an `else` clause.

```r
if(<condition>) {
    ## do something
} else {
    ## do something else
}
```

---
# `if`-`else`

You can have a series of tests by following the initial `if` with any number of `else if`s.

```r
if(<condition1>) {
    ## do something
} else if(<condition2>) {
    ## do something different
} else {
    ## do something different
}
```

---
# `if`-`else` Example

.scroll-output[
```r
## Generate a uniform random number
x <- runif(1, 0, 10)
if (x > 3) {
    y <- 10
} else {
    y <- 0
}
```

Or you can write:

```r
y <- if (x > 3) {
    10
} else {
    0
}
```
]

---
# `for` Loops

.scroll-output[
For loops are most commonly used for iterating over the elements of an object (list, vector, etc.)
```r
for (i in 1:10) {
    print(i)
}
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9
#> [1] 10
```
]

---
# `for` Loops Example

```r
x <- c("a", "b", "c", "d")
for (i in 1:4) {
    ## Print out each element of 'x'
    print(x[i])
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
```

---
# `for` Loops Example

The `seq_along()` function is commonly used in conjunction with `for` loops to generate an integer sequence based on the length of an object (in this case, the object `x`).

```r
## Generate a sequence based on length of 'x'
for (i in seq_along(x)) {
    print(x[i])
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
```

---
# `for` Loops Example

It is not necessary to use an index-type variable.

```r
for (letter in x) {
    print(letter)
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
```

---
# Nested `for` loops

`for` loops can be nested inside of each other.

```r
x <- matrix(1:6, 2, 3)
for(i in seq_len(nrow(x))) {
    for(j in seq_len(ncol(x))) {
        print(x[i, j])
    }
}
```

Nested loops are commonly needed for multidimensional or hierarchical data structures (e.g. matrices, lists).

---
# `while` Loops

While loops begin by testing a condition. If it is true, then they execute the loop body. Once the loop body is executed, the condition is tested again, and so forth, until the condition is false, after which the loop exits.

```r
count <- 0
while (count < 10) {
    print(count)
    count <- count + 1
}
#> [1] 0
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9
```

---
# `repeat` Loops

`repeat` initiates an infinite loop right from the start. The only way to exit a `repeat` loop is to call `break`.

```r
## computeEstimate() is a placeholder, not a real function;
## define it before running this code
x0 <- 1
tol <- 1e-08
repeat {
    x1 <- computeEstimate()
    if (abs(x1 - x0) < tol) {  ## Close enough?
        break
    } else {
        x0 <- x1
    }
}
```

---
# `next`

`next` is used to skip an iteration of a loop.
```r
for (i in 1:100) {
    if (i <= 20) {
        ## Skip the first 20 iterations
        next
    }
    ## Do something here
}
```

---
# `break`

`break` is used to exit a loop immediately, regardless of what iteration the loop may be on.

```r
for (i in 1:100) {
    print(i)
    if (i > 20) {
        ## Stop the loop once i exceeds 20 (i.e. after printing 21)
        break
    }
}
```

---
# Summary

- Control structures like `if`, `while`, and `for` allow you to control the flow of an R program.
- Infinite loops should generally be avoided, even if (you believe) they are theoretically correct.
- Control structures mentioned here are primarily useful for writing programs; for command-line interactive work, the "apply" functions are more useful.

---
class: inverse, center, middle
# Functions

---
# Functions

- A transition from a mere "user" to a developer!
- Often used to encapsulate a sequence of expressions that need to be executed numerous times, perhaps under slightly different conditions.
- Often written when code must be shared with others or the public.
- When to Write a Function?

---
# Your First Function

Functions are defined using the `function()` directive and are stored as R objects just like anything else. In particular, they are R objects of class "function".

```r
f <- function() {
    cat("Hello, world!\n")
}
f()
#> Hello, world!
```

---
# Function arguments

The last aspect of a basic function is the *function arguments*.

```r
f <- function(num) {
    for (i in seq_len(num)) {
        cat("Hello, world!\n")
    }
}
f(3)
#> Hello, world!
#> Hello, world!
#> Hello, world!
```

---
# Example

```r
f <- function(num) {
    hello <- "Hello, world!\n"
    for (i in seq_len(num)) {
        cat(hello)
    }
    chars <- nchar(hello) * num
    chars
}
meaningoflife <- f(3)
#> Hello, world!
#> Hello, world!
#> Hello, world!
print(meaningoflife)
#> [1] 42
```

---
# Default Values

Try this:

```r
f()
```

Calling `f()` without an argument fails, because `num` has no default. We can modify this behavior by setting a *default value* for the argument `num`.
```r
f <- function(num = 1) {
    hello <- "Hello, world!\n"
    for (i in seq_len(num)) {
        cat(hello)
    }
    chars <- nchar(hello) * num
    chars
}
f()    ## Use default value for 'num'
f(2)   ## Use user-specified value
```

---
# Default Values

At this point, we have written a function that

* has one *formal argument* named `num` with a *default value* of 1. The _formal arguments_ are the arguments included in the function definition. The `formals()` function returns a list of all the formal arguments of a function.
* prints the message "Hello, world!" to the console a number of times indicated by the argument `num`.
* *returns* the number of characters printed to the console.

---
# Argument Matching

R assigns the first value to the first argument, the second value to the second argument, etc. So in the following call to `rnorm()`

```r
str(rnorm)
#> function (n, mean = 0, sd = 1)
mydata <- rnorm(100, 2, 1)   ## Generate some data
```

100 is assigned to the `n` argument, 2 is assigned to the `mean` argument, and 1 is assigned to the `sd` argument, all by positional matching.

```r
## Positional match first argument, default for 'na.rm'
sd(mydata)
#> [1] 1.036166
## Specify 'x' argument by name, default for 'na.rm'
sd(x = mydata)
#> [1] 1.036166
```

---
# Argument Matching

When specifying the function arguments by name, it doesn't matter in what order you specify them.

```r
## Specify both arguments by name
sd(na.rm = FALSE, x = mydata)
#> [1] 1.036166
```

You can mix positional matching with matching by name.

```r
sd(na.rm = FALSE, mydata)
#> [1] 1.036166
```

Here, the `mydata` object is assigned to the `x` argument, because it's the only argument not yet specified.

---
# Argument Matching

Function arguments can also be _partially_ matched, which is useful for interactive work. The order of operations when given an argument is

1. Check for exact match for a named argument.
2. Check for a partial match.
3. Check for a positional match.
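
The matching order above can be seen with `rnorm()`. This is a minimal sketch: the seed value and the object names `exact` and `partial` are arbitrary choices made for illustration.

```r
## Exact names: 'mean' and 'sd' are matched exactly, 3 goes to 'n' by position
set.seed(42)
exact <- rnorm(3, mean = 2, sd = 1)

## Partial names: 'm' is completed to 'mean' and 's' to 'sd';
## the unnamed value 3 is still matched to 'n' by position
set.seed(42)
partial <- rnorm(3, m = 2, s = 1)

## Both calls resolve to the same arguments, so the draws are identical
identical(exact, partial)
#> [1] TRUE
```

Partial matching saves typing at the console, but spelling argument names out in full is usually the safer choice in scripts and functions.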
---
# The `...` Argument

- There is a special argument in R known as the `...` argument, which indicates a variable number of arguments that are usually passed on to other functions.
- The `...` argument is often used when extending another function and you don’t want to copy the entire argument list of the original function.
- For example, a custom mean function may want to make use of the default `mean()` function along with its entire argument list. The function below changes the default for the `na.rm` argument to `na.rm = TRUE` (the original default was `na.rm = FALSE`).

```r
mymean <- function(x, na.rm = TRUE, ...) {
    mean(x, na.rm = na.rm, ...)   ## Pass '...' on to the 'mean' function
}
x <- c(1, 2, NA)
mean(x)
mymean(x)
```

---
# Summary

* Functions can be defined using the `function()` directive and are assigned to R objects just like any other R object.
* Functions can be defined with named arguments; these function arguments can have default values.
* Function arguments can be specified by name or by position in the argument list.
* Functions return the value of the last expression evaluated in the function body.
* A variable number of arguments can be specified using the special `...` argument in a function definition.

---
class: inverse, center, middle
# Looping Functions

---
# Looping Functions

Writing `for` and `while` loops is useful when programming but not particularly easy when working interactively on the command line.

Functions that make your life easier:

- `lapply()`: Loop over a list and evaluate a function on each element
- `sapply()`: Same as `lapply()` but try to simplify the result
- `apply()`: Apply a function over the margins of an array
- `tapply()`: Apply a function over subsets of a vector
- `mapply()`: Multivariate version of `lapply()`

---
# `lapply()`

The `lapply()` function does the following simple series of operations:

1. it loops over a list, iterating over each element in that list.
2. it applies a *function* to each element of the list (a function that you specify).
3. it returns a list (the `l` is for "list").

This function takes three arguments:

1. a list `X`.
2. a function (or the name of a function) `FUN`.
3. other arguments via its `...` argument.

If `X` is not a list, it will be coerced to a list using `as.list()`.

---
# `lapply()` Example 1

Here's an example of applying the `mean()` function to all elements of a list.

```r
x <- list(a = 1:5, b = rnorm(10))
lapply(x, mean)
#> $a
#> [1] 3
#>
#> $b
#> [1] -0.1000435
```

---
# `lapply()` Example 2

```r
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(x, mean)
#> $a
#> [1] 2.5
#>
#> $b
#> [1] -0.5675725
#>
#> $c
#> [1] 1.416292
#>
#> $d
#> [1] 4.90232
```

---
# `lapply()` Example 3

```r
x <- 1:4
lapply(x, runif)
#> [[1]]
#> [1] 0.6485655
#>
#> [[2]]
#> [1] 0.8712539 0.9314336
#>
#> [[3]]
#> [1] 0.4651773 0.2021731 0.4976222
#>
#> [[4]]
#> [1] 0.06183679 0.10366678 0.01746363 0.07295415
```

---
# `lapply()`

Now how about other arguments?

```r
x <- 1:4
lapply(x, runif, min = 0, max = 10)
#> [[1]]
#> [1] 8.221799
#>
#> [[2]]
#> [1] 4.7108634 0.4580967
#>
#> [[3]]
#> [1] 5.998154 1.579201 5.045083
#>
#> [[4]]
#> [1] 0.8484653 8.7082456 5.4318032 7.3086981
```

---
# `lapply()` Example 4

- Now we create a list that contains two matrices.

```r
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
x
```

- How do you extract the first column of each matrix in the list?
- Now you need an anonymous function for extracting the first column of each matrix.

```r
lapply(x, function(elt) { elt[, 1] })
#> $a
#> [1] 1 2
#>
#> $b
#> [1] 1 2 3
```

---
# `sapply()`

.scroll-output[
- The `sapply()` function behaves similarly to `lapply()`; the only real difference is in the return value.
- `sapply()` will try to simplify the result of `lapply()` if possible.
- Compare `lapply()` and `sapply()`.
```r
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(x, mean)
sapply(x, mean)
#> $a
#> [1] 2.5
#>
#> $b
#> [1] -0.2040602
#>
#> $c
#> [1] 0.7473551
#>
#> $d
#> [1] 4.998614
#>
#>          a          b          c          d
#>  2.5000000 -0.2040602  0.7473551  4.9986140
```
]

---
# `apply()`

- Used to evaluate a function over the margins of an array.
- Often used to apply a function to the rows or columns of a matrix.
- Using `apply()` is not really faster than writing a loop, but it works in one line and is highly compact.

```r
str(apply)
#> function (X, MARGIN, FUN, ...)
```

The arguments to `apply()` are

- `X` is an array
- `MARGIN` is an integer vector indicating which margins should be “retained”.
- `FUN` is a function to be applied
- `...` is for other arguments to be passed to `FUN`

---
# `apply()` Example

```r
x <- matrix(rnorm(200), 20, 10)
apply(x, 2, mean)  ## Take the mean of each column
#> [1] -0.286316012 0.115644854 -0.001762943 0.251713612 -0.096297975 -0.088209138 -0.077962401 -0.367332965
#> [9] -0.141669153 -0.024868715
apply(x, 1, sum)   ## Take the sum of each row
#> [1] -1.55065950 -1.40328132 -3.59529339 -0.04404654 -5.87146664 2.52399690 -7.93115986 -0.71577629 0.32184832
#> [10] -2.53119217 2.14521484 1.01312004 -2.71837139 -6.85548022 -0.88882545 -0.98437231 5.56520753 6.44686949
#> [19] 2.00105067 0.73140056
```

- Note that in both calls to `apply()`, the return value was a vector of numbers.
- The `MARGIN` argument essentially indicates to `apply()` which dimension of the array you want to preserve or retain.

---
# Col/Row Sums and Means

For the special case of column/row sums and column/row means of matrices, we have some useful shortcuts.

- `rowSums` = `apply(x, 1, sum)`
- `rowMeans` = `apply(x, 1, mean)`
- `colSums` = `apply(x, 2, sum)`
- `colMeans` = `apply(x, 2, mean)`

The shortcut functions are heavily optimized and hence are _much_ faster, but you probably won’t notice unless you’re using a large matrix.
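
As a quick check of the equivalences listed above, the sketch below compares two of the shortcuts with their `apply()` counterparts. The matrix `m`, the seed and the result names are arbitrary choices made for illustration.

```r
## Illustrative matrix; the seed and dimensions are arbitrary
set.seed(2021)
m <- matrix(rnorm(200), 20, 10)

## Each shortcut should agree with its apply() version
## up to floating-point tolerance
same_col_means <- all.equal(colMeans(m), apply(m, 2, mean))
same_row_sums  <- all.equal(rowSums(m), apply(m, 1, sum))

same_col_means
same_row_sums
```

`all.equal()` is used rather than `identical()` because the two routes may accumulate sums differently and so differ in the last few digits.
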
Another nice aspect of these functions is that they are a bit more descriptive. It's arguably clearer to write `colMeans(x)` in your code than `apply(x, 2, mean)`.

---
# Other Ways to Apply

You can do more than take sums and means with the `apply()` function. For example, you can compute quantiles of the rows of a matrix using the `quantile()` function.

```r
x <- matrix(rnorm(200), 20, 10)
## Get row quantiles
apply(x, 1, quantile, probs = c(0.25, 0.75))
#>     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> 25% 0.01130106 -0.01995317 -0.2378423 -0.8439313 -0.8822925 -1.1280650 -0.295448 -0.1768166 -0.7035448 -0.6820594
#> 75% 0.62482585 0.98427844 0.8321674 0.9929153 0.4540076 0.4197237 1.214313 0.7833231 0.1471374 0.5762199
#>     [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
#> 25% -0.4848463 -0.3366767 -0.8381989 -0.2835923 -0.7302496 -1.3950466 -1.05817115 -1.2686266 0.4284382 -0.54293782
#> 75% 0.4395252 0.3859112 0.5867467 1.2757423 0.8432561 0.5818128 0.09094471 0.1413441 1.1834047 0.08463423
```

Notice that I had to pass the `probs = c(0.25, 0.75)` argument to `quantile()` via the `...` argument to `apply()`.

---
# Summary

* The loop functions in R are very powerful because they allow you to conduct a series of operations on data using a compact form.
* The operation of a loop function involves iterating over an R object (e.g. a list or vector or matrix), applying a function to each element of the object, and then collating the results and returning the collated results.
* Loop functions make heavy use of anonymous functions, which exist for the life of the loop function but are not stored anywhere.
* Other loop functions like `tapply()` and `mapply()` are also very useful (see the references).

---
# Lab Session 3

In this lab, you will use the temperature data in four cities: Melbourne, Sydney, Brisbane and Cairns. You can download them from https://yanfei.site/docs/sc/data/temp.zip.

1. Write a function `load.file()` that reads a .csv file and converts the first column (a character string representing date and time) into R's date-time format using `as.POSIXlt()`.
2. Then apply `load.file()` to each filename using `lapply()`.
3. How many rows of data are there for each city?
4. What is the hottest temperature recorded for each city?
5. Estimate the autocorrelation function for each city.

---
# References

- [Chapter 1 of my book](https://yanfei.site/docs/statscompbook/R.html) in progress.
- Chapters 14, 15 and 18 of the book "R programming for data science".