layout: true

---
class: inverse, center, middle
background-image: url(../figs/titlepage16-9.png)
background-size: cover

<br>
<br>

# Bayesian Statistics and Computing
## Lecture 3: Control structures and functions

<img src="../figs/slides.png" width="150px"/>

#### *Yanfei Kang | BSC | Beihang University*

---
class: inverse, center, middle

# Control structures

---

# Commonly used control structures

- `if` and `else`: testing a condition and acting on it
- `for`: execute a loop a fixed number of times
- `while`: execute a loop _while_ a condition is true
- `repeat`: execute an infinite loop (must `break` out of it to stop)
- `break`: break the execution of a loop
- `next`: skip an iteration of a loop

---

# `if`-`else`

The `if`-`else` combination is probably the most commonly used control structure in R (or perhaps any language). For starters, you can just use the `if` statement.

```r
if(<condition>) {
    ## do something
}
## Continue with rest of code
```

---

# `if`-`else`

If you have an action you want to execute when the condition is false, then you need an `else` clause.

```r
if(<condition>) {
    ## do something
} else {
    ## do something else
}
```

---

# `if`-`else`

You can have a series of tests by following the initial `if` with any number of `else if`s.

```r
if(<condition1>) {
    ## do something
} else if(<condition2>) {
    ## do something different
} else {
    ## do something different
}
```

---

# `if`-`else` Example

.scroll-output[

```r
## Generate a uniform random number
x <- runif(1, 0, 10)
if(x > 3) {
    y <- 10
} else {
    y <- 0
}
```

Or you can write:

```r
y <- if(x > 3) {
    10
} else {
    0
}
```

]

---

# `for` Loops

.scroll-output[

For loops are most commonly used for iterating over the elements of an object (list, vector, etc.)
```r
for(i in 1:10) {
    print(i)
}
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9
#> [1] 10
```

]

---

# `for` Loops Example

```r
x <- c("a", "b", "c", "d")

for(i in 1:4) {
    ## Print out each element of 'x'
    print(x[i])
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
```

---

# `for` Loops Example

The `seq_along()` function is commonly used in conjunction with for loops in order to generate an integer sequence based on the length of an object (in this case, the object `x`).

```r
## Generate a sequence based on length of 'x'
for(i in seq_along(x)) {
    print(x[i])
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
```

---

# `for` Loops Example

It is not necessary to use an index-type variable.

```r
for(letter in x) {
    print(letter)
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
```

---

# Nested `for` loops

`for` loops can be nested inside of each other.

```r
x <- matrix(1:6, 2, 3)

for(i in seq_len(nrow(x))) {
    for(j in seq_len(ncol(x))) {
        print(x[i, j])
    }
}
```

Nested loops are commonly needed for multidimensional or hierarchical data structures (e.g. matrices, lists).

---

# `while` Loops

While loops begin by testing a condition. If it is true, then they execute the loop body. Once the loop body is executed, the condition is tested again, and so forth, until the condition is false, after which the loop exits.

```r
count <- 0
while(count < 10) {
    print(count)
    count <- count + 1
}
#> [1] 0
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9
```

---

# `repeat` Loops

`repeat` initiates an infinite loop right from the start. The only way to exit a `repeat` loop is to call `break`.

```r
x0 <- 1
tol <- 1e-8

repeat {
    x1 <- computeEstimate()  ## Placeholder; not a real function

    if(abs(x1 - x0) < tol) {  ## Close enough?
        break
    } else {
        x0 <- x1
    }
}
```

---

# `next`

`next` is used to skip an iteration of a loop.
```r
for(i in 1:100) {
    if(i <= 20) {
        ## Skip the first 20 iterations
        next
    }
    ## Do something here
}
```

---

# `break`

`break` is used to exit a loop immediately, regardless of what iteration the loop may be on.

```r
for(i in 1:100) {
    print(i)

    if(i > 20) {
        ## Stop the loop once i exceeds 20 (21 is the last value printed)
        break
    }
}
```

---

# Summary

- Control structures like `if`, `while`, and `for` allow you to control the flow of an R program.
- Infinite loops should generally be avoided, even if (you believe) they are theoretically correct.
- Control structures mentioned here are primarily useful for writing programs; for command-line interactive work, the "apply" functions are more useful.

---
class: inverse, center, middle

# Functions

---

# Functions

- A transition from a mere "user" to a developer!
- Often used to encapsulate a sequence of expressions that need to be executed numerous times, perhaps under slightly different conditions.
- Often written when code must be shared with others or the public.
- When to Write a Function?

---

# Your First Function

Functions are defined using the `function()` directive and are stored as R objects just like anything else. In particular, they are R objects of class "function".

```r
f <- function() {
    cat("Hello, world!\n")
}
f()
#> Hello, world!
```

---

# Function arguments

The last aspect of a basic function is the *function arguments*.

```r
f <- function(num) {
    for(i in seq_len(num)) {
        cat("Hello, world!\n")
    }
}
f(3)
#> Hello, world!
#> Hello, world!
#> Hello, world!
```

---

# Example

```r
f <- function(num) {
    hello <- "Hello, world!\n"
    for(i in seq_len(num)) {
        cat(hello)
    }
    chars <- nchar(hello) * num
    chars
}
meaningoflife <- f(3)
#> Hello, world!
#> Hello, world!
#> Hello, world!
print(meaningoflife)
#> [1] 42
```

---

# Default Values

Try this:

```r
f()
```

We can modify this behavior by setting a *default value* for the argument `num`.
```r
f <- function(num = 1) {
    hello <- "Hello, world!\n"
    for(i in seq_len(num)) {
        cat(hello)
    }
    chars <- nchar(hello) * num
    chars
}
f()    ## Use default value for 'num'
f(2)   ## Use user-specified value
```

---

# Default Values

At this point, we have written a function that

* has one *formal argument* named `num` with a *default value* of 1. The _formal arguments_ are the arguments included in the function definition. The `formals()` function returns a list of all the formal arguments of a function.
* prints the message "Hello, world!" to the console a number of times indicated by the argument `num`.
* *returns* the number of characters printed to the console.

---

# Argument Matching

R assigns the first value to the first argument, the second value to the second argument, etc. So in the following call to `rnorm()`

```r
str(rnorm)
#> function (n, mean = 0, sd = 1)
mydata <- rnorm(100, 2, 1)              ## Generate some data
```

100 is assigned to the `n` argument, 2 is assigned to the `mean` argument, and 1 is assigned to the `sd` argument, all by positional matching.

```r
## Positional match first argument, default for 'na.rm'
sd(mydata)
#> [1] 0.9129092
## Specify 'x' argument by name, default for 'na.rm'
sd(x = mydata)
#> [1] 0.9129092
```

---

# Argument Matching

When specifying the function arguments by name, it doesn't matter in what order you specify them.

```r
## Specify both arguments by name
sd(na.rm = FALSE, x = mydata)
#> [1] 0.9129092
```

You can mix positional matching with matching by name.

```r
sd(na.rm = FALSE, mydata)
#> [1] 0.9129092
```

Here, the `mydata` object is assigned to the `x` argument, because it's the only argument not yet specified.

---

# Argument Matching

Function arguments can also be _partially_ matched, which is useful for interactive work. The order of operations when given an argument is

1. Check for exact match for a named argument.
2. Check for a partial match.
3. Check for a positional match.

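
The matching order above can be illustrated with a small sketch. The function `scale_values()` and its argument names are hypothetical, introduced only for this example; they are not part of base R or of the lecture code.

```r
## Hypothetical helper, defined only to illustrate argument matching
scale_values <- function(values, center = 0, spread = 1) {
    (values - center) / spread
}

## Exact matching: every argument is named in full
scale_values(values = c(2, 4, 6), center = 4, spread = 2)
#> [1] -1  0  1

## Partial matching: 'cen' resolves to 'center' and 'sp' to 'spread';
## the remaining unnamed argument is then matched by position to 'values'
scale_values(c(2, 4, 6), sp = 2, cen = 4)
#> [1] -1  0  1
```

A partial name must identify exactly one formal argument, so abbreviations are best kept to interactive work; in scripts, spelling argument names out in full is usually the safer choice.
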
---

# The `...` Argument

- There is a special argument in R known as the `...` argument, which indicates a variable number of arguments that are usually passed on to other functions.
- The `...` argument is often used when extending another function and you don’t want to copy the entire argument list of the original function.
- For example, a custom mean function may want to make use of the default `mean()` function along with its entire argument list. The function below changes the default for the `na.rm` argument to `TRUE` (the original default is `FALSE`).

```r
mymean <- function(x, na.rm = TRUE, ...) {
    mean(x, na.rm = na.rm, ...)  ## Pass '...' to 'mean' function
}
x <- c(1, 2, NA)
mean(x)
mymean(x)
```

---

# Summary

* Functions can be defined using the `function()` directive and are assigned to R objects just like any other R object.
* Functions can be defined with named arguments; these function arguments can have default values.
* Function arguments can be specified by name or by position in the argument list.
* Functions return the value of the last expression evaluated in the function body, unless `return()` is called earlier.
* A variable number of arguments can be specified using the special `...` argument in a function definition.

---
class: inverse, center, middle

# Looping Functions

---

# Looping Functions

Writing `for` and `while` loops is useful when programming but not particularly easy when working interactively on the command line. Functions that make your life easier:

- `lapply()`: Loop over a list and evaluate a function on each element
- `sapply()`: Same as `lapply` but try to simplify the result
- `apply()`: Apply a function over the margins of an array
- `tapply()`: Apply a function over subsets of a vector
- `mapply()`: Multivariate version of `lapply`

---

# `lapply()`

The `lapply()` function does the following simple series of operations:

1. it loops over a list, iterating over each element in that list.
2.
it applies a *function* to each element of the list (a function that you specify).
3. it returns a list (the `l` is for "list").

This function takes three arguments:

1. a list `X`.
2. a function (or the name of a function) `FUN`.
3. other arguments via its `...` argument.

If `X` is not a list, it will be coerced to a list using `as.list()`.

---

# `lapply()` Example 1

Here's an example of applying the `mean()` function to all elements of a list.

```r
x <- list(a = 1:5, b = rnorm(10))
lapply(x, mean)
#> $a
#> [1] 3
#>
#> $b
#> [1] -0.2310251
```

---

# `lapply()` Example 2

```r
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(x, mean)
#> $a
#> [1] 2.5
#>
#> $b
#> [1] -0.6603277
#>
#> $c
#> [1] 1.375513
#>
#> $d
#> [1] 5.067981
```

---

# `lapply()` Example 3

```r
x <- 1:4
lapply(x, runif)
#> [[1]]
#> [1] 0.820082
#>
#> [[2]]
#> [1] 0.5053963 0.2943039
#>
#> [[3]]
#> [1] 0.8181448 0.7061302 0.9163055
#>
#> [[4]]
#> [1] 0.4902765 0.2215913 0.1520512 0.4420643
```

---

# `lapply()`

Now how about other arguments? Anything after `FUN` is passed on to `FUN` through `...`.

```r
x <- 1:4
lapply(x, runif, min = 0, max = 10)
#> [[1]]
#> [1] 4.727666
#>
#> [[2]]
#> [1] 6.335041 4.961210
#>
#> [[3]]
#> [1] 7.6208627 0.7170543 9.4952564
#>
#> [[4]]
#> [1] 1.099728 8.152994 8.465917 4.898108
```

---

# `lapply()` Example 4

- Now we create a list that contains two matrices.

```r
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
x
```

- How do you extract the first column of each matrix in the list?
- Now you need an anonymous function for extracting the first column of each matrix.

```r
lapply(x, function(elt) { elt[,1] })
#> $a
#> [1] 1 2
#>
#> $b
#> [1] 1 2 3
```

---

# `sapply()`

.scroll-output[

- The `sapply()` function behaves similarly to `lapply()`; the only real difference is in the return value.
- `sapply()` will try to simplify the result of `lapply()` if possible.
- Compare `lapply()` and `sapply()`.
```r
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(x, mean)
#> $a
#> [1] 2.5
#>
#> $b
#> [1] 0.6553135
#>
#> $c
#> [1] 0.9797015
#>
#> $d
#> [1] 4.849482
sapply(x, mean)
#>         a         b         c         d
#> 2.5000000 0.6553135 0.9797015 4.8494818
```

]

---

# `apply()`

- Used to evaluate a function over the margins of an array.
- Often used to apply a function to the rows or columns of a matrix.
- Using `apply()` is not really faster than writing a loop, but it works in one line and is highly compact.

```r
str(apply)
#> function (X, MARGIN, FUN, ..., simplify = TRUE)
```

The arguments to `apply()` are

- `X` is an array
- `MARGIN` is an integer vector indicating which margins should be “retained”.
- `FUN` is a function to be applied
- `...` is for other arguments to be passed to `FUN`

---

# `apply()` Example

```r
x <- matrix(rnorm(200), 20, 10)
apply(x, 2, mean)  ## Take the mean of each column
#> [1] -0.284489060 -0.136846481 0.047266859 -0.120402498 0.239887540 -0.057754509 0.115764520 0.007488874
#> [9] 0.152149542 0.127937661
apply(x, 1, sum)   ## Take the sum of each row
#> [1] 0.61509317 -2.32550976 -1.40095279 -4.65512633 -2.68393067 -5.38526811 -0.01722650 -3.14950520 2.49976248
#> [10] -2.45571181 3.31807769 -1.07169324 7.77476736 0.20359527 0.05222068 8.35084299 -2.03434083 -0.06800384
#> [19] 2.11837902 2.13457936
```

- Note that in both calls to `apply()`, the return value was a vector of numbers.
- The `MARGIN` argument essentially indicates to `apply()` which dimension of the array you want to preserve or retain.

---

# Col/Row Sums and Means

For the special case of column/row sums and column/row means of matrices, we have some useful shortcuts.

- `rowSums` = `apply(x, 1, sum)`
- `rowMeans` = `apply(x, 1, mean)`
- `colSums` = `apply(x, 2, sum)`
- `colMeans` = `apply(x, 2, mean)`

The shortcut functions are heavily optimized and hence are _much_ faster, but you probably won’t notice unless you’re using a large matrix.
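
The speed difference can be checked with a rough timing sketch. The matrix size below is an arbitrary choice for illustration, and the timings depend on the machine, so no timing output is shown here.

```r
## Illustrative size only; a 1000 x 1000 matrix is an assumption, not from the lecture
x <- matrix(rnorm(1e6), nrow = 1000, ncol = 1000)

## Time the general-purpose apply() against the dedicated shortcut
t_apply <- system.time(m1 <- apply(x, 2, mean))
t_short <- system.time(m2 <- colMeans(x))

## Both approaches should agree up to floating-point tolerance
all.equal(m1, m2)

## Compare elapsed times (in seconds)
t_apply["elapsed"]
t_short["elapsed"]
```

On a small matrix both calls finish almost instantly, which is why the difference is easy to miss.
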
Another nice aspect of these functions is that they are a bit more descriptive. It's arguably clearer to write `colMeans(x)` in your code than `apply(x, 2, mean)`.

---

# Other Ways to Apply

You can do more than take sums and means with the `apply()` function. For example, you can compute quantiles of the rows of a matrix using the `quantile()` function.

```r
x <- matrix(rnorm(200), 20, 10)
## Get row quantiles
apply(x, 1, quantile, probs = c(0.25, 0.75))
#>           [,1]      [,2]       [,3]        [,4]       [,5]      [,6]       [,7]      [,8]       [,9]      [,10]
#> 25% -1.1357814 0.3882964 -0.3135551 -0.97686215 -0.6176911 0.3508623 -0.6064200 -1.168140 -0.3047590 -0.1910554
#> 75%  0.3679815 0.9727827  0.6410333 -0.08947212  0.8031256 1.2614098  0.1653583  0.929492  0.3605668  0.9078301
#>          [,11]      [,12]      [,13]      [,14]      [,15]      [,16]       [,17]      [,18]      [,19]      [,20]
#> 25% -0.6115150 -0.7865255 -1.0873096 0.07122323 -0.7648704 -0.7381068 -0.34076767 -0.7444912 -0.8504503 -0.9547083
#> 75%  0.5331629  0.9188060  0.1335317 0.48980088  0.1831285  0.5776408  0.01752554  0.7547098  0.9398709  0.4441905
```

Notice that I had to pass the `probs = c(0.25, 0.75)` argument to `quantile()` via the `...` argument to `apply()`.

---

# Summary

* The loop functions in R are very powerful because they allow you to conduct a series of operations on data using a compact form.
* The operation of a loop function involves iterating over an R object (e.g. a list or vector or matrix), applying a function to each element of the object, and then collating the results and returning the collated results.
* Loop functions make heavy use of anonymous functions, which exist for the life of the loop function but are not stored anywhere.
* Other loop functions like `tapply()` and `mapply()` are also very useful (see the references).

---

# Lab Session 3

In this lab, you will use the temperature data in four cities: Melbourne, Sydney, Brisbane and Cairns. You can download them from https://yanfei.site/docs/sc/data/temp.zip.

1.
Please write a function `load.file()` that reads a .csv file and converts the first column (a character string representing date and time) into R's time format using `as.POSIXlt`.
2. Then apply `load.file()` to each filename using `lapply()`.
3. How many rows of data are there for each city?
4. What is the hottest temperature recorded in each city?
5. Estimate the autocorrelation function for each city.

---

# References

- [Chapter 1 of my book](https://yanfei.site/docs/statscompbook/R.html) in progress.
- Chapters 14, 15 and 18 of the book "R programming for data science".