Bayesian Statistics and Computing

---

# Bayesian Statistics and Computing

## Lecture 3: Control structures and functions

#### *Yanfei Kang | BSC | Beihang University*

---

# Commonly used control structures

- `if` and `else`: testing a condition and acting on it

- `for`: execute a loop a fixed number of times

- `while`: execute a loop _while_ a condition is true

- `repeat`: execute an infinite loop (must `break` out of it to stop)

- `break`: break the execution of a loop

- `next`: skip an iteration of a loop

---
# `if`-`else`

The `if`-`else` combination is probably the most commonly used control
structure in R (or perhaps any language). For starters, you can just use the `if` statement.

```r
if(<condition>) {
        ## do something
} 
## Continue with rest of code
```
---
# `if`-`else`

If you have an action you want to execute when the condition is false, then you need an `else` clause.
```r
if(<condition>) {
        ## do something
} else {   ## 'else' must follow '}' on the same line
        ## do something else
}
```

---
# `if`-`else`

You can have a series of tests by following the initial `if` with any
number of `else if`s.
```r
if(<condition1>) {
        ## do something
} else if(<condition2>)  {
        ## do something different
} else {
        ## do something different
}
```
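
As a concrete illustration of the chain above, here is a small sketch. The variable names `temp` and `label` and the thresholds are made up for this example.

```r
## Hypothetical example: label a temperature reading (thresholds are arbitrary)
temp <- 25
if(temp < 10) {
        label <- "cold"
} else if(temp < 30) {
        label <- "mild"
} else {
        label <- "hot"
}
print(label)
#> [1] "mild"
```

Only the first branch whose condition is true is executed; the remaining tests are skipped.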
---
# `if`-`else` Example

```r
## Generate a uniform random number
x <- runif(1, 0, 10)  
if(x > 3) {
        y <- 10
} else {
        y <- 0
}
```

Or you can write:

```r
y <- if(x > 3) {
        10
} else { 
        0
}
```

---
# `for` Loops

.scroll-output[
`for` loops are most commonly used for iterating over the elements of an object (list, vector, etc.).

```r
for(i in 1:10) {
        print(i)
}
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9
#> [1] 10
```
]

---
# `for` Loops Example

```r
x <- c("a", "b", "c", "d")
for(i in 1:4) {
        ## Print out each element of 'x'
        print(x[i])  
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
```

---
# `for` Loops Example

The `seq_along()` function is commonly used in conjunction with for
loops in order to generate an integer sequence based on the length of
an object (in this case, the object `x`).

```r
## Generate a sequence based on length of 'x'
for(i in seq_along(x)) {   
        print(x[i])
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
```

---
# `for` Loops Example

It is not necessary to use an index-type variable.

```r
for(letter in x) {
        print(letter)
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
```

---
# Nested `for` loops

`for` loops can be nested inside of each other.

```r
x <- matrix(1:6, 2, 3)
for(i in seq_len(nrow(x))) {
        for(j in seq_len(ncol(x))) {
                print(x[i, j])
        }   
}
```

Nested loops are commonly needed for multidimensional or hierarchical
data structures (e.g. matrices, lists).

---
# `while` Loops

While loops begin by testing a condition. If it is true, then they
execute the loop body. Once the loop body is executed, the condition
is tested again, and so forth, until the condition is false, after
which the loop exits.

```r
count <- 0
while(count < 10) {
        print(count)
        count <- count + 1
}
#> [1] 0
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9
```

---
# `repeat` Loops

`repeat` initiates an infinite loop right from the start. The only way to exit a `repeat` loop is to
call `break`.

```r
x0 <- 1
tol <- 1e-8
repeat {
        ## computeEstimate() is a placeholder for your own update step
        x1 <- computeEstimate()
        
        if(abs(x1 - x0) < tol) {  ## Close enough?
                break
        } else {
                x0 <- x1
        } 
}
```
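
The pattern above can be made runnable with a concrete update step. The sketch below is one possible instance, not the only one: it approximates `sqrt(2)` with the Babylonian update. The names `a`, `maxit` and `iter` are assumptions added for illustration, and `maxit` caps the number of iterations so the loop cannot run forever.

```r
## Sketch: approximate sqrt(a) with the update x1 = (x0 + a / x0) / 2
a <- 2
x0 <- 1
tol <- 1e-8
maxit <- 100          ## safety cap (an assumption, not part of the original loop)
iter <- 0
repeat {
        x1 <- (x0 + a / x0) / 2
        iter <- iter + 1
        if(abs(x1 - x0) < tol || iter >= maxit) {  ## Close enough, or give up
                break
        }
        x0 <- x1
}
x1
#> [1] 1.414214
```

A hard limit on the number of iterations is a cheap way to guard against a convergence test that is never satisfied.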

---
# `next`

`next` is used to skip an iteration of a loop.

```r
for(i in 1:100) {
        if(i <= 20) {
                ## Skip the first 20 iterations
                next                 
        }
        ## Do something here
}
```
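
A small runnable sketch of the same idea: skip the even numbers and add up the rest. The variable name `total` is chosen only for this example.

```r
## Sum only the odd numbers from 1 to 10
total <- 0
for(i in 1:10) {
        if(i %% 2 == 0) {
                next            ## even number: go straight to the next i
        }
        total <- total + i      ## reached only for odd i
}
total
#> [1] 25
```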

---
# `break`

`break` is used to exit a loop immediately, regardless of what
iteration the loop may be on.

```r
for(i in 1:100) {
        print(i)
        if(i > 20) {
                ## Exit once i exceeds 20 (so 1 to 21 are printed)
                break
        }
}
```

---
# Summary

- Control structures like `if`, `while`, and `for` allow you to
  control the flow of an R program

- Infinite loops should generally be avoided, even if (you believe)
  they are theoretically correct.

- Control structures mentioned here are primarily useful for writing
  programs; for command-line interactive work, the "apply" functions
  are more useful.
 
---
class: inverse, center, middle
# Functions

---
# Functions

- A transition from a mere "user" to a developer!

- Often used to encapsulate a sequence of expressions that need to be executed numerous times, perhaps under slightly different conditions.

- Often written when code must be shared with others or the public.

- When to Write a Function?

---
# Your First Function

Functions are defined using the `function()` directive and are stored
as R objects just like anything else. In particular, they are R
objects of class "function".

```r
f <- function() {
        cat("Hello, world!\n")
}
f()
#> Hello, world!
```
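
To see that a function really is an ordinary R object, you can inspect its class and assign it to another name. This is a minimal sketch; `greet` and `greet2` are hypothetical names used only here.

```r
## 'greet' is a hypothetical name used only for this illustration
greet <- function() {
        cat("Hello, world!\n")
}
class(greet)
#> [1] "function"
greet2 <- greet   ## assign the function object to a second name
greet2()
#> Hello, world!
```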

---
# Function arguments

The next aspect of a basic function is its *arguments*: values supplied by the caller that control what the function does.

```r
f <- function(num) {
        for(i in seq_len(num)) {
                cat("Hello, world!\n")
        }
}
f(3)
#> Hello, world!
#> Hello, world!
#> Hello, world!
```

---
# Example

```r
f <- function(num) {
        hello <- "Hello, world!\n"
        for(i in seq_len(num)) {
                cat(hello)
        }
        chars <- nchar(hello) * num
        chars
}
meaningoflife <- f(3)
#> Hello, world!
#> Hello, world!
#> Hello, world!
print(meaningoflife)
#> [1] 42
```

---
# Default Values

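Try calling the function without an argument. The call fails with an error, because `num` has no default value: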

```r
f()
```

We can modify this behavior by setting a *default value* for the argument `num`.

```r
f <- function(num = 1) {
        hello <- "Hello, world!\n"
        for(i in seq_len(num)) {
                cat(hello)
        }
        chars <- nchar(hello) * num
        chars
}
f()    ## Use default value for 'num'
f(2)   ## Use user-specified value
```

---
# Default Values

At this point, we have written a function that

* has one *formal argument* named `num` with a *default value* of 1. The _formal arguments_ are the arguments included in the function definition. The `formals()` function returns a list of all the formal arguments of a function

* prints the message "Hello, world!" to the console a number of times indicated by the argument `num`

* *returns* the number of characters printed to the console
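
The three points above can be checked directly. The sketch below redefines the same `f` so that it runs on its own; the name `n` is introduced only to hold the return value.

```r
f <- function(num = 1) {
        hello <- "Hello, world!\n"
        for(i in seq_len(num)) {
                cat(hello)
        }
        nchar(hello) * num
}
formals(f)          ## list of formal arguments with their defaults
#> $num
#> [1] 1
#> 
n <- f(2)           ## prints the message twice, returns the character count
#> Hello, world!
#> Hello, world!
n
#> [1] 28
```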

---
# Argument Matching

R assigns the first value to the first argument, the second value to the second argument, and so on. Consider the following call to `rnorm()`:

```r
str(rnorm)
#> function (n, mean = 0, sd = 1)
mydata <- rnorm(100, 2, 1)              ## Generate some data
```

100 is assigned to the `n` argument, 2 is assigned to the `mean` argument, and 1 is assigned to the `sd` argument, all by positional matching.

```r
## Positional match first argument, default for 'na.rm'
sd(mydata)                     
#> [1] 0.9129092
## Specify 'x' argument by name, default for 'na.rm'
sd(x = mydata)                 
#> [1] 0.9129092
```

---
# Argument Matching

When specifying the function arguments by name, it doesn't matter in what order you specify them.

```r
## Specify both arguments by name
sd(na.rm = FALSE, x = mydata)     
#> [1] 0.9129092
```

You can mix positional matching with matching by name.

```r
sd(na.rm = FALSE, mydata)
#> [1] 0.9129092
```

Here, the `mydata` object is assigned to the `x` argument, because it's the only argument not yet specified.

---
# Argument Matching

Function arguments can also be _partially_ matched, which is useful for interactive work. When R matches a supplied argument, it proceeds in this order:

1. Check for exact match for a named argument.

2. Check for a partial match.

3. Check for a positional match.
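
The matching order above can be seen with a small sketch. The function `area()` and its arguments `width` and `height` are hypothetical, defined only for this illustration.

```r
## Hypothetical function used only to illustrate argument matching
area <- function(width, height = 1) {
        width * height
}
area(2, 3)                   ## positional: width = 2, height = 3
#> [1] 6
area(height = 3, width = 2)  ## exact names, in any order
#> [1] 6
area(h = 3, w = 2)           ## partial names: 'h' -> height, 'w' -> width
#> [1] 6
area(h = 3, 2)               ## 'h' is matched first, then 2 fills 'width' by position
#> [1] 6
```

Partial matching saves typing at the console, but full argument names are easier to read in scripts.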

---
# The `...` Argument

- There is a special argument in R known as the `...` argument, which indicates a variable number of arguments that are usually passed on to other functions.
- The `...` argument is often used when extending another function and you don’t want to copy the entire argument list of the original function.
- For example, a custom mean function may want to make use of the default `mean()` function along with its entire argument list. The function below changes the default for the `na.rm` argument to `na.rm = TRUE` (the original default is `na.rm = FALSE`).

```r
mymean <- function(x, na.rm = TRUE, ...) {
        mean(x, na.rm = na.rm, ...)         ## Pass '...' on to 'mean()'
}
x <- c(1, 2, NA)
mean(x)
mymean(x)
```
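
The sketch below shows the results of these calls and adds one call that actually sends an extra argument through `...`. The vector `y` is made up for this example; `trim` is an existing argument of `mean()` for numeric input.

```r
mymean <- function(x, na.rm = TRUE, ...) {
        mean(x, na.rm = na.rm, ...)
}
x <- c(1, 2, NA)
mean(x)                    ## NA propagates with the default na.rm = FALSE
#> [1] NA
mymean(x)                  ## NA is dropped by the new default
#> [1] 1.5
y <- c(1, 2, 3, 4, 100)
mymean(y, trim = 0.2)      ## 'trim' travels through '...' to mean()
#> [1] 3
```

With `trim = 0.2`, the smallest and the largest of the five values are dropped before averaging, so the outlier 100 has no effect.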

---
# Summary

* Functions can be defined using the `function()` directive and are assigned to R objects just like any other R object

* Functions can be defined with named arguments; these function arguments can have default values

* Function arguments can be specified by name or by position in the argument list

* Functions return the value of the last expression evaluated in the function body (unless `return()` is called earlier)

* A variable number of arguments can be specified using the special `...` argument in a function definition.

---
class: inverse, center, middle

# Looping Functions

---
# Looping Functions

Writing `for` and `while` loops is useful when programming but not particularly easy when working interactively on the command line. Functions that make your life easier:

- `lapply()`: Loop over a list and evaluate a function on each element

- `sapply()`: Same as `lapply` but try to simplify the result

- `apply()`: Apply a function over the margins of an array

- `tapply()`: Apply a function over subsets of a vector

- `mapply()`: Multivariate version of `lapply`

---
# `lapply()`

The `lapply()` function does the following simple series of operations:

1. it loops over a list, iterating over each element in that list.
2. it applies a *function* to each element of the list (a function that you specify).
3. and returns a list (the `l` is for "list").

This function takes three arguments:

1. a list `X`. If `X` is not a list, it will be coerced to a list using `as.list()`.
2. a function (or the name of a function) `FUN`.
3. other arguments via its `...` argument, which are passed on to `FUN`.

---
# `lapply()` Example 1

Here's an example of applying the `mean()` function to all elements of a list.

```r
x <- list(a = 1:5, b = rnorm(10))
lapply(x, mean)
#> $a
#> [1] 3
#> 
#> $b
#> [1] -0.2310251
```

---
# `lapply()` Example 2

```r
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(x, mean)
#> $a
#> [1] 2.5
#> 
#> $b
#> [1] -0.6603277
#> 
#> $c
#> [1] 1.375513
#> 
#> $d
#> [1] 5.067981
```

---
# `lapply()` Example 3

```r
x <- 1:4
lapply(x, runif)
#> [[1]]
#> [1] 0.820082
#> 
#> [[2]]
#> [1] 0.5053963 0.2943039
#> 
#> [[3]]
#> [1] 0.8181448 0.7061302 0.9163055
#> 
#> [[4]]
#> [1] 0.4902765 0.2215913 0.1520512 0.4420643
```

---
# `lapply()`

Now how about other arguments?

```r
x <- 1:4
lapply(x, runif, min = 0, max = 10)
#> [[1]]
#> [1] 4.727666
#> 
#> [[2]]
#> [1] 6.335041 4.961210
#> 
#> [[3]]
#> [1] 7.6208627 0.7170543 9.4952564
#> 
#> [[4]]
#> [1] 1.099728 8.152994 8.465917 4.898108
```

---
# `lapply()` Example 4

- Now we create a list that contains two matrices.

```r
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2)) 
x
```

- How do you extract the first column of each matrix in the list?  
- Now you need an anonymous function for extracting the first column of each matrix.

```r
lapply(x, function(elt) { elt[,1] })
#> $a
#> [1] 1 2
#> 
#> $b
#> [1] 1 2 3
```

---
# `sapply()`

.scroll-output[
- The `sapply()` function behaves similarly to `lapply()`; the only real difference is in the return value.

- `sapply()` will try to simplify the result of `lapply()` if possible.

- Compare `lapply()` and `sapply()`.

```r
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(x, mean)
#> $a
#> [1] 2.5
#> 
#> $b
#> [1] 0.6553135
#> 
#> $c
#> [1] 0.9797015
#> 
#> $d
#> [1] 4.849482
#> 
sapply(x, mean)
#>         a         b         c         d 
#> 2.5000000 0.6553135 0.9797015 4.8494818
```
]
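
What "simplify if possible" means depends on the shape of the individual results. The following sketch uses made-up inputs to show the three typical cases; the name `res` is chosen only for this example.

```r
## Every result has length 1 -> a vector
sapply(1:3, function(i) i^2)
#> [1] 1 4 9
## Every result has the same length (> 1) -> a matrix, one column per element
sapply(1:3, function(i) c(i, i^2))
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    1    4    9
## Results of different lengths -> no simplification, a list is returned
res <- sapply(1:3, seq_len)
is.list(res)
#> [1] TRUE
```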

---
# `apply()`

- Used to evaluate a function over the margins of an array. 
- Often used to apply a function to the rows or columns of a matrix. 
- Using `apply()` is not really faster than writing a loop, but it works in one line and is highly compact.

```r
str(apply)
#> function (X, MARGIN, FUN, ..., simplify = TRUE)
```

The arguments to `apply()` are

- `X` is an array
- `MARGIN` is an integer vector indicating which margins should be “retained”. 
- `FUN` is a function to be applied
- `...` is for other arguments to be passed to `FUN`

---
# `apply()` Example

```r
x <- matrix(rnorm(200), 20, 10)
apply(x, 2, mean)  ## Take the mean of each column
#>  [1] -0.284489060 -0.136846481  0.047266859 -0.120402498  0.239887540 -0.057754509  0.115764520  0.007488874
#>  [9]  0.152149542  0.127937661
apply(x, 1, sum) ## Take the sum of each row
#>  [1]  0.61509317 -2.32550976 -1.40095279 -4.65512633 -2.68393067 -5.38526811 -0.01722650 -3.14950520  2.49976248
#> [10] -2.45571181  3.31807769 -1.07169324  7.77476736  0.20359527  0.05222068  8.35084299 -2.03434083 -0.06800384
#> [19]  2.11837902  2.13457936
```

- Note that in both calls to `apply()`, the return value was a vector of numbers.

- The `MARGIN` argument essentially indicates to `apply()` which dimension of the array you want to preserve or retain.
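
A small matrix makes the role of `MARGIN` easier to check by hand. This sketch uses a 2 x 3 matrix chosen only for illustration.

```r
x <- matrix(1:6, 2, 3)
x
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
apply(x, 1, sum)   ## keep dimension 1 (rows): one value per row
#> [1]  9 12
apply(x, 2, sum)   ## keep dimension 2 (columns): one value per column
#> [1]  3  7 11
```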

---
# Col/Row Sums and Means

For the special case of column/row sums and column/row means of matrices, we have some useful shortcuts.

- `rowSums` = `apply(x, 1, sum)`
- `rowMeans` = `apply(x, 1, mean)`
- `colSums` = `apply(x, 2, sum)`
- `colMeans` = `apply(x, 2, mean)`

The shortcut functions are heavily optimized and hence are _much_ faster, but you probably won’t notice unless you’re using a large matrix. Another nice aspect of these functions is that they are a bit more descriptive. It's arguably more clear to write `colMeans(x)` in your code than `apply(x, 2, mean)`.
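
The equivalence can be checked numerically, and the speed difference can be measured with `system.time()`. This is a rough sketch: the seed and the matrix sizes are arbitrary choices, and timings vary by machine, so no timing output is shown.

```r
set.seed(1)                      ## arbitrary seed, only for reproducibility
x <- matrix(rnorm(200), 20, 10)
all.equal(colMeans(x), apply(x, 2, mean))
#> [1] TRUE
all.equal(rowSums(x), apply(x, 1, sum))
#> [1] TRUE

## Compare run times on a larger matrix (results depend on your machine)
big <- matrix(rnorm(1e6), 1000, 1000)
system.time(colMeans(big))
system.time(apply(big, 2, mean))
```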

---
# Other Ways to Apply

You can do more than take sums and means with the `apply()` function. For example, you can compute quantiles of the rows of a matrix using the `quantile()` function.

```r
x <- matrix(rnorm(200), 20, 10)
## Get row quantiles
apply(x, 1, quantile, probs = c(0.25, 0.75))    
#>           [,1]      [,2]       [,3]        [,4]       [,5]      [,6]       [,7]      [,8]       [,9]      [,10]
#> 25% -1.1357814 0.3882964 -0.3135551 -0.97686215 -0.6176911 0.3508623 -0.6064200 -1.168140 -0.3047590 -0.1910554
#> 75%  0.3679815 0.9727827  0.6410333 -0.08947212  0.8031256 1.2614098  0.1653583  0.929492  0.3605668  0.9078301
#>          [,11]      [,12]      [,13]      [,14]      [,15]      [,16]       [,17]      [,18]      [,19]      [,20]
#> 25% -0.6115150 -0.7865255 -1.0873096 0.07122323 -0.7648704 -0.7381068 -0.34076767 -0.7444912 -0.8504503 -0.9547083
#> 75%  0.5331629  0.9188060  0.1335317 0.48980088  0.1831285  0.5776408  0.01752554  0.7547098  0.9398709  0.4441905
```

Notice that I had to pass the `probs = c(0.25, 0.75)` argument to `quantile()` via the `...` argument to `apply()`.

---
# Summary

* The loop functions in R are very powerful because they allow you to conduct a series of operations on data using a compact form.

* The operation of a loop function involves iterating over an R object (e.g. a list or vector or matrix), applying a function to each element of the object, and then collating the results and returning the collated results.

* Loop functions make heavy use of anonymous functions, which exist for the life of the loop function but are not stored anywhere.

* Other loop functions like `tapply()` and `mapply()` are also very useful (see the references).
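
For a first impression of `tapply()` and `mapply()`, here is a minimal sketch. The vectors `temp` and `city` contain made-up values, not real measurements, and `hottest` is a name chosen only for this example.

```r
## tapply(): apply a function to subsets of a vector defined by a grouping variable
temp <- c(20, 22, 30, 31, 25)                                  ## made-up values
city <- c("Sydney", "Sydney", "Cairns", "Cairns", "Sydney")
hottest <- tapply(temp, city, max)
hottest
#> Cairns Sydney 
#>     31     25 

## mapply(): loop over several arguments in parallel,
## here rep(1, 3), rep(2, 2) and rep(3, 1)
mapply(rep, 1:3, 3:1)
#> [[1]]
#> [1] 1 1 1
#> 
#> [[2]]
#> [1] 2 2
#> 
#> [[3]]
#> [1] 3
```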

---
# Lab Session 3

In this lab, you will use the temperature data in four cities: Melbourne, Sydney, Brisbane and Cairns. You can download them from https://yanfei.site/docs/sc/data/temp.zip.

1. Please make a function `load.file()` to read a .csv file and transform the first column (a character representing date and time) using `as.POSIXlt` into R time format.
2. Then apply `load.file()` to each filename using `lapply()`.
3. How many rows of data are there for each city?
4. What is the hottest temperature recorded for each city?
5. Estimate the autocorrelation function for each city.

---
# References

- [Chapter 1 of my book](https://yanfei.site/docs/statscompbook/R.html) in progress.
- Chapters 14, 15 and 18 of the book *R Programming for Data Science* by Roger D. Peng.