Bayesian Statistics and Computing

---

# Bayesian Statistics and Computing

## Lecture 3: Control structures and functions

#### *Yanfei Kang | BSC | 2021 Spring*

---

# Commonly used control structures

- `if` and `else`: testing a condition and acting on it

- `for`: execute a loop a fixed number of times

- `while`: execute a loop _while_ a condition is true

- `repeat`: execute an infinite loop (must `break` out of it to stop)

- `break`: break the execution of a loop

- `next`: skip an iteration of a loop

---
# `if`-`else`

The `if`-`else` combination is probably the most commonly used control
structure in R (or perhaps any language). For starters, you can just use the `if` statement.

```r
if(<condition>) {
        ## do something
} 
## Continue with rest of code
```
---
# `if`-`else`

If you have an
action you want to execute when the condition is false, then you need
an `else` clause.
```r
if(<condition>) {
        ## do something
} else {
        ## do something else
}
```

---
# `if`-`else`

You can have a series of tests by following the initial `if` with any
number of `else if`s.
```r
if(<condition1>) {
        ## do something
} else if(<condition2>)  {
        ## do something different
} else {
        ## do something different
}
```
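
As a minimal sketch of such a chain, the example below labels a temperature value. The variable names and cut-offs are made up for illustration. The conditions are tested in order, and only the branch of the first one that is true is run.

```r
## Hypothetical temperature in degrees Celsius
temp <- 25
if (temp < 10) {
    label <- "cold"
} else if (temp < 30) {
    ## Reached only when temp >= 10
    label <- "mild"
} else {
    label <- "hot"
}
print(label)
#> [1] "mild"
```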
---
# `if`-`else` Example

```r
## Generate a uniform random number
x <- runif(1, 0, 10)
if (x > 3) {
    y <- 10
} else {
    y <- 0
}
```

Or you can write:

```r
y <- if (x > 3) {
    10
} else {
    0
}
```

---
# `for` Loops

.scroll-output[
For loops are most commonly used for
iterating over the elements of an object (list, vector, etc.)

```r
for (i in 1:10) {
    print(i)
}
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9
#> [1] 10
```
]

---
# `for` Loops Example

```r
x <- c("a", "b", "c", "d")
for (i in 1:4) {
    ## Print out each element of 'x'
    print(x[i])
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
```

---
# `for` Loops Example

The `seq_along()` function is commonly used in conjunction with `for`
loops in order to generate an integer sequence based on the length of
an object (in this case, the object `x`).

```r
## Generate a sequence based on length of 'x'
for (i in seq_along(x)) {
    print(x[i])
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
```
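
One reason to prefer `seq_along(x)` over `1:length(x)` is the empty case. A small sketch, using a made-up empty vector `empty` so that `x` is left untouched:

```r
empty <- character(0)
1:length(empty)    ## counts down from 1 to 0
#> [1] 1 0
seq_along(empty)   ## nothing to iterate over
#> integer(0)

## The loop body is never run for an empty object
n_iter <- 0
for (i in seq_along(empty)) {
    n_iter <- n_iter + 1
}
n_iter
#> [1] 0
```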

---
# `for` Loops Example

It is not necessary to use an index-type variable.

```r
for (letter in x) {
    print(letter)
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
#> [1] "d"
```

---
# Nested `for` loops

`for` loops can be nested inside of each other.

```r
x <- matrix(1:6, 2, 3)
for(i in seq_len(nrow(x))) {
        for(j in seq_len(ncol(x))) {
                print(x[i, j])
        }   
}
```

Nested loops are commonly needed for multidimensional or hierarchical
data structures (e.g. matrices, lists).

---
# `while` Loops

While loops begin by testing a condition. If it is true, then they
execute the loop body. Once the loop body is executed, the condition
is tested again, and so forth, until the condition is false, after
which the loop exits.

```r
count <- 0
while (count < 10) {
    print(count)
    count <- count + 1
}
#> [1] 0
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9
```

---
# `repeat` Loops

`repeat` initiates an infinite loop right from the start. The only way to exit a `repeat` loop is to
call `break`.

```r
x0 <- 1
tol <- 1e-08
repeat {
    ## computeEstimate() is a placeholder; supply your own update step
    x1 <- computeEstimate()
    
    if (abs(x1 - x0) < tol) {
        ## Close enough?
        break
    } else {
        x0 <- x1
    }
}
```
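
The skeleton above does not run as written because `computeEstimate()` is not defined. Here is a runnable sketch, under the assumption that the estimate is refined by Newton's method for the square root of 2. The iteration cap `max_iter` is an added safeguard and not part of the original example.

```r
x0 <- 1
tol <- 1e-08
max_iter <- 100   ## assumed safety cap against a truly infinite loop
iter <- 0
repeat {
    iter <- iter + 1
    ## Newton update for solving x^2 = 2
    x1 <- (x0 + 2 / x0) / 2
    if (abs(x1 - x0) < tol || iter >= max_iter) {
        ## Close enough, or out of iterations
        break
    }
    x0 <- x1
}
x1
#> [1] 1.414214
```

Capping the number of iterations guarantees that the loop stops even if the estimates never converge.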

---
# `next`

`next` is used to skip an iteration of a loop.

```r
for (i in 1:100) {
    if (i <= 20) {
        ## Skip the first 20 iterations
        next
    }
    ## Do something here
}
```

---
# `break`

`break` is used to exit a loop immediately, regardless of what
iteration the loop may be on.

```r
for (i in 1:100) {
    print(i)
    if (i >= 20) {
        ## Stop loop after 20 iterations
        break
    }
}
```

---
# Summary

- Control structures like `if`, `while`, and `for` allow you to
  control the flow of an R program

- Infinite loops should generally be avoided, even if (you believe)
  they are theoretically correct.

- Control structures mentioned here are primarily useful for writing
  programs; for command-line interactive work, the "apply" functions
  are more useful.
 
---
class: inverse, center, middle
# Functions

---
# Functions

- A transition from a mere "user" to a developer!

- Often used to encapsulate a sequence of expressions that need to be executed numerous times, perhaps under slightly different conditions.

- Often written when code must be shared with others or the public.

- When to Write a Function?

---
# Your First Function

Functions are defined using the `function()` directive and are stored
as R objects just like anything else. In particular, they are R
objects of class "function".

```r
f <- function() {
    cat("Hello, world!\n")
}
f()
#> Hello, world!
```

---
# Function arguments

*Function arguments* let the caller control what a function does. Here the argument `num` sets how many times the message is printed.

```r
f <- function(num) {
    for (i in seq_len(num)) {
        cat("Hello, world!\n")
    }
}
f(3)
#> Hello, world!
#> Hello, world!
#> Hello, world!
```

---
# Example

```r
f <- function(num) {
    hello <- "Hello, world!\n"
    for (i in seq_len(num)) {
        cat(hello)
    }
    chars <- nchar(hello) * num
    chars
}
meaningoflife <- f(3)
#> Hello, world!
#> Hello, world!
#> Hello, world!
print(meaningoflife)
#> [1] 42
```

---
# Default Values

Try this:

```r
f()
```

Calling `f()` without an argument fails, because the function needs `num` and `num` has no default value. We can change this by setting a *default value* for the argument `num`.

```r
f <- function(num = 1) {
    hello <- "Hello, world!\n"
    for (i in seq_len(num)) {
        cat(hello)
    }
    chars <- nchar(hello) * num
    chars
}
f()  ## Use default value for 'num'
f(2)  ## Use user-specified value
```

---
# Default Values

At this point, we have written a function that

* has one *formal argument* named `num` with a *default value* of 1. The _formal arguments_ are the arguments included in the function definition. The `formals()` function returns a list of all the formal arguments of a function.

* prints the message "Hello, world!" to the console a number of times indicated by the argument `num`

* *returns* the number of characters printed to the console
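
A short sketch of how to check these three points, repeating the definition of `f` from the previous slide. The variable name `n_chars` is only for illustration.

```r
f <- function(num = 1) {
    hello <- "Hello, world!\n"
    for (i in seq_len(num)) {
        cat(hello)
    }
    nchar(hello) * num
}
## Inspect the formal arguments and the default value
names(formals(f))
#> [1] "num"
formals(f)$num
#> [1] 1
## The message is printed; the character count is returned
n_chars <- f(2)
#> Hello, world!
#> Hello, world!
n_chars
#> [1] 28
```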

---
# Argument Matching

By default, R matches arguments by position: the first value goes to the first argument, the second value to the second argument, and so on. So in the following call to `rnorm()`

```r
str(rnorm)
#> function (n, mean = 0, sd = 1)
mydata <- rnorm(100, 2, 1)  ## Generate some data
```

100 is assigned to the `n` argument, 2 is assigned to the `mean` argument, and 1 is assigned to the `sd` argument, all by positional matching.

```r
## Positional match first argument, default for 'na.rm'
sd(mydata)
#> [1] 1.036166
## Specify 'x' argument by name, default for 'na.rm'
sd(x = mydata)
#> [1] 1.036166
```

---
# Argument Matching

When specifying the function arguments by name, it doesn't matter in what order you specify them.

```r
## Specify both arguments by name
sd(na.rm = FALSE, x = mydata)
#> [1] 1.036166
```

You can mix positional matching with matching by name.

```r
sd(na.rm = FALSE, mydata)
#> [1] 1.036166
```

Here, the `mydata` object is assigned to the `x` argument, because it's the only argument not yet specified.

---
# Argument Matching

Function arguments can also be _partially_ matched, which is useful for interactive work. The order of operations when given an argument is

1. Check for exact match for a named argument.

2. Check for a partial match.

3. Check for a positional match.
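
A small sketch of partial matching with `rnorm()`, whose arguments are `n`, `mean` and `sd`. The seed value is arbitrary and is only there so that both calls draw the same numbers.

```r
## 'me' partially matches 'mean'; 's' partially matches 'sd'
set.seed(1)
a <- rnorm(3, me = 2, s = 1)
## The same call with the names spelled out in full
set.seed(1)
b <- rnorm(3, mean = 2, sd = 1)
identical(a, b)
#> [1] TRUE
```

In scripts, spelling the names out in full is the safer habit, since an abbreviation only works while it matches exactly one argument.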

---
# The `...` Argument

- There is a special argument in R known as the `...` argument, which indicates a variable number of arguments that are usually passed on to other functions.
- The `...` argument is often used when extending another function and you don’t want to copy the entire argument list of the original function.
- For example, a custom mean function may want to make use of the default `mean()` function along with its entire argument list. The function below changes the default for the `na.rm` argument to `na.rm = TRUE` (the original default is `na.rm = FALSE`).

```r
mymean <- function(x, na.rm = TRUE, ...) {
        mean(x, na.rm = na.rm, ...)         ## Pass '...' on to 'mean()'
}
x <- c(1, 2, NA)
mean(x)
mymean(x)
```
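
To see `...` at work, here is a sketch that forwards the `trim` argument of `mean()` through the wrapper. The data vector `y` is made up for illustration.

```r
mymean <- function(x, na.rm = TRUE, ...) {
    mean(x, na.rm = na.rm, ...)
}
y <- c(1, 2, 3, 100, NA)
mean(y)                   ## NA is not removed by default
#> [1] NA
mymean(y)                 ## NA removed: (1 + 2 + 3 + 100) / 4
#> [1] 26.5
mymean(y, trim = 0.25)    ## 'trim' travels through '...' to mean()
#> [1] 2.5
```

`mymean()` never mentions `trim` in its own definition, yet the caller can still use it.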

---
# Summary

* Functions can be defined using the `function()` directive and are assigned to R objects just like any other R object

* Functions can be defined with named arguments; these function arguments can have default values

* Function arguments can be specified by name or by position in the argument list

* Functions always return the last expression evaluated in the function body

* A variable number of arguments can be specified using the special `...` argument in a function definition.

---
class: inverse, center, middle

# Looping Functions

---
# Looping Functions

Writing `for` and `while` loops is useful when programming but not
particularly easy when working interactively on the command line.
Functions that make your life easier:

- `lapply()`: Loop over a list and evaluate a function on each element

- `sapply()`: Same as `lapply` but try to simplify the result

- `apply()`: Apply a function over the margins of an array

- `tapply()`: Apply a function over subsets of a vector

- `mapply()`: Multivariate version of `lapply`
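
`tapply()` and `mapply()` are not covered in detail in this lecture, so here is a brief sketch of each. The temperatures and city labels are made-up values.

```r
## tapply(): apply a function to subsets of a vector defined by a grouping
temp <- c(21, 35, 28, 30, 19)
city <- c("Sydney", "Sydney", "Cairns", "Cairns", "Sydney")
max_by_city <- tapply(temp, city, max)
max_by_city
#> Cairns Sydney 
#>     30     35 

## mapply(): loop over several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)
sums
#> [1] 5 7 9
```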

---
# `lapply()`

The `lapply()` function does the following simple series of operations:

1. it loops over a list, iterating over each element in that list.
2. it applies a *function* to each element of the list (a function that you specify).
3. and returns a list (the `l` is for "list").

This function takes three arguments:

1. a list `X`.
2. a function (or the name of a function) `FUN`.
3. other arguments via its `...` argument, which are passed on to `FUN`.

If `X` is not a list, it will be coerced to a list using `as.list()`.
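
The series of operations above can be sketched as an explicit loop. The list `x` and the name `out` are made up for illustration; this mirrors what `lapply()` does conceptually, not how it is implemented internally.

```r
x <- list(a = 1:5, b = c(2, 4, 6))
## Explicit loop: preallocate a list, then fill it element by element
out <- vector("list", length(x))
names(out) <- names(x)
for (i in seq_along(x)) {
    out[[i]] <- mean(x[[i]])
}
## Same result in one call
identical(out, lapply(x, mean))
#> [1] TRUE
## A plain vector is coerced to a list first; the result is still a list
is.list(lapply(1:3, sqrt))
#> [1] TRUE
```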

---
# `lapply()` Example 1

Here's an example of applying the `mean()` function to all elements of a list.

```r
x <- list(a = 1:5, b = rnorm(10))
lapply(x, mean)
#> $a
#> [1] 3
#> 
#> $b
#> [1] -0.1000435
```

---
# `lapply()` Example 2

```r
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(x, mean)
#> $a
#> [1] 2.5
#> 
#> $b
#> [1] -0.5675725
#> 
#> $c
#> [1] 1.416292
#> 
#> $d
#> [1] 4.90232
```

---
# `lapply()` Example 3

```r
x <- 1:4
lapply(x, runif)
#> [[1]]
#> [1] 0.6485655
#> 
#> [[2]]
#> [1] 0.8712539 0.9314336
#> 
#> [[3]]
#> [1] 0.4651773 0.2021731 0.4976222
#> 
#> [[4]]
#> [1] 0.06183679 0.10366678 0.01746363 0.07295415
```

---
# `lapply()`

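Now how about other arguments? Extra arguments given to `lapply()` are passed on to the function through `...`. Here `min` and `max` go to `runif()`.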

```r
x <- 1:4
lapply(x, runif, min = 0, max = 10)
#> [[1]]
#> [1] 8.221799
#> 
#> [[2]]
#> [1] 4.7108634 0.4580967
#> 
#> [[3]]
#> [1] 5.998154 1.579201 5.045083
#> 
#> [[4]]
#> [1] 0.8484653 8.7082456 5.4318032 7.3086981
```

---
# `lapply()` Example 4

- Now we create a list that contains two matrices.

```r
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
x
```

- How do you extract the first column of each matrix in the list?  
- Now you need an anonymous function for extracting the first column of each matrix.

```r
lapply(x, function(elt) {
    elt[, 1]
})
#> $a
#> [1] 1 2
#> 
#> $b
#> [1] 1 2 3
```

---
# `sapply()`

.scroll-output[
- The `sapply()` function behaves similarly to `lapply()`; the only real difference is in the return value.

- `sapply()` will try to simplify the result of `lapply()` if possible.

- Compare `lapply()` and `sapply()`.

```r
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(x, mean)
sapply(x, mean)
#> $a
#> [1] 2.5
#> 
#> $b
#> [1] -0.2040602
#> 
#> $c
#> [1] 0.7473551
#> 
#> $d
#> [1] 4.998614
#> 
#>          a          b          c          d 
#>  2.5000000 -0.2040602  0.7473551  4.9986140
```
]
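
What "simplify if possible" means depends on the lengths of the individual results. A small sketch with made-up inputs:

```r
x <- list(a = 1:4, b = c(10, 20, 30))

## Every result has length 1: simplified to a named vector
v <- sapply(x, mean)
names(v)
#> [1] "a" "b"

## Every result has the same length (> 1): simplified to a matrix
m <- sapply(x, range)
dim(m)
#> [1] 2 2

## Results have different lengths: no simplification, a list is returned
l <- sapply(1:3, seq_len)
is.list(l)
#> [1] TRUE
```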

---
# `apply()`

- Used to evaluate a function over the margins of an array. 
- Often used to apply a function to the rows or columns of a matrix. 
- Using `apply()` is not really faster than writing a loop, but it works in one line and is highly compact.

```r
str(apply)
#> function (X, MARGIN, FUN, ...)
```

The arguments to `apply()` are

- `X` is an array
- `MARGIN` is an integer vector indicating which margins should be “retained”. 
- `FUN` is a function to be applied
- `...` is for other arguments to be passed to `FUN`

---
# `apply()` Example

```r
x <- matrix(rnorm(200), 20, 10)
apply(x, 2, mean)  ## Take the mean of each column
#>  [1] -0.286316012  0.115644854 -0.001762943  0.251713612 -0.096297975 -0.088209138 -0.077962401 -0.367332965
#>  [9] -0.141669153 -0.024868715
apply(x, 1, sum)  ## Take the sum of each row
#>  [1] -1.55065950 -1.40328132 -3.59529339 -0.04404654 -5.87146664  2.52399690 -7.93115986 -0.71577629  0.32184832
#> [10] -2.53119217  2.14521484  1.01312004 -2.71837139 -6.85548022 -0.88882545 -0.98437231  5.56520753  6.44686949
#> [19]  2.00105067  0.73140056
```

- Note that in both calls to `apply()`, the return value was a vector of numbers.

- The `MARGIN` argument essentially indicates to `apply()` which dimension of the array you want to preserve or retain.
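
The idea of "retaining" a margin is easiest to see with more than two dimensions. A sketch with a made-up 2 x 3 x 4 array, keeping the first two dimensions and collapsing the third:

```r
a <- array(1:24, dim = c(2, 3, 4))
## Keep dimensions 1 and 2; sum over dimension 3
s <- apply(a, c(1, 2), sum)
dim(s)
#> [1] 2 3
s[1, 1]   ## 1 + 7 + 13 + 19
#> [1] 40
```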

---
# Col/Row Sums and Means

For the special case of column/row sums and column/row means of matrices, we have some useful shortcuts.

- `rowSums` = `apply(x, 1, sum)`
- `rowMeans` = `apply(x, 1, mean)`
- `colSums` = `apply(x, 2, sum)`
- `colMeans` = `apply(x, 2, mean)`

The shortcut functions are heavily optimized and hence are _much_ faster, but you probably won’t notice unless you’re using a large matrix. Another nice aspect of these functions is that they are a bit more descriptive. It's arguably clearer to write `colMeans(x)` in your code than `apply(x, 2, mean)`.
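
A quick check that a shortcut and its `apply()` counterpart agree. The seed is arbitrary and only makes the sketch reproducible.

```r
set.seed(42)
x <- matrix(rnorm(200), 20, 10)
## Column means, computed both ways
all.equal(colMeans(x), apply(x, 2, mean))
#> [1] TRUE
## Row sums, computed both ways
all.equal(rowSums(x), apply(x, 1, sum))
#> [1] TRUE
```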

---
# Other Ways to Apply

You can do more than take sums and means with the `apply()` function. For example, you can compute quantiles of the rows of a matrix using the `quantile()` function.

```r
x <- matrix(rnorm(200), 20, 10)
## Get row quantiles
apply(x, 1, quantile, probs = c(0.25, 0.75))
#>           [,1]        [,2]       [,3]       [,4]       [,5]       [,6]      [,7]       [,8]       [,9]      [,10]
#> 25% 0.01130106 -0.01995317 -0.2378423 -0.8439313 -0.8822925 -1.1280650 -0.295448 -0.1768166 -0.7035448 -0.6820594
#> 75% 0.62482585  0.98427844  0.8321674  0.9929153  0.4540076  0.4197237  1.214313  0.7833231  0.1471374  0.5762199
#>          [,11]      [,12]      [,13]      [,14]      [,15]      [,16]       [,17]      [,18]     [,19]       [,20]
#> 25% -0.4848463 -0.3366767 -0.8381989 -0.2835923 -0.7302496 -1.3950466 -1.05817115 -1.2686266 0.4284382 -0.54293782
#> 75%  0.4395252  0.3859112  0.5867467  1.2757423  0.8432561  0.5818128  0.09094471  0.1413441 1.1834047  0.08463423
```

Notice that I had to pass the `probs = c(0.25, 0.75)` argument to `quantile()` via the `...` argument to `apply()`.

---
# Summary

* The loop functions in R are very powerful because they allow you to conduct a series of operations on data using a compact form.

* The operation of a loop function involves iterating over an R object (e.g. a list or vector or matrix), applying a function to each element of the object, and then collating the results and returning the collated results.

* Loop functions make heavy use of anonymous functions, which exist for the life of the loop function but are not stored anywhere.

* Other loop functions like `tapply()` and `mapply()` are also very useful (see the references).

---
# Lab Session 3

In this lab, you will use the temperature data in four cities: Melbourne, Sydney, Brisbane and Cairns. You can download them from https://yanfei.site/docs/sc/data/temp.zip.

1. Please make a function `load.file()` to read a .csv file and transform the first column (a character representing date and time) using `as.POSIXlt` into R time format.
2. Then apply `load.file()` to each filename using `lapply()`.
3. How many rows of data are there for each city?
4. What is the hottest temperature recorded by city?
5. Estimate the autocorrelation function for each city.

---
# References

- [Chapter 1 of my book](https://yanfei.site/docs/statscompbook/R.html) in progress.
- Chapters 14, 15 and 18 of the book "R programming for data science".