Yanfei Kang

yanfeikang@buaa.edu.cn

School of Economics and Management

Beihang University

- How to install and use
**R** **R**basics**R**coding style- Some nice
**R**tips

`install.packages(c("forecast", "sos", "formatR"))`

- Standard R comes with some standard packages installed for basic data management, analysis, and graphical tools.
- More than 10,000 packages available on CRAN! See http://cran.r-project.org.
`install.packages('forecast')`

to install an package called ‘forecast’.`library(forecast)`

before using the package.

How to install and use **R**

- Totally free!
- Download R on its official website.
- A new major version of R comes out once a year, and there are 2-3 minor releases each year.

- Direct
**R** - Rstudio
- One of the most popular ways to run R.
- Free, open-source integrated development environment (IDE) for R.
- Many additional fantastic features.
- Updated a couple of times a year.

- Command line in Linux and Unix.

- What editor do you usually use?
- Use a good text editor such as vim, sublime text, text wrangler, notepad, etc
- With syntax highlighting, otherwise, it’s hard to detect errors
- Or use an Integrated Development Environment (IDE) like RStudio

- Syntax highlighting
- Able to evaluate R code
- by line
- by selection
- entire file

- Command auto-completion

**R** basics

```
# simple maths
1 + 2 + 3
1 + 2 * 3
# assign a value to a variable
x <- 1
y <- 2
z <- c(x,y)
z
# function examples
exp(1)
cos(3.141593)
log2(1)
```

- Numerical vectors
- Logical vectors
- Character vectors
- Length of a vector
- Vector calculations
- Extract some elements of a vector

```
# vectors
c(0, 1, 1, 2, 3, 5, 8)
1:10
seq(1, 9, 2)
rep(1, 10)
length(rep(1, 10))
# character vectors
c("Hello world", "Hello R interpreter")
# vector calculation
c(1, 2, 3, 4) + c(10, 20, 30, 40)
c(1, 2, 3, 4) + 1
# you can refer to elements by location in a vector
b <- c(1,2,3,4,5,6,7,8,9,10,11,12)
length(b)
b
b[7]
b[1:6]
b[c(1,6,11)]
b > 5
b[b > 5]
```

- Create a matrix:
`matrix()`

- Dimension of a matrix:
`dim()`

- Transpose of a matrix:
`t()`

- Extract elements from a matrix.
- Combine two or more matrices:
`rbind()`

,`cbind()`

```
# create a matrix
m <- matrix(c(1:6), 2, 3)
n <- matrix(c(8:13), 2, 3)
dim(m)
t(m)
m[1, 2]
m[1, ]
cbind(m, n)
rbind(m, n)
```

- Special data structure that matrix could not handle.
- Data length are not the same.
- Data type are not the same.

- Create a list:
`list()`

- Extract elements of a list:
`[[]]`

or`$`

`l <- list(a = c(1, 2), b = 'apple')`

`data.frame()`

: tightly coupled collections of variables which share many of the properties of matrices and of lists, used as the fundamental data structure by most of R’s modeling software.- In most cases, the operation with a data frame is similar to matrix operation.

```
L3 <- LETTERS[1:3]
fac <- sample(L3, 10, replace = TRUE)
d <- data.frame(x = 1, y = 1:10, fac = fac)
```

- Create a function

```
f <- function(x, y) {
z <- c(x + 1, y + 1)
return(z)
}
f(1, 2)
```

- Load the function:
`source()`

- Execute your function
- When should you write a function?

Syntax

`if (condition){ do something } else { do something }`

Example

```
x <- 0
if (x > 1) {
print('x is larger than 1')
} else {
print('x is not larger than 1')
}
```

- Example

```
x <- 1:10
for(i in x) {
print(i^2)
}
```

- Write a function
`MySummary()`

where the input argument is x can be any vector and the output is a list that contains the basic summary (mean, variance, length, max and minimum values) of the vector you have supplied to the function. - Test your function with some vectors (that you make up by yourself).

```
MySummary <- function(x){
x.mean <- mean(x)
x.var <- var(x)
x.length <- length(x)
x.max <- max(x)
x.min <- min(x)
x.summary <- list(mean = x.mean,
var = x.var,
length = x.length,
max = x.max,
min = x.min)
return(x.summary)
}
# some tests
MySummary(c(1:10))
MySummary(seq(1, 100, 5))
```

```
MySummary <- function(x){
# firstly create a list
x.summary <- list()
# assigning values
x.summary$mean <- mean(x)
x.summary$var <- var(x)
x.summary$length <- length(x)
x.summary$max <- max(x)
x.summary$min <- min(x)
return(x.summary)
}
# some tests
MySummary(c(1:10))
MySummary(seq(1, 100, 5))
```

**R** coding style

- File names should end in .R and, of course, be
*meaningful*. - GOOD:
`predict_ad_revenue.R`

- BAD:
`foo.R`

- The preferred form for variable names is all lower case letters and words separated with dots (variable.name), but variableName is also accepted. Generally, variable names should be nouns.
- GOOD: avg.clicks
- OK: avgClicks
- BAD: avg_Clicks

- Function names have initial capital letters and no dots. Function names are mostly verbs.
- GOOD: CalculateAvgClicks
- BAD: calculate_avg_clicks , calculateAvgClicks

- Choose a consistent naming style

- Don’t use underscores (_) or hyphens (-).
- Avoid using names of existing functions and variables like
`mean`

,`median`

etc. - Avoid using meaningless names like a, b, c, …, aa, bb, cc, …

- around operators (=, +, -, <-, etc)
- put a space after a comma, and never before

```
x <- c(1:10)
x.average<-mean(x,na.rm=TRUE)
```

\(\Rightarrow\)

`x.average <- mean(x, na.rm = TRUE)`

- split long lines at meaningful places

Don’t be afraid of splitting one long line into individual pieces!

`n <- matrix(sample(1:100, 9), nrow = 3, ncol = 3, byrow = TRUE)`

\(\Rightarrow\)

```
n <- matrix(sample(1:100, 9),
nrow = 3,
ncol = 3,
byrow = TRUE)
```

- An opening curly brace should never go on its own line and should always be followed by a new line.
- A closing curly brace should always go on its own line, unless it’s followed by else.
- Always begin the body of a block on a new line.
- Always indent the code inside curly braces.

`if (y < 0) {print("y is negative")}`

\(\Rightarrow\)

```
if (y < 0) {
print("y is negative")
}
```

- Use two spaces
- Can help in detecting errors in your code because it can expose lack of symmetry
- Reindenting using RStudio

```
if (y < 0) {
print("y is negative")
}
```

\(\Rightarrow\)

```
if (y < 0) {
print("y is negative")
}
```

- Reformat and reindent in Rstudio.
**formatR**package in**R**. You can even make a folder of`.R`

files tidy using`tidy_dir()`

.

- Add a Header for your file
- Add lots of comments
- Use blank lines to separate blocks of code and comments to say what the block does. Remember that in a few months, you may not follow your own code any better than a stranger.

```
x <- c(1:10)
x.mean = mean(x)
x.var = var(x)
```

\(\Rightarrow\)

```
# =============================================
# Title
# Author: Yanfei Kang
# Date: Mar 23, 2017
# Description: your purpose
# =============================================
x <- c(1:10)
# getting the mean of x
x.mean = mean(x)
# getting the variance of x
x.var = var(x)
```

- Functions should contain a comments section immediately below the function definition line.
- These comments should include
- a one-sentence description of the function
- a list of the function’s arguments, denoted by Args:, with a description of each (including the data type)
- a description of the return value, denoted by Returns:.
- The comments should be descriptive enough that a caller can use the function without reading any of the function’s code.

```
CalculateSampleCovariance <- function(x, y, verbose = TRUE) {
# Computes the sample covariance between two vectors.
#
# Args:
# x: One of two vectors whose sample covariance is to be calculated.
# y: The other vector. x and y must have the same length, greater than one,
# with no missing values.
# verbose: If TRUE, prints sample covariance; if not, not. Default is TRUE.
#
# Returns:
# The sample covariance between x and y.
n <- length(x)
# Error handling
if (n <= 1 || n != length(y)) {
stop("Arguments x and y have different lengths: ",
length(x), " and ", length(y), ".")
}
if (TRUE %in% is.na(x) || TRUE %in% is.na(y)) {
stop(" Arguments x and y must not have missing values.")
}
covariance <- var(x, y)
if (verbose)
cat("Covariance = ", round(covariance, 4), ".\n", sep = "")
return(covariance)
}
```

Some nice **R** tips

- Functions in installed packages

```
library(forecast)
help.search("auto.arima")
??auto.arima
```

- Functions in other CRAN packages

```
library(sos)
findFn("arima")
RSiteSearch("arima")
```

- Type
`?sort`

for the usage of the function`sort()`

. - Typing the name of a function gives its definition.
- Type
`forecast:::estmodel`

for hidden functions. - Download the tar.gz file from CRAN if you want to see any underlying
**C**or**Fortran**code.

- Every paper, book or scientific report is a ‘project’.
- Every project has its own folder and R workspace.
- Every project is entirely scripted. That is, all analysis, graphs and tables must be able to be generated by running one script.
- This script sources all other R files in the correct order and yields all the required results. This script could be in
`main.R`

or`main.Rmd`

. `functions.R`

contains all non-packaged functions used in the project.- each function can not be too long.

- This script sources all other R files in the correct order and yields all the required results. This script could be in

- For programming questions: StackOverflow.com
- For statistical questions: CrossValidated.com

- RStudio blog: blog.rstudio.org
- R-bloggers: www.r-bloggers.com
- It takes time to develop your own style. Once it is developed, it is really hard to be changed. So please be careful at the beginning.

Write a function to solve the roots of given quadratic equation \(ax^2 + bx + c = 0\) with \(a\), \(b\) and \(c\) as input arguments.

Test your function on some simple equations.

Keep in mind the styles we have learnt.

```
FindRoots <- function(a = 1, b = 1, c = 0) {
# Computes the roots of a quadratic equation.
#
# Args:
# a: The coeffienct of x^2. Default is 1.
# b: The coeffienct of x. Default is 1.
# c: The constant. Default is 0.
#
# Returns:
# Roots.
if (b ^ 2 - 4 * a * c < 0) {
return("No real roots")
} else {
root1 <- (-b + sqrt(b ^ 2 - 4 * a * c)) / (2 * a)
root2 <- (-b - sqrt(b ^ 2 - 4 * a * c)) / (2 * a)
roots <- c(root1, root2)
return(roots)
}
}
# some tests
FindRoots() # using default values of arguments
FindRoots(1, 2, 0)
FindRoots(1, 2, 3)
```

- Write two messy
`.R`

files and put them in a folder. - Use
`tidy_dir()`

to make them tidy.