School of Economics and Management
Beihang University
http://yanfei.site
R has a number of ways to indicate to you that something’s not right.
message
: A generic notification/diagnostic message produced by the message() function; execution of the function continues.
warning
: An indication that something is wrong but not necessarily fatal; execution of the function continues. Warnings are generated by the warning() function.
error
: An indication that a fatal problem has occurred and execution of the function stops. Errors are produced by the stop() function.
condition
: A generic concept for indicating that something unexpected has occurred; programmers can create their own custom conditions if they want.
Try log(-1).
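The three kinds of conditions can be told apart programmatically. The sketch below is illustrative only: signal_condition() and classify() are hypothetical helpers that do not appear in the slides, and tryCatch() is just one of several ways to intercept a condition.

```r
# Hypothetical helper: signals a condition of the requested kind.
signal_condition <- function(kind) {
  switch(kind,
    message = message("just letting you know"),
    warning = warning("something looks off"),
    error   = stop("cannot continue")
  )
  "finished"  # reached only when execution continues
}

# Hypothetical helper: reports which kind of condition was signalled first.
classify <- function(expr) {
  tryCatch(
    {
      expr            # the argument is evaluated here, inside tryCatch()
      "no condition"
    },
    message = function(m) "message",
    warning = function(w) "warning",
    error   = function(e) "error"
  )
}

classify(signal_condition("message"))
classify(signal_condition("warning"))
classify(signal_condition("error"))
classify(log(-1))  # log(-1) returns NaN and signals a warning
classify(log(1))

# Messages and warnings do not stop the function: it still returns its value.
after_message <- suppressMessages(signal_condition("message"))
after_warning <- suppressWarnings(signal_condition("warning"))
```

Only the error case prevents signal_condition() from reaching its last line; with the message or warning silenced, the function runs to completion.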
printmessage <- function(x) {
  if (x > 0)
    print("x is greater than zero")
  else
    print("x is less than or equal to zero")
  invisible(x)
}
Now try printmessage(1) and printmessage(NA). What happened? How to fix this problem?
printmessage2 <- function(x) {
  if (is.na(x))
    print("x is a missing value!")
  else if (x > 0)
    print("x is greater than zero")
  else
    print("x is less than or equal to zero")
  invisible(x)
}
Now try printmessage2(NA).
Now try the following and see what happens.
x <- log(c(-1, 2))
printmessage2(x)
There are two ways to fix this problem: disallow vector arguments, or vectorize the printmessage2() function to allow it to take vector arguments.
For the first solution, we check the length of the input.
printmessage3 <- function(x) {
  if (length(x) > 1L)
    stop("'x' has length > 1")
  if (is.na(x))
    print("x is a missing value!")
  else if (x > 0)
    print("x is greater than zero")
  else
    print("x is less than or equal to zero")
  invisible(x)
}
Now try printmessage3(1:2).
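Because printmessage3() signals its error with stop(), a caller can intercept it instead of letting the script halt. A minimal sketch, assuming tryCatch() is an acceptable way to handle the error in your code:

```r
printmessage3 <- function(x) {
  if (length(x) > 1L)
    stop("'x' has length > 1")
  if (is.na(x))
    print("x is a missing value!")
  else if (x > 0)
    print("x is greater than zero")
  else
    print("x is less than or equal to zero")
  invisible(x)
}

# The error handler receives the condition object; conditionMessage()
# extracts the text that was passed to stop().
result <- tryCatch(
  printmessage3(1:2),
  error = function(e) conditionMessage(e)
)
result
```

Here result holds the error text rather than the function's return value, so the calling code can decide what to do next.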
For the second solution, vectorizing the function can be accomplished easily with the Vectorize() function.
printmessage4 <- Vectorize(printmessage2)
out <- printmessage4(c(-1, 2))
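Vectorize() wraps the function so it is called once per element. A different approach, not taken in the slides, is to write the logic with operations that already work on whole vectors. The sketch below uses a hypothetical describe_sign() that returns labels instead of printing them.

```r
# Hypothetical alternative: ifelse() and is.na() already operate
# element-wise, so no per-element wrapper is needed.
describe_sign <- function(x) {
  ifelse(is.na(x), "missing",
         ifelse(x > 0, "positive", "non-positive"))
}

labels <- describe_sign(c(-1, NA, 2))
labels
```

Returning the labels rather than printing them also makes the result easier to test and reuse.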
The primary task of debugging any R code is correctly diagnosing what the problem is. Some basic questions you need to ask are
R provides a number of tools to help you with debugging your code.
traceback()
: prints out the function call stack after an error occurs; does nothing if there’s no error
debug()
: flags a function for “debug” mode, which allows you to step through execution of a function one line at a time
browser()
: suspends the execution of a function wherever it is called and puts the function in debug mode
trace()
: allows you to insert debugging code into a function at specific places
recover()
: allows you to modify the error behavior so that you can browse the function call stack
These functions are interactive tools specifically designed to allow you to pick through a function. There's also the more blunt technique of inserting print() or cat() statements in the function.
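The blunt technique can look like the following sketch. scaled_log() is a hypothetical function made up for this illustration; the cat() calls are temporary diagnostics that you would delete once the problem is found.

```r
# Hypothetical function instrumented with temporary cat() calls.
scaled_log <- function(z) {
  cat("scaled_log(): z =", z, "\n")   # value on entry
  r <- log(z)
  cat("scaled_log(): r =", r, "\n")   # intermediate result
  if (r < 10) r^2 else r^3
}

out <- scaled_log(2)
```

Each call now prints the input and the intermediate value before returning, which is often enough to spot where a computation goes wrong.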
traceback()
The traceback() function prints out the function call stack after an error has occurred.
For example, a function a() may subsequently call function b(), which calls c() and then d(). If an error occurs, it may not be immediately clear in which function the error occurred. The traceback() function shows you how many levels deep you were when the error occurred.
mean(x)
traceback()
f <- function(x) {
  r <- x - g(x)
  r
}
g <- function(y) {
  r <- y * h(y)
  r
}
h <- function(z) {
  r <- log(z)
  if (r < 10)
    r^2
  else
    r^3
}
Try f(-1).
Then call traceback().
For more detailed debugging, you can turn to the debug() function.
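traceback() is meant for interactive use. In a script, a similar picture of the call stack can be recorded at the moment of the error. The sketch below is one possible approach and not something the slides prescribe: it uses withCallingHandlers() together with sys.calls(), and the call_stack variable is a name invented for this example.

```r
f <- function(x) {
  r <- x - g(x)
  r
}
g <- function(y) {
  r <- y * h(y)
  r
}
h <- function(z) {
  r <- log(z)
  if (r < 10) r^2 else r^3
}

call_stack <- NULL
error_text <- tryCatch(
  withCallingHandlers(
    suppressWarnings(f(-1)),   # silence the NaN warning from log(-1)
    error = function(e) {
      # A calling handler runs before the stack unwinds,
      # so the frames of f(), g() and h() are still visible.
      call_stack <<- sys.calls()
    }
  ),
  error = function(e) conditionMessage(e)
)

# One line of text per call on the stack
stack_text <- vapply(as.list(call_stack),
                     function(cl) deparse(cl)[1],
                     character(1))
error_text
stack_text
```

The recorded stack also contains the calls made by tryCatch() and the handlers themselves, so look for f(-1), g(x) and h(y) among the entries.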
debug()
The debug() function initiates an interactive debugger (also known as the "browser" in R) for a function.
The debug() function takes a function as its first argument. For debugging the f() function, try debug(f).
Now, every time you call the f() function it will launch the interactive debugger. To turn this behavior off you need to call the undebug() function.
The debugger calls the browser at the very top level of the function body. From there you can step through each expression in the body. There are a few special commands you can call in the browser:
n executes the current expression and moves to the next expression
c continues execution of the function and does not stop until either an error or the function exits
Q quits the browser
Try f(-1).
In the browser, you can use ls() to see what is in your current environment (the function environment) and print() to print out the values of R objects in the function environment.
When you are done, turn off interactive debugging with the undebug() function.
SS <- function(mu, x) {
  d <- x - mu
  d2 <- d^2
  ss <- sum(d2)
  ss
}
debug(SS)
SS(1, rnorm(100))
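It is easy to forget that a function is still flagged for debugging. As a small sketch, isdebugged() reports the current state, so you can confirm that undebug() took effect before calling the function again.

```r
SS <- function(mu, x) {
  d <- x - mu
  d2 <- d^2
  ss <- sum(d2)
  ss
}

debug(SS)
flag_on <- isdebugged(SS)    # TRUE: the next call would open the browser
undebug(SS)
flag_off <- isdebugged(SS)   # FALSE: SS() runs normally again

total <- SS(1, c(1, 2, 3))   # 0^2 + 1^2 + 2^2 = 5
total
```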
recover()
The recover() function can be used to modify the error behavior of R when an error occurs. Normally, when an error occurs in a function, R will print out an error message, exit out of the function, and return you to your workspace to await further commands.
With recover() you can tell R that when an error occurs, it should halt execution at the exact point at which the error occurred. That can give you the opportunity to poke around in the environment in which the error occurred. This can be useful to see if there are any R objects or data that have been corrupted or mistakenly modified.
options(error = recover)  ## Change default R error behavior
f(-1)
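Setting options(error = recover) changes the behavior for the rest of the session. A minimal sketch of how to undo it, relying on the fact that options() returns the previous values; old_opts is a name chosen for this example.

```r
# Remember the previous setting while switching to recover()
old_opts <- options(error = recover)

# ... run the failing call interactively here, for example f(-1) ...

# Put the original error behavior back
options(old_opts)
restored <- identical(getOption("error"), old_opts$error)
restored
```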
The recover() function will first print out the function call stack when an error occurs. Then, you can choose to jump around the call stack and investigate the problem. When you choose a frame number, you will be put in the browser (just like the interactive debugger triggered with debug()) and will have the ability to poke around.
There are three main indications of a problem: message, warning, and error; only an error is fatal.
The interactive debugging tools traceback, debug, browser, trace, and recover can be used to find problematic code in functions.
The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.
—Donald Knuth
Design first, then optimize
Remember: Premature optimization is the root of all evil
Measure (collect data), don’t guess.
If you're going to be a scientist, you need to apply the same principles here!
system.time()
The system.time() function computes the time (in seconds) needed to execute an expression. It returns the user, system, and elapsed times.
The elapsed time may be greater than the user time if the CPU spends a lot of time waiting around.
## Elapsed time > user time
system.time(readLines("http://www.jhsph.edu"))
##    user  system elapsed
##   0.042   0.005   4.956
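The same effect can be reproduced without a network connection. In this sketch Sys.sleep() stands in for waiting on a slow resource, which is an assumption made for illustration; the individual timings are then read from the returned object by name.

```r
# Sleeping uses almost no CPU, so user time stays near zero
# while elapsed (wall clock) time covers the whole wait.
timing <- system.time(Sys.sleep(0.2))

class(timing)                        # the result is a "proc_time" object
user_time    <- timing[["user.self"]]
elapsed_time <- timing[["elapsed"]]
c(user = user_time, elapsed = elapsed_time)
```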
The elapsed time may be smaller than the user time if your machine has multiple cores/processors (and is capable of using them).
library(plyr)
library(doMC)
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
registerDoMC(cores = detectCores())
system.time(aaply(1:10000, 1, function(x) rnorm(1, mean = x), .parallel = TRUE))
##    user  system elapsed
##   3.790   0.745   2.518
You can time longer expressions by wrapping them in curly braces within the call to system.time().
system.time({
  n <- 1000
  r <- numeric(n)
  for (i in 1:n) {
    x <- rnorm(n)
    r[i] <- mean(x)
  }
})
##    user  system elapsed
##   0.105   0.004   0.109
If your expression is getting pretty long (more than 2 or 3 lines), it might be better to either break it into smaller pieces or to use the profiler. The problem is that if the expression is too long, you won't be able to identify which part of the code is causing the bottleneck.
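Breaking an expression into pieces might look like the sketch below, where data generation, an explicit loop, and a built-in alternative are each timed on their own. The variable names and the choice of colMeans() as the comparison are assumptions made for this example.

```r
n <- 1000

# Piece 1: generate the data
t_generate <- system.time(x <- matrix(rnorm(n * n), nrow = n))

# Piece 2: column means with an explicit loop
t_loop <- system.time({
  r <- numeric(n)
  for (i in 1:n) r[i] <- mean(x[, i])
})

# Piece 3: the same result with a built-in function
t_colmeans <- system.time(r2 <- colMeans(x))

# Elapsed seconds for each piece, side by side
elapsed <- c(generate = t_generate[["elapsed"]],
             loop     = t_loop[["elapsed"]],
             colMeans = t_colmeans[["elapsed"]])
elapsed
same_result <- all.equal(r, r2)
```

Timing each piece separately shows which step dominates, and the final check confirms that the faster version computes the same values.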
Using system.time() allows you to test certain functions or code blocks to see if they are taking excessive amounts of time.
However, this assumes that you already know where the problem is and can call system.time() on that piece of code.
The Rprof() function starts the profiler in R.
Together with Rprof(), we will use the summaryRprof() function, which summarizes the output from Rprof().
Rprof() keeps track of the function call stack at regularly sampled intervals and tabulates how much time is spent inside each function.
By default, the profiler writes its output to a file called Rprof.out. You can specify the name of the output file if you don't want to use this default.
Once you call the Rprof() function, everything that you do from then on will be measured by the profiler.
The profiler is turned off by passing NULL to Rprof().
Now let's play around with some data with the profiler running.
Rprof()
data(diamonds, package = "ggplot2")
plot(price ~ carat, data = diamonds)
m <- lm(price ~ carat, data = diamonds)
abline(m, col = "red")
Rprof(NULL)
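If you would rather not leave an Rprof.out file in the working directory, the output file can be named explicitly. This is a sketch under assumptions: busy_sort() is a hypothetical workload invented to keep the CPU occupied long enough to be sampled, and the 0.01 second interval is an arbitrary choice.

```r
# Hypothetical workload: keeps sorting random numbers for a fixed duration
busy_sort <- function(seconds) {
  start <- proc.time()[["elapsed"]]
  while (proc.time()[["elapsed"]] - start < seconds) {
    sort(runif(1e5))
  }
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)  # write samples to our own file
busy_sort(0.5)
Rprof(NULL)                        # turn the profiler off

written <- file.exists(prof_file)
raw_lines <- readLines(prof_file)
head(raw_lines, 3)                 # raw samples, not very readable as is
```

The raw file is hard to read directly, which is why its contents are normally passed to summaryRprof().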
summaryRprof()
The summaryRprof() function tabulates the R profiler output and calculates how much time is spent in which function. There are two methods for normalizing the data.
"by.total" divides the time spent in each function by the total run time
"by.self" does the same as "by.total" but first subtracts out time spent in the functions that the current function calls (those above it in the call stack).
Now try summaryRprof() and interpret the results.
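The difference between the two tables is easiest to see with a function that does little work itself and delegates to another. In this sketch, inner_work() and outer_work() are hypothetical functions written for the illustration, and the amount of work is an arbitrary choice meant to give the profiler enough samples.

```r
# Hypothetical functions: outer_work() only loops, inner_work() does the computing
inner_work <- function(n) sort(runif(n))
outer_work <- function(reps, n) {
  for (i in seq_len(reps)) inner_work(n)
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
outer_work(reps = 100, n = 2e5)
Rprof(NULL)

prof <- summaryRprof(prof_file)
names(prof)          # includes "by.self" and "by.total"
head(prof$by.self)   # time spent in each function's own code
head(prof$by.total)  # time including everything the function called
```

You would expect outer_work() to rank near the top of by.total, because everything runs inside it, but to account for little of by.self, since the actual computation happens in the functions it calls.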
profvis is a tool for helping you to understand how R spends its time.
library(profvis)
profvis({
  data(diamonds, package = "ggplot2")
  plot(price ~ carat, data = diamonds)
  m <- lm(price ~ carat, data = diamonds)
  abline(m, col = "red")
})
Note that RStudio includes integrated support for profiling with profvis.
Rprof() runs the profiler for performance analysis of R code
summaryRprof() summarizes the output of Rprof() and gives the percent of time spent in each function (with two types of normalization)
It is good to break your code into functions so that the profiler can give useful information about where time is being spent
RStudio includes integrated support for profiling with profvis
C or Fortran code is not profiled
In this lab, write your own code, try out the debugging and profiling tools on it, and write a short report on how you optimized it.
Chapters 20 and 21 of the book "R Programming for Data Science".