class: left, bottom, inverse, title-slide

# Bayesian Statistics and Computing
## Lecture 4: Data Visualization in R
### Yanfei Kang
### 2020/02/01 (updated: 2020-02-25)

---
class: inverse, center, middle

#
> "The simple graph has brought more information to the data analyst’s mind than any other device."
> — John Tukey

---

# Objectives

1. Basic graphics with **R**
2. Elegant graphics with **ggplot2**

---
class: inverse, center, middle

# Basic graphics with R

---

# `mpg` dataframe

`mpg` (shipped with **ggplot2**) contains observations collected by the US Environmental Protection Agency on 38 car models. You can see more details via `?mpg`.

Among the variables in `mpg` are:

1. `displ`, a car’s engine size, in litres.
2. `hwy`, a car's fuel efficiency on the highway, in miles per gallon (mpg). A car with a low fuel efficiency consumes more fuel than a car with a high fuel efficiency when they travel the same distance.
3. ...

---

# Basic plots

Try the example below and read the help page for `plot` (`?plot`).

.pull-left[
```r
library(ggplot2)  ## provides the mpg data
plot(mpg$displ, mpg$hwy)
## fit the regression on mpg and add the fitted line
abline(lm(hwy ~ displ, data = mpg))
title("Regression of MPG on engine size")
```
]

.pull-right[
<!-- -->
]

---

# Histograms and density plots

.pull-left[
##### Histograms

```r
hist(mpg$hwy)
```

<!-- -->
]

.pull-right[
##### Density plots

```r
d <- density(mpg$hwy)  ## returns the density data
plot(d)
```

<!-- -->
]

---

# Pie chart

.pull-left[
```r
car.table <- table(mpg$manufacturer)
pie(car.table)
```
]

.pull-right[
<!-- -->
]

---

# Boxplots

```r
## Boxplot of MPG
boxplot(mpg$hwy, main = "Boxplot of MPG")
```

<!-- -->

```r
## Boxplot of MPG by Car Cylinders
boxplot(hwy ~ cyl, data = mpg,
        main = "Car Mileage Data",
        xlab = "Number of Cylinders",
        ylab = "Miles Per Gallon")
```

<!-- -->

---

# Correlation plot

.pull-left[
```r
library(corrplot)
M <- cor(mtcars)
corrplot(M, addCoef.col = "grey")
```
]

.pull-right[
<!-- -->
]

---

# Time series

```r
plot(AirPassengers)
```

<!-- -->

---

# Multiple plots

```r
par(mfrow = c(1, 2))  ## one row, two columns of plots
plot(AirPassengers)
boxplot(ggplot2::mpg$hwy, main = "Boxplot of MPG")
```

<!-- -->

---
class: inverse, center, middle

# Elegant graphics with ggplot2

---

# Creating a ggplot

```r
library(ggplot2)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
```

<!-- -->

---

# Creating a ggplot

```
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
```

1. With ggplot2, you begin a plot with the function `ggplot()`. `ggplot()` creates a coordinate system that you can add layers to.
2. You complete your graph by adding one or more layers to `ggplot()`. For example, `geom_point()` adds a layer of points to your plot, which creates a scatterplot. You can specify the color, size and shape of these points. Each geom function in ggplot2 takes a `mapping` argument, which defines how variables in the data are mapped to visual properties of the plot.

---

# Change the color, size and shape of points

.pull-left[
```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy),
             color = 2, size = 3, shape = 18)
```
]

.pull-right[
<!-- -->
]

---

# Facets

.pull-left[
- Another way to show additional variables, particularly useful for categorical variables, is to split your plot into facets: subplots that each display one subset of the data.
- To facet your plot by a single variable, use `facet_wrap()`. The first argument should be a formula, which you create with `~` followed by a variable name.

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~class, nrow = 2)
```
]

.pull-right[
<!-- -->
]

---

# Facets

.pull-left[
To facet your plot on the combination of two variables, add `facet_grid()` to your plot call.
```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
```
]

.pull-right[
<!-- -->
]

---

# Pairwise plots

.pull-left[
```r
library(GGally)
ggpairs(subset(mtcars, select = c(1, 3, 4, 5, 6)))
```
]

.pull-right[
<!-- -->
]

---

# Geometric objects

```r
## left
p1 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

## right
p2 <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy))

## put the two plots in one row
gridExtra::grid.arrange(p1, p2, ncol = 2)
```

---

# Geometric objects

<!-- -->

---

# Geometric objects

- A **geom** is the geometrical object that a plot uses to represent data.
- Bar charts use bar geoms, line charts use line geoms, boxplots use boxplot geoms, and so on.
- You can also add multiple geoms to the same plot.

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
```

---

# Bar chart

.pull-left[
The following chart displays the number of cars in the `mpg` dataset, grouped by `drv` (type of drive train).

```r
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = drv, fill = drv))
```
]

.pull-right[
<!-- -->
]

---

# Histogram

.pull-left[
```r
ggplot(data = mpg) +
  geom_histogram(mapping = aes(x = hwy))
```
]

.pull-right[
<!-- -->
]

---

# Boxplots

.pull-left[
```r
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()
```
]

.pull-right[
<!-- -->
]

---

# Time series

```r
library(ggplot2)
library(forecast)  ## provides autoplot() for ts objects
autoplot(AirPassengers)
```

<!-- -->

---

# Summary

- We have seen basic **R** plots and plots from **ggplot2**.
- In this course, you will also use animated plots (e.g., GIFs) and interactive plots.

---

# References

[R documentation of **ggplot2**.](https://www.rdocumentation.org/packages/ggplot2/versions/3.2.1)
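---

# Appendix: sharing one mapping across layers

The two-layer plot under *Geometric objects* repeats the same `aes()` call in both geoms. Here is a minimal sketch of an alternative, assuming the same `mpg` data: a mapping passed to `ggplot()` is inherited by every layer, while a mapping given inside a geom applies to that layer only. The object name `p` and the `color = class` mapping are illustrative choices, not part of the lecture's own examples.

```r
library(ggplot2)

## x and y are declared once in ggplot() and inherited by both layers
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  ## layer-specific mapping: only the points are colored by class
  geom_point(mapping = aes(color = class)) +
  ## inherits x and y only, so a single smooth curve is drawn
  geom_smooth()

p
```

Keeping the shared mapping in `ggplot()` means a change of variable (say, `cty` instead of `hwy`) is made in one place rather than in every layer.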