layout: true

---
class: inverse, center, middle
background-image: url(../figs/titlepage16-9.png)
background-size: cover

<br>
<br>
# Bayesian Statistics and Computing
## Lecture 4: Data Visualization in R

<img src="../figs/slides.png" width="150px"/>

#### *Yanfei Kang | BSC | 2021 Spring*

---
class: inverse, center, middle

#

> "The simple graph has brought more information to the data analyst’s mind than any other device."
> — John Tukey

---
# Objectives

1. basic graphics with **R**
2. elegant graphics with **ggplot2**

---
class: inverse, center, middle

# Basic graphics with R

---
# `mpg` dataframe

`mpg` contains observations collected by the US Environmental Protection Agency on 38 models of car. You can see more details via `?mpg`. Among the variables in `mpg` are:

1. `displ`, a car’s engine size, in litres.

2. `hwy`, a car's fuel efficiency on the highway, in miles per gallon (mpg). A car with a low fuel efficiency consumes more fuel than a car with a high fuel efficiency when they travel the same distance.

3. ...

---
# Basic plots

Practice the plots below and consult the help page of `plot` (`?plot`).

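As a starting point for that practice, here is a minimal sketch of a few `plot()` arguments documented in `?plot` and `?par`. The labels, point symbol, and colour are illustrative choices, not part of the original slides.

```r
library(ggplot2)  # only needed here for the mpg data frame

## Help pages worth opening in an interactive session:
## ?plot
## ?par

## Scatterplot with axis labels, a title, and a custom point style
plot(mpg$displ, mpg$hwy,
     xlab = "Engine displacement (litres)",
     ylab = "Highway miles per gallon",
     main = "hwy against displ",
     pch = 19,           # solid circle (illustrative)
     col = "steelblue")  # illustrative colour
```
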
.pull-left[

```r
library(ggplot2)
plot(mpg$displ, mpg$hwy)
abline(lm(hwy ~ displ, data = mpg))
title("Regression of MPG on engine size")
```
]

.pull-right[
<img src="figure/unnamed-chunk-1-1.png" width="504" style="display: block; margin: auto;" />
]

---
# Histograms and density plots

.pull-left[
##### Histograms

```r
hist(mpg$hwy)
```

<img src="figure/unnamed-chunk-2-1.png" width="504" style="display: block; margin: auto;" />
]

.pull-right[
##### Density plots

```r
d <- density(mpg$hwy)  ## returns the density data
plot(d)
```

<img src="figure/unnamed-chunk-3-1.png" width="504" style="display: block; margin: auto;" />
]

---
# Pie chart

.pull-left[

```r
car.table <- table(mpg$manufacturer)
pie(car.table)
```
]

.pull-right[
<img src="figure/unnamed-chunk-4-1.png" width="504" style="display: block; margin: auto;" />
]

---
# Boxplots

```r
## Boxplot of MPG
boxplot(mpg$hwy, main = "Boxplot of MPG")
## Boxplot of MPG by Car Cylinders
boxplot(hwy ~ cyl, data = mpg, main = "Car Mileage Data",
        xlab = "Number of Cylinders", ylab = "Miles Per Gallon")
```

<img src="figure/unnamed-chunk-5-1.png" width="504" style="display: block; margin: auto;" /><img src="figure/unnamed-chunk-5-2.png" width="504" style="display: block; margin: auto;" />

---
# Correlation plot

.pull-left[

```r
library(corrplot)
M <- cor(mtcars)
corrplot(M, addCoef.col = "grey")
```
]

.pull-right[
<img src="figure/unnamed-chunk-6-1.png" width="504" style="display: block; margin: auto;" />
]

---
# Time series

```r
plot(AirPassengers)
```

<img src="figure/unnamed-chunk-7-1.png" width="504" style="display: block; margin: auto;" />

---
# Multiple plots

```r
## Split the plotting device into 1 row and 2 columns
par(mfrow = c(1, 2))
plot(AirPassengers)
## A plain vector needs no `data` argument
boxplot(ggplot2::mpg$hwy, main = "Boxplot of MPG")
```

<img src="figure/unnamed-chunk-8-1.png" width="720" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# Elegant graphics with ggplot2

---
# Creating a ggplot

```r
library(ggplot2)
ggplot(data = mpg) +
  geom_point(mapping =
    aes(x = displ, y = hwy))
```

<img src="figure/unnamed-chunk-9-1.png" width="504" style="display: block; margin: auto;" />

---
# Creating a ggplot

```
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
```

1. With ggplot2, you begin a plot with the function `ggplot()`. `ggplot()` creates a coordinate system that you can add layers to.

2. You complete your graph by adding one or more layers to `ggplot()`. For example, `geom_point()` adds a layer of points to your plot, which creates a scatterplot. You can specify the color, size and shape of these points. Each geom function in ggplot2 takes a `mapping` argument, which defines how variables in the data are mapped to visual properties and is always paired with `aes()`.

---
# Change the color, size and shape of points

.pull-left[

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy),
             color = 2, size = 3, shape = 18)
```
]

.pull-right[
<img src="figure/unnamed-chunk-10-1.png" width="504" style="display: block; margin: auto;" />
]

---
# Facets

.pull-left[
- Another way to add variables to a plot, particularly useful for categorical variables, is to split it into facets: subplots that each display one subset of the data.

- To facet your plot by a single variable, use `facet_wrap()`. The first argument should be a formula, which you create with `~` followed by a variable name.

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~class, nrow = 2)
```
]

.pull-right[
<img src="figure/unnamed-chunk-12-1.png" width="504" style="display: block; margin: auto;" />
]

---
# Facets

.pull-left[
To facet your plot on the combination of two variables, add `facet_grid()` to your plot call. Its first argument is also a formula, this time with two variable names separated by `~`.

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
```
]

.pull-right[
<img src="figure/unnamed-chunk-14-1.png" width="504" style="display: block; margin: auto;" />
]

---
# Pairwise plots

.pull-left[

```r
library(GGally)
ggpairs(subset(mtcars, select = c(1, 3, 4, 5, 6)))
```
]

.pull-right[
<img src="figure/unnamed-chunk-15-1.png" width="504" style="display: block; margin: auto;" />
]

---
# Geometric objects

```r
## left
p1 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
## right
p2 <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
## put the two plots in one row
gridExtra::grid.arrange(p1, p2, ncol = 2)
```

---
# Geometric objects

<img src="figure/unnamed-chunk-16-1.png" width="864" style="display: block; margin: auto;" />

---
# Geometric objects

- A **geom** is the geometrical object that a plot uses to represent data.

- Bar charts use bar geoms, line charts use line geoms, boxplots use boxplot geoms, and so on.

- You can also add multiple geom functions.

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
```

---
# Bar chart

.pull-left[
The following chart displays the total number of cars in the `mpg` dataset, grouped by `drv`.

```r
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = drv, fill = drv))
```
]

.pull-right[
<img src="figure/unnamed-chunk-18-1.png" width="504" style="display: block; margin: auto;" />
]

---
# Histogram

.pull-left[

```r
ggplot(data = mpg) +
  geom_histogram(mapping = aes(x = hwy))
```
]

.pull-right[
<img src="figure/unnamed-chunk-19-1.png" width="504" style="display: block; margin: auto;" />
]

---
# Boxplots

.pull-left[

```r
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()
```
]

.pull-right[
<img src="figure/unnamed-chunk-20-1.png" width="504" style="display: block; margin: auto;" />
]

---
# Time series

```r
library(ggplot2)
library(forecast)
autoplot(AirPassengers)
```

<img src="figure/unnamed-chunk-21-1.png" width="504" style="display: block; margin: auto;" />

---
# Summary

- We have seen basic **R** plots and plots from **ggplot2**.

- In this course, you will also use animated plots (e.g., gifs), interactive plots, etc.

---
# References

[R documentation of **ggplot2**.](https://www.rdocumentation.org/packages/ggplot2/versions/3.2.1)