Load packages

Run the following code in your console to install packages:

if (!require(htmlwidgets)) install.packages("htmlwidgets")
if (!require(wordcloud2)) install.packages("wordcloud2")
if (!require(GGally)) install.packages("GGally")
if (!require(plotly)) install.packages("plotly")
if (!require(ggmap)) install.packages("ggmap")
if (!require(maps)) install.packages("maps")
if (!require(networkD3)) install.packages("networkD3")
if (!require(webshot)) install.packages("webshot")
if (!require(tidytext)) install.packages("tidytext")
if (!require(gapminder)) install.packages("gapminder")

Load packages:

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(gapminder))
suppressPackageStartupMessages(library(wordcloud2))
suppressPackageStartupMessages(library(GGally))
suppressPackageStartupMessages(library(scales))
suppressPackageStartupMessages(library(plotly))
suppressPackageStartupMessages(library(ggmap))
suppressPackageStartupMessages(library(maps))
suppressPackageStartupMessages(library(networkD3))
suppressPackageStartupMessages(library(tidytext))

Specialized Plot Types

Maps

Raster maps with ggmap

Steps to using ggmap:

  1. Load a map with get_map.
    • Special cases: get_googlemap (satellite? road? your choice.), get_cloudmademap, get_stamenmap
    • Google Maps now requires an API key! See the ggmap GitHub page for details.
  2. Feed that into the ggmap function to display the map.
    • To draw on top of the map, pass your ggplot call to the base_layer argument of ggmap, then add your layers as usual outside of the ggmap call.

Example:

# crime data from `ggmap` package.
small_crime <- sample_n(crime, 10000)
bbox <- c(
    left   = -96, 
    bottom = 29.5, 
    right  = -95, 
    top    = 30
)
(basemap <- ggmap::get_stamenmap(bbox, maptype = "toner-lite", zoom = 9) %>% 
        ggmap::ggmap())
## Map from URL : http://tile.stamen.com/toner-lite/9/119/211.png
## Map from URL : http://tile.stamen.com/toner-lite/9/120/211.png
## Map from URL : http://tile.stamen.com/toner-lite/9/119/212.png
## Map from URL : http://tile.stamen.com/toner-lite/9/120/212.png

basemap + 
    geom_density2d(aes(lon, lat), 
                   data   = small_crime, 
                   colour = "red")
## Warning: Removed 218 rows containing non-finite values (stat_density2d).

Exercise: Modify the following code to retrieve and plot a map of UBC. Lower-left GPS coordinates (latitude, longitude): 49.241284, -123.273007; upper-right GPS coordinates: 49.281994, -123.216366. WARNING: I don’t recommend trying zoom > 14 (it’ll be slow).

c(-123.273007, 49.241284, -123.216366, 49.281994) %>% 
    get_stamenmap(maptype = "toner-lite", zoom=13) %>% 
    ggmap()
## Map from URL : http://tile.stamen.com/toner-lite/13/1290/2803.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1291/2803.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1292/2803.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1290/2804.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1291/2804.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1292/2804.png
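The exercise gives each corner as (latitude, longitude), while the bounding box passed to get_stamenmap is ordered left, bottom, right, top (that is, longitude first). A minimal sketch of that reordering in base R follows; the helper name corners_to_bbox is hypothetical and not part of ggmap.

```r
# Hypothetical helper (not a ggmap function): turn two (lat, lon) corners
# into the left/bottom/right/top vector used for the bounding box.
corners_to_bbox <- function(lower_left, upper_right) {
    c(
        left   = lower_left[["lon"]],
        bottom = lower_left[["lat"]],
        right  = upper_right[["lon"]],
        top    = upper_right[["lat"]]
    )
}

# UBC corners as given in the exercise.
ubc_bbox <- corners_to_bbox(
    lower_left  = c(lat = 49.241284, lon = -123.273007),
    upper_right = c(lat = 49.281994, lon = -123.216366)
)
ubc_bbox
```

The resulting named vector can be piped into the map-retrieval call in place of the unnamed c(...) used above.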

Vector maps

Maps are depicted as polygons. The maps package stores lots of polygon data, which can be plotted with maps::map().

maps::map("world")

maps::map("world", region="canada")

maps::map("france")

Get the data in tidy tibble format with ggplot2::map_data(), ready to use with ggplot(). Example:

(world_dat <- ggplot2::map_data("world") %>% 
     as_tibble())
## # A tibble: 99,338 x 6
##     long   lat group order region subregion
##  * <dbl> <dbl> <dbl> <int> <chr>  <chr>    
##  1 -69.9  12.5     1     1 Aruba  <NA>     
##  2 -69.9  12.4     1     2 Aruba  <NA>     
##  3 -69.9  12.4     1     3 Aruba  <NA>     
##  4 -70.0  12.5     1     4 Aruba  <NA>     
##  5 -70.1  12.5     1     5 Aruba  <NA>     
##  6 -70.1  12.6     1     6 Aruba  <NA>     
##  7 -70.0  12.6     1     7 Aruba  <NA>     
##  8 -70.0  12.6     1     8 Aruba  <NA>     
##  9 -69.9  12.5     1     9 Aruba  <NA>     
## 10 -69.9  12.5     1    10 Aruba  <NA>     
## # ... with 99,328 more rows

Notice the order of the points. Notice the groups. Use geom_polygon:

ggplot(world_dat, aes(long, lat)) +
    geom_polygon(aes(group=group))
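To see why the group column matters, here is a small sketch with made-up data: two unit squares stored in the same long/lat/group layout as map_data() output. The data frame squares is an assumption for illustration only.

```r
library(ggplot2)

# Toy data (not from the maps package): two separate squares,
# with points listed in drawing order within each group.
squares <- data.frame(
    long  = c(0, 1, 1, 0,   2, 3, 3, 2),
    lat   = c(0, 0, 1, 1,   0, 0, 1, 1),
    group = rep(c(1, 2), each = 4)
)

# With the group aesthetic, each square is drawn as its own polygon.
p_grouped <- ggplot(squares, aes(long, lat)) +
    geom_polygon(aes(group = group))

# Without it, all eight points are joined into a single polygon,
# which connects the two squares with stray edges.
p_ungrouped <- ggplot(squares, aes(long, lat)) +
    geom_polygon()

p_grouped
p_ungrouped
```

Shuffling the rows within a group scrambles the outline in the same way, which is why the order column is worth keeping intact.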

Exercise: Make a plot of Italy.

map_data("italy") %>% 
    ggplot(aes(long, lat)) +
    geom_polygon(aes(group=group), fill="red", colour="black") +
    coord_equal()
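Because map_data() returns an ordinary data frame, other variables can be joined on and mapped to fill. The sketch below joins 2007 life expectancy from gapminder onto the world polygons. It assumes the country names match the region column, which is not always true (for example, the map uses "USA"), so unmatched regions are left as NA.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# One row per country: life expectancy in 2007.
life_2007 <- gapminder %>%
    filter(year == 2007) %>%
    mutate(region = as.character(country)) %>%
    select(region, lifeExp)

# left_join keeps every polygon point, in its original order.
world_life <- map_data("world") %>%
    left_join(life_2007, by = "region")

ggplot(world_life, aes(long, lat)) +
    geom_polygon(aes(group = group, fill = lifeExp)) +
    coord_quickmap()
```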

Plotly

The plotly R package makes highlight-interactivity possible.

Consider the following plot:

(p <- gapminder %>% 
     filter(continent != "Oceania") %>% 
     ggplot(aes(gdpPercap, lifeExp)) +
     geom_point(aes(colour=pop), alpha=0.2) +
     scale_x_log10(labels=dollar_format()) +
     scale_colour_distiller(
         trans   = "log10",
         breaks  = 10^(1:10),
         labels  = comma_format(),
         palette = "Greens"
     ) +
     facet_wrap(~ continent) +
     scale_y_continuous(breaks=10*(1:10)) +
     theme_bw())

  1. Convert it to a plotly object by applying the ggplotly() function:
ggplotly(p)
  2. You can save a plotly graph locally as an html file. Try saving the above:
    • NOTE: plotly graphs don’t seem to show up in Rmd notebooks, but they do with Rmd documents.
p %>% 
    ggplotly() %>% 
    htmlwidgets::saveWidget("LOCATION_GOES_HERE")
  3. Run this code to see the JSON format underneath:
p %>% 
    ggplotly() %>% 
    plotly_json()
  4. Check out this code, which makes a plotly object from scratch using plot_ly(): a scatterplot of lifeExp vs gdpPercap.
plot_ly(gapminder, 
        x = ~gdpPercap, 
        y = ~lifeExp, 
        type = "scatter",
        mode = "markers",
        opacity = 0.2) %>% 
    layout(xaxis = list(type = "log"))
  5. Add population as a z-axis to make a 3D plot:
plot_ly(gapminder,
        x = ~gdpPercap,
        y = ~lifeExp,
        z = ~pop,
        type = "scatter3d",
        mode = "markers",
        opacity = 0.2)
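The saving step above leaves the file location as a placeholder. As a minimal sketch, the following writes a small plotly widget to a temporary directory. The mtcars plot and the file name plotly-demo.html are stand-ins chosen for illustration, and selfcontained = FALSE is used here on the assumption that pandoc may not be available.

```r
library(ggplot2)
library(plotly)

# Stand-in plot (not the gapminder plot above), kept small for speed.
p_small <- ggplot(mtcars, aes(wt, mpg)) +
    geom_point()

# Write the widget to a temporary location; supporting JS/CSS files
# go into a folder next to the html file when selfcontained = FALSE.
out_file <- file.path(tempdir(), "plotly-demo.html")
htmlwidgets::saveWidget(ggplotly(p_small), out_file, selfcontained = FALSE)

file.exists(out_file)
```

Open the resulting file in a browser to check that the interactivity survived the export.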

Wordclouds

The wordcloud2 package is excellent for creating word clouds (with some interactivity!).

Check out the vignette or the wordcloud2 README. Here’s an example from the latter:

NOTE: It appears wordcloud2 only permits one plot to be viewed at a time.

# demoFreq %>% 
#     filter(freq > 1) %>%
#     wordcloud2(size = 1, minRotation = -pi/2, maxRotation = -pi/2, shape="circle")

Notice:

  • arrangement of words on the plane
  • angle of words
  • colour
  • size
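For reference, wordcloud2() expects a data frame with the words in one column and their frequencies in the next. Here is a minimal sketch using invented counts; the toy_freq data is an assumption for illustration, not taken from demoFreq.

```r
library(wordcloud2)

# Made-up word counts: first column is the word, second its frequency.
toy_freq <- data.frame(
    word = c("map", "plot", "network", "word", "cloud"),
    freq = c(12, 9, 6, 4, 2),
    stringsAsFactors = FALSE
)

# size scales the fonts; shape controls the overall outline of the cloud.
wc <- wordcloud2(toy_freq, size = 0.5, shape = "circle")
wc
```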

Exercise: What am I saying a lot of in this document (aside from stopwords)? A minimal amount of wrangling is done for you.

(this_file <- read_file("lec4-worksheet-complete.Rmd") %>% 
     strsplit(" ") %>% 
     `[[`(1) %>% 
     as_tibble() %>% 
     count(value) %>% 
     setNames(c("word", "freq")) %>% 
     mutate(word = tolower(word)) %>% 
     anti_join(get_stopwords()) %>% 
     filter(!(word %in% c("\n", "")),
            !str_detect(word, pattern="-|[0-9]|%>%|=|\\+|\\\\")) %>% 
     arrange(desc(freq)))
## Joining, by = "word"
## # A tibble: 357 x 2
##    word       freq
##    <chr>     <int>
##  1 data          6
##  2 package       5
##  3 plot          5
##  4 `ggmap`       4
##  5 code          4
##  6 columns       3
##  7 example       3
##  8 following     3
##  9 format        3
## 10 make          3
## # ... with 347 more rows
wordcloud2(this_file, 
           minRotation = -pi/2, 
           maxRotation = -pi/2, 
           size = 0.5)
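As an alternative to splitting on spaces by hand, tidytext::unnest_tokens() can do the tokenizing; it lowercases and strips punctuation before counting, so "Plot" and "plot" are tallied together. This is a sketch on two invented sentences (the text_df tibble is an assumption), not a rerun of the worksheet file.

```r
library(dplyr)
library(tidytext)

# Toy input standing in for the text of a document.
text_df <- tibble(text = c("Maps are plots of maps", "The plot shows a map"))

word_freq <- text_df %>%
    unnest_tokens(word, text) %>%                  # one lowercase word per row
    anti_join(get_stopwords(), by = "word") %>%    # drop stopwords
    count(word, name = "freq", sort = TRUE)        # tally, most frequent first

word_freq
```

The word/freq column layout matches what wordcloud2() takes as input.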

GGally

GGally comes with many extensions to ggplot2, the most useful being the pairs-plot functions.

Its name is derived from the package’s intent to be an “ally” of ggplot2.

Example: use ggpairs to plot all combinations of measurements of flea body parts (subset to the first 4 columns to save time):

GGally::ggpairs(flea[, 1:4])
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Or, use ggscatmat for numeric-only data; it’s faster. Here it is applied to the numeric gapminder columns:

gapminder %>% 
    select(lifeExp, gdpPercap, pop) %>% 
    GGally::ggscatmat()

Check out the pairwise relationships in the iris data set (the factor column, Species, is dropped):

GGally::ggscatmat(iris)
## Warning in GGally::ggscatmat(iris): Factor variables are omitted in plot
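The factor column need not be thrown away: ggscatmat() has a color argument that takes a column name. A small sketch, restricting the panels to the four numeric columns and colouring by Species (the alpha value is an arbitrary choice):

```r
library(GGally)

# Scatterplot matrix of the numeric iris columns, coloured by species.
p_iris <- ggscatmat(iris, columns = 1:4, color = "Species", alpha = 0.6)
p_iris
```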

Network Diagrams

There are plenty of options for making network diagrams in R. A comprehensive overview of the landscape is given at kateto.net.

We’ll be looking at:

  • igraph: a flexible package for handling network data. website; tutorial
  • networkD3: a D3-backed package allowing interactivity. tutorial

Network Data

Example: Twitter. Let’s make a simple example on the board!

Network data consists of:

  • Nodes/vertices
  • Edges/links

Storage of these data:

  • Node data: variables describing each node.
    • Typical tidy data.
  • Edge data:
    • Each edge indicated by two columns of nodes; other columns optionally include edge properties (like length).
    • Could also be in sparse/adjacency matrix format (“wide”, or “untidy” data).
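To make the two edge-storage formats concrete, here is a sketch that converts a tiny edge list (the "long" form) into an adjacency matrix (the "wide" form) with base R. The node names and edges are invented for illustration.

```r
# Toy directed edge list: one row per edge.
edge_list <- data.frame(
    from = c("a", "a", "b"),
    to   = c("b", "c", "c"),
    stringsAsFactors = FALSE
)

# Square matrix with one row and one column per node, initially all zero.
node_names <- sort(unique(c(edge_list$from, edge_list$to)))
adj <- matrix(
    0L,
    nrow = length(node_names),
    ncol = length(node_names),
    dimnames = list(node_names, node_names)
)

# A two-column character matrix indexes cells by (row name, column name).
adj[cbind(edge_list$from, edge_list$to)] <- 1L
adj
```

Rows are the "from" node and columns the "to" node, so the matrix is not symmetric for a directed network.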

Example of Les Mis characters:

data("MisLinks")
data("MisNodes")
(MisLinks <- as_tibble(MisLinks))
## # A tibble: 254 x 3
##    source target value
##  *  <int>  <int> <int>
##  1      1      0     1
##  2      2      0     8
##  3      3      0    10
##  4      3      2     6
##  5      4      0     1
##  6      5      0     1
##  7      6      0     1
##  8      7      0     1
##  9      8      0     2
## 10      9      0     1
## # ... with 244 more rows
(MisNodes <- as_tibble(MisNodes))
## # A tibble: 77 x 3
##    name            group  size
##  * <fct>           <int> <int>
##  1 Myriel              1    15
##  2 Napoleon            1    20
##  3 Mlle.Baptistine     1    23
##  4 Mme.Magloire        1    30
##  5 CountessdeLo        1    11
##  6 Geborand            1     9
##  7 Champtercier        1    11
##  8 Cravatte            1    30
##  9 Count               1     8
## 10 OldMan              1    29
## # ... with 67 more rows
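Note that source and target in MisLinks are zero-based row positions in MisNodes (the first link, 1 to 0, runs from the second node to the first). A sketch of looking up the character names, adding 1 to convert to R's one-based indexing; the named_links data frame is introduced here for illustration.

```r
library(networkD3)

data("MisLinks")
data("MisNodes")

# Shift the zero-based indices by one to look up each node's name.
named_links <- data.frame(
    source = as.character(MisNodes$name[MisLinks$source + 1]),
    target = as.character(MisNodes$name[MisLinks$target + 1]),
    value  = MisLinks$value,
    stringsAsFactors = FALSE
)

head(named_links, 3)

# simpleNetwork() reads the first two columns as source and target,
# so the named version can be plotted directly.
networkD3::simpleNetwork(named_links)
```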

Generic Graphing

Make a simple interactive network plot out of the edges:

networkD3::simpleNetwork(MisLinks)

Get more sophisticated with forceNetwork (group specification is required):

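# Name the link columns explicitly rather than relying on missing arguments.
networkD3::forceNetwork(MisLinks, MisNodes, Source="source", Target="target",
                        NodeID="name", Group="group")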
## Links is a tbl_df. Converting to a plain data frame.
## Nodes is a tbl_df. Converting to a plain data frame.

Make a graph with igraph of the example we did on the board:

edges <- c(2,4, 2,3, 3,1, 1,4)
(g <- igraph::graph(edges))
## IGRAPH 72b3e0d D--- 4 4 -- 
## + edges from 72b3e0d:
## [1] 2->4 2->3 3->1 1->4
plot(g)

?plot.igraph
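Once the edges are in an igraph object, simple network summaries are one call away. The sketch below rebuilds the same four-edge graph with make_graph() and counts incoming and outgoing edges per node; using degree() here is an illustrative choice, not something covered on the board.

```r
library(igraph)

# Same edges as above: consecutive pairs are (from, to).
edges <- c(2,4, 2,3, 3,1, 1,4)
g <- make_graph(edges, directed = TRUE)

# Number of edges pointing into, and out of, each node (nodes 1 to 4).
in_deg  <- degree(g, mode = "in")
out_deg <- degree(g, mode = "out")

in_deg
out_deg
```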

Specialized Network Diagrams

Check out this tutorial starting at “sankey” for other types of specialized plots.