Run the following code in your console to install packages:
if (!require(htmlwidgets)) install.packages("htmlwidgets")
if (!require(wordcloud2)) install.packages("wordcloud2")
if (!require(GGally)) install.packages("GGally")
if (!require(plotly)) install.packages("plotly")
if (!require(ggmap)) install.packages("ggmap")
if (!require(maps)) install.packages("maps")
if (!require(networkD3)) install.packages("networkD3")
if (!require(webshot)) install.packages("webshot")
if (!require(tidytext)) install.packages("tidytext")
Load packages:
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(gapminder))
suppressPackageStartupMessages(library(wordcloud2))
suppressPackageStartupMessages(library(GGally))
suppressPackageStartupMessages(library(scales))
suppressPackageStartupMessages(library(plotly))
suppressPackageStartupMessages(library(ggmap))
suppressPackageStartupMessages(library(maps))
suppressPackageStartupMessages(library(networkD3))
suppressPackageStartupMessages(library(tidytext))
ggmap
Steps to using ggmap:
1. Obtain a base map with get_map. Alternatives: get_googlemap (satellite? road? your choice.), get_cloudmademap, get_stamenmap. See the ggmap GitHub page for details.
2. Use the ggmap function to display the map.
3. To add layers, pass your ggplot call to the base_layer argument of ggmap, then add your layers as usual outside of the ggmap call.
Example:
# crime data from `ggmap` package.
small_crime <- sample_n(crime, 10000)
bbox <- c(
  left = -96,
  bottom = 29.5,
  right = -95,
  top = 30
)
(basemap <- ggmap::get_stamenmap(bbox, maptype = "toner-lite", zoom = 9) %>%
  ggmap::ggmap())
## Map from URL : http://tile.stamen.com/toner-lite/9/119/211.png
## Map from URL : http://tile.stamen.com/toner-lite/9/120/211.png
## Map from URL : http://tile.stamen.com/toner-lite/9/119/212.png
## Map from URL : http://tile.stamen.com/toner-lite/9/120/212.png
basemap +
  geom_density2d(aes(lon, lat),
                 data = small_crime,
                 colour = "red")
## Warning: Removed 218 rows containing non-finite values (stat_density2d).
Exercise: Modify the following code to retrieve and plot a map of UBC. Lower-left GPS coordinates (latitude, longitude): 49.241284, -123.273007; upper-right GPS coordinates: 49.281994, -123.216366. WARNING: I don’t recommend trying zoom > 14 (it’ll be slow).
c(-123.273007, 49.241284, -123.216366, 49.281994) %>%
get_stamenmap(maptype = "toner-lite", zoom=13) %>%
ggmap()
## Map from URL : http://tile.stamen.com/toner-lite/13/1290/2803.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1291/2803.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1292/2803.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1290/2804.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1291/2804.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1292/2804.png
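The exercise gives each corner as (latitude, longitude), while the bounding box in the crime example is ordered left, bottom, right, top, so longitude comes first. A minimal sketch of that reordering, using hypothetical names (lower_left, upper_right, ubc_bbox) that are not part of the worksheet:

```r
# Corner coordinates as stated in the exercise: (latitude, longitude).
lower_left  <- c(lat = 49.241284, lon = -123.273007)
upper_right <- c(lat = 49.281994, lon = -123.216366)

# Reorder into the left, bottom, right, top layout used by `bbox` earlier.
ubc_bbox <- c(
  left   = lower_left[["lon"]],
  bottom = lower_left[["lat"]],
  right  = upper_right[["lon"]],
  top    = upper_right[["lat"]]
)

print(ubc_bbox)
```

The unnamed vector piped into get_stamenmap in the exercise lists the same four numbers in the same order.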
Maps are depicted as polygons. The maps package stores lots of polygon data, which can be plotted with maps::map():
maps::map("world")
maps::map("world", region="canada")
maps::map("france")
Get the data in a tidy format with ggplot2::map_data(), ready to use with ggplot(). Example (converted to a tibble for printing):
(world_dat <- ggplot2::map_data("world") %>%
as_tibble())
## # A tibble: 99,338 x 6
## long lat group order region subregion
## * <dbl> <dbl> <dbl> <int> <chr> <chr>
## 1 -69.9 12.5 1 1 Aruba <NA>
## 2 -69.9 12.4 1 2 Aruba <NA>
## 3 -69.9 12.4 1 3 Aruba <NA>
## 4 -70.0 12.5 1 4 Aruba <NA>
## 5 -70.1 12.5 1 5 Aruba <NA>
## 6 -70.1 12.6 1 6 Aruba <NA>
## 7 -70.0 12.6 1 7 Aruba <NA>
## 8 -70.0 12.6 1 8 Aruba <NA>
## 9 -69.9 12.5 1 9 Aruba <NA>
## 10 -69.9 12.5 1 10 Aruba <NA>
## # ... with 99,328 more rows
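Notice the order of the points (the order column) and the groups (the group column): each group is one polygon, and its points are listed in drawing order. Use geom_polygon with the group aesthetic: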
ggplot(world_dat, aes(long, lat)) +
geom_polygon(aes(group=group))
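To see what the group aesthetic does, here is a small sketch with made-up data (two hypothetical unit squares, not taken from the maps package). With grouping, each square is its own polygon; without it, all eight points are joined into one shape.

```r
library(ggplot2)

# Two hypothetical squares, each identified by its own `group` value.
squares <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),
  order = 1:8
)

# With the group aesthetic, each square is drawn as a separate polygon.
p_grouped <- ggplot(squares, aes(long, lat)) +
  geom_polygon(aes(group = group))

# Without it, all eight points are treated as a single polygon.
p_ungrouped <- ggplot(squares, aes(long, lat)) +
  geom_polygon()
```

Printing p_grouped and p_ungrouped side by side shows why the world map needs aes(group=group).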
Exercise: Make a plot of Italy.
map_data("italy") %>%
ggplot(aes(long, lat)) +
geom_polygon(aes(group=group), fill="red", colour="black") +
coord_equal()
The plotly R package makes highlight-interactivity possible.
Consider the following plot:
(p <- gapminder %>%
  filter(continent != "Oceania") %>%
  ggplot(aes(gdpPercap, lifeExp)) +
  geom_point(aes(colour=pop), alpha=0.2) +
  scale_x_log10(labels=dollar_format()) +
  scale_colour_distiller(
    trans = "log10",
    breaks = 10^(1:10),
    labels = comma_format(),
    palette = "Greens"
  ) +
  facet_wrap(~ continent) +
  scale_y_continuous(breaks=10*(1:10)) +
  theme_bw())
Convert it to a plotly object by applying the ggplotly() function:
ggplotly(p)
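# Save the interactive plot as a standalone HTML file
# (replace the placeholder with a real file path).
p %>%
  ggplotly() %>%
  htmlwidgets::saveWidget("LOCATION_GOES_HERE")
# Inspect the JSON that underlies the plotly object.
p %>%
  ggplotly() %>%
  plotly_json()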
Build a plotly object from scratch with plot_ly(): a scatterplot of lifeExp against gdpPercap.
plot_ly(gapminder,
x = ~gdpPercap,
y = ~lifeExp,
type = "scatter",
mode = "markers",
opacity = 0.2) %>%
layout(xaxis = list(type = "log"))
plot_ly(gapminder,
x = ~gdpPercap,
y = ~lifeExp,
z = ~pop,
type = "scatter3d",
mode = "markers",
opacity = 0.2)
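In the plot_ly() calls above, the tilde in x = ~gdpPercap tells plotly to look the variable up in the data frame. A minimal sketch on a made-up data frame (toy, income, and life are hypothetical names, not gapminder columns), guarded so it only runs when plotly is installed:

```r
# A small hypothetical data set standing in for gapminder.
toy <- data.frame(
  income = c(500, 2000, 8000, 32000),
  life   = c(45, 58, 69, 80)
)

if (requireNamespace("plotly", quietly = TRUE)) {
  # `~income` means "the income column of `toy`".
  fig <- plotly::plot_ly(toy, x = ~income, y = ~life,
                         type = "scatter", mode = "markers")
  # Put the x axis on a log scale, as in the gapminder example.
  fig <- plotly::layout(fig, xaxis = list(type = "log"))
}
```

Typing fig at the console displays the interactive plot.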
The wordcloud2 package is excellent for creating word clouds (with some interactivity!).
Check out the vignette or the wordcloud2 README. Here’s an example from the latter:
NOTE: It appears wordcloud2 only permits one plot to be viewed at a time.
# demoFreq %>%
# filter(freq > 1) %>%
# wordcloud2(size = 1, minRotation = -pi/2, maxRotation = -pi/2, shape="circle")
Notice:
Exercise: What am I saying a lot of in this document (aside from stopwords)? A minimal amount of wrangling is done for you.
(this_file <- read_file("lec4-worksheet-complete.Rmd") %>%
  strsplit(" ") %>%
  `[[`(1) %>%
  as_tibble() %>%
  count(value) %>%
  setNames(c("word", "freq")) %>%
  mutate(word = tolower(word)) %>%
  anti_join(get_stopwords()) %>%
  filter(!(word %in% c("\n", "")),
         !str_detect(word, pattern="-|[0-9]|%>%|=|\\+|\\\\")) %>%
  arrange(desc(freq)))
## Joining, by = "word"
## # A tibble: 357 x 2
## word freq
## <chr> <int>
## 1 data 6
## 2 package 5
## 3 plot 5
## 4 `ggmap` 4
## 5 code 4
## 6 columns 3
## 7 example 3
## 8 following 3
## 9 format 3
## 10 make 3
## # ... with 347 more rows
wordcloud2(this_file,
minRotation = -pi/2,
maxRotation = -pi/2,
size = 0.5)
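The wrangling above boils down to producing a data frame with one column of words and one of frequencies, which is what wordcloud2 expects. A stripped-down sketch of the same idea in base R, using a made-up sentence and a hand-picked stopword list (both are illustrative assumptions, not the worksheet's data or tidytext's list):

```r
# A hypothetical sentence standing in for the worksheet text.
text <- "the map shows the data and the data drives the plot"

# Split on spaces and count each word.
words <- strsplit(tolower(text), " ")[[1]]
counts <- as.data.frame(table(words), stringsAsFactors = FALSE)
names(counts) <- c("word", "freq")

# Drop a few stopwords by hand, then sort by frequency.
counts <- counts[!(counts$word %in% c("the", "and")), ]
counts <- counts[order(-counts$freq), ]

# Pass the word/frequency data frame to wordcloud2, if it is installed.
if (requireNamespace("wordcloud2", quietly = TRUE)) {
  cloud <- wordcloud2::wordcloud2(counts, size = 0.5)
}
```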
GGally comes with many extensions to ggplot2, the most useful being the pairs functions.
Its name is derived from the package’s intent to be an “ally” of ggplot2.
Example: use ggpairs to plot all combinations of measurements of flea body parts (subset to the first 4 columns to save time):
GGally::ggpairs(flea[, 1:4])
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Or, use ggscatmat for numeric-only data. It’s faster. Numeric gapminder data:
gapminder %>%
select(lifeExp, gdpPercap, pop) %>%
GGally::ggscatmat()
Check out all dependencies in the iris data set:
GGally::ggscatmat(iris)
## Warning in GGally::ggscatmat(iris): Factor variables are omitted in plot
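The warning appears because Species is a factor. A small sketch of two ways to handle it, assuming you either want to select the numeric columns yourself or use the factor to colour the points through ggscatmat's color argument (the object names are hypothetical):

```r
# Keep only the numeric columns of iris (this drops Species).
iris_numeric <- iris[, sapply(iris, is.numeric)]

if (requireNamespace("GGally", quietly = TRUE)) {
  # Numeric columns only: no factor is passed in.
  pairs_plain <- GGally::ggscatmat(iris_numeric)
  # Alternatively, keep the factor and use it to colour the points.
  pairs_by_species <- GGally::ggscatmat(iris, columns = 1:4, color = "Species")
}
```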
There are plenty of options for making network diagrams in R. A comprehensive overview of the landscape is given at kateto.net.
We’ll be looking at:
igraph: a flexible package for handling network data (website; tutorial).
networkD3: a D3-backed package allowing interactivity (tutorial).
Example: Twitter. Let’s make a simple example on the board!
Network data is comprised of:
Storage of these data:
Example of Les Mis characters:
data("MisLinks")
data("MisNodes")
(MisLinks <- as_tibble(MisLinks))
## # A tibble: 254 x 3
## source target value
## * <int> <int> <int>
## 1 1 0 1
## 2 2 0 8
## 3 3 0 10
## 4 3 2 6
## 5 4 0 1
## 6 5 0 1
## 7 6 0 1
## 8 7 0 1
## 9 8 0 2
## 10 9 0 1
## # ... with 244 more rows
(MisNodes <- as_tibble(MisNodes))
## # A tibble: 77 x 3
## name group size
## * <fct> <int> <int>
## 1 Myriel 1 15
## 2 Napoleon 1 20
## 3 Mlle.Baptistine 1 23
## 4 Mme.Magloire 1 30
## 5 CountessdeLo 1 11
## 6 Geborand 1 9
## 7 Champtercier 1 11
## 8 Cravatte 1 30
## 9 Count 1 8
## 10 OldMan 1 29
## # ... with 67 more rows
Make a simple interactive network plot out of the edges:
networkD3::simpleNetwork(MisLinks)
Get more sophisticated with forceNetwork (group specification is required):
networkD3::forceNetwork(MisLinks, MisNodes, NodeID="name", Group="group")
## Links is a tbl_df. Converting to a plain data frame.
## Nodes is a tbl_df. Converting to a plain data frame.
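Note that the source and target columns of MisLinks start at 0: networkD3 refers to nodes by zero-based row position in the nodes table. A minimal sketch of building such tables by hand, using made-up names (Ann, Bob, Cat, Dan) and hypothetical object names:

```r
# Hypothetical nodes: one row per entity, with a grouping column.
nodes <- data.frame(
  name  = c("Ann", "Bob", "Cat", "Dan"),
  group = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical links, written with names for readability.
links_named <- data.frame(
  from  = c("Ann", "Ann", "Bob", "Cat"),
  to    = c("Bob", "Cat", "Dan", "Dan"),
  value = c(1, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Convert names to zero-based row positions in `nodes`.
links <- data.frame(
  source = match(links_named$from, nodes$name) - 1,
  target = match(links_named$to, nodes$name) - 1,
  value  = links_named$value
)

if (requireNamespace("networkD3", quietly = TRUE)) {
  net <- networkD3::forceNetwork(
    Links = links, Nodes = nodes,
    Source = "source", Target = "target", Value = "value",
    NodeID = "name", Group = "group"
  )
}
```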
Make a graph with igraph of the example we did on the board:
edges <- c(2,4, 2,3, 3,1, 1,4)
(g <- igraph::graph(edges))
## IGRAPH 72b3e0d D--- 4 4 --
## + edges from 72b3e0d:
## [1] 2->4 2->3 3->1 1->4
plot(g)
?plot.igraph
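The edges vector above is read two numbers at a time, each pair being one edge from the first vertex to the second. A short sketch that lays the pairs out as a two-column table (edge_table is a hypothetical name), which can also be handed to igraph::graph_from_edgelist if igraph is installed:

```r
edges <- c(2,4, 2,3, 3,1, 1,4)

# Fill row by row so that each row holds one (from, to) pair.
edge_table <- matrix(edges, ncol = 2, byrow = TRUE,
                     dimnames = list(NULL, c("from", "to")))
print(edge_table)

if (requireNamespace("igraph", quietly = TRUE)) {
  # Build the same directed graph from the two-column edge list.
  g2 <- igraph::graph_from_edgelist(edge_table, directed = TRUE)
}
```

The rows of edge_table correspond to the edges printed for g: 2->4, 2->3, 3->1, 1->4.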
Check out this tutorial starting at “sankey” for other types of specialized plots.