In this worksheet, we’ll be exploring various plot types (i.e., geometric objects), using only the `x` and `y` aesthetics (and `group`).
We’ll be jumping straight into the `ggplot()` function, instead of the more limited `qplot()` function.
Load the `tidyverse` and `gapminder` R packages.

suppressPackageStartupMessages(library(tidyverse))
library(gapminder)
Let’s look at a scatterplot of `gdpPercap` vs. `lifeExp`.
1. Identify the grammar components of the `ggplot2` plot.

Grammar Component | Specification |
---|---|
data | gapminder |
aesthetic mapping | x=lifeExp and y=gdpPercap |
geometric object | point |
scale | linear |
statistical transform | none |
coordinate system | rectangular/cartesian |
facetting | none |
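
Each row of the table corresponds to a piece of code. As a rough sketch (the explicit scale, coordinate, and facet calls below are what we understand to be the `ggplot2` defaults, spelled out only for illustration; the object names are ours), the full specification could be written like this:

```r
library(ggplot2)
library(gapminder)

# Every grammar component from the table, written out explicitly.
p_explicit <- ggplot(data = gapminder,                 # data
                     mapping = aes(x = lifeExp,        # aesthetic mapping
                                   y = gdpPercap)) +
  geom_point(stat = "identity") +   # geometric object, no statistical transform
  scale_x_continuous() +            # linear x scale
  scale_y_continuous() +            # linear y scale
  coord_cartesian() +               # rectangular/cartesian coordinate system
  facet_null()                      # no facetting

# The usual short form leaves those defaults implicit.
p_short <- ggplot(gapminder, aes(lifeExp, gdpPercap)) +
  geom_point()

p_explicit
```

In practice we only write the components that differ from the defaults, which is why most calls in this worksheet are so short.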
2. Pass the data and aesthetic mapping to `ggplot`. What is returned? What’s missing?

The following three calls all produce the complete scatterplot:

ggplot(data=gapminder, mapping=aes(x=lifeExp, y=gdpPercap)) +
  geom_point()

ggplot(gapminder, aes(lifeExp, gdpPercap)) +
  geom_point()

ggplot(gapminder) +
  geom_point(aes(x=lifeExp, y=gdpPercap))

3. Add the missing component as a layer.
Notice the “metaprogramming” again!
4. Remember to put the aesthetic mappings inside the `aes` function! What happens if you forget?

#ggplot(gapminder) +
#  geom_point(x = lifeExp, y = gdpPercap)
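
To see what happens without stopping the script, the failing call can be wrapped in `tryCatch()`. This is only a sketch: it assumes there are no objects named `lifeExp` or `gdpPercap` in your workspace, and the exact wording of the error message may vary between versions.

```r
library(ggplot2)
library(gapminder)

# Without aes(), R tries to evaluate lifeExp and gdpPercap as ordinary objects
# in the calling environment, not as columns of gapminder.
msg <- tryCatch(
  {
    ggplot(gapminder) + geom_point(x = lifeExp, y = gdpPercap)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# Inside aes(), the column names are captured unevaluated and
# looked up in the data when the plot is built.
p_ok <- ggplot(gapminder) +
  geom_point(aes(x = lifeExp, y = gdpPercap))
```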
5. Put the y-axis on a log scale, first by transforming the y variable. Note: `ggplot2` does some data wrangling and computations itself! We don’t always have to modify the data frame.

ggplot(gapminder, aes(lifeExp, log(gdpPercap))) +
  geom_point()

ggplot(gapminder, aes(lifeExp, gdpPercap)) +
  geom_point() +
  scale_y_log10()
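
The two routes to a log axis can be compared directly. In this sketch we use `log10()` so that the transformed variable matches `scale_y_log10()` exactly (the natural `log()` differs only by a constant factor); the object names are ours.

```r
library(ggplot2)
library(gapminder)

# Route 1: transform the variable inside aes().
# The axis is labelled in log10 units.
p_transformed <- ggplot(gapminder, aes(lifeExp, log10(gdpPercap))) +
  geom_point()

# Route 2: leave the variable alone and transform the scale.
# The axis is labelled in the original units.
p_scaled <- ggplot(gapminder, aes(lifeExp, gdpPercap)) +
  geom_point() +
  scale_y_log10()

# Both routes place the points at the same heights;
# only the axis labelling differs.
all.equal(layer_data(p_transformed)$y, layer_data(p_scaled)$y)
```

Changing the scale keeps the axis readable in dollars, which is one reason it is usually the preferred route.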
6. Try again, this time by changing the scale (this way is better).
7. The aesthetic mappings can be specified on the geom layer if you want, instead of the main `ggplot` call. Give it a try:
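
One possible way to try this is sketched below. The first two plots should be identical; the third is our own illustration (not part of the worksheet) of why layer-level mappings are handy, using the `group` aesthetic on a single layer.

```r
library(ggplot2)
library(gapminder)

# Mapping in the main call: inherited by every layer.
p_global <- ggplot(gapminder, aes(lifeExp, gdpPercap)) +
  geom_point()

# Mapping on the geom: applies to that layer only.
p_local <- ggplot(gapminder) +
  geom_point(aes(lifeExp, gdpPercap))

# Mixed: x and y are shared, while group is set only on the line layer,
# so that one line is drawn per country.
p_mixed <- ggplot(gapminder, aes(year, lifeExp)) +
  geom_line(aes(group = country), alpha = 0.2)

p_local
```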
Uses of a scatterplot:
Let’s build a histogram of life expectancy.
Identify the grammar components of the `ggplot2` plot.

Grammar Component | Specification |
---|---|
data | gapminder |
aesthetic mapping | x=lifeExp, y=count (corrected from before) |
geometric object | histogram |
scale | x and y both linear |
statistical transform | count |
ggplot(gapminder, aes(lifeExp)) +
  geom_histogram(bins=50)

3. Change the number of bins to 50.

ggplot(gapminder, aes(lifeExp)) +
  geom_density()
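
The "count" statistical transform can be inspected with `layer_data()`, and the histogram can be rescaled to sit on the same axis as the density curve. This is a sketch that assumes a version of `ggplot2` recent enough to provide `after_stat()` (3.3.0 or later, as far as we know); the object names are ours.

```r
library(ggplot2)
library(gapminder)

# The histogram's statistical transform bins lifeExp and counts rows per bin.
p_hist <- ggplot(gapminder, aes(lifeExp)) +
  geom_histogram(bins = 50)
bin_counts <- layer_data(p_hist)$count
sum(bin_counts) == nrow(gapminder)  # every row lands in exactly one bin

# Using the computed density instead of the count puts the bars
# on the same y scale as geom_density().
p_both <- ggplot(gapminder, aes(lifeExp)) +
  geom_histogram(aes(y = after_stat(density)), bins = 50, alpha = 0.5) +
  geom_density()
p_both
```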
Uses of a histogram: Explore the distribution of a single numeric variable.
Let’s make box plots of GDP per capita for each continent. Note: the y-axis is much better on a log scale!
Identify the grammar components of the `ggplot2` plot.

Grammar Component | Specification |
---|---|
data | gapminder |
aesthetic mapping | x=continent, y=gdpPercap |
geometric object | boxplot OR violin |
scale | log-y; x is discrete |
statistical transform | boxplot: 5 number summary; violinplot: density estimate |
Initiate the `ggplot` call, with the log y scale, and store it in the variable `a`. Print out `a`.

a <- ggplot(gapminder, aes(continent, gdpPercap)) +
  scale_y_log10()
a

Add the boxplot geom to `a`, then try the point geom with some transparency.

a + geom_boxplot()
a + geom_point(alpha=0.2)
Add the violin geom to `a`.

a + geom_violin()

ggplot(gapminder, aes(continent, lifeExp)) +
  geom_violin()
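
To see the statistical transform behind the box plot, we can pull the computed values out with `layer_data()`. A minimal sketch (the object names `box_stats` and `manual_medians` are ours):

```r
library(ggplot2)
library(gapminder)

a <- ggplot(gapminder, aes(continent, gdpPercap)) +
  scale_y_log10()

# Summary statistics computed by the boxplot layer, one row per continent:
# whisker ends (ymin, ymax), quartiles (lower, upper) and median (middle).
# The values are on the log10 scale, because the scale transformation
# is applied before the statistical transform.
box_stats <- layer_data(a + geom_boxplot())[, c("ymin", "lower", "middle", "upper", "ymax")]
box_stats

# Medians computed by hand on the log10 scale,
# for comparison with the `middle` column.
manual_medians <- tapply(log10(gapminder$gdpPercap), gapminder$continent, median)
manual_medians
```

Note that `ymin` and `ymax` are the whisker ends, so they exclude any points drawn as outliers.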
Use of boxplot: Visualize 1-dimensional distributions (of a single numeric variable).
Let’s work up to the concept of a jitter plot. As with the box plots above, let’s explore a numeric variable for each continent, this time population, but using points (again, with the y-axis on a log scale).
Let’s hold off on identifying the grammar.
Write the `ggplot` call to make a scatterplot of `pop` vs `continent`; initiate the log y scale. Store the call in the variable `b`.

b <- ggplot(gapminder, aes(continent, pop)) +
  scale_y_log10()

Add the point geom to `b`. Why is this an ineffective plot?

b + geom_point()

Jittering the points, optionally on top of a violin plot, reduces the overplotting.

b + geom_jitter()
b + geom_violin() + geom_jitter(alpha=0.1)
We can add multiple geom layers to our plot. Put a jitterplot overtop of the violin plot, starting with our base `b`. Try vice-versa.
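
Here is one way the two orderings might look. The `position_jitter()` settings are our own choices for illustration: `height = 0` keeps each point at its true population, and `seed` makes the random jitter reproducible.

```r
library(ggplot2)
library(gapminder)

b <- ggplot(gapminder, aes(continent, pop)) +
  scale_y_log10()

# Only the x positions are jittered; the seed fixes the random offsets.
jitter_pos <- position_jitter(width = 0.2, height = 0, seed = 1)

# Layers are drawn in the order they are added: violins first, points on top.
p_points_on_top <- b +
  geom_violin() +
  geom_jitter(alpha = 0.1, position = jitter_pos)

# Vice-versa: the violins are drawn last, over the points.
p_violin_on_top <- b +
  geom_jitter(alpha = 0.1, position = jitter_pos) +
  geom_violin()

p_points_on_top
```

With the default violin fill, the second ordering hides most of the points, which is why the jitter layer usually goes last.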
Uses of jitterplot: Visualize 1-dimensional distributions, AND get a sense of the sample size.