Making effective plots can tell you a LOT about data. It's hard! It's an under-rated but very powerful skill to develop.

- Di Cook

suppressPackageStartupMessages(library(tidyverse))
library(gapminder)
knitr::opts_chunk$set(fig.width=5, fig.height=3)

1 Agenda

Tips for effective graphing

At least two exercises related to the content and to http://viz.wtf/ (see the worksheet).

2 Resources

These resources are listed on the syllabus in the lecture table. They provide a good overview of tips for effective plotting.

Here are some resources that dive a little deeper:

An entertaining but inspiring resource:

If you want to spend more time on this and/or dig deeper, take a look at the following books:

3 Preface

Disclaimer: The tips you see here and online hold true in most cases. In the rare cases where they don’t, the key is to be intentional about every component of the graph.

“Let’s Practice What We Preach: Turning Tables into Graphs” by Gelman A, Pasarica C, Dodhia R. The American Statistician, Volume 56, Number 2, 1 May 2002, pp. 121-130.

4 Learning Objectives

From today’s lecture, students are expected to:

For the quiz, you aren’t expected to know/memorize all of the tips.

5 Consider Information Density

Too many overlapping points obscure where the data are concentrated. This problem is sometimes called overplotting.

gapxy <- ggplot(gapminder, aes(lifeExp, gdpPercap)) +
    theme_bw()
gapxy + geom_point()

gapxy <- gapxy + scale_y_log10()
gapxy + geom_point() 

gapxy + geom_point(alpha=0.2)

gapxy + geom_hex() 

gapxy + geom_density2d()

gapxy + facet_wrap(~continent) + geom_point(alpha=0.2) 

ggplot(gapminder, aes(continent, lifeExp)) +
    geom_violin(fill="red", alpha=0.2) +
    geom_boxplot(fill="blue", alpha=0.2) +
    geom_jitter(alpha=0.2)

6 Find the Goldilocks Plot

Display just the right amount of content: not too much, not too little.

In particular: reveal as much relevant information as possible; trim irrelevant and redundant information.

6.1 Reveal as much relevant information as possible

Because hiding your data is not effective at conveying information!

  • jitter + violin, not pinhead plots.
  • mosaic plots
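
As a rough sketch of the two bullets above, the code below contrasts a summary-only plot with a violin plus jitter, then draws a mosaic plot. It assumes that "pinhead plot" means a bar at the group mean topped with an error bar, which is an interpretation and not something these notes define. The `iris` data and the object names (`iris_summary`, `pinhead`, `revealing`, `hair_eye`) are chosen for illustration only.

```r
library(dplyr)
library(ggplot2)

# Summary-only view: one mean and one standard deviation per species.
iris_summary <- iris %>%
    group_by(Species) %>%
    summarize(avg = mean(Sepal.Length), spread = sd(Sepal.Length))

# Assumed meaning of "pinhead plot": a bar with an error bar on top.
pinhead <- ggplot(iris_summary, aes(Species, avg)) +
    geom_col() +
    geom_errorbar(aes(ymin = avg - spread, ymax = avg + spread), width = 0.2) +
    theme_bw() +
    ylab("Sepal.Length") +
    ggtitle("Hides the data.")

# Violin + jitter: the distribution and every observation stay visible.
revealing <- ggplot(iris, aes(Species, Sepal.Length)) +
    geom_violin() +
    geom_jitter(width = 0.1, height = 0, alpha = 0.3) +
    theme_bw() +
    ggtitle("Reveals the data.")

print(pinhead)
print(revealing)

# Mosaic plot (base R): tile areas are proportional to cell counts.
hair_eye <- margin.table(HairEyeColor, c(1, 2))  # collapse over Sex
mosaicplot(hair_eye, main = "Hair colour vs. eye colour")
```

Both ggplot objects use the same variable on the y-axis, so the only difference between them is how much of the data is shown.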

6.2 Trim Irrelevant Information

Only use as much data as is required for answering a data analytic question.

map_data("france") %>% 
    ggplot(aes(long, lat)) +
    geom_polygon(aes(group=group), fill=NA, colour="black") +
    theme_bw() +
    ggtitle("Are lat and long really needed?")

ggplot(gapminder, aes(year, lifeExp)) +
    geom_line(aes(group=country, colour=country), alpha=0.2) +
    guides(colour="none") +
    theme_bw() +
    ggtitle("Is colouring by country really necessary here?\nNever mind fitting the legend!")

6.3 Trim Redundant Information

Don’t redundantly map variables to aesthetics/facets.

  • Common example: colouring/filling and faceting by the same variable.
HairEyeColor %>% 
    as_tibble() %>% 
    uncount(n) %>% 
    ggplot(aes(Hair)) +
    facet_wrap(~Sex) +
    geom_bar(aes(fill=Sex)) +
    theme_bw() +
    ggtitle("Don't do this.")

Really want to use colour? No problem, colours are fun! Try this:

HairEyeColor %>% 
    as_tibble() %>% 
    uncount(n) %>% 
    ggplot(aes(Hair)) +
    facet_wrap(~Sex) +
    geom_bar(fill="#D95F02") +
    theme_bw() +
    ggtitle("Do this.")

  • Relegate numeric details to an appendix (or omit them entirely), not the graph.
HairEyeColor %>% 
    as_tibble() %>% 
    uncount(n) %>% 
    count(Hair) %>% 
    ggplot(aes(Hair, n)) +
    geom_col() +
    geom_text(aes(label=n), vjust=-0.1) +
    theme_bw() +
    labs(x="Hair colour", y="count", 
         title="Are the bar numbers AND y-axis really needed?")
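
One way to follow this advice, sketched here, is to keep the exact counts in a small table destined for an appendix and leave the bars unlabelled. The object name `hair_counts` and the use of `knitr::kable()` for the table are illustrative choices, not something these notes prescribe.

```r
library(dplyr)
library(ggplot2)

# Exact counts per hair colour, kept as a table instead of printed on the bars.
hair_counts <- HairEyeColor %>%
    as_tibble() %>%
    count(Hair, wt = n)

# The graph carries the comparison; the y-axis alone gives the scale.
ggplot(hair_counts, aes(Hair, n)) +
    geom_col() +
    theme_bw() +
    labs(x = "Hair colour", y = "count")

# The appendix table carries the exact numbers.
knitr::kable(hair_counts, col.names = c("Hair colour", "Count"))
```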

7 Choose Human-Interpretable Aesthetic Mappings and Geoms

plot_beav2 <- bind_rows(
    mutate(beaver1, beaver = "Beaver 1"), 
    mutate(beaver2, beaver = "Beaver 2")
) %>% 
    group_by(beaver) %>% 
    summarize(med = median(temp)) %>% 
    ggplot(aes(beaver, med)) +
    theme_bw() +
    xlab("") +
    ylab("Body Temperature\n(Celsius)")
cowplot::plot_grid(
    plot_beav2 +
        geom_col() +
        ggtitle("Don't do this."),
    plot_beav2 +
        geom_point() +
        ggtitle("Do this.")
)

(Yes, that’s really all the info you’re conveying. Own it.)

plot_iris <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
    geom_jitter(aes(colour=Species)) +
    theme_bw() +
    theme(legend.position = "bottom")
cowplot::plot_grid(
    plot_iris +
        scale_colour_manual(values=c("brown", "gray", "yellow")) +
        ggtitle("Don't do this."),
    plot_iris +
        scale_colour_brewer(palette="Dark2") +
        ggtitle("Leave it to an expert.\nDo this.")
)

8 Consider Zero

Are you comparing data across groups? Consider what a meaningful distance measure might be between two groups.

Are differences meaningful but proportions not? Example: temperature in Celsius. Zero is an arbitrary point on that scale, so the axis doesn’t need to include it.

plot_beav <- bind_rows(
    mutate(beaver1, beaver = "Beaver 1"), 
    mutate(beaver2, beaver = "Beaver 2")
) %>% 
    ggplot(aes(beaver, temp)) +
    geom_violin() +
    geom_jitter(alpha=0.25) +
    theme_bw() +
    xlab("") +
    ylab("Body Temperature\n(Celsius)")
cowplot::plot_grid(
    plot_beav + 
        ggtitle("This."), 
    plot_beav + 
        ylim(c(0,NA)) +
        ggtitle("Not This.")
)
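
To see why proportions are not meaningful for temperature, the sketch below compares the two beavers' median temperatures on the Celsius and Kelvin scales. The variable names are illustrative, and the only outside fact used is the standard offset of 273.15 between the two scales. The difference is the same on both scales, while the ratio changes with the choice of zero.

```r
# Median body temperature of each beaver, in degrees Celsius.
med_c <- c(beaver1 = median(beaver1$temp), beaver2 = median(beaver2$temp))

# The same medians in kelvin: only the location of zero changes.
med_k <- med_c + 273.15

# Differences do not depend on where zero is.
diff_c <- med_c[["beaver2"]] - med_c[["beaver1"]]
diff_k <- med_k[["beaver2"]] - med_k[["beaver1"]]

# Ratios do depend on where zero is.
ratio_c <- med_c[["beaver2"]] / med_c[["beaver1"]]
ratio_k <- med_k[["beaver2"]] / med_k[["beaver1"]]

print(c(diff_celsius = diff_c, diff_kelvin = diff_k))
print(c(ratio_celsius = ratio_c, ratio_kelvin = ratio_k))
```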

Are proportions meaningful, and differences not? Example: counts.

HairEyeColor %>% 
    as_tibble() %>% 
    uncount(n) %>% 
    ggplot(aes(Hair)) +
    geom_bar() +
    theme_bw() +
    ggtitle("Keep this starting from 0.")

Want to convey absolute life expectancies, in addition to relative life expectancies? Show 0.

ggplot(gapminder, aes(continent, lifeExp)) +
    geom_boxplot() +
    ylim(c(0, NA)) +
    geom_hline(yintercept = 0,
               linetype = "dashed")

9 Order factors

Ordering factor levels by a meaningful quantity, instead of alphabetically, makes rankings easier to see. See this STAT 545 example by Jenny Bryan. Use forcats!
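
A minimal sketch of the idea, using `forcats::fct_reorder()` on the `gapminder` data, follows. The choice of Asian countries in 2007 and the object names (`gap_asia_2007`, `gap_ordered`) are assumptions made for illustration; this is not necessarily the example referred to above.

```r
library(dplyr)
library(ggplot2)
library(forcats)
library(gapminder)

# One row per Asian country; drop factor levels for countries filtered out.
gap_asia_2007 <- gapminder %>%
    filter(year == 2007, continent == "Asia") %>%
    droplevels()

# Default: countries appear in alphabetical (factor level) order.
ggplot(gap_asia_2007, aes(lifeExp, country)) +
    geom_point() +
    theme_bw()

# Reorder the country levels by life expectancy so the ranking is visible.
gap_ordered <- gap_asia_2007 %>%
    mutate(country = fct_reorder(country, lifeExp))

ggplot(gap_ordered, aes(lifeExp, country)) +
    geom_point() +
    theme_bw()
```

Reordering the factor in the data, as opposed to inside `aes()`, keeps the axis label as `country` and lets later plots reuse the same ordering.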