suppressPackageStartupMessages(library(tidyverse))
library(gapminder)
knitr::opts_chunk$set(fig.width=5, fig.height=3, fig.align="center")
Here is what we expect you to know, corresponding to today’s lecture:
You are not expected to memorize the names of the plot types for your quizzes.
Last time, we mostly looked at:
ggplot2
Elaborate:
We’ve also been getting set up for the other three main elements of this course
(we can think of these as forming a flowchart: effective choice -> components + theme -> tooling -> EDA)
ggplot(gapminder, aes(continent, lifeExp)) +
  geom_violin() +
  geom_jitter(width = 0.2, alpha = 0.2) +
  ggtitle("Jitter + Violin Plot")
gapminder %>%
  group_by(continent) %>%
  summarize(sd = sd(lifeExp),
            mean_life_exp = mean(lifeExp)) %>%
  ggplot(aes(continent)) +
  geom_errorbar(aes(ymax = mean_life_exp + sd,
                    ymin = mean_life_exp - sd),
                width = 0.2) +
  geom_col(aes(y = mean_life_exp)) +
  ggtitle("Pinhead plot") +
  ylab("lifeExp")
We’ll fill out the lec2-worksheet.Rmd worksheet for the latter two.
What is exploratory data analysis (EDA)? (What is Data Science?) There’s no one definition. Generally:
When reading a plot, there are generally two main components to consider:
(Link to 552: estimate + uncertainty)
Which species has the largest sepal width? How certain are you? (Did I need a hypothesis test for this? That is not to say hypothesis tests aren't useful.)
iris %>%
  ggplot(aes(Species, Sepal.Width)) +
  geom_violin() +
  geom_jitter(width = 0.2)
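To connect the plot to the "estimate + uncertainty" idea, here is a minimal sketch that puts rough numbers on what the eye reads off the violins. It assumes the mean is the estimate of interest and uses the standard error as one possible measure of uncertainty; the object name `sepal_summary` and its column names are made up for illustration.

```r
library(dplyr)

# Estimate (mean) and a rough measure of its uncertainty (standard error)
# of sepal width for each species in the built-in iris data.
sepal_summary <- iris %>%
  group_by(Species) %>%
  summarize(
    n = n(),                               # observations per species
    mean_width = mean(Sepal.Width),        # the estimate
    se_width = sd(Sepal.Width) / sqrt(n)   # assumed uncertainty measure
  ) %>%
  arrange(desc(mean_width))                # largest mean first

sepal_summary
```

Compare the ordering and the size of the standard errors against your impression from the plot: if the gap between species is large relative to the uncertainty, the plot alone was probably enough.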
Which continent will have the highest life expectancy in 2020? Will a model help you here?
gapminder %>%
  group_by(continent, year) %>%
  summarize(mean_life_exp = mean(lifeExp)) %>%
  ggplot(aes(year, mean_life_exp)) +
  geom_point() +
  geom_line(aes(colour = continent, group = continent))
Is there a relationship between GDP per capita and life expectancy? Describe the dependence. How confident are you?
ggplot(gapminder, aes(gdpPercap, lifeExp)) +
  geom_point(alpha = 0.2) +
  scale_x_log10()
How certain are you with this one?
set.seed(10)
gapminder %>%
  sample_n(10) %>%
  ggplot(aes(gdpPercap, lifeExp)) +
  geom_point() +
  scale_x_log10()
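One way to make "how certain are you?" concrete for the GDP and life expectancy question is to compare a numeric summary of the dependence on the full data against the same summary on a 10-row sample. The sketch below is only illustrative: it assumes Pearson correlation on the log10 scale is a reasonable summary, and the helper name `cor_with_ci` is hypothetical.

```r
library(gapminder)
library(dplyr)

# Hypothetical helper: correlation of log10(GDP per capita) with life
# expectancy, plus the 95% confidence interval reported by cor.test().
cor_with_ci <- function(df) {
  ct <- cor.test(log10(df$gdpPercap), df$lifeExp)
  data.frame(
    n = nrow(df),
    estimate = unname(ct$estimate),
    lower = ct$conf.int[1],
    upper = ct$conf.int[2]
  )
}

set.seed(10)
small <- sample_n(gapminder, 10)   # a 10-row random sample

# One row per data set: full data versus the 10-row sample.
comparison <- bind_rows(
  full = cor_with_ci(gapminder),
  sample = cor_with_ci(small),
  .id = "data"
)

comparison
```

The width of each interval (`upper - lower`) is the thing to look at: it gives a number to set beside the confidence you felt when reading each scatterplot.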
Is there a relationship between the sepal and petal lengths of the setosa plant? How confident are you?
iris %>%
  filter(Species == "setosa") %>%
  ggplot(aes(Petal.Length, Sepal.Length)) +
  geom_jitter() +
  ggtitle("Species: Setosa")
Poll:
/poll Dependence strong weak none
/poll "Confidence" "highly confident", "somewhat confident", "little confidence", "no confidence"