{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab Assignment 2: Exploratory Data Analysis with ggplot2\n",
"\n",
"Due Saturday."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Assignment Objective\n",
"\n",
"Last time (Lab 1), you identified the components of three graphs using your intuition, and practiced using `ggplot2`. This time, you'll be making and identifying the components of three graphs in terms of the grammar of graphics. Each graph corresponds to an exploratory data analysis (EDA). In preparation for next week, we'll also get you to reflect on plotting effectiveness.\n",
"\n",
"This assignment is not autograded."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tidy Submission (worth 5%)\n",
"\n",
"rubric={mechanics:5}\n",
"\n",
"To get the marks for tidy submission:\n",
"\n",
"- Submit the assignment by filling in this Jupyter notebook with your answers embedded\n",
"- Be sure to follow the [general lab instructions](https://ubc-mds.github.io/resources_pages/general_lab_instructions/)\n",
"- Do not include any code that installs packages (this is not good practice anyway)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Writing (worth 8%)\n",
"\n",
"rubric={writing:8}\n",
"\n",
"To get these marks, your writing must use proper English, spelling, and grammar, and it must be _concise_."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Your Tasks\n",
"\n",
"Complete three iterations of the five tasks below. We've given you space below to write your answers in the sections labelled as Exercise `i.t`, where `i` is the iteration number (1-3), and `t` is the task number (0-4). We've addressed Task 0 for the first iteration (i.e., Exercise 1.0), to serve as an example of Task 0.\n",
"\n",
"0\\. _Pose a data analytic question corresponding to a data set of your choice (for example, `gapminder`, or data sets in the pre-loaded `datasets` package). At least 3 different random variables from the data set must be relevant for shedding light on the question._\n",
"\n",
"Then, come up with an appropriate graph to address this question, and express the graph in the following two ways:\n",
"\n",
"1\\. _Make a graph to address the question in Task 0 using `ggplot2`. Do not use the `qplot()`/`quickplot()` function._ \n",
"2\\. _Identify the seven components of the grammar of graphics corresponding to your graph (note that there must be at least three aesthetic mappings and facet variables combined -- one for each variable in your plot)._ \n",
"\n",
"Then, use the insight provided by your graph to write about the following:\n",
"\n",
"3\\. _Communicate insight into the question you posed in Task 0. Reflect on both the effect and confidence. What is it about the graph that allowed you to draw this insight?_ \n",
"4\\. _Comment on how effective your graph is at conveying the information contained in your data._ \n",
"\n",
"There is no word limit, but it's very important to be concise when responding to Tasks 3 and 4. We're not looking for essays here. As always, just write about the big picture / main idea. If you find yourself fishing for minutiae, you're probably trying too hard."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluation for Task 0 (the question)\n",
"\n",
"Each Task 0 is worth 3% of your assignment grade. \n",
"\n",
"To get the marks, don't fret trying to think of an intricate question. Marks here are based on: \n",
"\n",
"- whether the question is related to the data, \n",
"- whether at least three variables are included, and\n",
"- whether the question is at least somewhat realistic (i.e., you should not just randomly choose some variables and ask whether some randomly chosen relationship exists). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluation for Task 1 (the vis)\n",
"\n",
"Each Task 1 is worth 13% of your assignment grade:\n",
"\n",
"- 6% is for accuracy: \n",
" - Does your code run? _i.e._, can you make a plot starting with the `ggplot()` function?\n",
"- 7% is for choice of vis: \n",
" - Should be publication quality.\n",
" - Should use a proper plot type given your variable types (for example, not using a scatterplot when you have a categorical variable on the x or y axis). \n",
" - Should not unreasonably hide the data (hint: think pinhead plots vs. jitter+violin plots)"
]
},
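{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the hint about not hiding the data, here is a minimal sketch of a jitter+violin plot. It assumes `ggplot2` is installed and uses the built-in `iris` data purely for illustration; the object name `p_violin` is hypothetical. It maps only two variables, so it would not satisfy Task 0 on its own.\n",
"\n",
"```r\n",
"library(ggplot2)\n",
"\n",
"# iris ships with R's built-in datasets package.\n",
"# The violins show the shape of each distribution; the jittered points\n",
"# keep the individual observations visible instead of hiding them.\n",
"p_violin <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +\n",
"  geom_violin() +\n",
"  geom_jitter(width = 0.1, height = 0, alpha = 0.5)\n",
"\n",
"p_violin\n",
"```\n",
"\n",
"Setting `height = 0` jitters the points horizontally only, so the plotted `Sepal.Length` values stay exact."
]
},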
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluation for Task 2 (the grammar)\n",
"\n",
"Each Task 2 is worth 4% of your assignment grade, and is evaluated based on whether you've correctly mapped your graph to its grammar components (for whatever graph you produced, even if it's off-topic). \n",
"\n",
"Use the following table to fill out the grammar of graphics:\n",
"\n",
"| Grammar Component | Specification |\n",
"|-----------------------|---------------|\n",
"| __data__ | YOUR_RESPONSE_HERE |\n",
"| __aesthetic mapping__ | YOUR_RESPONSE_HERE |\n",
"| __geometric object__ | YOUR_RESPONSE_HERE |\n",
"| scale | YOUR_RESPONSE_HERE |\n",
"| statistical transform | YOUR_RESPONSE_HERE |\n",
"| coordinate system | YOUR_RESPONSE_HERE |\n",
"| faceting | YOUR_RESPONSE_HERE |"
]
},
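{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough guide to how the table rows relate to code, the sketch below annotates one `ggplot2` call with the grammar component each piece most plausibly corresponds to. It assumes `ggplot2` is installed and uses the built-in `mtcars` data only as an illustration; the object name `p_grammar` is hypothetical, and your own graph may use different components.\n",
"\n",
"```r\n",
"library(ggplot2)\n",
"\n",
"p_grammar <- ggplot(\n",
"  mtcars,                                      # data\n",
"  aes(x = wt, y = mpg, colour = factor(cyl))   # aesthetic mapping\n",
") +\n",
"  geom_point() +       # geometric object: points (statistical transform: identity)\n",
"  scale_x_log10() +    # scale: log10 on x; y and colour keep their defaults\n",
"  coord_cartesian() +  # coordinate system: Cartesian (also the default)\n",
"  facet_wrap(~ am)     # faceting: one panel per value of am\n",
"\n",
"p_grammar\n",
"```\n",
"\n",
"Components left at their defaults, such as the coordinate system here, still belong in the table even when they are not written out in the code."
]
},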
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluation for Task 3 (the insight)\n",
"\n",
"Each Task 3 is worth 7% of your assignment grade, and evaluates how well you can read and communicate about a graph. Again, we're only looking for big picture ideas here.\n",
"\n",
"- There might not be enough information present to actually _answer_ your question with confidence, and that's okay. This should be reflected in the language you use. \n",
" - Unlike in DSCI 552, you don't need to indicate any specific/numeric confidence level. This is the beauty of EDA, which is based on the notion that we can get a pretty good qualitative sense about most tasks without fitting models. \n",
"- The variables measured might not be the best indicators for your question, and that's okay, too. For example, perhaps you want to gain insight into IQ, but only have exam scores in different subjects. Something like this should be indicated in your response, if relevant. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluation for Task 4 (the effectiveness)\n",
"\n",
"Each Task 4 is worth 3% of your assignment grade. This exercise is reflective as opposed to evaluative, because we haven't covered Lecture 5: Plotting for Humans. Marks are based on insightfulness and whether your response makes sense.\n",
"\n",
"Here are some questions to get you thinking about effectiveness (you don't have to answer all of these): \n",
"\n",
"- In what ways does your graph fall short? \n",
"- In what ways does your graph excel? \n",
"- Can you think of a graph that would be worse/better at conveying the information contained in the data? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1: Warm-up graph (worth 27%)\n",
"\n",
"### 1.0: Question\n",
"\n",
"For the first graph, we've completed Task 0 by giving you the data analytic question to explore:\n",
"\n",
"With the `gapminder` dataset (contained within the `gapminder` R package), for which continent(s) is GDP per capita the strongest indicator of life expectancy?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.1: Graph\n",
"\n",
"rubric={accuracy:6, viz:7}\n",
"\n",
"Task 1: Make the graph below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR_CODE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.2: Grammar\n",
"\n",
"rubric={reasoning:4}\n",
"\n",
"Task 2: Fill in the grammar of graphics components below (hint: use the table we provided)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"YOUR_RESPONSE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.3: Insight\n",
"\n",
"rubric={reasoning:7}\n",
"\n",
"Task 3: Write your insight below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"YOUR_RESPONSE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.4: Effectiveness\n",
"\n",
"rubric={reasoning:3}\n",
"\n",
"Task 4: Reflect on the plot's effectiveness below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"YOUR_RESPONSE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2: Second Graph (worth 30%)\n",
"\n",
"### 2.0: Question\n",
"\n",
"rubric={reasoning:3}\n",
"\n",
"Task 0: Put the question below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"YOUR_RESPONSE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1: Graph\n",
"\n",
"rubric={accuracy:6, viz:7}\n",
"\n",
"Task 1: Make the graph below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR_CODE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2: Grammar\n",
"\n",
"rubric={reasoning:4}\n",
"\n",
"Task 2: Fill in the grammar of graphics components below (hint: use the table we provided)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"YOUR_RESPONSE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3: Insight\n",
"\n",
"rubric={reasoning:7}\n",
"\n",
"Task 3: Write your insight below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"YOUR_RESPONSE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.4: Effectiveness\n",
"\n",
"rubric={reasoning:3}\n",
"\n",
"Task 4: Reflect on the plot's effectiveness below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"YOUR_RESPONSE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 3: Third Graph (worth 30%)\n",
"\n",
"### 3.0: Question\n",
"\n",
"rubric={reasoning:3}\n",
"\n",
"Task 0: Put the question below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"YOUR_RESPONSE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.1: Graph\n",
"\n",
"rubric={accuracy:6, viz:7}\n",
"\n",
"Task 1: Make the graph below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR_CODE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2: Grammar\n",
"\n",
"rubric={reasoning:4}\n",
"\n",
"Task 2: Fill in the grammar of graphics components below (hint: use the table we provided)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"YOUR_RESPONSE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3: Insight\n",
"\n",
"rubric={reasoning:7}\n",
"\n",
"Task 3: Write your insight below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"YOUR_RESPONSE_HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.4: Effectiveness\n",
"\n",
"rubric={reasoning:3}\n",
"\n",
"Task 4: Reflect on the plot's effectiveness below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"YOUR_RESPONSE_HERE"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}