Syllabus#
👋 Welcome to DSCI 531! 🚀
In this course we will learn how to (and how not to) visualize data. We will cover graphical grammars and practice using them via ggplot in R and Altair in Python. With these tools, we will create effective data visualizations that strengthen our own exploratory data analysis skills and our ability to communicate insights to others.
Course structure#
The main parts of this course are the following:
Lecture notes, lab solutions, and course info are put together on this site.
Lab assignments are submitted via Gradescope (use the link from Canvas to get into Gradescope).
Quizzes will be administered via PrairieLearn.
Office hours will be held in person or via Zoom (Zoom links are on Canvas and in the MDS calendar).
A few additional details as we get started:
Required readings are listed on top of the lecture notes. There are none for the first class, but there will be for many of the other classes.
Important course announcements are made on Slack by tagging everyone in the channel so that you get a notification. We will also pin these messages so that you can view all important messages in one place by clicking “Pinned messages” in the top left corner of the Slack channel.
Before asking a new question, please search the FAQ from previous iterations of the course.
When asking about specific lab questions on Slack, you can add a text tag (e.g. lab3py_q10) to make it easier for you and your colleagues to search the channel.
If you have any questions regarding the course content, lectures, labs, autograders, etc., please post your question in the course Slack channel instead of sending a direct message to the instructor or TAs. This approach enables the teaching team (or another student) to respond more promptly and also benefits other students who might have similar questions.
Response time: We will try our best to reply to your inquiries as soon as possible during regular working hours (9 am - 5 pm, Mon-Fri). If you send us a message outside of regular working hours, please expect a response on the next working day.
Conda environment setup#
To set up the necessary packages for running the labs
and lecture material from 531,
download the environment file from the student repo to your computer
(hit “Raw” and then Ctrl + s to save it, or copy and paste the content).
Then create a virtual environment by using conda with the environment file you just downloaded:
conda env create --file environment.yaml
This will set up both R and Python with the correct versions of all required packages. If you prefer (or run into issues), you are free to use your system R installation and set up the packages from the environment file manually instead.
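After creating the environment, a quick sanity check is to confirm that the Python packages you expect are importable from inside it. The following is a minimal sketch, not part of the official setup: the helper name `missing_packages` is hypothetical, and the package names in the usage comment are guesses, since the contents of the environment file are not listed here.

```python
from importlib.util import find_spec


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level package names that cannot be found.

    find_spec() only locates a package; it does not import it,
    so this check is fast and has no side effects.
    """
    return [name for name in names if find_spec(name) is None]


# Example usage (package names are assumptions, adjust to the environment file):
#   missing_packages(["altair", "pandas"])
# An empty list means every listed package was found.
```

Run this with the Python interpreter from the new environment; if the list is not empty, the environment was probably not activated or the install did not finish.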
Note
If you don’t remember how to use a conda environment with JupyterLab,
review this section of Lec 8 in 521.
In essence, you should only need to run the following command
from the environment where you have JupyterLab installed (e.g. base or jl)
if you haven’t already done so.
conda install nb_conda_kernels
Next time you open JupyterLab,
you should be able to select 531-py or 531-r
as your notebook kernel.
Running the lecture notes#
If you want to annotate the lecture notes,
you can download a PDF version by clicking on the little download icon at the top of the page.
If you want to run the lecture notes on your computer,
you can download the folder called notebooks from the student repo.
The Python cells can be run as is,
but you need to copy the R cells
to a new R notebook to run them.
The simplest way is to manually copy the cells you are interested in trying out,
but if you want to extract all the R cells at once
you could try the nbconvert command line tool.
This is what it would look like to create a separate R-notebook for the first lecture
(this part is totally optional and might not work on all OSes):
jupyter nbconvert --to notebook --output=lec1-r.ipynb 1-intro-viz-altair-ggplot.ipynb \
--ClearMetadataPreprocessor.enabled=True \
--ClearMetadataPreprocessor.clear_notebook_metadata=True \
--RegexRemovePreprocessor.enabled=True \
--RegexRemovePreprocessor.patterns="(?\!%%R.*)" \
&& sed -i 's/"%%R.*\\n",//' lec1-r.ipynb
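If the shell command does not work on your system, roughly the same idea can be sketched in plain Python, since a notebook file is just JSON. This is an unofficial sketch under a few assumptions: R cells are code cells whose first line starts with the `%%R` magic, and the helper names `extract_r_cells` and `write_r_notebook` are made up for illustration. It keeps only the R cells, drops the `%%R` line, and clears the notebook metadata so that you can pick the R kernel when you open the result.

```python
import json
from pathlib import Path


def extract_r_cells(notebook: dict) -> dict:
    """Build a new notebook dict containing only the %%R code cells."""
    r_cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if not source.startswith("%%R"):
            continue
        # Drop the first line (the %%R magic and any of its options).
        _, _, body = source.partition("\n")
        r_cells.append(
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": body,
            }
        )
    # Empty metadata: no kernel is preselected in the new notebook.
    return {"cells": r_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_r_notebook(src: Path, dst: Path) -> None:
    """Read the notebook at src and write its R-only version to dst."""
    notebook = json.loads(src.read_text(encoding="utf-8"))
    dst.write_text(json.dumps(extract_r_cells(notebook), indent=1), encoding="utf-8")
```

For the first lecture, this would be called as `write_r_notebook(Path("1-intro-viz-altair-ggplot.ipynb"), Path("lec1-r.ipynb"))`. Like the command above, this step is optional.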
Alternatively, you could try installing rpy2 and running R and Python in the same notebook. In the past there have been occasional issues with this package, so we generally don’t recommend it unless you are OK with potentially spending some extra time on troubleshooting. This is an optional step and you don’t need to install rpy2 for anything mandatory in 531.
Course learning outcomes#
In this course students will learn how to
Use a grammar of graphics to create data visualizations.
Use ggplot and Altair to generate data visualizations.
Perform exploratory data analysis on a dataset.
Select the data visualization most appropriate for the data-type and the communication goal.
Interpret data visualizations to answer questions and formulate follow-up questions.
Detailed learning outcomes are available in the lecture notes.
Assessments#
For each lab, you will have two submissions: one notebook with your Python code and one notebook with your R code.
In 531, the worksheet-style questions are part of each lab notebook. This means that the first few questions in each lab will have all their tests visible to you. The reason for this is to reduce the number of separate documents and submissions that you have to keep track of.
The weight for the assessments can be seen below; due dates can be found on Gradescope.
| Assessment | Weight |
|---|---|
| Lab 1 | 12.5% |
| Lab 2 | 12.5% |
| Quiz 1 | 25% |
| Lab 3 | 12.5% |
| Lab 4 | 12.5% |
| Quiz 2 | 25% |
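To see how the weights combine, here is a small sketch of a weighted average using the percentages from the table. It is only an illustration: the function name `course_grade` is hypothetical, the scores in the usage comment are made up, and the official grade calculation may include details (such as rounding or late policies) that are not shown here.

```python
# Weights in percent, copied from the assessment table.
WEIGHTS = {
    "Lab 1": 12.5,
    "Lab 2": 12.5,
    "Quiz 1": 25.0,
    "Lab 3": 12.5,
    "Lab 4": 12.5,
    "Quiz 2": 25.0,
}


def course_grade(scores: dict[str, float]) -> float:
    """Return the weighted average of scores given in percent (0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"Missing scores for: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())  # 100.0 for the table above
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


# Example usage with made-up scores: 90 on every lab and 70 on both quizzes
#   course_grade({"Lab 1": 90, "Lab 2": 90, "Lab 3": 90, "Lab 4": 90,
#                 "Quiz 1": 70, "Quiz 2": 70})
```

The four labs together carry 50% and the two quizzes the other 50%, so labs and quizzes contribute equally to the final grade.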
Lecture topics#
See the MDS Calendar for lecture times. Each week will focus on a separate theme.
| Lecture | Topic |
|---|---|
| 1 | Intro to data visualization and graphical grammars |
| 2 | Effective use of visual channels |
| 3 | Visualizing data distributions |
| 4 | Exploratory data analysis |
| 5 | Visualization for communication |
| 6 | Color theory and application |
| 7 | Uncertainty, layouts, and interactivity |
| 8 | Figure formats, paired comparisons, and more interactivity |
Teaching team#
See the MDS Calendar for office hours.
| Role | Name |
|---|---|
| Instructor Sec1 | Payman Nickchi |
| Instructor Sec2 | Andy Tai |
| TA Sec1 | Jenny Zhang |
| TA Sec1 | Nima Hashemi |
| TA Sec1 | Kate Manskaia |
| TA Sec1 | Skylar Fang (TA) |
| TA Sec2 | Ali Balapour |
| TA Sec2 | Inder Khera |
| TA Sec2 | Parsa Seyfourian |
| TA Sec2 | Tony Liang |
Policies#
Please see the general MDS policies.
COVID-19 safety#
Read the UBC COVID-19 Campus Rules for the latest updates on what is expected of you in terms of COVID safety.
Annotated resources#
Specific and required readings will be listed in the lecture notes. The resources below are optional if you wish to learn more about visualization.
General data visualization#
Fundamentals of Data Visualization
General principles of visualization independent of programming language.
Visualization analysis and design
A more detailed resource that can be downloaded through the UBC online library.
Data visualization in R (ggplot2)#
-
Comprehensive resource for learning about ggplot2 by its main author.
-
Overall good book on using R for data science, including visualization.
-
Great for quick reference.
Data visualization in Python (Altair)#
-
The user guide for the Altair library.
UW CSE512 - Data Visualization curriculum
Parts of our curriculum are based on this course.
Available under the open BSD-3 license. Copyright (c) 2019, University of Washington. All rights reserved.
-
Several annotated Altair examples.
Collection of additional references#
STAT 545.com by Jenny Bryan.