Welcome to DSCI 554: Experimentation and Causal Inference
This course takes a frequentist approach. It compares the statistical evidence provided by randomized experiments with that provided by observational studies, and covers applications of randomization such as A/B testing for website optimization.
High-Level Goals
By the end of the course, students are expected to:
Distinguish between experimentally-generated data and observational data, with particular reference to the strength of ensuing statistical conclusions regarding causality.
Fit and interpret regression models for observational data, with particular reference to adjustment for potential confounding variables.
Apply the principle of “block what you can, randomize what you cannot” in designing an A/B testing experiment.
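As a rough illustration of the last goal, the following minimal Python sketch randomizes treatment within blocks. It assumes a hypothetical A/B test in which visitors are blocked by device type. The function name `blocked_assignment`, the visitor IDs, and the device labels are all invented for this example and are not taken from the course materials.

```python
import random
from collections import defaultdict


def blocked_assignment(units, block_of, arms=("A", "B"), seed=554):
    """Randomly assign units to arms separately within each block.

    units:    iterable of unit identifiers
    block_of: function mapping a unit to its block label
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Group the units by the blocking variable ("block what you can").
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)

    # Shuffle inside each block, then deal the arms out in turn
    # ("randomize what you cannot").
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = arms[i % len(arms)]
    return assignment


# Hypothetical visitors and their device type (the blocking variable).
visitors = [
    ("u1", "mobile"), ("u2", "mobile"), ("u3", "mobile"), ("u4", "mobile"),
    ("u5", "desktop"), ("u6", "desktop"), ("u7", "desktop"), ("u8", "desktop"),
]
device = dict(visitors)
assignment = blocked_assignment([u for u, _ in visitors], device.get)
```

Because the shuffle happens inside each block, every device type contributes equally to both arms here, so device type cannot be confounded with the treatment.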
Assessments
This is an assignment-based course. The following deliverables will determine your course grade:
| Assessment | Weight |
|---|---|
| Lab Assignment 1 | 12% |
| Lab Assignment 2 | 12% |
| Lab Assignment 3 | 12% |
| Lab Assignment 4 | 12% |
| Quiz 1 | 25% |
| Quiz 2 | 25% |
| Lecture Attendance (iClicker) | 2% |
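The weights above sum to 100%. As a small worked example, the sketch below computes a weighted course grade from them. It assumes each deliverable is scored as a percentage from 0 to 100. The function name `course_grade` and the scores are hypothetical.

```python
# Weights in percentage points, copied from the assessment table.
WEIGHTS = {
    "Lab Assignment 1": 12,
    "Lab Assignment 2": 12,
    "Lab Assignment 3": 12,
    "Lab Assignment 4": 12,
    "Quiz 1": 25,
    "Quiz 2": 25,
    "Lecture Attendance (iClicker)": 2,
}


def course_grade(scores):
    """Return the weighted grade (0-100) given a score (0-100) per deliverable."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student with 80% on every deliverable.
example_scores = {name: 80 for name in WEIGHTS}
grade = course_grade(example_scores)
```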
Lecture Schedule
This course occurs during Block 6 in the 2023/24 school year.
Course notes can be accessed here. You should typically review these notes before each lecture. Optional reading material is also provided.
| Lecture | Topic | Optional Reading Material |
|---|---|---|
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | Matched Case-Control Scheme, Ordinal Regressors, and Final Wrap-Up | |
See the lecture learning objectives for a lecture-by-lecture breakdown.
Reference Material
Seltman HJ, Experimental Design and Analysis, 2015.
Oehlert GW, A First Course in Design and Analysis of Experiments, 2010.
O’Neil, Cathy and Schutt, Rachel. “Causality,” Ch. 11 of Doing Data Science: Straight Talk from the Frontline, O’Reilly Media, 2013.
Tang, Diane, et al. “Overlapping Experiment Infrastructure: More, Better, Faster Experimentation.” Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2010.
Further reading:
Work by Judea Pearl, such as “The Book of Why”.
Recommended Course Reviews
This course is taught in R (we will follow the tidyverse style guide) and Stan, and assumes a reasonable mathematical, statistical, and programming background. We strongly recommend reviewing the following courses:
DSCI 551: Descriptive Statistics and Probability for Data Science, for basic statistical and probabilistic concepts, and familiarity with the mathematical notation.
DSCI 552: Statistical Inference and Computation I, for statistical inference concepts with a frequentist approach.
DSCI 561: Regression I, for ordinary least squares (OLS).
DSCI 562: Regression II, for generalized linear models (GLMs).
DSCI 531: Data Visualization I, for plotting tools using the package `ggplot2`.
Policies
See the general MDS policies.
Attribution
This course builds on materials developed by instructors in previous years.
License
© 2024 G. Alexi Rodríguez-Arelis, Daniel Chen, Benjamin Bloem-Reddy, Tiffany Timbers, and Vincenzo Coia
Software licensed under the MIT License, non-software content licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information.