Welcome to DSCI 551: Descriptive Statistics and Probability for Data Science#
This course introduces descriptive statistics and probability, including measures of location and spread, random variables, distributions, parameters, categorical variables, and uncertainty.
High-Level Goals#
By the end of the course, students are expected to:
Explain fundamental concepts in probability, including conditional, joint, and marginal distributions.
Develop a statistical view of data coming from a probability distribution.
Learning Objectives#
Compute summary statistics, such as expected value and variance, of simple discrete and continuous probability distributions.
Compare/contrast location summary statistics such as mean/median/mode/quantiles.
Estimate summary statistics such as mean/median/variance from a plot of a distribution’s PDF or CDF.
Identify common distributions such as Gaussian/Poisson/uniform from a plot of a distribution’s PDF (or PMF, for discrete families such as the Poisson) or CDF.
Match common discrete distributions such as Bernoulli/binomial/multinomial to descriptions.
Compare/contrast conditional, joint and marginal distributions.
Explain the notion of “marginalizing out” a random variable.
Identify independence between random variables from plots/tables of conditional/joint/marginal distributions.
Connect conditional distributions to the notion of supervised learning.
Explain the concept of maximum likelihood estimation.
Identify the units of various quantities such as mean/variance/density for continuous distributions.
Simulate sample generation from probability distributions, and interpret the results.
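Several of the objectives above (computing expected value and variance, marginalizing out a random variable, and simulating samples) can be sketched in a few lines of Python. The following is a minimal illustration and not part of the official course materials: the fair-die PMF, the joint PMF of two binary variables, the seed, and all function names are hypothetical choices made for the example.

```python
import random

# Hypothetical discrete distribution: a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}


def expected_value(pmf):
    """E[X] = sum over x of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E[(X - E[X])^2]."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def simulate(pmf, n, seed=551):
    """Draw n independent samples from the PMF (seeded for reproducibility)."""
    rng = random.Random(seed)
    outcomes = list(pmf)
    weights = [pmf[x] for x in outcomes]
    return rng.choices(outcomes, weights=weights, k=n)


def marginalize(joint):
    """Sum a joint PMF {(x, y): p} over y to obtain the marginal PMF of X."""
    marginal = {}
    for (x, _y), p in joint.items():
        marginal[x] = marginal.get(x, 0.0) + p
    return marginal


# Hypothetical joint PMF of two binary random variables X and Y.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

mu = expected_value(pmf)
var = variance(pmf)
samples = simulate(pmf, 10_000)
sample_mean = sum(samples) / len(samples)
marginal_x = marginalize(joint)

print(f"E[X] = {mu:.3f}, Var(X) = {var:.3f}")
print(f"Sample mean from {len(samples)} draws = {sample_mean:.3f}")
print(f"Marginal PMF of X: {marginal_x}")
```

With a large number of draws, the sample mean should land close to the theoretical expected value, which is one way to interpret simulation results against the underlying distribution.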
Lecture Topics#
This course occurs during Block 1 in the 2024/25 school year. The course notes can be accessed here.
Lecture Topic/Notes | Required Readings | Optional Readings |
---|---|---|
Depicting Uncertainty | | |
Parametric Families | | |
Joint Probability | Part 3: Probabilistic Models, Chapter 5.1, Covariance and correlation (video), How would you explain covariance … | |
Conditional Probabilities | | |
Continuous Distributions | | |
Common Distribution Families and Conditioning | | |
Maximum Likelihood Estimation | Part 5: Machine Learning, Beyond Multiple Linear Regression, sections 2.1 to 2.4, Chapter 7.1 & 7.2 | |
Simulation and Empirical Distributions | | |
Cheat sheet#
Here is a cheat sheet we created to summarize the main formulas and concepts covered in DSCI 551.
Deliverables#
This is an assignment-based course. The following deliverables will determine your course grade:
Assessment | Weight |
---|---|
Lab Assignment 1 | 12% |
Lab Assignment 2 | 12% |
Lab Assignment 3 | 12% |
Lab Assignment 4 | 12% |
Quiz 1 | 25% |
Quiz 2 | 25% |
Lecture Attendance (iClicker) | 2% |
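As a quick sanity check, the weights above sum to 100%. The sketch below shows how a final grade could be estimated as a weighted average of the deliverables. It assumes a simple weighted average with every score on a 0 to 100 scale, which this page does not state explicitly; the scores are made up for illustration, and the actual grading rules are set by the course and MDS policies.

```python
# Assessment weights (in percent) as listed in the Deliverables table.
weights = {
    "Lab Assignment 1": 12,
    "Lab Assignment 2": 12,
    "Lab Assignment 3": 12,
    "Lab Assignment 4": 12,
    "Quiz 1": 25,
    "Quiz 2": 25,
    "Lecture Attendance (iClicker)": 2,
}


def course_grade(scores, weights=weights):
    """Weighted average of per-assessment scores, each on a 0-100 scale.

    Assumption: the final grade is a plain weighted average of the deliverables.
    """
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


# Hypothetical scores, for illustration only.
example_scores = {name: 80 for name in weights}
example_scores["Quiz 2"] = 90

final = course_grade(example_scores)
print(f"Total weight: {sum(weights.values())}%")
print(f"Estimated final grade: {final:.1f}")
```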
Use of LLMs#
LLMs such as ChatGPT can be helpful tools when used responsibly. In this course, students may use these tools to gather information, review concepts, or brainstorm, and must cite them whenever they are used for an assignment. Submitting copied-and-pasted AI-generated responses as assignment work is not permitted.
Resources#
Note: Some of these resources cover much more material than DSCI 551.
Policies#
See the general MDS policies.
Attribution#
This course builds on materials developed by instructors in previous years.
License#
© 2024 Vincenzo Coia, Mike Gelbart, Aaron Berk, Alexi Rodríguez-Arelis, and Vincent Liu.
Software licensed under the MIT License, non-software content licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information.