DSCI 551

Short Description

Describing data in terms of its location, spread, and general distribution. Balancing procedures from classical, parametric statistics against robust approaches that account for outliers and missing data.

Learning Outcomes

By the end of the course, students will be able to:

Define probability concepts such as random variables, the distribution of a random variable, and parameters of a distribution.
Describe quantitative or continuous variables, and explain measures of location, spread, and other more complicated features of a distribution.
Describe the connection between probability concepts and observed data, including the distinction between a property of the true distribution and its empirical counterpart (e.g., true mean vs sample mean), measures of uncertainty (e.g., standard error), and interval estimation.
Compute descriptive statistics, both low-dimensional (e.g., sample mean, variance, and median) and high-dimensional (e.g., empirical distribution, histogram estimator, and kernel density estimator).
Describe categorical variables. Explain and compute relevant measures such as frequency, relative frequency, entropy, and mode.
Identify problems in observed data, such as contamination with outliers or missing data. Implement mitigation strategies, such as robust statistics and imputation.
Guard against biases potentially caused by missing data.
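
As a rough illustration of several of these outcomes, the sketch below computes a sample mean, median, and standard error, the entropy of a categorical variable, and a simple median imputation. It is a minimal example on made-up toy data, not course material: the function names (entropy_bits, standard_error, median_impute) and the data values are hypothetical, and median imputation is only one of many possible strategies for missing data.

```python
import math
import statistics
from collections import Counter


def entropy_bits(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def standard_error(xs):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(xs) / math.sqrt(len(xs))


def median_impute(xs):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [x for x in xs if x is not None]
    med = statistics.median(observed)
    return [med if x is None else x for x in xs]


# Hypothetical toy data: one extreme value (100.0) acts as an outlier.
values = [2.0, 3.0, 3.0, 4.0, 100.0]
colours = ["red", "red", "blue", "blue"]

sample_mean = statistics.mean(values)      # pulled upward by the outlier
sample_median = statistics.median(values)  # robust to the outlier
colour_mode = statistics.mode(colours)     # most frequent category
colour_entropy = entropy_bits(colours)     # two equally likely categories
imputed = median_impute([1.0, None, 3.0])

print(sample_mean, sample_median, colour_mode, colour_entropy, imputed)
```

On this toy data the mean is dragged far from the bulk of the observations by the single extreme value, while the median is not, which is the kind of contrast between classical and robust summaries the outcomes above refer to.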

Reference Material

Phillip I. Good. Introduction to Statistics Through Resampling Methods and R.
Roger Peng. Exploratory Data Analysis with R. https://leanpub.com/exdata
Jeff Leek. The Elements of Data Analytic Style. https://leanpub.com/datastyle

Instructor (2016-2017)

Jenny Bryan

Note: information on this page is preliminary and subject to change.