Welcome to DSCI 562: Regression II#

This course explores regression techniques beyond ordinary least squares (OLS). Some of the questions we will address are the following:

  • What if the response is continuous but restricted to non-negative values, or the observations are no longer independent?

  • What if the response is binary, a count, or categorical?

  • What if the data are censored?

  • What if we want to model a quantity other than the conditional mean of the response? Many data science projects require inference on conditional quantiles of the response instead.

We will cover useful extensions to classical linear regression: generalized linear models (GLMs), mixed-effects, local, survival, and quantile regression, and techniques for dealing with missing data.
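To make the idea of targeting something other than the mean concrete, here is a minimal sketch for the simplest possible case: a model with no covariates and a single constant prediction. It assumes a simulated right-skewed response and a quantile level of 0.9; the seed, sample size, and function names are illustrative choices, not part of the course materials. The constant that minimizes squared-error loss is the sample mean, whereas the constant that minimizes the pinball (check) loss at level 0.9 is a sample 0.9-quantile.

```python
import random
import statistics


def squared_loss(y, m):
    """Average squared-error loss of the constant prediction m."""
    return sum((yi - m) ** 2 for yi in y) / len(y)


def pinball_loss(y, q, tau):
    """Average pinball (check) loss of the constant prediction q at level tau."""
    return sum(max(tau * (yi - q), (tau - 1) * (yi - q)) for yi in y) / len(y)


# Simulated non-negative, right-skewed response (exponential with mean 2).
# The seed and sample size are arbitrary choices for this illustration.
random.seed(562)
y = [random.expovariate(0.5) for _ in range(501)]

tau = 0.9

# The pinball loss is piecewise linear in q, so a minimizer can always be
# found among the observed values; a brute-force search is enough here.
q_hat = min(y, key=lambda q: pinball_loss(y, q, tau))

# The sample mean minimizes the average squared-error loss.
mean_hat = statistics.fmean(y)

print(f"constant minimizing squared loss (mean): {mean_hat:.3f}")
print(f"constant minimizing pinball loss at tau={tau}: {q_hat:.3f}")
```

Replacing the constant with a linear function of the covariates turns the first problem into OLS and the second into quantile regression. In practice a dedicated library routine would replace the brute-force search used here.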

High-Level Goals#

By the end of the course, students are expected to:

  • Describe the risk and value of making parametric assumptions in regression.

  • Fit model functions that represent probabilistic quantities besides the mean.

  • Identify situations where OLS regression is sub-optimal, and apply alternative regression methods that better address the situation.

Lecture Schedule#

This course runs during Block 4 of the 2024/25 school year. The course notes can be accessed here. You should review these notes before each lecture.

See the lecture learning objectives for a lecture-by-lecture breakdown.

Regression Mind Map#

Here is a mind map we created to summarize all regression models to be covered in this course.

Deliverables#

This is an assignment-based course. The following deliverables will determine your course grade:

| Assessment | Weight |
| --- | --- |
| Lab Assignment 1 | 12% |
| Lab Assignment 2 | 12% |
| Lab Assignment 3 | 12% |
| Lab Assignment 4 | 12% |
| Quiz 1 | 25% |
| Quiz 2 | 25% |
| Lecture Attendance (iClicker) | 2% |
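As a rough illustration of how these weights combine, the sketch below computes a weighted course grade. The weights are copied from the deliverables above; the function name and the example scores are hypothetical, and the official grade calculation may involve rules not described on this page.

```python
# Assessment weights in percent, copied from the deliverables listed above.
WEIGHTS = {
    "Lab Assignment 1": 12,
    "Lab Assignment 2": 12,
    "Lab Assignment 3": 12,
    "Lab Assignment 4": 12,
    "Quiz 1": 25,
    "Quiz 2": 25,
    "Lecture Attendance (iClicker)": 2,
}


def course_grade(scores):
    """Weighted average of percentage scores (0-100), one per assessment."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / sum(WEIGHTS.values())


# Hypothetical scores, for illustration only.
example_scores = {
    "Lab Assignment 1": 90,
    "Lab Assignment 2": 85,
    "Lab Assignment 3": 80,
    "Lab Assignment 4": 95,
    "Quiz 1": 70,
    "Quiz 2": 80,
    "Lecture Attendance (iClicker)": 100,
}

print(course_grade(example_scores))
```

Because the two quizzes carry half of the total weight, a change of one point on a quiz moves the final grade about twice as much as a change of one point on a lab.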

Use of LLMs#

LLMs, such as ChatGPT, can be helpful tools when used responsibly. In this course, students may use these tools to gather information, review concepts, or brainstorm, and they must cite any tool they use for an assignment. However, students are not permitted to complete an assignment by copying and pasting AI-generated responses.

Reference Material#

  • Agresti, A. (2013). Categorical Data Analysis, John Wiley & Sons, Incorporated. ProQuest Ebook Central.

    • The e-book is available through the UBC Library. You can obtain a PDF copy with your CWL account. This book is helpful for generalized linear models with discrete responses.

  • Collett, D. (2003). Modelling Binary Data (2nd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b16654

  • Fahrmeir, L. (2013). Regression: Models, Methods and Applications. Springer Berlin Heidelberg.

    • The e-book is available through the UBC Library. You can obtain a PDF copy with your CWL account.

  • Faraway, Julian J. (2005). Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, CRC Press LLC. ProQuest Ebook Central.

    • The e-book is available through the UBC Library. You can obtain a PDF copy with your CWL account. This book is great for learning how to work within the R environment with the models we will be working on. Its approach is essentially practical.

  • Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge University Press.

    • The physical book is available through the UBC Library. This book is useful and practical as introductory material.

  • Hastie, T., Tibshirani, R., and Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Publishing Company, Incorporated.

  • James, G., Witten, D., Hastie, T., and Tibshirani, R. (2014). An Introduction to Statistical Learning: with Applications in R. Springer Publishing Company, Incorporated.

  • Kleinbaum, D. G. and Klein, M. (2005). Survival Analysis: A Self-Learning Text. Springer.

    • The e-book is available through the UBC Library. This book is a good starting point for survival analysis:

      • Chapter 1 (Introduction): Introduction to Survival Analysis (I), Censored Data (II), Terminology and Notation (III).

      • Chapter 2 (Kaplan-Meier Curves): Review (I), Example of Kaplan-Meier Curves (II), General Features of Kaplan-Meier Curves (III), Confidence Intervals for Kaplan-Meier Curves (VII and VIII).

      • Chapter 3 (Cox Proportional Hazards Model): Example of Cox Proportional Hazards Model (I), Formula of Cox Proportional Hazards Model (II), Why the Cox Proportional Hazards Model is Popular (III), Estimation of the Cox Proportional Hazards Model (IV).

      • Chapter 7 (Parametric Model): Overview (I), Relationship Between the Probability Density Function with Hazard and Survival Functions (II), Weibull Example (IV).

  • Rousseeuw, P. J. and Leroy, A. M. (2003). Robust Regression and Outlier Detection. Hoboken, NJ: Wiley-Interscience.

    • The e-book is available through the UBC Library. You can obtain a PDF copy with your CWL account.

  • Roback, P. and Legler, J. (2020). Beyond Multiple Linear Regression.

  • Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley.

    • The e-book is available through the UBC Library. You can obtain a PDF copy with your CWL account.

  • van Buuren, S. (2012). Flexible Imputation of Missing Data:

    • 1.1 The problem of missing data.

    • 1.2 Concepts of MCAR, MAR and MNAR.

    • 1.3 Ad-hoc solutions.

    • 1.4 Multiple imputation in a nutshell.

Dataset References#

These are the papers from which the datasets used in the lectures come. If you are interested in learning more about them, you can obtain a PDF copy of each paper with your CWL account via the UBC Library:

Policies#

See the general MDS policies.

Attribution#

The course is built upon previous years’ materials developed by previous instructors.

License#

© 2025 G. Alexi Rodríguez-Arelis, Payman Nickchi, Rodolfo Lourenzutti, and Vincenzo Coia.

Software is licensed under the MIT License; non-software content is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information.