Welcome to Regression II!
DSCI 562 explores regression techniques that go beyond ordinary least-squares (OLS). In particular, we will ask questions like:
- What if the response is still continuous but constrained (e.g., non-negative), or the observations are no longer independent?
- What if the response is binary, a count, or categorical?
- What if the data are censored (for example, due to limits of detection or incomplete follow-up)?
- What if we are interested in something other than the conditional mean (such as conditional quantiles) because different data science applications call for different inferential responses?
To address these settings, we will study practical extensions of classical linear regression, including generalized linear models (GLMs), mixed-effects models, local regression, survival analysis, and quantile regression, as well as methods for handling missing data.
1 High-Level Goals
By the end of the course, students are expected to:
- Describe the risk and value of making parametric assumptions in regression.
- Fit model functions that represent probabilistic quantities besides the mean.
- Identify situations where OLS regression is sub-optimal, and apply alternative regression methods that better address the situation.
2 Lecture Topics
This course runs during Block 4 of the school year. You should typically review the lecture notes before each lecture.
3 Regression Mind Map
Here is a mind map we created to summarize all regression models to be covered in this course.
4 Cheat Sheet
Here is a cheat sheet we created to summarize the main formulas and concepts covered in DSCI 562.
5 Deliverables
This is an assignment-based course. The following deliverables will determine your course grade:
| Assessment | Weight |
|---|---|
| Lab Assignment 1 | 12.5% |
| Lab Assignment 2 | 12.5% |
| Lab Assignment 3 | 12.5% |
| Lab Assignment 4 | 12.5% |
| Quiz 1 | 25% |
| Quiz 2 | 25% |
6 Lab Topics
| Lab | Topic |
|---|---|
| 1 | Introduction to Generalized Linear Models (Lectures 1 and 2) |
| 2 | Ordinal and Mixed-effects Regression Models (Lectures 3 and 4) |
| 3 | Survival Analysis and Local Regression (Lectures 5 and 6) |
| 4 | Quantile Regression and Missing Data Imputation (Lectures 7 and 8) |
7 Use of Generative AI (GenAI)
GenAI tools (e.g., ChatGPT) can be useful when used responsibly. In this course, you may use these tools to gather information, review concepts, or brainstorm. If you use GenAI in any graded work, you must clearly cite it (including what tool you used and how you used it). What is not permitted is submitting work that is primarily written by a GenAI tool; for example, copying and pasting AI-generated responses into an assignment. For details and expectations, please review the MDS policies.
8 Reference Material
- Agresti, A. (2013). Categorical Data Analysis, John Wiley & Sons, Incorporated. ProQuest Ebook Central.
- The e-book is available through the UBC Library. You can obtain a PDF copy with your CWL account. This book is helpful for GLMs with discrete responses.
- Collett, D. (2003). Modelling Binary Data (2nd ed.). Chapman and Hall/CRC.
- The e-book is available through the UBC Library.
- Fahrmeir, L. (2013). Regression Models, Methods and Applications. Springer Berlin Heidelberg.
- The e-book is available through the UBC Library. You can obtain a PDF copy with your CWL account.
- Faraway, J. J. (2005). Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, CRC Press LLC. ProQuest Ebook Central.
- The e-book is available through the UBC Library. You can obtain a PDF copy with your CWL account. This book is great for learning how to work within the R environment with the models we will be working on. Its approach is essentially practical.
- Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge University Press.
- The physical book is available through the UBC Library. This book is pretty useful and practical as introductory material.
- Hastie, T., Tibshirani, R., and Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Publishing Company, Incorporated.
- The e-book is available through the UBC Library.
- James, G., Witten, D., Hastie, T., and Tibshirani, R. (2014). An Introduction to Statistical Learning: with Applications in R. Springer Publishing Company, Incorporated.
- Kleinbaum, D. G. and Klein, M. (2005). Survival Analysis: A Self-Learning Text. Springer.
- The e-book is available through the UBC Library. This book is a good start for Survival Analysis:
- Chapter 1 (Introduction): Introduction to Survival Analysis (I), Censored Data (II), Terminology and Notation (III).
- Chapter 2 (Kaplan-Meier Curves): Review (I), Example of Kaplan-Meier Curves (II), General Features of Kaplan-Meier Curves (III), Confidence Intervals for Kaplan-Meier Curves (VII and VIII).
- Chapter 3 (Cox Proportional Hazards Model): Example of Cox Proportional Hazards Model (I), Formula of Cox Proportional Hazards Model (II), Why the Cox Proportional Hazards Model is Popular (III), Estimation of the Cox Proportional Hazards Model (IV).
- Chapter 7 (Parametric Model): Overview (I), Relationship Between the Probability Density Function with Hazard and Survival Functions (II), Weibull Example (IV).
- Rousseeuw, P. J. and Leroy, A. M. (2003). Robust Regression and Outlier Detection. Hoboken, NJ: Wiley-Interscience.
- The e-book is available through the UBC Library. You can obtain a PDF copy with your CWL account.
- Roback, P. and Legler, J. (2020). Beyond Multiple Linear Regression.
- Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley.
- The e-book is available through the UBC Library. You can obtain a PDF copy with your CWL account.
- van Buuren, S. (2012). Flexible Imputation of Missing Data:
- 1.1 The problem of missing data.
- 1.2 Concepts of MCAR, MAR and MNAR.
- 1.3 Ad-hoc solutions.
- 1.4 Multiple imputation in a nutshell.
9 Recommended Course Reviews
This course is entirely taught in R (we will follow the tidyverse style guide) with a reasonable mathematical and statistical basis. We strongly recommend reviewing the following courses:
- DSCI 551: Descriptive Statistics and Probability for Data Science, for basic statistical concepts and familiarity with the mathematical notation.
- DSCI 552: Statistical Inference and Computation I, for statistical inference concepts with a frequentist approach.
- DSCI 561: Regression I, since the topics of this course follow the same thread.
- DSCI 531: Data Visualization I, for plotting tools using the package ggplot2.
10 Dataset References
These are the papers from which the datasets used in the lectures come. If you are interested in knowing more about them, you can obtain a PDF copy of each paper with your CWL account via the UBC Library:
- Brockmann, H.J. (1996). Satellite Male Groups in Horseshoe Crabs, Limulus polyphemus. Ethology, 102: 1-21.
- Deb, P. and Trivedi, P. (1997). Demand for medical care by the elderly: a finite mixture approach. Journal of Applied Econometrics, 12(3), 313-336.
- Grunfeld, Y. (1958). The determinants of corporate investment. Ph.D. thesis, Department of Economics, University of Chicago.
- Harrison, D. and Rubinfeld, D.L. (1978). Hedonic prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81–102.
- Mangasarian, O. L., Street, W. N., and Wolberg, W. H. (1995). Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), 570-577.
- Wolberg, W. H. and Mangasarian, O. L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences of the United States of America, 87(23), 9193–9196.
- Wood, P. (1967). Algebraic Model of the Lactation Curve in Cattle. Nature, 216, 164–165.
11 Policies
See the general MDS policies.
12 Attribution
This course builds on materials developed by instructors in previous years.
13 License
© 2026 G. Alexi Rodríguez-Arelis, Payman Nickchi, Rodolfo Lourenzutti, and Vincenzo Coia.
Software is licensed under the MIT License; non-software content is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information.