DSCI 571 Course README

Welcome to DSCI 571: Supervised Learning I! This course introduces the core concepts and techniques of supervised machine learning, such as data splitting, cross-validation, generalization, overfitting, the fundamental trade-off, the golden rule, and data preprocessing. We will also explore popular machine learning algorithms, including decision trees, \(k\)-nearest neighbors, support vector machines (SVMs), naive Bayes, and linear models, using the scikit-learn framework.

Important links

YouTube videos

Course learning outcomes

By the end of this course, you will be able to:

Describe supervised learning and its suitability for various tasks.
Explain key machine learning concepts such as classification, regression, overfitting, and the trade-off in model complexity.
Identify appropriate data preprocessing techniques for specific scenarios, provide reasons for their selection, and integrate them into machine learning pipelines.
Develop an intuitive understanding of common machine learning algorithms.
Build end-to-end supervised machine learning pipelines using Python and scikit-learn on real-world datasets.

Deliverables

The following deliverables will determine your course grade:

Assessment	Weight	Where to submit
Lab Assignment 1	12%	Gradescope
Lab Assignment 2	12%	Gradescope
Lab Assignment 3	12%	Gradescope
Lab Assignment 4	12%	Gradescope
iClicker participation	2%	iClicker Cloud
Quiz 1	25%	PrairieLearn
Quiz 2	25%	PrairieLearn

See Calendar for the due dates.
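To see how the weights combine, here is a minimal Python sketch that computes a course grade as a plain weighted average of the component scores in the table. It assumes every score is a percentage out of 100 and that no other rules (for example, anything in the general MDS policies) modify the result; the dictionary keys, the helper name course_grade, and the example scores are all hypothetical.

```python
# Weights from the Deliverables table, in percent of the final grade.
# Key names are hypothetical labels for the rows of the table.
WEIGHTS = {
    "lab1": 12,
    "lab2": 12,
    "lab3": 12,
    "lab4": 12,
    "iclicker": 2,
    "quiz1": 25,
    "quiz2": 25,
}

# The weights should account for the whole grade
assert sum(WEIGHTS.values()) == 100


def course_grade(scores: dict[str, float]) -> float:
    """Return the weighted average of component scores (each out of 100).

    Hypothetical helper assuming a plain weighted average; not an
    official grading script.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores: 90 on every lab, full iClicker credit, 80 on each quiz
example = {"lab1": 90, "lab2": 90, "lab3": 90, "lab4": 90,
           "iclicker": 100, "quiz1": 80, "quiz2": 80}
print(course_grade(example))  # 85.2
```

With these made-up scores the labs contribute 43.2 points, iClicker participation 2, and the two quizzes 40, for 85.2 overall. The quizzes together carry half the grade, so they dominate the total.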

Teaching Team

Section 001: Classifiers

Role	Name
Lecture Instructor	Prajeet Bajpai
Lab Instructor	Prajeet Bajpai

Section 002: Regressors

Role	Name
Lecture Instructor	Varada Kolhatkar
Lab Instructor	Varada Kolhatkar

Lectures

Format

This course follows a semi-flipped classroom format, where you will watch pre-recorded videos before class. In-class sessions will focus on demos, iClicker questions, Q&A, discussions, and worksheets. It’s highly recommended that you run the lecture Jupyter notebooks on your own and actively experiment with the code.

Lecture schedule

This course runs in Block 2 of the 2023/24 school year.

Lecture	Topic	Assigned videos	Resources and optional readings
	Course information	📹 Pre-watch: 1.0
1	Terminology, baselines, decision trees	📹 Pre-watch: 2.1, 2.2, 2.3, 2.4
2	ML fundamentals	📹 Pre-watch: 3.1, 3.2, 3.3, 3.4	An article by Pedro Domingos
3	\(k\)-NNs, SVM RBF	📹 Pre-watch: 4.1, 4.2, 4.3, 4.4, 5.1
4	Preprocessing, pipelines, column transformer	📹 Pre-watch: 5.2, 5.3, 5.4, 6.1
5	More preprocessing, text features	📹 Pre-watch: 6.2
6	Hyperparameter optimization, optimization bias	📹 Pre-watch: 8.1, 8.2
7	Naive Bayes	None	Conditional probability visualization; Naive Bayes chapter, Jurafsky and Martin
8	Logistic Regression, multi-class classification	📹 Pre-watch: 7.1, 7.2, 7.3

Datasets

Here is the list of Kaggle datasets we’ll use in this class.

Labs

During labs, you can work independently or in groups. These sessions offer ample time for discussion and for getting help, and they give the instructors a chance to get to know you better :)

Installation

We provide a conda environment file, available here. Download this file, then create and activate the course environment as follows:

conda env create -f env-dsci-571.yml
conda activate 571

To use this environment in Jupyter, install nb_conda_kernels in the environment where Jupyter is installed (typically the base environment). You will then be able to select the new environment in Jupyter. If the environment does not appear in Jupyter, you may have to install the kernel manually; see the documentation here.
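If you are unsure which environment a notebook is actually running in, a quick check from a notebook cell can help. The sketch below assumes the environment is named 571 (as in the activation command above) and relies on conda's default layout, where a named environment lives in a directory ending in envs/571; the helper name running_in_conda_env is hypothetical.

```python
import sys
from pathlib import Path


def running_in_conda_env(env_name: str, prefix: str = sys.prefix) -> bool:
    """Return True if the interpreter prefix looks like the conda env `env_name`.

    Hypothetical helper. Assumes conda's default layout, where a named
    environment is stored in a directory ending in envs/<env_name>.
    """
    path = Path(prefix)
    return path.name == env_name and path.parent.name == "envs"


# Run this in a notebook cell to see which interpreter the kernel uses
print("Interpreter:", sys.executable)
print("In the 571 environment:", running_in_conda_env("571"))
```

If the second line prints False, the notebook is probably using another kernel (often base), so switch kernels from Jupyter's kernel menu. Environments created in a custom location with conda create --prefix will not match this check even when they are active.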

I’ve tested this environment file on only a few machines, so some packages in the yml file may fail to install when you run the commands above. This is not uncommon; it may mean that the specified package version is not yet available through conda for your operating system. When this happens, you can work around it as follows:

Remove the line containing that package from your local copy of the yml file.
Create the environment without that package.
Activate the environment and install the package manually with conda install or pip install.

Note that the yml file does not list every package we’ll use in the course; you may need to install a few more with conda install later on. It is, however, enough to get you started.
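After creating the environment, especially if you had to drop a package from the yml file, you can check which packages are importable before starting a lab. This is a minimal sketch using only the standard library; the package list is an illustrative assumption, not read from env-dsci-571.yml, so edit it to match the file. Import names can differ from conda or pip package names (scikit-learn, for example, is imported as sklearn).

```python
import importlib.util

# Illustrative list of import names (an assumption); edit it to match
# env-dsci-571.yml. Note that scikit-learn is imported as "sklearn".
PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib"]


def missing_packages(names: list[str]) -> list[str]:
    """Return the import names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Not installed:", ", ".join(missing))
        print("Install each one with conda install or pip install.")
    else:
        print("All listed packages are available.")
```

Using importlib.util.find_spec only locates each package without importing it, so the check stays fast even for heavy libraries.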

Course communication

We are all here to support your learning and success in the course and the program. Here’s how our communication will work during the course.

Clarifications on the lecture notes or lab questions

If the lecture material or lab questions need clarification, I’ll open an issue in the course repository and tag you, and I’ll also post a Slack message tagging you. It is your responsibility to read these messages carefully whenever you are tagged. (I know you have a lot to keep track of; you do not have to read every message, only the ones where you are tagged.) If anything is unclear, please feel free to reach out on Slack or during office hours.

Questions on lecture material or labs

If you have questions about the lecture material or labs, please post them on the course Slack channel rather than direct-messaging me or the TAs. This has two advantages:

You’ll get a quicker response.
Your classmates will benefit from the discussion.

When you post on the course channel, please avoid tagging the instructor unless the question is specifically for them (e.g., you noticed a mistake in the lecture notes). When a specific person is tagged, other teaching team members and classmates tend to hold back from responding, which slows down responses on the channel.

Please use a consistent convention when you ask questions on Slack so that others (and future you) can easily search for them. For example, if you have a question on Exercise 2.3 from Lab 1, start your post with the label lab1-ex2.3. If you have a question on Lecture 2 material, start your post with the label lecture2. Once the question is answered, add a “(solved)” tag before the label (e.g., (solved) lab1-ex2.3). Do not delete your post even if you figure out the answer on your own; the question and discussion can still benefit others.

Reference Material

Books

A Course in Machine Learning (CIML) by Hal Daumé III (also relevant for DSCI 572, 573, 575, 563)
Introduction to Machine Learning with Python: A Guide for Data Scientists by Andreas C. Mueller and Sarah Guido
An Introduction to Statistical Learning
The Elements of Statistical Learning (ESL)
Data Mining: Practical Machine Learning Tools and Techniques (PMLTT)
Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig
Artificial Intelligence: Foundations of Computational Agents, 2nd edition (2017) by David Poole and Alan Mackworth (of UBC!)

Online courses

CPSC 330
An undergrad course on applied machine learning that I’m currently teaching. Unlike DSCI 571, CPSC 330 is a semester-long course, but the two courses overlap substantially and share notes.
Machine Learning Crash Course
Mike’s CPSC 340
Machine Learning (Andrew Ng’s famous Coursera course)
Foundations of Machine Learning online course from Bloomberg
Machine Learning Exercises In Python, Part 1 (translation of Andrew Ng’s course to Python, also relevant for DSCI 561, 572, 563)

Misc

A Visual Introduction to Machine Learning (Part 1)
A Few Useful Things to Know About Machine Learning (an article by Pedro Domingos)
Metacademy (sort of like a concept map for machine learning, with suggested resources)
Machine Learning 101 (slides by Jason Mayes, engineer at Google)

Policies

Please see the general MDS policies.

Enjoy your learning journey in DSCI 571: Supervised Learning I!
