| Course Number | Block | Course Title | Short Description | 2021-22 Lecture Instructor | 2021-22 Lab Instructor |
| --- | --- | --- | --- | --- | --- |
| DSCI 511 | 1 | Programming for Data Science | Program design and data manipulation with Python. Overview of data structures, iteration, flow control, and program design relevant to data exploration and analysis. When and how to exploit pre-existing libraries. | Arman Ahmadi | Arman Ahmadi |
| DSCI 523 | 1 | Programming for Data Manipulation | Program design and data manipulation with R. Organizing, filtering, sorting, grouping, reformatting, converting, and cleaning data to prepare it for further analysis. | Tiffany Timbers | Tiffany Timbers |
| DSCI 521 | 1 | Computing Platforms for Data Science | How to install, maintain, and use the data scientific software stack. The Unix shell, version control, and problem solving strategies. Literate programming documents. | Florencia D’Andrea | Florencia D’Andrea |
| DSCI 551 | 1 | Descriptive Statistics and Probability for Data Science | Fundamental concepts in probability including conditional, joint, and marginal distributions. Statistical view of data coming from a probability distribution. | Mike Gelbart | Mike Gelbart |
| DSCI 512 | 2 | Algorithms and Data Structures | How to choose and use appropriate algorithms and data structures to help solve data science problems. Key concepts such as recursion and algorithmic complexity (e.g., efficiency, scalability). | Mike Gelbart | Mike Gelbart |
| DSCI 571 | 2 | Supervised Learning I | Introduction to supervised machine learning. Basic machine learning concepts such as generalization error and overfitting. Various approaches such as K-NN, decision trees, linear classifiers. | Varada Kolhatkar | Varada Kolhatkar |
| DSCI 531 | 2 | Data Visualization I | Exploratory data analysis. Design of effective static visualizations. Plotting tools in R and Python. | Joel Östblom | Joel Östblom |
| DSCI 552 | 2 | Statistical Inference and Computation I | The statistical and probabilistic foundations of inference. Large sample results. The frequentist paradigm. | Alexi Rodríguez-Arelis | Quan Nguyen |
| DSCI 513 | 3 | Databases and Data Retrieval | How to work with data stored in relational database systems. Storage structures and schemas, data relationships, and ways to query and aggregate such data. | Arman Ahmadi | Arman Ahmadi |
| DSCI 561 | 3 | Regression I | Linear models for a quantitative response variable, with multiple categorical and/or quantitative predictors. Matrix formulation of linear regression. Model assessment and prediction. | Gabriela Cohen Freue | Alexi Rodríguez-Arelis |
| DSCI 522 | 3 | Data Science Workflows | Interactive vs. scripted/unattended analyses and how to move fluidly between them. Reproducibility through automation and containerization. | Tiffany Timbers | Florencia D’Andrea |
| DSCI 573 | 3 | Feature and Model Selection | How to evaluate and select features and models. Cross-validation, ROC curves, feature engineering, and regularization. | Varada Kolhatkar | Varada Kolhatkar |
| DSCI 572 | 4 | Supervised Learning II | Introduction to numerical optimization (e.g., gradient descent). Neural networks and deep learning. | Arman Ahmadi | Arman Ahmadi |
| DSCI 562 | 4 | Regression II | Useful extensions to basic regression, e.g., generalized linear models, mixed effects, smoothing, robust regression, and techniques for dealing with missing data. | Alexi Rodríguez-Arelis | Alexi Rodríguez-Arelis |
| DSCI 542 | 4 | Communication and Argumentation | How to interpret and present data science findings to a variety of audiences. Written and spoken presentation skills. | Quan Nguyen | Quan Nguyen |
| DSCI 524 | 4 | Collaborative Software Development | How to exploit practices from collaborative software development techniques in data scientific workflows. Appropriate use of abstraction, the software life cycle, unit testing / continuous integration, and packaging for use by others. | Tiffany Timbers | Florencia D’Andrea |
| DSCI 532 | 5 | Data Visualization II | How to make principled and effective choices with respect to marks, spatial arrangement, and colour. Analysis, design, and implementation of interactive figures. How to provide multiple views, deal with complexity, and make difficult decisions about data reduction. | Florencia D’Andrea | Florencia D’Andrea |
| DSCI 563 | 5 | Unsupervised Learning | How to find groups and other structure in unlabeled, possibly high dimensional data. Dimension reduction for visualization and data analysis. Clustering, association rules, model fitting via the EM algorithm. | Varada Kolhatkar | Varada Kolhatkar |
| DSCI 553 | 5 | Statistical Inference and Computation II | Bayesian reasoning for data science. How to formulate and implement inference using the prior-to-posterior paradigm. | Alexi Rodríguez-Arelis | Alexi Rodríguez-Arelis |
| DSCI 574 | 5 | Spatial and Temporal Models | Model fitting and prediction in the presence of correlation due to temporal and/or spatial association. ARIMA models. | Quan Nguyen | Quan Nguyen |
| DSCI 541 | 6 | Privacy, Ethics, and Security | The legal, ethical, and security issues concerning data, including aggregated data. Proactive compliance with rules and, in their absence, principles for the responsible management of sensitive data. Case studies. | Joel Östblom | Joel Östblom |
| DSCI 554 | 6 | Experimentation and Causal Inference | Statistical evidence from randomized experiments versus observational studies. Applications of randomization, e.g., A/B testing for website optimization. Methods for dealing with the multiple testing problem. | Alexi Rodríguez-Arelis | Daniel Chen |
| DSCI 525 | 6 | Web and Cloud Computing | How to use the web as a platform for data collection, computation, and publishing. Accessing data via scraping and APIs. Using the cloud for tasks that are beyond the capability of your local computing resources. | Gittu George | Gittu George |
| DSCI 575 | 6 | Advanced Machine Learning | Advanced machine learning methods in the context of natural language processing (NLP) applications. Bag of words, recommender systems, topic models, natural language as sequence data, Markov chains, and recurrent neural networks. | Varada Kolhatkar | Varada Kolhatkar |
| DSCI 591 | 7 | Capstone Project | A mentored group project based on real data and questions from a partner within or outside the university. Students will formulate questions and design and execute a suitable analysis plan. The group will work collaboratively to produce a reproducible analysis pipeline, project report, presentation and possibly other products, such as a dashboard. | MDS teaching team | MDS teaching team |