Short Description

Converting data from the form in which it is collected into the form needed for analysis: how to clean, filter, arrange, aggregate, and transform data of diverse types (e.g., strings, numbers, and date-times).

Learning Outcomes

By the end of the course, students are expected to be able to:

  1. Analyze and determine appropriate ways of manipulating a single data table, using techniques including: filtering rows (observations) based on a criterion or combination of criteria; selecting variables (columns); arranging observations or variables in a deliberate way (e.g., sorting, grouping); forming new variables from one or more existing variables; reshaping data; and computing summaries on groups of observations defined by one or more categorical variables.
  2. Handle common and tricky data types; manipulate strings and dates/times; use regular expressions to work with text; detect and handle duplicates and outliers.
  3. Determine appropriate manipulations for two-table data, including lookups and joins on suitably selected columns.
  4. Handle non-tabular data (e.g., nested lists) in the languages being used (e.g., Python, R); convert data among general formats (e.g., XML, JSON).
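
As a rough illustration of the kinds of operations named in the outcomes above, here is a minimal sketch in Python using only the standard library. The sample records, the field names (order_id, customer_id, amount, placed), and the lookup table are all invented for this example; they are not course material, and the course may well use different tools for the same tasks.

```python
import json
from collections import defaultdict
from datetime import date

# Hypothetical sample data, invented for this sketch.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 20.0, "placed": "2016-09-03"},
    {"order_id": 2, "customer_id": "c2", "amount": 35.5, "placed": "2016-09-17"},
    {"order_id": 3, "customer_id": "c1", "amount": 12.5, "placed": "2016-10-01"},
    {"order_id": 3, "customer_id": "c1", "amount": 12.5, "placed": "2016-10-01"},  # duplicate row
]
# Hypothetical second table: customer_id -> city.
customers = {"c1": "Vancouver", "c2": "Toronto"}

# Handle duplicates: keep the first row seen for each order_id.
seen = set()
unique_orders = []
for row in orders:
    if row["order_id"] not in seen:
        seen.add(row["order_id"])
        unique_orders.append(row)

# Form new variables: parse the date string to get the month, and
# look up the city in the second table (a simple join on customer_id).
enriched = [
    {
        **row,
        "month": date.fromisoformat(row["placed"]).month,
        "city": customers[row["customer_id"]],
    }
    for row in unique_orders
]

# Filter rows on a criterion, then arrange them (largest amount first).
september = sorted(
    (row for row in enriched if row["month"] == 9),
    key=lambda row: row["amount"],
    reverse=True,
)

# Compute a summary per group: total amount for each city.
totals_by_city = defaultdict(float)
for row in enriched:
    totals_by_city[row["city"]] += row["amount"]
totals_by_city = dict(totals_by_city)

# Convert the result to a general format (JSON).
as_json = json.dumps(totals_by_city, sort_keys=True)
print(as_json)
```

Each step builds a new list or dictionary rather than modifying the input records, so intermediate results can be inspected independently.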

Prerequisites

  • DSCI 521 (Computing Platforms for Data Science)
  • DSCI 511 (Programming for Data Science)

Reference Material

  • TBD

Instructor (2016-2017)

Note: information on this page is preliminary and subject to change.