Short Description
Interactive vs. scripted/unattended analyses and how to move fluidly between them. Reproducibility through automation and dynamic, literate documents. The use of version control and file organization to enhance machine- and human-readability.
Learning Outcomes
By the end of the course, students are expected to be able to:
- Analyze data interactively using a read-eval-print loop (REPL); write scripts for non-interactive use; use tools and work styles that make it easy to move between these two modes (e.g., RStudio IDE, IPython).
- Produce dynamic reports that integrate narrative, code, data, numerical results, and visual results; create reproducible reports and workflows (e.g., R Markdown, Project Jupyter).
- Manage projects by designing workflows for self-documentation, reproducibility, and collaboration; organize files with appropriate naming conventions; manage paths and dependencies.
- Use version control software (e.g., Git), including distributed version control and remote servers (e.g., GitHub, Bitbucket).
- Automate data science workflows (e.g., using Make, Galaxy).
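
As a small illustration of the file-naming and path-management outcome above, the following Python sketch builds sortable, machine-readable file names and resolves paths relative to a project root rather than the current working directory. The naming convention (ISO date, zero-padded step number, lowercase slug), the helper names `output_name` and `project_path`, and the `my-project` directory are assumptions made for this example, not conventions prescribed by the course.

```python
from datetime import date
from pathlib import Path


def output_name(when: date, step: int, description: str, ext: str) -> str:
    """Build a sortable, machine-readable file name.

    Hypothetical convention: ISO date, zero-padded step number, and a
    lowercase hyphenated slug, joined by underscores.
    """
    slug = "-".join(description.lower().split())
    return f"{when.isoformat()}_{step:02d}_{slug}.{ext}"


def project_path(root: Path, *parts: str) -> Path:
    """Join path components onto the project root instead of the working directory."""
    return root.joinpath(*parts)


# Example usage with a hypothetical project directory.
root = Path("my-project")
name = output_name(date(2016, 9, 12), 1, "Clean data", "csv")
print(project_path(root, "results", name))
```

Names built this way sort chronologically in a plain directory listing and can be split on underscores by a script, which serves both human and machine readers.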
Prerequisites
- DSCI 511 (Programming for Data Science)
- DSCI 521 (Computing Platforms for Data Science)
Reference Material
TBD
Instructor (2016-2017)
Note: information on this page is preliminary and subject to change.