These instructions will walk you through installing the required Data Science software stack for the UBC Master of Data Science program. Before starting, ensure that your laptop meets our program requirements:

  • runs macOS Yosemite 10.10.3 or later;
  • can connect to networks via a wireless connection (and preferably also a wired connection);
  • has at least 40 GB disk space available;
  • has at least 4 GB of RAM;
  • uses a 64-bit CPU;
  • is no more than 4 years old at the start of the program;
  • uses English as the default language.


GitHub

In MDS we will use GitHub.com as well as an Enterprise version of GitHub hosted here at UBC, GitHub.ubc.ca. Please follow the set-up instructions for both below.

GitHub.com

If you do not yet have one, sign up for a free account at GitHub.com.

GitHub.ubc.ca

For us to add you to the MDS organization on GitHub.ubc.ca, you need to log in there using your CWL. Visit GitHub.ubc.ca to do this.

This step is required for:

  • being able to store your work
  • all homework submission and grading
  • working collaboratively

Git

We will be using the command-line version of Git as well as Git through RStudio and JupyterLab. Some of the Git commands we will use are only available in Git 2.23 or later. To get this newer version, we ask you to install Git with a package manager called Homebrew, so you will install Homebrew first and then Git.

Open Terminal (how-to video) and type the following command to install Homebrew:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Next, install Git using Homebrew. Do this by typing the following in the Terminal:

brew install git

After installation, type the following in Terminal to check the version:

git --version

You should see something like this if you were successful:

git version 2.23.0
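Because the program relies on commands introduced in Git 2.23, it can help to check the version number rather than just eyeballing it. The sketch below assumes the output format shown above ("git version X.Y.Z"); the function name git_version_ok is hypothetical, not part of Git or Homebrew. Note that the Git bundled with Apple's developer tools may be found first on your PATH and report an older version, which this check would flag.

```shell
# Hypothetical helper: succeed only if the given `git --version` output
# reports Git 2.23 or newer. Assumes the "git version X.Y.Z ..." format.
git_version_ok() {
  local version major rest minor
  # Third whitespace-separated field, e.g. "git version 2.23.0" -> "2.23.0"
  version=$(printf '%s\n' "$1" | awk '{print $3}')
  major=${version%%.*}   # "2"
  rest=${version#*.}     # "23.0"
  minor=${rest%%.*}      # "23"
  # Fail if either part is empty or not purely numeric
  case "$major$minor" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 23 ]; }
}
```

For example, after installing Git with Homebrew you could run:

git_version_ok "$(git --version)" && echo "Git is new enough"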

If you run into trouble, please see the Install Git > Mac OS section of Happy Git and GitHub for the useR for additional help and strategies for installing Git.

Python

We will be using Python for a large part of the program, including many popular third-party Python libraries for scientific computing. Anaconda is an easy-to-install distribution of Python and most of these libraries (as well as Jupyter notebooks, one of the development environments we will be using). We require that you use Anaconda for this program. If you insist on using your own Python setup instead of Anaconda, we will not be able to provide technical support with installation or later issues. For this program we are using Python 3, not Python 2, so please choose the Anaconda version that includes Python 3.7.

Head to https://www.anaconda.com/download/#macos and download the Anaconda version for Mac OS with Python 3.7. Follow the instructions on that page to run the installer.

After installation, type the following in Terminal to check the version:

python --version

You should see something like this if you were successful:

Python 3.7.3

If you instead see Python 2.7.X, you installed the wrong version. Follow these instructions to delete this installation and try the installation again, selecting Python 3.7.
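If you want a yes/no answer instead of reading the version string, the sketch below classifies the output of python --version. The function name python_is_3 is hypothetical. Python 2 prints its version to stderr while recent Python 3 releases print it to stdout, so the usage example captures both streams with 2>&1.

```shell
# Hypothetical helper: succeed if the given `python --version` output
# starts with "Python 3." (e.g. "Python 3.7.3"), fail otherwise.
python_is_3() {
  case "$1" in
    "Python 3."*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example:

python_is_3 "$(python --version 2>&1)" && echo "Python 3 found" || echo "Wrong Python version"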

To see if Jupyter was successfully installed in the Anaconda Python distribution, quit and restart Terminal and type the following:

jupyter lab

A browser window should open, and you should see a page that looks like the screenshot below.

If you have already installed Anaconda at some point in the past, we recommend that you update to the latest Anaconda version by updating conda. In Terminal, type the following:

conda update conda
conda update anaconda

R, IRkernel and RStudio

We will also be using R, another programming language, extensively in the program. We will use R both in Jupyter notebooks and in RStudio. To have R work in Jupyter notebooks, we will also have to install the IR kernel.

R

Go to https://cran.r-project.org/bin/macosx/ and download the latest version of R for Mac (the file name should look something like R-3.6.1.pkg). Open the file and follow the installer instructions.

After installation, in Terminal type the following to ask for the version:

R --version

You should see something like this if you were successful:

R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.

Note: Although it is possible to install R through Anaconda, we strongly recommend against it. If you have already installed R using Anaconda, you can remove it by running conda uninstall r-base.
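One way to spot an Anaconda-managed R is to look at where the R command on your PATH lives. The sketch below is a heuristic under a stated assumption: an R executable whose path contains "conda" (for example ~/anaconda3/bin/R) is treated as coming from Anaconda, whereas the CRAN installer typically places R under /Library/Frameworks/R.framework. The function name r_is_from_conda is hypothetical.

```shell
# Hypothetical heuristic: succeed if the given path to an R executable
# looks like it belongs to an Anaconda/conda installation.
r_is_from_conda() {
  case "$1" in
    *conda*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example:

r_is_from_conda "$(command -v R)" && echo "This R comes from Anaconda; consider: conda uninstall r-base"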

RStudio

Choose and download the Mac version of RStudio from https://www.rstudio.com/products/rstudio/download/#download. Open the file and follow the installer instructions.

To see if you were successful, try opening RStudio by clicking on its icon (from Finder, Applications or Launchpad). It should open and look something like the picture below:

IR kernel

Open RStudio and type the following command into the Console panel:

install.packages(c('IRkernel', 'tidyverse'))

Next, open Terminal and type the following:

R -e "IRkernel::installspec()"
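Besides opening JupyterLab, you can confirm from Terminal that the R kernel was registered by listing Jupyter's kernels with jupyter kernelspec list. IRkernel::installspec() registers the kernel under the name "ir" by default, so the sketch below looks for a line whose first field is ir. The function name has_ir_kernel is hypothetical, and it assumes the usual "name  path" layout of that listing.

```shell
# Hypothetical helper: read `jupyter kernelspec list` output on stdin and
# succeed if a kernel named "ir" appears as the first field of any line.
has_ir_kernel() {
  awk '$1 == "ir" { found = 1 } END { exit !found }'
}
```

For example:

jupyter kernelspec list | has_ir_kernel && echo "R kernel is registered"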

To see if you were successful, try running JupyterLab and check whether you have a working R kernel. To launch JupyterLab, type the following in Terminal:

jupyter lab

A browser window should open, and you should see a page that looks like the screenshot below. Now click on the “R” notebook (circled in red on the screenshot below) to launch a notebook with an R kernel.

Sometimes a kernel loads but doesn’t work as expected. To test whether your installation was done correctly, type library(tidyverse) in the code cell and click the run button to run the cell. If your R kernel works, you should see something like the image below:

PostgreSQL

We will be using PostgreSQL as our database management system. You can download it from here. Follow the instructions for the installation. On the password page, type whatever password you want, but make sure you will remember it later. For all the other options, use the defaults. After the installation, you can open SQL Shell to test whether the installation was successful.

Visual Studio Code

We need a text editor to be able to write complete applications. One is available through Jupyter, but sometimes it is helpful to have a standalone text editor. For this we will be using the open-source text editor Visual Studio Code (VS Code). You can download VS Code at https://code.visualstudio.com/download. Follow the installation instructions.

Once the installation finishes, copy the Visual Studio Code app from the Downloads folder to the Applications folder. Next, run the following command in Terminal:

cat << EOF >> ~/.bash_profile
# Add Visual Studio Code (code)
export PATH="\$PATH:/Applications/Visual Studio Code.app/Contents/Resources/app/bin"
EOF
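The command above appends to ~/.bash_profile every time it is run, so running it twice leaves duplicate lines. A minimal sketch of an idempotent alternative, assuming you want exactly the same PATH line: it only appends when that exact line is not already present. The function name add_vscode_to_path is hypothetical.

```shell
# Hypothetical helper: append the VS Code PATH line to a profile file
# only if that exact line is not already there.
add_vscode_to_path() {
  local profile="$1"
  # Single quotes keep $PATH literal, matching what the heredoc above writes
  local line='export PATH="$PATH:/Applications/Visual Studio Code.app/Contents/Resources/app/bin"'
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  if ! grep -qxF "$line" "$profile" 2>/dev/null; then
    printf '# Add Visual Studio Code (code)\n%s\n' "$line" >> "$profile"
  fi
}
```

For example:

add_vscode_to_path ~/.bash_profile

Because it matches the exact line the original command writes, it also does nothing if you already ran that command.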

You can test that VS Code is installed and can be opened from Terminal by restarting Terminal and typing the following command:

code --version

You should see something like this if you were successful:

1.36.1
2213894ea0415ee8c85c5eea0d0ff81ecc191529
x64

LaTeX

LaTeX lets us write nicely formatted mathematical expressions and equations using a markup syntax. For this program we only need the smaller BasicTeX package.

  1. Download the BasicTeX package from here.
  2. Open the .pkg file and run the installer with default options.
  3. BasicTeX is missing a few files we will need. To install them, restart Terminal and run:
    sudo tlmgr update --self
    sudo tlmgr install framed
    sudo tlmgr install titling
    

After installation, type the following in Terminal to check the version:

latex --version

You should see something like this if you were successful:

pdfTeX 3.14159265-2.6-1.40.20 (TeX Live 2019)
kpathsea version 6.3.1
Copyright 2019 Han The Thanh (pdfTeX) et al.
There is NO warranty.  Redistribution of this software is
covered by the terms of both the pdfTeX copyright and
the Lesser GNU General Public License.
For more information about these matters, see the file
named COPYING and the pdfTeX source.
Primary author of pdfTeX: Han The Thanh (pdfTeX) et al.
Compiled with libpng 1.6.36; using libpng 1.6.36
Compiled with zlib 1.2.11; using zlib 1.2.11
Compiled with xpdf version 4.01
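latex --version only shows that the engine is installed. To check that the framed and titling packages added with tlmgr are also usable, you can compile a tiny document that loads both. The sketch below writes such a document; the function name write_latex_smoke_test and the file name smoke-test.tex are hypothetical.

```shell
# Hypothetical helper: write a minimal LaTeX document that loads the
# framed and titling packages into the given directory.
write_latex_smoke_test() {
  local dir="$1"
  mkdir -p "$dir"
  # Quoted 'EOF' stops the shell from expanding $ and backslashes
  cat > "$dir/smoke-test.tex" << 'EOF'
\documentclass{article}
\usepackage{framed}
\usepackage{titling}
\title{Smoke test}
\author{MDS}
\begin{document}
\maketitle
\begin{framed}
If this compiles, $e^{i\pi} + 1 = 0$ renders correctly.
\end{framed}
\end{document}
EOF
}
```

For example, the following should produce smoke-test.pdf without errors if the packages are installed:

write_latex_smoke_test ~/latex-test && cd ~/latex-test && pdflatex smoke-test.tex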

Make

We will be using Make to automate our analysis scripts. More on this later! Make is included in Apple's Command Line Tools; to install them, type the following in Terminal:

xcode-select --install

To test whether Make was installed successfully, type the following in Terminal to check the version:

make --version

You should see something like this if it was installed successfully:

GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0
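As a quick end-to-end check, you can run Make on a throwaway Makefile with a single rule. The sketch below writes one; note that recipe lines in a Makefile must begin with a tab character, which printf's \t produces here. The function name write_make_smoke_test and the target name hello are hypothetical.

```shell
# Hypothetical helper: write a one-rule Makefile into the given directory.
# .PHONY marks "hello" as a target that is not a file.
write_make_smoke_test() {
  local dir="$1"
  mkdir -p "$dir"
  printf '.PHONY: hello\nhello:\n\t@echo "Make is working"\n' > "$dir/Makefile"
}
```

For example, the following should print "Make is working":

write_make_smoke_test /tmp/make-test && cd /tmp/make-test && make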

Docker

You will use Docker to create reproducible, sharable and shippable computing environments for your analyses. For this you will need a Docker account. You can sign up for a free one here: https://store.docker.com/signup?next=%2F%3Fref%3Dlogin

After signing up and signing in to the Docker Store, go to https://store.docker.com/editions/community/docker-ce-desktop-mac and click the “Get Docker” button on the right-hand side of the screen. Then follow the installation instructions on that screen.

To test if Docker is working, after installation open the Docker app by clicking on its icon (from Finder, Applications or Launchpad). Next open Terminal and type the following:

docker run hello-world

You should see something like this if you were successful:

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
1b930d010525: Pull complete 
Digest: sha256:451ce787d12369c5df2a32c85e5a03d52cbcef6eb3586dd03075f3034f10adcd
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/
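Once everything above is installed, a short loop can report which of the command-line tools from this guide are reachable from Terminal. This is a sketch that only checks whether each command exists on your PATH (via command -v); it does not check versions, so the --version checks above are still the authoritative test. The function name check_tools is hypothetical, and GUI apps such as RStudio are not covered.

```shell
# Hypothetical helper: print "found" or "missing" for each command name
# given as an argument; exit non-zero if any command is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found:   %s\n' "$tool"
    else
      printf 'missing: %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example:

check_tools brew git python conda jupyter R code latex tlmgr make docker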

Attributions