These instructions will walk you through installing the required Data Science software stack for the UBC Master of Data Science program. Before starting, ensure that your laptop meets our program requirements:

  • runs Windows 10 Professional, Enterprise or Education (Build 15063 or later)
  • can connect to networks via a wireless connection (and preferably also a wired connection)
  • has at least 40 GB disk space available
  • has at least 4 GB of RAM
  • uses a 64-bit CPU
  • is 4 years old or newer at the start of the program
  • uses English as the default language
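
Once Python is installed (see the Python section of these instructions), a few of the requirements above can be checked from a script. The following is a minimal sketch using only the Python standard library. The function names are our own, a 64-bit interpreter is used as a stand-in for the 64-bit CPU check (a 64-bit Python only runs on a 64-bit CPU, but a 32-bit Python does not prove the opposite), and RAM is not checked because the standard library offers no portable way to read it.

```python
import platform
import shutil
import sys

MIN_FREE_GB = 40  # taken from the requirements list above


def free_disk_gb(path="."):
    """Return the free space, in GB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def is_64bit_python():
    """True when the running Python interpreter is a 64-bit build."""
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print(f"Operating system: {platform.system()} {platform.release()}")
    print(f"64-bit Python:    {is_64bit_python()}")
    free = free_disk_gb()
    status = "OK" if free >= MIN_FREE_GB else "below the 40 GB requirement"
    print(f"Free disk space:  {free:.1f} GB ({status})")
```

By default the script reports on the drive that holds the current working directory; pass another path to free_disk_gb to check a different drive.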

GitHub

In MDS we will use GitHub.com as well as an Enterprise version of GitHub hosted here at UBC, GitHub.ubc.ca. Please follow the set-up instructions for both below.

GitHub.com

If you do not yet have one, sign up for a free account at https://github.com/ .

GitHub.ubc.ca

For us to add you to the MDS organization on GitHub.ubc.ca, you first need to log in using your CWL. Visit GitHub.ubc.ca to do this.

This step is required for:

  • being able to store your work
  • all homework submission and grading
  • working collaboratively

Git

We will be using the command line version of Git as well as Git through RStudio.

Go to https://git-scm.com/download/win and a download should automatically start (if not, follow the instructions on that page). After the download has finished, run the installer selecting the following options:

  • On the Select Components page, check “On the Desktop” under “Additional icons”
  • On the Choosing the default editor used by Git page, select “Use the Nano editor by default” from the drop-down menu
  • For all other pages, use the default options

After installation, test if you were successful by opening the Git Bash program (if you followed the install instructions above, you should now have an icon on your Desktop). Below is a picture of the Git Bash icon on the Desktop and an opened instance of the Git Bash program:

Python

We will be using Python for a large part of the program, including many popular third-party Python libraries for scientific computing. Anaconda is an easy-to-install distribution of Python and most of these libraries (as well as Jupyter notebooks, one of the development environments we will be using). We strongly recommend that you use Anaconda for this program. If you insist on using your own Python setup instead of Anaconda, we will not be able to provide technical support with installation or later issues. For this program we are using Python 3, not Python 2, so please choose the Anaconda version that includes Python 3.7.

Head to https://www.anaconda.com/download/#windows and download the Anaconda version for Windows with Python 3.7. After the download has finished, run the installer selecting the following options:

  • On the Advanced Installation Options page, check both boxes (see image below)
  • For all other pages, use the default options

After installation, open PowerShell (select the Start button, type powershell, and click Windows PowerShell from the list) and type the following:

python

You should see something like this (in the first line) if you were successful:

Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32

If instead you see Python 2.7.X, you installed the wrong version. Follow these instructions to delete this installation and try the installation again, selecting Python 3.7.
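
The same check can be made from inside Python. This is a small sketch, not part of the official instructions: the function name is hypothetical, and treating 3.7 as the minimum acceptable Python 3 version is our assumption based on the version named above.

```python
import sys


def is_supported(version_info=sys.version_info):
    """True for Python 3.7 or any later Python 3 release."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor >= 7


if __name__ == "__main__":
    # sys.version is the same text that the interpreter prints on start-up.
    print(sys.version)
    print("Version is suitable for the program:", is_supported())
```

Save this as a file and run it with python, or paste the lines into the interactive prompt that opens after typing python.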

To see if Jupyter was successfully installed in the Anaconda Python distribution, restart PowerShell and type the following:

jupyter lab

A browser should have launched and you should see a page that looks like the screenshot below.

If you have already installed Anaconda at some point in the past, we recommend that you update to the latest Anaconda version by updating conda. In PowerShell, type the following:

conda update conda
conda update anaconda

R, IRkernel and RStudio

We will be using R, another programming language, a lot in the program. We will use R both in Jupyter notebooks and in RStudio. To have R work in Jupyter notebooks we will also have to install the IR kernel.

R

Go to https://cran.r-project.org/bin/windows/base/ and download the latest version of R for Windows (the link should look something like “Download R 3.6.1 for Windows”). Open the file and follow the installer instructions.

Note: Although it is possible to install R through Anaconda, we highly recommend not doing so. In case you have already installed R using Anaconda you can remove it by executing conda uninstall r-base.

RStudio

Choose and download the Windows version of RStudio from https://www.rstudio.com/products/rstudio/download/#download. Open the file and follow the installer instructions.

To see if you were successful, try opening RStudio by clicking on its icon. It should open and look something like the picture below:

Rtools

Windows users will also need to install Rtools, which will allow you to use external libraries. Go to http://cran.r-project.org/bin/windows/Rtools/ and download the latest version (e.g., Rtools35.exe). After the download has finished, run the installer selecting the following options:

  • On the Select Additional Tasks page, check the Add rtools to system PATH option (see image below)
  • For all other pages, use the default options

To see if you were successful, open RStudio and type install.packages("xtable", type="source") into the console panel. If this install command runs successfully, Rtools should be installed!

IR kernel

Open RStudio and type the following commands into the console panel:

install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools', 'uuid', 'digest', 'tidyverse'))
devtools::install_github('IRkernel/IRkernel')
IRkernel::installspec()

To see if you were successful, try running Jupyter Lab and check whether you have a working R kernel. To launch Jupyter Lab, type the following in Windows PowerShell:

jupyter lab

A browser should have launched and you should see a page that looks like the screenshot below. Now click on the “R” notebook (circled in red on the screenshot below) to launch a notebook with an R kernel.

Sometimes a kernel loads, but doesn’t work as expected. To test whether your installation works, type library(tidyverse) in a code cell and click on the run button to run the cell. If your R kernel works, you should see something like the image below:
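
You can also list the kernels that Jupyter knows about without opening a browser. The sketch below calls jupyter kernelspec list --json and parses the result; the helper names are our own, and it assumes that IRkernel::installspec() registered the R kernel under its default name, ir. A registered kernel is not necessarily a working one, so the library(tidyverse) test above is still worth running.

```python
import json
import shutil
import subprocess


def kernel_names(kernelspec_json):
    """Extract the kernel names from `jupyter kernelspec list --json` output."""
    return sorted(json.loads(kernelspec_json)["kernelspecs"])


def installed_kernels():
    """Return the kernels registered with Jupyter, or [] if they cannot be listed."""
    executable = shutil.which("jupyter")
    if executable is None:
        return []  # Jupyter is not on PATH
    result = subprocess.run(
        [executable, "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    try:
        return kernel_names(result.stdout)
    except (ValueError, KeyError):
        return []  # output was not the JSON we expected


if __name__ == "__main__":
    names = installed_kernels()
    print("Registered kernels:", ", ".join(names) or "none found")
    print("R kernel registered:", "ir" in names)
```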

Rscript

To be able to call R scripts from PowerShell or Git Bash, you need to be able to execute the Rscript program from there. To make this possible, you need to add the file path of the directory/folder that houses the Rscript program (i.e., the location where this program lives on your computer) to the Windows PATH system variable. You can modify the PATH system variable using the System utility in the Control Panel on Windows. We outline below how to do this:

  • Hit the Windows key to get to the Windows search bar and type: edit environ
  • This should bring up a menu item that will take you to where you can edit the Environment Variables. It should look something like this:

  • Double click on the “Path” entry from the top box. This should bring you to a new window titled “Edit Environment Variables”.

  • Here, click on an empty line and then click Browse. This will open a dialogue box within which you can navigate to the folder where Rscript is installed (usually C:\Program Files\R\R-3.6.1\bin) and click OK. Close all remaining windows by clicking OK.

After following the instructions above, restart PowerShell and type the following:

Rscript --version

You should see something like this if you were successful:

R scripting front-end version 3.6.1 (2019-07-05)
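
If Rscript is not found, it helps to see what Windows actually has on the PATH. The following sketch, with helper names of our own choosing, looks up a program the same way the shell would and otherwise prints every PATH entry so you can spot a missing or misspelled folder.

```python
import os
import shutil


def path_entries(path_value=None):
    """Split a PATH string into its non-empty directory entries."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [entry for entry in path_value.split(os.pathsep) if entry]


def find_program(name):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_program("Rscript")
    if location:
        print(f"Rscript found at: {location}")
    else:
        print("Rscript is not on PATH. Current entries:")
        for entry in path_entries():
            print(f"  {entry}")
```

Remember that a PowerShell window opened before you edited the PATH still holds the old value, so run the script from a freshly started PowerShell.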

PostgreSQL

We will be using PostgreSQL as our database management system. You can download it from here. Follow the instructions for the installation. On the password page, type whatever password you want, but make sure you’ll remember it later. For all the other options, use the default. To test if you were successful, run the SQL Shell program.
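
As a quick additional check, you can test whether a server is listening on your machine. This is only a sketch: the function name is hypothetical, it assumes you kept PostgreSQL's default port of 5432 during installation, and an open port shows that something is accepting connections there, not that it is PostgreSQL or that your password works.

```python
import socket


def port_is_open(host="localhost", port=5432, timeout=2.0):
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("A server is listening on localhost:5432.")
    else:
        print("Nothing is listening on localhost:5432.")
```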

Visual Studio Code

We need a text editor to be able to write complete applications. One is available through Jupyter, but sometimes it is helpful to have a standalone text editor. For this we will be using the open-source text editor Visual Studio Code (VS Code). You can download VS Code at https://code.visualstudio.com/download. At “Select Additional Tasks”, make sure that “Add to path” is selected (we also like to select the other three items; see the figure below).

You can test that VS Code is installed and can be opened from PowerShell by running the following command:

code --version

You should see something like this if you were successful:

1.36.1
2213894ea0415ee8c85c5eea0d0ff81ecc191529
x64

LaTeX

Download the installer from http://mirror.ctan.org/systems/texlive/tlnet/install-tl-windows.exe. After downloading, run the install-tl-windows.exe file and follow the installation instructions, choosing the default options.

Note: if a screen titled “Windows protected your PC” appears, click on the “More info” link and then the “Run anyway” button.

After installation, restart PowerShell and type the following to ask for the version:

latex --version

You should see something like this if you were successful:

pdfTeX 3.14159265-2.6-1.40.19 (TeX Live 2018/W32TeX)
kpathsea version 6.3.0
Copyright 2018 Han The Thanh (pdfTeX) et al.
There is NO warranty.  Redistribution of this software is
covered by the terms of both the pdfTeX copyright and
the Lesser GNU General Public License.
For more information about these matters, see the file
named COPYING and the pdfTeX source.
Primary author of pdfTeX: Han The Thanh (pdfTeX) et al.
Compiled with libpng 1.6.34; using libpng 1.6.34
Compiled with zlib 1.2.11; using zlib 1.2.11
Compiled with xpdf version 4.00

Make

We will be using Make to automate our analysis scripts. More on this later!

Go to https://github.com/gnu-mcu-eclipse/windows-build-tools/releases and download gnu-mcu-eclipse-windows-build-tools-2.12-20190422-1053-win64.zip.

Unpack the archive and copy it into the %userprofile%\AppData\Roaming\GNU MCU Eclipse folder (for example, C:\Users\user_name\AppData\Roaming\GNU MCU Eclipse).

Next, as you did for Rscript, you need to add the folder where make.exe is located to the PATH variable. The make.exe file is located in C:\Users\user_name\AppData\Roaming\GNU MCU Eclipse\Build Tools\2.12-20190422-1053\bin.

After installation, restart PowerShell and type the following to ask for the version:

make --version

You should see something like this if you were successful:

GNU Make 4.2.1
Built for i686-pc-msys
Copyright (C) 1988-2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Docker

You will use Docker to create reproducible, sharable and shippable computing environments for your analyses. For this you will need a Docker account. You can sign up for a free one here.

After signing up and signing in to the Docker Store, go here and click on the “Get Docker” button on the right-hand side of the screen. Then follow the installation instructions on that screen.

After installation, restart PowerShell and type the following to test that Docker can download and run an image:

docker run hello-world

You should see something like this if you were successful:

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
c04b14da8d14: Pull complete
Digest: sha256:0256e8a36e2070f7bf2d0b0763dbabdd67798512411de4cdcf9431a1feb60fd9
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
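
With everything installed, the individual version checks in these instructions can be repeated in one pass. The script below is a convenience sketch rather than an official test: the tool list and function name are our own, and it only confirms that each command-line program can be found on the PATH and answers --version.

```python
import shutil
import subprocess

# Command-line programs installed in the sections above.
TOOLS = ["git", "python", "jupyter", "Rscript", "code", "latex", "make", "docker"]


def first_version_line(tool):
    """Return the first line printed by `<tool> --version`, or None if it cannot be run."""
    executable = shutil.which(tool)
    if executable is None:
        return None  # not on PATH
    try:
        result = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some programs print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    for tool in TOOLS:
        version = first_version_line(tool)
        print(f"{tool:<8} {version if version else 'NOT FOUND'}")
```

Any tool reported as NOT FOUND most likely needs its folder added to the PATH, or PowerShell restarted, as described in the relevant section.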

Attributions