Virtual Environments: R

Learning outcomes

  1. Understand what is a computational environment and how can ensure the reproducibility of a project
  2. Differenciate Python, Anaconda, MiniConda, Conda and pip
  3. Manage packages and environments in Python using Conda
  4. Manage packages and environments in R using renv

Platform in focus R, RStudio

R environments

In R, environments are managed by {renv}, which works with similar principles as conda, and other virtual environment managers, but the commands are different. To see which commands are used in {renv}, you can visit the project website: https://rstudio.github.io/renv/articles/renv.html.

Briefly,

  • renv::init(): create a new env in the current directory/project
  • renv::snapshot(): save/export the environment to a file (renv.lock), and installing and removing packages are done as usual via the install.packages() and remove.packages() commands.
  • renv::restore(): updates current environment from the renv.lock file

{renv} assumes and enforces the “one project one environment” mantra of virtual environments. Unlike conda that lets you “activate” an environment and let you move across different projects, {renv} prefers it when all your work is contained in a single project.

Note

For MDS, this might mean you may need a separate {renv} file for each lab assignment.

Activity 1

Which of the following commands is used to initialize a new renv environment in an R project?

A. `renv::create()`
B. `renv::activate()`
C. `renv::init()`
D. `renv::start()`

One of the other distinguishing factors between R and Python is CRAN (the package repository for R), guarantees that on any given calendar day, all the packages on CRAN are installable.

An example R Project

Here we’ll go through an example of using {renv} in the context of an RStudio project. Unlike conda environments, you don’t necessarily, “activate” and “deactivate” it manually. One of the nice things is {renv} works naturally within an RStudio project.

The New Project creation wizard in R Studio has an option to enable the folder as a git repository, and also to set it up with {renv}.

This does the same thing as running renv::init() in the current project.

Environment Files

When you either create a new project with renv enabled or run renv::init() in the current working directory (getwd()), a few new files will be created

  1. .Rprofile in the current project directory
  2. renv/ directory
  3. renv.lock file

Before we talk about which each file and directory does, now when you start a new R session or re-open your RStudio project, you will see a line that will read something like:

- Project '~/Desktop/my_r_project' loaded. [renv 1.0.7]

How does R know that there is a project renv? It’s because of the .Rprofile file. This is a special file that R will read at the start of every session. This file exists in 3 places, it will search for them in the following order, and once it finds 1 of them, it will not look for any more.

  1. The current project or working directory
  2. The user home directory
  3. The system R installation

There is a similar file called .Renviron that only stores key-value environment variables. You can read more about these initialization files here.

The {renv} .Rprofile will add a single line:

source("renv/activate.R")

This is the line that “activates” the renv environment for you automatically, assuming you are in the correct working directory.

All the environment files and packages are installed in the renv/ directory. This directory has its own .gitignore file that will ignore the correct things for you in that directory, so you do not need to worry about checking in your entire set of packages.

Finally, the renv.lock file is the file that contains all the packages used in the repository, its version, and where it was installed from (e.g., CRAN or GitHub).

Snapshot Packages

When you install and use R packages in your project, you will occasionally run renv::snapshot(). This will update the renv.lock file with all the packages and dependencies for you. Do not forget to add, commit, and push this file to your remote repository.

Restore Packages

As you work on a project with other people and packages get updated, you will occasionally run renv::restore() to install any missing packages on your local r environment. Occasionally renv may tell you things are out of sync and will prompt you to run renv::status() or renv::restore().

Takeaway

{renv} is not the same as a conda environment where you can activate an environment, and move anywhere in your filesystem to run code. Many virtual environment systems actually assume you have a single environment for each project/directory. Sometimes IDES allow you do override the currently activated environment, but it lowers the reproducibility of your projects if you do that.

Finally, the renv.lock file does track every package and dependency you use in your project, which is a bit different from the recommendation we gave in the conda environment section. To just track individual packages and not the entire snapshot, you will learn about the DESCRIPTION file used for packaging later on in the program. However, we can get away with this because CRAN does guarantee all packages to be installable on any given calendar day.