This is the course handbook for WolfWorks: Reproducible Research with R.


Objectives:


Starting with an R Project

When using R for an extended task (e.g., all analyses relating to a project), it is good practice to keep all data, code, figures and results in a self-contained folder - a working directory. RStudio provides an easy way to do this through its “Projects” interface.

An R Project is essentially just a working directory with an associated .RProj file. When you open a project the working directory will automatically be set to the working directory in which the .RProj file is located.

Benefits of using an R project:


Creating an R Project

  1. Start RStudio.
  2. Under the File menu, click on New Project. Choose Existing Directory
  3. Click the Browse... button and select a convenient location on your system
  4. Click on Create Project
  5. (Optional) Set Preferences to ‘Never’ save workspace in RStudio:

RStudio’s default preferences generally work well, but saving a workspace to a .RData file can be computationally heavy, especially if you are working with larger datasets as this would save all the data that is loaded into R into the .RData file. To turn that off, go to Tools –> Global Options and select the ‘Never’ option for Save workspace to .RData' on exit.


The working directory

Your working directory is the location on your computer where R will default to when reading or writing files. You can check where your current working directory is by typing getwd() into your console and executing.

getwd()
## [1] "/Users/User/repos/Wolfson_Reproducible_R_Website"

You can think of your working directory as a room in which R is sat. When you want R to fetch something from your computer, a data file for example, you have to give R a path to get to there and this path needs to start from where R already is i.e., your working directory. For example, if I wanted to load the “” file from within my “data” folder, I’d need to tell R to first go into “data”, then grab ““. Hence, the relative path would look like this:

"data/filename"

Alternatively, if I want to access a file that is not below my working directory in the file (i.e., I need to go leave the room to find another door), I can use ../ to mean go up one level. For example, if I wanted to access another folder within “repos”, I would need to leave the “Wolfson_Intro_R” room (which is my current working directory) and enter “repos”, so my filepath would be:

"../repos"

These are what we call relative file paths - the file path is given relative to our working directory. The alternative is to provide an absolute file path, which starts at the highest point in your computer system. For example, to access my “” file via an absolute path I would use:

"/Users/User/repos/Wolfson_Intro_R/data/filename"

For the rest of this course we will be using relative file paths from our working directory. Make sure your working directory contains a folder called “Data” with the relevant files in. If you wish to change your working directory to the place where you have this folder, you can use setwd() and provide an absolute file path to where the directory should be.


Organising your working directory

It is best to use a consistent folder structure across all of your projects as this will make it easier to find and file things in the future. In general, you may create sub-directories (folders within your project directory) for scripts/notebooks, raw data, processed data, results and figures. These may vary depending on the exact nature of your project.

Importantly, it is good practice to follow the standard R naming nomenclature when we name our sub-directories. This means avoiding the use of spaces and instead using _ or -.

A typical working directory structure
Sub-directory Description
raw_data Store raw data sets
processed_data Store data generated via processing of R data - intermediate files
results Store final results files
figures Store figures generated via code during the processing and analysis of data
scripts Store R scripts or notebooks, and potentially a separate folder for functions


For this course, we will use a raw_data folder for our raw files, a processed_data folder for our intermediate processed files, and a figures folder for any figures that we generate. We will also have a scripts folder to store any scripts or R Markdown files that we create. The aim of generating this organised file structure is two-fold:

  1. to help ourselves and others to understand what the project is about, what data was used, what analyses were run
  2. to help ourselves and others to repeat the analysis over again and gain the same result

Challenge
If you have not done so already, create an R project in a suitable location on your device. Navigate to the Files tab on the right of the screen and use the New Folder button to add the three required sub-directories to your working directory.