5  Setting up an R project

Because it is the first session, take it slowly. Regularly check for learner’s understanding and make use of the stickies/hats to do so.

5.1 Learning objectives

This session’s overall learning outcome is to:

  1. Explain why it’s important where file and folders are stored, how they are named, and how they are organized from a reproducibility and ease of use perspective. Then use R packages and tools in RStudio to help create and manage these file and folders as a project.

Specific objectives are to:

  1. Describe why the contents of a project, like files, should be self-contained within a single folder and sub-folders, and how that helps with reproducibility and helping you keep organized.
  2. Use built-in tools in RStudio to make it easier to manage R projects.
  3. Explain why naming files is the first step to effectively communicating the contents of the file and describe a commonly used style guide for file naming.

5.2 📖 Reading task: What is a project and why use it?

Time: ~5 minutes

Before we create a project, we should first define what we mean by “project”. What is a project? In a research context, a project is a set of files that together lead to some type of scientific “output” or “product”, for instance a scientific document. Use data for your output? That’s part of the project. Do any analysis on the data to give some results? Also part of the project. Write a document based on the data and results? Have figures inserted into the output document? These are also part of the project.

More and more how we make a claim in a scientific product is just as important as the output describing the claim. This includes not only the written description of the methods but also the exact steps taken, that is, the code used. So, using a project file organization can help with keeping things self-contained and easier to track and link with your scientific product. Here are some things to consider when working in projects:

  • Organise all files necessary for the scientific product in one folder (also called “directory”) along with sub-folders so it is more self-contained (doesn’t rely on other components in your computer).
  • Use a common and consistent folder and file structure for your projects.
  • Use version control to track changes to your files.
  • Make raw data “read-only” (don’t edit it directly) and use code to show what was done.
  • Whenever possible, use code to create output (figures, tables) rather than manually creating or editing them.
  • Think of your code and project like you do your scientific document or thesis: that other people will eventually look at it and review it, and that it will likely also be published or archived online.

These simple steps can be huge steps toward being reproducible in your analysis. And by managing your projects in a reproducible fashion, you’ll not only make your science better and more rigorous, it also makes your life easier too!

Sticky/hat up!

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the instructor 👒 🎩

5.3 💬 Discussion activity: How do you organise your files and projects?

Time: ~8 minutes

This seems so basic, how files are organized on computers. We literally work with files all the time on computers. But consider, how do you organize them? Take some time to discuss and share with your neighbour.

  1. Take 1 minute to think to yourself.
  2. Take 5 minutes to discuss and share with your neighbour.
  3. For the remaining time, we will all share our thoughts with the group.

5.4 RStudio and R Projects

RStudio helps us with managing projects by making use of R Projects. RStudio R Projects make it easy to divide your work projects into a “container”, that have their own working directory (the folder where your analysis occurs), workspace (where all the R activity and output is temporarily saved), history, and documents.

Briefly mention to the learners about challenges that can happen when using OneDrive, Dropbox, or other file synchronization services.

Warning

File synchronizing and backup services like OneDrive or Dropbox are very common. Unfortunately, they also can be a major source of frustration and difficulty when working with data analysis projects. This is because of the way they synchronize files, by constantly looking at changes to files and then synchronizing when a change occurs. When doing data analysis, especially as you get more advanced and use reproducible documents and version control systems, changes to files can happen very often and very quickly. This can essentially cause these services to “spasm” and may overwrite the changes that are happening. Whenever possible, always save your work on your computer and not on these services.

There are many ways one could organise a project folder. We’ll be setting up a project folder and file structure using prodigenr. We’ll use RStudio’s “New Project” menu item under:

File -> New Project.. -> New directory -> Scientific Analysis Project using prodigenr

We’ll call the new project LearningR. Save it on your Desktop/. See Figure 5.1 for the steps to do it:

Figure 5.1: Creating a new analysis project in RStudio.
Tip

You can also type the below function into the Console, but we won’t do that in this session.

Console
prodigenr::setup_project("~/Desktop/LearningR")

After we’ve created a New Project in RStudio, we’ll have a bunch of new files and folders.

LearningR
├── .git/
├── R/
│   └── README.md
├── data/
│   └── README.md
├── data-raw/
│   └── README.md
├── docs/
│   └── README.md
├── .gitignore
├── DESCRIPTION
├── LearningR.Rproj
├── README.md
└── TODO.md

This forces a specific and consistent folder structure to all your work. Think of this like the “Introduction”, “Methods”, “Results”, and “Discussion” sections of your paper. Each project is then like a single scientific report, that contains everything relevant to that specific project. There is a lot of power in something as simple as a consistent file and folder structure. Projects are used to make life easier. Once a project is opened within RStudio the following actions are automatically taken:

  • A new R session (process) is started.
  • The R session’s working directory is set to the project directory.
  • RStudio project options are loaded.

Each R project is designated with a .Rproj file. This file contains information about the file path and various metadata. So, when opening an R project, you need to open it using the .Rproj file.

A project can be opened by either double clicking on the .Rproj from your file browser or from the file prompt within R Studio:

File -> Open Project

or

File -> Recent Project.. -> LearningR

Within the project we created, there are several README files in each folder that explain a bit about what should be placed there. Briefly:

  1. Documents like reports, theses, conference abstracts, and exploration type documents should be put in the docs/ directory, including Quarto files. We will cover this later in Chapter 6.
  2. Data, raw data, and metadata should be in either the data/ directory or in data-raw/ for the raw data. We’ll explain this more in Chapter 8.
  3. All .R files (called scripts) and code should be in the R/ directory.
  4. The .git/ directory is where Git stores all the information about the changes made to the project’s file. This is called version control and we’ll cover this in Chapter 7.

5.5 📖 Reading task: What’s in a (file) name?

Briefly go over this section with them to reinforce what they read.

Time: ~10 minutes

Something so simple as the name of a file can have a huge impact on how easy it is for others, including yourself in the future, as well as computers, to navigate and work on your project. Everything we humans do comes down to effective communication. And naming your files is one form of communication, so be effective, clear, and explicit when doing it!

Look over some example names for files below and think about what they communication, how effective they are at it, and what are some good and bad things you might notice.file naming. Which file names are good names and which aren’t? We’ll briefly discuss them afterwards about why some are good names and others are not.

fit models.R
fit-models.R
foo.r
stuff.r
get_data.R
Thesis version 10.docx
thesis.docx
new version of analysis.R
trying.something.here.R
plotting-regression.R
utility_functions.R
code.R

Just like how we have standards as grammar and rules for writing natural languages like English in order to effectively communicate to others and ourselves, there are also standards for how and what to name files. In general, it’s better to follow a commonly used style guide like the tidyverse guide for file naming. Based on their guide, files should generally be lowercase and use either _ or - instead of a space, though using - tends to be more commonly used. File names should also generally describe what the file contains. So, using these guidelines, we’ll critically review the file names from above.

Spacing

While humans have no problem with spaces in file names, computers often do. It becomes especially obvious that computers don’t handle spaces well as you start coding more in R or other programming languages. Unfortunately, computers rarely warn you or encourage you to not use spaces, so it’s up to you to remember to not use them. Instead of spaces, use - or _ to separate words in a file name.

# Bad: Has a space.
fit models.R
Thesis version 10.docx
new version of analysis.R

# Good: No spaces.
fit-models.R
get_data.R
thesis.docx
plotting-regression.R
utility_functions.R

Descriptive

Communication is about describing something in a way that someone else understands it in the way that you are trying to convey it. The same goes for file names. A good file name should describe what the file contains. This is especially important when you have many files in a project. If you can’t tell what a file contains from its name, you’ll have to open it to find out. This can waste a lot of time for you in the future and for others when they look at your project. So, make sure to name your file to briefly describe what it contains or what it does.

# Bad: Not descriptive.
foo.r
stuff.r
code.R
new version of analysis.R
trying.something.here.R

# Good: Descriptive.
get_data.R
thesis.docx
plotting-regression.R
utility_functions.R

Dots

Computers often use . dots to indicate the file extension or file type. So, when you name files, don’t use dots in the name unless the text after the dot is the file extensions.

# Bad: Using dots
trying.something.here.R
Sticky/hat up!

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the instructor 👒 🎩

5.6 Summary

  • Use prodigenr to assist with setting up a new project, with standard files and folders to begin working on a data analysis project.
  • Use R Projects in RStudio to manage your project and make it easier to work with R in it.
  • How you name your files is the first step to communicating the files’ contents with yourself in the future and with others. Use the tidyverse style guide for file naming.