Console
prodigenr::setup_project("~/Desktop/LearningR")
Because it is the first session, take it slowly. Regularly check for learner’s understanding and make use of the stickies/hats to do so.
This session’s overall learning outcome is to:
Specific objectives are to:
Time: ~5 minutes
Before we create a project, we should first define what we mean by “project”. What is a project? In a research context, a project is a set of files that together lead to some type of scientific “output” or “product”, for instance a scientific document. Use data for your output? That’s part of the project. Do any analysis on the data to give some results? Also part of the project. Write a document based on the data and results? Have figures inserted into the output document? These are also part of the project.
More and more how we make a claim in a scientific product is just as important as the output describing the claim. This includes not only the written description of the methods but also the exact steps taken, that is, the code used. So, using a project file organization can help with keeping things self-contained and easier to track and link with your scientific product. Here are some things to consider when working in projects:
These simple steps can be huge steps toward being reproducible in your analysis. And by managing your projects in a reproducible fashion, you’ll not only make your science better and more rigorous, it also makes your life easier too!
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the instructor 👒 🎩
Time: ~8 minutes
This seems so basic, how files are organized on computers. We literally work with files all the time on computers. But consider, how do you organize them? Take some time to discuss and share with your neighbour.
RStudio helps us with managing projects by making use of R Projects. RStudio R Projects make it easy to divide your work projects into a “container”, that have their own working directory (the folder where your analysis occurs), workspace (where all the R activity and output is temporarily saved), history, and documents.
Briefly mention to the learners about challenges that can happen when using OneDrive, Dropbox, or other file synchronization services.
File synchronizing and backup services like OneDrive or Dropbox are very common. Unfortunately, they also can be a major source of frustration and difficulty when working with data analysis projects. This is because of the way they synchronize files, by constantly looking at changes to files and then synchronizing when a change occurs. When doing data analysis, especially as you get more advanced and use reproducible documents and version control systems, changes to files can happen very often and very quickly. This can essentially cause these services to “spasm” and may overwrite the changes that are happening. Whenever possible, always save your work on your computer and not on these services.
There are many ways one could organise a project folder. We’ll be setting up a project folder and file structure using prodigenr. We’ll use RStudio’s “New Project” menu item under:
File -> New Project.. -> New directory -> Scientific Analysis Project using prodigenr
We’ll call the new project LearningR
. Save it on your Desktop/
. See Figure 5.1 for the steps to do it:
You can also type the below function into the Console, but we won’t do that in this session.
Console
prodigenr::setup_project("~/Desktop/LearningR")
After we’ve created a New Project in RStudio, we’ll have a bunch of new files and folders.
LearningR
├── .git/
├── R/
│ └── README.md
├── data/
│ └── README.md
├── data-raw/
│ └── README.md
├── docs/
│ └── README.md
├── .gitignore
├── DESCRIPTION
├── LearningR.Rproj
├── README.md
└── TODO.md
This forces a specific and consistent folder structure to all your work. Think of this like the “Introduction”, “Methods”, “Results”, and “Discussion” sections of your paper. Each project is then like a single scientific report, that contains everything relevant to that specific project. There is a lot of power in something as simple as a consistent file and folder structure. Projects are used to make life easier. Once a project is opened within RStudio the following actions are automatically taken:
Each R project is designated with a .Rproj
file. This file contains information about the file path and various metadata. So, when opening an R project, you need to open it using the .Rproj
file.
A project can be opened by either double clicking on the .Rproj
from your file browser or from the file prompt within R Studio:
File -> Open Project
or
File -> Recent Project.. -> LearningR
Within the project we created, there are several README files in each folder that explain a bit about what should be placed there. Briefly:
docs/
directory, including Quarto files. We will cover this later in Chapter 6.data/
directory or in data-raw/
for the raw data. We’ll explain this more in Chapter 8..R
files (called scripts) and code should be in the R/
directory..git/
directory is where Git stores all the information about the changes made to the project’s file. This is called version control and we’ll cover this in Chapter 7.Briefly go over this section with them to reinforce what they read.
Time: ~10 minutes
Something so simple as the name of a file can have a huge impact on how easy it is for others, including yourself in the future, as well as computers, to navigate and work on your project. Everything we humans do comes down to effective communication. And naming your files is one form of communication, so be effective, clear, and explicit when doing it!
Look over some example names for files below and think about what they communication, how effective they are at it, and what are some good and bad things you might notice.file naming. Which file names are good names and which aren’t? We’ll briefly discuss them afterwards about why some are good names and others are not.
fit models.R
fit-models.R
foo.r
stuff.r
get_data.R
Thesis version 10.docx
thesis.docx
new version of analysis.R
trying.something.here.R
plotting-regression.R
utility_functions.R
code.R
Just like how we have standards as grammar and rules for writing natural languages like English in order to effectively communicate to others and ourselves, there are also standards for how and what to name files. In general, it’s better to follow a commonly used style guide like the tidyverse guide for file naming. Based on their guide, files should generally be lowercase and use either _
or -
instead of a space, though using -
tends to be more commonly used. File names should also generally describe what the file contains. So, using these guidelines, we’ll critically review the file names from above.
While humans have no problem with spaces in file names, computers often do. It becomes especially obvious that computers don’t handle spaces well as you start coding more in R or other programming languages. Unfortunately, computers rarely warn you or encourage you to not use spaces, so it’s up to you to remember to not use them. Instead of spaces, use -
or _
to separate words in a file name.
# Bad: Has a space.
fit models.R
Thesis version 10.docx
new version of analysis.R
# Good: No spaces.
fit-models.R
get_data.R
thesis.docx
plotting-regression.R
utility_functions.R
Communication is about describing something in a way that someone else understands it in the way that you are trying to convey it. The same goes for file names. A good file name should describe what the file contains. This is especially important when you have many files in a project. If you can’t tell what a file contains from its name, you’ll have to open it to find out. This can waste a lot of time for you in the future and for others when they look at your project. So, make sure to name your file to briefly describe what it contains or what it does.
# Bad: Not descriptive.
foo.r
stuff.r
code.R
new version of analysis.R
trying.something.here.R
# Good: Descriptive.
get_data.R
thesis.docx
plotting-regression.R
utility_functions.R
Computers often use .
dots to indicate the file extension or file type. So, when you name files, don’t use dots in the name unless the text after the dot is the file extensions.
# Bad: Using dots
trying.something.here.R
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the instructor 👒 🎩