```{r}
2 + 2
```
[1] 4
This session’s overall learning outcome is to:
Specific objectives are to:
Time: ~10 minutes
Both reproducibility and replicability are cornerstones of rigorous and sound science. Unfortunately, reproducibility in science is still nearly non-existent. However, being reproducible isn’t just about doing better science. It can also mean that:
One common way of being more reproducible when doing data analysis in R is to use Quarto. Quarto is a way of writing R (or Python) code alongside text in a way that allows you to create documents like HTML or Word where the output from R code is directly inserted into the document. For example, if you need to make a figure, you can write R code in the Quarto document so that when you generate a Word document the figure gets inserted automatically.
A main feature of Quarto is that, when you render the output document, Quarto always runs the code in the order it appears in the file and in a fresh, empty environment (a new R session). This means that the output from the code and the results will be reproducible, at least within the document.
If you have heard of R Markdown, Quarto is the next generation version of that.
A Quarto file is a plain text file (like R scripts, `.csv`, or `.txt` files) with the extension `.qmd`, in which you write text using Markdown syntax. Markdown is a markup syntax and formatting tool, like HTML, that allows you to write a document in plain text (like a `.txt` or `.csv` file). The Markdown text can then be converted into a vast range of other document types, e.g. HTML, PDF, Word documents, slides, posters, or websites. In fact, this website is built with Quarto! Check out Quarto’s Gallery to see examples of what you can create.
For now, we’re going to focus on the main reason that Quarto is used: to run R code and insert its output into a document. By using R code in a document, you can easily switch between data analysis and document-writing. This means that:
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the instructor 👒 🎩
Briefly explain what the Command Palette is as you use it here. Talk through this section slowly, especially the YAML header section.
Now, we will create and save a Quarto file. Open the Command Palette with Ctrl-Shift-P, type “quarto”, and select the one that says “Create a new Quarto document”. You can also go through the menu (File -> New File -> Quarto Document...). Either way, a dialog box will then appear. Enter “Reproducible documents” in the title field and your name in the author field. HTML should be automatically selected as the output format. There’s also the option to use the “visual mode”. This mode is great if you are used to working with Word, and you can test it out on your own later. For this course, we will focus on using the normal mode.
After clicking “Create”, the new file will open in RStudio. Before continuing, let’s save this file as `learning.qmd` in the `docs/` folder.
In the newly saved `docs/learning.qmd` file, you will see some text that gives a brief overview of how to use the Quarto file. For now, let’s ignore the text. At the top of the file you will see something that looks a bit like this:
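Assuming the title and author values entered in the dialog box above, the header would probably resemble this sketch. The exact keys RStudio writes can differ between versions, so treat it as an illustration rather than a literal copy:

```markdown
---
title: "Reproducible documents"
author: "Your Name"
format: html
---
```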
This section is called the YAML header, and it contains metadata about the document and the settings for how Quarto should convert it into another document. Most Markdown documents have this YAML header at the top, and it is always surrounded by `---` at the top and bottom of the section.
YAML is a data format that stores data as `key: value` pairs. The keys in this case are `title`, `author`, and `format`. The values are what follow each key (e.g. “Your Name” for `author`). In the case of Quarto, these key-value pairs store the settings that Quarto will use to create the output document. The keys listed above are only a few of the many settings that Quarto has available.
In the case of this YAML header, the Quarto document will generate an HTML file because of the `format: html` setting. You can also create a Word document by changing this to `format: docx`.
It is possible to create PDF documents, though this requires installing a LaTeX distribution such as TinyTeX, which can sometimes be tricky to set up.
So, how do we create an HTML (or Word) document from this document? We do that by “rendering” it. At the top of the pane, near the “Save” button, there is a button with the word “Render”. To render, you can click that button, use the shortcut Ctrl-Shift-K, or use the Palette (Ctrl-Shift-P, then type “render”) from anywhere in the Quarto document. When you render, a bunch of processing messages should appear in a new pane beside the Console, and then the newly created document will open, either in a new window or in the Viewer pane.
You’ve now created an HTML document! Let’s try making a Word document. In the YAML header, change the value of the `format:` key from `html` to `docx`. Then render the document again with the “Render” button, the shortcut Ctrl-Shift-K, or the Palette (Ctrl-Shift-P, then type “render”). A Word document should open if you have a word processor installed. This is the basic approach to creating documents from Quarto.
You’ve already gotten a bit familiar with RStudio in the pre-course tasks, but if you want more details, RStudio has a great cheat sheet on how to use RStudio. The items to know right now are the “Console”, “Files”/“Help”, and “Source” tabs.
Code is written in the “Source” tab, which saves your code and text as a file. You can send selected code from the open file to the Console with Ctrl-Enter (or by clicking the “Run” button). In the “Source” tab (where R scripts and Quarto files are shown), there is a “Document Outline” button (top right, beside the “Run” button) that shows you the headers or “Sections” (more on that later). To open it, you can click the button, use the keybinding Ctrl-Shift-O, use the Palette (Ctrl-Shift-P, then type “outline”), or go through the menu to Code -> Show Document Outline. The Command Palette is a very useful tool to learn, since you can easily access almost all features and options inside RStudio through it. For this reason, we will be using it a lot throughout the course. Open it with Ctrl-Shift-P and then, in the pop-up search bar, type “document outline”. The first item should be the one we want, so hit `Enter` to open the Outline.
If you can’t remember a specific keybinding in RStudio, look it up via the menu item Help -> Keyboard Shortcuts Help.
Being able to insert R code directly into a document is one of the most powerful features of Quarto. This frees you from having to switch between programs when simultaneously writing text and running R code to derive output that you’d then put into your scientific document.
Running and including R code in Quarto is done using “R code chunks”. To insert a chunk, place the cursor where you want it to be. Then use the shortcut Ctrl-Alt-I, the Command Palette (Ctrl-Shift-P, then type “new chunk”), or the menu item Code -> Insert Chunk.
Before we insert the code chunk, let’s delete all the text in your document except for the YAML header (including the dashes surrounding it). Make sure that the YAML key `format:` is set to `html`. Then, place your cursor two lines below the YAML header and insert a code chunk with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”). In the code chunk, type out `2 + 2`. It should look something like:
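Assuming the chunk was inserted with the shortcut, its source in the file would look roughly like this sketch. Three backticks followed by `{r}` open the chunk, and three backticks close it:

```{r}
2 + 2
```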
You can run R code inside the code chunk by placing the cursor on the line and pressing Ctrl-Enter. This sends the code `2 + 2` to the R Console, and the output appears directly below the code chunk in the document. Note that this output is temporary: it is only displayed in RStudio and is not written into the `.qmd` file.
To see how the output is inserted into the HTML document, let’s render the document using Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) to see what happens. The output `4` should appear below the code chunk in the HTML document, something like this:
Let them read it over, then briefly go over the content again. We don’t need to do most of this as a code-along, since we will be using them a lot throughout the course.
Emphasize that, in general, code with `()` means it is a function and that it does an action. Mention that, like everything, there are some situations where that isn’t completely true, but it mostly is.
Time: ~5 minutes
Before moving on, let’s go over a bit about how R works and what the “R session” means. An R session is the way you normally interact with R: you write code in the Console to tell R to do something. Normally, when you open an R session without an R Project, the session assumes you will be working in `~/Desktop` or `~` (your Home folder). But this location usually isn’t where you actually work and where your R code is. You normally work in the folder that has your R scripts, Quarto documents, or data files. R Projects, on the other hand, assume that the R session’s working directory should be where the R Project is, since that is where you have your R scripts and data files.
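As a quick check, base R’s `getwd()` prints the current working directory. In a session opened through an R Project it should point at the project folder. Outside a project it is often your Home folder. The sketch below only reads information and changes nothing:

```r
# Print the folder this R session treats as its working directory
getwd()

# List the files and folders found in that directory
list.files()
```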
In R, everything is an object and every action is a function. A function is also an object, but an object isn’t always a function. To create an object, also called a variable, we use the `<-` assignment operator. So, if we want to create an object called `weight_kilos` and assign it the value `100`, we would write:

```r
weight_kilos <- 100
weight_kilos
```

```
[1] 100
```
The new object now stores the value we assigned to it. We can read the code as: “`weight_kilos` contains the number 100” or “put 100 into the object `weight_kilos`”.
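Once created, an object’s name can stand in for its value. Here is a small sketch building on `weight_kilos`. The `weight_grams` name is just an illustrative choice:

```r
weight_kilos <- 100

# Use the stored value in a calculation: 1 kilogram is 1000 grams
weight_grams <- weight_kilos * 1000
weight_grams

# Assigning a new value with <- replaces the old one
weight_kilos <- 80
weight_kilos
```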
You can name an object in R almost anything you want, but it’s best to stick to a style guide. For example, we will use `snake_case` to name things.
There are also several main “classes” (or types) of objects in R: lists, vectors, matrices, and data frames. For now, we will only cover vectors and data frames. A vector is a sequence of values of the same type, while a data frame is multiple vectors put together as columns. Data frames are the form of data you’d typically see in a spreadsheet. This type of data is called “rectangular data” since it has two dimensions: columns and rows.
So these are vectors, which can be of different types such as character, numeric, or factor:
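Here is a sketch of what such vectors could look like. The object names and values are made up for illustration, and each line uses `c()` to combine values:

```r
# A character vector: text values in quotes
animals <- c("cat", "dog", "fish")

# A numeric vector: numbers
weights <- c(4.5, 30.2, 0.1)

# A factor: categorical values with a fixed set of levels
sizes <- factor(c("small", "large", "small"))

# class() reports the type of each object
class(animals)
class(weights)
class(sizes)
```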
Notice how we use `#` to write comments or notes. R ignores anything written after the “hash” (`#`) and does not run it.
This is what a data frame looks like, using the built-in dataset called `airquality`, which is a data frame object loaded by default when you start R:
```r
head(airquality)
```

```
# A tibble: 6 × 6
  Ozone Solar.R  Wind  Temp Month   Day
  <int>   <int> <dbl> <int> <int> <int>
1    41     190   7.4    67     5     1
2    36     118   8      72     5     2
3    12     149  12.6    74     5     3
4    18     313  11.5    62     5     4
5    NA      NA  14.3    56     5     5
6    28      NA  14.9    66     5     6
```
The `c()` function puts values together, and `head()` prints the first 6 rows. Both `c()` and `head()` are functions since they do an action, and they can be recognized by the `()` at their end. Functions take an input (known as arguments) and give back an output. Each argument is separated by a comma (`,`). Some functions, like `c()`, can take an unlimited number of arguments. Others, like `head()`, can only take a few arguments. In the case of `head()`, the first argument is the object you want to look at, such as a data frame.
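To see arguments in action, here is a small sketch. The `heights` and `people` objects are hypothetical examples, while `c()`, `data.frame()`, and `head()` (with its `n` argument) are base R:

```r
# c() accepts any number of arguments, each separated by a comma
heights <- c(160, 172, 181, 168)

# data.frame() puts equal-length vectors together as columns
people <- data.frame(
  name = c("Ana", "Ben", "Cai", "Dee"),
  height_cm = heights
)

# head()'s first argument is the object; the n argument sets how many rows to show
head(airquality, n = 3)
head(people, n = 2)
```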
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the instructor 👒 🎩
We showed a very basic example of how a code chunk works. Let’s continue by doing something slightly more complicated.
One of the major strengths of R, and of many other programming languages, is that other people can create packages that simplify complex tasks. For example, if you need to use mixed effects models for your data analysis, you can use the lme4 package. Or if you want to create figures, you can use the ggplot2 package. As you experienced in the pre-course tasks, installing packages is easy with `install.packages()`. Whenever we work with R, we very rarely use only the base R functions. We usually use a lot of functions from many other packages, because that is one of the easiest ways to simplify your work! No need to re-invent the wheel 😁
One “meta-package” we will use throughout the course is called tidyverse. So let’s load the package up so we can use the functions from inside it.
First, go to the top of the `docs/learning.qmd` file and create a new code chunk two lines below the YAML header with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”). In the newly created code chunk, type a space and then `setup` right after the `r`. This area after the `r` is where you write the code chunk label. In this case, we labeled the code chunk with the name `setup`. Code chunk labels should not contain `_`, spaces, or `.`. Instead, they should be one word or have words separated by `-`.
Emphasize the warning note below.
If you use a space in your chunk label, an error or warning may not necessarily occur, but there can be unintended side effects that you may not notice. This can cause quite a bit of annoyance and frustration.
You also can’t use duplicate code chunk labels in your document.
A nifty thing about using chunk labels is that you can get an overview of your code chunks using the “Document Outline” with Ctrl-Shift-O or with the Palette (Ctrl-Shift-P, then type “outline”). This only works if you have this option set up in Tools -> Global Options -> R Markdown -> Show in document outline.
The name `setup` also has a special meaning. When you run other code chunks in the document in a new (or restarted) R session, RStudio will know to first look for and run the code in the `setup` chunk. So this is the perfect place to load packages or your data, which we will do in a later session.
The way you load a package and get access to the functions inside is by using the `library()` function. So let’s load the tidyverse package by writing `library(tidyverse)` in the code chunk. It should look like this:
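Assuming the label was typed as described above, the chunk source should resemble this sketch:

```{r setup}
library(tidyverse)
```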
Let’s run this code chunk by placing the cursor over the code and using Ctrl-Enter. After you run the code, you should see some text below the `setup` chunk that might look something like this:
```
── Attaching core tidyverse packages ──────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2
── Conflicts ────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
```
As we continue adding code to the `docs/learning.qmd` file, we’ll also add some text to help organize the document, and you may also find it useful to add notes to yourself as you code along. In order for us to do that, we need to learn a bit about Markdown syntax.
Time: ~8 minutes
Formatting text in an output document like HTML or Word when using Markdown is done with the use of “special” characters or syntax. These special characters control whether the text will be bold, if it will be a header or a list, and so on. Almost every feature you will need to write a scientific document is available in Markdown, but it can’t do everything. If you can’t get Markdown to do what you want, our suggestion would be to try to fit your writing around Markdown, rather than force or fight with Markdown to do something it wasn’t designed to do. You might actually find that the simpler Markdown approach is easier than what you wanted or were thinking of doing, and that you can do quite a lot with Markdown’s capabilities.
You can access a quick guide to formatting features of Markdown using the RStudio menu: Help -> Cheat Sheets -> R Markdown Cheat Sheet. Quarto also has a great guide to the Basics of Markdown.
Creating headers (like chapters or sections) is done by using one or more `#` at the beginning of a line, followed by a space and some text. Headers should always be preceded and followed by an empty line:
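Here is a minimal sketch of three header levels, with placeholder header text. Each header has an empty line before and after it, as described above:

```markdown
# First-level header

Some text under the first header.

## Second-level header

Some text under the second header.

### Third-level header
```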
Lists are created by adding either `-` or `1.`, followed by a space, to the beginning of a line. An empty line must be at the start and end of the list.
For unnumbered lists, it looks like:
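Here is a sketch of an unnumbered list with placeholder items and an empty line before and after it:

```markdown
Some text before the list.

- First item
- Second item
- Third item

Some text after the list.
```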
And numbered lists look like:
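Here is a sketch of a numbered list with placeholder items and an empty line before and after it:

```markdown
Some text before the list.

1. First item
2. Second item
3. Third item

Some text after the list.
```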
| Markdown | Output |
|---|---|
| `**bold**` | **bold** |
| `*italics*` | *italics* |
| `super^script^` | super^script^ |
| `sub~script~` | sub~script~ |
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the instructor 👒 🎩
Time: ~10 minutes.
Get some practice writing Markdown by completing these tasks in the `docs/learning.qmd` file. Use the scaffold below to help guide you, replacing the `___` with the instructions from above:

- Create a new header (`##`) called “About me” below the `setup` code chunk.
- Create another header (`##`) called “Simple code”.
- Below that header, create a code chunk, type `3 * 3`, and run it using Ctrl-Enter.

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the instructor 👒 🎩
- Functions can be recognized by the `()` at the end.
- Use headers (`# Header 1`), text formatting (`**bold**`) and lists (`-`) in the Quarto file to format the text in the output document.