```{r}
2 + 2
```
[1] 4
Session objectives:
Both reproducibility and replicability are cornerstones for doing rigorous and sound science. As we’ve learned, reproducibility in science is lacking, which this course aims to address. However, being reproducible isn’t just about doing better science. It can also mean that:
Hopefully by the end of this session, you’ll want to start using R Markdown files for writing your manuscripts and other technical documents. Believe us, you can save so much time and make your work more reproducible once you learn how to incorporate text with R code. Plus, you can create some very aesthetically appealing reports, and they are much easier to produce than they would be in Word.
R Markdown is a file format (a plain text format like R scripts or `.csv` files) that allows you to be more reproducible in your analysis and more productive in your work. R Markdown is an extension of Markdown that integrates R code with written text (formatted as Markdown).
Quarto is a next-generation version of R Markdown. If you’ve been using a fairly recent version of RStudio, chances are you are already using it without realizing it. That’s because Quarto uses the same Markdown syntax as R Markdown. The main differences are that with Quarto you can create more types of output documents (like books, websites, and slides), you have more options for customization, and it’s generally easier to learn and use than R Markdown.
So, what is Markdown? It is a markup syntax and formatting tool, like HTML, that allows you to write a document in plain text. That text can then be converted into a vast range of other document types, e.g. HTML, PDF, Word documents, slides, posters, or websites. In fact, this website is built with Quarto! The Markdown used in Quarto is based on Pandoc’s Markdown (“pan” means all and “doc” means document, so “all documents”). Pandoc is a very powerful, popular, and well-maintained software tool for document conversion. You can use Quarto to do any number of things. Check out Quarto’s Gallery to see examples of what you can create. Just a few example document types could be:
For now, we’re going to focus on the main reason that Quarto is used: to incorporate R code and output into a single document. By using R code in a document, you can have seamless integration between data analysis and document-writing.
Why would you use this? There are many reasons, with some of them being:
Now, we will create and save a Quarto file. We’ll use the Command Palette by pressing Ctrl-Shift-P, typing “quarto”, and selecting the option that says “Create a new Quarto document”. You can also use the menu by going to `File -> New File -> Quarto Document ...`. A dialog box will then appear. Enter “Reproducible documents” in the title field and your name in the author field. HTML should be automatically selected as the output format. There’s also the option to use the “visual mode”. This mode is great if you are used to working with Word, and you can test it out on your own later. For this course, we will focus on using the normal mode.
After clicking “Create”, the new file will open in RStudio. Before continuing, let’s save this file as `learning.qmd` in the `doc/` folder.
In the newly saved `doc/learning.qmd` file, you will see some text that gives a brief overview of how to use the Quarto file. For now, let’s ignore the text. At the top of the file you will see something that looks a bit like this:
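Assuming you entered “Reproducible documents” as the title, the header will look roughly like the sketch below. “Your Name” stands in for the name you typed, and depending on your RStudio version there may be a few extra keys:

```markdown
---
title: "Reproducible documents"
author: "Your Name"
format: html
---
```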
This section is called the YAML header. It contains the metadata about the document and the settings for how to process it into another document. Quarto and R Markdown documents usually have this YAML header at the top of the document, and it is always surrounded by `---` on the top and bottom of the section. YAML is a data format that stores data as `key: value` pairs. The keys in this case are `title`, `author`, and `format`. The values are those that follow the key (e.g. “Your Name” for `author`). In the case of Quarto, these `key: value` pairs store the settings that Quarto will use to create the output document. The keys listed above are only a few of the many settings that Quarto has available.
In the case of this YAML header, the Quarto document will generate an HTML file because of the `format: html` setting. You can also create a Word document by changing this to `format: docx`. You can also create PDF documents, though this requires installing LaTeX through the R package tinytex, which can sometimes be complicated to install. We will only cover HTML and Word documents in this course.
So, how do we create an HTML (or Word) document from this document? We do that by “rendering” it. At the top of the pane near the “Save” button, there is a button with the word “Render” (in an R Markdown document, it is called “Knit” and has a yarn symbol beside it). To render, either click that button, use the shortcut Ctrl-Shift-K, or use the Palette (Ctrl-Shift-P, then type “render”) from anywhere in the Quarto document.
When you click the “Render” button, a bunch of processing messages should appear in a new pane beside the Console, followed by a new window popping up with the newly created document. Alternatively, the HTML document may pop up in the “Viewer” pane.
You’ve now created an HTML document! Let’s try making a Word document. Change the value of the YAML key `format:` from `html` to `docx`. Then render the document again with the “Render” button, with the keybinding Ctrl-Shift-K, or with the Palette (Ctrl-Shift-P, then type “render”). A Word document should open up. This is the basic approach to creating documents from R Markdown or Quarto. Before continuing, let’s add and commit the newly created file into the Git history with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”).
Being able to insert R code directly into a document is one of the most powerful features of Quarto. This frees you from having to switch between programs when simultaneously writing text and running R code to derive output that you’d then put into your manuscript.
Running and including R code in Quarto is done using “R code chunks”. You insert a chunk by placing the cursor where you want it to be, then using the shortcut Ctrl-Alt-I or the Palette (Ctrl-Shift-P, then type “new chunk” and select the option to insert a new code chunk). You can also use the menu item `Code -> Insert Chunk` to insert a new code chunk.
Before we insert the code chunk, let’s delete all the text in your document, with the exception of the YAML header (including the dashes surrounding it). Make sure that the YAML key `format:` is set to `html`. Then, place your cursor two lines below the YAML header and insert a code chunk with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”). In the code chunk, type out `2 + 2`. It should look something like:
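A sketch of what the file should now contain: the YAML header (with whatever title and author you entered), an empty line, and then the new chunk:

````markdown
---
title: "Reproducible documents"
author: "Your Name"
format: html
---

```{r}
2 + 2
```
````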
You can run R code inside the code chunk the same way as you would in an R script. Pressing Ctrl-Enter on the line will send the code `2 + 2` to the console, with the output appearing directly below the code chunk in the document. Note that this output is temporary.
To make sure the output is inserted into the HTML document, render the document using Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) and see what happens in the resulting HTML document. The output `4` should appear below the code chunk in the HTML document, something like this:
2 + 2
[1] 4
This is a very simple example of how code chunks work, though things are usually more complicated than this. Normally, we have to load R packages to use in our subsequent code, and this is no different in a Quarto document. We will set this up together now.
Create a new code chunk with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”) and then type `setup` right after the `r`, separated by a space. The area where you just typed `setup` is for code chunk labels, so in this case we labelled the code chunk `setup`. Code chunk labels should not contain `_`, spaces, or `.`. Instead, they should be one word or have words separated by `-`. An error may not necessarily occur if you don’t follow this rule, but there can be unintended side effects that you may not notice and that R will likely not tell you about, which can cause quite a bit of annoyance and frustration. Note that you can’t use duplicate code chunk labels in your document.
A nifty thing about using chunk labels is that you can get an overview of your code chunks using the “Document Outline” with Ctrl-Shift-O or with the Palette (Ctrl-Shift-P, then type “outline”), but only if you have this option turned on in `Tools -> Global Options -> R Markdown -> Show in document outline`.
The name `setup` also has a special meaning. When you run other code chunks interactively in RStudio, it will first look for and run the code in the `setup` chunk. Therefore, this is a good place to put your `library()` calls or other setup functions. Let’s add code to the `setup` chunk that loads the packages and the dataset we have been using:
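A minimal sketch of the `setup` chunk. It assumes that the packages come from the tidyverse and that `nhanes_small` was saved in an earlier session as `data/nhanes_small.csv` in your project. That file path is an assumption, so use whichever path and loading code you actually used before:

````markdown
```{r setup}
library(tidyverse)
# Assumed location of the dataset; replace with your own path if it differs
nhanes_small <- read_csv(here::here("data/nhanes_small.csv"))
```
````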
Let’s insert another code chunk below this one with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”), and simply put `nhanes_small` in the chunk:
# A tibble: 10,000 × 8
     age sex      bmi diabetes phys_active bp_sys_ave bp_dia_ave
   <dbl> <chr>  <dbl> <chr>    <chr>            <dbl>      <dbl>
 1    34 male    32.2 No       No                 113         85
 2    34 male    32.2 No       No                 113         85
 3    34 male    32.2 No       No                 113         85
 4     4 male    15.3 No       <NA>                NA         NA
 5    49 female  30.6 No       No                 112         75
 6     9 male    16.8 No       <NA>                86         47
 7     8 male    20.6 No       <NA>               107         37
 8    45 female  27.2 No       Yes                118         64
 9    45 female  27.2 No       Yes                118         64
10    45 female  27.2 No       Yes                118         64
# ℹ 9,990 more rows
# ℹ 1 more variable: education <chr>
Let’s run this code as we normally would in a script file, by placing the cursor over the code and using the shortcut Ctrl-Enter. We can also render the document with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) and see what it looks like. When the HTML document opens, you should see some text below the `setup` chunk that might look something like this:
── Attaching packages ──────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   1.0.0
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.5.0
✔ readr   2.1.3      ✔ forcats 0.5.2
── Conflicts ─────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
You probably don’t want this text in your generated document, so we will add chunk options to remove these messages. Chunk options change how code chunks work. When added inside a code chunk, they always need to start with `#|`. If you want to run the code but not show its messages and warnings, you can add the options `#| message: false` and `#| warning: false`:
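The options go at the very top of the chunk, one per line, right below the chunk header. A sketch for the `setup` chunk, where the `library()` line simply stands in for whatever setup code you have:

````markdown
```{r setup}
#| message: false
#| warning: false
library(tidyverse)
```
````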
If you want to hide the code, messages, warnings, and output, but still run the code, you can use the option `#| include: false`.
Other common options are:

- `echo`: To show the code. Default value is `true`. Use `false` to hide it.
- `results`: To show the output. Default is `markup`. Use `hide` to hide it or `asis` to show it as regular text (not inside a code block).
- `eval`: To evaluate (run) the R code in the chunk. Default value is `true`, while `false` does not run the code.

These options all work on the individual code chunk. If you want to apply an option to all the code chunks (e.g. to hide all the code but keep the output), you can use Quarto’s `execute` options. These options are added to the YAML header and apply the settings to the whole document. We won’t do this in this session, but here is what it looks like:
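A sketch of a header that hides the code of every chunk while still running it and showing its output. `echo: false` is only one example of a setting that can go under `execute`:

```markdown
---
title: "Reproducible documents"
author: "Your Name"
format: html
execute:
  echo: false
---
```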
Let’s try running some R code to get Quarto to create a table. First, create a new header `## Table of results` and a new code chunk below it. Second, copy the code we worked on in Section 7.14 of the Data Wrangling session, which is shown below for you to copy from. Instead of using `phys_active`, let’s change that to `education`.
A “header” is something like a chapter in a book, or a section title in a manuscript like “Introduction” or “Results”.
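Here is that code with `education` in place of `phys_active`, inside a chunk under the new header. It is reconstructed from the same pipeline used in the exercise further down, so treat it as a sketch and compare it with your own Section 7.14 code:

````markdown
## Table of results

```{r}
nhanes_small %>%
  filter(!is.na(diabetes)) %>%
  group_by(diabetes, education) %>%
  summarise(
    mean_age = mean(age, na.rm = TRUE),
    mean_bmi = mean(bmi, na.rm = TRUE)
  ) %>%
  ungroup()
```
````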
# A tibble: 12 × 4
   diabetes education      mean_age mean_bmi
   <chr>    <chr>             <dbl>    <dbl>
 1 No       8th Grade          51.8     28.8
 2 No       9 - 11th Grade     46.3     28.6
 3 No       College Grad       46.0     27.3
 4 No       High School        46.1     28.9
 5 No       Some College       43.8     28.7
 6 No       <NA>               10.1     20.5
 7 Yes      8th Grade          63       32.0
 8 Yes      9 - 11th Grade     61.4     33.1
 9 Yes      College Grad       60.6     31.3
10 Yes      High School        59.6     33.8
11 Yes      Some College       58.9     33.0
12 Yes      <NA>               16.7     26.1
Putting the cursor somewhere in the code, use the shortcut Ctrl-Enter to run the code and see what it looks like. This output is almost in a table format: we have columns that would be the table headers and rows that would be meaningful table rows. Ideally, we want this to be report-ready. The first thing we should do is remove the `NA` education rows, just like we did with `diabetes`. Then, to convert the result into a more elegant table in the HTML output document, we use the `kable()` function from the knitr package. Because we don’t want to load all of the knitr functions, we’ll use `knitr::kable()` instead:
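A sketch of the updated chunk. The extra `!is.na(education)` condition drops the missing education rows, and piping into `knitr::kable()` turns the summary into a table. It matches the starting code of the exercise below, minus the `mutate()` and `rename()` steps:

````markdown
```{r}
nhanes_small %>%
  filter(!is.na(diabetes), !is.na(education)) %>%
  group_by(diabetes, education) %>%
  summarise(
    mean_age = mean(age, na.rm = TRUE),
    mean_bmi = mean(bmi, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  knitr::kable()
```
````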
| diabetes | education | mean_age | mean_bmi |
|---|---|---|---|
| No | 8th Grade | 51.8 | 28.8 |
| No | 9 - 11th Grade | 46.3 | 28.6 |
| No | College Grad | 46.0 | 27.3 |
| No | High School | 46.1 | 28.9 |
| No | Some College | 43.8 | 28.7 |
| Yes | 8th Grade | 63.0 | 32.0 |
| Yes | 9 - 11th Grade | 61.4 | 33.1 |
| Yes | College Grad | 60.6 | 31.3 |
| Yes | High School | 59.6 | 33.8 |
| Yes | Some College | 58.9 | 33.0 |
Now, render with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) and view the output in the HTML document. Pretty, eh! Before continuing, let’s run styler using the Palette (Ctrl-Shift-P, then type “style file”) and then add and commit these changes into the Git history using Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”).
Time: ~20 minutes.
In the `doc/learning.qmd` file, create a new header called `## Prettier table` along with a code chunk. Copy the code below (which includes some code we wrote above) and paste it into the new chunk. Add the option `#| echo: false` to the code chunk.
doc/learning.qmd
nhanes_small %>%
  filter(!is.na(diabetes), !is.na(education)) %>%
  group_by(diabetes, education) %>%
  summarise(
    mean_age = mean(age, na.rm = TRUE),
    mean_bmi = mean(bmi, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  mutate(
    # Task 2a.
    ___ = ___(mean_age, ___),
    ___ = ___(mean_bmi, ___),
    # Task 2b.
    ___ = ___(education)
  ) %>%
  rename(
    # Task 3.
    "___" = ___,
    "___" = ___,
    "___" = ___,
    "___" = ___
  ) %>%
  knitr::kable(caption = "Mean values of Age and BMI for each education and diabetes status.")
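Use `mutate()` to perform the following wrangling tasks: round `mean_age` and `mean_bmi` to one decimal place with `round()` (Task 2a), and convert the values in `education` to sentence case with `str_to_sentence()` (Task 2b).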
Rename `diabetes` to `"Diabetes Status"`, `education` to `Education`, and `mean_age` and `mean_bmi` to `"Mean Age"` and `"Mean BMI"` using the `rename()` function (Task 3). Hint: You can rename columns to include spaces by using `"` around the new column name (e.g. `"Diabetes Status" = diabetes`). Don’t forget, the renaming form is `new = old`.
Run the code chunk to make sure the code works, including the `knitr::kable()` function at the end of the pipe, with a table caption of your choice. If you want, you can keep the caption provided in the starting code above.
Run styler on the document with the Palette (Ctrl-Shift-P, then type “style file”).
Render the document to HTML with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) and see what the table looks like.
End the exercise by adding, committing, and pushing the files to your GitHub repository with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”).
nhanes_small %>%
  filter(!is.na(diabetes), !is.na(education)) %>%
  group_by(diabetes, education) %>%
  summarise(
    mean_age = mean(age, na.rm = TRUE),
    mean_bmi = mean(bmi, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  # 2. Round the means to 1 digit and
  # modify the `education` column so its values are in sentence case.
  mutate(
    mean_age = round(mean_age, 1),
    mean_bmi = round(mean_bmi, 1),
    education = str_to_sentence(education)
  ) %>%
  # 3. Rename `diabetes`, `education`, `mean_age`, and `mean_bmi`
  rename(
    "Diabetes Status" = diabetes,
    "Education" = education,
    "Mean Age" = mean_age,
    "Mean BMI" = mean_bmi
  ) %>%
  knitr::kable(caption = "Mean values of Age and BMI for each education and diabetes status.")
Formatting text in Markdown is done using characters that are considered “special” and act like commands. These special characters indicate what text is bolded, what is a header, what is a list, and so on. Almost every feature you will need to write a scientific document is available in Markdown, although some are missing. If you can’t get Markdown to do what you want, our suggestion would be to try to fit your writing around Markdown, rather than force or fight with Markdown to do something it wasn’t designed to do. You might actually find that the simpler Markdown approach is easier than what you wanted or were thinking of doing, and that you can actually do quite a lot with Markdown’s capabilities.
You can access a quick guide to formatting features of Markdown using the RStudio menu: `Help -> Cheatsheets -> R Markdown Cheat Sheet`. Quarto also has a great guide to the Basics of Markdown.
Creating headers (like chapters or sections) is done by using one or more `#` at the beginning of a line. Headers should always be preceded and followed by an empty line:
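For example (the header names and text here are placeholders), one `#` gives a top-level header, `##` a second-level header, and `###` a third-level header, each with an empty line before and after it:

```markdown
# Header level 1

Some text under the first header.

## Header level 2

Some text under the second header.

### Header level 3
```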
Lists are created by adding either `-` or `1.` to the beginning of a line, and an empty line must be at the start and end of the list.
For unnumbered lists, it looks like:
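A minimal example with placeholder items, including the empty lines before and after the list:

```markdown
Some text before the list.

- First item
- Second item
- Third item

Some text after the list.
```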
And numbered lists look like:
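The same placeholder list, numbered this time:

```markdown
Some text before the list.

1. First item
2. Second item
3. Third item

Some text after the list.
```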
- `**bold**` gives **bold**.
- `*italics*` gives *italics*.
- `super^script^` gives super^script^.
- `sub~script~` gives sub~script~.

Quarto also allows you to include output in-text. For instance, if you wanted to add the mean of some values to the text, it would look like this:
The mean of BMI is `r round(mean(nhanes_small$bmi, na.rm = TRUE), 2)`.
which gives…
The mean of BMI is 26.66.
But note that using inline R code can only insert a single number or character value, and nothing more.
For more details about other Markdown “syntax”, check out Appendix F as well as the R Markdown cheatsheet (`Help -> Cheatsheets`, which works for many Quarto features too) and Quarto’s Markdown Basics page. Continue to the exercise below.
Time: ~5 minutes.
Get some practice writing Markdown by completing these tasks in the `doc/learning.qmd` file.

- Create three level 1 headers (`#`), called “Intro”, “Methods and Results”, and “Discussion”.
- Create a level 2 header (`##`) under “Methods and Results” called “Analysis”.
- … Bold (`**word**`) one word in each and italicize (`*word*`) another.
- … (`2 + 2`).

Aside from tables, figures are the most common form of output inserted into documents. Like tables, you can insert figures into the document either with Markdown or with R code chunks. We’ll do it with Markdown in this session and with R code in the next session. First, we need an image to use. Open a browser and search for a picture to use (we’re using a kitten, because they’re cute). Download the image, create a folder in `doc/` called `images`, and save the image in that folder. Then, in your Quarto document, use the Markdown syntax for images: `![Caption text](path/to/image.png)`. The image can be in png, jpeg, or pdf formats. If you download an image and intend to use it in an official document, you will need to add text on the source and author of the image for copyright purposes.
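For example, assuming the downloaded file was saved as `doc/images/kitten.jpg` (the filename is only an example), the line in `doc/learning.qmd` would be:

```markdown
![Kitten attacking flowers!](images/kitten.jpg)
```

The path starts at `images/` rather than `doc/images/` because image paths are resolved relative to the `.qmd` file, which already lives in `doc/`.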
You can include a link to a picture instead of downloading the image, though this may only work in HTML documents and only if you have internet access. Quarto has amazing image capabilities, which they show in the Figures guide.
Image file paths are always relative to the `.qmd` file.
Render the document again with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) and view the HTML document with the new picture. If we want to, we can change the width and height of the image as well as its alignment. We do this by adding `{}` to the end of the Markdown image tag and putting options inside it.
- `fig-align`: To align the figure, either `"center"`, `"left"`, or `"right"`.
- `width` and `height`: To set the image width and height for external images (not created by R). You can also use percentages to set the size, e.g. `"75%"`.
- `#fig-LABEL`: Use this to add a label so you can cross-reference the figure by typing `@fig-LABEL` inline.

For this image, we will change the width and height to `"50%"`, change the caption to something like `"Kitten attacking flowers!"`, and add a label and reference:
Cute kitten in @fig-kitten-attack!
![Kitten attacking flowers!](images/kitten.jpg){#fig-kitten-attack width="50%" height="50%"}
Now in Figure 8.1, we see a kitten! Render again with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) to see how the image changes.
For HTML documents, customizing the appearance (e.g. fonts) is pretty easy, since you can add settings to the YAML header that change the theme. There is a setting that you provide under `html` called `theme`, which accepts any of the themes listed on Quarto’s HTML Theming page. If we use a theme called `yeti`, it would look like this:
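A sketch of the full header with the `yeti` theme, keeping the title and author entered earlier:

```markdown
---
title: "Reproducible documents"
author: "Your Name"
format:
  html:
    theme: yeti
---
```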
Notice the indentation and the use of colons. Indentation tells YAML that a key is a sub-key of the key above it. The key `theme` is a sub-key of `html`, which is a sub-key (a setting) of `format`. All the themes can be viewed directly on the Bootswatch page.
Modifying the theme and appearance of HTML documents is surprisingly easy after you learn a bit of CSS, which is a bit like YAML since it provides data in a `key {subkey: value}` style pairing. We won’t cover that in this course though. On the other hand, modifying the appearance of Word documents is more difficult. That’s because Word documents can’t easily be modified programmatically the way HTML can: HTML and CSS are plain text files, while Word documents (`.docx`) use a proprietary format that is not plain text. So changing the appearance of the document itself requires that you manually create a Word template file first, point-and-click to modify its appearance, and then link to that template file with the `reference-doc` option in the YAML header (as a sub-key of `docx`). Quarto’s Word Templates page contains more details about this. We won’t be covering this in the course.
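For reference only, since the course doesn’t cover it, here is a sketch of where `reference-doc` sits in the header. The template name `my-template.docx` is hypothetical, and the sketch assumes the template is saved in the same folder as `learning.qmd`:

```markdown
---
title: "Reproducible documents"
author: "Your Name"
format:
  docx:
    reference-doc: my-template.docx
---
```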
In general, there are multiple ways of collaborating on a document. Some traditional approaches are:
The first workflow is not possible with a Quarto document, since there isn’t a feature like Word’s “Track Changes”. Instead, you’d use a workflow that probably resembles how peer reviews are done: reading the document and making comments in a separate file to upload to the journal later. Alternatively, you’d use a workflow that revolves around Git and GitHub, an efficient workflow that has been tried and tested by many thousands of teams at companies around the world. The goal of this course is to slowly move researchers into the modern era, based on modern technology, tools, and workflows.
The second workflow works in a similar way when using Git and GitHub along with Quarto. You might split up a document into sections that each collaborator works on, and then later merge them together. This last approach is what we will get you to do for the group project.
- … `knitr::kable()` …
- … headers (`# Header 1`), text formatting (`**bold**`) and lists (`-`) in the Quarto file.
- … `![Caption text](path/to/image.png)`.