This session’s overall learning outcome is to:
Specific objectives are to:
here::here() function to make these paths.
Very briefly go over this section again, mainly emphasizing that how you import your data depends on what data file format it is in.
Time: ~4 minutes
Before you can do any type of data analysis, you first need to import the data into R. There are several ways to import a dataset, which are listed below. Don’t run this code, just read for now.
Using the RStudio menu File -> Import Dataset -> From Text/Excel/SPSS/SAS/Stata (depending on the file type you want to import). This approach will also generate the code for you to use in the future, which you should copy and paste to your Quarto document or R script.
If the file is a .csv file, use readr::read_csv() to import the dataset. This is what we will be doing shortly.
If the dataset is a .rda file, use load(). This loads the dataset into your R session so that you can use it again.
For SAS, SPSS, or Stata files, you can use the package haven to import those types of data files into R.
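As a minimal sketch of the .rda route in the list above: the example below saves a small, made-up data frame to a temporary .rda file and then restores it with load(). The object name example_data and the temporary file are assumptions used only for illustration; they are not part of the course project.

```r
# Hypothetical example data, only for illustration.
example_data <- data.frame(
  id = 1:3,
  group = c("FDR", "CTR", "FDR")
)

# Save to a temporary .rda file so the example does not touch your project.
rda_path <- tempfile(fileext = ".rda")
save(example_data, file = rda_path)

# Remove the object, then restore it from the file.
rm(example_data)
loaded_names <- load(rda_path)

# load() returns the names of the objects it restored.
loaded_names
example_data
```

Note that load() restores objects under the names they were saved with, so it can silently overwrite an object of the same name in your session.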
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the instructor 👒 🎩
Now is the time to import some data into our R project. But we’re missing something: the data itself! We’ll download an open dataset from Zenodo (1), which is an online open science archive for various research outputs, like datasets, preprints, software, protocols, and teaching material. The data is from a study looking at the impact of a meal on insulin levels in different groups.
We want to download the data to our R project. First, make sure you are in your LearningR R project. Then copy and paste the code below into the Console and run it.
Console
download.file(
  url = "https://zenodo.org/records/4989220/files/Eliasson_data2.csv?download=1",
  destfile = here::here("data/post-meal-insulin.csv")
)
This function downloads the file from the Zenodo website and then saves it to the data/ folder in your R project with the name post-meal-insulin.csv. We are using the Console and not the Quarto document because we don’t want to run this code every time we render the document. The code also uses the here::here() function, which we will explain in a bit.
Before we import the data into the R session, it’s useful to have a quick manual look at the data to see if there is anything we should be aware of or consider when we do import it. So go to the Files pane, go to the data/ folder, click the post-meal-insulin.csv file, then click the “View file” option to open it up in RStudio. When we look at it directly, we see something that looks like:
OFS.ID;Group;Age;BMI;Length;Weight;Bone.mineral.DXA;Fat.m...
OFS 301;FDR;50;27,5;1,83;92;3,54;30,2;27,9;64,34;60,8;5,1...
OFS 302;FDR;51;33,7;1,77;105,6;4,05;36,4;38,7;67,75;63,7;...
OFS 304;FDR;43;26,3;1,84;89,1;3,77;24,4;21,9;67,97;64,2;4...
OFS 303;FDR;55;25,9;1,8;84;3,14;27,5;23,2;61,14;58;5;5,3;...
OFS 305;FDR;53;29,4;1,84;99,4;4,09;31,2;31,2;68,99;64,9;5...
You’ll notice that the data is separated by semicolons (;) and not commas (,). This is common in data from much of Europe, where commas are often used as the decimal mark in numbers, so semicolons are used as the delimiter instead. For data actually separated by commas in comma-separated value (CSV) files, we would use the read_csv() function from the tidyverse. But for data separated by semicolons, we use the read_csv2() function instead, which also reads the commas inside numbers as decimal marks.
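To see what read_csv2() does with this kind of file, here is a small sketch that runs without the downloaded file. It assumes the readr package is installed and uses a few rows and columns copied from the raw file shown above as inline text. The names raw_text and sample_data are made up for this illustration.

```r
library(readr)

# A few rows and columns copied from the raw file: semicolons separate
# the columns and commas are the decimal mark.
raw_text <- paste(
  "OFS.ID;Group;Age;BMI",
  "OFS 301;FDR;50;27,5",
  "OFS 302;FDR;51;33,7",
  sep = "\n"
)

# I() tells readr to treat the string as literal data, not as a file path.
sample_data <- read_csv2(I(raw_text), show_col_types = FALSE)

# BMI is read as a number (27.5), not as the text "27,5".
sample_data
```

In the course itself, the same function is pointed at the downloaded file instead of inline text.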
So, in the setup code chunk, write out the code to import (read) the data into R.
In the code chunk above, we are using the read_csv2() function that is technically from the readr package, which is loaded in with tidyverse, to import the dataset. We’re also again using the here::here() function, but there is a short reading task below to explain it, so we won’t cover this just yet. Let’s run this code by placing our cursor on the line and using Ctrl-Enter. There will be some output (as shown above this paragraph). This output message informs you what R is doing when reading in the data and gives some basic details about the data.
Briefly explain the output of the read_csv2() function to them.
Let’s print out the data into our Quarto document to see what it looks like. At the bottom of your docs/learning.qmd file, create a new level 2 header called “Showing the data” and insert a new code chunk with Ctrl-Alt-I or with the Palette (Ctrl-Shift-P, then type “new chunk”), which should look like the following:
Walk through how to use auto-completion.
To type out objects in R more quickly, use “tab-completion” or “auto-completion” to finish an object name for you. Normally RStudio will start auto-completing for you as you type code, but you can manually trigger auto-completion with Tab. As you type out an object name, hit the Tab key to see a list of objects available. RStudio will not only list the objects, but also show the possible options and potential help associated with the object. Let’s do that for the new object we created by starting to type:
docs/learning.qmd
post_
Then hit Tab. You should see a menu pop up with a list of potential matches. Hit Tab again to finish with post_meal_data. This simple tool can save so much time and can prevent spelling mistakes. Let’s finish adding the new data frame object into the code chunk.
Run the code using Ctrl-Enter to see the output. This way of showing the data can be useful to see bits of the data. Another way of getting a slightly better view of the data is using the glimpse() function. So, let’s use glimpse() below the code we just wrote. Don’t forget to use auto-completion!
docs/learning.qmd
post_meal_data
# A tibble: 31 × 85
   OFS.ID  Group   Age   BMI Length Weight Bone.mineral.DXA
   <chr>   <chr> <dbl> <dbl>  <dbl>  <dbl>            <dbl>
 1 OFS 301 FDR      50  27.5   1.83   92               3.54
 2 OFS 302 FDR      51  33.7   1.77  106.              4.05
 3 OFS 304 FDR      43  26.3   1.84   89.1             3.77
 4 OFS 303 FDR      55  25.9   1.8    84               3.14
 5 OFS 305 FDR      53  29.4   1.84   99.4             4.09
 6 OFS 306 FDR      51  23.7   1.8    76.8             3.21
 7 OFS 307 FDR      48  23.9   1.78   75.8             3.33
 8 OFS 308 FDR      35  22     1.75   67.5             3.26
 9 OFS 309 FDR      54  26.4   1.83   87.9             4.49
10 OFS 310 FDR      52  24.5   1.72   72.2             2.87
# ℹ 21 more rows
# ℹ 78 more variables: Fat.mass...DXA <dbl>, Fat.mass.DXA <dbl>,
#   Fat.free.mass.DXA <dbl>, Fat.free.soft.tissue.DXA <dbl>,
#   FP.Glucose.screen <dbl>, P.Glucose..5.OGTT <dbl>,
#   P.Glucose.0.OGTT <dbl>, P.GLucose.30.OGTT <dbl>,
#   P.Glucose.60.OGTT <dbl>, P.Glucose.90.OGTT <dbl>,
#   P.Glucose.120.OGTT <dbl>, FS.Insulin.screen <dbl>, …
glimpse(post_meal_data)
Rows: 31
Columns: 85
$ OFS.ID <chr> "OFS 301", "OFS 302", "OFS 304", "OFS…
$ Group <chr> "FDR", "FDR", "FDR", "FDR", "FDR", "F…
$ Age <dbl> 50, 51, 43, 55, 53, 51, 48, 35, 54, 5…
$ BMI <dbl> 27.5, 33.7, 26.3, 25.9, 29.4, 23.7, 2…
$ Length <dbl> 1.83, 1.77, 1.84, 1.80, 1.84, 1.80, 1…
$ Weight <dbl> 92.0, 105.6, 89.1, 84.0, 99.4, 76.8, …
$ Bone.mineral.DXA <dbl> 3.54, 4.05, 3.77, 3.14, 4.09, 3.21, 3…
$ Fat.mass...DXA <dbl> 30.2, 36.4, 24.4, 27.5, 31.2, 20.0, 1…
$ Fat.mass.DXA <dbl> 27.9, 38.7, 21.9, 23.2, 31.2, 15.3, 1…
$ Fat.free.mass.DXA <dbl> 64.34, 67.75, 67.97, 61.14, 68.99, 60…
$ Fat.free.soft.tissue.DXA <dbl> 60.8, 63.7, 64.2, 58.0, 64.9, 57.7, 6…
$ FP.Glucose.screen <dbl> 5.1, 5.2, 4.8, 5.0, 5.5, 5.4, 4.9, 5.…
$ P.Glucose..5.OGTT <dbl> 5.1, 5.6, 5.0, 5.3, 5.6, 5.8, 5.1, 5.…
$ P.Glucose.0.OGTT <dbl> 5.10, 5.40, 4.90, 5.15, 5.55, 5.60, 5…
$ P.GLucose.30.OGTT <dbl> 9.4, 8.4, 6.8, 8.8, 9.9, 8.9, 7.6, 6.…
$ P.Glucose.60.OGTT <dbl> 6.9, 8.4, 4.3, 8.4, 10.6, 9.1, 6.3, 7…
$ P.Glucose.90.OGTT <dbl> 4.7, 8.7, 3.8, 7.5, 11.0, 7.1, 5.3, 6…
$ P.Glucose.120.OGTT <dbl> 4.3, 7.2, 4.3, 6.3, 7.6, 7.0, 3.5, 6.…
$ FS.Insulin.screen <dbl> 50.0, 97.2, 20.8, 41.0, 41.0, 30.6, 3…
$ Insulin..5.OGTT.X <dbl> 53.4765, 90.2850, 20.8350, 52.0875, 4…
$ Insulin.0.OGTT.X <dbl> 51.74025, 93.75750, 20.83500, 46.5315…
$ Insulin.0.OGTT <dbl> 310.4415, 562.5450, 125.0100, 279.189…
$ Insulin.30.OGTT <dbl> 368.085, 486.150, 173.625, 638.940, 4…
$ Insulin.60.OGTT <dbl> 645.885, 673.665, 118.065, 1180.650, …
$ Insulin.90.OGTT <dbl> 326.4150, 972.3000, 90.2850, 1250.100…
$ Insulin.120.OGTT <dbl> 201.4050, 694.5000, 69.4500, 902.8500…
$ PG.15 <dbl> 5.4, 5.2, 4.9, 4.8, 5.9, 5.0, 5.0, 4.…
$ PG.5 <dbl> 5.4, 5.4, 5.0, 4.8, 5.8, 4.9, 5.1, 4.…
$ PG1 <dbl> 5.4, 5.4, 5.1, 4.8, 5.8, 5.1, 5.0, 4.…
$ PG2 <dbl> 5.5, 5.4, 5.1, 4.8, 5.9, 5.1, 5.0, 4.…
$ PG3 <dbl> 5.3, 5.4, 5.1, 4.7, 5.8, 5.1, 5.1, 4.…
$ PG5 <dbl> 5.3, 5.4, 5.0, 4.9, 5.8, 5.0, 5.1, 4.…
$ PG8 <dbl> 5.5, 5.5, 5.1, 4.9, 5.7, 5.0, 5.1, 4.…
$ PG10 <dbl> 5.5, 5.6, 5.1, 5.1, 5.7, 5.0, 5.2, 4.…
$ PG15 <dbl> 5.8, 6.0, 5.0, 5.3, 6.2, 5.5, 5.7, 5.…
$ PG20 <dbl> 6.1, 6.5, 5.2, 5.1, 7.2, 5.6, 6.1, 5.…
$ PG30 <dbl> 6.9, 7.5, 5.5, 5.5, 8.0, 6.1, 7.9, 6.…
$ PG45 <dbl> 7.6, 7.8, 5.8, 6.2, 9.3, 6.4, 9.8, 7.…
$ PG60 <dbl> 6.5, 7.5, 5.6, 6.7, 9.1, 5.9, 9.4, 6.…
$ PG90 <dbl> 5.3, 7.5, 4.5, 5.0, 7.6, 4.8, 7.3, 4.…
$ PG120 <dbl> 5.1, 7.5, 5.0, 4.7, 5.8, 3.8, 5.0, 4.…
$ CP.15 <dbl> 0.93, 0.99, 0.39, 0.84, 0.69, 0.51, 0…
$ CP.5 <dbl> 0.08, 1.05, 0.40, 0.81, 0.69, 0.51, 0…
$ CP1 <dbl> 0.90, 1.05, 0.41, 0.79, 0.71, 0.53, 0…
$ CP2 <dbl> 0.96, 1.07, 0.41, 0.79, 0.74, 0.54, 0…
$ CP3 <dbl> 0.95, 1.10, 0.41, 0.80, 0.80, 0.53, 0…
$ CP5 <dbl> 1.01, 1.15, 0.44, 0.96, 0.84, 0.57, 0…
$ CP8 <dbl> 1.09, 1.24, 0.45, 0.92, 0.82, 0.62, 0…
$ CP10 <dbl> 1.10, 1.19, 0.45, 0.93, 0.78, 0.59, 0…
$ CP15 <dbl> 1.120, 1.390, 0.410, 1.000, 0.890, 0.…
$ CP20 <dbl> 1.23, 1.67, 0.46, 0.97, 1.22, 0.82, 0…
$ CP30 <dbl> 1.52, 2.10, 0.57, 0.95, 1.50, 0.93, 1…
$ CP45 <dbl> 2.60, 2.00, 0.75, 1.50, 1.83, 1.07, 1…
$ CP60 <dbl> 2.40, 2.20, 0.89, 2.20, 2.10, 1.19, 2…
$ CP90 <dbl> 2.20, 2.60, 0.76, 2.70, 2.70, 1.51, 2…
$ CP120 <dbl> 1.61, 3.00, 0.70, 2.20, 2.00, 1.34, 1…
$ Insulin.15 <dbl> 76.3950, 83.3400, 19.4460, 76.3950, 5…
$ Insulin.5 <dbl> 66.9450, 104.1750, 21.5295, 69.4500, …
$ Insulin1 <dbl> 67.3665, 97.2300, 22.2240, 65.9775, 5…
$ Insulin2 <dbl> 76.3950, 97.2300, 22.9185, 63.8940, 6…
$ Insulin3 <dbl> 83.3400, 90.2850, 23.6130, 69.4500, 5…
$ Insulin5 <dbl> 90.2850, 125.0100, 30.5580, 111.1200,…
$ Insulin8 <dbl> 104.1750, 152.7900, 29.1690, 97.2300,…
$ Insulin10 <dbl> 97.2300, 138.9000, 26.3910, 90.2850, …
$ Insulin15 <dbl> 111.1200, 194.4600, 22.2240, 125.0100…
$ Insulin20 <dbl> 138.9000, 291.6900, 32.6415, 111.1200…
$ Insulin30 <dbl> 187.515, 319.470, 54.171, 104.175, 27…
$ Insulin45 <dbl> 465.315, 381.975, 76.395, 270.855, 32…
$ Insulin60 <dbl> 305.5800, 333.3600, 97.2300, 493.0950…
$ Insulin90 <dbl> 222.2400, 395.8650, 44.4480, 458.3700…
$ Insulin120 <dbl> 138.9000, 465.3150, 50.6985, 250.0200…
$ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12…
$ iauc_cp <dbl> 118.24750, 144.46000, 35.96000, 120.6…
$ auc_cp <dbl> 236.900, 278.110, 88.610, 233.540, 23…
$ iauc_pg <dbl> 82.97500, 242.20000, 39.43182, 83.750…
$ auc_pg <dbl> 805.55, 944.20, 693.95, 731.15, 1003.…
$ iauc_ins <dbl> 18522.315, 30992.062, 4482.650, 27388…
$ auc_ins <dbl> 28728.44, 42242.96, 7107.86, 37592.94…
$ kef_ins <dbl> 145.49775, 147.23400, 34.03050, 164.2…
$ kef_cp <dbl> 145.49775, 147.23400, 34.03050, 164.2…
$ kef_glu <dbl> 145.49775, 147.23400, 34.03050, 164.2…
$ iauc_cp_e <dbl> 98.72500, 120.30000, 29.00500, 110.17…
$ iauc_pg_e <dbl> 75.00000, 193.75000, 23.28409, 57.088…
$ iauc_ins_e <dbl> 16095.038, 24307.500, 3638.138, 25713…
$ glykemi <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
Run this line of code with Ctrl-Enter to see the output. Using glimpse() is a great way to see the structure of the data, including the column names, the data type of each column, and the first few values in each column. Let’s render the document with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) to see what the output HTML document looks like.
Before moving on to the reading task, let’s add the changes we’ve made to the Git history with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”). If you work with personally sensitive (health) data, you would not commit the data file, so that you don’t accidentally put it publicly on GitHub. And if your data is really large, you normally don’t want to keep it on GitHub, since that isn’t the best place to store large files. But because the dataset is already an open dataset and because it is small, for this course we will save it to the Git history to get practice using Git. We also normally don’t commit the generated .html files, but for this course we will commit them to the Git history so that you get practice with Git. Once done, read over the next section.
Time: ~5 minutes
The here() function from the here package, usually referred to as “here here” and written with here::here() so that we don’t have to load the package, helps to make it easier to manage file paths within an R Project.
So, what is a file path and why is this here package necessary? A file path is the list of folders a file is found in. For instance, your CV may be found in /Users/Documents/personal_things/CV.docx. The problem with file paths when running code (like with R) is that when you run a script interactively (e.g. what we do in class and normally), the file path and “working directory” (the R session) are located at the Project level (where the .Rproj file is found). You can see the working directory by looking at the top of the RStudio Console.
But! When you render a Quarto document, run an R script, or run the code in a non-interactive way, the R code will likely run in the folder it is saved in, e.g. in the docs/ folder. So your file path data/post-meal-insulin.csv won’t work because there isn’t a folder called data/ in the docs/ folder.
LearningR <-- R Project working directory starts here.
├── R
│ └── README.md
├── data
│ └── README.md
├── data-raw
│ └── README.md
├── docs
│ ├── learning.qmd <-- Working directory when running not interactively.
│ └── README.md
├── .gitignore
├── DESCRIPTION
├── LearningR.Rproj <-- here() moves file path to start in this file's folder.
├── README.md
└── TODO.md
Often people use the function setwd() in scripts, but this is never a good idea since using it makes your code runnable only on your computer, which means it is not reproducible! We use the here() function to tell R to go to the project folder (where the .Rproj file is found) and then use that file path. This simple function can make your work more reproducible and easier for you and others to use later on.
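The difference between a plain relative path and a path built with here::here() can be sketched as follows. This assumes the here package is installed. The variable names relative_path and project_path are made up for illustration. If no project root is found, here falls back to the current working directory, so the code runs outside an R Project too.

```r
# A relative path is resolved against the current working directory,
# which differs between interactive use and rendering.
relative_path <- "data/post-meal-insulin.csv"

# here::here() builds an absolute path that starts at the project root
# (the folder with the .Rproj file), regardless of the working directory.
project_path <- here::here("data/post-meal-insulin.csv")

# The folders can also be given as separate pieces; the result is the same.
same_path <- here::here("data", "post-meal-insulin.csv")

getwd()
relative_path
project_path
```

Because project_path is anchored at the project root, it points at the same file whether the code runs in the Console or while rendering a document saved in docs/.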
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the instructor 👒 🎩
One thing you may have noticed when we render the document to HTML is that there is a bunch of extra output, especially below the setup code chunk. You probably don’t want this text in your generated document, so we will add chunk options to remove these messages.
Chunk options are used to change how code chunks work. When adding them inside the code chunk, they always need to start with #|. In this case, we don’t want the messages or warnings when we load tidyverse or read in the dataset. The options to remove those messages and warnings are #| message: false and #| warning: false, so let’s add those to the setup code chunk.
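As a rough sketch of where these options go, the lines below show the body of a code chunk as it would appear between the chunk fences in a Quarto document, with the two options placed at the very top. The inline text and the name demo_data are assumptions that stand in for the course's data file so that the sketch runs on its own. It also assumes the readr package is installed.

```r
#| message: false
#| warning: false
library(readr)

# Inline text stands in for the data file (an assumption for this sketch).
# Without the options above, the message printed by read_csv2() would
# appear in the rendered document.
demo_data <- read_csv2(I("x;y\n1,5;a\n2,5;b\n"))
demo_data
```

When run as plain R code, the #| lines are ordinary comments and do nothing. They only take effect when Quarto renders the document.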
Let’s render the HTML document again with Ctrl-Shift-K or with the Palette (Ctrl-Shift-P, then type “render”) and see what changes.
Briefly show these other options below to them on the screen and explain them, but don’t get them to read it.
Other common options are:
include: Whether to include all the code, code output, messages, and warnings in the rendered output document. Default is true. Use false to hide everything but still run the code.
echo: To show the code. Default value is true. Use false to hide.
results: To show the output. Default is markup. Use hide to hide or asis as regular text (not inside a code block).
eval: To evaluate (run) the R code in the chunk. Default value is true, while false does not run the code.
These options all work on the individual code chunk. If you want to set an option to all the code chunks (e.g. to hide all the code but keep the output), you can use Quarto’s execute options. These options are added to the YAML header and will apply the settings to everything in the document. We won’t do this in this session, but here is what it looks like:
docs/learning.qmd
Before moving on, let’s commit the changes we’ve made to the Git history with Ctrl-Alt-M or with the Palette (Ctrl-Shift-P, then type “commit”).
For the reading section, emphasize the characteristics of a “tidy” dataset.
Time: ~12 minutes
The concept of “tidy” data was popularized in an article (2) by Hadley Wickham and described in more detail in the Tidy Data chapter of the R for Data Science online book. Before we continue with tidy data, we need to cover something that is related to the concept of “tidy” and that has already come up in this course: the tidyverse. The tidyverse is an ecosystem of R packages that are designed to work well together, that all follow a strong “design philosophy” and common style guide. This makes combining these packages in the tidyverse much easier. We teach the tidyverse because of these reasons.
Part of the “tidy” part of tidyverse revolves around tidy data. A tidy dataset is when:
Take a look at the example “tidy” and “messy” data frames (also called “tibbles” in the tidyverse) below. Think about why each may be considered “tidy” or “messy”. What do you notice between the tidy versions and the messier versions?
# Datasets come from tidyr
# Tidy:
table1
# A tibble: 6 × 4
country year cases population
<chr> <dbl> <dbl> <dbl>
1 Afghanistan 1999 745 19987071
2 Afghanistan 2000 2666 20595360
3 Brazil 1999 37737 172006362
4 Brazil 2000 80488 174504898
5 China 1999 212258 1272915272
6 China 2000 213766 1280428583
# Partly tidy:
table2
# A tibble: 12 × 4
country year type count
<chr> <dbl> <chr> <dbl>
1 Afghanistan 1999 cases 745
2 Afghanistan 1999 population 19987071
3 Afghanistan 2000 cases 2666
4 Afghanistan 2000 population 20595360
5 Brazil 1999 cases 37737
6 Brazil 1999 population 172006362
7 Brazil 2000 cases 80488
8 Brazil 2000 population 174504898
9 China 1999 cases 212258
10 China 1999 population 1272915272
11 China 2000 cases 213766
12 China 2000 population 1280428583
# Messier:
table3
# A tibble: 6 × 3
country year rate
<chr> <dbl> <chr>
1 Afghanistan 1999 745/19987071
2 Afghanistan 2000 2666/20595360
3 Brazil 1999 37737/172006362
4 Brazil 2000 80488/174504898
5 China 1999 212258/1272915272
6 China 2000 213766/1280428583
# Messy:
table4a
# A tibble: 3 × 3
country `1999` `2000`
<chr> <dbl> <dbl>
1 Afghanistan 745 2666
2 Brazil 37737 80488
3 China 212258 213766
# Messy:
table4b
# A tibble: 3 × 3
country `1999` `2000`
<chr> <dbl> <dbl>
1 Afghanistan 19987071 20595360
2 Brazil 172006362 174504898
3 China 1272915272 1280428583
The “most” tidy version is table1, as each column describes its values (e.g. population is population size), each row is unique (e.g. first row is for values from Afghanistan from 1999), and each cell is an explicit value representative of its column and row.
table2 is a “long” version of table1 so it is partly “tidy”, but it doesn’t satisfy the rule that each variable has a column, since count represents both cases and population size.
On the other hand, table3 is messy because the rate column values are a composite of two other column values (cases and population), when it should be a single number (a percent). Both table4a and table4b have columns with ambiguous values inside. For example, you can’t tell from the data what the values in the 1999 column contain.
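To make the contrast concrete, here is a small sketch of how the messy table4a could be turned into a tidier form. It assumes the tidyr package is installed and uses pivot_longer(), which is not covered in this section. The name tidy_cases and the new column names year and cases are chosen only for this illustration.

```r
library(tidyr)

# table4a stores the years as column names, so the meaning of the
# values (cases) is not stated anywhere in the data.
tidy_cases <- pivot_longer(
  table4a,
  cols = c(`1999`, `2000`),
  names_to = "year",     # the old column names become values of year
  values_to = "cases"    # the cell values get an explicit column name
)

tidy_cases
```

After this step, each row is one country in one year and the cases column says what the numbers are, which matches the layout of table1 apart from the missing population column.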
Tidy data has a few notable benefits:
The concept of tidy data also gives rise to “tidy code” for visualizing and wrangling. By using “verbs” (R functions) and chaining them together in “sentences” (in a sequential pipeline), you can construct meaningful and readable code that describes in plainer English what you are doing to the data. This is one simple way that you can enhance the reproducibility of your code. There are other ways to make your code tidier and more readable, all of which we will cover in this course.
# hashes as well as using Markdown text to describe things that can’t be easily explained by the code, for example, the “why” behind the code.
Whether working with either messy or tidy data, there are a few principles to follow:
data-raw/ folder.
data-raw/ depends on how you collected the data and how many collaborators are on your team. You may end up storing and processing the data in another folder as a project of its own.
data/ folder.
We are saving to data/ because the dataset is already collected and published, so the original raw data can always be downloaded again.
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the instructor 👒 🎩
Time: ~10 minutes
Now that we’ve learned about the concept of “tidy” data, let’s discuss how tidy our data is. At a glance, there are approximately 7 things in this data that are mildly untidy or could be improved. Can you identify them?
The 6 main things are:
. in the names and some use abbreviations.
Insulin1 is insulin measured at 1 minute after eating the meal. This data is in the wide format, when it should be in the long format. We’ll cover long and wide in Chapter 11.
.. instead of . . This likely means it was minus the number (-5, 5 minutes before the test). So it should be renamed to something like .minus5.
id and OFS.ID.
Group column has abbreviations as FDR and CTR instead of the full names.
Insulin.0.OGTT.X and Insulin.0.OGTT.
Briefly go over this section with them, especially emphasize “Restart R”, reading the error or warning message, and checking for missing commas, brackets or misspelled words.
Time: ~10 minutes
You will encounter problems and errors when working with R, and you will encounter them all the time. In fact, a large amount of your time in R will be spent figuring out solutions to these errors (“debugging”).
RStudio has many cheat sheets of its own that can help you in your learning journey, which you can find with the Command Palette (Ctrl-Shift-P, then type “cheat sheet”). However, even with these cheat sheets, you will still encounter other problems like errors or warnings.
Error messages will appear in red text in your Console and will start with the word “Error:”. Warning messages are also (usually) in red text, but are often either harmless or informative, so make sure to read the message first and see if it says “Error” or not. Here are some initial steps to take when you encounter an error:
], ), or }?
library() function.
If you still can’t find the problem, here are some other steps to take:
Restart the R session with Ctrl-Shift-F10 or with the Palette (Ctrl-Shift-P, then type “restart”) or with the menu item Session -> Restart R. Then load your packages (and data if needed) and run the code from the beginning, tracking which objects get created, and if the proper object name is used later on.
(Rarely need to do) Close/re-open RStudio and try again.
Use help() or ? to access built-in documentation about a function or package. You may be using the function incorrectly, so find out more about the function by looking at the built-in documentation. The documentation will open up in the “Help” pane of RStudio (bottom right-hand corner). Try it out: Enter either of the following commands into the Console and run it (hit Enter).
Console
?colnames
help(colnames)
Sometimes, this documentation can be hard to read and seem overly complex for a beginner. You can also try finding the website for the package you are having trouble with, as they often have guides that are a little easier to understand. The tidyverse packages all have amazing documentation that you can use to help you with problems you may have.
Consider explaining the problem out loud to a colleague or friend (or to yourself!). You might find that, in verbally going through the problem and explaining it, you come up with the solution yourself.
Take a break and come back to it later!
Google it. Chances are that someone has already encountered that error and has asked about it online. In fact, those who are “experts” in coding languages like R are experts largely because of their skill in knowing the right words or terms or questions to ask Google. Usually googling the error message will be enough to find the answer, but sometimes you’ll need to include “R” or “rstats” and the relevant package or function as a keyword in your search.
If all else fails, you can always turn to the trusty online R community. Check StackOverflow, a coding-related question and answer website, to see whether your issue has already been asked and solved by others. If it hasn’t and you are considering submitting a question, make sure to read the posting guides beforehand to ensure that you are asking the question in a helpful way.
Final words: It is important to always work towards writing “better” and “neater” code, as this can make it easier to break down pieces of code and troubleshoot problems. Knowing how to do that takes some experience, which you can only get by practicing more coding!
When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the instructor 👒 🎩
read_csv() or read_csv2() to import data from a .csv (comma or semicolon separated) file.
here::here() to make file paths easier to manage in your R project.
glimpse() to get an overview of the data.
echo, message, or warning to control what is shown in the output of the code chunk.
? to get help on an R object. Use the cheat sheets to help guide you in learning R.
This lists some, but not all, of the code used in the section. Some code is incorporated into Markdown content, so is harder to automatically list here in a code chunk. The code below also includes the code from the exercises.