r3::check_git_config()
6 Version control with Git
Session objectives:
- Learn about “formal” version control and its importance.
- Learn about Git for version control and apply RStudio’s integrated Git tools.
- Learn and apply the basic workflow of Git version control: View changes to files, record and save those changes to a history, and synchronize those changes to an online repository (GitHub).
- Use GitHub to collaborate with others on a project.
6.1 What is version control?
6.2 What is Git?
6.3 Basics of Git
6.4 Exercise: How might you use Git in your work?
Time: ~10 minutes.
One of the most common comments people make when first learning about Git is: “I see some of the benefits, but can’t see how I would use it in my own work. How or why would I?” So, to tackle that question right away, take some time to brainstorm and discuss with your neighbour how you might use Git right now or in the near future.
- Take 1 minute to think to yourself.
- Take 6 minutes to discuss and brainstorm with your neighbour.
- In the final few minutes, we will all share with the group some thoughts.
6.5 Using Git in RStudio
Git was initially created to be used in the terminal (i.e. the command line). However, because RStudio has a very nice interface for working with Git, we’ll be using that interface so we don’t have to switch to another application. While the terminal provides full access to Git’s power and features, the vast majority of daily use can be done through RStudio’s interface.
To access the Git interface in RStudio, click the Git icon beside the “Go to” search bar (see Figure 6.4) and then click the “Commit…” option (Ctrl-Alt-M). You can also use the Command Palette by using Ctrl-Shift-P and then typing “commit”.
The Git interface should look something like Figure 6.5 below. A short written description is given below the image.
1. These are the “Changes” and “History” buttons, which let you switch between views. “Changes” shows what is currently changed in your files relative to the last item in the history. “History” is the record of what was done, to which file, when, and by whom.
2. These are the “Push” and “Pull” buttons that are used to synchronize with GitHub, which we will cover later in the session.
3. This is the panel that lists the files that have been modified in some way. You add (“stage”) files here that you want to be put into the history.
4. This is the Commit Message box where you write the message about the changes that will be put into the history.
5. This is the panel that shows what text has been modified, added, or removed from the file selected in panel 3. Green highlight indicates that something has been added, while red indicates a removal. Changes are detected at the line level (which line in the file). For files that aren’t plain text (e.g. Word documents), you cannot see what specifically was changed; Git will only report that there is a change.
So far, it should show a bunch of files that we’ve added and used over the previous session. In the Git interface, select the README.md file. You should see the text in the file, all in green. Green means the text has been added. Red, which you will see shortly, means text was removed.
Now click the “Staged” checkbox beside the README.md file to get it ready to be saved into the history. You’ve now “added” it to the staged area. Note that when you have a lot of files to stage, you can stage them all at once by pressing Ctrl-A to highlight and select all the files and then clicking the “Staged” checkbox (or hitting Space) on one of the files. The box on the right side is where you type out your “commit” message. “Commit” means you save something to the history of changes. You “commit” it to the history, like you “commit” something to your own memory.
Ok, now write something like “Add initial README file” in the commit message box and commit the change. After clicking “Commit”, you’ll notice that the README.md file is no longer on the left side. That’s because we’ve put the change into the history. We can view the history by clicking the “History” button in the top-left corner of the RStudio Git interface. Here you can see what has been done in previous commits.
The “History” section is quite powerful. As long as you commit something into the Git history, it will never be completely gone [1]. For instance, we can see the full contents of a file at a specific commit by clicking the commit, moving to the file you want to look into, and clicking the link that says View file @ ... . Try that with the first commit of the README file. See how it shows what was there before you made more changes?
[1] This isn’t completely true: you can delete things from the history, for example if you accidentally add a password or personal data.
Next, open up the README.md file in RStudio using the Files tab. At the top of the file, write your name and your field of research, and then save the file. Open up the Git interface again (with the Git icon or with Ctrl-Shift-P, then type “commit”). You should now see the added text in green. Alright, now “Stage” the change (click the checkbox), write a message like “added my name to README file”, and commit the change. Go back to the history and you should see the two commits in your repository. If you don’t see them in the history, you likely need to click the refresh button at the top.
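For those curious about what RStudio does behind these buttons, the same stage, commit, and history steps can be sketched in the terminal. This is a minimal sketch rather than part of the course workflow: it assumes Git is installed, works in a throwaway folder created with mktemp, and uses a placeholder name and email (“Example Learner”) that are not from the course.

```shell
# Placeholder identity for this sketch only (an assumption, not a course setting).
export GIT_AUTHOR_NAME="Example Learner" GIT_AUTHOR_EMAIL="learner@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Throwaway repository so nothing in your real project is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
echo "# LearningR" > README.md

git status --short                           # prints "?? README.md": a new, untracked file
git add README.md                            # same as ticking the "Staged" checkbox
git commit -q -m "Add initial README file"   # same as the "Commit" button

# Change the file, then stage and commit again.
echo "Name and field of research go here" >> README.md
git --no-pager diff                          # added lines start with "+" (green in RStudio)
git add README.md
git commit -q -m "Add my name to README file"

git --no-pager log --oneline                 # the "History" view: two commits, newest first
git show HEAD~1:README.md                    # the file as it was at the first commit
```

Each command maps onto a step above: git add is the “Staged” checkbox, git commit is the “Commit” button, git log is the “History” view, and git show with HEAD~1:README.md plays the role of the View file @ ... link for the first commit.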
A question that may come up is “how often should I commit?” In general, it’s better to commit fairly frequently, and to group together changes that are related to each other and to the commit message. Following this basic principle will make your history easier for you to read and make it easier for others as well.
6.6 Exercise: Committing to history
Time: 15 minutes.
When working on your own projects and when you use Git, you will be committing a lot of changes to your files into the Git history. Part of the initial barrier is simply getting used to this workflow of committing what you’ve changed. Use this exercise to get some practice. We will be using this workflow often throughout the rest of the course.
- Practice the add-commit (“add to staging”-“committing to history”) sequence by adding and committing each of the remaining files in your R project one at a time into the Git history (e.g. the .gitignore, the .R files, and the .Rproj file). While you could add and commit them all at once, we want you to do them one at a time so you practice using this workflow.
- Make sure to write a meaningful and short message about what you added and why. In this case, the “why” is simply that you are saving the file into the history for the first time.
- Once all the files have been added and committed, add a new line to the R/learning.R file with an R comment (starts with a #). Type out something like “This will be used for testing out Git”. Add and commit that new line you’ve written.
6.7 “Remotes”: Storing your repository online
A version control system that didn’t include a type of external backup wouldn’t be a very good system, because if something happened to your computer, you’d lose your Git repository. In Git, this “external” backup is called a “remote” (meaning it is something that is separate from and in a different location, usually online, than the main repository). The remote repository is essentially a duplicate copy of the history (the .git/ folder) of your local repository (on your computer), so when you synchronize with the remote, as illustrated in Figure 6.8, it only copies over the changes made as commits in the history.
One of the biggest reasons why we teach Git is because of the popularity of several Git repository hosting sites. The most popular one is GitHub (which this course is hosted on). In this session, we’ll be covering GitHub not only because it is very popular, but also because the R community is almost entirely on GitHub.
[Diagram: the “Remote” (GitHub) connected to the “Local” (your computer).]
Let’s get familiar with GitHub. More details about manually creating repositories on GitHub are found in Section E.1.
When using GitHub, especially in relation to health research, you need to be mindful of what you save into the Git history and what you put up online. Some things to think about are:
- Do not save any personal or sensitive data or files in your Git repository.
- Generally don’t save very large files, like big image files or large datasets.
In both cases, it’s better to use another tool to store files like that, rather than through Git and GitHub.
6.8 Using GitHub as a remote
6.9 Exercise: Creating a GitHub token with usethis
Time: ~20 minutes
Since we use R, there is a really useful set of functions from usethis that makes it easy to interact with and set up connections to GitHub from RStudio. Complete the Connect to GitHub guide for this exercise. In the end, you should have your LearningR project on GitHub.
6.10 Synchronizing with GitHub
After creating the token, we can now push and pull any changes you make to the files.
- Make sure you are in the LearningR R Project, which you should see in the top right corner, above the Console pane. If you aren’t, switch to it by clicking the button in the top right corner and selecting the LearningR project from the menu.
- Open up the README.md and add a random sentence somewhere near the top of the file.
- Save the file.
- Open the Git interface by hitting Ctrl-Alt-M (or Ctrl-Shift-P, then typing “commit”) anywhere in RStudio, or by going to the Git button -> Commit.
- Stage the file.
- Add a commit message.
- Commit the new change by clicking the “Commit” button.
- Click the “Push” button in the top right corner of the Git interface (Box 2 of Figure 6.5). A pop-up will indicate that it’s pushing and will tell you when it’s done.
Now let’s try the opposite by committing and pulling changes from GitHub to your local repository.
- Go to your LearningR GitHub repository. You should see the new change is also on the GitHub repository.
- Click the README.md file on the GitHub website and then click the “Edit” button (see the video below, which shows it for a random repository called learning-github).
- Add another random sentence somewhere near the top of the file.
- Scroll down to the commit message box, and type out a commit message.
- Click the “Commit” button.
- Go back to RStudio, open the Git interface and now click the “Pull” button in the top right corner beside the “Push” button.
- Wait for it to finish pulling and check your README.md file for the new change. You’ve now updated your project!
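The push and pull round trip can also be sketched in the terminal. To keep the sketch self-contained, a local “bare” repository stands in for GitHub and a second clone stands in for editing on the GitHub website. Both are assumptions for illustration, as are the folder names and the placeholder identity. With a real GitHub repository, origin would point to its URL instead.

```shell
# Placeholder identity for this sketch only (an assumption, not a course setting).
export GIT_AUTHOR_NAME="Example Learner" GIT_AUTHOR_EMAIL="learner@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# A bare repository in a temporary folder plays the role of GitHub.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/remote.git"
git -C "$sandbox/remote.git" symbolic-ref HEAD refs/heads/main

# "Local": the project on your computer, connected to the stand-in remote.
git init -q "$sandbox/LearningR"
cd "$sandbox/LearningR"
git symbolic-ref HEAD refs/heads/main        # name the default branch "main"
git remote add origin "$sandbox/remote.git"
echo "# LearningR" > README.md
git add README.md
git commit -q -m "Add initial README file"
git push -q -u origin main                   # the "Push" button

# Stand-in for editing README.md on the GitHub website.
git clone -q "$sandbox/remote.git" "$sandbox/website-edit"
cd "$sandbox/website-edit"
echo "A sentence added on GitHub." >> README.md
git commit -q -a -m "Add a sentence from the website"
git push -q origin main

# Back in the local project: download the new commit.
cd "$sandbox/LearningR"
git pull -q --no-rebase origin main          # the "Pull" button
cat README.md                                # now includes the sentence added "on GitHub"
```

Only commits travel between the two repositories, which is why the change made on the remote side appears locally only after the pull.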
6.11 Dealing with file conflicts between the local and remote
Normally Git is very good at synchronizing and merging changes between a local repository and its remote repository. However, sometimes, when it encounters a problem it doesn’t know how to fix, it stops merging the two histories and lets you manually fix the problem. This is called a “merge conflict” and it is when one or more files have changes that conflict with one another.
An example would be when you make a change to some code on your work computer, then on another day are working on the code on your home computer and make a slightly different edit to the same code. Normally, if you keep your repository synchronized, this wouldn’t be a problem. But sometimes you forget to synchronize, so when you do, Git may detect a conflict on the same lines in a file between the histories of the local and remote repositories. At this point, Git stops and gives you control to resolve it.
Let’s force a conflict to happen. In your LearningR RStudio R Project, open up the Git interface and make sure that you don’t see R/learning.R anywhere in the list and that you’ve pushed and pulled already with your GitHub repository. Then, open up R/learning.R and add the text # Here's an example of a conflict to the very first line. In the Git interface, add and commit this change but don’t push the changes!
Then, go to your GitHub repository and open up the R/learning.R file. Click the “Edit” button, as you learned previously in the session. In the first line of the file, add the text # When a conflict happens. Write a simple commit message and commit the change.
Now, go back to your RStudio project, open the Git interface and click “Pull”. You can try to push first, but when there are differences between your local and remote repository, Git will prevent you from pushing to GitHub until you first pull. Once you pull, Git will detect if a file conflict exists and halt its “merging” process.
What you should see is something like the below text (it may be slightly different):
<<<<<<< HEAD
# Here's an example of a conflict
=======
# When a conflict happens
>>>>>>> ad3fsd45bsdd23lda2304
The text at the top, between <<<<<<< HEAD and =======, is the change found in your local repository. The text at the bottom, between ======= and >>>>>>> ad3fsd45bsdd23lda2304, is the change found in your remote (GitHub) repository. HEAD is Git’s term for where your files currently are; think of it as the “top” of the history. The long string of numbers and letters (like ad3fsd45bsdd23lda2304) represents the ID of the commit (the “commit hash”) and, in this case, comes from the main (sometimes called master) branch of the origin remote. The concept and use of branches is a powerful feature of Git, but due to time constraints we won’t be covering them. You only need to know that every Git repository starts with the default main (or master) branch.
At this point, you decide what to keep and what to remove by deleting text within the R/learning.R script in RStudio. You’ll also need to delete the lines with <<<<<<< HEAD, =======, and >>>>>>> ad3fsd45bsdd23lda2304 (or something that looks similar).
After deciding what to keep and removing all the leftover merge conflict tags, open up the Git interface in RStudio. The files listed in the staging area will show the conflicted file with a yellow/orange “U”. To resolve the merge conflict, stage the file with the merge conflict. This will change the colour to blue. Next, commit the changes in the Git interface without writing a commit message (a message is not necessary when resolving merge conflicts). Push the changes to GitHub, then open up the repository on GitHub, refresh the browser, and check that the changes have taken place.
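The whole conflict cycle (conflicting edits, a halted pull, a manual fix, then commit and push) can be reproduced in a throwaway sandbox. As a sketch under stated assumptions: a local bare repository stands in for GitHub, a second clone stands in for the edit made on the GitHub website, R/learning.R starts out empty, and the identity and folder names are placeholders. The resolution shown keeps both lines, which is only one of the choices you could make.

```shell
# Placeholder identity for this sketch only (an assumption, not a course setting).
export GIT_AUTHOR_NAME="Example Learner" GIT_AUTHOR_EMAIL="learner@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for GitHub, plus the local project with an empty script pushed to it.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/remote.git"
git -C "$sandbox/remote.git" symbolic-ref HEAD refs/heads/main
git init -q "$sandbox/LearningR"
cd "$sandbox/LearningR"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$sandbox/remote.git"
mkdir R
touch R/learning.R
git add R/learning.R
git commit -q -m "Add learning script"
git push -q -u origin main

# Edit made "on GitHub": a different first line, committed and pushed.
git clone -q "$sandbox/remote.git" "$sandbox/website-edit"
cd "$sandbox/website-edit"
echo "# When a conflict happens" > R/learning.R
git commit -q -a -m "Edit first line on GitHub"
git push -q origin main

# Local edit to the same line, committed but NOT pushed.
cd "$sandbox/LearningR"
echo "# Here's an example of a conflict" > R/learning.R
git commit -q -a -m "Edit first line locally"

# Pulling now halts with a merge conflict (hence the non-zero exit status).
git pull --no-rebase origin main || true
git status --short          # prints "UU R/learning.R": unmerged on both sides
cat R/learning.R            # shows the <<<<<<<, =======, >>>>>>> tags

# Resolve by hand: here both lines are kept and the tags are removed.
printf '%s\n' "# Here's an example of a conflict" "# When a conflict happens" > R/learning.R
git add R/learning.R        # staging marks the conflict as resolved
git commit -q --no-edit     # reuses the merge message Git prepared
git push -q origin main
```

The --no-edit flag mirrors committing in RStudio without typing a message: Git has already prepared a merge message and simply reuses it.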
6.12 Exercise: Dealing with merge conflicts
Time: ~15 minutes.
Throughout this course you hopefully won’t encounter many merge conflicts, including during the group project. Because of this, we want you to get some more practice with dealing with them.
- In your GitHub LearningR repository, edit the README.md file by replacing one word with a random word (e.g. “blahblah”). Commit the change.
- Go to your RStudio LearningR.Rproj project and, without pulling from GitHub, replace the same word you replaced on the GitHub version of the README.md file, but use a different random word (e.g. “ticktock” instead of “blahblah” from above). Add the change to staging and commit it with RStudio’s Git interface (Ctrl-Shift-P, then type “commit”).
- While in RStudio’s Git interface, pull the changes from GitHub. There should be a warning about merge conflicts. Now you can practice dealing with and fixing merge conflicts. Add the changes to the staging after you have fixed them and click the commit button (you don’t need to type out a commit message).
- Push the changes up to GitHub and view them there to make sure they have been synchronized (you may need to refresh the browser).
6.13 Exercise: How can you use Git to collaborate better?
Time: ~10 minutes.
Before actually learning about how you might collaborate with others (and future you) by using Git and GitHub, let’s brainstorm and discuss how you might do it. Based on what you have learned so far, take some time to think about how you might use Git and GitHub to collaborate more easily between collaborators and future you.
- Take 1 minute to think to yourself.
- Take 6 minutes to brainstorm and discuss with your neighbour.
- In the final few minutes, we will share what you discussed with the group.
6.14 Collaborating using Git and GitHub
6.15 Summary
- Use the version control system Git to track changes to your files, to more easily manage your project, and to more easily collaborate with others.
- Git tracks files in three states: “Working directory”, “Staged”, and “History”.
- The Git repository contains the history.
- The main actions to move between states are:
- “Add to staging”
- “Commit to history”
- When committing to history, keep messages short and meaningful. Focus more on why the change was made, not what.
- “Remotes” are external storage locations for your Git repository. GitHub is a popular remote repository hosting service.
- Downloading a Git repository from GitHub is called “cloning”.
- “Pushing” and “pulling” are the actions that upload to and download from the remote repository (which is usually called “origin”).
- When there are differences in changes to the same lines in a file, a merge conflict occurs that you must deal with manually.
- Decide on which text to keep between the tags <<<<<<< HEAD, =======, and >>>>>>> origin/main (the last name may sometimes look different), and remove the tags.
- Almost all Git actions can be done using RStudio’s Git interface.