10 Collaborating with GitHub
10.1 Learning objectives
This session’s overall learning outcome is to:
- Explain how to collaborate with others (or your future self) through Git and GitHub, while also using it as a form of a backup.
Specific objectives are to:
- Describe the difference between a “remote” and a “local” repository.
- Use GitHub to store your Git repository by connecting your R project to GitHub.
- Use “pushing” and “pulling” to synchronize any changes you make between the local and remote repositories.
- Use GitHub to collaborate with others on a project.
10.2 “Remotes”: Storing your repository online
Briefly go over this next section, especially highlighting the image.
A version control system that didn’t include a type of external backup wouldn’t be a very good system, because if something happened to your computer, you’d lose your Git repository. In Git, this “external” backup is called a “remote”, meaning it is something that is separate and in a different location, usually online, than the main repository. The remote repository is essentially a duplicate copy of the history, found in the .git/
folder, of your local repository on your computer. So when you synchronize with the remote, as illustrated in Figure 10.1, it only copies over the changes made as commits in the history.
One of the biggest reasons why we teach Git is because of the popularity of several Git repository hosting sites. The most popular one is GitHub (which this course is hosted on). In this session, we’ll be covering GitHub not only because it is very popular, but also because the R community is almost entirely on GitHub.
Let’s get familiar with the GitHub interface.
Go over the interface of GitHub, especially where repositories are listed, the sidebar of the landing page (of your account), and where your account settings are.
Also, reinforce the warning below.
When using GitHub, especially in relation to health research, you need to be mindful of what you save into the Git history and what you put up online. Some things to think about are:
- Do not save any personal or sensitive data or files in your Git repository.
- Generally don’t save very large files, like big image files or large (non-personal) datasets.
In both cases, it’s better to use another tool to store files like that, rather than through Git and GitHub.
10.3 📖 Reading task: Using GitHub as a remote
Time: ~3 minutes
Making and cloning a GitHub repository is the first step to linking a local repository to a remote one. We are creating a GitHub repository from an existing local one, but you can also create one on GitHub first. More details about manually creating repositories on GitHub is found in Appendix C.
After connecting your local Git repository, to keep your GitHub repository synchronized, you need to “push” (upload) and “pull” (download) any changes you make to the repository on your computer, as shown in Figure 10.2. It isn’t done automatically because Git is designed with having control in mind, so you must do this synchronization manually. “Pushing” is when changes to the history are uploaded to GitHub while “pulling” is when the history is downloaded from GitHub.
So, when we put the concepts back into the framework of the “states”, first introduced in Section 7.4, pushing and pulling happen only to the history. Things that you’ve changed and then saved to the history, either on the remote or the local repository, are synchronized from or to GitHub. So, as shown in Figure 10.3, pushing copies of the history over to GitHub and pulling copies of the history from GitHub. Since changes saved in the history also reflect the working folder (the files and folders you actually see and interact with), “pulling” also updates the files and folders.
Interacting with GitHub through R requires us to use something called a “personal access token”, which you will learn about and create in the next exercise.
10.4 🧑💻 Exercise: Linking your project to GitHub
Time: ~20 minutes.
Since we use R, there is a really useful set of functions from {usethis}
to make it easy interact with and setup connections to GitHub from RStudio. Complete all of the Connect to GitHub guide for this exercise. In the end, you should have your LearningR
project on GitHub.
On your own, run the commands in the guides. After they complete the tasks, make sure that they have the “Push” and “Pull” buttons as well as having the LearningR
on GitHub.
10.5 Synchronizing with GitHub
After we’ve created the token and put our project onto GitHub, we can now push and pull any changes you make to the files. Let’s practice how it works.
Open up the README.md
and write a random sentence somewhere near the top of the file. Then save the file. Open the Git interface with Ctrl-Alt-MCtrl-Alt-M or with the Palette (Ctrl-Shift-PCtrl-Shift-P, then type “commit”), “stage” the file, and write a commit message. Click the “Commit” button.
Now time to test out pushing the change to GitHub. Click the “Push” button in the top right corner of the Git interface. A pop-up will indicate that it’s pushing and will show some text after it’s pushed. Go to our GitHub repository to see that it worked!
Now let’s try the opposite way, by committing and pulling changes from GitHub to your local repository.
While in your LearningR
GitHub repository, click the README.md
file on the GitHub website, and then click the “Edit” button (see the video below, which shows it for a random repository called learning-github
).
Write Add another random sentence somewhere near the top of the file. Scroll down to the commit message box, and type out a commit message. Then, click the “Commit” button. We’ve made a change on the repository in GitHub. Let’s synchronize it to our local repository.
Go back to RStudio, open the Git interface and now click the “Pull” button in the top right corner beside the “Push” button. Wait for it to finish pulling and check your README.md
file for the new change. You’ve now updated your project!
10.6 📖 Reading task: Collaborating using Git and GitHub
Time: ~10 minutes
While all of the previous Git tools we covered are extremely useful when working alone, we’ve been building up to using Git for it’s main and biggest advantage: to easily collaborate with others on a project.
Academia is unfortunately far behind when it comes to using modern tools to effectively collaborate together. Most researchers still use emails to send files back and forth, and while this is a very simple and easy way to start doing with no learning or training required, it is not very effective.
Usually this style of collaborating revolves around one or two people doing the most of the actual writing or direct contributing, while others give feedback or indirectly contribute through discussions or meetings. You might be familiar with using “Track changes” in Word when doing this style of collaborating. If your collaborators are a bit more technical, you all might be using Google Docs to do real-time collaboration together.
When using Git and GitHub, as well as Quarto documents, this style of collaborating isn’t possible. For one, there is no “track changes” feature. Instead, you’d need to learn another way of collaborating together, a way that is much more effective and extremely powerful. This workflow of using Git and GitHub has been tried and tested by tens of thousands of teams in tens of hundreds of companies globally. One of the goals of this course is to slowly move researchers more into the modern era, using more modern technology, tools, and workflows so that we can produce better research faster.
How does the workflow look like when using Git and GitHub? It works on the concept of remotes that we introduced earlier. Since a local repository is a copy of a remote repository, anyone else can collaborate on your project by copying the remote repository. When they want to contribute back, they make commits to their local copy and push those changes up to the remote. Then you can pull to your local copy and do the same thing. This is illustrated in Figure 10.4.
A disadvantage to this workflow is, in order to use it to effectively, it takes time to learn and get used to. For instance, while it is entirely possible to make changes to the same file as other collaborators if you use Git the way it was designed for, you may not be using it as intended since you are still learning. So when one of you pushes the changes to the file and the other pulls those changes, you end up with a merge conflict that you have to learn how to resolve. A work-around is to create separate files for each collaborator and merge them together when you are finished with the project. This is what you will do for the group project at the end of this course.
For public GitHub repositories, anyone can copy your repository and contribute back (only if you want their contribution), so working with collaborators is easy. When you have a private repository, you need to explicitly add collaborators in GitHub.
You add someone to a private (or public) repository by going to Settings -> Manage Access -> Invite a collaborator
(also shown in the video below). We won’t do this for the course, but we’re telling you how you can just in case you want to after the course.
As we mentioned, when working with others (or even yourself) through GitHub, you will eventually encounter “merge conflicts”. This happens when a change has been made to the same line in the same file, but in different commits, either by you or someone else. This usually happens if you make a change on GitHub as the remote while also making a change on your local repository without pulling first.
When this happens, Git will not know which change to keep and will ask you to resolve the conflict. You resolve the conflict by opening the file in RStudio, finding the conflict, and deciding which change to keep and which to remove. After you’ve resolved the conflict, you would then stage the file and commit it.
We won’t go into merge conflicts in this course, though you might deal with them during the group project. If you want to learn more about them after the course, GitHub has a practical tutorial on it. We also have an extra appendix section to deal with merge conflicts in Appendix D.
A big challenge you’ll encounter with becoming better with this way of collaborating is that most of your collaborators will likely not be familiar with it, at least until they take this course. 😉
Sadly, even experienced researchers struggle with this workflow (mainly due to their unfamiliarity with Git) and there is no easy answer on how to handle this. The best way (in our opinion) is to start training any colleague who is interested in collaborating this way and slowly surround yourself with collaborators who also work this way.
Want to see how others use Git and GitHub in their research as examples? Check out the Examples section of the Guides website.
10.7 Summary
- “Remotes” are external storage locations for your Git repository. GitHub is a popular remote repository hosting service.
- Downloading a Git repository from GitHub is called “cloning”.
- “Pushing” and “pulling” are actions to upload and download to the remote repository (which usually is called “origin”), so that you can synchronize your changes.
- Collaborating together using Git and GitHub is a powerful way of working, but does take some learning. Collaborating involves each collaborator creating a copy of the remote repository that you each push and pull to as you work together.