If you find any typos, errors, or places where the text may be improved, please let us know by providing feedback either in the feedback survey (given during class) or by using GitHub.
On GitHub open an issue or submit a pull request by clicking the " Edit this page" link at the side of this page.
Appendix B — Group project assignment
To maximize how much you learn and how much you will retain, you as a group will take what you learn in the course and apply it to create a reproducible project. This project, based on a simple data analysis of a dataset of your choice, will have a GitHub repository and an HTML document as a report to demonstrate the reproducibility of the analysis. The dataset cannot be the same as the one we used in class and must be one of the ones listed below. The datasets we’ve selected are tidy enough to start working on without too much wrangling.
During the last session of the course you will work on this assignment. In the last ~20 minutes of this session, the lead instructor will download your project from GitHub and re-generate your report to check that it is reproducible.
B.1 Specific tasks
You will be collaborating as a team using Git and GitHub to manage your group assignment. We will set up the project with Git and GitHub for you so you can quickly start collaborating together on the project. You will be pushing and pulling a lot of content, so you will need to maintain regular and open communication with your team members.
Your specific tasks are:
All team members need to clone your team’s repository to their own computer. The steps are found through
File -> New Project -> Version Control -> Git
(partly learned in Section 6.6).Select one of the open datasets to use for your analysis and report:
- The Global Carbon Project’s fossil CO2 emissions dataset. Click this link to view and save the data directly (by right-clicking while in the page and selecting “Save As”).
- Data from a published study: Early-life disease exposure and associations with adult survival, cause of death, and reproductive success in preindustrial humans. Click this link to view and save the data directly (by right-clicking while in the page and selecting “Save As”).
- Data from a published study: Clinical and metabolic characteristics among Mexican children with different types of diabetes mellitus. Click this link to view and save the data directly (by right-clicking while in the page and selecting “Save As”).
Assign one team member to download the dataset and put it into the
data-raw/
folder of your project.- If there is any documentation on the dataset, e.g. variable definitions, put those files in the
data-raw/
folder as well. - This team member should then add and commit the new file(s) in the project, so it gets saved to the Git history. Then push the changes to the team’s GitHub repository.
- Next, have all team members pull the new changes to their own computers.
- If there is any documentation on the dataset, e.g. variable definitions, put those files in the
Each team member creates a Quarto file named
report-YOURNAME.qmd
(replacingYOURNAME
with your actual name) in thedoc/
folder of your project (covered in Chapter 8). This means that each team member will have their own document to work in without conflicting with the other team members. Note, if there are several members with the same name, make sure yourqmd
documents are unique (for instance, by using your full name).For each team member’s
.qmd
file, include asetup
code chunk at the top that lists the packages used. Include code to import your data here.Create a few figures of the data. Discuss with your team what figures to create and divide which figures to make between your team members.
Create at least one table. Discuss with your team what table to make, if you want to create more than one, and assign who will create it.
When making both the figures and the tables, make sure to incorporate some data wrangling code from
{dplyr}
.Use the “Git workflow” by adding to the staged area, commit, push, and pull the changes you’ve made.
Make sure to “render” your document to HTML often to ensure it is always reproducible.
Note: For now, do not add and commit the HTML output file.
Assign one team member as the “report coordinator” and have them merge all the individual
report-YOURNAME.qmd
into onereport.qmd
file on their computer (after having everyone push, and the coordinator will pull). They do this by copying and pasting the content of the reports into section headers (using##
) in the report. After they are done, add and commit thereport.qmd
. Lastly, delete all the report files except thereport.qmd
and add and commit these deleted files to the Git history. There should only be thereport.qmd
file in the repository.Create section headers (e.g.
# Results
,## Tables
,## Figures
, and# Discussion
).As the report coordinator merges the report file, assign another team member to update the
README.md
file with some details about the project and about who was on the team. Save, add, commit, and push these changes to the GitHub repository.Once there is one report, as a group discuss and write a few sentences in the “Discussion” section of their report file on some things you liked about doing the project with the tools you learned and a few sentences on some challenges you experienced.
Generate an HTML of the report and commit it to Git, then push it up to GitHub. Include all the updated code and files on GitHub for the presentation.
These tasks may seem like a lot, with a lot of new terminology and tools to use. But don’t worry! We will be going over many of these topics and you will have time to complete the project over the three days. Make use of the website to remind you and use our cheatsheet as a reference.
At the end, the lead instructor will download each of the teams’ Git projects, render the documents to test their reproducibility, and show them on the screen for each team to present on.
B.2 Quick “checklist” for a good project
- Project used Git and is on GitHub.
- Included a README with some basic details.
- Used Quarto for making figures and tables.
- Added some basic challenges and general experiences in the discussion section.
- Generated a final HTML file from a single
report.qmd
Quarto file.
B.3 Expectations for the project
What we expect you to do for the group project:
- Use Git and GitHub throughout your work.
- Work collaboratively as a group and share responsibilities and tasks.
- Use as much of what we covered in the course to practice what you learned.
What we don’t expect:
- Complicated analysis or coding. The simpler it is, the easier is to for you to do the coding and understand what is going on. It also helps us to see that you’ve practiced what you’ve learned.
- Clever or overly concise code. Clearly written and readable code is always better than clever or concise code. Keep it simple and understandable!
Essentially, the group project is a way to reinforce what you learned during the course, but in a more relaxed and collaborative setting.