Appendix B — Group project assignment

To maximize how much you learn and how much you will retain, you as a group will take what you learn in the course and apply it to create a reproducible project. This project, based on a simple data analysis of a dataset of your choice, will have a GitHub repository and an HTML document as a report to demonstrate the reproducibility of the analysis. The dataset cannot be the same as the one we used in class, and it must be an open dataset obtained from an online archive. We list several that are tidy enough to start working from.

During the last session of the course you will work on this assignment. In the last ~20 minutes of this session, the lead instructor will download your project from GitHub and re-generate your report to check that it is reproducible.

B.1 Specific tasks

Throughout the course, there are a few activities and exercises that you will work on together as a group. However, it is during the last session that you will work together to complete this assignment by applying the skills taught during the course.

It is important to note that you will be using Git and GitHub to manage your group assignment. We will set up the project with Git and GitHub for you so you can get right to collaborating on the project. You will be pushing and pulling a lot of content, so you will need to maintain regular and open communication with your team members.

Your specific tasks are:

  1. All team members need to clone your team’s repository to their own computer. The steps are found through File -> New Project -> Version Control -> Git (partly learned in Section 6.7).

  2. Select one of the open datasets to use for your analysis and report:

  3. Assign one team member to download the dataset and put it into the data-raw/ folder of your project.

    • For the GitHub-linked datasets above, you should be able to click on the “Raw” or “Download” buttons, then use the keyboard shortcut (Ctrl-S) to save the page (as a .csv file).
    • If there is any documentation on the dataset, e.g. variable definitions, put those files in the data-raw/ folder as well.
    • This team member should then add and commit the new file(s) in the project, so it gets saved to the Git history. Then push the changes to the team’s GitHub repository.
    • Next, have all team members pull the new changes to their own computers.
  4. Use the created data-raw/original-data.R script to clean up and prepare the raw dataset (as covered in Chapter 7). A (strongly) suggested way of working here is to assign one team member as the “coder” and, as a group, discuss and decide how to clean up the data by telling the “coder” what to do. Once you have cleaned it up enough that you as a group are satisfied, save the new dataset in a folder called data/ with readr::write_csv(). The coder should then add and commit these changes to the data before pushing to the team GitHub repository so the other members can access the data too. Reminder: You load in data with read_csv() like we’ve done in the setup code chunk in the Quarto file.

  5. Create a Quarto file named report-YOURNAME.qmd (replacing YOURNAME with your actual name) in the doc/ folder of your project (covered in Chapter 8). This means that each team member will have their own document to work in without conflicting with the other team members. Note: if several members have the same name, make sure your file names are unique (for instance, by using your full name). Distribute the tasks (listed below) so each team member is (mostly) doing something different, like doing some simple analyses of the dataset, making figures (as covered in Chapter 9), and so on. Keep using the “Git workflow”: add your changes to the staging area, commit, push, and pull. You may end up with some merge conflicts. If you do, ask the instructors or helpers for help resolving them.

    • Create section headers (e.g. # Results, ## Tables, ## Figures, and # Discussion).
    • Create at least one table in the report that is generated from the data.
    • Create several different figures of your data.
    • Try to “render” your document to HTML often, to make sure the analysis is reproducible.
    • Note: For now, do not add and commit the HTML output file.
  6. Once you as a group are happy with the results, each team member should write in the “Discussion” section of their report file a few sentences on some things you liked about doing the project with the tools you learned and a few sentences on some challenges you experienced. Add any thoughts that you as a group came up with as well.

  7. Assign one team member as the “report coordinator” and have them merge all the individual report-YOURNAME.qmd files into one report.qmd file on their computer (after everyone has pushed their changes and the coordinator has pulled them). They do this by copying and pasting the content of each report into the appropriate sections of report.qmd. After they are done, they add and commit report.qmd. Lastly, they delete all the report files except report.qmd, then add and commit these deletions to the Git history. The only report file left in the repository should be report.qmd.

    • As the report coordinator merges the report file, assign another team member to update the README.md file with some details about the project and about who was on the team. Save, add, commit, and push these changes to the GitHub repository.
  8. Generate the HTML version of the report and commit it to Git, then push it up to GitHub. Make sure all the updated code and files are on GitHub for the presentation.
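
To make task 4 more concrete, here is a minimal sketch of what the cleaning code in data-raw/original-data.R could look like. It is only an illustration, not the required approach: the helper name clean_raw_data(), the file names, and the two cleaning steps (lowercasing column names and dropping duplicate rows) are all assumptions. Your team should decide which cleaning steps your own dataset actually needs.

```r
library(readr)
library(dplyr)

# Hypothetical helper (not part of the course material): read a raw CSV,
# apply some light cleaning, and save the result as a new CSV file.
clean_raw_data <- function(raw_path, clean_path) {
  raw_data <- read_csv(raw_path, show_col_types = FALSE)

  cleaned_data <- raw_data |>
    # Example cleaning step: make all column names lowercase.
    rename_with(tolower) |>
    # Example cleaning step: drop rows that are exact duplicates.
    distinct()

  # Create the output folder (e.g. data/) if it does not exist yet.
  dir.create(dirname(clean_path), showWarnings = FALSE, recursive = TRUE)
  write_csv(cleaned_data, clean_path)

  # Return the cleaned data invisibly so it can be inspected if needed.
  invisible(cleaned_data)
}

# Possible use inside data-raw/original-data.R (file names are placeholders):
# clean_raw_data("data-raw/dataset.csv", "data/dataset.csv")
```

Keeping the cleaning steps in a script like this, rather than editing the raw file by hand, means anyone on the team can re-create the cleaned dataset from the raw one.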

These tasks may seem like a lot, with a lot of new terminology and tools to use. But don’t worry! We will be going over many of these topics and you will have time to complete the project over the three days. Make use of the website to remind you and use our cheatsheet as a reference.

At the end, the lead instructor will download each team’s Git project, render the documents to test their reproducibility, and show them on the screen so each team can present their work.
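
If you want to try a similar reproducibility check yourselves before the presentation, one option is to render the report from the R Console. The sketch below assumes you have the quarto R package and the Quarto command line tool installed. The helper name render_report() and the file name in the usage comment are placeholders, not something the course provides.

```r
# Hypothetical helper (not part of the course material): render a report to
# HTML, stopping early with a clear message if the file cannot be found.
render_report <- function(report_path) {
  if (!file.exists(report_path)) {
    stop("Report file not found: ", report_path)
  }
  # Assumes the quarto R package and the Quarto command line tool are installed.
  quarto::quarto_render(input = report_path, output_format = "html")
  invisible(report_path)
}

# Possible use from the project folder (the file name is a placeholder):
# render_report("doc/report.qmd")
```

Rendering on a freshly cloned copy of the repository is a stricter check than rendering on the computer where the report was written, since it only uses what is actually on GitHub.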

B.2 Quick “checklist” for a good project

  • Project used Git and is on GitHub.
  • Included a good README describing the project and the team.
  • Separated “raw data” from “cleaned data”.
  • Used scripts to clean the data.
  • Included R code within a Quarto file to show results and make figures.
  • Went over challenges and general experiences.
  • Generated an HTML file from a Quarto file.
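
Some items on this checklist can be checked roughly with a few lines of base R. The following is only a sketch: the helper name check_project() is hypothetical, and it only looks at whether the expected files and folders exist, not whether their content is any good.

```r
# Hypothetical helper (not part of the course material): check a project
# folder for some of the files and folders mentioned in the checklist.
check_project <- function(project_dir = ".") {
  # Count the files in a sub-folder, optionally filtered by a name pattern.
  # list.files() returns an empty vector if the folder does not exist.
  count_files <- function(folder, pattern = NULL) {
    length(list.files(file.path(project_dir, folder), pattern = pattern))
  }

  c(
    "Uses Git (.git/ folder exists)" = dir.exists(file.path(project_dir, ".git")),
    "Has a README.md" = file.exists(file.path(project_dir, "README.md")),
    "Has raw data in data-raw/" = count_files("data-raw") > 0,
    "Has cleaned data in data/" = count_files("data") > 0,
    "Has an HTML report in doc/" = count_files("doc", pattern = "\\.html$") > 0
  )
}

# Possible use from the project folder:
# check_project()
```

The function returns a named logical vector, so any item showing FALSE points to something that may still be missing from the project.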

B.3 Expectations for the project

What we expect you to do for the group project:

  • Use Git and GitHub throughout your work.
  • Work collaboratively as a group and share responsibilities and tasks.
  • Use as much of what we covered in the course to practice what you learned.

What we don’t expect:

  • Complicated analysis or coding. The simpler it is, the easier it is for you to do the coding and understand what is going on. It also helps us to see that you’ve practiced what you’ve learned.
  • Clever or overly concise code. Clearly written and readable code is always better than clever or concise code. Keep it simple and understandable!

Essentially, the group project is a way to reinforce what you learned during the course, but in a more relaxed and collaborative setting.