Welcome to the Introductory R3 course!

  • ✔️ Pick a group name from the basket and go to that table
  • ✔️ Introduce yourself to your group members
  • ✔️ Accept the GitHub Organization invite
  • ✔️ If you want, join the Discord channel for asking questions after the course

🙋‍♀️ Before this course… how many of you knew about or had heard of reproducibility?

🙋‍♀️ Before this course… how many of you knew about or had heard of open science?

🙋‍♂️ Before this course… or even about open access, open data, open methods/protocols, or open source?

🙋‍♀️ How many have read a method in a paper and wondered how they actually did it?

🙋‍♂️ Have you ever received confusing code? Or maybe have written your own confusing code?

These highlight a problem in science…

The scientific principle of “reproducibility” and code sharing

… often confused with “replicability” (Plesser 2018)

Replicability

  • Same analysis + same methods + new data = same result?
  • Independently conducted study
  • Difficult, usually needs funding
  • Linked to the “irreproducibility crisis”

Reproducibility

  • Same data + same code = same result?
  • Should be easy, right? Wrong, it’s often just as hard
  • Question: If we can’t even reproduce a study’s results, how can we expect to replicate it?

Biomedical studies almost never publish their code alongside the published paper

See: (Leek and Jager 2017; Considine et al. 2017; Seibold et al. 2021)

How can we check reproducibility if no code is given?

This is a little bit of a rhetorical question 😝

Recent study: Only 25% could be executed without some “cleaning up” (Trisovic et al. 2022)

  • Code taken from Dataverse Project data repositories
  • After some automatic cleaning, ~half could execute

Scientific culture is not well prepared for the analytic and computational era

These issues can be fixed by creating and nurturing a culture of openness

Goal of this course: Start changing the culture by providing the training

Course details

Setup and layout

  • The course is a mix of:
    • “Code-alongs” (we type and explain, you type along)
    • Hands-on coding, discussing, and reading exercises
    • Group project (quickly go over it and the GitHub page)
  • All material is online (and openly licensed)
  • Resources Appendix
    • Material for further learning
  • Reading tasks are “callout” blocks marked by the blue line on the left side of the text
  • The schedule is a guide only; some sessions run longer, others shorter
  • Less about coding, more about connecting with others
    • During lunch, try to sit beside someone you don’t know
    • Several networking activities (usually after lunch)

Getting or asking for help 🙋‍♀️🙋‍♂️

  • Put the sticky on your laptop to get help
  • There are lots of helpers
  • Team members, try to help out too
  • We’re all learning here!
  • We have a cheatsheet!

Practice using stickies: Have you joined the GitHub Organization and the Discord channel?

Activities

🚶🚶‍♀️ Who has not yet used R?

🚶‍♀️🚶 Those who’ve used R, how do you perceive your skill in R?

🙋 Those who’ve used R or another coding tool (like Stata), have you had formal training in “coding” with it?

🚶🚶‍♂️🚶‍♀️ How do you perceive your general skill in data analysis?

OK, get back into your groups and read through the introduction section

References

Considine, E. C., G. Thomas, A. L. Boulesteix, A. S. Khashan, and L. C. Kenny. 2017. “Critical Review of Reporting of the Data Analysis Step in Metabolomics.” Metabolomics 14 (1). https://doi.org/10.1007/s11306-017-1299-3.
Leek, Jeffrey T., and Leah R. Jager. 2017. “Is Most Published Research Really False?” Annual Review of Statistics and Its Application 4 (1): 109–22. https://doi.org/10.1146/annurev-statistics-060116-054104.
Plesser, Hans E. 2018. “Reproducibility Vs. Replicability: A Brief History of a Confused Terminology.” Frontiers in Neuroinformatics 11 (January). https://doi.org/10.3389/fninf.2017.00076.
Seibold, Heidi, Severin Czerny, Siona Decke, Roman Dieterle, Thomas Eder, Steffen Fohr, Nico Hahn, et al. 2021. “A Computational Reproducibility Study of PLOS ONE Articles Featuring Longitudinal Data Analyses.” Edited by Jelte M. Wicherts. PLOS ONE 16 (6): e0251194. https://doi.org/10.1371/journal.pone.0251194.
Trisovic, Ana, Matthew K. Lau, Thomas Pasquier, and Mercè Crosas. 2022. “A Large-Scale Study on Research Code Quality and Execution.” Scientific Data 9 (1). https://doi.org/10.1038/s41597-022-01143-6.