Welcome to the Introductory R3 workshop!

  • ✔️ Pick a table name from the basket and go to that table
  • ✔️ Introduce yourself to your table mates
  • ✔️ Accept the GitHub Organization invite

🙋‍♀️ Before this workshop… how many of you knew about or had heard of reproducibility?

🙋‍♀️ Before this workshop… how many of you knew about or had heard of open science?

🙋‍♂️ Before this workshop… or even of open access, open data, open methods/protocols, or open source?

🙋‍♀️ How many of you have read a methods section in a paper and wondered how the authors actually did it?

These questions highlight a problem in science…

The scientific principle of “reproducibility” and code sharing

… often confused with “replicability” (Plesser 2018)

Replicability

  • Same analysis + same methods + new data = same result?
  • Independently conducted study
  • Difficult and usually needs dedicated funding
  • Linked to the “irreproducibility crisis”

Reproducibility

  • Same data + same code = same result? (sketched below)
  • Should be easy, right? Wrong: it’s often just as hard
  • Question: If we can’t even reproduce a study’s results, how can we expect to replicate it?
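
A minimal sketch of “same data + same code = same result” in R, using simulated data (the numbers are made up for illustration): fixing the random seed with set.seed() means the same code produces an identical result on every run.

    # Run 1: simulate some "data" and compute a result
    set.seed(2024)
    run_1 <- mean(rnorm(100, mean = 5, sd = 2))

    # Run 2: same seed, so the same simulated "data" again
    set.seed(2024)
    run_2 <- mean(rnorm(100, mean = 5, sd = 2))

    # Same data + same code = same result
    identical(run_1, run_2)
    #> [1] TRUE

In a real analysis the “data” is a file rather than a simulation, but the principle is the same: anyone with the data and the code should get exactly this kind of agreement.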

Biomedical studies almost never publish their code alongside the published paper

See: (Leek and Jager 2017; Considine et al. 2017; Seibold et al. 2021)

How can we check reproducibility if no code is given?

This is a little bit of a rhetorical question 😝

Very low reproducibility in most of science (Samuel and Mietchen 2024)

Even in institutional code and data archives, executability is low! (Trisovic et al. 2022)

  • Code taken from data repositories in the Harvard Dataverse
  • Only 25% could be executed without some “cleaning up”
  • After some automatic cleaning, ~50% could be executed (a common culprit is sketched below)
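
One frequent reason archived code fails to run on another computer is a file path that exists only on the original author’s machine. A minimal sketch of the problem and a more portable alternative, assuming the here package and a hypothetical data/dataset.csv file inside the project folder:

    # Fragile: this path exists only on the original author's computer
    # data <- read.csv("C:/Users/original-author/Documents/dataset.csv")

    # More portable: resolve the path relative to the project root,
    # so the code runs the same on any computer with the project
    library(here)
    data <- read.csv(here("data", "dataset.csv"))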

Scientific culture is not well prepared for the analytic and computational era

With “AI” tools, generating (or “vibing”) lots of code is easy… but…

… if you aren’t experienced in coding, how can you assess it?

More code = more reading, more maintenance, and more hidden bugs

Using “AI” tools means you need to be more skilled in coding, not less

These issues can be fixed by creating and nurturing a culture of openness

Goal of this workshop: Start changing the culture by providing the training

Workshop details

Setup and layout

Mix of activities

  • “Code-alongs” (we type and explain, you type along)
  • Hands-on coding, discussing, and reading exercises
  • Team project (quickly go over it and the GitHub page)
  • For some, the pace may feel too slow. For others, too fast.

Building networks of coders

  • Every morning, draw a table name from the bowl, sit at that table, and introduce yourself
  • After-lunch social discussions
  • Introverts: It’s ok to take breaks alone!

Getting or asking for help 🙋‍♀️🙋‍♂️

  • Put the origami hat 🎩 on your laptop to get help
  • There are lots of helpers
  • Table mates, try to help out too
  • We’re all learning here!
  • This is a supportive and safe environment
  • Remember our Code of Conduct

Practice using stickies: Have you joined the GitHub Organization?

Activities

🚶‍♂️🚶‍♀️ Who has used any other coding tool (like Stata)?

🚶‍♂️🚶‍♀️ Those who have used other coding tools: have you had formal training in “coding” with them?

🚶‍♂️🚶‍♀️ How do you perceive your general skill in data analysis?

🚶‍♂️🚶‍♀️ How nervous are you about learning R?

🚶‍♂️🚶‍♀️ Who has not yet used R?

🚶‍♀️🚶 Those who’ve used R, how do you perceive your skill in R?

🪑 Ok, get back into your chairs

References

Considine, E. C., G. Thomas, A. L. Boulesteix, A. S. Khashan, and L. C. Kenny. 2017. “Critical Review of Reporting of the Data Analysis Step in Metabolomics.” Metabolomics 14 (1). https://doi.org/10.1007/s11306-017-1299-3.
Leek, Jeffrey T., and Leah R. Jager. 2017. “Is Most Published Research Really False?” Annual Review of Statistics and Its Application 4 (1): 109–22. https://doi.org/10.1146/annurev-statistics-060116-054104.
Plesser, Hans E. 2018. “Reproducibility Vs. Replicability: A Brief History of a Confused Terminology.” Frontiers in Neuroinformatics 11 (January). https://doi.org/10.3389/fninf.2017.00076.
Samuel, Sheeba, and Daniel Mietchen. 2024. “Computational Reproducibility of Jupyter Notebooks from Biomedical Publications.” GigaScience 13. https://doi.org/10.1093/gigascience/giad113.
Seibold, Heidi, Severin Czerny, Siona Decke, et al. 2021. “A Computational Reproducibility Study of PLOS ONE Articles Featuring Longitudinal Data Analyses.” PLOS ONE 16 (6): e0251194. https://doi.org/10.1371/journal.pone.0251194.
Trisovic, Ana, Matthew K. Lau, Thomas Pasquier, and Mercè Crosas. 2022. “A Large-Scale Study on Research Code Quality and Execution.” Scientific Data 9 (1). https://doi.org/10.1038/s41597-022-01143-6.