Welcome to the Introductory R3 course!

  • ✔️ Pick a group name from the basket and go to that table
  • ✔️ Introduce yourself to your group members
  • ✔️ Accept the GitHub Organization invite

🙋‍♀️ Before this course… How many of you knew or had heard about reproducibility?

🙋‍♀️ Before this course… How many of you knew or had heard about open science?

🙋‍♂️ Before this course… or even open access, open data, open methods/protocols, or open source?

🙋‍♀️ How many of you have read a method in a paper and wondered how the authors actually did it?

🙋‍♂️ Have you ever received confusing code? Or maybe written your own confusing code?

These highlight a problem in science…

The scientific principle of “reproducibility” and code sharing

… often confused with “replicability” (1)

Replicability

  • Same analysis + same methods + new data = same result?
  • Independently conducted study
  • Difficult, usually needs funding
  • Linked to the “irreproducibility crisis” (2)

Reproducibility

  • Same data + same code = same result?
  • Should be easy, right? Wrong, it is often just as hard
  • Question: If we can’t even reproduce a study’s results, how can we expect to replicate it?

Biomedical studies almost never publish code alongside the paper

See: (2–4)

How can we check reproducibility if no code is given?

This is a little bit of a rhetorical question 😝

Very low reproducibility in most of science (5)

Even in institutional code and data archives, executability is low! (6)

  • Code taken from Harvard Dataverse Project data repositories
  • Only 25% could be executed without some “cleaning up”
  • After some automatic cleaning, ~50% could execute

Scientific culture is not well prepared for the analytic and computational era

These issues can be fixed by creating and nurturing a culture of openness

Goal of this course: Start changing the culture by providing the training

Course details

Setup and layout

  • Course is a mix of:
    • “Code-alongs” (we type and explain, you type along)
    • Hands-on coding, discussing, and reading exercises
    • Group project (quickly go over it and the GitHub page)
  • All material is online (and openly licensed)
  • Resources Appendix
    • Material for further learning
  • Coding is just as much social as it is solo
    • Every morning, draw a table name from the bowl and sit at that table
    • Introduce yourself to your table mates
    • During lunch, sit beside someone you don’t know
    • Networking activities after most lunches

Getting or asking for help 🙋‍♀️🙋‍♂️

  • Put the sticky/origami hats 🎩 on your laptop to get help
  • There are lots of helpers
  • Table mates, try to help out too
  • We’re all learning here!
  • This is a supportive and safe environment
  • Remember our Code of Conduct

Practice using stickies: Have you joined the GitHub Organization?

Activities

🚶‍♂️🚶‍♀️ Who has used any other coding tool (like Stata)?

🚶‍♂️🚶‍♀️ Those who have used other coding tools, have you had formal training in “coding” with them?

🚶‍♂️🚶‍♀️ How do you perceive your general skill in data analysis?

🚶‍♂️🚶‍♀️ How nervous are you about learning R?

🚶‍♂️🚶‍♀️ Who has not yet used R?

🚶‍♀️🚶 Those who’ve used R, how do you perceive your skill in R?

🪑 Ok, get back into your chairs

References

1. Plesser HE. Reproducibility vs. Replicability: A brief history of a confused terminology. Frontiers in Neuroinformatics. 2018 Jan;11.
2. Leek JT, Jager LR. Is most published research really false? Annual Review of Statistics and Its Application. 2017 Mar;4(1):109–22.
3. Considine EC, Thomas G, Boulesteix AL, Khashan AS, Kenny LC. Critical review of reporting of the data analysis step in metabolomics. Metabolomics. 2017 Dec;14(1).
4. Seibold H, Czerny S, Decke S, Dieterle R, Eder T, Fohr S, et al. A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses. Wicherts JM, editor. PLOS ONE. 2021 Jun;16(6):e0251194.
5.
6. Trisovic A, Lau MK, Pasquier T, Crosas M. A large-scale study on research code quality and execution. Scientific Data. 2022 Feb;9(1).