Welcome to the Introductory R3 course!

  • ✔️ Pick a group name from the basket and go to that table
  • ✔️ Introduce yourself to your group members
  • ✔️ Accept the GitHub Organization invite
  • ✔️ If you want, join the Discord channel for asking questions after the course

🙋‍♀️ Before this course… How many knew or had heard about reproducibility?

🙋‍♀️ Before this course… How many knew or had heard about open science?

🙋‍♂️ Before this course… or even open access, open data, open methods/protocols, or open source?

🙋‍♀️ How many have read a method in a paper and wondered how they actually did it?

🙋‍♂️ Have you ever received confusing code? Or maybe written your own confusing code?

These highlight a problem in science…

The scientific principle of “reproducibility” and code sharing

… often confused with “replicability” (1)

Replicability

  • Same analysis + same methods + new data = same result?
  • Independently conducted study
  • Difficult, usually needs funding
  • Linked to the “irreproducibility crisis” (2)

Reproducibility

  • Same data + same code = same result?
  • Should be easy, right? Wrong, it is often just as hard
  • Question: If we can’t even reproduce a study’s results, how can we expect to replicate it?

Biomedical studies almost never publish code alongside the published paper

See: (2–4)

How can we check reproducibility if no code is given?

This is a little bit of a rhetorical question 😝

Recent study: Only 25% of R scripts could be executed without some “cleaning up” (5)

  • Code taken from Dataverse Project data repositories
  • After some automatic cleaning, ~half could execute

Scientific culture is not well prepared for the analytic and computational era

These issues can be fixed by creating and nurturing a culture of openness

Goal of this course: Start changing the culture by providing the training

Course details

Setup and layout

  • Course is a mix of:
    • “Code-alongs” (we type and explain, you type along)
    • Hands-on coding, discussing, and reading exercises
    • Group project (quickly go over it and the GitHub page)
  • All material is online (and openly licensed)
  • Resources Appendix
    • Material for further learning
  • Reading tasks are “callout” blocks marked by the blue line on the left side of the text
  • The listed schedule is a guide only: some sessions run longer, others shorter
  • Less about coding, more about connecting with others
    • During lunch, try to sit beside someone you don’t know
    • Several networking activities (usually after lunch)

Getting or asking for help 🙋‍♀️🙋‍♂️

  • Put the sticky on your laptop to get help
  • There are lots of helpers
  • Team members, try to help out too
  • We’re all learning here!
  • We have a cheatsheet!

Practice using stickies: Have you joined the GitHub Organization and the Slack group?

Activities

🚶🚶‍♀️ Who has not yet used R?

🚶‍♀️🚶 Those who’ve used R, how do you perceive your skill in R?

🙋 Those who’ve used R or another coding tool (like Stata), have you had formal training in “coding” in it?

🚶🚶‍♂️🚶‍♀️ How do you perceive your general skill in data analysis?

OK, get back into your groups and read through the introduction section

References

1. Plesser HE. Reproducibility vs. Replicability: A brief history of a confused terminology. Frontiers in Neuroinformatics. 2018 Jan;11.
2. Leek JT, Jager LR. Is most published research really false? Annual Review of Statistics and Its Application. 2017 Mar;4(1):109–22.
3. Considine EC, Thomas G, Boulesteix AL, Khashan AS, Kenny LC. Critical review of reporting of the data analysis step in metabolomics. Metabolomics. 2017 Dec;14(1).
4. Seibold H, Czerny S, Decke S, Dieterle R, Eder T, Fohr S, et al. A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses. Wicherts JM, editor. PLOS ONE. 2021 Jun;16(6):e0251194.
5. Trisovic A, Lau MK, Pasquier T, Crosas M. A large-scale study on research code quality and execution. Scientific Data. 2022 Feb;9(1).