Welcome to the Introductory R3 course!

  • ✔️ Pick a group name from the basket and go to that table
  • ✔️ Introduce yourself to your group members
  • ✔️ Accept the GitHub Organization invite

Introduce instructors and helpers after welcoming everyone and getting them to do this.

🙋‍♀️ Before this course… How many knew about or had heard of reproducibility?

Raise your hands.

🙋‍♀️ Before this course… How many knew about or had heard of open science?

🙋‍♂️ Before this course… or even open access, open data, open methods/protocols, or open source?

🙋‍♀️ How many have read a method in a paper and wondered how they actually did it?

Because you are trying to do the same or similar?

And you’ve probably realized by now that way more is done than is shown in the “Methods”.

🙋‍♂️ Have you ever received confusing code? Or maybe have written your own confusing code?

For those that have worked with code.

I definitely have in my research career. We want to change the culture around code by encouraging and teaching how to share code and to write better code in general.

These highlight a problem in science…

The scientific principle of “reproducibility” and code sharing

… often confused with “replicability” (1)¹

How many could tell me the difference between replicable and reproducible?

Replicability

  • Same analysis + same methods + new data = same result?
  • Independently conducted study
  • Difficult, usually needs funding
  • Linked to the “irreproducibility crisis”²

Reproducibility

  • Same data + same code = same result?
  • Should be easy, right? Wrong, it is often just as hard
  • Question: If we can’t even reproduce a study’s results, how can we expect to replicate it?

  1. Also from the American Statistical Association.

  2. Or rather the “irreplicability crisis”.

Biomedical studies almost never publish code with the paper

See: (2–4)

The vast majority of papers still don’t provide code, except maybe in bioinformatics, where a bit more than half of studies do. There are lots of reasons for this, which I’ll talk more about tomorrow.

And this is no joke. Getting data on this is difficult, but the research that has been done shows that almost no one is sharing their code. Across fields in health science, the estimates range from zero to maybe five percent of published studies. The only area that is doing pretty well is bioinformatics, at about 60% of published studies.

How can we check reproducibility if no code is given?

This is a little bit of a rhetorical question 😝

Very low reproducibility in most of science (5)

Even in an institutional code and data archive, executability is low! (6)

  • Code taken from Harvard Dataverse Project data repositories
  • Only 25% could be executed without some “cleaning up”
  • After some automatic cleaning, ~50% could execute

This is a recent large study on the general reproducibility of projects that shared code.

Initially only 25% of the R scripts could be executed (which doesn’t mean the results were reproduced, though). After automatic and some manual code cleaning, about half could be executed. That’s not bad.

Since the scripts were taken from Dataverse.org, the researchers who upload their code and projects to it are probably a bit more aware of and knowledgeable about general reproducibility and coding than the average researcher, so the results are a bit biased.

Scientific culture is not well prepared for the analytic and computational era

These issues can be fixed by creating and nurturing a culture of openness

All of this is because of a problem with our culture in research. We aren’t open, we don’t really share, and we often don’t follow basic principles of science. To fix this, we need to start creating and nurturing a better and healthier culture. We can all be involved in that. We all have the power to do something, even if it’s a small thing.

Goal of this course: Start changing the culture by providing the training

Course details

Setup and layout

  • Course is a mix of:
    • “Code-alongs” (we type and explain, you type along)
    • Hands-on coding, discussing, and reading exercises
    • Group project (quickly go over it and the GitHub page)
  • All material is online (and openly licensed)
  • Resources Appendix
    • Material for further learning
  • Coding is just as much social as it is solo
    • Every morning, draw a table name from the bowl and sit at that table
    • Introduce yourself to your table mates
    • During lunch, sit beside someone you don’t know
    • Networking activities after most lunches

Explain a bit more about the reading, why we are doing it, and that this course in particular has a lot more of it than the more advanced courses.

For the final group project, you’ll be in the same group throughout the course, working together on it and on the final exercises. As a team, you’ll help each other out with learning and overcoming any struggles, with our help too, of course!

I’ve tried to organize the groups to include a range of skills and experiences, so there is a mix of novice and more experienced users.

Getting or asking for help 🙋‍♀️🙋‍♂️

  • Put the sticky/origami hats 🎩 on your laptop to get help
  • There are lots of helpers
  • Table mates, try to help out too
  • We’re all learning here!
  • This is a supportive and safe environment
  • Remember our Code of Conduct

Practice using stickies: Have you joined the GitHub Organization?

Activities

🚶‍♂️🚶‍♀️ Who has used any other coding tool (like Stata)?

🚶‍♂️🚶‍♀️ Those who have used other coding tools, have you had formal training in “coding” in them?

🚶‍♂️🚶‍♀️ How do you perceive your general skill in data analysis?

🚶‍♂️🚶‍♀️ How nervous are you about learning R?

🚶‍♂️🚶‍♀️ Who has not yet used R?

Go to different sides of the room for “Yes” and “No”

🚶‍♀️🚶 Those who’ve used R, how do you perceive your skill in R?

Arrange along the wall from beginner to advanced.

🪑 Ok, get back into your chairs

References

1. Plesser HE. Reproducibility vs. Replicability: A brief history of a confused terminology. Frontiers in Neuroinformatics. 2018 Jan;11.
2. Leek JT, Jager LR. Is most published research really false? Annual Review of Statistics and Its Application. 2017 Mar;4(1):109–22.
3. Considine EC, Thomas G, Boulesteix AL, Khashan AS, Kenny LC. Critical review of reporting of the data analysis step in metabolomics. Metabolomics. 2017 Dec;14(1).
4. Seibold H, Czerny S, Decke S, Dieterle R, Eder T, Fohr S, et al. A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses. Wicherts JM, editor. PLOS ONE. 2021 Jun;16(6):e0251194.
5. Samuel S, Mietchen D. Computational reproducibility of Jupyter notebooks from biomedical publications. GigaScience. 2024;13.
6. Trisovic A, Lau MK, Pasquier T, Crosas M. A large-scale study on research code quality and execution. Scientific Data. 2022 Feb;9(1).