Welcome to the Introductory R3 workshop!

  • ✔️ Pick a group name from the basket and go to that table
  • ✔️ Introduce yourself to your group members
  • ✔️ Accept the GitHub Organization invite

Introduce instructors and helpers after welcoming everyone and getting them started on these tasks.

🙋‍♀️ Before this workshop… How many of you knew about or had heard of reproducibility?

Raise your hands.

🙋‍♀️ Before this workshop… How many of you knew about or had heard of open science?

🙋‍♂️ Before this workshop… What about open access, open data, open methods/protocols, or open source?

🙋‍♀️ How many have read a method in a paper and wondered how they actually did it?

Perhaps because you were trying to do the same thing, or something similar?

And you’ve probably realized by now that far more is done than is shown in the “Methods” section.

🙋‍♂️ Have you ever received confusing code? Or maybe have written your own confusing code?

For those who have worked with code.

I definitely have, in my research career. We want to change the culture around code by encouraging code sharing and by teaching how to share code and write better code in general.

These highlight a problem in science…

The scientific principle of “reproducibility” and code sharing

… often confused with “replicability” [@Plesser2018a] 1

How many of you could tell me the difference between replicable and reproducible?

Replicability

  • Same analysis + same methods + new data = same result?
  • Independently conducted study
  • Difficult, usually needs funding
  • Linked to the “irreproducibility crisis”2

Reproducibility

  • Same data + same code = same result?
  • Should be easy, right? Wrong: it is often just as hard
  • Question: If we can’t even reproduce a study’s results, how can we expect to replicate it?

1. Also from the American Statistical Association.

2. Or rather “irreplicability crisis”.

Biomedical studies almost never publish code alongside the paper

See: [@Leek2017a; @Considine2017a; @Seibold2021]

The vast majority of papers still don’t provide code, except perhaps in bioinformatics, where a bit more than half of studies do. There are many reasons for this, which I’ll talk more about tomorrow.

And this is no joke. Getting data on this is difficult, but the research that has been done shows that almost no one shares their code. Across health science fields, estimates range from zero to perhaps five percent of published studies. The only area doing fairly well is bioinformatics, at about 60% of published studies.

How can we check reproducibility if no code is given?

This is a little bit of a rhetorical question 😝

Very low reproducibility in most of science [@Samuel2024]

Even in institutional code and data archives, executability is low! [@Trisovic2022]

  • Code taken from Harvard Dataverse Project data repositories
  • Only 25% could be executed without some “cleaning up”
  • After some automatic cleaning, ~50% could execute

A recent large study on the general reproducibility of projects that shared code.

Initially, only 25% of the R scripts could be executed (which doesn’t mean the results were reproduced, though). After automatic and some manual code cleaning, about half could be executed. That’s not bad.

Since the scripts were taken from Dataverse.org, the researchers who uploaded their code and projects there are probably a bit more aware of and knowledgeable about general reproducibility and coding than the average researcher, so the results are somewhat biased.

Scientific culture is not well-prepared for the analytic and computational era

These issues can be fixed by creating and nurturing a culture of openness

All of this stems from a problem with our research culture. We aren’t open, we don’t really share, and we often don’t follow basic principles of science. To fix this, we need to start creating and nurturing a better, healthier culture. We can all be involved in that; we all have the power to do something, even if it’s a small thing.

Goal of this workshop: Start changing the culture by providing the training

Workshop details

Setup and layout

  • The workshop is a mix of:
    • “Code-alongs” (we type and explain, you type along)
    • Hands-on coding, discussing, and reading exercises
    • Group project (quickly go over it and the GitHub page)
  • All material is online (and openly licensed)
  • Resources Appendix
    • Material for further learning
  • Coding is just as much social as it is solo
    • Every morning, draw a table name from the bowl and sit at that table
    • Introduce yourself to your table mates
    • During lunch, sit beside someone you don’t know
    • Several networking activities after most lunches

Explain a bit more about the reading, why we do it, and that this workshop in particular has a lot more of it than the more advanced workshops.

For the final group project, you’ll be in the same group for the whole workshop, working together on the project and on the final exercises. As a team, you’ll help each other learn and overcome any struggles, with our help too!

I’ve tried to organize the groups to include a range of skills and experiences, so there is a mix of novice and more experienced users.

Getting or asking for help 🙋‍♀️🙋‍♂️

  • Put the sticky/origami hats 🎩 on your laptop to get help
  • There are lots of helpers
  • Table mates, try to help out too
  • For some, the pace may feel too slow. For others, too fast. We want fewer people to say it was too fast.
  • We’re all learning here!
  • This is a supportive and safe environment
  • Remember our Code of Conduct

We always get feedback that, for some, it is too fast and, for others, too slow, especially during this introductory workshop. Ideally, everyone would say it was the perfect speed, but that is impossible to achieve. So instead, we aim to have fewer people say it was too fast than say it was too slow. This is an introductory workshop, so we try to assume little to no knowledge of many of these concepts. For those with some knowledge, it will feel slow at times! You can always help your neighbour out to pass the time.

Practice using stickies: Have you joined the GitHub Organization?

Activities

🚶‍♂️🚶‍♀️ Who has used any other coding tool (like Stata)?

🚶‍♂️🚶‍♀️ Those who have used other coding tools, have you had formal training in “coding” in it?

🚶‍♂️🚶‍♀️ How do you perceive your general skill in data analysis?

🚶‍♂️🚶‍♀️ How nervous are you about learning R?

🚶‍♂️🚶‍♀️ Who has not yet used R?

Go to different sides of the room for “Yes” and “No”

🚶‍♀️🚶 Those who’ve used R, how do you perceive your skill in R?

Arrange along the wall from beginner to advanced.

🪑 Ok, get back into your chairs

References
