1 Course syllabus

Reproducibility and open scientific practices are increasingly required of scientists and researchers. As researchers, we also face increasingly challenging and complex analyses of ever larger datasets, yet we do not receive the necessary support and training on how to do these analyses and apply open and reproducible practices. This course introduces many of the core concepts and practices for doing reproducible and open data analysis, to get you familiar with and prepared for the type of work needed in research now and in the future.

We use a very practical approach based largely on code-along sessions (instructor and learners coding together), hands-on exercises, reading activities, and a group project. Our overarching learning outcome is that, at the end of the course, participants will be able to:

Describe the fundamentals of what an open and reproducible data analysis looks like and then create a project that applies some of the basics of these concepts using R.

Our specific learning objectives are to:

- Explain what an open and reproducible data analysis workflow is, what it looks like, and why it is important.
- Explain and demonstrate why R is rapidly becoming the standard program of choice for doing modern data analysis in science.
- Demonstrate and apply collaborative tools and techniques when working in team settings (including working with your future self).
- Show and apply the fundamental tools and skills for conducting a reproducible and modern analysis for a research project, focusing on the basics of wrangling and visualizing data, writing reproducible reports, and structuring projects.
- Show where to go to get help and to continue learning modern data analysis skills.

Because learning and coding are ultimately not solo activities, during this course we also aim, in addition to the group project work, to provide opportunities to chat with fellow participants, learn about their work and how they do analyses, and build networks of support and collaboration.

The course will place particular emphasis on research in diabetes, health, and metabolism; it will be taught by instructors working in this field and it will use relevant examples where possible.

The specific software and technologies we will cover in this course are R, RStudio, Git, GitHub, and Quarto; the specific R packages are dplyr and ggplot2.

1.1 Is this course for you?

To help manage expectations and develop the material for this course, we make a few assumptions about who you are as a participant in the course:

- You are a researcher, likely working in the biomedical or health field (ranging from experimental research to epidemiology).
- You currently do, or will soon do, some quantitative data analysis.
- You either:
  - know little or nothing about R (or computing in general);
  - haven’t used code-based programs for doing data analysis (e.g. have used SPSS);
  - have used coding programs before (e.g. used SAS or Stata), but not R;
  - or know how to use R, but haven’t used the tidyverse or RStudio.

While we make these assumptions to help focus the content of the course, if you have an interest in learning R but don’t fit any of them, you are still welcome to attend the course! We welcome everyone, at least until the course capacity is reached.

In addition to these assumptions, we also have a fairly focused scope for teaching and expectations for learning, which may also help you decide whether this course is for you.

We teach

- How to use R, starting from the very basics and targeted at beginners.
- Using a team science, reproducible research, and open science perspective (i.e. by including a collaborative group project that uses a transparent and reproducible analysis workflow).
- Using practical, applied, and hands-on lessons and exercises, with a few short lectures that introduce a topic.

We do not teach

- Statistics (this is already covered by most universities).