Want to help out or contribute?

If you find any typos, errors, or places where the text may be improved, please let us know by providing feedback either in the feedback survey (given during class) or by using GitHub.

On GitHub, open an issue or submit a pull request by clicking the "Edit this page" link at the side of this page.

4  Introduction to course

Introduction slides

The slides contain speaking notes that you can view by pressing ‘S’ on the keyboard.

4.1 The Big Picture

You’ve read the syllabus and already have an overall idea of why we are teaching this course. But we’d like to take another chance to emphasize the big-picture context of this course and its material.

We are in a special time in research. We are facing several large-scale technological and societal changes:

  • We researchers are experiencing higher demands from funding agencies, universities, and peers for transparency and rigor in our research.
  • Our work is getting more and more complex, which requires higher degrees of (potentially highly distributed and virtual) team-based science.
  • There is greater public attention on research, with mass participation through the Internet and social media.
  • Our access to powerful computing resources and massive datasets is leading to a rise in more complex analytics and data processing, such as through machine learning and AI.
  • Increasingly, your research output is someone else’s research input, such as with meta-research (the evidence-based evaluation and development of research methods) or meta-analysis.

In this course, our ultimate aim is to start creating data analysis projects that: are self-contained (within a single folder); have a record of changes made to the files; make it easier for others to collaborate; make it simpler to connect the project with a scientific output like a paper; and are structured so that the analyses are more reproducible (or at least more easily inspectable).

    4.2 Common questions

    Reading task: ~5 minutes

    Over the many times we’ve taught this course (and the others), we have been asked a lot of questions. Well, sometimes questions, sometimes comments, and sometimes complaints. We value feedback because it helps us improve the material! We have a Frequently Asked Questions page for keeping track of these questions. But there are a few with a common theme that deserve to be mentioned sooner rather than later.

    If you want to get help virtually or after the course, you can join the Discord channel, where we run virtual coding club sessions and where you can ask for help with any issues you might have.

    4.2.1 Why R?

    We often get asked: why are we using R, and why learn it? There are many, many reasons, some of which are listed below:

    • It is open source and free, which means that you can take the knowledge and skills you gain from using R anywhere you go in your career.
    • There is a very large, fairly friendly online community.
      • So many learning resources, support, and help!
    • There is a massive selection of packages. Need to do something? There’s probably already a package to do it for you.
      • Latest statistical methods.
      • Productivity tools.
      • Report writing.
      • Visualization.
      • Many, many more.
    • There has been a recent push to improve teaching and usability, for example with the tidyverse and RStudio.
    • R has one of the best data visualization tools available: ggplot2.
    • It has many more powerful capabilities when it comes to:
      • Big Data
      • Programming
      • Reproducibility

    4.3 Why are we learning things other than R?

    At least a few times, we’ve gotten feedback in our survey that we didn’t spend enough time on R, or that participants expected more R. That’s because there is more to doing data analysis than just R.

    We need to teach about being open, about being reproducible, and about using better research practices in our work, all of which involve more than just R.

    Some reasons why we teach what we do in each of the sessions.

    Session                            Reason
    Management of R projects           Reproducibility starts at the file level.
    Version control                    Openness and reproducibility are about transparency and inspection.
    Data management and wrangling      This one is about R 😺
    Creating reproducible documents    Hopefully obvious 😝
    Data visualization                 Also about R!

    4.4 Small reminder

    We always get feedback that the pace is too fast for some and too slow for others. Ideally, everyone would say it was the perfect speed, but that is unlikely to happen. So instead, we aim as much as possible to have fewer people say it was too fast than say it was too slow. This is an introductory course, so we try to assume little to no prior knowledge of many of these concepts. So, for those with some knowledge, it will feel slow at times! You can always help your neighbour out!