Want to help out or contribute?

If you find any typos, errors, or places where the text may be improved, please let us know by providing feedback either in the feedback survey (given during class) or by using GitHub.

On GitHub open an issue or submit a pull request by clicking the " Edit this page" link at the side of this page.

Appendix D — Extra exercises

source(here::here("R/functions.R"))

D.1 Calculate some basic statistics

Practice using summarise() by calculating various summary statistics. Copy and paste the code below into the R/learning.R script file.

# 1.
nhanes_small %>%
    summarise(mean_bp_sys = ___,
              mean_age = ___)

# 2.
nhanes_small %>%
    summarise(max_bp_dia = ___,
              min_bp_dia = ___)

Then, start replacing the ___ with the appropriate code to complete the tasks below. Don’t forget to use na.rm = TRUE in the basic statistic functions.

  1. Calculate the mean of bp_sys_ave and age.
  2. Calculate the max and min of bp_dia.
  3. Lastly, add and commit any changes made to the Git history with the RStudio Git interface.
Click for the solution. Only click if you are struggling or are out of time.
# 1.
nhanes_small %>%
    summarise(mean_bp_sys = mean(bp_sys_ave, na.rm = TRUE),
              mean_age = mean(age, na.rm = TRUE))

# 2.
nhanes_small %>%
    summarise(max_bp_dia = max(bp_dia_ave, na.rm = TRUE),
              min_bp_dia = min(bp_dia_ave, na.rm = TRUE))

D.2 Answer some statistical questions with group by and summarise

Copy and paste the code below into the R/learning.R script file.

# 1. 
nhanes_small %>% 
    filter(!is.na(diabetes)) %>% 
    ___(___, ___) %>% 
    ___(
        ___,
        ___,
        ___
    )

# 2. 
nhanes_small %>% 
    filter(!is.na(diabetes)) %>% 
    ___(___, ___) %>% 
    ___(
        ___,
        ___,
        ___,
        ___,
        ___,
        ___
    )

Then, start replacing the ___ with the appropriate code including group_by() with summarise(), to answer these questions:

  1. What is the mean, max, and min differences in age between active and inactive persons with or without diabetes?
  2. What is the mean, max, and min differences in systolic BP and diastolic BP between active and inactive persons with or without diabetes?
  3. Once done, add and commit the changes to the file to the Git history.
Click for the solution. Only click if you are struggling or are out of time.
# 1. 
nhanes_small %>% 
    filter(!is.na(diabetes)) %>% 
    group_by(diabetes, phys_active) %>% 
    summarise(
        mean_age = mean(age, na.rm = TRUE),
        max_age = max(age, na.rm = TRUE),
        min_age = min(age, na.rm = TRUE)
    )

# 2. 
nhanes_small %>% 
    filter(!is.na(diabetes)) %>% 
    group_by(diabetes, phys_active) %>% 
    summarise(
        mean_bp_sys = mean(bp_sys_ave, na.rm = TRUE),
        max_bp_sys = max(bp_sys_ave, na.rm = TRUE),
        min_bp_sys = min(bp_sys_ave, na.rm = TRUE),
        mean_bp_dia = mean(bp_dia_ave, na.rm = TRUE),
        max_bp_dia = max(bp_dia_ave, na.rm = TRUE),
        min_bp_dia = min(bp_dia_ave, na.rm = TRUE)
    )

D.3 Practicing the dplyr functions

Practice using dplyr by using the NHANES dataset and wrangling the data into a summary output. Don’t create any intermediate objects by only using the pipe operator to link each task below with the next one.

  1. Rename all columns to use snakecase.
  2. Select the columns gender, age and BMI.
  3. Exclude "NAs" from all of the selected columns.
  4. Rename gender to sex.
  5. Create a new column called age_class, where anyone under 50 years old is labeled "under 50" and those 50 years and older are labeled "over 50".
  6. Group the data according to sex and age_class.
  7. Calculate the mean and median BMI according to the grouping to determine the difference in BMI between age classes and sex.
  8. Run styler on the file (Ctrl-Shift-P, then type “style file”).
  9. Add and commit changes to the Git history with the RStudio Git interface (Ctrl-Shift-P, then type “commit”).
Click for the solution. Only click if you are struggling or are out of time.
NHANES %>% 
    rename_with(snakecase::to_snake_case) %>% 
    select(gender, age,  bmi) %>% 
    filter(!is.na(gender) & !is.na(age) & !is.na(bmi)) %>% 
    rename(sex = gender) %>% 
    mutate(age_class = if_else(age < 50, "under 50", "over 50")) %>%
    group_by(age_class, sex) %>% 
    summarize(bmi_mean = mean(bmi, na.rm = TRUE), 
              bmi_median = median(bmi, na.rm = TRUE))