In this lesson, you’ll learn how to wrangle data using the dplyr package from the tidyverse.
When you are finished, you should be able to…
Understand what the tidyverse is
Use the pipe operator (%>%)
Use the five main dplyr verbs:
  filter()
  arrange()
  select()
  mutate()
  summarize()
Use group_by() to perform groupwise operations
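The pipe and the five verbs listed above can be sketched on a small data set. This is a minimal example, assuming dplyr is installed; the data frame `toy` and its values are made up for illustration and only borrow column names from the penguins data.

```r
# Attach dplyr quietly (on attach it masks stats::filter() and stats::lag())
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data, made up for illustration (not the real penguins data)
toy <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  body_mass_g = c(3700, 3400, 5000, 5400, 3600),
  flipper_length_mm = c(190, 185, 215, 220, 195)
)

# The pipe passes its left-hand side as the first argument of the next call,
# so these two lines compute the same thing (2 Gentoo rows)
nrow_direct <- nrow(filter(toy, species == "Gentoo"))
nrow_piped  <- toy %>% filter(species == "Gentoo") %>% nrow()

# filter(): keep rows that satisfy a logical condition
heavy <- toy %>% filter(body_mass_g > 3600)

# arrange(): sort rows; desc() sorts in descending order
sorted <- toy %>% arrange(desc(body_mass_g))

# select(): keep columns by name
masses <- toy %>% select(species, body_mass_g)

# mutate(): add a new column computed from existing ones
with_ratio <- toy %>%
  mutate(mass_flipper_ratio = body_mass_g / flipper_length_mm)

# summarize(): collapse all rows into a single summary row
overall <- toy %>% summarize(mean_mass = mean(body_mass_g))
```

Each of these verbs takes a data frame as its first argument and returns a new data frame, which is what lets the calls be chained with %>%.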
Question 1: Suppose we would like to study how the ratio of penguin body mass to flipper length differs across species. Arrange the following pipeline steps in an order that accomplishes this goal.
# a
arrange(avg_mass_flipper_ratio)
# b
group_by(species)
# c
penguins
# d
summarize(
avg_mass_flipper_ratio = median(mass_flipper_ratio)
)
# e
mutate(
mass_flipper_ratio = body_mass_g/flipper_length_mm
)
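Question 1 depends on how group_by() changes the behavior of summarize(). A minimal sketch, again on a made-up `toy` data frame (hypothetical values, not the real penguins data) and summarizing mean body mass rather than the ratio in the question, compares an ungrouped and a grouped summary:

```r
# Attach dplyr quietly
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data, made up for illustration
toy <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  body_mass_g = c(3700, 3400, 5000, 5400, 3600)
)

# Without group_by(), summarize() collapses the whole table to one row
ungrouped <- toy %>%
  summarize(mean_mass = mean(body_mass_g))

# With group_by() first, summarize() returns one row per species;
# arrange() then sorts those summary rows by the new column
by_species <- toy %>%
  group_by(species) %>%
  summarize(mean_mass = mean(body_mass_g)) %>%
  arrange(mean_mass)
```

The grouped version returns one row per species (here Adelie, Chinstrap, Gentoo, in increasing order of mean_mass), so arrange() sorts the summary rows rather than the original observations.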
Question 2: Consider the base R code below.
For each of the following dplyr pipelines, indicate if it
# a
penguins %>%
filter("body_mass_g") %>%
pull("Adelie") %>%
mean()
# b
penguins %>%
filter(species == "Adelie") %>%
select(body_mass_g) %>%
summarize(mean(body_mass_g))
# c
penguins %>%
pull(body_mass_g) %>%
filter(species == "Adelie") %>%
mean()
# d
penguins %>%
filter(species == "Adelie") %>%
select(body_mass_g) %>%
mean()
# e
penguins %>%
filter(species == "Adelie") %>%
pull(body_mass_g) %>%
mean()
# f
penguins %>%
select(species == "Adelie") %>%
filter(body_mass_g) %>%
summarize(mean(body_mass_g))
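The pipelines above differ mainly in what each verb expects as input and returns as output. A minimal sketch on a made-up `toy` data frame (hypothetical values, not the penguins data) shows the distinctions involved:

```r
# Attach dplyr quietly
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data, made up for illustration
toy <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo"),
  body_mass_g = c(3700, 3400, 5000)
)

# filter() takes logical conditions on columns and keeps the matching rows
adelie <- toy %>% filter(species == "Adelie")

# select() keeps columns, so the result is still a (one-column) data frame
as_df <- adelie %>% select(body_mass_g)

# pull() extracts a single column as a plain vector
as_vec <- adelie %>% pull(body_mass_g)

# mean() expects a numeric vector, so it pairs naturally with pull()
mean_from_vec <- mean(as_vec)

# summarize() works on the data frame and evaluates mean() on the column
summarized <- adelie %>% summarize(mean_mass = mean(body_mass_g))
```

In short, select() keeps a one-column data frame, whereas pull() extracts the column as a vector that mean() can average directly.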