The Tidyverse and dplyr

In this lesson, you’ll learn how to wrangle data using the dplyr package from the tidyverse.

When you are finished, you should be able to…

Understand what the tidyverse is
Use the pipe operator (%>%)
Use the five main dplyr verbs:
- filter()
- arrange()
- select()
- mutate()
- summarize()
Use group_by() to perform groupwise operations
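
As a preview of these goals, here is a minimal sketch that chains the five verbs and group_by() with the pipe. It assumes the dplyr package is installed and uses the built-in mtcars dataset, which is an illustrative stand-in and not one of the datasets used in this lesson. The names heavy_by_cyl, mpg_per_wt, n_cars, and mean_mpg_per_wt are made up for the example.

```r
library(dplyr)

# mtcars ships with R; any data frame would work here.
# The pipe passes the result on its left as the first argument of the
# call on its right, so x %>% f(y) is the same as f(x, y).
heavy_by_cyl <- mtcars %>%
  filter(wt > 3) %>%                  # keep rows: cars heavier than 3,000 lbs
  select(mpg, cyl, wt) %>%            # keep only these columns
  mutate(mpg_per_wt = mpg / wt) %>%   # add a new column computed from others
  group_by(cyl) %>%                   # later steps run once per cylinder count
  summarize(
    n_cars = n(),                     # number of rows in each group
    mean_mpg_per_wt = mean(mpg_per_wt)
  ) %>%
  arrange(desc(mean_mpg_per_wt))      # sort groups, largest ratio first

heavy_by_cyl
```

Reading the pipeline top to bottom as "take mtcars, then filter, then select, ..." is usually easier than reading the equivalent nested function calls from the inside out.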

Time Estimates:

Videos: 40 min

Readings: 0-60 min

Activities: 20 min

Check-ins: 1

What is the Tidyverse?

Required Video: Intro to the Tidyverse

Optional Video: The beginning of the word ‘tidyverse’

Wrangling data with dplyr

Required Reading: Tibbles

Required Video: dplyr

See the slides.

Recommended Reading: Data Wrangling

Recommended Reading: Data Transformation

Recommended Tutorial: Practice with Dplyr

Check-In 1: dplyr

Question 1: Suppose we would like to study how the ratio of penguin body mass to flipper length differs across species. Rearrange the following pipeline steps into an order that accomplishes this goal.

# a
arrange(avg_mass_flipper_ratio)

# b
group_by(species)

# c
penguins

# d
summarize(
  avg_mass_flipper_ratio = median(mass_flipper_ratio)
)

# e
mutate(
  mass_flipper_ratio = body_mass_g / flipper_length_mm
)

Question 2:

Consider the base R code below.

mean(penguins[penguins$species == "Adelie", "body_mass_g"])

For each of the following dplyr pipelines, indicate whether it:

Returns the exact same thing as the base R code;
Returns the correct information, but the wrong object type;
Returns incorrect information; or
Returns an error.

# a
penguins %>%
  filter("body_mass_g") %>%
  pull("Adelie") %>%
  mean()


# b
penguins %>%
  filter(species == "Adelie") %>%
  select(body_mass_g) %>%
  summarize(mean(body_mass_g))


# c
penguins %>%
  pull(body_mass_g) %>%
  filter(species == "Adelie") %>%
  mean()

# d
penguins %>%
  filter(species == "Adelie") %>%
  select(body_mass_g) %>%
  mean()

# e
penguins %>%
  filter(species == "Adelie") %>%
  pull(body_mass_g) %>%
  mean()

# f
penguins %>%
  select(species == "Adelie") %>%
  filter(body_mass_g) %>%
  summarize(mean(body_mass_g))

Walkthrough of `cereals` activity

Optional Video: Live coding of cereals dataset
