This exam’s dataset concerns nutrition information about breakfast cereals. Documentation for the dataset can be found here: https://www.kaggle.com/crawford/80-cereals/version/2
You may download the dataset at this link, or on the course website.
Most cereals are made from either wheat or oats. We would like to explore the health and taste benefits of these base ingredients.
Make a new variable called Seed_Type. This variable should have the possible values wheat, oat, bran, and unknown, depending on whether the words “wheat”, “oat”, or “bran” appear in the name of the cereal.
Special Cases: “Wheat ‘n’ Bran” should count as wheat; “Oat Bran” should count as oat.
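One way to handle the special cases is to check the words in a fixed priority order, so that a name containing both “wheat” and “bran” is labelled wheat before bran is ever considered. The following is only a sketch in base R on a few made-up example names: the helper name classify_seed is hypothetical, and checking wheat before oat is an assumption, since the task does not say what to do when both words appear.

```r
# Illustrative names only; the real data has one name per cereal.
cereal_names <- c("Shredded Wheat 'n' Bran", "Cracklin' Oat Bran",
                  "100% Bran", "Lucky Charms")

# Hypothetical helper: checks wheat first, then oat, then bran.
# The wheat-before-oat order is an assumption, not part of the task.
classify_seed <- function(name) {
  lower <- tolower(name)  # make the substring match case-insensitive
  ifelse(grepl("wheat", lower, fixed = TRUE), "wheat",
  ifelse(grepl("oat", lower, fixed = TRUE), "oat",
  ifelse(grepl("bran", lower, fixed = TRUE), "bran", "unknown")))
}

Seed_Type <- classify_seed(cereal_names)
Seed_Type
```

Because this matches substrings rather than whole words, any name that merely contains the letters “oat” would also be labelled oat, so it is worth inspecting the resulting labels by eye.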
Suppose you are hired by a wheat farming group to convince the world that wheat cereals are better in some way. Make a convincing plot for the superiority of wheat-based cereal. Provide a one-sentence conclusion from your plot.
Repeat Task 2, this time assuming you are hired by an oat farming collective, so that your plot should imply that oat cereals are in some way better. Provide a one-sentence interpretation of your plot.
Like all children, I loved sugary cereal, especially Lucky Charms. My mother was not pleased about how much cereal my brother and I were eating, so she made a rule: We were only allowed to eat cereal with less than six grams of sugar per cup.
Under my mother’s rules, how many cereals in this dataset would we be allowed to eat?
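Since the rule is stated per cup while the data is recorded per serving, one reading of the question is to convert sugars to a per-cup figure before counting. A minimal sketch on a made-up data frame (the values and the column name sugars_per_cup are placeholders, not taken from the dataset):

```r
# Placeholder values for illustration only.
toy <- data.frame(
  name   = c("A", "B", "C"),
  sugars = c(3, 6, 12),     # grams of sugar per serving
  cups   = c(0.5, 1.5, 1)   # cups per serving
)

# Convert to grams of sugar per cup, then count strictly below 6.
toy$sugars_per_cup <- toy$sugars / toy$cups
n_allowed <- sum(toy$sugars_per_cup < 6)
n_allowed
```

Note that cereal A comes out at exactly six grams per cup and is therefore excluded, because the rule says “less than six”.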
The tyrannical rule was defeated when my brother pointed out that my mother’s favorite cereal, Raisin Bran, is in fact less healthy than Lucky Charms. Make a plot illustrating this discovery. Your plot should compare calories, fat, sodium, and sugars between the two cereals. These four nutrients should be plotted in separate facets, not as four different plots.
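Faceting by nutrient generally requires the data in long form, with one row per cereal and nutrient. The sketch below reshapes a made-up two-row data frame in base R and then, only if the ggplot2 package happens to be installed, builds a faceted bar chart. The cereal names and all values are placeholders, and the use of ggplot2 is an assumption about the plotting tools in use.

```r
# Placeholder data: two hypothetical cereals with made-up values.
toy <- data.frame(
  name     = c("Cereal A", "Cereal B"),
  calories = c(100, 120),
  fat      = c(1, 0),
  sodium   = c(150, 200),
  sugars   = c(10, 12)
)

nutrients <- c("calories", "fat", "sodium", "sugars")

# Long form: one row per (cereal, nutrient) pair.
long <- data.frame(
  name     = rep(toy$name, times = length(nutrients)),
  nutrient = rep(nutrients, each = nrow(toy)),
  value    = unlist(toy[nutrients], use.names = FALSE)
)

# Build the faceted plot only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = name, y = value, fill = name)) +
    ggplot2::geom_col() +
    ggplot2::facet_wrap(~ nutrient, scales = "free_y")
  # print(p) would draw the plot.
}
```

Free y scales are used here because the four measurements are on very different scales; with a shared axis the smaller values would be nearly invisible.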
Recall from lecture that the nutritional measurements in this dataset (calories, protein, fat, sodium, fiber, carbo, sugars, and potass) are measured per serving. However, not all cereals have the same serving size! In order to fairly compare them, we need to adjust our data first.
This dataset includes a variable called cups, which gives the number of cups in a “single serving” of the cereal. It also includes a variable called weight, which gives the weight in ounces of a single serving.
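Create a function called adjust_cereal to fix this problem. This function should take as input, in the order used in the test call below: a vector of measurements for one nutrient, the cups per serving, the weight per serving, and a string giving the type of adjustment.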
If the user inputs the string “volume”, the function should adjust the measurements to be per cup instead of per serving.
If the user inputs the string “weight”, the function should adjust the measurements to be per ounce instead of per serving.
You may check your function by comparing your output to the ones below:
test_sugars <- c(100, 200)
test_cups <- c(0.75, 1.25)
test_weights <- c(1.3, 1.8)
adjust_cereal(test_sugars, test_cups, test_weights, "volume")
## [1] 133.3333 160.0000
adjust_cereal(test_sugars, test_cups, test_weights, "weight")
## [1] 76.92308 111.11111
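For reference, the conversion itself is a single division: per-serving amount divided by cups per serving gives amount per cup, and divided by ounces per serving gives amount per ounce. A minimal sketch consistent with the test call above follows. The argument names are assumptions, and stopping with an error on any other string is a design choice the task does not specify.

```r
# Argument names are illustrative; only the order follows the test call.
adjust_cereal <- function(nutrient, cups, weight, type) {
  if (type == "volume") {
    nutrient / cups        # per serving -> per cup
  } else if (type == "weight") {
    nutrient / weight      # per serving -> per ounce
  } else {
    # Assumed behaviour for unrecognised input.
    stop('type must be "volume" or "weight"')
  }
}

test_sugars <- c(100, 200)
test_cups <- c(0.75, 1.25)
test_weights <- c(1.3, 1.8)

adjust_cereal(test_sugars, test_cups, test_weights, "volume")
adjust_cereal(test_sugars, test_cups, test_weights, "weight")
```

Because division in R is vectorised, the same function works on whole columns of a data frame without a loop.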
Use your function to update the dataset, such that the 8 nutrients (calories, protein, fat, sodium, fiber, carbo, sugars, and potass) are adjusted for the number of cups in a serving (i.e., by volume).
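Applying the same adjustment to several columns at once can be done by looping over the column names. The sketch below uses base R on a made-up data frame with only two nutrient columns. The values are placeholders, and it divides by cups directly rather than calling adjust_cereal, so that the snippet stands on its own.

```r
# Placeholder data with two of the eight nutrient columns.
toy <- data.frame(
  name     = c("A", "B"),
  calories = c(90, 100),
  sugars   = c(6, 10),
  cups     = c(0.75, 1.25),
  weight   = c(1, 1)
)

# With the real data this vector would list all eight nutrients.
nutrient_cols <- c("calories", "sugars")

# Replace each nutrient column with its per-cup value.
toy[nutrient_cols] <- lapply(toy[nutrient_cols], function(x) x / toy$cups)
toy
```

The cups and weight columns are left untouched, so the original per-serving values can still be recovered by multiplying back.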
Come up with a visualization or summary with an interesting insight into cereal names. For example, are there any words or phrases that are associated with higher sugars? With higher ratings? With different shelf placement?
For full credit, make use of regular expressions in your analysis.
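As a hint at the kind of analysis meant here, a regular expression can flag names containing certain words, and a grouped summary can then compare the flagged and unflagged cereals. The sketch below uses made-up names and sugar values, and the choice of the words “frosted” and “honey” is purely illustrative.

```r
# Hypothetical names and placeholder sugar values.
toy <- data.frame(
  name   = c("Frosted Oat Rings", "Honey Crunch", "Plain Flakes", "Bran Squares"),
  sugars = c(11, 10, 2, 3)
)

# \\b marks a word boundary, so only whole words match;
# ignore.case = TRUE matches regardless of capitalisation.
toy$sweet_word <- grepl("\\b(frosted|honey)\\b", toy$name,
                        ignore.case = TRUE, perl = TRUE)

# Mean sugars for names with and without a flagged word.
mean_sugars <- tapply(toy$sugars, toy$sweet_word, mean)
mean_sugars
```

The same pattern of flagging and summarising applies equally to ratings or shelf placement in place of sugars.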