HW 04 - Legos and instructors

Photo by Daniel Cheung on Unsplash

This week we’ll do some data gymnastics to refresh and review what we learned over the past few weeks.

RStudio Cloud

We will continue to use RStudio Cloud for this assignment. If you have not yet joined the RStudio Cloud workspace for this course, you can do so here. If you attended the workshop this week, you have probably done this already; if not, this step may be new to you.

Note: If you have already joined the course workspace, you can simply go to rstudio.cloud and log in. However, it’s important that you then navigate to the course workspace (it should be listed as IDS - Fall 2019) in the left menu. If you are in the correct workspace, your top bar should look like the following:

If you are not, you will see a message saying that packages you need are not installed once you start working. This should prompt you to navigate to the course workspace and continue your work there.

Once you’re in the workspace, click the Projects tab at the top and create a New project from Git Repo.

Then, copy and paste the URL of your repo, as you do each time. This will clone the repo and get you started with your assignment.

The only new thing you need to do is introduce yourself to Git once again. Run the following, replacing "Your Name" with your first and last name, and "your.email@address.com" with the email address you used for GitHub.

library(usethis)

# Set the name and email address Git attaches to your commits
use_git_config(user.name = "Your Name",
               user.email = "your.email@address.com")

Packages

In this assignment we will work with the tidyverse as usual.

library(tidyverse)

Lego sales

We have data on Lego sales in 2018 for a sample of customers who bought Legos in the US. Load the data using the following:

The codebook for the dataset is as follows.

Answer the following questions using pipelines. For each question, state your answer in a sentence, e.g. “The three most common first names of purchasers are …”.

  1. What are the three most common first names of purchasers?

  2. What are the three most common themes of Lego sets purchased?

  3. Among the Lego sets from the most common theme, what is the most common subtheme?
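A minimal sketch of the kind of pipeline questions 1 to 3 call for, run on a small made-up tibble rather than the real sales data. The column names first_name, theme, and subtheme are assumptions; check them against the codebook before adapting this. count() tallies each distinct value, and sort = TRUE puts the most frequent values first.

```r
library(dplyr)

# Hypothetical toy data; the real column names may differ (check the codebook)
sales <- tibble(
  first_name = c("Jackson", "Emma", "Jackson", "Liam", "Emma", "Jackson"),
  theme      = c("Star Wars", "City", "Star Wars", "Ninjago", "Star Wars", "City"),
  subtheme   = c("Episode IV", "Police", "Episode IV", "Legacy", "Ultimate", "Fire")
)

# Most common first names: count each value, most frequent first, keep three
top_names <- sales %>%
  count(first_name, sort = TRUE) %>%
  slice_head(n = 3)

# Find the single most common theme ...
top_theme <- sales %>%
  count(theme, sort = TRUE) %>%
  slice(1) %>%
  pull(theme)

# ... then count subthemes only among sets from that theme
top_subthemes <- sales %>%
  filter(theme == top_theme) %>%
  count(subtheme, sort = TRUE)

top_names
top_subthemes
```

slice_head(n = 3) keeps exactly three rows, so a tie at the cutoff is dropped silently; slice_max() keeps tied rows by default, which may be preferable when reporting "the three most common" values.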

Hint: Use the case_when() function.

  4. Create a new variable called age_group and group the ages into the following categories: “18 and under”, “19 - 25”, “26 - 35”, “36 - 50”, “51 and over”.
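One way to use case_when() for question 4, sketched on hypothetical ages (the real age column may be named differently). Conditions are checked from top to bottom and the first one that is TRUE wins, so each line only needs an upper bound.

```r
library(dplyr)

# Hypothetical ages standing in for the real age column, including a missing value
customers <- tibble(age = c(16, 18, 19, 25, 26, 35, 36, 50, 51, 70, NA))

customers <- customers %>%
  mutate(
    age_group = case_when(
      age <= 18 ~ "18 and under",   # first matching condition wins
      age <= 25 ~ "19 - 25",
      age <= 35 ~ "26 - 35",
      age <= 50 ~ "36 - 50",
      age > 50  ~ "51 and over"     # explicit condition, so NA ages stay NA
    )
  )

customers
```

The last condition is written as age > 50 rather than a catch-all TRUE so that missing ages stay NA instead of being silently placed in the oldest group.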

Hint: You will need to consider the quantity of purchases.

  5. Which age group has purchased the highest number of Lego sets?
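The hint before question 5 matters because one row is one purchase, and a purchase can include several sets. This sketch on made-up data (the age_group and quantity column names are assumptions) contrasts counting rows with n() and summing quantity within each group.

```r
library(dplyr)

# Hypothetical purchases: one row per purchase, quantity = number of sets bought
purchases <- tibble(
  age_group = c("19 - 25", "19 - 25", "26 - 35", "36 - 50"),
  quantity  = c(1, 1, 4, 2)
)

# n() counts purchases (rows); sum(quantity) counts sets
sets_by_group <- purchases %>%
  group_by(age_group) %>%
  summarise(
    n_purchases = n(),
    n_sets      = sum(quantity)
  ) %>%
  arrange(desc(n_sets))

sets_by_group
```

In this toy data the 19 - 25 group makes the most purchases, but the 26 - 35 group buys the most sets.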

Hint: You will need to consider the quantity of purchases as well as the price of Lego sets.

  6. Which age group has spent the most money on Legos?

  7. Come up with a question you want to answer using these data, and write it down. Then, create a data visualization that answers the question, and explain how your visualization answers the question.
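For question 6, the money spent on each purchase is the price of a set times the quantity bought. The sketch below, on made-up data with an assumed price column holding the price of a single set, computes that total per age group and then draws it as a bar chart with ggplot2 (part of the tidyverse). It only illustrates the mechanics of building a plot from a summary table; question 7 asks for a question of your own.

```r
library(dplyr)
library(ggplot2)

# Hypothetical purchases; price is assumed to be the price of one set
purchases <- tibble(
  age_group = c("19 - 25", "19 - 25", "26 - 35", "36 - 50"),
  quantity  = c(1, 1, 4, 2),
  price     = c(99.99, 19.99, 9.99, 49.99)
)

# Money spent on a purchase = price per set x number of sets
spending <- purchases %>%
  mutate(spent = price * quantity) %>%
  group_by(age_group) %>%
  summarise(total_spent = sum(spent)) %>%
  arrange(desc(total_spent))

# One possible visualization: total spending per age group as bars
p <- ggplot(spending, aes(x = age_group, y = total_spent)) +
  geom_col() +
  labs(
    x = "Age group",
    y = "Total spent",
    title = "Total spending on Legos by age group (toy data)"
  )

p  # draw the chart
```

With these toy numbers the 19 - 25 group spends the most even though the 26 - 35 group buys the most sets, which is why the hint asks you to consider price as well as quantity.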