This week we’ll do some data gymnastics to refesh and review what we learned over the past few weeks.

RStudio Cloud

We will continue to use RStudio Cloud on this assignment. If you have not yet joined the RStudio Cloud workspace for this course, you can do so here. If you were in workshop this week, you probably already did it, but if not, this might be new to you.

Note: If you have already joined the course workspace, you can simply go to rstudio.cloud and log on. However it’s important that then you navigate to the course workspace (it should be listed as IDS - Fall 2019) on the left menu. If you are in the correct workspace, your top bar should look like the following:

And if you are not, once you get started on your work you will see a message about packages you need not being installed. This should prompt you to navigate to the course workspace, and continue your work there.

Once you’re in the workspace click on the Projects tab on top, and create a New project from Git Repo.

Then, copy and paste the URL of your repo, as you do each time. This will clone the repo and get you started with your assignment.

The only new thing you need to do is to introduce yourself to Git once again. Run the following, with replacing "Your Name" with your real name and last name, and "your.email@address.com" with the email address you used for GitHub.

library(usethis)
use_git_config(user.name = "Your Name", 
               user.email = "your.email@address.com")

Packages

In this assignment we will work with the tidyverse as usual.

library(tidyverse)

Lego sales

We have data from lego sales in 2018 for a sample of customers who bought legos in the US. Load the data using the following:

lego_sales <- read_csv("data/lego_sales.csv")

The codebook for the dataset is as follows.

first_name: First name of customer
last_name: Last name of customer
age: Age of customer
phone_number: Phone number of customer
set_id: Set ID of lego set purchased
number: Item number of lego set purchased
theme: Theme of lego set purchased
subtheme: Sub theme of lego set purchased
year: Year of purchase
name: Name of lego set purchased
pieces: Number of pieces of legos in set purchased
us_price: Price of set purchase in US Dollars
image_url: Image URL of lego set purchased
quantity: Quantity of lego set(s) purchased

Answer the following questions using pipelines. For each question, state your answer in a sentence, e.g. “The first three common names of purchasers are …”.

What are the three most common first names of purchasers?
What are the three most common themes of lego sets purchased?
Among the most common theme of lego sets purchased, what is the most common subtheme?

⊕Hint: Use the case_when() function.

Create a new variabled called age_group and group the ages into the following categories: “18 and under”, “19 - 25”, “26 - 35”, “36 - 50”, “51 and over”.

⊕Hint: You will need to consider quantity of purchases.

Which age group has purchased the highest number of lego sets.

⊕Hint: You will need to consider quantity of purchases as well as price of lego sets.

Which age group has spent the most money on legos?
Come up with a question you want to answer using these data, and write it down. Then, create a data visualization that answers the question, and explain how your visualization answers the question.

Instructional staff employment trends

You’ve already seen this data set in this week’s lab. It’s the one about trends in instructional staff employees between 1975 and 2011. You can load this data set using the following:

staff <- read_csv("data/instructional-staff.csv")

The data visualization in the report where these data come from was provided in your lab (and can be accessed here). During the workshop you had a chance to discuss with yout teammates how you would improve upon this visualization if the main objective was to communicate that the proportion of part-time faculty have gone up over time compared to other instructional staff types.

Implement the improvements and provide your improved visualization as part of your answer. Also write a few sentences about why you chose to make these improvements and how they address the main goal stated above. Feel free to reuse any code from the lab.

HW 04 - Legos and instructors

RStudio Cloud

Packages

Lego sales

Instructional staff employment trends