Photo by Daniel Cheung on Unsplash
This week we’ll do some data gymnastics to refresh and review what we learned over the past few weeks.
We will continue to use RStudio Cloud for this assignment. If you have not yet joined the RStudio Cloud workspace for this course, you can do so here. If you attended workshop this week, you have probably already done this; if not, this step may be new to you.
Note: If you have already joined the course workspace, you can simply go to rstudio.cloud and log in. However, you must then navigate to the course workspace (listed as IDS - Fall 2019) in the left menu. If you are in the correct workspace, your top bar should look like the following:
If you are not in the course workspace, once you start working you will see a message saying that packages you need are not installed. This should prompt you to navigate to the course workspace and continue your work there.
Once you’re in the workspace, click on the Projects tab at the top and create a New project from Git Repo.
Then, copy and paste the URL of your repo, as you do each time. This will clone the repo and get you started with your assignment.
The only new thing you need to do is introduce yourself to Git again. Run the following, replacing "Your Name" with your first and last name, and "your.email@address.com" with the email address you used for GitHub.
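The command itself is not reproduced on this page. One way to do this from R, assuming the usethis package is installed in the course workspace, is `usethis::use_git_config()`, which writes the name and email to your user-level Git configuration. This is a sketch; the course's own instructions may use a different command.

```r
# Assumes the usethis package is installed; replace both values with your own.
# This sets user.name and user.email in your user-level (global) Git config.
usethis::use_git_config(
  user.name  = "Your Name",
  user.email = "your.email@address.com"
)
```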
In this assignment we will work with the tidyverse, as usual.
We have data from lego sales in 2018 for a sample of customers who bought legos in the US. Load the data using the following:
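The loading code is not shown here, and the path of the CSV in your repo is not given, so this sketch only illustrates the reading step: `read_csv()` from readr (loaded with the tidyverse) reads a CSV into a tibble and guesses column types. To keep it runnable without the real file, it writes two made-up rows to a temporary file first; with the real data you would pass the path of the lego sales file in your repo instead.

```r
library(readr)

# Stand-in for the real file: two made-up rows written to a temporary CSV.
# In the assignment, use the path of the lego sales CSV in your repo instead.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "first_name,age,theme,us_price,quantity",
  "Ann,24,Theme A,19.99,1",
  "Bo,41,Theme B,39.99,2"
), toy_path)

# read_csv() returns a tibble; numeric-looking columns are parsed as numbers
lego_sales <- read_csv(toy_path)
lego_sales
```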
The codebook for the dataset is as follows.
first_name: First name of customer
last_name: Last name of customer
age: Age of customer
phone_number: Phone number of customer
set_id: Set ID of lego set purchased
number: Item number of lego set purchased
theme: Theme of lego set purchased
subtheme: Subtheme of lego set purchased
year: Year of purchase
name: Name of lego set purchased
pieces: Number of pieces of legos in set purchased
us_price: Price of set purchased in US dollars
image_url: Image URL of lego set purchased
quantity: Quantity of lego set(s) purchased

Answer the following questions using pipelines. For each question, state your answer in a sentence, e.g. “The three most common first names of purchasers are …”.
What are the three most common first names of purchasers?
What are the three most common themes of lego sets purchased?
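Both of these questions follow the same pipeline pattern: count the rows for each value, sort the counts in descending order, and keep the top three. A minimal sketch using dplyr's `count()` and `slice_head()` on a few made-up rows (the real `lego_sales` data has more columns and many more customers):

```r
library(dplyr)

# Made-up rows for illustration only
lego_sales <- tibble(
  first_name = c("Ann", "Bo", "Ann", "Cy", "Bo", "Ann", "Di"),
  theme      = c("Theme A", "Theme B", "Theme A", "Theme C",
                 "Theme A", "Theme B", "Theme D")
)

# Three most common first names: one row per name with its count n,
# largest counts first, then keep the first three rows
top_names <- lego_sales %>%
  count(first_name, sort = TRUE) %>%
  slice_head(n = 3)

# The same pattern answers the question about themes
top_themes <- lego_sales %>%
  count(theme, sort = TRUE) %>%
  slice_head(n = 3)
```

`slice_head()` does not handle ties, so it is worth looking at a few more rows of the sorted counts to check whether the third place is shared before writing your answer sentence.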
Among the most common theme of lego sets purchased, what is the most common subtheme?
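This question takes two steps: find the most common theme, then restrict the data to that theme and count its subthemes. A sketch on made-up rows, using `pull()` to extract the top theme as a single value for use in `filter()`:

```r
library(dplyr)

# Made-up rows for illustration only
lego_sales <- tibble(
  theme    = c("Theme A", "Theme A", "Theme A", "Theme B", "Theme B"),
  subtheme = c("Sub 1", "Sub 2", "Sub 1", "Sub 3", "Sub 3")
)

# Step 1: the most common theme, as a single character value
top_theme <- lego_sales %>%
  count(theme, sort = TRUE) %>%
  slice_head(n = 1) %>%
  pull(theme)

# Step 2: keep only that theme, then count its subthemes
top_subtheme <- lego_sales %>%
  filter(theme == top_theme) %>%
  count(subtheme, sort = TRUE) %>%
  slice_head(n = 1)
```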
Create a new variable called age_group and group the ages into the following categories: “18 and under”, “19 - 25”, “26 - 35”, “36 - 50”, “51 and over”. Hint: Use the case_when() function.
Hint: You will need to consider quantity of purchases.
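A sketch of the `case_when()` step on made-up ages, assuming ages are whole numbers. `case_when()` returns the value for the first condition that is TRUE, so each line only needs an upper bound. Because each row can contain more than one set, counting sets per age group means adding up `quantity` rather than counting rows:

```r
library(dplyr)

# Made-up rows for illustration only
lego_sales <- tibble(
  age      = c(16, 18, 19, 25, 30, 36, 50, 51, 70),
  quantity = c(1, 2, 1, 1, 4, 1, 2, 1, 1)
)

# Conditions are checked in order; the first TRUE one wins
lego_sales <- lego_sales %>%
  mutate(age_group = case_when(
    age <= 18 ~ "18 and under",
    age <= 25 ~ "19 - 25",
    age <= 35 ~ "26 - 35",
    age <= 50 ~ "36 - 50",
    age >= 51 ~ "51 and over"
  ))

# Sets purchased per age group: sum quantity instead of counting rows
sets_by_age <- lego_sales %>%
  group_by(age_group) %>%
  summarise(total_sets = sum(quantity)) %>%
  arrange(desc(total_sets))
```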
Which age group has spent the most money on legos? Hint: You will need to consider quantity of purchases as well as price of lego sets.
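Money spent on each row is the price of the set times the number of sets bought, so one approach is to compute that product first and then sum it within each age group. A sketch on made-up rows that assumes `age_group` has already been created:

```r
library(dplyr)

# Made-up rows for illustration only; assumes age_group already exists
lego_sales <- tibble(
  age_group = c("19 - 25", "19 - 25", "36 - 50", "51 and over"),
  us_price  = c(19.99, 49.99, 9.99, 99.99),
  quantity  = c(2, 1, 3, 1)
)

spend_by_age <- lego_sales %>%
  mutate(total_price = us_price * quantity) %>%   # money spent in each row
  group_by(age_group) %>%
  summarise(total_spent = sum(total_price)) %>%
  arrange(desc(total_spent))
```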
Come up with a question you want to answer using these data, and write it down. Then, create a data visualization that answers the question, and explain how your visualization answers the question.
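As an illustration of the format only (your own question should differ), here is a sketch for the hypothetical question “Which themes bring in the most money?”: summarise spending by theme, then draw a horizontal bar chart with the bars ordered by total. The rows are made up:

```r
library(dplyr)
library(ggplot2)

# Made-up rows for illustration only
lego_sales <- tibble(
  theme    = c("Theme A", "Theme A", "Theme B", "Theme C"),
  us_price = c(19.99, 29.99, 9.99, 59.99),
  quantity = c(1, 2, 4, 1)
)

revenue_by_theme <- lego_sales %>%
  group_by(theme) %>%
  summarise(revenue = sum(us_price * quantity))

# reorder() sorts the bars by revenue so the ranking is easy to read
p <- ggplot(revenue_by_theme, aes(x = revenue, y = reorder(theme, revenue))) +
  geom_col() +
  labs(
    x = "Total spent (US dollars)",
    y = NULL,
    title = "Spending by theme (toy data)"
  )
p
```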
You’ve already seen this data set in this week’s lab. It’s the one about trends in instructional staff employees between 1975 and 2011. You can load this data set using the following:
The data visualization from the report these data come from was provided in your lab (and can be accessed here). During the workshop you had a chance to discuss with your teammates how you would improve upon this visualization if the main objective was to communicate that the proportion of part-time faculty has gone up over time compared to other instructional staff types.
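The loading code is not shown here. The sketch below assumes a wide layout with a `faculty_type` column and one column per year (check the real file's column names), and uses made-up numbers, treated as percentages of instructional staff, for three of the staff types. It reshapes the data with `tidyr::pivot_longer()` and draws one line per staff type, with colour reserved for part-time faculty so that trend stands out. This is one possible design, not the required answer:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up numbers in an assumed wide layout: one row per faculty type,
# one column per year
staff <- tibble(
  faculty_type = c("Full-Time Tenured Faculty", "Part-Time Faculty",
                   "Graduate Student Employees"),
  `1975` = c(30, 25, 20),
  `1993` = c(25, 33, 21),
  `2011` = c(20, 40, 19)
)

# Long format: one row per faculty type per year
staff_long <- staff %>%
  pivot_longer(cols = -faculty_type, names_to = "year", values_to = "percentage") %>%
  mutate(
    year = as.numeric(year),
    highlight = if_else(faculty_type == "Part-Time Faculty",
                        "Part-time faculty", "Other instructional staff")
  )

# One line per faculty type; colour only separates part-time from the rest
p <- ggplot(staff_long,
            aes(x = year, y = percentage, group = faculty_type, color = highlight)) +
  geom_line() +
  scale_color_manual(values = c("Part-time faculty" = "red",
                                "Other instructional staff" = "gray60")) +
  labs(x = "Year", y = "Percentage of instructional staff", color = NULL)
p
```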