HW 02 - Bike crashes

Photo by Andhika Soreng on Unsplash

Biking is the environmentally friendly way to commute, and a fun activity for kids. But bike crashes are no joke!

The data for this assignment comes from Durham Open Data and contains bike crashes between 2007 and 2014.

Note that this is the Durham in North Carolina, US, not the Durham in England.

Learning goals

The goal of this assignment is to keep you practicing your data visualization skills while also adding on tasks for data manipulation like filtering and transforming.

Getting help

If you have any questions about the assignment, please post them on Piazza!

Getting started

IMPORTANT: If there is no GitHub repo created for you for this assignment, it means I didn’t have your GitHub username as of when I assigned the homework. Please let me know your GitHub username asap, and I can create your repo.

Go to the course GitHub organization and locate your HW 2 repo, which should be named hw-02-bike-crash-YOUR_GITHUB_USERNAME. Grab the URL of the repo, and clone it in RStudio. Refer to Lab 01 if you would like to see step-by-step instructions for cloning a repo into an RStudio project.

First, open the R Markdown document hw-02-bike-crash.Rmd and Knit it. Make sure it compiles without errors. The output will be in the Markdown (.md) file with the same name.

Packages

We’ll use the tidyverse package for the analysis. This package is already installed for you, so you can load it as usual by running the following in your Console:

library(tidyverse)

Data

The data is in a CSV (comma separated values) file called ncbikecrash.csv in the data/ folder in your repository. You can load this file into R using the read_csv() function.

ncbikecrash <- read_csv("data/ncbikecrash.csv")
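To see what read_csv() does without touching the assignment data, here is a minimal sketch that writes a tiny stand-in CSV to a temporary file and reads it back. The file contents and the crash_year column are made up for illustration; only driver_est_speed is a name taken from this assignment.

```r
library(readr)
library(dplyr)

# Write a tiny stand-in CSV to a temporary file.
# These rows are hypothetical, not the real ncbikecrash data.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "crash_year,driver_est_speed",
  "2007,0-5 mph",
  "2014,6-10 mph"
), tmp)

# read_csv() guesses a type for each column and returns a tibble.
# show_col_types = FALSE only silences the column-type message.
toy <- read_csv(tmp, show_col_types = FALSE)

# glimpse() prints one line per column: its name, type, and first values.
glimpse(toy)
```

Running glimpse() on the real ncbikecrash object in the Console is a quick way to skim all of its variables at once.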

Below is the full data dictionary. Note that it is long (there are lots of variables in the data), but we will be using a limited set of the variables for our analysis.

  1. Load the data in the Console with the code above, and observe that an object called ncbikecrash has been added to your environment (in the Environment tab) in the top right. Click on this object to view the data in the data viewer. What does each row in the dataset represent?

Hint: The Markdown Quick Reference sheet has an example of inline R code that might be helpful. You can access it from the Help menu in RStudio. Last week you used nrow() to find the number of rows. Use ncol() for the number of columns.

  1. How many bike crashes were recorded in NC between 2007 and 2014? How many variables are recorded on these crashes? Use inline R code when answering this question.
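As a rough illustration of the hint about nrow(), ncol(), and inline R code, the sketch below uses a small made-up data frame named toy (a hypothetical stand-in, not the assignment data).

```r
library(tibble)

# Hypothetical stand-in data: three crashes, two variables.
toy <- tibble(
  crash_year = c(2007, 2010, 2014),
  driver_est_speed = c("0-5 mph", "6-10 mph", "0-5 mph")
)

nrow(toy)  # number of rows (observations)
ncol(toy)  # number of columns (variables)

# In the prose of an .Rmd file, inline R code sits between backticks
# that start with the letter r, for example:
#   There are `r nrow(toy)` crashes and `r ncol(toy)` variables.
# When the document is knit, each expression is replaced by its value.
```

The advantage of inline code over typing the numbers by hand is that the text stays correct if the data changes.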

✅ ⬆️ Now is a good time to commit and push your changes to GitHub with an appropriate commit message (e.g. “Dimensions of data”). Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

  1. How many bike crashes occurred in residential development areas where the driver was between 0 and 19 years old?
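Filtering on two conditions at once can be sketched as follows. The column names development and driver_age_group and their values are assumptions made for this toy example; check the data dictionary for the actual names and levels in ncbikecrash.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in data; names and values are assumptions.
toy <- tibble(
  development = c("Residential", "Commercial", "Residential", "Residential"),
  driver_age_group = c("0-19", "0-19", "20-24", "0-19")
)

# Conditions separated by commas in filter() must all be TRUE
# for a row to be kept (they are combined with "and").
young_residential <- toy %>%
  filter(development == "Residential", driver_age_group == "0-19")

# The number of remaining rows is the number of matching crashes.
nrow(young_residential)
```

Note the double equals sign (==) for comparison; a single = inside filter() produces an error.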

✅ ⬆️ This is again a good time to commit and push your changes to GitHub with an appropriate commit message (e.g. “Filter for residential and young driver”). Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Hint: See the help for the count() function, specifically the sort argument for reporting the frequency table in descending order of counts, i.e. highest on top.

  1. Create a frequency table of the estimated speed of the car (driver_est_speed) involved in the crash. What is the most common estimated speed range in the dataset?

Don’t forget to label your R chunk as well (where it says label-me-1). Your label should be short, informative, and shouldn’t include spaces. It also shouldn’t repeat a previous label, otherwise R Markdown will give you an error about repeated R chunk labels.
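The sort argument of count() mentioned in the hint can be tried out on a small example first. The speed values below are invented for illustration; only the column name driver_est_speed comes from the assignment.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in data with made-up speed ranges.
toy <- tibble(
  driver_est_speed = c("0-5 mph", "6-10 mph", "0-5 mph",
                       "11-15 mph", "0-5 mph", "6-10 mph")
)

# count() tallies rows per distinct value into a column called n.
# sort = TRUE orders the table by n, largest first.
speed_counts <- toy %>%
  count(driver_est_speed, sort = TRUE)

speed_counts
```

With sort = TRUE, the most common category is always the first row of the result.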

✅ ⬆️ Commit and push your changes again with an appropriate commit message (e.g. “Most common estimated speed”). Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

  1. Recreate the following plot, and describe in context of the data what it shows.

Don’t forget to label your R chunk as well (where it says label-me-2). Your label should be short, informative, shouldn’t include spaces, and shouldn’t repeat a previous label.

Play around with the fig.height and fig.width options in the R chunk definitions until you’re satisfied with the dimensions of the figure.

Hint: To match the colors, you can use scale_fill_viridis_d().
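To show how scale_fill_viridis_d() attaches to a plot, here is a minimal sketch of a filled bar chart on made-up data. It is not a reproduction of the target figure: the column bike_alc, the geom, the position, and all values are assumptions chosen only to demonstrate the color scale.

```r
library(ggplot2)
library(tibble)

# Hypothetical stand-in data; bike_alc and all values are assumptions.
toy <- tibble(
  bike_alc = c("No", "No", "Yes", "Yes", "No", "Yes"),
  crash_severity = c("A: Disabling Injury", "C: Possible Injury",
                     "B: Evident Injury", "A: Disabling Injury",
                     "B: Evident Injury", "C: Possible Injury")
)

# Map a categorical variable to fill, then add the discrete viridis
# scale as one more layer to swap in the viridis palette.
p <- ggplot(toy, aes(x = bike_alc, fill = crash_severity)) +
  geom_bar(position = "fill") +
  scale_fill_viridis_d() +
  labs(y = "Proportion", fill = "Crash severity")

p
```

The _d suffix stands for discrete; it is the right variant whenever the fill variable is categorical.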

✅ ⬆️ This is another good place to pause and commit your changes with a commit message like “Recreated crash severity and alcohol figure”. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Hint: Instead of changing the legend, change how the data are represented in the crash_severity variable with mutate().

  1. Recreate the same figure, but this time change the labels of the crash severity variable such that text like A:, B:, etc. doesn’t show up.

For this question you’ll need to add an R chunk, label it, and define preferences for the figure’s height and width.
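One possible way to change how values are represented with mutate() is to strip the letter prefix with a regular expression, sketched below on made-up labels. Using stringr's str_remove() is a choice made for this example, not necessarily the approach intended by the assignment, and the label text after each prefix is an assumption.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical stand-in data; the label text is an assumption.
toy <- tibble(
  crash_severity = c("A: Disabling Injury", "B: Evident Injury", "O: No Injury")
)

# "^[A-Z]: " matches one capital letter, a colon, and a space
# at the start of the string; str_remove() deletes that match.
toy_clean <- toy %>%
  mutate(crash_severity = str_remove(crash_severity, "^[A-Z]: "))

toy_clean
```

Because mutate() overwrites crash_severity here, any plot built from toy_clean picks up the cleaner labels in its legend automatically.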

Not sure how to use emojis on your computer? Maybe a classmate can help? Or you can ask on Piazza or at student hours!

✅ ⬆️ Yay, you’re done! Commit all remaining changes, use the commit message “Recreated figure with cleaner labels, done with HW 2! 💪”, and push. Before you wrap up the assignment, make sure all documents are updated on your GitHub repo.