Photo by Andhika Soreng on Unsplash
Biking is the environmentally friendly way to commute, and a fun activity for kids. But bike crashes are no joke!
Note, this is the Durham in North Carolina, US, not the Durham in England.
The data for this assignment comes from Durham Open Data. The data contains bike crashes between 2007 and 2014.
The goal of this assignment is to keep you practicing your data visualization skills while also adding on tasks for data manipulation like filtering and transforming.
If you have any questions about the assignment, please post them on Piazza!
IMPORTANT: If there is no GitHub repo created for you for this assignment, it means I didn’t have your GitHub username as of when I assigned the homework. Please let me know your GitHub username asap, and I can create your repo.
Go to the course GitHub organization and locate your HW 2 repo, which should be named `hw-02-bike-crash-YOUR_GITHUB_USERNAME`. Grab the URL of the repo, and clone it in RStudio. Refer to Lab 01 if you would like to see step-by-step instructions for cloning a repo into an RStudio project.
First, open the R Markdown document `hw-02-bike-crash.Rmd` and Knit it. Make sure it compiles without errors. The output will be in a Markdown (`.md`) file with the same name.
We’ll use the tidyverse package for the analysis, as usual. This package is already installed for you, so you can load it by running the following in your Console:
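A minimal sketch of that step, assuming the standard `library()` call for attaching an installed package:

```r
# Attach the tidyverse packages (ggplot2, dplyr, readr, and others)
library(tidyverse)
```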
The data is in a CSV (comma separated values) file called `ncbikecrash.csv` in the `data/` folder in your repository. You can load this file into R using the `read_csv()` function.
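A short sketch of loading the file, assuming your working directory is the root of the RStudio project (so that `data/` is directly reachable) and that the result is stored in an object named `ncbikecrash`:

```r
library(tidyverse)

# Path is relative to the project root, where the data/ folder lives
ncbikecrash <- read_csv("data/ncbikecrash.csv")
```

`read_csv()` prints a summary of the column types it guessed, which is a quick way to check that the file was read as expected.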
Below is the full data dictionary. Note that it is long (there are lots of variables in the data), but we will be using a limited set of the variables for our analysis.
- `object_id`: Crash ID
- `city`: City of crash
- `county`: County of crash
- `region`: Region of crash
- `development`: Development area of crash
- `locality`: Locality of crash
- `on_road`: Road where crash happened
- `rural_urban`: Whether crash happened on rural or urban road
- `speed_limit`: Speed limit where crash happened
- `traffic_control`: Type of traffic control where crash happened
- `weather`: Weather at the time of crash
- `workzone`: Whether crash happened in a work zone
- `bike_age`: Age of biker
- `bike_age_group`: Age group of biker
- `bike_alcohol`: Whether biker had alcohol
- `bike_alcohol_drugs`: Whether biker had alcohol or drugs
- `bike_direction`: Direction of bike at the time of crash
- `bike_injury`: Injury of biker
- `bike_position`: Position of bike at the time of crash
- `bike_race`: Race of biker
- `bike_sex`: Sex of biker
- `driver_age`: Age of driver
- `driver_age_group`: Age group of driver
- `driver_alcohol`: Whether driver had alcohol
- `driver_alcohol_drugs`: Whether driver had alcohol or drugs
- `driver_est_speed`: Estimated speed of driver
- `driver_injury`: Injury of driver
- `driver_race`: Race of driver
- `driver_sex`: Sex of driver
- `driver_vehicle_type`: Type of vehicle involved in crash
- `crash_alcohol`: Whether alcohol was involved in crash
- `crash_date`: Date of crash
- `crash_day`: Day of crash
- `crash_group`: Type of crash
- `crash_hour`: Hour of crash
- `crash_location`: Location of crash
- `crash_month`: Month of crash
- `crash_severity`: Severity of crash
- `crash_time`: Time of crash
- `crash_type`: Type of crash
- `crash_year`: Year of crash
- `ambulance_req`: Whether ambulance was required
- `hit_run`: Whether accident was a hit and run
- `light_condition`: Light condition at the time of crash
- `road_character`: Road characteristics
- `road_class`: Road class
- `road_condition`: Road condition
- `road_configuration`: Road configuration
- `road_defects`: Road defects
- `road_feature`: Road feature
- `road_surface`: Road surface
- `num_lanes`: Number of lanes
- `geo_point`: Latitude and longitude of crash
`ncbikecrash` has been added to your environment (in the Environment tab) in the top right. Click on this object to view the data in the data viewer. What does each row in the dataset represent?
Hint: The Markdown Quick Reference sheet has an example of inline R code that might be helpful. You can access it from the Help menu in RStudio. Last week you used `nrow()` to find the number of rows. Use `ncol()` for the number of columns.
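As a small illustration of the two functions in the hint, here is a sketch on a toy data frame. `toy_crashes` and its values are made up for this example and are not the real data:

```r
library(tibble)

# Toy stand-in for ncbikecrash (hypothetical values, not the real data)
toy_crashes <- tibble(
  object_id  = c(1, 2, 3),
  crash_year = c(2007, 2010, 2014)
)

nrow(toy_crashes)  # number of rows (observations)
ncol(toy_crashes)  # number of columns (variables)
```

In the text of an R Markdown document, the same calls can be written as inline R code, for example `` `r nrow(toy_crashes)` ``, so that the number is filled in when the document is knit.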
✅ ⬆️ Now is a good time to commit and push your changes to GitHub with an appropriate commit message (e.g. “Dimensions of data”). Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
✅ ⬆️ This is again a good time to commit and push your changes to GitHub with an appropriate commit message (e.g. “Filter for residential and young driver”). Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
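The commit message above refers to a filtering step. A minimal sketch of filtering on two conditions with `filter()`, using a toy data frame: the values `"Residential"` and `"0-19"` are assumptions for illustration, so check the actual levels in `ncbikecrash` before relying on them.

```r
library(tidyverse)

# Toy stand-in for ncbikecrash; the category labels are hypothetical
toy_crashes <- tibble(
  development      = c("Residential", "Commercial", "Residential"),
  driver_age_group = c("0-19", "0-19", "30-39")
)

# Conditions separated by commas are combined with AND
young_residential <- toy_crashes %>%
  filter(development == "Residential", driver_age_group == "0-19")

young_residential
```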
Hint: See the help for the `count()` function, specifically the `sort` argument for reporting the frequency table in descending order of counts, i.e. highest on top.
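A brief sketch of the `sort` argument described in the hint, on a toy data frame. The speed ranges below are invented for illustration and may not match the labels in the real data:

```r
library(tidyverse)

# Toy stand-in for ncbikecrash; speed range labels are hypothetical
toy_crashes <- tibble(
  driver_est_speed = c("0-5 mph", "6-10 mph", "6-10 mph", "11-15 mph", "6-10 mph")
)

# sort = TRUE orders the frequency table from most to least common
speed_counts <- toy_crashes %>%
  count(driver_est_speed, sort = TRUE)

speed_counts
```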
… (`driver_est_speed`) involved in the crash. What is the most common estimated speed range in the dataset?
Don’t forget to label your R chunk as well (where it says `label-me-1`). Your label should be short, informative, and shouldn’t include spaces. It also shouldn’t repeat a previous label, otherwise R Markdown will give you an error about repeated R chunk labels.
✅ ⬆️ Commit and push your changes again with an appropriate commit message (e.g. “Most common estimated speed”). Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Don’t forget to label your R chunk as well (where it says `label-me-2`). Your label should be short, informative, shouldn’t include spaces, and shouldn’t repeat a previous label.
Play around with the `fig.height` and `fig.width` options in the R chunk definitions until you’re satisfied with the dimensions of the figure.
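Per-chunk options go in the chunk header, for example `{r label-me-2, fig.height=4, fig.width=7}`. As a related sketch, the same two options can also be set as document-wide defaults with knitr. The values 4 and 7 (in inches) are arbitrary placeholders, not recommended settings:

```r
library(knitr)

# Document-wide defaults, in inches; a chunk's own options override these
opts_chunk$set(fig.height = 4, fig.width = 7)
```

Setting the options in the chunk header is usually the better fit here, since only that one figure is affected.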
Hint: To match the colors, you can use `scale_fill_viridis_d()`.
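To show where `scale_fill_viridis_d()` fits in a plot, here is a sketch on toy data. The choice of variables, the bar layout, and the category labels are assumptions for illustration, not the figure you are asked to recreate:

```r
library(ggplot2)

# Toy stand-in for ncbikecrash; category labels are hypothetical
toy_crashes <- data.frame(
  crash_severity = c("Severe", "Severe", "Minor", "Minor", "Minor"),
  crash_alcohol  = c("Yes", "No", "No", "No", "Yes")
)

# scale_fill_viridis_d() applies the discrete viridis palette to the fill aesthetic
p <- ggplot(toy_crashes, aes(x = crash_severity, fill = crash_alcohol)) +
  geom_bar(position = "fill") +
  scale_fill_viridis_d()

p
```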
✅ ⬆️ This is another good place to pause and commit changes with a commit message like “Recreated crash severity and alcohol figure”. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Hint: Instead of changing the legend, change how the data are represented in the `crash_severity` variable with `mutate()`.
… `A:`, `B:`, etc. doesn’t show up.
For this question you’ll need to add an R chunk, label it, and define preferences for the figure’s height and width.
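One possible way to follow the `mutate()` hint is to strip the letter prefix from each value before plotting. This is only a sketch: the severity labels below are hypothetical, and `str_remove()` with this pattern is one option among several.

```r
library(tidyverse)

# Toy stand-in for ncbikecrash; severity labels are hypothetical
toy_crashes <- tibble(
  crash_severity = c("A: Disabling Injury", "B: Evident Injury", "O: No Injury")
)

# Remove a leading capital letter, colon, and space (e.g. "A: ") from each value
cleaned <- toy_crashes %>%
  mutate(crash_severity = str_remove(crash_severity, "^[A-Z]: "))

cleaned
```

Because the change is made in the data, any plot built from `cleaned` picks up the shorter labels without further legend edits.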
Not sure how to use emojis on your computer? Maybe a classmate can help? Or you can ask on Piazza or student hours!
✅ ⬆️ Yay, you’re done! Commit all remaining changes, use the commit message “Recreated figure with cleaner labels, done with HW 2! 💪”, and push. Before you wrap up the assignment, make sure all documents are updated on your GitHub repo.