Given below are three data visualizations that violate many data visualization best practices. Improve these visualizations using R and the tips for effective visualizations that we introduced in class. You should produce one visualization per dataset. Your visualization should be accompanied by a brief paragraph describing the choices you made in your improvement, specifically discussing what you didn’t like in the original plots and why, and how you addressed those issues in the visualization you created.

On the due date you will give a brief presentation describing one of your improved visualizations and the reasoning for the choices you made.

Learning goals

Telling a story with data.
Data visualization best practices.
Reshaping data.
Merge conflicts

Hello (again) teams!

First things first: get to know your team members. You can find your team assignment for the rest of the semester here.

If there are any issues with the team roster, please let one of the tutors or the professor know asap!

Getting started

Go to the course GitHub organization and locate your Lab 04 repo, which should be named lab-04-ugly-charts-YOUR_TEAMNAME. Grab the URL of the repo, and clone it in RStudio. Refer to Lab 01 if you would like to see step-by-step instructions for cloning a repo into an RStudio project.

First, open the R Markdown document lab-04-ugly-charts.Rmd and Knit it. Make sure it compiles without errors. The output will be in a Markdown (.md) file with the same name.

Hello Git!

Before we can get started we need to take care of some required housekeeping. Specifically, we need to do some configuration so that RStudio can communicate with GitHub. This requires two pieces of information: your email address and your name.

Your email address should be the address tied to your GitHub account, and your name should be your first and last name.

Run the following (but update it for your name and email!) in the Console to configure git:

library(usethis)
use_git_config(user.name = "Your Name", 
               user.email = "your.email@address.com")

Workflow

This is the second week you’re working in teams, so we’re going to make things a little more interesting and let all of you make changes and push those changes to your team repository. Sometimes things will go swimmingly, and sometimes you’ll run into merge conflicts. So our first task today is to walk you through a merge conflict!

Merge conflicts

When two collaborators make changes to a file and push the file to their repo, git merges these two files.
If these two files have conflicting content on the same line, git will produce a merge conflict.
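To make this concrete, here is a minimal sketch of what a conflicted file tends to look like and how you might spot the conflict markers from R. The file contents, the team names, and the helper find_conflict_markers() are made up for illustration; git itself writes the <<<<<<<, =======, and >>>>>>> markers when it cannot merge two versions of the same line.

```r
# Hypothetical contents of an .Rmd file after a pull that produced a conflict.
# The title, team names, and commit id are invented for this example.
conflicted <- c(
  "---",
  "title: \"Lab 04 - Ugly charts\"",
  "<<<<<<< HEAD",
  "author: \"Team Pivot\"",
  "=======",
  "author: \"Team Banana\"",
  ">>>>>>> 1a2b3c4",
  "---"
)

# Illustrative helper (not part of any package): return the line numbers
# that hold git's conflict markers.
find_conflict_markers <- function(lines) {
  which(grepl("^(<<<<<<<|=======|>>>>>>>)", lines))
}

marker_lines <- find_conflict_markers(conflicted)
marker_lines
```

Here the markers sit on lines 3, 5, and 7. To clear a conflict like this one, keep the version you want, delete the other version together with all three marker lines, and then commit and push.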

Set up

Clone the repo for your next assignment in RStudio, and open the .Rmd file.
Assign each team member a number: 1, 2, 3, or 4.

Let’s cause a merge conflict!

Take turns in completing the exercise, only one member at a time.

Member 1: Change the team name to your actual team name, knit, commit, push.
🛑 Wait for instructions before moving on to the next step.
Member 2: Change the team name to some other word, knit, commit, push. You should get an error. Pull. Take a look at the document with the merge conflict. Clear the merge conflict by choosing the correct/preferred change. Commit, and push.
🛑 Wait for instructions before moving on to the next step.
Member 3: Add a label to the first code chunk, knit, commit, push. You should get an error. Pull. No merge conflicts should occur. Now push.
🛑 Wait for instructions before moving on to the next step.
Member 4: Add a different label to the first code chunk, knit, commit, push. You should get an error. Pull. Take a look at the document with the merge conflict. Clear the merge conflict by choosing the correct/preferred change. Commit, and push.
🛑 Wait for instructions before moving on to the next step.
All members: Pull, and observe the changes in your document.

Tips for collaborating via GitHub

Always pull first before you start working.
Commit, and push, often to minimize merge conflicts and/or to make merge conflicts easier to resolve.
If you find yourself in a situation that is difficult to resolve, ask questions asap; don’t let it linger and get bigger.

Packages

Run the following code in the Console to load this package.

library(tidyverse)

Take a sad plot and make it better

Instructional staff employment trends

The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. This report compiled by the AAUP shows trends in instructional staff employees between 1975 and 2011, and contains an image very similar to the one given below.

Let’s start by loading the data used to create this plot.

staff <- read_csv("data/instructional-staff.csv")

Each row in this dataset represents a faculty type, and the columns are the years for which we have data. The values are the percentage of hires of that type of faculty in each year.

## # A tibble: 5 x 12
##   faculty_type `1975` `1989` `1993` `1995` `1999` `2001` `2003` `2005`
##   <chr>         <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Full-Time T…   29     27.6   25     24.8   21.8   20.3   19.3   17.8
## 2 Full-Time T…   16.1   11.4   10.2    9.6    8.9    9.2    8.8    8.2
## 3 Full-Time N…   10.3   14.1   13.6   13.6   15.2   15.5   15     14.8
## 4 Part-Time F…   24     30.4   33.1   33.2   35.5   36     37     39.3
## 5 Graduate St…   20.5   16.5   18.1   18.8   18.7   19     20     19.9
## # … with 3 more variables: `2007` <dbl>, `2009` <dbl>, `2011` <dbl>

In order to recreate this visualization we first need to reshape the data to have one variable for faculty type and one variable for year. In other words, we will convert the data from the wide format to the long format.

But before we do so, a thought exercise: If the long data will have a row for each year/faculty type combination, and there are 5 faculty types and 11 years of data, how many rows will the data have?

We do the wide-to-long conversion using a new function: pivot_longer(). The animation below shows how this function works, as well as its counterpart pivot_wider().

The function has the following arguments:

pivot_longer(data, cols, names_to = "name")

The first argument is data as usual.
The second argument, cols, is where you specify which columns to pivot into longer format: in this case, all columns except faculty_type.
The third argument, names_to, is a string specifying the name of the column to create from the data stored in the column names of data: in this case, year.
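As a small illustration of these arguments, the sketch below pivots a made-up two-row tibble. The data, the object names toy and toy_long, and the use of values_to (an optional argument that names the column holding the cell values, which is called value by default) are assumptions for this example rather than part of the lab.

```r
library(tidyr)
library(tibble)

# Made-up wide data: one row per group, one column per year.
toy <- tibble(
  group  = c("a", "b"),
  `2001` = c(10, 30),
  `2002` = c(20, 40)
)

# Pivot every column except `group`: the old column names go into `year`
# and the cell values go into `pct`.
toy_long <- toy %>%
  pivot_longer(cols = -group, names_to = "year", values_to = "pct")

toy_long
```

The result has one row per group/year combination, and year is stored as character because it was built from column names.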

staff_long <- staff %>%
  pivot_longer(cols = -faculty_type, names_to = "year") %>%
  mutate(value = as.numeric(value))

Let’s take a look at what the new longer data frame looks like.

staff_long

## # A tibble: 55 x 3
##    faculty_type              year  value
##    <chr>                     <chr> <dbl>
##  1 Full-Time Tenured Faculty 1975   29  
##  2 Full-Time Tenured Faculty 1989   27.6
##  3 Full-Time Tenured Faculty 1993   25  
##  4 Full-Time Tenured Faculty 1995   24.8
##  5 Full-Time Tenured Faculty 1999   21.8
##  6 Full-Time Tenured Faculty 2001   20.3
##  7 Full-Time Tenured Faculty 2003   19.3
##  8 Full-Time Tenured Faculty 2005   17.8
##  9 Full-Time Tenured Faculty 2007   17.2
## 10 Full-Time Tenured Faculty 2009   16.8
## # … with 45 more rows

And now let’s plot it as a line plot. A possible approach for creating a line plot where we color the lines by faculty type is the following:

staff_long %>%
  ggplot(aes(x = year, y = value, color = faculty_type)) +
  geom_line()

## geom_path: Each group consists of only one observation. Do you need to
## adjust the group aesthetic?

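But note that this results in a message as well as an unexpected plot. The message is saying that each group contains only one observation: because year is stored as text, ggplot treats every faculty type and year combination as its own group, so there are no points to connect with a line. We can fix this using the group aesthetic, as follows.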

staff_long %>%
  ggplot(aes(x = year, y = value, group = faculty_type, color = faculty_type)) +
  geom_line()
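An alternative worth considering, sketched here on made-up data: because year came from column names it is stored as text, and ggplot treats it as a discrete variable. Converting it to a number makes the x-axis continuous, in which case ggplot groups the lines by the color variable without an explicit group aesthetic. The tibble toy_long and its values are invented for illustration.

```r
library(ggplot2)
library(dplyr)
library(tibble)

# Made-up long data in the same shape as staff_long.
toy_long <- tibble(
  faculty_type = rep(c("Type A", "Type B"), each = 3),
  year         = rep(c("1975", "1989", "1993"), times = 2),
  value        = c(10, 12, 15, 30, 25, 20)
)

# Convert year from text to a number so the x-axis is continuous.
p <- toy_long %>%
  mutate(year = as.numeric(year)) %>%
  ggplot(aes(x = year, y = value, color = faculty_type)) +
  geom_line()

p
```

A side effect of this approach is that unevenly spaced years (for example 1975 and 1989) are drawn at their true distance apart on the axis, which may or may not be what you want.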

Include the line plot you made above in your report and make sure the figure width is large enough to make it legible. Also fix the title, axis labels, and legend label.
Suppose the objective of this plot was to show that the proportion of part-time faculty has gone up over time compared to other instructional staff types. What changes would you propose making to this plot to tell this story? (You don’t need to implement these changes now; you will get to do that as part of this week’s homework. But work as a team to come up with ideas and list them as bullet points. The more precise you are, the easier your homework will be.)
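For the first task above, one possible starting point is labs(), which sets the title, axis labels, and legend title; the figure width can be set with the fig.width chunk option in R Markdown. The sketch below uses made-up data and placeholder label text, so treat it as a pattern to adapt rather than the answer.

```r
library(ggplot2)
library(tibble)

# Made-up data: two types observed in three years.
toy <- tibble(
  year  = rep(c(2001, 2002, 2003), times = 2),
  value = c(10, 12, 15, 30, 25, 20),
  type  = rep(c("Type A", "Type B"), each = 3)
)

# labs() takes one argument per label; `color` sets the legend title
# because the legend belongs to the color aesthetic.
p <- ggplot(toy, aes(x = year, y = value, color = type)) +
  geom_line() +
  labs(
    title = "Placeholder title",
    x     = "Year",
    y     = "Placeholder y-axis label",
    color = "Placeholder legend title"
  )

p
```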

✅ ⬆️ Commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Fisheries

The Fisheries and Aquaculture Department of the Food and Agriculture Organization of the United Nations collects data on the fisheries production of countries. This Wikipedia page lists fishery production of countries for 2016. For each country, the tonnage from capture and from aquaculture is listed. Note that countries whose total harvest was less than 100,000 tons are not included in the visualization.

A researcher shared with you the following visualization they created based on these data 😳.

Can you help them improve it? First, brainstorm how you would improve it. Then create the improved visualization and write up the changes/decisions you made as bullet points. It’s ok if some of your improvements are aspirational, i.e. you don’t know how to implement them, but you think they are a good idea. Ask a tutor for help, but also keep an eye on the time. Implement what you can and leave a note identifying the aspirational improvements.

fisheries <- read_csv("data/fisheries.csv")