NEW: Join RStudio Cloud here: https://rstudio.cloud/spaces/34062/join?access_code=%2FoMkTUWkzo8%2B7q86yRe0FlKhrcpmgwsk7sCNnNNH
And then click on new project from Git.
Given below are three data visualizations that violate many data visualization best practices. Improve these visualizations using R and the tips for effective visualizations that we introduced in class. You should produce one visualization per dataset. Your visualization should be accompanied by a brief paragraph describing the choices you made in your improvement, specifically discussing what you didn’t like in the original plots and why, and how you addressed these issues in the visualization you created.
On the due date you will give a brief presentation describing one of your improved visualizations and the reasoning for the choices you made.
First things first: get to know your team members. You can find your team assignment for the rest of the semester here.
If there are any issues with the team roster, please let one of the tutors or the professor know asap!
Go to the course GitHub organization and locate your Lab 04 repo, which should be named lab-04-ugly-charts-YOUR_TEAMNAME. Grab the URL of the repo, and clone it in RStudio. Refer to Lab 01 if you would like to see step-by-step instructions for cloning a repo into an RStudio project.
First, open the R Markdown document lab-04-ugly-charts.Rmd and Knit it. Make sure it compiles without errors. The output will be in the markdown (.md) file with the same name.
Before we can get started we need to take care of some required housekeeping. Specifically, we need to do some configuration so that RStudio can communicate with GitHub. This requires two pieces of information: your email address and your name.
Your email address should be the address tied to your GitHub account, and your name should be your first and last name.
Run the following (but update it for your name and email!) in the Console to configure git:
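The command itself is not reproduced here. One way to do this from the R Console, assuming the usethis package is installed, is sketched below. The name and email shown are placeholders, not real values.

```r
# A sketch, assuming the usethis package is installed.
# "Your Name" and the email address are placeholders: replace them with your own.
library(usethis)

use_git_config(
  user.name = "Your Name",
  user.email = "your.email@address.com"
)
```

By default this sets your user-level git configuration, so it only needs to be run once per machine or RStudio Cloud workspace.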
This is the second week you’re working in teams, so we’re going to make things a little more interesting and let all of you make changes and push those changes to your team repository. Sometimes things will go swimmingly, and sometimes you’ll run into merge conflicts. So our first task today is to walk you through a merge conflict!
Take turns completing the exercise, with only one member working at a time.
The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. This report compiled by the AAUP shows trends in instructional staff employees between 1975 and 2011, and contains an image very similar to the one given below.
Let’s start by loading the data used to create this plot.
Each row in this dataset represents a faculty type, and the columns are the years for which we have data. The values are the percentage of hires of that type of faculty for each year.
## # A tibble: 5 x 12
## faculty_type `1975` `1989` `1993` `1995` `1999` `2001` `2003` `2005`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Full-Time T… 29 27.6 25 24.8 21.8 20.3 19.3 17.8
## 2 Full-Time T… 16.1 11.4 10.2 9.6 8.9 9.2 8.8 8.2
## 3 Full-Time N… 10.3 14.1 13.6 13.6 15.2 15.5 15 14.8
## 4 Part-Time F… 24 30.4 33.1 33.2 35.5 36 37 39.3
## 5 Graduate St… 20.5 16.5 18.1 18.8 18.7 19 20 19.9
## # … with 3 more variables: `2007` <dbl>, `2009` <dbl>, `2011` <dbl>
In order to recreate this visualization we first need to reshape the data to have one variable for faculty type and one variable for year. In other words, we will convert the data from the wide format to the long format.
But before we do so, a thought exercise: If the long data will have a row for each year/faculty type combination, and there are 5 faculty types and 11 years of data, how many rows will the data have?
We do the wide to long conversion using a new function: pivot_longer(). The animation below shows how this function works, as well as its counterpart pivot_wider().
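As a stand-in for the animation, here is a minimal sketch of both functions on a toy subset of the data. It assumes the tidyverse is installed; the names toy_wide, toy_long, and toy_back and the shortened faculty labels are made up for illustration, while the values are copied from the table above.

```r
library(tidyverse)

# Toy wide data: 2 faculty types (labels shortened for illustration) and 3 years
toy_wide <- tibble(
  faculty_type = c("Tenured", "Part-Time"),
  `1975` = c(29, 24),
  `1989` = c(27.6, 30.4),
  `1993` = c(25, 33.1)
)

# Wide to long: one row per faculty type / year combination
toy_long <- toy_wide %>%
  pivot_longer(cols = -faculty_type, names_to = "year")

nrow(toy_long)  # 2 faculty types x 3 years = 6 rows

# Long back to wide: pivot_wider() undoes the reshaping
toy_back <- toy_long %>%
  pivot_wider(names_from = year, values_from = value)
```

The same counting logic applies to the thought exercise above: the number of rows in the long data is the number of faculty types times the number of years.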
The function has the following arguments:
The first argument is data, as usual.
The second argument, cols, is where you specify which columns to pivot into longer format; in this case, all columns except for faculty_type.
The third argument, names_to, is a string specifying the name of the column to create from the data stored in the column names of data; in this case, year.
staff_long <- staff %>%
  # pivot every column except faculty_type; the old column names go into "year"
  pivot_longer(cols = -faculty_type, names_to = "year") %>%
  mutate(value = as.numeric(value))
Let’s take a look at what the new longer data frame looks like.
## # A tibble: 55 x 3
## faculty_type year value
## <chr> <chr> <dbl>
## 1 Full-Time Tenured Faculty 1975 29
## 2 Full-Time Tenured Faculty 1989 27.6
## 3 Full-Time Tenured Faculty 1993 25
## 4 Full-Time Tenured Faculty 1995 24.8
## 5 Full-Time Tenured Faculty 1999 21.8
## 6 Full-Time Tenured Faculty 2001 20.3
## 7 Full-Time Tenured Faculty 2003 19.3
## 8 Full-Time Tenured Faculty 2005 17.8
## 9 Full-Time Tenured Faculty 2007 17.2
## 10 Full-Time Tenured Faculty 2009 16.8
## # … with 45 more rows
And now let’s plot it as a line plot. A possible approach for creating a line plot where we color the lines by faculty type is the following:
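The chunk for this first attempt is not reproduced here. A minimal sketch of what such an attempt might look like, on a toy long data frame, is given below. The names toy_long and p_first_try and the shortened faculty labels are assumptions for illustration; the values are taken from the data shown earlier.

```r
library(tidyverse)

# Toy long data: 2 faculty types x 3 years, with year stored as character
toy_long <- tibble(
  faculty_type = rep(c("Tenured", "Part-Time"), each = 3),
  year = rep(c("1975", "1989", "1993"), times = 2),
  value = c(29, 27.6, 25, 24, 30.4, 33.1)
)

# First attempt: color by faculty type, but no explicit group aesthetic
p_first_try <- toy_long %>%
  ggplot(aes(x = year, y = value, color = faculty_type)) +
  geom_line()

# Printing the plot triggers a message like the one shown below
p_first_try
```

Because year is a character variable, ggplot2 treats it as discrete and forms a separate group for every year and faculty type combination, so no group has more than one point to connect.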
## geom_path: Each group consists of only one observation. Do you need to
## adjust the group aesthetic?
But note that this results in a message as well as an unexpected plot. The message is saying that there is only one observation for each faculty type and year combination, so there is nothing for each line to connect. We can fix this using the group aesthetic as follows.
staff_long %>%
  ggplot(aes(x = year, y = value, group = faculty_type, color = faculty_type)) +
  geom_line()
Include the line plot you made above in your report and make sure the figure width is large enough to make it legible. Also fix the title, axis labels, and legend label.
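One way to set the title, axis labels, and legend label is with labs(). The sketch below uses toy data and placeholder label text; the wording of the labels, the object names, and the chunk label in the comment are assumptions for illustration, so choose your own.

```r
library(tidyverse)

# Toy long data (labels shortened for illustration; values from the data above)
toy_long <- tibble(
  faculty_type = rep(c("Tenured", "Part-Time"), each = 3),
  year = rep(c("1975", "1989", "1993"), times = 2),
  value = c(29, 27.6, 25, 24, 30.4, 33.1)
)

# In R Markdown, figure width is a chunk option, e.g. {r staff-lines, fig.width = 8}
# ("staff-lines" is a hypothetical chunk label)
p_labelled <- toy_long %>%
  ggplot(aes(x = year, y = value, group = faculty_type, color = faculty_type)) +
  geom_line() +
  labs(
    title = "Instructional staff employment trends",  # placeholder title
    x = "Year",
    y = "Percentage of hires",
    color = "Faculty type"                            # legend label
  )

p_labelled
```

The legend label is set through the name of the aesthetic that creates the legend, which is color here.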
Suppose the objective of this plot was to show that the proportion of part-time faculty has gone up over time compared to other instructional staff types. What changes would you propose making to this plot to tell this story? (You don’t need to implement these changes now; you will get to do that as part of this week’s homework. But work as a team to come up with ideas and list them as bullet points. The more precise you are, the easier your homework will be.)
✅ ⬆️ Commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
The Fisheries and Aquaculture Department of the Food and Agriculture Organization of the United Nations collects data on the fisheries production of countries. This Wikipedia page lists the fishery production of countries for 2016. For each country, the tonnage from capture and from aquaculture is listed. Note that countries whose total harvest was less than 100,000 tons are not included in the visualization.
A researcher shared with you the following visualization they created based on these data 😳.
✅ ⬆️ Commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
Go back through your write-up to make sure you’re following the coding style guidelines we discussed in class. Make any edits as needed.
Also, make sure all of your R chunks are properly labeled, and your figures are reasonably sized.
Once the team leader for the week pushes their final changes, others should pull the changes and knit the R Markdown document to confirm that they can reproduce the report.
Want to see more ugly charts?