Any questions?
"The simple graph has brought more information to the data analyst’s mind than any other device." — John Tukey
The "gg" in "ggplot2" stands for Grammar of Graphics.
A grammar of graphics is a tool that enables us to concisely describe the components of a graphic.
Source: BloggoType
ggplot(data = starwars, mapping = aes(x = height, y = mass)) + geom_point() + labs(title = "Mass vs. height of Starwars characters", x = "Height (cm)", y = "Weight (kg)")
## Warning: Removed 28 rows containing missing values (geom_point).
What does geom_smooth() do?
ggplot(data = starwars, mapping = aes(x = height, y = mass)) + geom_point() + geom_smooth() + labs(title = "Mass vs. height of Starwars characters", x = "Height (cm)", y = "Weight (kg)")
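geom_smooth() overlays a fitted trend line on the points. Called with no arguments it chooses a smoothing method itself and draws a confidence band around the line. The sketch below is one possible variation, assuming a straight-line fit is of interest; the object name p_smooth is only illustrative.

```r
library(ggplot2)
library(dplyr)  # provides the starwars data

# Same scatterplot, but with the smoothing method chosen explicitly:
# method = "lm" fits a straight line, se = FALSE hides the confidence band.
p_smooth <- ggplot(data = starwars, mapping = aes(x = height, y = mass)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

p_smooth
```

Comparing this plot with the default geom_smooth() output shows how much the choice of method shapes the line.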
ggplot() is the main function in ggplot2
ggplot(data = [dataset], mapping = aes(x = [x-variable], y = [y-variable])) + geom_xxx() + other options
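To make the template concrete, here is a minimal sketch that fills each slot one step at a time. The intermediate object p and the axis labels are illustrative choices, not part of the template.

```r
library(ggplot2)
library(dplyr)  # provides the starwars data

# Data and aesthetic mappings only: this draws empty axes, no points yet.
p <- ggplot(data = starwars, mapping = aes(x = height, y = mass))

# geom_xxx(): add a geometry layer that draws the observations as points.
p <- p + geom_point()

# Other options: labels, themes, scales, and so on.
p <- p + labs(x = "Height (cm)", y = "Mass (kg)")

p
```

Each `+` adds one component, so a plot can be built up and inspected piece by piece.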
library(tidyverse)
starwars
## # A tibble: 87 x 13
##    name  height  mass hair_color skin_color eye_color birth_year gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
##  1 Luke…    172    77 blond      fair       blue            19   male
##  2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>
##  3 R2-D2     96    32 <NA>       white, bl… red             33   <NA>
##  4 Dart…    202   136 none       white      yellow          41.9 male
##  5 Leia…    150    49 brown      light      brown           19   female
##  6 Owen…    178   120 brown, gr… light      blue            52   male
##  7 Beru…    165    75 brown      light      blue            47   female
##  8 R5-D4     97    32 <NA>       white, red red             NA   <NA>
##  9 Bigg…    183    84 black      light      brown           24   male
## 10 Obi-…    182    77 auburn, w… fair       blue-gray       57   male
## # … with 77 more rows, and 5 more variables: homeworld <chr>,
## #   species <chr>, films <list>, vehicles <list>, starships <list>
Take a glimpse at the data:
glimpse(starwars)
## Observations: 87
## Variables: 13
## $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "L…
## $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, …
## $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.…
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "bro…
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "lig…
## $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "…
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, …
## $ gender     <chr> "male", NA, NA, "male", "female", "male", "female", N…
## $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaa…
## $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human",…
## $ films      <list> [<"Revenge of the Sith", "Return of the Jedi", "The …
## $ vehicles   <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <…
## $ starships  <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanc…
How many rows and columns does this dataset have? What does each row represent? What does each column represent?
?starwars
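Besides reading the printed output and the help page, the dimensions can be queried directly. This is a small sketch using base R helpers; note that the number of columns may differ from the 13 shown above depending on the installed dplyr version.

```r
library(dplyr)  # provides the starwars data

# Number of rows: one row per character.
nrow(starwars)

# Number of columns: one column per recorded attribute.
ncol(starwars)

# Column names, useful for checking what each column represents.
names(starwars)
```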
ggplot(data = starwars, mapping = aes(x = height, y = mass)) + geom_point()
## Warning: Removed 28 rows containing missing values (geom_point).
ggplot(data = starwars, mapping = aes(x = height, y = mass)) + geom_point() + labs(title = "Mass vs. height of Starwars characters", x = "Height (cm)", y = "Weight (kg)")
How would you describe this relationship? What other variables would help us understand data points that don't follow the overall trend? Who is the not-so-tall but really chubby character?
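One way to identify a point that sits far from the trend is to look it up in the data. The sketch below assumes the point in question is simply the character with the largest mass; the object name heaviest is illustrative.

```r
library(dplyr)  # provides the starwars data

# Sort characters from heaviest to lightest (missing masses go last)
# and keep only the top row with a few identifying columns.
heaviest <- starwars %>%
  arrange(desc(mass)) %>%
  select(name, height, mass, species) %>%
  head(1)

heaviest
```

Columns such as species give context that height and mass alone do not.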
We can map additional variables to various features of the plot:
Visual characteristics of plotting characters that can be mapped to a specific variable in the data are:
color
size
shape
alpha (transparency)
ggplot(data = starwars, mapping = aes(x = height, y = mass, color = gender)) + geom_point()
ggplot(data = starwars, mapping = aes(x = height, y = mass, color = gender, size = birth_year)) + geom_point()
Let's now increase the size of all points to a fixed value, rather than basing it on a variable in the data:
ggplot(data = starwars, mapping = aes(x = height, y = mass, color = gender)) + geom_point(size = 2)
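The difference between mapping and setting an aesthetic can be sketched side by side. The object names and the color "steelblue" are arbitrary choices for illustration.

```r
library(ggplot2)
library(dplyr)  # provides the starwars data

# Mapping: color depends on a variable, so it goes inside aes()
# and ggplot2 builds a legend for it.
p_mapped <- ggplot(data = starwars,
                   mapping = aes(x = height, y = mass, color = gender)) +
  geom_point()

# Setting: one fixed color and size for every point, so the values
# go in the geom, outside aes(), and no legend is drawn.
p_set <- ggplot(data = starwars, mapping = aes(x = height, y = mass)) +
  geom_point(color = "steelblue", size = 2)

p_mapped
p_set
```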
aesthetics | discrete | continuous |
---|---|---|
color | rainbow of colors | gradient |
size | discrete steps | linear mapping between point area and value (scale_radius() maps radius instead) |
shape | different shape for each | shouldn't (and doesn't) work |
ggplot(data = starwars, mapping = aes(x = height, y = mass)) + facet_grid(. ~ gender) + geom_point() + labs(title = "Mass vs. height of Starwars characters", subtitle = "Faceted by gender")
In the next few slides describe what each plot displays. Think about how the code relates to the output.
The plots in the next few slides do not have proper titles, axis labels, etc. because we want you to figure out what's happening in the plots. But you should always label your plots!
ggplot(data = starwars, mapping = aes(x = height, y = mass)) + geom_point() + facet_grid(gender ~ .)
ggplot(data = starwars, mapping = aes(x = height, y = mass)) + geom_point() + facet_grid(. ~ gender)
ggplot(data = starwars, mapping = aes(x = height, y = mass)) + geom_point() + facet_wrap(~ eye_color)
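facet_grid() can also split on two variables at once, one in rows and one in columns. The sketch below assumes a subset with only two species so the grid stays small; the subset and the object names are illustrative choices.

```r
library(ggplot2)
library(dplyr)  # provides the starwars data

# Keep two species so the grid has only a few panels.
humans_droids <- starwars %>%
  filter(species %in% c("Human", "Droid"))

# Rows are split by species, columns by gender.
p_grid <- ggplot(data = humans_droids, mapping = aes(x = height, y = mass)) +
  geom_point() +
  facet_grid(species ~ gender)

p_grid
```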
facet_grid(): rows ~ cols; use . for no split
facet_wrap(): 1d ribbon wrapped into 2d
Center: mean (mean), median (median), mode (not always useful)
Spread: range (range), standard deviation (sd), inter-quartile range (IQR)
ggplot(data = starwars, mapping = aes(x = height)) + geom_histogram(binwidth = 10)
## Warning: Removed 6 rows containing non-finite values (stat_bin).
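The measures of center and spread named above can be computed for height with the functions listed. This is a minimal sketch; the column names in the summary are illustrative, and na.rm = TRUE is needed because some heights are missing, as the warning indicates.

```r
library(dplyr)  # provides the starwars data

height_summary <- starwars %>%
  summarise(
    mean_height   = mean(height, na.rm = TRUE),    # center
    median_height = median(height, na.rm = TRUE),  # center
    sd_height     = sd(height, na.rm = TRUE),      # spread
    iqr_height    = IQR(height, na.rm = TRUE),     # spread
    min_height    = min(height, na.rm = TRUE),     # range, lower end
    max_height    = max(height, na.rm = TRUE)      # range, upper end
  )

height_summary
```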
ggplot(data = starwars, mapping = aes(x = height)) + geom_density()
## Warning: Removed 6 rows containing non-finite values (stat_density).
ggplot(data = starwars, mapping = aes(y = height, x = gender)) + geom_boxplot()
## Warning: Removed 6 rows containing non-finite values (stat_boxplot).
ggplot(data = starwars, mapping = aes(x = gender)) + geom_bar()
ggplot(data = starwars, mapping = aes(x = gender, fill = hair_color)) + geom_bar()
starwars <- starwars %>% mutate(hair_color2 = fct_other(hair_color, keep = c("black", "brown", "blond")))
ggplot(data = starwars, mapping = aes(x = gender, fill = hair_color2)) + geom_bar() + coord_flip()
ggplot(data = starwars, mapping = aes(x = gender, fill = hair_color2)) + geom_bar(position = "fill") + coord_flip() + labs(y = "proportion")
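As a rough check on what position = "fill" displays, the within-gender proportions can be computed directly. This is a sketch rather than what ggplot2 does internally; hair_color2 is recreated here so the snippet is self-contained, and hair_props is an illustrative name.

```r
library(dplyr)    # provides the starwars data
library(forcats)  # provides fct_other()

hair_props <- starwars %>%
  # Lump every hair color outside the kept set into "Other".
  mutate(hair_color2 = fct_other(hair_color,
                                 keep = c("black", "brown", "blond"))) %>%
  # Count characters in each gender / hair color combination.
  count(gender, hair_color2) %>%
  # Convert counts to proportions within each gender.
  group_by(gender) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

hair_props
```

Within each gender the prop values sum to 1, which is why every bar in the filled plot has the same length.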
Which bar plot is a more useful representation for visualizing the relationship between gender and hair color?
Any questions?