class: center, middle, inverse, title-slide # Scientific studies and confounding
### Dr. Çetinkaya-Rundel --- layout: true <div class="my-footer"> <span> Dr. Mine Çetinkaya-Rundel - <a href="https://introds.org" target="_blank">introds.org </a> </span> </div> --- ## Week 5 - Scientific studies, confounding, planning, and effective communication - Project proposals - Course and team evals - Team meetings - Regrade requests .question[ .large[ Any questions? ] ] --- class: center, middle # Scientific studies --- ## Scientific studies .pull-left[ **Observational** - Collect data in a way that does not interfere with how the data arise ("observe") - Can only establish an association ] .pull-right[ **Experimental** - Randomly assign subjects to treatments - Can establish causal connections ] <br> -- .question[ 👥 Design a study comparing average energy levels of people who do and do not exercise -- both as an observational study and as an experiment. ]
--- ## Study: Breakfast cereal keeps girls slim .midi[ *Girls who ate breakfast of any type had a lower average body mass index, a common obesity gauge, than those who said they didn't. The index was even lower for girls who said they ate cereal for breakfast, according to findings of the study conducted by the Maryland Medical Research Institute with funding from the National Institutes of Health (NIH) and cereal-maker General Mills.* [...] *The results were gleaned from a larger NIH survey of 2,379 girls in California, Ohio, and Maryland who were tracked between the ages of 9 and 19.* [...] *As part of the survey, the girls were asked once a year what they had eaten during the previous three days.* [...] ] .question[ What is the explanatory and what is the response variable? ] .footnote[ Source: [Study: Cereal Keeps Girls Slim](https://www.cbsnews.com/news/study-cereal-keeps-girls-slim/), Retrieved Sep 13, 2018. ] --- ### 3 possible explanations -- - Eating breakfast causes girls to be slimmer -- - Being slim causes girls to eat breakfast -- - A third variable is responsible for both -- a **confounding** variable: an extraneous variable that affects both the explanatory and the response variable, and that makes it seem like there is a relationship between them --- ## Correlation != causation <img src="img/xkcdcorrelation.png" width="80%" height="50%" style="display: block; margin: auto;" /> .footnote[ Randall Munroe CC BY-NC 2.5 http://xkcd.com/552/ ] --- ## Studies and conclusions <img src="img/random_sample_assign_grid.png" width="80%" height="50%" style="display: block; margin: auto;" /> --- class: center, middle # Conditional probability --- ## Conditional probability **Notation**: `\(P(A | B)\)`: Probability of event A given event B - What is the probability that it will be unseasonably warm tomorrow? - What is the probability that it will be unseasonably warm tomorrow, given that it was unseasonably warm today?
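
For reference, the notation above is usually defined as follows, provided `\(P(B) > 0\)`. This is the standard textbook definition; it is not stated on the original slide.

`\[P(A | B) = \frac{P(A \text{ and } B)}{P(B)}\]`
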
--- .midi[ A July 2019 YouGov survey asked 1633 GB and 1333 USA randomly selected adults which of the following statements about the global environment best describes their view: .small[ - The climate is changing and human activity is mainly responsible - The climate is changing and human activity is partly responsible, together with other factors - The climate is changing but human activity is not responsible at all - The climate is not changing ] The distribution of the responses by country of respondent is shown below. ] <br> .small[ <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> The climate is changing and human activity is mainly responsible </th> <th style="text-align:right;"> The climate is changing and human activity is partly responsible, together with other factors </th> <th style="text-align:right;"> The climate is changing but human activity is not responsible at all </th> <th style="text-align:right;"> The climate is not changing </th> <th style="text-align:right;"> Don't know </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> GB </td> <td style="text-align:right;width: 0.5 in; "> 833 </td> <td style="text-align:right;width: 0.5 in; "> 604 </td> <td style="text-align:right;width: 0.5 in; "> 49 </td> <td style="text-align:right;width: 0.5 in; "> 33 </td> <td style="text-align:right;"> 114 </td> <td style="text-align:right;"> 1633 </td> </tr> <tr> <td style="text-align:left;"> US </td> <td style="text-align:right;width: 0.5 in; "> 507 </td> <td style="text-align:right;width: 0.5 in; "> 493 </td> <td style="text-align:right;width: 0.5 in; "> 120 </td> <td style="text-align:right;width: 0.5 in; "> 80 </td> <td style="text-align:right;"> 133 </td> <td style="text-align:right;"> 1333 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;width: 0.5 in; "> 1340 </td> <td style="text-align:right;width: 0.5 in; "> 1097 </td> <td 
style="text-align:right;width: 0.5 in; "> 169 </td> <td style="text-align:right;width: 0.5 in; "> 113 </td> <td style="text-align:right;"> 247 </td> <td style="text-align:right;"> 2966 </td> </tr> </tbody> </table> ] .footnote[ Source: [YouGov - International Climate Change Survey](https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/epjj0nusce/YouGov%20-%20International%20climate%20change%20survey.pdf) ] --- .question[ 👥 - What percent of (1) all respondents, (2) GB respondents, (3) US respondents think the climate is changing and human activity is mainly responsible? - Based on the percentages you calculate, does there appear to be a relationship between country and beliefs about climate change? Explain your reasoning. - If yes, could there be another variable that explains this relationship? ] .small[ <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> The climate is changing and human activity is mainly responsible </th> <th style="text-align:right;"> The climate is changing and human activity is partly responsible, together with other factors </th> <th style="text-align:right;"> The climate is changing but human activity is not responsible at all </th> <th style="text-align:right;"> The climate is not changing </th> <th style="text-align:right;"> Don't know </th> <th style="text-align:right;"> Sum </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> GB </td> <td style="text-align:right;width: 0.5 in; "> 833 </td> <td style="text-align:right;width: 0.5 in; "> 604 </td> <td style="text-align:right;width: 0.5 in; "> 49 </td> <td style="text-align:right;width: 0.5 in; "> 33 </td> <td style="text-align:right;"> 114 </td> <td style="text-align:right;"> 1633 </td> </tr> <tr> <td style="text-align:left;"> US </td> <td style="text-align:right;width: 0.5 in; "> 507 </td> <td style="text-align:right;width: 0.5 in; "> 493 </td> <td style="text-align:right;width: 0.5 in; "> 120 </td> <td
style="text-align:right;width: 0.5 in; "> 80 </td> <td style="text-align:right;"> 133 </td> <td style="text-align:right;"> 1333 </td> </tr> <tr> <td style="text-align:left;"> Sum </td> <td style="text-align:right;width: 0.5 in; "> 1340 </td> <td style="text-align:right;width: 0.5 in; "> 1097 </td> <td style="text-align:right;width: 0.5 in; "> 169 </td> <td style="text-align:right;width: 0.5 in; "> 113 </td> <td style="text-align:right;"> 247 </td> <td style="text-align:right;"> 2966 </td> </tr> </tbody> </table> ]
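
The percentages asked for above are conditional proportions: each count is divided by the total of the group being conditioned on. A minimal sketch in base R, with the counts copied from the table; the object name `climate` and the short column labels are made up for this illustration and do not come from the survey.

```r
# Counts copied from the YouGov table (rows: country, columns: response).
# The short response labels are hypothetical stand-ins for the full wording.
climate <- matrix(
  c(833, 604,  49, 33, 114,
    507, 493, 120, 80, 133),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    country  = c("GB", "US"),
    response = c("mainly", "partly", "not_responsible",
                 "not_changing", "dont_know")
  )
)

# (1) P(mainly responsible) among all respondents: column total / grand total
p_all <- sum(climate[, "mainly"]) / sum(climate)

# (2), (3) P(mainly responsible | country): cell count / row total
p_by_country <- climate[, "mainly"] / rowSums(climate)

round(c(all = p_all, p_by_country), 3)
```

With these counts the proportions come out to roughly 0.452 overall, 0.510 for GB, and 0.380 for the US.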
--- ## Independence .question[ 👥 Inspired by the previous example and how we used the conditional probabilities to make conclusions, come up with a definition of independent events. If easier, you can keep the context limited to the example (independence/dependence of beliefs about climate change and country), but try to push yourself to make a more general statement. ]
--- class: center, middle # Simpson's paradox --- ## Relationships between variables - Relationship between two variables: Fitness `\(\rightarrow\)` Heart health - Relationship between multiple variables: Calories + Age + Fitness `\(\rightarrow\)` Heart health --- ## Relationship between two variables <table> <tbody> <tr> <td style="text-align:left;font-weight: bold;border-right:1px solid;"> x </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 9 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;border-right:1px solid;"> y </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 8 </td> </tr> </tbody> </table> -- <img src="w5_d1-studies-confounding_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> --- ## Relationship between two variables <table> <tbody> <tr> <td style="text-align:left;font-weight: bold;border-right:1px solid;"> x </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 9 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;border-right:1px solid;"> y </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 11 
</td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 8 </td> </tr> </tbody> </table> <img src="w5_d1-studies-confounding_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> --- ## Considering a third variable <table> <tbody> <tr> <td style="text-align:left;font-weight: bold;border-right:1px solid;"> x </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 6 </td> <td style="text-align:left;"> 7 </td> <td style="text-align:left;"> 8 </td> <td style="text-align:left;"> 9 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;border-right:1px solid;"> y </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 11 </td> <td style="text-align:left;"> 10 </td> <td style="text-align:left;"> 9 </td> <td style="text-align:left;"> 8 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;border-right:1px solid;"> z </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> B </td> <td style="text-align:left;"> B </td> <td style="text-align:left;"> B </td> <td style="text-align:left;"> B </td> </tr> </tbody> </table> <img src="w5_d1-studies-confounding_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> --- ## Relationship between three variables <table> <tbody> <tr> <td style="text-align:left;font-weight: bold;border-right:1px solid;"> x </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> 6 </td> <td 
style="text-align:left;"> 7 </td> <td style="text-align:left;"> 8 </td> <td style="text-align:left;"> 9 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;border-right:1px solid;"> y </td> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 11 </td> <td style="text-align:left;"> 10 </td> <td style="text-align:left;"> 9 </td> <td style="text-align:left;"> 8 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;border-right:1px solid;"> z </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> B </td> <td style="text-align:left;"> B </td> <td style="text-align:left;"> B </td> <td style="text-align:left;"> B </td> </tr> </tbody> </table> <img src="w5_d1-studies-confounding_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" /> --- ## Simpson's paradox - Not considering an important variable when studying a relationship can result in **Simpson's paradox**. - Simpson's paradox illustrates the effect the omission of an explanatory variable can have on the measure of association between another explanatory variable and a response variable. - In other words, the inclusion of a third variable in the analysis can change the apparent relationship between the other two variables. --- ## Berkeley admission data - Study carried out by the Graduate Division of the University of California, Berkeley in the early 1970s to evaluate whether there was a sex bias in graduate admissions. - The data come from six departments. For confidentiality we'll call them A-F. - We have information on whether the applicant was male or female and whether they were admitted or rejected. - First, we will evaluate whether the percentage of males admitted is indeed higher than that of females, overall.
Next, we will calculate the same percentage for each department. --- ## Data ``` ## # A tibble: 4,526 x 3 ## admit sex dept ## <chr> <chr> <chr> ## 1 Admitted Male A ## 2 Admitted Male A ## 3 Admitted Male A ## 4 Admitted Male A ## 5 Admitted Male A ## 6 Admitted Male A ## 7 Admitted Male A ## 8 Admitted Male A ## 9 Admitted Male A ## 10 Admitted Male A ## # … with 4,516 more rows ``` --- ## Skim the data ```r library(skimr) *skim(ucb_admit) ``` ``` ## Skim summary statistics ## n obs: 4526 ## n variables: 3 ## ## ── Variable type:character ─────────────────────────────────────────────────────────── ## variable missing complete n min max empty n_unique ## admit 0 4526 4526 8 8 0 2 ## dept 0 4526 4526 1 1 0 6 ## sex 0 4526 4526 4 6 0 2 ``` --- ## Overall sex distribution .question[ What can you say about the overall sex distribution? Hint: Calculate the following probabilities: `\(P(Admit | Male)\)` and `\(P(Admit | Female)\)`. ] ```r ucb_admit %>% count(sex, admit) ``` ``` ## # A tibble: 4 x 3 ## sex admit n ## <chr> <chr> <int> ## 1 Female Admitted 557 ## 2 Female Rejected 1278 ## 3 Male Admitted 1198 ## 4 Male Rejected 1493 ``` --- ## Overall sex distribution .question[ What type of visualization would be appropriate for representing these data? ] ```r ucb_admit %>% count(sex, admit) %>% group_by(sex) %>% mutate(prop_admit = n / sum(n)) ``` ``` ## # A tibble: 4 x 4 ## # Groups: sex [2] ## sex admit n prop_admit ## <chr> <chr> <int> <dbl> ## 1 Female Admitted 557 0.304 ## 2 Female Rejected 1278 0.696 ## 3 Male Admitted 1198 0.445 ## 4 Male Rejected 1493 0.555 ``` --- ## Overall sex distribution ```r ggplot(ucb_admit, mapping = aes(x = sex, fill = admit)) + geom_bar(position = "fill") + labs(y = "", title = "Admit by sex") ``` ![](w5_d1-studies-confounding_files/figure-html/unnamed-chunk-21-1.png)<!-- --> --- ## Sex distribution, by department .question[ What can you say about the sex distribution by department?
] ```r ucb_admit %>% count(dept, sex, admit) ``` ``` ## # A tibble: 24 x 4 ## dept sex admit n ## <chr> <chr> <chr> <int> ## 1 A Female Admitted 89 ## 2 A Female Rejected 19 ## 3 A Male Admitted 512 ## 4 A Male Rejected 313 ## 5 B Female Admitted 17 ## 6 B Female Rejected 8 ## 7 B Male Admitted 353 ## 8 B Male Rejected 207 ## 9 C Female Admitted 202 ## 10 C Female Rejected 391 ## # … with 14 more rows ``` --- ## Sex distribution, by department .question[ 👥 Let's try again... What can you say about the sex distribution by department? ] ```r ucb_admit %>% count(dept, sex, admit) %>% pivot_wider(names_from = dept, values_from = n) ``` ``` ## # A tibble: 4 x 8 ## sex admit A B C D E F ## <chr> <chr> <int> <int> <int> <int> <int> <int> ## 1 Female Admitted 89 17 202 131 94 24 ## 2 Female Rejected 19 8 391 244 299 317 ## 3 Male Admitted 512 353 120 138 53 22 ## 4 Male Rejected 313 207 205 279 138 351 ```
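
The slides do not show how `ucb_admit` is created. The counts printed above match R's built-in `UCBAdmissions` table, so one possible way to follow along is sketched below. It assumes `ucb_admit` is that table expanded to one row per applicant, and it then puts the female and male admission rates side by side for each department. The object names `overall` and `by_dept` are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# Assumption: ucb_admit is the built-in UCBAdmissions table
# (columns Admit, Gender, Dept, Freq) expanded to one row per applicant.
ucb_admit <- as.data.frame(UCBAdmissions, stringsAsFactors = FALSE) %>%
  uncount(Freq) %>%
  transmute(admit = Admit, sex = Gender, dept = Dept) %>%
  as_tibble()

# Overall admission rate by sex
overall <- ucb_admit %>%
  group_by(sex) %>%
  summarise(prop_admit = mean(admit == "Admitted"))

# Admission rate by sex within each department, one column per sex
by_dept <- ucb_admit %>%
  group_by(dept, sex) %>%
  summarise(prop_admit = mean(admit == "Admitted")) %>%
  ungroup() %>%
  pivot_wider(names_from = sex, values_from = prop_admit)

overall
by_dept

# Number of departments where the female rate exceeds the male rate
sum(by_dept$Female > by_dept$Male)
```

Under this assumption the overall rate is lower for females, while the per-department comparison goes the other way in four of the six departments.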
--- ## Sex distribution, by department .question[ What type of visualization would be appropriate for representing these data? ] .small[ ```r ucb_admit %>% count(dept, sex, admit) %>% group_by(dept, sex) %>% mutate(perc_admit = n / sum(n)) %>% filter(admit == "Admitted") ``` ``` ## # A tibble: 12 x 5 ## # Groups: dept, sex [12] ## dept sex admit n perc_admit ## <chr> <chr> <chr> <int> <dbl> ## 1 A Female Admitted 89 0.824 ## 2 A Male Admitted 512 0.621 ## 3 B Female Admitted 17 0.68 ## 4 B Male Admitted 353 0.630 ## 5 C Female Admitted 202 0.341 ## 6 C Male Admitted 120 0.369 ## 7 D Female Admitted 131 0.349 ## 8 D Male Admitted 138 0.331 ## 9 E Female Admitted 94 0.239 ## 10 E Male Admitted 53 0.277 ## 11 F Female Admitted 24 0.0704 ## 12 F Male Admitted 22 0.0590 ``` ] --- ## Sex distribution, by department ```r ggplot(ucb_admit, mapping = aes(x = sex, fill = admit)) + geom_bar(position = "fill") + facet_grid(. ~ dept) + labs(x = "Sex", y = "", fill = "Admission", title = "Admit by sex by department") ``` ![](w5_d1-studies-confounding_files/figure-html/unnamed-chunk-26-1.png)<!-- --> --- ## Sex distribution, by department .small[ ```r ggplot(ucb_admit, mapping = aes(x = sex, fill = admit)) + geom_bar(position = "fill") + scale_y_continuous(labels = percent) + facet_wrap(. ~ dept) + coord_flip() + labs(x = "", y = "", fill = "", title = "Admissions by sex and department") + theme(legend.position = "bottom") ``` ![](w5_d1-studies-confounding_files/figure-html/unnamed-chunk-27-1.png)<!-- --> ] --- ![](w5_d1-studies-confounding_files/figure-html/unnamed-chunk-28-1.png)<!-- --> ![](w5_d1-studies-confounding_files/figure-html/unnamed-chunk-29-1.png)<!-- --> --- class: center, middle # group_by --- ## What does group_by() do? 
`group_by()` takes an existing `tbl` and converts it into a grouped `tbl` where operations are performed "by group": .pull-left[ ```r ucb_admit ``` ``` ## # A tibble: 4,526 x 3 ## admit sex dept ## <chr> <chr> <chr> ## 1 Admitted Male A ## 2 Admitted Male A ## 3 Admitted Male A ## 4 Admitted Male A ## 5 Admitted Male A ## 6 Admitted Male A ## 7 Admitted Male A ## 8 Admitted Male A ## 9 Admitted Male A ## 10 Admitted Male A ## # … with 4,516 more rows ``` ] .pull-right[ ```r ucb_admit %>% group_by(sex) ``` ``` ## # A tibble: 4,526 x 3 ## # Groups: sex [2] ## admit sex dept ## <chr> <chr> <chr> ## 1 Admitted Male A ## 2 Admitted Male A ## 3 Admitted Male A ## 4 Admitted Male A ## 5 Admitted Male A ## 6 Admitted Male A ## 7 Admitted Male A ## 8 Admitted Male A ## 9 Admitted Male A ## 10 Admitted Male A ## # … with 4,516 more rows ``` ] --- ## What does group_by() not do? `group_by()` does not sort the data, `arrange()` does: .pull-left[ ```r ucb_admit %>% group_by(sex) ``` ``` ## # A tibble: 4,526 x 3 ## # Groups: sex [2] ## admit sex dept ## <chr> <chr> <chr> ## 1 Admitted Male A ## 2 Admitted Male A ## 3 Admitted Male A ## 4 Admitted Male A ## 5 Admitted Male A ## 6 Admitted Male A ## 7 Admitted Male A ## 8 Admitted Male A ## 9 Admitted Male A ## 10 Admitted Male A ## # … with 4,516 more rows ``` ] .pull-right[ ```r ucb_admit %>% arrange(sex) ``` ``` ## # A tibble: 4,526 x 3 ## admit sex dept ## <chr> <chr> <chr> ## 1 Admitted Female A ## 2 Admitted Female A ## 3 Admitted Female A ## 4 Admitted Female A ## 5 Admitted Female A ## 6 Admitted Female A ## 7 Admitted Female A ## 8 Admitted Female A ## 9 Admitted Female A ## 10 Admitted Female A ## # … with 4,516 more rows ``` ] --- ## What does group_by() not do?
`group_by()` does not create frequency tables, `count()` does: .pull-left[ ```r ucb_admit %>% group_by(sex) ``` ``` ## # A tibble: 4,526 x 3 ## # Groups: sex [2] ## admit sex dept ## <chr> <chr> <chr> ## 1 Admitted Male A ## 2 Admitted Male A ## 3 Admitted Male A ## 4 Admitted Male A ## 5 Admitted Male A ## 6 Admitted Male A ## 7 Admitted Male A ## 8 Admitted Male A ## 9 Admitted Male A ## 10 Admitted Male A ## # … with 4,516 more rows ``` ] .pull-right[ ```r ucb_admit %>% count(sex) ``` ``` ## # A tibble: 2 x 2 ## sex n ## <chr> <int> ## 1 Female 1835 ## 2 Male 2691 ``` ] --- ## Undo grouping with ungroup() .pull-left[ ```r ucb_admit %>% count(sex, admit) %>% group_by(sex) %>% mutate(prop_admit = n / sum(n)) %>% select(sex, prop_admit) ``` ``` ## # A tibble: 4 x 2 ## # Groups: sex [2] ## sex prop_admit ## <chr> <dbl> ## 1 Female 0.304 ## 2 Female 0.696 ## 3 Male 0.445 ## 4 Male 0.555 ``` ] .pull-right[ ```r ucb_admit %>% count(sex, admit) %>% group_by(sex) %>% mutate(prop_admit = n / sum(n)) %>% select(sex, prop_admit) %>% ungroup() ``` ``` ## # A tibble: 4 x 2 ## sex prop_admit ## <chr> <dbl> ## 1 Female 0.304 ## 2 Female 0.696 ## 3 Male 0.445 ## 4 Male 0.555 ``` ] --- class: center, middle # count --- ## count() is a short-hand `count()` is a short-hand for `group_by()` and then `summarise()` to count the number of observations in each group: .pull-left[ ```r ucb_admit %>% group_by(sex) %>% summarise(n = n()) ``` ``` ## # A tibble: 2 x 2 ## sex n ## <chr> <int> ## 1 Female 1835 ## 2 Male 2691 ``` ] .pull-right[ ```r ucb_admit %>% count(sex) ``` ``` ## # A tibble: 2 x 2 ## sex n ## <chr> <int> ## 1 Female 1835 ## 2 Male 2691 ``` ] --- ## count can take multiple arguments .pull-left[ ```r ucb_admit %>% group_by(sex, admit) %>% summarise(n = n()) ``` ``` ## # A tibble: 4 x 3 ## # Groups: sex [2] ## sex admit n ## <chr> <chr> <int> ## 1 Female Admitted 557 ## 2 Female Rejected 1278 ## 3 Male Admitted 1198 ## 4 Male Rejected 1493 ``` ] .pull-right[ ```r ucb_admit
%>% count(sex, admit) ``` ``` ## # A tibble: 4 x 3 ## sex admit n ## <chr> <chr> <int> ## 1 Female Admitted 557 ## 2 Female Rejected 1278 ## 3 Male Admitted 1198 ## 4 Male Rejected 1493 ``` ] --- .question[ What is the difference between the two outputs? ] .small[ .pull-left[ ```r ucb_admit %>% group_by(sex, admit) %>% summarise(n = n()) ``` ``` ## # A tibble: 4 x 3 ## # Groups: sex [2] ## sex admit n ## <chr> <chr> <int> ## 1 Female Admitted 557 ## 2 Female Rejected 1278 ## 3 Male Admitted 1198 ## 4 Male Rejected 1493 ``` ] .pull-right[ ```r ucb_admit %>% count(sex, admit) ``` ``` ## # A tibble: 4 x 3 ## sex admit n ## <chr> <chr> <int> ## 1 Female Admitted 557 ## 2 Female Rejected 1278 ## 3 Male Admitted 1198 ## 4 Male Rejected 1493 ``` ] ] <br><br> -- - `count()` ungroups after itself - `summarise()` peels off one layer of grouping - If the `# Groups:` header shows a question mark in place of the group count (some dplyr versions print `sex [?]`), it just means that the number of groups is unknown right now; it will only be computed when/if the next line is executed
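
One way to check the first two points without reading the printed header is to ask dplyr which grouping variables remain on each result. A minimal sketch on a small made-up tibble; `toy`, `by_summarise`, and `by_count` are hypothetical names, not objects from the slides.

```r
library(dplyr)

# Small made-up data set standing in for ucb_admit
toy <- tibble(
  sex   = c("Female", "Female", "Male", "Male", "Male"),
  admit = c("Admitted", "Rejected", "Admitted", "Rejected", "Rejected")
)

# group_by() + summarise(): the last grouping variable (admit) is peeled off,
# so the result is still grouped by sex
by_summarise <- toy %>%
  group_by(sex, admit) %>%
  summarise(n = n())

# count(): same counts, but the result carries no grouping
by_count <- toy %>%
  count(sex, admit)

group_vars(by_summarise)  # grouping variables left after summarise()
group_vars(by_count)      # grouping variables left after count()
```

Here `group_vars(by_summarise)` returns `"sex"`, while `group_vars(by_count)` returns an empty character vector. This is why `ungroup()` matters after `summarise()` if later steps should act on the whole table.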