class: center, middle, inverse, title-slide # Communicating data science results effectively
π ### Dr.Β Γetinkaya-Rundel --- layout: true <div class="my-footer"> <span> Dr. Mine Γetinkaya-Rundel - <a href="https://introds.org" target="_blank">introds.org </a> </span> </div> --- ## So far this week... - Designing studies, multivariable relationships, interpreting results - Check your HW 4 repo for styling suggestions! - Mid-semester feedback - due Friday - Team peer evaluations - due next Wednesday - Project: Find a dataset, and do something with it - Proposal: - Find a dataset, and come up with a question you want to answer with it - Perform preliminary EDA to show the question is answerable - More details to follow in this week's assignment .question[ .large[ Any questions? ] ] --- class: center, middle # Simpson's paradox --- ## Berkeley admission data - Study carried out by the graduate Division of the University of California, Berkeley in the early 70βs to evaluate whether there was a sex bias in graduate admissions. - The data come from six departments. For confidentiality we'll call them A-F. - We have information on whether the applicant was male or female and whether they were admitted or rejected. - First, we will evaluate whether the percentage of males admitted is indeed higher than females, overall. Next, we will calculate the same percentage for each department. --- ## Data ``` ## # A tibble: 4,526 x 3 ## admit sex dept ## <chr> <chr> <chr> ## 1 Admitted Male A ## 2 Admitted Male A ## 3 Admitted Male A ## 4 Admitted Male A ## 5 Admitted Male A ## 6 Admitted Male A ## 7 Admitted Male A ## 8 Admitted Male A ## 9 Admitted Male A ## 10 Admitted Male A ## # β¦ with 4,516 more rows ``` --- ## Skim the data ```r library(skimr) *skim(ucb_admit) ``` ``` ## Skim summary statistics ## n obs: 4526 ## n variables: 3 ## ## ββ Variable type:character βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ## variable missing complete n min max empty n_unique ## admit 0 4526 4526 8 8 0 2 ## dept 0 4526 4526 1 1 0 6 ## sex 0 4526 4526 4 6 0 2 ``` --- ## Overall sex distribution .question[ What can you say about the overall sex distribution? What type of visualization would be appropriate for representing these data? ] ```r ucb_admit %>% count(sex, admit) %>% group_by(sex) %>% mutate(prop_admit = n / sum(n)) ``` ``` ## # A tibble: 4 x 4 ## # Groups: sex [2] ## sex admit n prop_admit ## <chr> <chr> <int> <dbl> ## 1 Female Admitted 557 0.304 ## 2 Female Rejected 1278 0.696 ## 3 Male Admitted 1198 0.445 ## 4 Male Rejected 1493 0.555 ``` --- ## Overall sex distribution ```r ggplot(ucb_admit, mapping = aes(x = sex, fill = admit)) + geom_bar(position = "fill") + labs(y = "", title = "Admit by sex") ``` ![](w5_d2-effective-communication_files/figure-html/unnamed-chunk-5-1.png)<!-- --> --- ## Sex distribution, by department ```r ggplot(ucb_admit, mapping = aes(x = sex, fill = admit)) + geom_bar(position = "fill") + facet_grid(. ~ dept) + labs(x = "Sex", y = "", fill = "Admission", title = "Admit by sex by department") ``` ![](w5_d2-effective-communication_files/figure-html/unnamed-chunk-6-1.png)<!-- --> --- ![](w5_d2-effective-communication_files/figure-html/unnamed-chunk-7-1.png)<!-- --> ![](w5_d2-effective-communication_files/figure-html/unnamed-chunk-8-1.png)<!-- --> --- class: center, middle # group_by --- ## What does `group_by()` do? `group_by()` takes an existing `tbl` and converts it into a grouped `tbl` where operations are performed "by group": .pull-left[ ```r ucb_admit ``` ``` ## # A tibble: 4,526 x 3 ## admit sex dept ## <chr> <chr> <chr> ## 1 Admitted Male A ## 2 Admitted Male A ## 3 Admitted Male A ## 4 Admitted Male A ## 5 Admitted Male A ## 6 Admitted Male A ## 7 Admitted Male A ## 8 Admitted Male A ## 9 Admitted Male A ## 10 Admitted Male A ## # β¦ with 4,516 more rows ``` ] .pull-right[ ```r ucb_admit %>% group_by(sex) ``` ``` ## # A tibble: 4,526 x 3 ## # Groups: sex [2] ## admit sex dept ## <chr> <chr> <chr> ## 1 Admitted Male A ## 2 Admitted Male A ## 3 Admitted Male A ## 4 Admitted Male A ## 5 Admitted Male A ## 6 Admitted Male A ## 7 Admitted Male A ## 8 Admitted Male A ## 9 Admitted Male A ## 10 Admitted Male A ## # β¦ with 4,516 more rows ``` ] --- ## What does `group_by()` not do? `group_by()` does not sort the data, `arrange()` does: .pull-left[ ```r ucb_admit %>% group_by(sex) ``` ``` ## # A tibble: 4,526 x 3 ## # Groups: sex [2] ## admit sex dept ## <chr> <chr> <chr> ## 1 Admitted Male A ## 2 Admitted Male A ## 3 Admitted Male A ## 4 Admitted Male A ## 5 Admitted Male A ## 6 Admitted Male A ## 7 Admitted Male A ## 8 Admitted Male A ## 9 Admitted Male A ## 10 Admitted Male A ## # β¦ with 4,516 more rows ``` ] .pull-right[ ```r ucb_admit %>% arrange(sex) ``` ``` ## # A tibble: 4,526 x 3 ## admit sex dept ## <chr> <chr> <chr> ## 1 Admitted Female A ## 2 Admitted Female A ## 3 Admitted Female A ## 4 Admitted Female A ## 5 Admitted Female A ## 6 Admitted Female A ## 7 Admitted Female A ## 8 Admitted Female A ## 9 Admitted Female A ## 10 Admitted Female A ## # β¦ with 4,516 more rows ``` ] --- ## What does `group_by()` not do? `group_by()` does not create frequency tables, `count()` does: .pull-left[ ```r ucb_admit %>% group_by(sex) ``` ``` ## # A tibble: 4,526 x 3 ## # Groups: sex [2] ## admit sex dept ## <chr> <chr> <chr> ## 1 Admitted Male A ## 2 Admitted Male A ## 3 Admitted Male A ## 4 Admitted Male A ## 5 Admitted Male A ## 6 Admitted Male A ## 7 Admitted Male A ## 8 Admitted Male A ## 9 Admitted Male A ## 10 Admitted Male A ## # β¦ with 4,516 more rows ``` ] .pull-right[ ```r ucb_admit %>% count(sex) ``` ``` ## # A tibble: 2 x 2 ## sex n ## <chr> <int> ## 1 Female 1835 ## 2 Male 2691 ``` ] --- ## Undo grouping with `ungroup()` .pull-left[ ```r ucb_admit %>% count(sex, admit) %>% group_by(sex) %>% mutate(prop_admit = n / sum(n)) %>% select(sex, prop_admit) ``` ``` ## # A tibble: 4 x 2 ## # Groups: sex [2] ## sex prop_admit ## <chr> <dbl> ## 1 Female 0.304 ## 2 Female 0.696 ## 3 Male 0.445 ## 4 Male 0.555 ``` ] .pull-right[ ```r ucb_admit %>% count(sex, admit) %>% group_by(sex) %>% mutate(prop_admit = n / sum(n)) %>% select(sex, prop_admit) %>% ungroup() ``` ``` ## # A tibble: 4 x 2 ## sex prop_admit ## <chr> <dbl> ## 1 Female 0.304 ## 2 Female 0.696 ## 3 Male 0.445 ## 4 Male 0.555 ``` ] --- class: center, middle # `count()` --- ## `count()` is a short-hand `count()` is a short-hand for `group_by()` and then `summarise()` to count the number of observations in each group: .pull-left[ ```r ucb_admit %>% group_by(sex) %>% summarise(n = n()) ``` ``` ## # A tibble: 2 x 2 ## sex n ## <chr> <int> ## 1 Female 1835 ## 2 Male 2691 ``` ] .pull-right[ ```r ucb_admit %>% count(sex) ``` ``` ## # A tibble: 2 x 2 ## sex n ## <chr> <int> ## 1 Female 1835 ## 2 Male 2691 ``` ] --- ## `count()` can take multiple arguments .pull-left[ ```r ucb_admit %>% group_by(sex, admit) %>% summarise(n = n()) ``` ``` ## # A tibble: 4 x 3 ## # Groups: sex [2] ## sex admit n ## <chr> <chr> <int> ## 1 Female Admitted 557 ## 2 Female Rejected 1278 ## 3 Male Admitted 1198 ## 4 Male Rejected 1493 ``` ] .pull-right[ ```r ucb_admit %>% count(sex, admit) ``` ``` ## # A tibble: 4 x 3 ## sex admit n ## <chr> <chr> <int> ## 1 Female Admitted 557 ## 2 Female Rejected 1278 ## 3 Male Admitted 1198 ## 4 Male Rejected 1493 ``` ] --- ## `count()` vs. `group_by() %>% summarise()` - `count()` ungroups after itself - `summarise()` peels off one layer of grouping - The question mark just means that the number of groups is unkown right now, it will only be computed when/if the next line is executed .small[ .pull-left[ ```r ucb_admit %>% group_by(sex, admit) %>% summarise(n = n()) ``` ``` ## # A tibble: 4 x 3 ## # Groups: sex [2] ## sex admit n ## <chr> <chr> <int> ## 1 Female Admitted 557 ## 2 Female Rejected 1278 ## 3 Male Admitted 1198 ## 4 Male Rejected 1493 ``` ] .pull-right[ ```r ucb_admit %>% count(sex, admit) ``` ``` ## # A tibble: 4 x 3 ## sex admit n ## <chr> <chr> <int> ## 1 Female Admitted 557 ## 2 Female Rejected 1278 ## 3 Male Admitted 1198 ## 4 Male Rejected 1493 ``` ] ] --- class: center, middle # Communicating data science results effectively --- # Five core activities of data analysis 1. Stating and refining the question 1. Exploring the data 1. Building formal statistical models 1. Interpreting the results 1. Communicating the results .footnote[ Peng, Roger D., and Elizabeth Matsui. "The Art of Data Science." A Guide for Anyone Who Works with Data. Skybrude Consulting, LLC (2015). ] --- class: center, middle # Stating and refining the question --- # Six types of questions 1. **Descriptive:** summarize a characteristic of a set of data 1. **Exploratory:** analyze to see if there are patterns, trends, or relationships between variables (hypothesis generating) 1. **Inferential:** analyze patterns, trends, or relationships in representative data from a population 1. **Predictive:** make predictions for individuals or groups of individuals 1. **Causal:** whether changing one factor will change another factor, on average, in a population 1. **Mechanistic:** explore "how" as opposed to whether .footnote[ Leek, Jeffery T., and Roger D. Peng. "What is the question?." Science 347.6228 (2015): 1314-1315. ] --- # Ex: Viral illnesses 1. **Descriptive:** frequency of viral illnesses in a set of data collected from a group of individuals -- 1. **Exploratory:** examine relationships between a range of dietary factors and viral illnesses -- 1. **Inferential:** examine whether any relationship between dietary factors and viral illnesses found in the sample hold for the population at large -- 1. **Predictive:** what types of people will eat a diet high in fresh fruits and vegetables during the next year -- 1. **Causal:** whether people who were randomly assigned to eat a diet high in fresh fruits and vegetables or one that was low in fresh fruits and vegetables contract viral illnesses -- 1. **Mechanistic:** how a diet high in fresh fruits and vegetables leads to a reduction in the number of viral illnesses --- # Questions to data science problems - Do you have appropriate data to answer your question? - Do you have information on confounding variables? - Was the data you're working with collected in a way that introduces bias? .question[ Suppose I want to estimate the average number of children in households in Edinburgh. I conduct a survey at an elementary school in Edinburgh and ask students at this elementary school how many children, including themselves, live in their house. Then, I take the average of the responses. Is this a biased or an unbiased estimate of the number of children in households in Edinburgh? If biased, will the value be an overestimate or underestimate? ] --- class: center, middle # Exploratory data analysis --- ## Checklist - Formulate your question - Read in your data - Check the dimensions - Look at the top and the bottom of your data - Validate with at least one external data source - Make a plot - Try the easy solution first --- ## Formulate your question - Consider scope: - Are air pollution levels higher on the east coast than on the west coast? - Are hourly ozone levels on average higher in New York City than they are in Los Angeles? - Do counties in the eastern United States have higher ozone levels than counties in the western United States? - Most importantly: "Do I have the right data to answer this question?" --- ## Read in your data - Place your data in a folder called `data` - Read it into R with `read_csv()` or friends (`read_delim()`, `read_excel()`, etc.) ```r library(readxl) fav_food <- read_excel("data/favourite-food.xlsx") ``` --- ## `clean_names()` If the variable names are malformatted, use `janitor::clean_names()` .small[ ```r fav_food ``` ``` ## # A tibble: 5 x 4 ## `Student ID` `Full Name` favourite.food mealPlan ## <dbl> <chr> <chr> <chr> ## 1 1 Sunil Huffmann Strawberry yoghurt Lunch only ## 2 2 Barclay Lynn French fries Lunch only ## 3 3 Jayendra Lyne Peaches Breakfast and lunch ## 4 4 Leon Rossini Anchovies Lunch only ## 5 5 Chidiegwu Dunkel Pizza Breakfast and lunch ``` ```r library(janitor) fav_food %>% clean_names() ``` ``` ## # A tibble: 5 x 4 ## student_id full_name favourite_food meal_plan ## <dbl> <chr> <chr> <chr> ## 1 1 Sunil Huffmann Strawberry yoghurt Lunch only ## 2 2 Barclay Lynn French fries Lunch only ## 3 3 Jayendra Lyne Peaches Breakfast and lunch ## 4 4 Leon Rossini Anchovies Lunch only ## 5 5 Chidiegwu Dunkel Pizza Breakfast and lunch ``` ] --- ## Case study: NYC Squirrels! - The Squirrel Census (https://www.thesquirrelcensus.com/) is a multimedia science, design, and storytelling project focusing on the Eastern gray (Sciurus carolinensis). They count squirrels and present their findings to the public. - This table contains squirrel data for each of the 3,023 sightings, including location coordinates, age, primary and secondary fur color, elevation, activities, communications, and interactions between squirrels and with humans. ```r #install_github("mine-cetinkaya-rundel/nycsquirrels18") library(nycsquirrels18) ``` --- ## Locate the codebook [mine-cetinkaya-rundel.github.io/nycsquirrels18/reference/squirrels.html](https://mine-cetinkaya-rundel.github.io/nycsquirrels18/reference/squirrels.html) <br><br> -- ## Check the dimensions ```r dim(squirrels) ``` ``` ## [1] 3023 38 ``` --- ## Look at the top... .small[ ```r squirrels %>% head() ``` ``` ## # A tibble: 6 x 38 ## long lat unique_squirrelβ¦ hectare NS EW shift date ## <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> ## 1 -74.0 40.8 13A-PM-1014-04 13A 13 A PM 1.01e7 ## 2 -74.0 40.8 15F-PM-1010-06 15F 15 F PM 1.01e7 ## 3 -74.0 40.8 19C-PM-1018-02 19C 19 C PM 1.02e7 ## 4 -74.0 40.8 21B-AM-1019-04 21B 21 B AM 1.02e7 ## 5 -74.0 40.8 23A-AM-1018-02 23A 23 A AM 1.02e7 ## 6 -74.0 40.8 38H-PM-1012-01 38H 38 H PM 1.01e7 ## # β¦ with 30 more variables: hectare_squirrel_number <dbl>, age <chr>, ## # primary_fur_color <chr>, highlight_fur_color <chr>, ## # combination_of_primary_and_highlight_color <chr>, color_notes <chr>, ## # location <chr>, above_ground_sighter_measurement <chr>, ## # specific_location <chr>, running <lgl>, chasing <lgl>, climbing <lgl>, ## # eating <lgl>, foraging <lgl>, other_activities <chr>, kuks <lgl>, ## # quaas <lgl>, moans <lgl>, tail_flags <lgl>, tail_twitches <lgl>, ## # approaches <lgl>, indifferent <lgl>, runs_from <lgl>, ## # other_interactions <chr>, zip_codes <dbl>, community_districts <dbl>, ## # borough_boundaries <dbl>, city_council_districts <dbl>, ## # police_precincts <dbl>, where <chr> ``` ] --- ## ...and the bottom .small[ ```r squirrels %>% tail() ``` ``` ## # A tibble: 6 x 38 ## long lat unique_squirrelβ¦ hectare NS EW shift date ## <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> ## 1 -74.0 40.8 6D-PM-1020-01 06D 06 D PM 1.02e7 ## 2 -74.0 40.8 21H-PM-1018-01 21H 21 H PM 1.02e7 ## 3 -74.0 40.8 31D-PM-1006-02 31D 31 D PM 1.01e7 ## 4 -74.0 40.8 37B-AM-1018-04 37B 37 B AM 1.02e7 ## 5 -74.0 40.8 21C-PM-1006-01 21C 21 C PM 1.01e7 ## 6 -74.0 40.8 7G-PM-1018-04 07G 07 G PM 1.02e7 ## # β¦ with 30 more variables: hectare_squirrel_number <dbl>, age <chr>, ## # primary_fur_color <chr>, highlight_fur_color <chr>, ## # combination_of_primary_and_highlight_color <chr>, color_notes <chr>, ## # location <chr>, above_ground_sighter_measurement <chr>, ## # specific_location <chr>, running <lgl>, chasing <lgl>, climbing <lgl>, ## # eating <lgl>, foraging <lgl>, other_activities <chr>, kuks <lgl>, ## # quaas <lgl>, moans <lgl>, tail_flags <lgl>, tail_twitches <lgl>, ## # approaches <lgl>, indifferent <lgl>, runs_from <lgl>, ## # other_interactions <chr>, zip_codes <dbl>, community_districts <dbl>, ## # borough_boundaries <dbl>, city_council_districts <dbl>, ## # police_precincts <dbl>, where <chr> ``` ] --- ## Validate with at least one external data source .pull-left[ ``` ## long lat ## 1 -73.97430 40.77705 ## 2 -73.96805 40.77734 ## 3 -73.96881 40.78120 ## 4 -73.96886 40.78378 ## 5 -73.96901 40.78534 ## 6 -73.95312 40.79411 ## 7 -73.97722 40.76884 ## 8 -73.95658 40.79925 ## 9 -73.97680 40.77436 ## 10 -73.97529 40.77393 ## 11 -73.97077 40.78004 ## 12 -73.96931 40.77110 ## 13 -73.97296 40.76526 ## 14 -73.97073 40.77239 ## 15 -73.97566 40.76725 ## 16 -73.96865 40.77058 ## 17 -73.97495 40.76678 ## 18 -73.96279 40.79394 ## 19 -73.96837 40.78385 ## 20 -73.97674 40.77351 ## 21 -73.97683 40.77178 ## 22 -73.96196 40.79228 ## 23 -73.96068 40.79173 ## 24 -73.97716 40.76930 ## 25 -73.97993 40.76761 ## 26 -73.95896 40.79517 ## 27 -73.97339 40.76563 ## 28 -73.96181 40.79441 ## 29 -73.95407 40.79552 ## 30 -73.96134 40.79099 ## 31 -73.96557 40.77637 ## 32 -73.97036 40.76785 ## 33 -73.96943 40.78430 ## 34 -73.96237 40.79218 ## 35 -73.97143 40.77695 ## 36 -73.97305 40.77009 ## 37 -73.96261 40.79051 ## 38 -73.96828 40.77245 ## 39 -73.96963 40.77041 ## 40 -73.95694 40.79354 ## 41 -73.96420 40.78908 ## 42 -73.96641 40.78880 ## 43 -73.95559 40.79695 ## 44 -73.95308 40.79599 ## 45 -73.95668 40.79904 ## 46 -73.97416 40.77525 ## 47 -73.95778 40.79938 ## 48 -73.96897 40.77120 ## 49 -73.98028 40.76928 ## 50 -73.97951 40.76914 ## 51 -73.96362 40.79275 ## 52 -73.95480 40.79455 ## 53 -73.95841 40.79123 ## 54 -73.97121 40.76987 ## 55 -73.95526 40.79581 ## 56 -73.95760 40.79361 ## 57 -73.95642 40.79575 ## 58 -73.95737 40.79798 ## 59 -73.97107 40.77788 ## 60 -73.95568 40.78885 ## 61 -73.96777 40.78470 ## 62 -73.97142 40.77750 ## 63 -73.96124 40.79448 ## 64 -73.95206 40.79751 ## 65 -73.97132 40.77141 ## 66 -73.98101 40.76839 ## 67 -73.97780 40.76686 ## 68 -73.96835 40.78158 ## 69 -73.97677 40.77419 ## 70 -73.97358 40.77863 ## 71 -73.96552 40.77649 ## 72 -73.96847 40.78401 ## 73 -73.97186 40.77659 ## 74 -73.97798 40.76982 ## 75 -73.95611 40.79491 ## 76 -73.97060 40.78321 ## 77 -73.95412 40.79318 ## 78 -73.97572 40.76682 ## 79 -73.97032 40.77007 ## 80 -73.97431 40.76879 ## 81 -73.96747 40.77688 ## 82 -73.95330 40.79201 ## 83 -73.97555 40.76809 ## 84 -73.97955 40.77095 ## 85 -73.97658 40.76928 ## 86 -73.96326 40.78174 ## 87 -73.96927 40.78305 ## 88 -73.95866 40.79635 ## 89 -73.97016 40.77787 ## 90 -73.97638 40.77061 ## 91 -73.97699 40.77076 ## 92 -73.96486 40.78045 ## 93 -73.97625 40.77502 ## 94 -73.95838 40.78522 ## 95 -73.95760 40.79937 ## 96 -73.96582 40.77982 ## 97 -73.97272 40.77953 ## 98 -73.96763 40.77880 ## 99 -73.95802 40.79387 ## 100 -73.97780 40.76763 ## 101 -73.95964 40.79719 ## 102 -73.95989 40.79734 ## 103 -73.97487 40.77608 ## 104 -73.95681 40.78990 ## 105 -73.95571 40.79745 ## 106 -73.95657 40.79002 ## 107 -73.97798 40.76752 ## 108 -73.96629 40.78433 ## 109 -73.95796 40.79186 ## 110 -73.96935 40.77704 ## 111 -73.97096 40.77669 ## 112 -73.95875 40.79825 ## 113 -73.95455 40.79839 ## 114 -73.95303 40.79209 ## 115 -73.96280 40.78842 ## 116 -73.95581 40.79689 ## 117 -73.95965 40.79404 ## 118 -73.96808 40.77625 ## 119 -73.96722 40.78239 ## 120 -73.97264 40.77527 ## 121 -73.96410 40.78993 ## 122 -73.96757 40.77668 ## 123 -73.96647 40.78790 ## 124 -73.95292 40.79607 ## 125 -73.97294 40.76724 ## 126 -73.96986 40.77847 ## 127 -73.97199 40.77205 ## 128 -73.96908 40.78180 ## 129 -73.95409 40.79420 ## 130 -73.96391 40.78091 ## 131 -73.97257 40.76688 ## 132 -73.97984 40.77008 ## 133 -73.97568 40.77385 ## 134 -73.96009 40.79161 ## 135 -73.96243 40.79132 ## 136 -73.97149 40.77406 ## 137 -73.95121 40.79479 ## 138 -73.96739 40.77868 ## 139 -73.96459 40.77804 ## 140 -73.95808 40.79580 ## 141 -73.96942 40.77444 ## 142 -73.97048 40.77623 ## 143 -73.97207 40.76594 ## 144 -73.97869 40.77208 ## 145 -73.96432 40.78208 ## 146 -73.97001 40.76967 ## 147 -73.95901 40.79071 ## 148 -73.97615 40.77096 ## 149 -73.97536 40.76832 ## 150 -73.95648 40.79931 ## 151 -73.95443 40.79878 ## 152 -73.96982 40.77020 ## 153 -73.95646 40.79652 ## 154 -73.95628 40.78806 ## 155 -73.97156 40.76959 ## 156 -73.95886 40.79453 ## 157 -73.97522 40.77037 ## 158 -73.95893 40.79134 ## 159 -73.96496 40.78144 ## 160 -73.97878 40.77075 ## 161 -73.97554 40.77336 ## 162 -73.97420 40.76915 ## 163 -73.96323 40.79147 ## 164 -73.95951 40.78991 ## 165 -73.96933 40.77571 ## 166 -73.97634 40.77412 ## 167 -73.96295 40.79368 ## 168 -73.95321 40.79347 ## 169 -73.95828 40.79895 ## 170 -73.96365 40.77758 ## 171 -73.96385 40.78269 ## 172 -73.97347 40.76943 ## 173 -73.95808 40.79055 ## 174 -73.97215 40.76901 ## 175 -73.95502 40.79513 ## 176 -73.97616 40.77559 ## 177 -73.96077 40.79396 ## 178 -73.97662 40.76619 ## 179 -73.96827 40.77119 ## 180 -73.96529 40.79044 ## 181 -73.97311 40.77866 ## 182 -73.97159 40.77462 ## 183 -73.95258 40.79549 ## 184 -73.95560 40.79775 ## 185 -73.97204 40.77635 ## 186 -73.96154 40.79241 ## 187 -73.98020 40.76971 ## 188 -73.97471 40.77285 ## 189 -73.95950 40.78401 ## 190 -73.97103 40.77665 ## 191 -73.97909 40.76761 ## 192 -73.97566 40.76566 ## 193 -73.96587 40.77540 ## 194 -73.96967 40.77919 ## 195 -73.96809 40.77693 ## 196 -73.95701 40.79960 ## 197 -73.95779 40.79609 ## 198 -73.96082 40.79543 ## 199 -73.97379 40.77266 ## 200 -73.96024 40.79078 ## 201 -73.96761 40.77728 ## 202 -73.95516 40.79086 ## 203 -73.97209 40.77608 ## 204 -73.96704 40.77844 ## 205 -73.97671 40.76758 ## 206 -73.96991 40.77445 ## 207 -73.96881 40.77797 ## 208 -73.95662 40.78834 ## 209 -73.96795 40.78371 ## 210 -73.97772 40.76686 ## 211 -73.95871 40.78503 ## 212 -73.96186 40.79213 ## 213 -73.96029 40.79106 ## 214 -73.97530 40.77525 ## 215 -73.97550 40.76782 ## 216 -73.95448 40.79066 ## 217 -73.96918 40.77093 ## 218 -73.96466 40.78277 ## 219 -73.96170 40.79541 ## 220 -73.96033 40.79002 ## 221 -73.95990 40.79333 ## 222 -73.95941 40.79179 ## 223 -73.97013 40.77680 ## 224 -73.96985 40.77699 ## 225 -73.97069 40.77236 ## 226 -73.95454 40.78986 ## 227 -73.96750 40.78726 ## 228 -73.97008 40.76806 ## 229 -73.97110 40.76898 ## 230 -73.95848 40.79342 ## 231 -73.97556 40.76680 ## 232 -73.96675 40.77606 ## 233 -73.96531 40.77600 ## 234 -73.96811 40.77318 ## 235 -73.97258 40.77011 ## 236 -73.97695 40.77401 ## 237 -73.96137 40.79434 ## 238 -73.95914 40.79608 ## 239 -73.97516 40.76710 ## 240 -73.95944 40.78412 ## 241 -73.95543 40.79538 ## 242 -73.98082 40.76829 ## 243 -73.95872 40.79921 ## 244 -73.96830 40.78054 ## 245 -73.97709 40.76947 ## 246 -73.97936 40.76793 ## 247 -73.97753 40.76879 ## 248 -73.97961 40.77021 ## 249 -73.95970 40.79558 ## 250 -73.97684 40.77285 ## 251 -73.96997 40.77744 ## 252 -73.95762 40.78739 ## 253 -73.98020 40.76784 ## 254 -73.97470 40.76644 ## 255 -73.97631 40.76888 ## 256 -73.97851 40.77183 ## 257 -73.97680 40.76764 ## 258 -73.98009 40.76844 ## 259 -73.97195 40.78105 ## 260 -73.95490 40.79448 ## 261 -73.95744 40.79610 ## 262 -73.97744 40.76981 ## 263 -73.95917 40.79376 ## 264 -73.96505 40.79082 ## 265 -73.97041 40.78240 ## 266 -73.96789 40.78197 ## 267 -73.95638 40.79060 ## 268 -73.97454 40.76918 ## 269 -73.96956 40.77056 ## 270 -73.96472 40.78061 ## 271 -73.96563 40.78340 ## 272 -73.95980 40.79735 ## 273 -73.97506 40.77234 ## 274 -73.97334 40.77824 ## 275 -73.95227 40.79340 ## 276 -73.96589 40.77565 ## 277 -73.96378 40.79047 ## 278 -73.96810 40.77233 ## 279 -73.97687 40.76848 ## 280 -73.95973 40.79071 ## 281 -73.96329 40.78900 ## 282 -73.96512 40.78124 ## 283 -73.95394 40.79800 ## 284 -73.96316 40.79265 ## 285 -73.96879 40.77886 ## 286 -73.97545 40.76986 ## 287 -73.98010 40.76775 ## 288 -73.97191 40.77588 ## 289 -73.96937 40.77687 ## 290 -73.96310 40.78858 ## 291 -73.96015 40.79437 ## 292 -73.96955 40.78211 ## 293 -73.97139 40.77501 ## 294 -73.95650 40.79374 ## 295 -73.97252 40.76642 ## 296 -73.95614 40.79827 ## 297 -73.96416 40.78166 ## 298 -73.97661 40.77178 ## 299 -73.97105 40.77236 ## 300 -73.95947 40.79845 ## 301 -73.97948 40.77096 ## 302 -73.95613 40.79408 ## 303 -73.97574 40.76926 ## 304 -73.98082 40.76894 ## 305 -73.96848 40.78588 ## 306 -73.97190 40.77035 ## 307 -73.97681 40.77082 ## 308 -73.95402 40.79201 ## 309 -73.97112 40.77243 ## 310 -73.97123 40.77145 ## 311 -73.97225 40.76671 ## 312 -73.96429 40.79170 ## 313 -73.96167 40.79297 ## 314 -73.96544 40.77989 ## 315 -73.95755 40.78892 ## 316 -73.96951 40.78027 ## 317 -73.95486 40.79467 ## 318 -73.97931 40.76782 ## 319 -73.96933 40.77616 ## 320 -73.97439 40.77454 ## 321 -73.96632 40.78251 ## 322 -73.97450 40.76526 ## 323 -73.96713 40.78389 ## 324 -73.97058 40.77585 ## 325 -73.95881 40.79129 ## 326 -73.95583 40.79521 ## 327 -73.95335 40.79330 ## 328 -73.95459 40.79000 ## 329 -73.95474 40.78965 ## 330 -73.95452 40.79438 ## 331 -73.97016 40.77021 ## 332 -73.96360 40.79008 ## 333 -73.95944 40.79178 ## 334 -73.95684 40.78803 ## 335 -73.96942 40.77559 ## 336 -73.95460 40.79566 ## 337 -73.96151 40.79452 ## 338 -73.96120 40.79160 ## 339 -73.96802 40.77439 ## 340 -73.97547 40.76814 ## 341 -73.97163 40.77965 ## 342 -73.96398 40.79095 ## 343 -73.97120 40.76725 ## 344 -73.95384 40.79806 ## 345 -73.95621 40.79670 ## 346 -73.97566 40.77330 ## 347 -73.95963 40.79066 ## 348 -73.95795 40.79613 ## 349 -73.97687 40.77352 ## 350 -73.96715 40.77513 ## 351 -73.96010 40.79523 ## 352 -73.97956 40.76890 ## 353 -73.97102 40.77249 ## 354 -73.95704 40.79485 ## 355 -73.95889 40.79100 ## 356 -73.97910 40.76859 ## 357 -73.95453 40.79102 ## 358 -73.97739 40.77086 ## 359 -73.95257 40.79773 ## 360 -73.96261 40.78872 ## 361 -73.95890 40.79117 ## 362 -73.96113 40.79187 ## 363 -73.97386 40.77372 ## 364 -73.98026 40.76818 ## 365 -73.97215 40.77162 ## 366 -73.95809 40.79484 ## 367 -73.97044 40.77990 ## 368 -73.97450 40.77333 ## 369 -73.97109 40.77637 ## 370 -73.97563 40.77065 ## 371 -73.96996 40.77659 ## 372 -73.95837 40.79139 ## 373 -73.97040 40.77293 ## 374 -73.96586 40.78259 ## 375 -73.96154 40.79074 ## 376 -73.96518 40.77856 ## 377 -73.97911 40.76770 ## 378 -73.97510 40.77434 ## 379 -73.96185 40.79464 ## 380 -73.97358 40.76871 ## 381 -73.95705 40.79910 ## 382 -73.97686 40.77419 ## 383 -73.97106 40.77943 ## 384 -73.96113 40.79501 ## 385 -73.95969 40.78938 ## 386 -73.97371 40.77137 ## 387 -73.97703 40.77324 ## 388 -73.95678 40.79601 ## 389 -73.95197 40.79382 ## 390 -73.97134 40.77507 ## 391 -73.97813 40.77250 ## 392 -73.95895 40.79348 ## 393 -73.96200 40.79433 ## 394 -73.97683 40.76672 ## 395 -73.97289 40.76994 ## 396 -73.96679 40.77932 ## 397 -73.95652 40.79856 ## 398 -73.96518 40.78028 ## 399 -73.95684 40.79757 ## 400 -73.97261 40.77102 ## 401 -73.97081 40.77740 ## 402 -73.96868 40.77667 ## 403 -73.95994 40.79575 ## 404 -73.97322 40.77488 ## 405 -73.97697 40.77275 ## 406 -73.95607 40.79479 ## 407 -73.97490 40.76788 ## 408 -73.97469 40.76594 ## 409 -73.97868 40.76976 ## 410 -73.95300 40.79229 ## 411 -73.97944 40.76839 ## 412 -73.97213 40.76669 ## 413 -73.96980 40.77823 ## 414 -73.97483 40.76689 ## 415 -73.96731 40.77782 ## 416 -73.97915 40.76727 ## 417 -73.96004 40.79065 ## 418 -73.95683 40.79741 ## 419 -73.96844 40.78243 ## 420 -73.96914 40.78054 ## 421 -73.97070 40.78215 ## 422 -73.98112 40.76844 ## 423 -73.97143 40.77459 ## 424 -73.97343 40.77819 ## 425 -73.97223 40.76905 ## 426 -73.95657 40.79360 ## 427 -73.97553 40.76683 ## 428 -73.97865 40.76797 ## 429 -73.97516 40.76701 ## 430 -73.96155 40.79316 ## 431 -73.96350 40.79282 ## 432 -73.97607 40.77033 ## 433 -73.97533 40.76874 ## 434 -73.95629 40.79386 ## 435 -73.96884 40.77030 ## 436 -73.96541 40.78069 ## 437 -73.97133 40.77719 ## 438 -73.96411 40.78866 ## 439 -73.95714 40.79559 ## 440 -73.97731 40.76965 ## 441 -73.97871 40.76937 ## 442 -73.96991 40.77430 ## 443 -73.96322 40.79213 ## 444 -73.96595 40.78337 ## 445 -73.96764 40.78699 ## 446 -73.97010 40.78390 ## 447 -73.97303 40.77025 ## 448 -73.97595 40.76929 ## 449 -73.95496 40.79854 ## 450 -73.96469 40.77600 ## 451 -73.97119 40.77850 ## 452 -73.97480 40.77681 ## 453 -73.97714 40.77236 ## 454 -73.95887 40.79058 ## 455 -73.95035 40.79765 ## 456 -73.95444 40.79421 ## 457 -73.98020 40.76802 ## 458 -73.97010 40.76889 ## 459 -73.96906 40.78237 ## 460 -73.97300 40.77980 ## 461 -73.96030 40.79535 ## 462 -73.97735 40.77288 ## 463 -73.97182 40.78020 ## 464 -73.97849 40.76692 ## 465 -73.95788 40.79556 ## 466 -73.95731 40.78753 ## 467 -73.97446 40.77285 ## 468 -73.97608 40.76742 ## 469 -73.95415 40.79822 ## 470 -73.96253 40.78121 ## 471 -73.95978 40.79062 ## 472 -73.96935 40.77450 ## 473 -73.97442 40.77496 ## 474 -73.95985 40.78345 ## 475 -73.98108 40.76879 ## 476 -73.95847 40.79801 ## 477 -73.97639 40.77055 ## 478 -73.97507 40.76656 ## 479 -73.95446 40.79129 ## 480 -73.96902 40.77604 ## 481 -73.97673 40.76612 ## 482 -73.95748 40.79500 ## 483 -73.97201 40.77016 ## 484 -73.96129 40.79208 ## 485 -73.97080 40.77267 ## 486 -73.95934 40.79182 ## 487 -73.96779 40.78266 ## 488 -73.97465 40.77505 ## 489 -73.95191 40.79731 ## 490 -73.96821 40.77799 ## 491 -73.96405 40.79074 ## 492 -73.96172 40.79228 ## 493 -73.96254 40.78113 ## 494 -73.97060 40.77604 ## 495 -73.96856 40.78071 ## 496 -73.96045 40.79380 ## 497 -73.96351 40.79128 ## 498 -73.97458 40.76545 ## 499 -73.96904 40.77672 ## 500 -73.96384 40.78290 ## [ reached 'max' / getOption("max.print") -- omitted 2523 rows ] ``` ] .pull-right[ ![](img/central-park-coords.png)<!-- --> ] --- ## Make a plot .small[ ```r ggplot(squirrels, aes(x = long, y = lat)) + geom_point(alpha = 0.2) ``` <img src="w5_d2-effective-communication_files/figure-html/unnamed-chunk-33-1.png" width="1500" /> ] -- **Hypothesis:** There will be a higher density of sightings on the perimeter than inside the park. --- ## Try the easy solution first .small[ ```r squirrels <- squirrels %>% separate(hectare, into = c("NS", "EW"), sep = 2, remove = FALSE) %>% mutate(where = if_else(NS %in% c("01", "42") | EW %in% c("A", "I"), "perimeter", "inside")) ggplot(squirrels, aes(x = long, y = lat, color = where)) + geom_point(alpha = 0.2) ``` <img src="w5_d2-effective-communication_files/figure-html/unnamed-chunk-34-1.png" width="1500" /> ] --- ## Then go deeper... <img src="w5_d2-effective-communication_files/figure-html/unnamed-chunk-35-1.png" width="1500" /> --- ```r hectare_counts <- squirrels %>% group_by(hectare) %>% summarise(n = n()) hectare_centroids <- squirrels %>% group_by(hectare) %>% summarise( centroid_x = mean(long), centroid_y = mean(lat) ) squirrels %>% left_join(hectare_counts, by = "hectare") %>% left_join(hectare_centroids, by = "hectare") %>% ggplot(aes(x = centroid_x, y = centroid_y, color = n)) + geom_hex() ``` --- ## The squirrel is staring at me! .midi[ ```r squirrels %>% filter(str_detect(other_interactions, "star")) %>% select(shift, age, other_interactions) ``` ``` ## # A tibble: 11 x 3 ## shift age other_interactions ## <chr> <chr> <chr> ## 1 AM Adult staring at us ## 2 PM Adult he took 2 steps then turned and stared at me ## 3 PM Adult stared ## 4 PM Adult stared ## 5 PM Adult stared ## 6 PM Adult stared & then went back up treeβthen ran to different tree ## 7 PM Adult stared at me ## 8 AM Adult approaches (saw me & came forward),runs from (startled by aβ¦ ## 9 AM Adult started climbing down to me ## 10 PM Adult stared at me ## 11 PM Adult stared ``` ] --- ## Communicating for your audience - Avoid: Jargon, uninterpreted results, lengthy output - Pay attention to: Organization, presentation, flow - Don't forget about: Code style, coding best practices, meaningful commits - Be open to: Suggestions, feedback, taking (calculated) risks