class: center, middle, inverse, title-slide

# Meet the toolkit ⚒

### Dr. Çetinkaya-Rundel

---

layout: true

<div class="my-footer">
<span>
Dr. Mine Çetinkaya-Rundel - <a href="https://introds.org" target="_blank">introds.org </a>
</span>
</div>

---

## So far this week...

- Hands-on practice with R, RStudio, Git, GitHub
- First look at visualising and summarising data in R
- Why summary statistics alone are not sufficient for data exploration
- Clarifications requested:
  - One language: R
  - Teamwork: labs, project (assessed) + in-class activities (suggested)
  - Student hours: Tue 14:30-16:30, questions/conversation
- And some of you still need to complete:
  - Syllabus review
  - Piazza sign-up
  - Getting to know you survey (deadline extended to tonight by 8pm!)

---

class: center, middle

.question[
.large[
Any questions?
]
]

---

class: center, middle

# Reproducible data analysis

---

## Reproducibility checklist

.question[
What does it mean for a data analysis to be "reproducible"?
]

--

Near-term goals:

- Are the tables and figures reproducible from the code and data?
- Does the code actually do what you think it does?
- In addition to what was done, is it clear **why** it was done? (e.g., how were parameter settings chosen?)

Long-term goals:

- Can the code be used for other data?
- Can you extend the code to do other things?

---

## Toolkit

- Scriptability `\(\rightarrow\)` R
- Literate programming (code, narrative, output in one place) `\(\rightarrow\)` R Markdown
- Version control `\(\rightarrow\)` Git / GitHub

---

class: center, middle

# Toolkit overview

---

<img src="img/whole-game-01.png" width="100%" />

---

<img src="img/whole-game-02.png" width="100%" />

---

<img src="img/whole-game-03.png" width="100%" />

---

<img src="img/whole-game-04.png" width="100%" />

---

class: center, middle

# The whole game

---

class: center, middle

### DEMO: Edinburgh Airbnb prices

---

class: center, middle

# R

---

## R

- R can be used as a calculator.
```r
8738787213 / 1653
```

```
## [1] 5286623
```

- The most commonly used data structure in R is the data frame, where each row represents an observation and each column a variable.

```r
mtcars
```

```
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
```

---

## R

- We use the `$` operator to access a variable within a data frame.

```r
mtcars$mpg
```

```
##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2
## [15] 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4
## [29] 15.8 19.7 15.0 21.4
```

- Functions are (often) verbs, followed by what they will be applied to in parentheses.

```r
do_this(to_this)
do_that(to_this, to_that, with_those)
```

---

## R

- In R, the fundamental unit of shareable code is the package.
- As of September 2019, there are over 14,000 packages available on the **C**omprehensive **R** **A**rchive **N**etwork (CRAN), the public clearing house for R packages.
- This huge variety of packages is one reason why R is so successful: the chances are that someone has already solved a problem that you’re working on, and you can benefit from their work by downloading their package.
- Using R packages:
  - Install them from CRAN with `install.packages("x")`
  - Use them in R with `library(x)`
  - Get help on them with `?x` and `help(package = "x")`

---

class: center, middle

# RStudio

---

## RStudio

<img src="img/rstudio-anatomy.png" width="80%" />

---

class: center, middle

# R Markdown

---

## R Markdown

<img src="img/rmarkdown-anatomy.png" width="100%" />

---

## R Markdown tips

- Most importantly: the environment of your R Markdown document is separate from that of the Console
- Help:
  - [R Markdown cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf)
  - Markdown Quick Reference (Help -> Markdown Quick Reference)

---

## How will we use R Markdown?

- Every assignment / report / project / etc.
is an R Markdown document
- You'll always have a template R Markdown document to start with
- The amount of scaffolding in the template will decrease over the semester

---

class: center, middle

# Getting help in R

---

## Reading help files

<img src="img/r-help.png" width="50%" />

.tiny[
Source: http://socviz.co/appendix.html#a-little-more-about-r
]

---

## Asking good questions

- Always include your code and the error
- Create a minimal working example (we'll keep working on this throughout the semester)
- Use code formatting
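
A minimal working example can be as small as the sketch below. It uses only the built-in `mtcars` data so that anyone can run it; the object names `cars` and `result` and the deliberately misspelled column `mgp` are made up for illustration.

```r
# Minimal working example: built-in data and only the lines needed to show the problem
cars <- mtcars[1:5, c("mpg", "cyl")]

# Intended: the mean mpg of the first five cars.
# The column name is misspelled (mgp instead of mpg), so `$` returns NULL
# and mean() gives NA with a warning instead of a number.
result <- mean(cars$mgp)
result
```

When asking for help, paste the code in a code block like this one, together with the exact warning or error message it produces.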