evals <- read_csv("data/evals-mod.csv")
evals <- evals %>%
  mutate(bty_avg = rowMeans(select(., bty_f1lower:bty_m2upper)))

What percent of the variability in evaluation scores is explained by the model?

full_model <- lm(score ~ cls_did_eval + cls_students + cls_perc_eval, data = evals)
glance(full_model)$r.squared
## [1] 0.04463827
glance(full_model)$adj.r.squared
## [1] 0.03839408

library(GGally)
evals %>%
  select(score, cls_did_eval, cls_students, cls_perc_eval) %>%
  ggpairs()

Suppose we definitely want to keep cls_did_eval in the model. Which of the
other two variables (cls_students or cls_perc_eval) is least likely to
be effective in increasing the model's predictive power?

full_model <- lm(score ~ cls_did_eval + cls_students + cls_perc_eval, data = evals)
glance(full_model)$adj.r.squared
## [1] 0.03839408

# Remove cls_did_eval
s1_stu_perc <- lm(score ~ cls_students + cls_perc_eval, data = evals)
glance(s1_stu_perc)$adj.r.squared
## [1] 0.03970295

# Remove cls_students
s1_did_perc <- lm(score ~ cls_did_eval + cls_perc_eval, data = evals)
glance(s1_did_perc)$adj.r.squared
## [1] 0.04038255

# Remove cls_perc_eval
s1_did_stu <- lm(score ~ cls_did_eval + cls_students, data = evals)
glance(s1_did_stu)$adj.r.squared
## [1] 0.02206412

Given the following adjusted R-squared values, which model should be selected in step 1 of backwards selection?

# full model
glance(full_model)$adj.r.squared
## [1] 0.03839408
# remove cls_did_eval
glance(s1_stu_perc)$adj.r.squared
## [1] 0.03970295
# remove cls_students
glance(s1_did_perc)$adj.r.squared
## [1] 0.04038255
# remove cls_perc_eval
glance(s1_did_stu)$adj.r.squared
## [1] 0.02206412
Removing cls_students (number of students in the class) resulted in the
highest increase in adjusted R-squared, so the model with only cls_did_eval
and cls_perc_eval (number and percentage of students who completed evaluations,
respectively) should be selected.
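The comparisons above can be automated. Below is a minimal sketch (not from the original slides) of one backward-selection step by adjusted R-squared; `backward_step` is a hypothetical helper, and `summary(fit)$adj.r.squared` is used as the base-R equivalent of `glance(fit)$adj.r.squared`:

```r
# One step of backward selection: drop whichever single predictor yields
# the largest adjusted R-squared, but only if it beats the current model.
backward_step <- function(response, predictors, data) {
  fit <- function(vars) {
    lm(reformulate(vars, response = response), data = data)
  }
  current <- summary(fit(predictors))$adj.r.squared
  # Adjusted R-squared of each candidate model with one predictor removed
  candidates <- sapply(predictors, function(p) {
    summary(fit(setdiff(predictors, p)))$adj.r.squared
  })
  if (max(candidates) > current) {
    setdiff(predictors, names(which.max(candidates)))  # drop that predictor
  } else {
    predictors  # no drop improves the fit, so stop here
  }
}
```

Applied to the step above, `backward_step("score", c("cls_did_eval", "cls_students", "cls_perc_eval"), evals)` would drop cls_students, matching the manual comparison.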
# Remove cls_did_eval
s2_perc <- lm(score ~ cls_perc_eval, data = evals)
glance(s2_perc)$adj.r.squared
## [1] 0.0321918

# Remove cls_perc_eval
s2_did <- lm(score ~ cls_did_eval, data = evals)
glance(s2_did)$adj.r.squared
## [1] 0.001785817

Given the following adjusted R-squared values, which model should be selected in step 2 of backwards selection?

glance(s1_did_perc)$adj.r.squared # result of step 1
## [1] 0.04038255
glance(s2_perc)$adj.r.squared # remove cls_did_eval
## [1] 0.0321918
glance(s2_did)$adj.r.squared # remove cls_perc_eval
## [1] 0.001785817

No further variables should be dropped, since dropping either remaining variable decreases the adjusted R-squared. The model selected in the previous step is the final model.

What percent of the variability in evaluation scores is explained by the model?
full_model <- lm(score ~ rank + ethnicity + gender + language + age +
  cls_perc_eval + cls_did_eval + cls_students + cls_level +
  cls_profs + cls_credits + bty_avg, data = evals)
glance(full_model)$r.squared
## [1] 0.1644867
glance(full_model)$adj.r.squared
## [1] 0.1402959

Given that the adjusted R-squared of the full model was 0.1403, which of the following models should be selected in the first step of backwards selection?
##    remove              adj_r_sq
## 1  Remove cls_profs    0.1421885
## 2  Remove cls_level    0.1421425
## 3  Remove cls_students 0.1417647
## 4  Remove cls_did_eval 0.1412196
## 5  Remove rank         0.1411639
## 6  Remove language     0.1394560
## 7  Remove age          0.1335567
## 8  Remove cls_perc_eval 0.1327892
## 9  Remove ethnicity    0.1315133
## 10 Remove gender       0.1187097
## 11 Remove bty_avg      0.1167521
## 12 Remove cls_credits  0.1064995

Remove cls_profs
Given that the adjusted R-squared of the model selected in Step 1 was 0.1422, which of the following models should be selected in the second step of backwards selection?

##    remove              adj_r_sq
## 1  Remove cls_level    0.1440303
## 2  Remove cls_students 0.1436317
## 3  Remove cls_did_eval 0.1430708
## 4  Remove rank         0.1430366
## 5  Remove language     0.1413504
## 6  Remove age          0.1354409
## 7  Remove cls_perc_eval 0.1346513
## 8  Remove ethnicity    0.1329045
## 9  Remove gender       0.1206375
## 10 Remove bty_avg      0.1187028
## 11 Remove cls_credits  0.1078684

Remove cls_level
Given that the adjusted R-squared of the model selected in Step 2 was 0.1440, which of the following models should be selected in the third step of backwards selection?

##    remove              adj_r_sq
## 1  Remove cls_students 0.1453516
## 2  Remove rank         0.1449154
## 3  Remove cls_did_eval 0.1447586
## 4  Remove language     0.1432499
## 5  Remove age          0.1373534
## 6  Remove cls_perc_eval 0.1365490
## 7  Remove ethnicity    0.1344177
## 8  Remove gender       0.1225830
## 9  Remove bty_avg      0.1206257
## 10 Remove cls_credits  0.1076569

Remove cls_students
Given that the adjusted R-squared of the model selected in Step 3 was 0.1454, which of the following models should be selected in the fourth step of backwards selection?

##   remove              adj_r_sq
## 1 Remove rank         0.1460210
## 2 Remove language     0.1447503
## 3 Remove cls_did_eval 0.1438601
## 4 Remove age          0.1386372
## 5 Remove ethnicity    0.1351420
## 6 Remove gender       0.1244633
## 7 Remove bty_avg      0.1220691
## 8 Remove cls_perc_eval 0.1216729
## 9 Remove cls_credits  0.1091898

Remove rank
Given that the adjusted R-squared of the model selected in Step 4 was 0.1460, which of the following models should be selected in the fifth step of backwards selection?

##   remove              adj_r_sq
## 1 Remove cls_did_eval 0.1445941
## 2 Remove language     0.1438720
## 3 Remove age          0.1413323
## 4 Remove ethnicity    0.1340933
## 5 Remove gender       0.1245360
## 6 Remove bty_avg      0.1218780
## 7 Remove cls_perc_eval 0.1216266
## 8 Remove cls_credits  0.1010899

None; stick with the model from Step 4.
## # A tibble: 9 x 2
##   term                  estimate
##   <chr>                    <dbl>
## 1 (Intercept)            3.41
## 2 ethnicitynot minority  0.202
## 3 gendermale             0.177
## 4 languagenon-english   -0.151
## 5 age                   -0.00487
## 6 cls_perc_eval          0.00549
## 7 cls_did_eval           0.000722
## 8 cls_creditsone credit  0.524
## 9 bty_avg                0.0615

Model selection for models including interaction effects must follow two principles:

- If an interaction term is included in the model, the main effects of both variables in the interaction must also be included.
- If a main effect is not in the model, then no interaction involving that variable should be in the model.
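R's formula syntax makes the first principle easy to respect: `a * b` expands to `a + b + a:b`, so the main effects come along with the interaction automatically. A quick illustration (using mtcars as a stand-in dataset, not a model from these slides):

```r
# a * b expands to main effects plus the interaction term a:b
int_model <- lm(mpg ~ wt * factor(am), data = mtcars)
names(coef(int_model))
# -> "(Intercept)" "wt" "factor(am)1" "wt:factor(am)1"
```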
AIC = −2 log(L) + 2k, where L is the maximized likelihood of the model and k is the number of estimated parameters. Lower AIC is better; the 2k term penalizes model complexity.
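The formula can be checked by hand on any fitted model using base R's `logLik()` and `AIC()` (a sketch, again using mtcars as a stand-in dataset):

```r
m  <- lm(mpg ~ wt + hp, data = mtcars)   # any fitted model works
ll <- logLik(m)
k  <- attr(ll, "df")                     # estimated parameters, incl. sigma
aic_manual <- -2 * as.numeric(ll) + 2 * k
stopifnot(all.equal(aic_manual, AIC(m))) # matches R's built-in AIC()
```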
glance(full_model)$AIC
## [1] 695.7457

The step() function selects a model by AIC:

selected_model <- step(full_model, direction = "backward")
tidy(selected_model) %>% select(term, estimate)
## # A tibble: 8 x 2
##   term                  estimate
##   <chr>                    <dbl>
## 1 (Intercept)            3.45
## 2 ethnicitynot minority  0.205
## 3 gendermale             0.185
## 4 languagenon-english   -0.161
## 5 age                   -0.00501
## 6 cls_perc_eval          0.00509
## 7 cls_creditsone credit  0.515
## 8 bty_avg                0.0650

glance(full_model)$AIC
## [1] 695.7457
glance(selected_model)$AIC
## [1] 687.5712

Take a look at the variables in the full and the selected model. Can you guess why some of them may have been dropped? Remember: we like parsimonious models.
| variable | selected |
|---|---|
| rank | |
| ethnicity | x |
| gender | x |
| language | x |
| age | x |
| cls_perc_eval | x |
| cls_did_eval | |
| cls_students | |
| cls_level | |
| cls_profs | |
| cls_credits | x |
| bty_avg | x |
Interpret the slope of bty_avg and gender in the selected model.
## # A tibble: 8 x 2
##   term                  estimate
##   <chr>                    <dbl>
## 1 (Intercept)            3.45
## 2 ethnicitynot minority  0.205
## 3 gendermale             0.185
## 4 languagenon-english   -0.161
## 5 age                   -0.00501
## 6 cls_perc_eval          0.00509
## 7 cls_creditsone credit  0.515
## 8 bty_avg                0.0650