Learning goals:

- Using inline code in interpretations of model estimates.
- Fitting simple and multiple regression models.
- Comparing models using adjusted \(R^2\).
This week we revisit the professor evaluations data we modeled in the previous lab. In the last lab we modeled evaluation scores using a single predictor at a time; this time we use multiple predictors to model evaluation scores.
If you don’t remember the data, review the previous lab before continuing to the exercises.
As before, we will start with a data prep step where we calculate average beauty scores for each professor. Then we'll fit a simple linear regression model predicting evaluation score from beauty score. Then, for the majority of the workshop, we will fit, interpret, and evaluate various multiple regression models predicting evaluation scores from beauty score along with other variables, like the gender and rank of the professor.
You can find your team assignment for the rest of the semester here.
Go to the course GitHub organization and locate your Lab 08 repo, which should be named `lab-08-model-course-evals-again-YOUR_TEAMNAME`. Grab the URL of the repo, and clone it in RStudio Cloud.
Run the following (but update it for your name and email!) in the Console to configure Git. Your email address should be the address tied to your GitHub account, and your name should be your first and last name.
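The code chunk itself was not preserved in this copy of the lab; a common way to do this from the R console uses the usethis package (the placeholder name and email are assumptions — replace them with your own):

```r
library(usethis)

use_git_config(
  user.name  = "Your Name",      # your first and last name
  user.email = "your@email.com"  # the email tied to your GitHub account
)
```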
We will use the following packages in this analysis:
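The package list was not preserved in this copy; based on the functions used below (`read_csv()`, `mutate()`, `tidy()`, `glance()`, `augment()`), the chunk likely loads at least:

```r
library(tidyverse)  # ggplot2, dplyr, readr, and friends
library(broom)      # tidy(), glance(), augment()
```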
In this lab you will first download the `evals-mod.csv` data file, then upload it to the `data/` folder in your RStudio Cloud project. Then, you can load the data as usual using the following.
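The loading code was not preserved in this copy; with the file in the `data/` folder, it would be along these lines:

```r
evals <- read_csv("data/evals-mod.csv")
```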
Create a new variable called `bty_avg` that is the average attractiveness score of the six students for each professor (`bty_f1lower` through `bty_m2upper`). Add this new variable to the `evals` data frame. Do this in one pipe, using the `rowMeans()` function within a `mutate()`. If you have questions about what is happening in the code above, refer to last week's lab.
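As a reminder of the shape of that pipe, a sketch (assuming the beauty columns sit next to each other in the data frame, so the `bty_f1lower:bty_m2upper` range selects all six):

```r
evals <- evals %>%
  mutate(bty_avg = rowMeans(select(., bty_f1lower:bty_m2upper)))
```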
We start with a linear model for predicting average professor evaluation `score` based on average beauty rating (`bty_avg`) only.
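The model-fitting code was not preserved in this copy; fitting the model with `lm()` and storing it under the name used in the rest of the lab (`m_bty`) looks like:

```r
m_bty <- lm(score ~ bty_avg, data = evals)
```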
Remember: The linear model is of the form \(\hat{y} = b_0 + b_1 * x\), but you should replace \(y\) and \(x\) with the actual variables and \(b_0\) and \(b_1\) with the model estimates. Unfortunately math text doesn’t work well in GitHub documents, so instead of \(\hat{y}\), you can type out y-hat.
Use `tidy()` to obtain the slope and intercept estimates from the tidy model output, and `glance()` to obtain the \(R^2\). Your answer should use inline code. Below are some tips for extracting these values from the model output to use in your inline code.

Let's start with the intercept. You have two options for extracting the value of the intercept from the regression output. Remember, the output looks like this:
```
## # A tibble: 2 x 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)   3.88      0.0761     51.0  1.56e-191
## 2 bty_avg       0.0666    0.0163      4.09 5.08e-  5
```
So the intercept is in the `estimate` column, and it's the first element in there.
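The code for this first option was not preserved in this copy; indexing into the `estimate` column of the tidy output would look like:

```r
# Option 1
tidy(m_bty)$estimate[1]
```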
```
## [1] 3.880333
```
We can also extract it using a dplyr pipeline:
```r
# Option 2
tidy(m_bty) %>%
  filter(term == "(Intercept)") %>%  # filter for the intercept
  select(estimate) %>%               # select the estimate column
  pull()                             # pull value out of data frame
```

```
## [1] 3.880333
```
Regardless of which option you use, you might consider storing the value in an object that you can easily refer to in your inline code, e.g.
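For example, using the pipeline from Option 2 (the name `m_bty_intercept` is the one referenced later in this lab):

```r
m_bty_intercept <- tidy(m_bty) %>%
  filter(term == "(Intercept)") %>%
  select(estimate) %>%
  pull()
```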
You can hide the code for chunks like this, where you are simply preparing objects for later use, by adding `echo = FALSE` in the code chunk options, that is, where you label your code chunk, separated by a comma, i.e. `{r label, echo = FALSE}`.
And then, you can use `m_bty_intercept` in inline code, e.g.

```
The intercept of the model is `r m_bty_intercept`...
```
which will render to

```
The intercept of the model is 3.8803326...
```
There is still one small issue here, though: the number of decimal places reported. It would be better to round the value in our narrative, for which we can use the `round()` function. This function takes two arguments: the first is the value (or vector of values) you want to round, and the second is the number of digits.

```
The intercept of the model is `r round(m_bty_intercept, 2)`...
```
which will render to

```
The intercept of the model is 3.88...
```
We should also use a graphical diagnostic, the residuals plot, to assess model fit. To do so we need to calculate the predicted evaluation scores for each professor in the dataset as well as the residuals for each observation.
We use the `augment()` function for this:
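The chunk was not preserved in this copy; a sketch, assuming the augmented data frame is stored as `m_bty_aug`:

```r
m_bty_aug <- augment(m_bty)
```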
Let’s take a look at what’s in this augmented dataset:
```
## [1] "score"      "bty_avg"    ".fitted"    ".se.fit"    ".resid"
## [6] ".hat"       ".sigma"     ".cooksd"    ".std.resid"
```
First, we have the variables used to build the model: `score` and `bty_avg`. We also have the predicted values (`.fitted`) and the residuals (`.resid`). We'll talk about a few of the other variables later in the course, and some others you will encounter in future courses.
Plot the residuals vs. the predicted values for this model, using `geom_jitter()` instead of `geom_point()`, and overlay a dashed horizontal line at `y = 0`. (Hint: You can use `geom_hline()` with `linetype = "dashed"` for this.) Then, comment on whether the linear model is appropriate for modeling the relationship between evaluation scores and beauty scores.

Next, we fit a multiple linear regression model, predicting average professor evaluation `score` based on average beauty rating (`bty_avg`) and `gender`:
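The fitting code was not preserved in this copy; with `lm()`, and the object name used in the exercises below:

```r
m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
```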
Write the linear model.
Interpret the intercept and the slopes of `bty_avg` and `gender`. Use inline code in your answer.
What percent of the variability in `score` is explained by the model `m_bty_gen`? Once again, use inline code in your answer.
What is the equation of the line corresponding to just male professors?
For two professors who received the same beauty rating, which gender tends to have the higher course evaluation score?
How does the relationship between beauty and evaluation score vary between male and female professors?
What is the definition of adjusted \(R^2\)? Explain how it differs from \(R^2\), using values from this model. Once again, use inline code in your answer.
How do the adjusted \(R^2\) values of `m_bty_gen` and `m_bty` compare? What does this tell us about how useful `gender` is in explaining the variability in evaluation scores when we already have information on the beauty score of the professor? Once again, use inline code in your answer.
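One way to pull the two adjusted \(R^2\) values for your inline code is from each model's `glance()` output:

```r
glance(m_bty)$adj.r.squared
glance(m_bty_gen)$adj.r.squared
```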
Compare the slopes of `bty_avg` under the two models (`m_bty` and `m_bty_gen`). Has the addition of `gender` to the model changed the parameter estimate (slope) for `bty_avg`?
Create a new model called `m_bty_gen_rank`, predicting average professor evaluation `score` based on average beauty rating (`bty_avg`), `gender`, and `rank`.
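A sketch of the fit, following the same pattern as the earlier models:

```r
m_bty_gen_rank <- lm(score ~ bty_avg + gender + rank, data = evals)
```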
Write the linear model and interpret the slopes and intercept in context of the data. Once again, use inline code in your answer.
How do the adjusted \(R^2\) values of `m_bty_gen_rank` and `m_bty_gen` compare? What does this tell us about how useful `rank` is in explaining the variability in evaluation scores when we already have information on the beauty score and gender of the professor? And yet again, use inline code in your answer.