Interaction Effect in Multiple Regression: Essentials
This chapter describes how to compute multiple linear regression with interaction effects.
Previously, we have described how to build a multiple linear regression model (Chapter @ref(linear-regression)) for predicting a continuous outcome variable (y) based on multiple predictor variables (x).
For example, to predict sales based on the advertising budgets spent on youtube and facebook, the model equation is:
sales = b0 + b1*youtube + b2*facebook
where b0 is the intercept, and b1 and b2 are the regression coefficients associated with the predictor variables youtube and facebook, respectively.
The above equation, also known as the additive model, investigates only the main effects of the predictors. It assumes that the relationship between a given predictor variable and the outcome is independent of the other predictor variables (James et al. 2014; Bruce and Bruce 2017).
Considering our example, the additive model assumes that the effect on sales of youtube advertising is independent of the effect of facebook advertising.
This assumption might not be true. For example, spending money on facebook advertising may increase the effectiveness of youtube advertising on sales. In marketing, this is known as a synergy effect, and in statistics it is referred to as an interaction effect (James et al. 2014).
In this chapter, you’ll learn:
- the equation of multiple linear regression with interaction
- R codes for computing the regression coefficients associated with the main effects and the interaction effects
- how to interpret the interaction effect
Equation
The multiple linear regression equation, with an interaction effect between two predictors (x1 and x2), can be written as follows:
y = b0 + b1*x1 + b2*x2 + b3*(x1*x2)
Considering our example, it becomes:
sales = b0 + b1*youtube + b2*facebook + b3*(youtube*facebook)
This can be also written as:
sales = b0 + (b1 + b3*facebook)*youtube + b2*facebook
or as:
sales = b0 + b1*youtube + (b2 +b3*youtube)*facebook
The coefficient b3 can be interpreted as the increase in the effectiveness of youtube advertising for a one-unit increase in facebook advertising (or vice versa).
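To see what this means numerically, here is a minimal sketch using made-up coefficients (b1 = 0.02 and b3 = 0.001 are hypothetical values, not fitted estimates): the slope of sales with respect to youtube is b1 + b3*facebook, so it grows with the facebook budget.

```r
# Hypothetical coefficients (for illustration only)
b1 <- 0.02
b3 <- 0.001

# Slope of sales with respect to youtube, at two facebook budgets
slope_at_0  <- b1 + b3 * 0    # 0.02
slope_at_50 <- b1 + b3 * 50   # 0.07
c(slope_at_0, slope_at_50)
```

With no facebook spending the youtube slope is just b1; at a facebook budget of 50, each additional unit of youtube spending is worth b3*50 more.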
In the following sections, you will learn how to compute the regression coefficients in R.
Loading Required R packages
- tidyverse for easy data manipulation and visualization
- caret for easy machine learning workflow
library(tidyverse)
library(caret)
Preparing the data
We’ll use the marketing data set, introduced in Chapter @ref(regression-analysis), for predicting sales units on the basis of the amount of money spent on three advertising media (youtube, facebook and newspaper).
We’ll randomly split the data into a training set (80%, for building a predictive model) and a test set (20%, for evaluating the model).
# Load the data
data("marketing", package = "datarium")
# Inspect the data
sample_n(marketing, 3)
## youtube facebook newspaper sales
## 58 163.4 23.0 19.9 15.8
## 157 112.7 52.2 60.6 18.4
## 81 91.7 32.0 26.8 14.2
# Split the data into training and test set
set.seed(123)
training.samples <- marketing$sales %>%
createDataPartition(p = 0.8, list = FALSE)
train.data <- marketing[training.samples, ]
test.data <- marketing[-training.samples, ]
Computation
Additive model
The standard linear regression model can be computed as follows:
# Build the model
model1 <- lm(sales ~ youtube + facebook, data = train.data)
# Summarize the model
summary(model1)
##
## Call:
## lm(formula = sales ~ youtube + facebook, data = train.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.481 -1.104 0.349 1.423 3.486
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.43446 0.40877 8.4 2.3e-14 ***
## youtube 0.04558 0.00159 28.7 < 2e-16 ***
## facebook 0.18788 0.00920 20.4 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.11 on 159 degrees of freedom
## Multiple R-squared: 0.89, Adjusted R-squared: 0.889
## F-statistic: 644 on 2 and 159 DF, p-value: <2e-16
# Make predictions
predictions <- model1 %>% predict(test.data)
# Model performance
# (a) Prediction error, RMSE
RMSE(predictions, test.data$sales)
## [1] 1.58
# (b) R-square
R2(predictions, test.data$sales)
## [1] 0.938
Interaction effects
In R, you include an interaction between two variables using the : operator; the * operator is a shorthand for the main effects plus their interaction:
# Build the model
# Use this:
model2 <- lm(sales ~ youtube + facebook + youtube:facebook,
             data = train.data)
# Or simply, use this:
model2 <- lm(sales ~ youtube*facebook, data = train.data)
# Summarize the model
summary(model2)
##
## Call:
## lm(formula = sales ~ youtube * facebook, data = train.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.438 -0.482 0.231 0.748 1.860
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.90e+00 3.28e-01 24.06 <2e-16 ***
## youtube 1.95e-02 1.64e-03 11.90 <2e-16 ***
## facebook 2.96e-02 9.83e-03 3.01 0.003 **
## youtube:facebook 9.12e-04 4.84e-05 18.86 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.18 on 158 degrees of freedom
## Multiple R-squared: 0.966, Adjusted R-squared: 0.966
## F-statistic: 1.51e+03 on 3 and 158 DF, p-value: <2e-16
# Make predictions
predictions <- model2 %>% predict(test.data)
# Model performance
# (a) Prediction error, RMSE
RMSE(predictions, test.data$sales)
## [1] 0.963
# (b) R-square
R2(predictions, test.data$sales)
## [1] 0.982
Interpretation
It can be seen that all the coefficients, including the interaction term coefficient, are statistically significant, suggesting that there is an interaction relationship between the two predictor variables (youtube and facebook advertising).
Our model equation looks like this:
sales = 7.90 + 0.0195*youtube + 0.0296*facebook + 0.0009*youtube*facebook
We can interpret this as follows: an increase in youtube advertising of 1000 dollars is associated with an increase in sales of (b1 + b3*facebook)*1000 ≈ 19.5 + 0.9*facebook units, and an increase in facebook advertising of 1000 dollars is associated with an increase in sales of (b2 + b3*youtube)*1000 ≈ 29.6 + 0.9*youtube units.
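As a quick check, this interpretation can be computed directly from the coefficient estimates printed in the model summary above (the facebook budget of 50 is an arbitrary illustrative value):

```r
# Coefficient estimates taken from the model summary above
b1 <- 1.95e-02        # youtube
b3 <- 9.12e-04        # youtube:facebook
facebook <- 50        # hypothetical facebook budget

# Effect on sales of an extra 1000 dollars of youtube advertising,
# at the chosen facebook budget
effect_youtube <- (b1 + b3 * facebook) * 1000
effect_youtube
# 65.1 sales units (19.5 + 0.912*50)
```

The same calculation with b2 and a chosen youtube budget gives the marginal effect of facebook advertising.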
Note that it sometimes happens that the interaction term is significant but the main effects are not. The hierarchical principle states that, if we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant (James et al. 2014).
Comparing the additive and the interaction models
The prediction error (RMSE) of the interaction model is 0.963, which is lower than that of the additive model (1.58).
Additionally, the R-squared (R2) of the interaction model is about 98%, compared to 94% for the additive model.
These results suggest that the model with the interaction term is better than the model that contains only main effects. So, for this specific data, we should go for the interaction model.
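Because the additive model is nested in the interaction model, the two can also be compared with a formal F-test (a sketch, assuming model1 and model2 have been fitted on the same training data as above):

```r
# F-test comparing the nested models; a small p-value indicates
# that the interaction term significantly improves the fit
anova(model1, model2)
```

This tests whether the reduction in residual sum of squares due to the interaction term is larger than expected by chance.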
Discussion
This chapter described how to compute multiple linear regression with interaction effects. Interaction terms should be included in the model when they are statistically significant.
References
Bruce, Peter, and Andrew Bruce. 2017. Practical Statistics for Data Scientists. O’Reilly Media.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2014. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated.