Interaction Effect in Multiple Regression: Essentials

kassambara | 11/03/2018 | Regression Analysis

This chapter describes how to compute multiple linear regression with interaction effects.

Previously, we described how to build a multiple linear regression model (Chapter @ref(linear-regression)) for predicting a continuous outcome variable (y) based on multiple predictor variables (x).

For example, to predict sales from the advertising budgets spent on youtube and facebook, the model equation is sales = b0 + b1*youtube + b2*facebook, where b0 is the intercept, and b1 and b2 are the regression coefficients associated with the predictor variables youtube and facebook, respectively.

The above equation, also known as the additive model, investigates only the main effects of the predictors. It assumes that the relationship between a given predictor variable and the outcome is independent of the other predictor variables (James et al. 2014; Bruce and Bruce 2017).

In our example, the additive model assumes that the effect of youtube advertising on sales does not depend on the amount spent on facebook advertising.

This assumption might not be true. For example, spending money on facebook advertising may increase the effectiveness of youtube advertising on sales. In marketing, this is known as a synergy effect, and in statistics it is referred to as an interaction effect (James et al. 2014).

In this chapter, you’ll learn:

the equation of multiple linear regression with interaction
R code for computing the regression coefficients associated with the main effects and the interaction effects
how to interpret the interaction effect

Contents:

Equation
Loading Required R packages
Preparing the data
Computation
- Additive model
- Interaction effects
Interpretation
Comparing the additive and the interaction models
Discussion
References

Equation

The multiple linear regression equation, with an interaction effect between two predictors (x1 and x2), can be written as follows:

y = b0 + b1*x1 + b2*x2 + b3*(x1*x2)

Considering our example, it becomes:

sales = b0 + b1*youtube + b2*facebook + b3*(youtube*facebook)

This can also be written as:

sales = b0 + (b1 + b3*facebook)*youtube + b2*facebook

or as:

sales = b0 + b1*youtube + (b2 + b3*youtube)*facebook

b3 can be interpreted as the increase in the effectiveness of youtube advertising for a one unit increase in facebook advertising (or vice-versa).
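
To see where this interpretation comes from, one option is to differentiate the interaction equation with respect to each predictor. The block below is only a sketch in the same symbols as the equations above; it shows that the slope for one predictor changes linearly with the value of the other.

```latex
\begin{aligned}
\frac{\partial\,\text{sales}}{\partial\,\text{youtube}}  &= b_1 + b_3 \cdot \text{facebook} \\
\frac{\partial\,\text{sales}}{\partial\,\text{facebook}} &= b_2 + b_3 \cdot \text{youtube}
\end{aligned}
```

When b3 = 0, both slopes reduce to the constants b1 and b2, which is the additive model.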

In the following sections, you will learn how to compute the regression coefficients in R.

Loading Required R packages

tidyverse for easy data manipulation and visualization
caret for easy machine learning workflow

library(tidyverse)
library(caret)

Preparing the data

We’ll use the marketing data set, introduced in Chapter @ref(regression-analysis), to predict sales units on the basis of the amount of money spent on three advertising media (youtube, facebook and newspaper).

We’ll randomly split the data into a training set (80%, for building the predictive model) and a test set (20%, for evaluating the model).

# Load the data
data("marketing", package = "datarium")
# Inspect the data
sample_n(marketing, 3)

##     youtube facebook newspaper sales
## 58    163.4     23.0      19.9  15.8
## 157   112.7     52.2      60.6  18.4
## 81     91.7     32.0      26.8  14.2

# Split the data into training and test set
set.seed(123)
training.samples <- marketing$sales %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data  <- marketing[training.samples, ]
test.data <- marketing[-training.samples, ]
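
The split above relies on createDataPartition() from caret, which, to our understanding, tries to keep the distribution of the outcome similar in both sets. As a rough illustration of the 80/20 idea only, the base R sketch below does a plain random split on a made-up data frame. The names df, train_idx, train and test are hypothetical, and this is not what caret does internally.

```r
# Hypothetical data frame standing in for any data set with 200 rows
set.seed(123)
df <- data.frame(id = 1:200, y = rnorm(200))

# Draw 80% of the row indices at random for training
train_idx <- sample(seq_len(nrow(df)), size = floor(0.8 * nrow(df)))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Number of rows in each set
c(train = nrow(train), test = nrow(test))
```

Every row ends up in exactly one of the two sets, so the test set stays unseen during model building.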

Computation

Additive model

The standard linear regression model can be computed as follows:

# Build the model
model1 <- lm(sales ~ youtube + facebook, data = train.data)
# Summarize the model
summary(model1)

## 
## Call:
## lm(formula = sales ~ youtube + facebook, data = train.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.481  -1.104   0.349   1.423   3.486 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.43446    0.40877     8.4  2.3e-14 ***
## youtube      0.04558    0.00159    28.7  < 2e-16 ***
## facebook     0.18788    0.00920    20.4  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.11 on 159 degrees of freedom
## Multiple R-squared:  0.89,   Adjusted R-squared:  0.889 
## F-statistic:  644 on 2 and 159 DF,  p-value: <2e-16

# Make predictions
predictions <- model1 %>% predict(test.data)
# Model performance
# (a) Prediction error, RMSE
RMSE(predictions, test.data$sales)

## [1] 1.58

# (b) R-square
R2(predictions, test.data$sales)

## [1] 0.938
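
To make the two performance metrics less of a black box, here is a minimal base R sketch of what we understand RMSE() and R2() from caret to compute with their default arguments: the root mean squared error, and the squared correlation between predicted and observed values. The vectors obs and pred are made-up values for illustration, not results from the marketing data.

```r
# Hypothetical observed and predicted values, for illustration only
obs  <- c(10, 12, 15, 20)
pred <- c(11, 11, 16, 19)

# RMSE: square root of the mean squared prediction error
rmse <- sqrt(mean((pred - obs)^2))

# R2 computed as the squared correlation between predictions and observations
r2 <- cor(pred, obs)^2

rmse
r2
```

The lower the RMSE and the higher the R2, the better the predictions match the observed values.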

Interaction effects

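In R, you include an interaction between two variables using the : operator. The * operator is a shorthand that adds both main effects and their interaction:

# Build the model
# Explicit form: main effects plus the interaction term
model2 <- lm(sales ~ youtube + facebook + youtube:facebook,
             data = train.data)
# Or simply, use the equivalent shorthand form:
model2 <- lm(sales ~ youtube*facebook, data = train.data)
# Summarize the model
summary(model2)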

## 
## Call:
## lm(formula = sales ~ youtube * facebook, data = train.data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.438 -0.482  0.231  0.748  1.860 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      7.90e+00   3.28e-01   24.06   <2e-16 ***
## youtube          1.95e-02   1.64e-03   11.90   <2e-16 ***
## facebook         2.96e-02   9.83e-03    3.01    0.003 ** 
## youtube:facebook 9.12e-04   4.84e-05   18.86   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.18 on 158 degrees of freedom
## Multiple R-squared:  0.966,  Adjusted R-squared:  0.966 
## F-statistic: 1.51e+03 on 3 and 158 DF,  p-value: <2e-16

# Make predictions
predictions <- model2 %>% predict(test.data)
# Model performance
# (a) Prediction error, RMSE
RMSE(predictions, test.data$sales)

## [1] 0.963

# (b) R-square
R2(predictions, test.data$sales)

## [1] 0.982
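
The two ways of writing the interaction formula can be checked against each other. The sketch below uses simulated stand-in data (the data frame sim, its columns x1, x2, y and the coefficients used to generate it are all hypothetical), so that it runs without the marketing data set.

```r
# Simulated stand-in data (hypothetical values, not the marketing data set)
set.seed(123)
n <- 100
sim <- data.frame(x1 = runif(n, 0, 300), x2 = runif(n, 0, 50))
sim$y <- 8 + 0.02 * sim$x1 + 0.03 * sim$x2 +
  0.001 * sim$x1 * sim$x2 + rnorm(n)

# Explicit form: main effects plus the interaction term
fit_long  <- lm(y ~ x1 + x2 + x1:x2, data = sim)
# Shorthand form: x1*x2 expands to x1 + x2 + x1:x2
fit_short <- lm(y ~ x1 * x2, data = sim)

# Compare the estimated coefficients of the two fits
all.equal(coef(fit_long), coef(fit_short))
```

Both fits estimate the same four coefficients: the intercept, the two main effects and the x1:x2 interaction.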

Interpretation

All the coefficients, including the interaction term coefficient, are statistically significant. The significant interaction term suggests that the effect of each advertising medium (youtube or facebook) on sales depends on the budget spent on the other.

Our model equation looks like this:

sales = 7.90 + 0.0195*youtube + 0.0296*facebook + 0.0009*youtube*facebook

We can interpret this as follows: an increase in youtube advertising of 1000 dollars is associated with an increase in sales of (b1 + b3*facebook)*1000 = 19.5 + 0.9*facebook units, and an increase in facebook advertising of 1000 dollars is associated with an increase in sales of (b2 + b3*youtube)*1000 = 29.6 + 0.9*youtube units.
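
As a small worked example of this interpretation, the sketch below evaluates the youtube slope, b1 + b3*facebook, at a few facebook budgets. The coefficients are copied from the summary(model2) output above; the facebook values are arbitrary illustrative budgets in the data's own units, not values taken from the data set.

```r
# Coefficients copied from the summary(model2) output above
b1 <- 1.95e-02   # youtube
b2 <- 2.96e-02   # facebook (not needed for the youtube slope)
b3 <- 9.12e-04   # youtube:facebook

# Illustrative facebook budgets (hypothetical values, in the data's units)
facebook <- c(0, 10, 25, 50)

# Slope of sales with respect to youtube at each facebook level
youtube_slope <- b1 + b3 * facebook
data.frame(facebook, youtube_slope)
```

The slope grows with the facebook budget, which is the synergy effect expressed in numbers.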

Note that the interaction term is sometimes significant while the main effects are not. The hierarchical principle states that, if we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant (James et al. 2014).

Comparing the additive and the interaction models

The prediction error RMSE of the interaction model is 0.963, which is lower than the prediction error of the additive model (1.58).

Additionally, the R-square (R2) value of the interaction model is 98.2%, compared to 93.8% for the additive model.

These results suggest that the model with the interaction term is better than the model that contains only main effects. So, for this specific data set, we should go for the interaction model.
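
Besides comparing test-set metrics, nested models like these can also be compared with an F-test using anova() from base R. This goes beyond what was computed above, so take it as an optional sketch; it uses simulated stand-in data (sim, x1, x2, y and the generating coefficients are hypothetical) so that it runs on its own.

```r
# Simulated stand-in data (hypothetical values, not the marketing data set)
set.seed(42)
n <- 120
sim <- data.frame(x1 = runif(n, 0, 300), x2 = runif(n, 0, 50))
sim$y <- 8 + 0.02 * sim$x1 + 0.03 * sim$x2 +
  0.001 * sim$x1 * sim$x2 + rnorm(n)

# Additive model and interaction model fitted on the same data
fit_add   <- lm(y ~ x1 + x2, data = sim)
fit_inter <- lm(y ~ x1 * x2, data = sim)

# F-test: does adding the interaction term reduce the residual error?
cmp <- anova(fit_add, fit_inter)
cmp
```

With the models from this chapter, the equivalent call would be anova(model1, model2), since both were fitted on train.data.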

Discussion

This chapter describes how to compute multiple linear regression with interaction effects. Interaction terms should be included in the model if they are significant.

References

Bruce, Peter, and Andrew Bruce. 2017. Practical Statistics for Data Scientists. O’Reilly Media.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2014. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated.
