# Compare Multiple Sample Variances in R

This article describes **statistical tests** for comparing the **variances** of two or more samples. Equality of variances across samples is called **homogeneity of variances**.

Some statistical tests, such as the independent two-sample t-test and ANOVA, assume that variances are equal across groups. **Bartlett’s test**, **Levene’s test** or the **Fligner-Killeen test** can be used to verify that assumption.

# Statistical tests for comparing variances

Several tests are available for checking the equality (**homogeneity**) of variances across groups, including:

- **F-test**: Compare the variances of two samples. The data must be normally distributed.
- **Bartlett’s test**: Compare the variances of k samples, where k can be more than two samples. The data must be normally distributed.
- **Levene’s test**: Compare the variances of k samples, where k can be more than two samples. It’s an alternative to Bartlett’s test that is less sensitive to departures from normality.
- **Fligner-Killeen test**: a non-parametric test which is very robust against departures from normality.

The **F-test** has been described in our previous article: F-test to compare equality of two variances. In the present article, we’ll describe the tests for comparing more than two variances.
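For reference, the two-sample case can be sketched with base R's `var.test()`. This is only a minimal illustration on the built-in ToothGrowth data, comparing the two supplement types; the object name `f_res` is chosen here for the example.

```r
# Two-sample F-test: compare the variance of tooth length
# between the two supplement types (OJ and VC)
f_res <- var.test(len ~ supp, data = ToothGrowth)

f_res$statistic  # F statistic: ratio of the two sample variances
f_res$p.value    # p-value of the two-sided test
```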

# Statistical hypotheses

For all these tests (**Bartlett’s test**, **Levene’s test** and the **Fligner-Killeen test**),

- the null hypothesis is that all population variances are equal;
- the alternative hypothesis is that at least two of them differ.
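Written formally for k groups, and using the notation $\sigma_i^2$ for the population variance of group i (a symbol introduced here for illustration), the two hypotheses read:

$$
H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2
\qquad \text{versus} \qquad
H_a: \sigma_i^2 \neq \sigma_j^2 \ \text{for at least one pair } (i, j)
$$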

# Import your data into R and check it

To import your data, use the following R code:

```
# If .txt tab file, use this
my_data <- read.delim(file.choose())
# Or, if .csv file, use this
my_data <- read.csv(file.choose())
```

Here, we’ll use ToothGrowth and PlantGrowth data sets:

```
# Load the data
data(ToothGrowth)
data(PlantGrowth)
```

To get an idea of what the data look like, we start by displaying a random sample of 10 rows using the function **sample_n**() [in the **dplyr** package]. First, install the dplyr package if you don’t have it: **install.packages("dplyr")**.

Show 10 random rows:

```
set.seed(123)
# Show PlantGrowth
dplyr::sample_n(PlantGrowth, 10)
```

```
weight group
24 5.50 trt2
12 4.17 trt1
25 5.37 trt2
26 5.29 trt2
2 5.58 ctrl
14 3.59 trt1
22 5.12 trt2
13 4.41 trt1
11 4.81 trt1
21 6.31 trt2
```

```
# PlantGrowth data structure
str(PlantGrowth)
```

```
'data.frame': 30 obs. of 2 variables:
$ weight: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
$ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
```

```
# Show ToothGrowth
dplyr::sample_n(ToothGrowth, 10)
```

```
len supp dose
28 21.5 VC 2.0
40 9.7 OJ 0.5
34 9.7 OJ 0.5
6 10.0 VC 0.5
51 25.5 OJ 2.0
14 17.3 VC 1.0
3 7.3 VC 0.5
18 14.5 VC 1.0
50 27.3 OJ 1.0
46 25.2 OJ 1.0
```

```
# ToothGrowth data structure
str(ToothGrowth)
```

```
'data.frame': 60 obs. of 3 variables:
$ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
$ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
$ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
```

Note that R treats the column “dose” [in the ToothGrowth data set] as a numeric vector. We want to convert it to a grouping variable (factor).

`ToothGrowth$dose <- as.factor(ToothGrowth$dose)`

We want to test the equality of variances between groups.
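Before running any formal test, it can help to look at the sample variances themselves. The following is a minimal sketch using base R only; the object names `group_var` and `cell_var` are chosen here for illustration.

```r
# Sample variance of plant weight within each treatment group
group_var <- tapply(PlantGrowth$weight, PlantGrowth$group, var)
group_var

# Same idea for tooth length, with one row per supp x dose combination
cell_var <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = var)
cell_var
```

The tests described below assess whether differences of this size are larger than would be expected by chance.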

# Compute Bartlett’s test in R

**Bartlett’s test** is used for testing homogeneity of variances in k samples, where k can be more than two. It’s suited to normally distributed data. **Levene’s test**, described in the next section, is a more robust alternative to Bartlett’s test when the distributions of the data are non-normal.

The R function **bartlett.test**() can be used to compute Bartlett’s test. The simplified format is as follows:

`bartlett.test(formula, data)`

- **formula**: a formula of the form values ~ groups
- **data**: a matrix or data frame

The function returns a list containing the following components:

- **statistic**: Bartlett’s K-squared test statistic
- **parameter**: the degrees of freedom of the approximate chi-squared distribution of the test statistic
- **p.value**: the p-value of the test

To perform the test, we’ll use the *PlantGrowth* data set, which contains the weight of plants obtained under 3 treatment groups.

**Bartlett’s test with one independent variable**:

```
res <- bartlett.test(weight ~ group, data = PlantGrowth)
res
```

```
Bartlett test of homogeneity of variances
data: weight by group
Bartlett's K-squared = 2.8786, df = 2, p-value = 0.2371
```

The p-value (0.2370968) is greater than the significance level of 0.05, so there is no evidence that the variance in plant weight differs between the three treatment groups.
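The components of the returned list can also be read programmatically, which is convenient when the decision has to be made inside a script. The sketch below assumes a significance level of 0.05 stored in a variable named `alpha`, a name chosen here for illustration.

```r
res <- bartlett.test(weight ~ group, data = PlantGrowth)

res$statistic  # Bartlett's K-squared
res$parameter  # degrees of freedom
res$p.value    # p-value of the test

# Compare the p-value with the chosen significance level
alpha <- 0.05
if (res$p.value < alpha) {
  message("Variances differ significantly between groups")
} else {
  message("No evidence against homogeneity of variances")
}
```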

**Bartlett’s test with multiple independent variables**: the **interaction**() function must be used to collapse multiple factors into a single variable containing all combinations of the factors.

`bartlett.test(len ~ interaction(supp,dose), data=ToothGrowth)`

```
Bartlett test of homogeneity of variances
data: len by interaction(supp, dose)
Bartlett's K-squared = 6.9273, df = 5, p-value = 0.2261
```
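To see what `interaction()` produces, the combined factor can be inspected directly. This is a small sketch; `cells` is a name chosen here for illustration.

```r
# Combine supp and dose into a single factor with one level per combination
cells <- interaction(ToothGrowth$supp, ToothGrowth$dose)

levels(cells)   # names of the combined groups
nlevels(cells)  # number of groups compared by the test
table(cells)    # number of observations in each group
```

With two supplement types and three doses, the combined factor has six levels, which is consistent with the five degrees of freedom reported in the output above.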

# Compute Levene’s test in R

As mentioned above, Levene’s test is an alternative to Bartlett’s test when the data is not normally distributed.

The function **leveneTest**() [in **car** package] can be used.

```
library(car)
# Levene's test with one independent variable
leveneTest(weight ~ group, data = PlantGrowth)
```

```
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 1.1192 0.3412
27
```

```
# Levene's test with multiple independent variables
leveneTest(len ~ supp*dose, data = ToothGrowth)
```

```
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 5 1.7086 0.1484
54
```
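The `center = median` line in the output indicates the median-centered variant of the test. As a rough sketch of the idea, and not the car package's actual code, this variant amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The names `group_median`, `lev_data` and `lev_manual` are chosen here for illustration.

```r
# Median of each group, repeated for every observation of that group
group_median <- ave(PlantGrowth$weight, PlantGrowth$group, FUN = median)

# Absolute deviation of each observation from its group median
lev_data <- data.frame(
  abs_dev = abs(PlantGrowth$weight - group_median),
  group   = PlantGrowth$group
)

# One-way ANOVA on the absolute deviations
lev_manual <- anova(lm(abs_dev ~ group, data = lev_data))
lev_manual
```

The F value and p-value in this table should agree with those reported by `leveneTest()` for the PlantGrowth data.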

# Compute Fligner-Killeen test in R

The **Fligner-Killeen test** is one of the tests for homogeneity of variances that is most robust against departures from normality.

The R function **fligner.test**() can be used to compute the test:

`fligner.test(weight ~ group, data = PlantGrowth)`

```
Fligner-Killeen test of homogeneity of variances
data: weight by group
Fligner-Killeen:med chi-squared = 2.3499, df = 2, p-value = 0.3088
```
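As with Bartlett’s test, several factors can be combined with `interaction()`, and the p-values of the different tests can be collected side by side. The sketch below uses base R only; `fk_tooth` and `p_values` are names chosen here for illustration.

```r
# Fligner-Killeen test with two factors combined into one grouping variable
fk_tooth <- fligner.test(len ~ interaction(supp, dose), data = ToothGrowth)
fk_tooth

# p-values of the two base R tests on the PlantGrowth data
p_values <- c(
  bartlett = bartlett.test(weight ~ group, data = PlantGrowth)$p.value,
  fligner  = fligner.test(weight ~ group, data = PlantGrowth)$p.value
)
round(p_values, 4)
```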

# Infos

This analysis has been performed using **R software** (ver. 3.2.4).
