Unpaired Two-Samples T-test in R
Introduction
The independent t-test (or unpaired t-test) is used to compare the means of two unrelated groups of samples. The aim of this article is to show you how to compute an independent samples t-test with R software. The t-test formula is described here.
A simplified format of the R function to use is :
t.test(x, y)
x and y are two numeric vectors of data values to compare.
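For reference, the default method of t.test() also accepts a number of optional arguments. This is the standard signature from the R stats package, shown here for convenience:

# default method of t.test(), with its main optional arguments
t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)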
Example of data
As an example, we have a cohort of 20 individuals (10 women and 10 men). We want to test whether women's average weight is significantly different from men's average weight. The number of individuals considered here is obviously low; this is just to illustrate the usage of the two-sample t-test.
The data are shown below:
ID | Group | Weight (kg) |
---|---|---|
1 | Woman | 38.90 |
2 | Woman | 61.20 |
3 | Woman | 73.30 |
4 | Woman | 21.80 |
5 | Woman | 63.40 |
6 | Woman | 64.60 |
7 | Woman | 48.40 |
8 | Woman | 48.80 |
9 | Woman | 48.50 |
10 | Woman | 43.60 |
11 | Man | 67.80 |
12 | Man | 60.00 |
13 | Man | 63.40 |
14 | Man | 76.00 |
15 | Man | 89.40 |
16 | Man | 73.30 |
17 | Man | 67.30 |
18 | Man | 61.30 |
19 | Man | 62.40 |
20 | Man | 111.20 |
Question: Is women's average weight significantly different from that of men?
To answer this question, an independent t-test can be used:
From the data table above, two different methods can be used to perform the t-test, depending on the structure of your input data.
Calculate independent t-test using R
1) Method 1 - The data are saved in two different numeric vectors (x and y):
set.seed(1234)
# Women's weights
x <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5, 43.6)
# Men's weights
y <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4, 111.2)
In this case, the unpaired t-test can be performed as follows:
res <- t.test(x, y)
res
Welch Two Sample t-test
data: x and y
t = -3.17, df = 17.92, p-value = 0.005319
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-36.517 -7.403
sample estimates:
mean of x mean of y
51.25 73.21
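Note that t.test() performs a two-sided test by default. If you were instead interested in a one-sided alternative (for example, that women's mean weight is lower than men's), a possible call, not used in the rest of this article, would be:

# one-sided test: is the mean of x (women) lower than the mean of y (men)?
t.test(x, y, alternative = "less")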
2) Method 2 - The data are saved in a data.frame:
d <- data.frame(
  group = c(rep("Woman", 10), rep("Man", 10)),
  weight = c(x, y)
)
head(d)
group weight
1 Woman 38.9
2 Woman 61.2
3 Woman 73.3
4 Woman 21.8
5 Woman 63.4
6 Woman 64.6
In this case, the unpaired t-test can be computed using the following R code:
# res <- t.test(d$weight ~ d$group)
res <- t.test(weight ~ group, data = d)
res
Welch Two Sample t-test
data: weight by group
t = 3.17, df = 17.92, p-value = 0.005319
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
7.403 36.517
sample estimates:
mean in group Man mean in group Woman
73.21 51.25
As you can see, the two methods give the same results (the sign of the t statistic and of the confidence interval is simply reversed, because the formula interface compares the Man group to the Woman group).
The p-value of the test is 0.005319, which is less than the significance level alpha = 0.05. We can therefore reject the null hypothesis and conclude that women's average weight is significantly different from men's average weight.
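If you want to double-check the group means reported under "sample estimates", they can also be computed directly from the data frame, for example with tapply():

# mean weight by group (should give Man = 73.21, Woman = 51.25)
tapply(d$weight, d$group, mean)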
Remember that the classic independent t-test can be used only when the data in each of the two groups are normally distributed and the two groups have equal variances.
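The normality assumption can be checked, for example, with the Shapiro-Wilk test from base R (this check is not part of the original example):

# Shapiro-Wilk normality test for each group
shapiro.test(x) # women's weights
shapiro.test(y) # men's weights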
By default, the R t.test() function does not assume that the variances of the two groups being compared are equal; therefore, a Welch t-test is performed by default. The Welch t-test is an adaptation of the Student t-test that is used when the two samples have possibly unequal variances.
The argument var.equal = TRUE can be used to tell the t.test() function that the two samples have equal variances. However, you have to check this assumption before using it.
Here, we'll use an F-test to test for differences in variances.
The following R code can be used:
var.test(x, y)
F test to compare two variances
data: x and y
F = 0.8718, num df = 9, denom df = 9, p-value = 0.8414
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.2165 3.5099
sample estimates:
ratio of variances
0.8718
The p-value of the F-test is 0.8414, which is greater than the significance level alpha = 0.05. In conclusion, there is no significant difference between the variances of the two sets of data. Therefore, we can use the classic t-test, which assumes equality of the two variances.
The t-test can then be performed as follows:
res <- t.test(x, y, var.equal = TRUE)
res
Two Sample t-test
data: x and y
t = -3.17, df = 18, p-value = 0.005296
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-36.512 -7.408
sample estimates:
mean of x mean of y
51.25 73.21
Note that the formula of the Welch t-test is described here and the formula of the Student t-test here.
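For reference, these are the standard textbook formulas (not reproduced from the linked pages). The classic Student t-test statistic, with a pooled variance estimate and n_A + n_B - 2 degrees of freedom, is:

$$t = \frac{m_A - m_B}{\sqrt{S^2\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}, \qquad S^2 = \frac{(n_A - 1)S_A^2 + (n_B - 1)S_B^2}{n_A + n_B - 2}$$

The Welch t-test statistic and its approximate degrees of freedom are:

$$t = \frac{m_A - m_B}{\sqrt{\frac{S_A^2}{n_A} + \frac{S_B^2}{n_B}}}, \qquad df = \frac{\left(\frac{S_A^2}{n_A} + \frac{S_B^2}{n_B}\right)^2}{\frac{(S_A^2/n_A)^2}{n_A - 1} + \frac{(S_B^2/n_B)^2}{n_B - 1}}$$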
Get the objects returned by the t.test() function
As indicated here, we can easily get each of the objects returned by the t.test() function:
# printing the p-value
res$p.value
[1] 0.005296
# printing the means
res$estimate
mean of x mean of y
51.25 73.21
# printing the confidence interval
res$conf.int
[1] -36.512 -7.408
attr(,"conf.level")
[1] 0.95
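The result of t.test() is a list of class "htest". Besides p.value, estimate, and conf.int, it also contains, among others, the t statistic and the degrees of freedom:

# printing the t statistic
res$statistic
# printing the degrees of freedom
res$parameter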
Infos
This analysis has been performed using R (ver. 3.1.0).