# One-Proportion Z-Test in R

# What is one-proportion Z-test?

The **one-proportion z-test** is used to compare an observed proportion to a theoretical one when there are only two categories. This article describes the basics of the **one-proportion z-test** and provides practical examples using **R software**.

For example, suppose we have a population of mice that is half male and half female (p = 0.5 = 50%). Some of these mice (n = 160) have developed a spontaneous cancer: 95 males and 65 females.

We want to know whether the cancer affects more males than females.

In this setting:

- The number of successes (males with cancer) is 95
- The observed proportion (\(p_o\)) of males is 95/160
- The observed proportion (\(q\)) of females is \(1 - p_o\)
- The expected proportion (\(p_e\)) of males is 0.5 (50%)
- The number of observations (\(n\)) is 160
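As a small illustration, the quantities listed above can be set up in R from the raw counts. The variable names (`n_male`, `n_female`, `p_o`, `q`, `p_e`) are chosen here to mirror the notation and are not part of any R API.

```r
# Counts from the mice example
n_male   <- 95                 # males with cancer (the "successes")
n_female <- 65                 # females with cancer
n        <- n_male + n_female  # total number of observations

p_o <- n_male / n              # observed proportion of males
q   <- 1 - p_o                 # observed proportion of females
p_e <- 0.5                     # expected proportion of males

c(n = n, p_o = p_o, q = q, p_e = p_e)
```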

# Research questions and statistical hypotheses

Typical research questions are:

- whether the observed proportion of males (\(p_o\)) *is equal* to the expected proportion (\(p_e\))?
- whether the observed proportion of males (\(p_o\)) *is less than* the expected proportion (\(p_e\))?
- whether the observed proportion of males (\(p_o\)) *is greater than* the expected proportion (\(p_e\))?

In statistics, we can define the corresponding *null hypotheses* (\(H_0\)) as follows:

- \(H_0: p_o = p_e\)
- \(H_0: p_o \leq p_e\)
- \(H_0: p_o \geq p_e\)

The corresponding *alternative hypotheses* (\(H_a\)) are as follows:

- \(H_a: p_o \ne p_e\) (different)
- \(H_a: p_o > p_e\) (greater)
- \(H_a: p_o < p_e\) (less)

Note that:

- Hypotheses 1) are called **two-tailed tests**
- Hypotheses 2) and 3) are called **one-tailed tests**

# Formula of the test statistic

The test statistic (also known as the **z-statistic**) can be calculated as follows:

\[ z = \frac{p_o-p_e}{\sqrt{p_oq/n}} \]

where,

- \(p_o\) is the observed proportion
- \(q = 1-p_o\)
- \(p_e\) is the expected proportion
- \(n\) is the sample size

- if \(|z| < 1.96\), then the difference **is not significant** at 5%
- if \(|z| \geq 1.96\), then the difference **is significant** at 5%
- The p-value corresponding to the **z-statistic** can be read from a z-table. We’ll see how to compute it in R.
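As a minimal sketch, the formula and decision rule above can be applied to the mice example in base R. The variable names simply mirror the symbols in the text, and the p-value is taken from the standard normal distribution with `pnorm()` instead of a printed z-table.

```r
# Values from the mice example
p_o <- 95 / 160   # observed proportion of males
p_e <- 0.5        # expected proportion of males
n   <- 160        # sample size
q   <- 1 - p_o

# z-statistic, using the formula given in the text
z <- (p_o - p_e) / sqrt(p_o * q / n)

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))

z
p_value
abs(z) >= 1.96   # TRUE means the difference is significant at 5%
```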

The confidence interval of \(p_o\) at 95% is defined as follows:

\[ p_o \pm 1.96\sqrt{\frac{p_oq}{n}} \]
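The interval can be computed by hand as a quick check. This is only a sketch of the formula above on the mice data; it uses `qnorm(0.975)` in place of the rounded value 1.96. Note that prop.test() reports a slightly different interval, because it inverts the score test instead of applying this formula.

```r
p_o <- 95 / 160   # observed proportion of males
n   <- 160        # sample size

# Standard error of the observed proportion
se <- sqrt(p_o * (1 - p_o) / n)

# 95% confidence interval: p_o plus or minus about 1.96 standard errors
ci <- p_o + c(-1, 1) * qnorm(0.975) * se
ci
```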

Note that the formula of the z-statistic is valid only when the sample size (\(n\)) is large enough: \(np_o\) and \(nq\) should both be \(\geq\) 5. For example, if \(p_o = 0.1\), then \(n\) should be at least 50.
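The rule of thumb above is easy to check before running the test. The helper below, `large_enough()`, is a hypothetical function written for this illustration and is not part of R. It relies on the fact that \(np_o\) is the number of successes and \(nq\) is the number of failures.

```r
# Hypothetical helper: TRUE if both n*p_o and n*q are at least `threshold`.
# x = number of successes, n = number of observations.
large_enough <- function(x, n, threshold = 5) {
  successes <- x       # equals n * p_o
  failures  <- n - x   # equals n * q
  all(c(successes, failures) >= threshold)
}

large_enough(95, 160)  # mice example
large_enough(5, 50)    # p_o = 0.1 with n = 50
large_enough(4, 40)    # p_o = 0.1 with n = 40
```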

# Compute one proportion z-test in R

## R functions: binom.test() & prop.test()

The R functions **binom.test**() and **prop.test**() can be used to perform a one-proportion test:

- **binom.test**(): computes an exact **binomial test**. Recommended when the sample size is small.
- **prop.test**(): can be used when the sample size is large (N > 30). It uses a normal approximation to the binomial distribution.

The syntax of the two functions is almost the same; prop.test() takes an additional *correct* argument. The simplified format is as follows:

```
binom.test(x, n, p = 0.5, alternative = "two.sided")
prop.test(x, n, p = NULL, alternative = "two.sided",
          correct = TRUE)
```

- **x**: the number of successes
- **n**: the total number of trials
- **p**: the probability to test against
- **correct**: a logical indicating whether Yates’ continuity correction should be applied where possible

Note that, by default, the function **prop.test()** uses the Yates continuity correction, which is particularly important if either the expected number of successes or failures is < 5. If you don’t want the correction, use the additional argument *correct = FALSE* in the prop.test() function (the default value is TRUE). This option must be set to FALSE to make the test mathematically equivalent to the uncorrected z-test of a proportion.
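To get a feel for what the correction does, the sketch below runs prop.test() on the mice data with and without it, and adds the exact test from binom.test() for reference. The object names (`with_cc`, `without_cc`, `exact`) are illustrative only.

```r
# Same data, with and without Yates' continuity correction
with_cc    <- prop.test(x = 95, n = 160, p = 0.5, correct = TRUE)
without_cc <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE)

# Exact binomial test, for reference
exact <- binom.test(x = 95, n = 160, p = 0.5)

# Collect the three p-values side by side
c(corrected   = with_cc$p.value,
  uncorrected = without_cc$p.value,
  exact       = exact$p.value)
```

With the correction, the test statistic is smaller and the p-value larger than without it, so the corrected test is the more conservative of the two.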

## Compute one-proportion z-test

We want to know whether the cancer affects more males than females.

We’ll use the function **prop.test**():

```
res <- prop.test(x = 95, n = 160, p = 0.5,
                 correct = FALSE)
# Printing the results
res
```

```
1-sample proportions test without continuity correction
data: 95 out of 160, null probability 0.5
X-squared = 5.625, df = 1, p-value = 0.01771
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5163169 0.6667870
sample estimates:
p
0.59375
```

The function returns:

- the value of Pearson’s chi-squared test statistic
- a p-value
- a 95% confidence interval
- an estimated probability of success (the proportion of males with cancer)

Note that:

- If you want to test whether the proportion of males with cancer is less than 0.5 (one-tailed test), type this:

```
prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
          alternative = "less")
```

- Or, if you want to test whether the proportion of males with cancer is greater than 0.5 (one-tailed test), type this:

```
prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
          alternative = "greater")
```
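The three alternatives can be compared by extracting only the p-values. The sketch below is one way to do this; the names `p_two`, `p_less` and `p_greater` are illustrative. Without the continuity correction, the two one-tailed p-values add up to 1, and the two-tailed p-value is twice the smaller of them.

```r
# p-values for the three alternatives on the mice data
p_two     <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
                       alternative = "two.sided")$p.value
p_less    <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
                       alternative = "less")$p.value
p_greater <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
                       alternative = "greater")$p.value

c(two.sided = p_two, less = p_less, greater = p_greater)

# Relationships between the three p-values
all.equal(p_less + p_greater, 1)
all.equal(2 * min(p_less, p_greater), p_two)
```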

## Interpretation of the result

The **p-value** of the test is 0.01771, which is less than the significance level alpha = 0.05. We can conclude that the proportion of males with cancer is significantly different from 0.5 (**p-value** = 0.01771).
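The `X-squared` value printed by prop.test() is the square of a z-statistic. In its computation, prop.test() evaluates the standard error at the null proportion \(p_e\), so the matching z has \(\sqrt{p_e(1-p_e)/n}\) in the denominator. It therefore differs slightly from a z computed with \(p_o q\) in the denominator. The sketch below checks this on the mice data; `z_null` and `z_prop` are illustrative names.

```r
res <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE)

p_o <- 95 / 160   # observed proportion of males
p_e <- 0.5        # expected (null) proportion
n   <- 160        # sample size

# z-statistic with the standard error evaluated at the null proportion
z_null <- (p_o - p_e) / sqrt(p_e * (1 - p_e) / n)

# Square root of the reported X-squared (positive here because p_o > p_e)
z_prop <- sqrt(unname(res$statistic))

all.equal(z_null, z_prop)                    # the two z values agree
all.equal(2 * pnorm(-z_prop), res$p.value)   # same two-sided p-value
```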

## Access to the values returned by prop.test()

The result of the **prop.test()** function is a list containing the following components:

- **statistic**: the value of Pearson’s chi-squared test statistic
- **parameter**: the degrees of freedom of the test statistic
- **p.value**: the **p-value** of the test
- **conf.int**: a confidence interval for the probability of success
- **estimate**: the estimated probability of success

The format of the **R** code to use for getting these values is as follows:

```
# printing the p-value
res$p.value
```

`[1] 0.01770607`

```
# printing the estimated proportion
res$estimate
```

```
p
0.59375
```

```
# printing the confidence interval
res$conf.int
```

```
[1] 0.5163169 0.6667870
attr(,"conf.level")
[1] 0.95
```
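The remaining components can be accessed in the same way. The short sketch below re-creates `res` so that it runs on its own, then prints the test statistic, its degrees of freedom, and the names of all components in the returned list.

```r
res <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE)

# printing the chi-squared test statistic
res$statistic

# printing the degrees of freedom
res$parameter

# listing all components of the result
names(res)
```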

# Infos

This analysis has been performed using **R software** (ver. 3.2.4).
