<?xml version="1.0" encoding="UTF-8" ?>
<!-- RSS generated by PHPBoost on Sun, 17 May 2026 13:35:09 +0200 -->

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title><![CDATA[Easy Guides]]></title>
		<atom:link href="https://www.sthda.com/english/syndication/rss/wiki/46" rel="self" type="application/rss+xml"/>
		<link>https://www.sthda.com</link>
		<description><![CDATA[Last articles of the category: Comparing Proportions in R]]></description>
		<copyright>(C) 2005-2026 PHPBoost</copyright>
		<language>en</language>
		<generator>PHPBoost</generator>
		
		
		<item>
			<title><![CDATA[Chi-Square Test of Independence in R]]></title>
			<link>https://www.sthda.com/english/wiki/chi-square-test-of-independence-in-r</link>
			<guid>https://www.sthda.com/english/wiki/chi-square-test-of-independence-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">


<br/>
<div class="block">
The <strong>chi-square test of independence</strong> is used to analyze the frequency table (i.e. <strong>contingency table</strong>) formed by two categorical variables. The <strong>chi-square test</strong> evaluates whether there is a significant association between the categories of the two variables. This article describes the basics of the <strong>chi-square test</strong> and provides practical examples using <strong>R software</strong>.
</div>
<p><br/></p>
<p><br/> <img src="https://www.sthda.com/english/sthda/RDoc/images/chi-square-test-of-independence.png" alt="Chi-Square Test of Independence in R" /> <br/></p>


<div id="contents" class="section level2">
<h2>Contents</h2>
  <div id="TOC">
  <ul>
  <li><a href="#data-format-contingency-tables">Data format: Contingency tables</a></li>
  <li><a href="#graphical-display-of-contengency-tables">Graphical display of contingency tables</a></li>
  <li><a href="#chi-square-test-basics">Chi-square test basics</a></li>
  <li><a href="#compute-chi-square-test-in-r">Compute chi-square test in R</a></li>
  <li><a href="#nature-of-the-dependence-between-the-row-and-the-column-variables">Nature of the dependence between the row and the column variables</a></li>
  <li><a href="#access-to-the-values-returned-by-chisq.test-function">Access to the values returned by chisq.test() function</a></li>
  <li><a href="#see-also">See also</a></li>
  <li><a href="#infos">Infos</a></li>
  </ul>
  </div>

</div>
<div id="data-format-contingency-tables" class="section level2">
<h2>Data format: Contingency tables</h2>
<p>We’ll use the <em>housetasks</em> data set from STHDA: <a href="https://www.sthda.com/sthda/RDoc/data/housetasks.txt" class="uri">https://www.sthda.com/sthda/RDoc/data/housetasks.txt</a>.</p>
<pre class="r"><code># Import the data
file_path <- "https://www.sthda.com/sthda/RDoc/data/housetasks.txt"
housetasks <- read.delim(file_path, row.names = 1)
# head(housetasks)</code></pre>
<p>An image of the data is displayed below:</p>
<div class="figure">
<img src="https://www.sthda.com/english/sthda/RDoc/images/ca-housetasks.png" alt="Data format correspondence analysis" /><p class="caption">Data format correspondence analysis</p>
</div>
<br/>
<div class="block">
<p>The data is a contingency table containing 13 housetasks and their distribution in the couple:</p>
<ul>
<li>rows are the different tasks</li>
<li>columns indicate who does the task: the <em>wife</em> only, the couple alternating, the <em>husband</em> only, or both jointly</li>
<li>values are the frequencies with which each task is done by each of these</li>
</ul>
</div>
<p><br/></p>
</div>
<div id="graphical-display-of-contengency-tables" class="section level2">
<h2>Graphical display of contingency tables</h2>
<p>A contingency table can be visualized using the function <strong>balloonplot()</strong> [in <em>gplots</em> package]. This function draws a graphical matrix where each cell contains a dot whose size reflects the relative magnitude of the corresponding component.</p>
<p><span class="notice">To execute the R code below, you should install the package <strong>gplots</strong>: <strong>install.packages("gplots")</strong>.</span></p>
<pre class="r"><code>library("gplots")
# 1. convert the data as a table
dt <- as.table(as.matrix(housetasks))
# 2. Graph
balloonplot(t(dt), main ="housetasks", xlab ="", ylab="",
            label = FALSE, show.margins = FALSE)</code></pre>
<div class="figure">
<img src="https://www.sthda.com/english/sthda/RDoc/figure/statistics/chi-square-test-of-independence-graph-contingency-table-data-mining-1.png" alt="Chi-Square Test of Independence in R" width="480" style="margin-bottom:10px;" />
<p class="caption">
Chi-Square Test of Independence in R
</p>
</div>
<p><span class="warning">Note that row and column sums are printed by default in the bottom and right margins, respectively. These values can be hidden using the argument <em>show.margins = FALSE</em>.</span></p>
<p>It’s also possible to visualize a contingency table as a <em>mosaic plot</em>. This is done using the function <em>mosaicplot</em>() from the built-in R package <em>graphics</em>:</p>
<pre class="r"><code>library("graphics")
mosaicplot(dt, shade = TRUE, las=2,
           main = "housetasks")</code></pre>
<div class="figure">
<img src="https://www.sthda.com/english/sthda/RDoc/figure/statistics/chi-square-test-of-independence-contingency-table-graph-mosaic-1.png" alt="Chi-Square Test of Independence in R" width="480" style="margin-bottom:10px;" />
<p class="caption">
Chi-Square Test of Independence in R
</p>
</div>
<ul>
<li>The argument <strong>shade</strong> is used to color the graph</li>
<li>The argument <strong>las = 2</strong> produces vertical labels</li>
</ul>
<p><span class="warning">Note that the surface of an element of the mosaic reflects the relative magnitude of its value.</span></p>
<ul>
<li>Blue color indicates that the observed value is higher than the value expected if the row and column variables were independent</li>
<li>Red color indicates that the observed value is lower than the value expected if the row and column variables were independent</li>
</ul>
<p><span class="success">From this mosaic plot, it can be seen that the housetasks <em>Laundry, Main_meal, Dinner and Breakfeast</em> (blue color) are mainly done by the wife in our example.</span></p>
<p>There is another package named <em>vcd</em>, which can be used to make a mosaic plot (function <em>mosaic</em>()) or an association plot (function <em>assoc</em>()).</p>
<pre class="r"><code># install.packages("vcd")
library("vcd")
# plot just a subset of the table
assoc(head(dt, 5), shade = TRUE, las=3)</code></pre>
<div class="figure">
<img src="https://www.sthda.com/english/sthda/RDoc/figure/statistics/chi-square-test-of-independence-contingency-table-graph-association-1.png" alt="Chi-Square Test of Independence in R" width="576" style="margin-bottom:10px;" />
<p class="caption">
Chi-Square Test of Independence in R
</p>
</div>
</div>
<div id="chi-square-test-basics" class="section level2">
<h2>Chi-square test basics</h2>
<p><strong>Chi-square test</strong> examines whether rows and columns of a contingency table are statistically significantly associated.</p>
<ul>
<li><strong>Null hypothesis (H0)</strong>: the row and the column variables of the contingency table are independent.</li>
<li><strong>Alternative hypothesis (H1)</strong>: the row and the column variables are dependent.</li>
</ul>
<p>For each cell of the table, we have to calculate the expected value under the null hypothesis.</p>
<p>For a given cell, the expected value is calculated as follows:</p>
<br/>
<div class="block">
<span class="math">\[
e = \frac{\text{row.sum} \times \text{col.sum}}{\text{grand.total}}
\]</span>
</div>
<p><br/></p>
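<p>As a minimal sketch of this formula, the expected counts can be computed by hand with <em>outer()</em> and compared with the values returned by <em>chisq.test()</em>. The small table <em>toy</em> below is hypothetical and is only used for illustration:</p>
<pre class="r"><code># Hypothetical 2 x 3 contingency table (illustration only)
toy <- matrix(c(20, 30, 50,
                30, 30, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("A", "B"), c("X", "Y", "Z")))
# Expected counts: row.sum * col.sum / grand.total, for every cell
expected_manual <- outer(rowSums(toy), colSums(toy)) / sum(toy)
expected_manual
# Compare with the expected counts computed by chisq.test()
all.equal(expected_manual, chisq.test(toy)$expected,
          check.attributes = FALSE)</code></pre>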
<p>The Chi-square statistic is calculated as follows:</p>
<br/>
<div class="block">
<p><span class="math">\[
\chi^2 = \sum{\frac{(o - e)^2}{e}}
\]</span></p>
<ul>
<li>o is the observed value</li>
<li>e is the expected value</li>
</ul>
</div>
<p><br/></p>
<p>This calculated Chi-square statistic is compared to the critical value (obtained from statistical tables) with <span class="math">\(df = (r - 1)(c - 1)\)</span> degrees of freedom at the significance level alpha = 0.05.</p>
<ul>
<li><em>r</em> is the number of rows in the contingency table</li>
<li><em>c</em> is the number of columns in the contingency table</li>
</ul>
<p>If the calculated Chi-square statistic is greater than the critical value, then we reject the null hypothesis and conclude that the row and the column variables are not independent of each other. This implies that they are significantly associated.</p>
<p><span class="warning">Note that the Chi-square test should only be applied when the expected frequency of every cell is at least 5.</span></p>
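<p>The decision rule can be sketched in R as follows. The table <em>toy</em> is hypothetical, and <em>qchisq()</em> and <em>pchisq()</em> are used here in place of printed statistical tables:</p>
<pre class="r"><code># Hypothetical 2 x 3 contingency table (illustration only)
toy <- matrix(c(20, 30, 50,
                30, 30, 40),
              nrow = 2, byrow = TRUE)
# Expected counts under independence
e <- outer(rowSums(toy), colSums(toy)) / sum(toy)
# Chi-square statistic: sum of (o - e)^2 / e over all cells
chi2 <- sum((toy - e)^2 / e)
# Degrees of freedom: (r - 1)(c - 1)
df <- (nrow(toy) - 1) * (ncol(toy) - 1)
# Critical value at the significance level alpha = 0.05
critical <- qchisq(0.95, df = df)
# p-value: upper tail of the chi-square distribution
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)
c(statistic = chi2, df = df, critical = critical, p.value = p_value)
# H0 is rejected only if the statistic exceeds the critical value
chi2 > critical</code></pre>
<p>For this toy table the statistic (about 3.11) stays below the critical value (about 5.99), so independence is not rejected.</p>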
</div>
<div id="compute-chi-square-test-in-r" class="section level2">
<h2>Compute chi-square test in R</h2>
<p>The Chi-square statistic can be easily computed using the function <strong>chisq.test()</strong> as follows:</p>
<pre class="r"><code>chisq <- chisq.test(housetasks)
chisq</code></pre>
<pre><code>
    Pearson's Chi-squared test

data:  housetasks
X-squared = 1944.5, df = 36, p-value < 2.2e-16</code></pre>
<p><span class="success">In our example, the row and the column variables are statistically significantly associated (<em>p-value</em> < 2.2e-16). </span></p>
<p>The observed and the expected counts can be extracted from the result of the test as follows:</p>
<pre class="r"><code># Observed counts
chisq$observed</code></pre>
<pre><code>           Wife Alternating Husband Jointly
Laundry     156          14       2       4
Main_meal   124          20       5       4
Dinner       77          11       7      13
Breakfeast   82          36      15       7
Tidying      53          11       1      57
Dishes       32          24       4      53
Shopping     33          23       9      55
Official     12          46      23      15
Driving      10          51      75       3
Finances     13          13      21      66
Insurance     8           1      53      77
Repairs       0           3     160       2
Holidays      0           1       6     153</code></pre>
<pre class="r"><code># Expected counts
round(chisq$expected,2)</code></pre>
<pre><code>            Wife Alternating Husband Jointly
Laundry    60.55       25.63   38.45   51.37
Main_meal  52.64       22.28   33.42   44.65
Dinner     37.16       15.73   23.59   31.52
Breakfeast 48.17       20.39   30.58   40.86
Tidying    41.97       17.77   26.65   35.61
Dishes     38.88       16.46   24.69   32.98
Shopping   41.28       17.48   26.22   35.02
Official   33.03       13.98   20.97   28.02
Driving    47.82       20.24   30.37   40.57
Finances   38.88       16.46   24.69   32.98
Insurance  47.82       20.24   30.37   40.57
Repairs    56.77       24.03   36.05   48.16
Holidays   55.05       23.30   34.95   46.70</code></pre>
</div>
<div id="nature-of-the-dependence-between-the-row-and-the-column-variables" class="section level2">
<h2>Nature of the dependence between the row and the column variables</h2>
<p><span class="success">As mentioned above, the total Chi-square statistic is 1944.456196.</span></p>
<p>If you want to know which cells contribute the most to the total Chi-square score, calculate the Pearson residual of each cell. The squared residual of a cell is its share of the total Chi-square statistic:</p>
<p><span class="math">\[
r = \frac{o - e}{\sqrt{e}}
\]</span></p>
<p><span class="success">The above formula returns the so-called <strong>Pearson residuals (r)</strong> for each cell. They are often loosely called standardized residuals; note that <em>chisq.test()</em> also returns a separate <em>stdres</em> component, which uses a different denominator.</span></p>
<p><span class="warning">Cells with the highest absolute Pearson residuals contribute the most to the total Chi-square score.</span></p>
<p>Pearson residuals can be easily extracted from the output of the function <strong>chisq.test()</strong>:</p>
<pre class="r"><code>round(chisq$residuals, 3)</code></pre>
<pre><code>             Wife Alternating Husband Jointly
Laundry    12.266      -2.298  -5.878  -6.609
Main_meal   9.836      -0.484  -4.917  -6.084
Dinner      6.537      -1.192  -3.416  -3.299
Breakfeast  4.875       3.457  -2.818  -5.297
Tidying     1.702      -1.606  -4.969   3.585
Dishes     -1.103       1.859  -4.163   3.486
Shopping   -1.289       1.321  -3.362   3.376
Official   -3.659       8.563   0.443  -2.459
Driving    -5.469       6.836   8.100  -5.898
Finances   -4.150      -0.852  -0.742   5.750
Insurance  -5.758      -4.277   4.107   5.720
Repairs    -7.534      -4.290  20.646  -6.651
Holidays   -7.419      -4.620  -4.897  15.556</code></pre>
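<p>As a quick check of the formula, the sketch below recomputes the Pearson residuals by hand on a small hypothetical table (<em>toy</em>) and verifies that their squares sum to the Chi-square statistic:</p>
<pre class="r"><code># Hypothetical 2 x 3 contingency table (illustration only)
toy <- matrix(c(20, 30, 50,
                30, 30, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("A", "B"), c("X", "Y", "Z")))
res_toy <- chisq.test(toy)
o <- res_toy$observed
e <- res_toy$expected
# Pearson residuals: (o - e) / sqrt(e)
r_manual <- (o - e) / sqrt(e)
# Same values as the residuals component of the test
all.equal(r_manual, res_toy$residuals, check.attributes = FALSE)
# The squared residuals add up to the Chi-square statistic
all.equal(sum(r_manual^2), unname(res_toy$statistic))</code></pre>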
<p>Let’s visualize Pearson residuals using the package <strong>corrplot</strong>:</p>
<pre class="r"><code>library(corrplot)
corrplot(chisq$residuals, is.cor = FALSE)</code></pre>
<div class="figure">
<img src="https://www.sthda.com/english/sthda/RDoc/figure/statistics/chi-square-test-of-independence-residuals-chi-square-data-mining-1.png" alt="Chi-Square Test of Independence in R" width="432" style="margin-bottom:10px;" />
<p class="caption">
Chi-Square Test of Independence in R
</p>
</div>
<p><span class="notice">For a given cell, the size of the circle is proportional to the amount of the cell contribution.</span></p>
<p>The sign of the residuals is also very important for interpreting the association between rows and columns, as explained in the block below.</p>
<br/>
<div class="block">
<ol style="list-style-type: decimal">
<li><strong>Positive residuals</strong> are in blue. Positive values in cells specify an attraction (positive association) between the corresponding row and column variables.</li>
</ol>
<ul>
<li>In the image above, it’s evident that there is an association between the column <strong>Wife</strong> and the rows <strong>Laundry, Main_meal</strong>.</li>
<li>There is a strong positive association between the column <strong>Husband</strong> and the row <strong>Repairs</strong>.</li>
</ul>
<ol start="2" style="list-style-type: decimal">
<li><strong>Negative residuals</strong> are in red. This implies a repulsion (negative association) between the corresponding row and column variables. For example, the column <strong>Wife</strong> is negatively associated with the row <strong>Repairs</strong>. There is also a repulsion between the column <strong>Husband</strong> and the rows <strong>Laundry</strong> and <strong>Main_meal</strong>.</li>
</ol>
</div>
<p><br/></p>
<p>The contribution (in %) of a given cell to the total Chi-square score is calculated as follows:</p>
<br/>
<div class="block">
<span class="math">\[
contrib = 100 \times \frac{r^2}{\chi^2}
\]</span>
</div>
<p><br/></p>
<ul>
<li><strong>r</strong> is the residual of the cell</li>
</ul>
<pre class="r"><code># Contribution in percentage (%)
contrib <- 100*chisq$residuals^2/chisq$statistic
round(contrib, 3)</code></pre>
<pre><code>            Wife Alternating Husband Jointly
Laundry    7.738       0.272   1.777   2.246
Main_meal  4.976       0.012   1.243   1.903
Dinner     2.197       0.073   0.600   0.560
Breakfeast 1.222       0.615   0.408   1.443
Tidying    0.149       0.133   1.270   0.661
Dishes     0.063       0.178   0.891   0.625
Shopping   0.085       0.090   0.581   0.586
Official   0.688       3.771   0.010   0.311
Driving    1.538       2.403   3.374   1.789
Finances   0.886       0.037   0.028   1.700
Insurance  1.705       0.941   0.868   1.683
Repairs    2.919       0.947  21.921   2.275
Holidays   2.831       1.098   1.233  12.445</code></pre>
<pre class="r"><code># Visualize the contribution
corrplot(contrib, is.cor = FALSE)</code></pre>
<div class="figure">
<img src="https://www.sthda.com/english/sthda/RDoc/figure/statistics/chi-square-test-of-independence-contribution-chi-square-data-mining-1.png" alt="Chi-Square Test of Independence in R" width="432" style="margin-bottom:10px;" />
<p class="caption">
Chi-Square Test of Independence in R
</p>
</div>
<p><span class="success">The relative contribution of each cell to the total Chi-square score gives some indication of the nature of the dependency between rows and columns of the contingency table.</span></p>
<p>It can be seen that:</p>
<ol style="list-style-type: decimal">
<li>The column “Wife” is strongly associated with Laundry, Main_meal, Dinner</li>
<li>The column “Husband” is strongly associated with the row Repairs</li>
<li>The column “Jointly” is strongly associated with the row Holidays</li>
</ol>
<div class="success">
<p>From the image above, it can be seen that the most contributing cells to the Chi-square are Wife/Laundry (7.74%), Wife/Main_meal (4.98%), Husband/Repairs (21.92%), Jointly/Holidays (12.44%).</p>
<p>These four cells contribute about 47% of the total Chi-square score and thus account for nearly half of the difference between expected and observed values.</p>
<p>This confirms the earlier visual interpretation of the data. Visual interpretation may be complex when the contingency table is very large. In this case, the contribution of one cell to the total Chi-square score becomes a useful way of establishing the nature of dependency.</p>
</div>
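<p>When the table is large, sorting the cells by contribution can be easier than reading the plot. The helper <em>top_contrib()</em> below is a hypothetical name introduced for this sketch; it is shown on a small made-up table, but it should work the same way on the <em>chisq</em> object computed above:</p>
<pre class="r"><code># Hypothetical helper: the n cells contributing most to the statistic
top_contrib <- function(test, n = 3) {
  # Contribution of each cell, in percent
  contrib <- 100 * test$residuals^2 / test$statistic
  # One line per cell: row label, column label, contribution
  cells <- as.data.frame(as.table(contrib))
  names(cells) <- c("row", "column", "contrib_pct")
  head(cells[order(-cells$contrib_pct), ], n)
}

# Hypothetical 2 x 3 contingency table (illustration only)
toy <- matrix(c(20, 30, 50,
                30, 30, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("A", "B"), c("X", "Y", "Z")))
top_contrib(chisq.test(toy), n = 2)</code></pre>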
</div>
<div id="access-to-the-values-returned-by-chisq.test-function" class="section level2">
<h2>Access to the values returned by chisq.test() function</h2>
<p>The result of <strong>chisq.test()</strong> function is a list containing the following components:</p>
<br/>
<div class="block">
<ul>
<li><strong>statistic</strong>: the value of the chi-squared test statistic.</li>
<li><strong>parameter</strong>: the degrees of freedom</li>
<li><strong>p.value</strong>: the <strong>p-value</strong> of the test</li>
<li><strong>observed</strong>: the observed counts</li>
<li><strong>expected</strong>: the expected counts</li>
<li><strong>residuals</strong>: the Pearson residuals</li>
</ul>
</div>
<p><br/></p>
<p>The format of the <strong>R</strong> code to use for getting these values is as follows:</p>
<pre class="r"><code># printing the p-value
chisq$p.value
# printing the test statistic
chisq$statistic</code></pre>
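<p>To see every component available on a test result, <em>names()</em> can be called on it. A minimal sketch on a small hypothetical table:</p>
<pre class="r"><code># Hypothetical 2 x 3 contingency table (illustration only)
toy <- matrix(c(20, 30, 50,
                30, 30, 40),
              nrow = 2, byrow = TRUE)
res_toy <- chisq.test(toy)
# Lists the components, including residuals (Pearson residuals)
# and stdres (standardized residuals)
names(res_toy)
# Degrees of freedom
res_toy$parameter</code></pre>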
</div>
<div id="see-also" class="section level2">
<h2>See also</h2>
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/one-proportion-z-test-in-r">One Proportion Z-Test in R: Compare an Observed Proportion to an Expected One</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/two-proportions-z-test-in-r">Two Proportions Z-Test in R: Compare Two Observed Proportions</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/chi-square-goodness-of-fit-test-in-r">Chi-Square Goodness of Fit Test in R: Compare Multiple Observed Proportions to Expected Probabilities</a></li>
</ul>
</div>
<div id="infos" class="section level2">
<h2>Infos</h2>
<p><span class="warning"> This analysis has been performed using <strong>R software</strong> (ver. 3.2.4). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('#rdoc h1').addClass('wiki_paragraph1');
    jQuery('#rdoc h2').addClass('wiki_paragraph2');
    jQuery('#rdoc h3').addClass('wiki_paragraph3');
    jQuery('#rdoc h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->

<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>
<!--====================== stop here when you copy to sthda================-->


<!-- END HTML -->]]></description>
			<pubDate>Fri, 25 Nov 2016 08:43:37 +0100</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Chi-square Goodness of Fit Test in R]]></title>
			<link>https://www.sthda.com/english/wiki/chi-square-goodness-of-fit-test-in-r</link>
			<guid>https://www.sthda.com/english/wiki/chi-square-goodness-of-fit-test-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">

<div id="TOC">
<ul>
<li><a href="#what-is-chi-square-goodness-of-fit-test">What is chi-square goodness of fit test?</a></li>
<li><a href="#example-data-and-questions">Example data and questions</a></li>
<li><a href="#statistical-hypotheses">Statistical hypotheses</a></li>
<li><a href="#r-function-chisq.test">R function: chisq.test()</a><ul>
<li><a href="#answer-to-q1-are-the-colors-equally-common">Answer to Q1: Are the colors equally common?</a></li>
<li><a href="#answer-to-q2-comparing-observed-to-expected-proportions">Answer to Q2 comparing observed to expected proportions</a></li>
<li><a href="#access-to-the-values-returned-by-chisq.test-function">Access to the values returned by chisq.test() function</a></li>
</ul></li>
<li><a href="#see-also">See also</a></li>
<li><a href="#infos">Infos</a></li>
</ul>
</div>

<p><br/></p>
<div id="what-is-chi-square-goodness-of-fit-test" class="section level1">
<h1>What is chi-square goodness of fit test?</h1>
<br/>
<div class="block">
The <strong>chi-square</strong> <strong>goodness of fit</strong> test is used to compare the observed distribution to an expected distribution, in a situation where we have two or more categories in discrete data. In other words, it compares multiple observed proportions to expected probabilities.
</div>
<p><br/></p>
<p><br/> <img src="https://www.sthda.com/english/sthda/RDoc/images/chi-square-goodness-of-fit-test.png" alt="Chi-square Goodness of Fit test in R" /> <br/></p>
</div>
<div id="example-data-and-questions" class="section level1">
<h1>Example data and questions</h1>
<p>For example, we collected wild tulips and found that 81 were red, 50 were yellow and 27 were white.</p>
<ol style="list-style-type: decimal">
<li><strong>Question 1</strong>:</li>
</ol>
<p><span class="question">Are these colors equally common?</span></p>
<p>If these colors were equally distributed, the expected proportion would be 1/3 for each color.</p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Question 2</strong>:</li>
</ol>
<p>Suppose that, in the region where you collected the data, the ratio of red, yellow and white tulips is 3:2:1 (3+2+1 = 6). This means that the expected proportions are:</p>
<ul>
<li>3/6 (= 1/2) for red</li>
<li>2/6 ( = 1/3) for yellow</li>
<li>1/6 for white</li>
</ul>
<p><span class="question">We want to know whether there is any significant difference between the observed proportions and the expected proportions.</span></p>
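<p>The conversion from a ratio to expected proportions and expected counts can be sketched in R as follows (the variable names are illustrative):</p>
<pre class="r"><code># Observed counts and the assumed 3:2:1 ratio
tulip <- c(red = 81, yellow = 50, white = 27)
ratio <- c(red = 3, yellow = 2, white = 1)
# Expected proportions: each part of the ratio divided by the total
p_expected <- ratio / sum(ratio)
p_expected
# Expected counts for the collected tulips
sum(tulip) * p_expected</code></pre>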
</div>
<div id="statistical-hypotheses" class="section level1">
<h1>Statistical hypotheses</h1>
<ul>
<li><em>Null hypothesis</em> (<span class="math">\(H_0\)</span>): There is no significant difference between the observed and the expected values.</li>
<li><em>Alternative hypothesis</em> (<span class="math">\(H_a\)</span>): There is a significant difference between the observed and the expected values.</li>
</ul>
</div>
<div id="r-function-chisq.test" class="section level1">
<h1>R function: chisq.test()</h1>
<p>The R function <strong>chisq.test</strong>() can be used as follows:</p>
<pre class="r"><code>chisq.test(x, p)</code></pre>
<br/>
<div class="block">
<ul>
<li><strong>x</strong>: a numeric vector</li>
<li><strong>p</strong>: a vector of probabilities of the same length as x.</li>
</ul>
</div>
<p><br/></p>
<div id="answer-to-q1-are-the-colors-equally-common" class="section level2">
<h2>Answer to Q1: Are the colors equally common?</h2>
<pre class="r"><code>tulip <- c(81, 50, 27)
res <- chisq.test(tulip, p = c(1/3, 1/3, 1/3))
res</code></pre>
<pre><code>
    Chi-squared test for given probabilities

data:  tulip
X-squared = 27.886, df = 2, p-value = 8.803e-07</code></pre>
<br/>
<div class="block">
The function returns the value of the chi-square test statistic (“X-squared”), the degrees of freedom (“df”) and a p-value.
</div>
<p><br/></p>
<p><span class="success"> The <strong>p-value</strong> of the test is <span class="math">\(8.803 \times 10^{-7}\)</span>, which is less than the significance level alpha = 0.05. We can conclude that the colors are not equally common. </span></p>
<p><span class="error">Note that, the chi-square test should be used only when all calculated expected values are greater than 5.</span></p>
<pre class="r"><code># Access to the expected values
res$expected</code></pre>
<pre><code>[1] 52.66667 52.66667 52.66667</code></pre>
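<p>This check can be automated. The sketch below falls back to a Monte Carlo p-value, using the <em>simulate.p.value</em> argument of <em>chisq.test()</em>, when some expected count is too small. The threshold follows the rule of thumb stated above, and the number of replicates <em>B</em> is an arbitrary choice:</p>
<pre class="r"><code>tulip <- c(81, 50, 27)
p <- c(1/3, 1/3, 1/3)
# Expected counts under the null hypothesis
expected <- sum(tulip) * p
if (all(expected > 5)) {
  # All expected counts are large enough: usual chi-square test
  res <- chisq.test(tulip, p = p)
} else {
  # Otherwise, p-value simulated from B random samples
  res <- chisq.test(tulip, p = p, simulate.p.value = TRUE, B = 2000)
}
res$p.value</code></pre>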
</div>
<div id="answer-to-q2-comparing-observed-to-expected-proportions" class="section level2">
<h2>Answer to Q2: comparing observed to expected proportions</h2>
<pre class="r"><code>tulip <- c(81, 50, 27)
res <- chisq.test(tulip, p = c(1/2, 1/3, 1/6))
res</code></pre>
<pre><code>
    Chi-squared test for given probabilities

data:  tulip
X-squared = 0.20253, df = 2, p-value = 0.9037</code></pre>
<p><span class="success"> The <strong>p-value</strong> of the test is 0.9037, which is greater than the significance level alpha = 0.05. We can conclude that the observed proportions are not significantly different from the expected proportions.</span></p>
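<p>The same result can be reproduced by hand, summing (observed - expected)^2 / expected over the categories and using number of categories minus one as the degrees of freedom. A minimal sketch:</p>
<pre class="r"><code>tulip <- c(81, 50, 27)
p <- c(1/2, 1/3, 1/6)
# Expected counts under the 3:2:1 ratio
expected <- sum(tulip) * p
# Chi-square statistic
statistic <- sum((tulip - expected)^2 / expected)
# Degrees of freedom: number of categories - 1
df <- length(tulip) - 1
# p-value: upper tail of the chi-square distribution
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)
c(statistic = statistic, df = df, p.value = p_value)</code></pre>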
</div>
<div id="access-to-the-values-returned-by-chisq.test-function" class="section level2">
<h2>Access to the values returned by chisq.test() function</h2>
<p>The result of <strong>chisq.test()</strong> function is a list containing the following components:</p>
<br/>
<div class="block">
<ul>
<li><strong>statistic</strong>: the value of the chi-squared test statistic.</li>
<li><strong>parameter</strong>: the degrees of freedom</li>
<li><strong>p.value</strong>: the <strong>p-value</strong> of the test</li>
<li><strong>observed</strong>: the observed counts</li>
<li><strong>expected</strong>: the expected counts</li>
</ul>
</div>
<p><br/></p>
<p>The format of the <strong>R</strong> code to use for getting these values is as follows:</p>
<pre class="r"><code># printing the p-value
res$p.value</code></pre>
<pre><code>[1] 0.9036928</code></pre>
<pre class="r"><code># printing the test statistic
res$statistic</code></pre>
<pre><code>X-squared 
0.2025316 </code></pre>
</div>
</div>
<div id="see-also" class="section level1">
<h1>See also</h1>
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/one-proportion-z-test-in-r">One Proportion Z-Test in R: Compare an Observed Proportion to an Expected One</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/two-proportions-z-test-in-r">Two Proportions Z-Test in R: Compare Two Observed Proportions</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/chi-square-test-of-independence-in-r">Chi-Square Test of Independence in R: Evaluate The Association Between Two Categorical Variables</a></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1>Infos</h1>
<p><span class="warning"> This analysis has been performed using <strong>R software</strong> (ver. 3.2.4). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('#rdoc h1').addClass('wiki_paragraph1');
    jQuery('#rdoc h2').addClass('wiki_paragraph2');
    jQuery('#rdoc h3').addClass('wiki_paragraph3');
    jQuery('#rdoc h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>
<!--====================== stop here when you copy to sthda================-->

<!-- END HTML -->]]></description>
			<pubDate>Tue, 11 Oct 2016 10:45:30 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Comparing Proportions in R]]></title>
			<link>https://www.sthda.com/english/wiki/comparing-proportions-in-r</link>
			<guid>https://www.sthda.com/english/wiki/comparing-proportions-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">

<p><br/></p>
<p>Previously, we described the <a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">essentials of R programming</a> and provided quick start guides for <a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">importing data</a> into <strong>R</strong>. Additionally, we described how to compute <a href="https://www.sthda.com/english/english/wiki/descriptive-statistics-and-graphics">descriptive or summary statistics</a>, <a href="https://www.sthda.com/english/english/wiki/correlation-analyses-in-r">correlation analysis</a>, as well as, how to <a href="https://www.sthda.com/english/english/wiki/comparing-means-in-r">compare sample means</a> and <a href="https://www.sthda.com/english/english/wiki/comparing-variances-in-r">variances</a> using R software.</p>
<br/>
<div class="block">
This chapter contains articles describing <strong>statistical tests</strong> to use for <strong>comparing proportions</strong>.
</div>
<p><br/></p>
<div id="how-this-chapter-is-organized" class="section level1">
<h1><span class="header-section-number">1</span> How this chapter is organized</h1>
<br/>
<div class="block">
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/one-proportion-z-test-in-r">One-Proportion Z-Test in R: Compare an Observed Proportion to an Expected One</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/two-proportions-z-test-in-r">Two Proportions Z-Test in R: Compare Two Observed Proportions</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/chi-square-goodness-of-fit-test-in-r">Chi-Square Goodness of Fit Test in R: Compare Multiple Observed Proportions to Expected Probabilities</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/chi-square-test-of-independence-in-r">Chi-Square Test of Independence in R: Evaluate The Association Between Two Categorical Variables</a></li>
</ul>
</div>
<p><br/></p>
<p>
 <img src="https://www.sthda.com/english/sthda/RDoc/images/comparing-proportions.png" alt="Comparing proportions in R" /> <br/></p>
<hr/>
</div>
<div id="one-proportion-z-test" class="section level1">
<h1><span class="header-section-number">2</span> One-proportion z-Test</h1>
<br/>
<div class="block">
Compare an observed proportion to an expected one.
</div>
<p><br/> <img src="https://www.sthda.com/english/sthda/RDoc/images/one-proportion-z-test.png" alt="One-Proportion Z-Test in R" /> <br/></p>
<p><span class="success">Read more: —> <a href="https://www.sthda.com/english/english/wiki/one-proportion-z-test-in-r">One-Proportion Z-Test in R</a>.</span></p>
</div>
<div id="two-proportions-z-test" class="section level1">
<h1><span class="header-section-number">3</span> Two-proportions z-Test</h1>
<br/>
<div class="block">
Compare two observed proportions.
</div>
<p><br/> <img src="https://www.sthda.com/english/sthda/RDoc/images/two-proportions-z-test.png" alt="Two Proportions Z-Test" /> <br/></p>
<p><span class="success">Read more: —> <a href="https://www.sthda.com/english/english/wiki/two-proportions-z-test-in-r">Two Proportions Z-Test</a>.</span></p>
</div>
<div id="chi-square-goodness-of-fit-test-in-r" class="section level1">
<h1><span class="header-section-number">4</span> Chi-square goodness of fit test in R</h1>
<br/>
<div class="block">
Compare multiple observed proportions to expected probabilities.
</div>
<p><br/> <img src="https://www.sthda.com/english/sthda/RDoc/images/chi-square-goodness-of-fit-test.png" alt="Chi-square Goodness of Fit test in R" /> <br/></p>
<p><span class="success">Read more: —> <a href="https://www.sthda.com/english/english/wiki/chi-square-goodness-of-fit-test-in-r">Chi-square goodness of fit test in R</a>.</span></p>
</div>
<div id="chi-square-test-of-independence-in-r" class="section level1">
<h1><span class="header-section-number">5</span> Chi-Square test of independence in R</h1>
<br/>
<div class="block">
Evaluate the association between two categorical variables.
</div>
<p><br/> <img src="https://www.sthda.com/english/sthda/RDoc/images/chi-square-test-of-independence.png" alt="Chi-Square test of independence in R" /> <br/></p>
<p><span class="success">Read more: —> <a href="https://www.sthda.com/english/english/wiki/chi-square-test-of-independence-in-r">Chi-Square Test of Independence in R</a>.</span></p>
</div>
<div id="see-also" class="section level1">
<h1><span class="header-section-number">6</span> See also</h1>
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">R Basics</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/import-and-export-data-using-r">Import and Export Data using R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/preparing-and-reshaping-data-in-r-for-easier-analyses">Preparing and Reshaping Data in R for Easier Analyses</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/data-manipulation-in-r">Data Manipulation in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/data-visualization">Data visualization</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/descriptive-statistics-and-graphics">Descriptive Statistics and Graphics</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/correlation-analyses-in-r">Correlation Analyses in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/comparing-means-in-r">Comparing Means in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/comparing-variances-in-r">Comparing Variances in R</a></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1><span class="header-section-number">7</span> Infos</h1>
<p><span class="warning"> This analysis has been performed using <strong>R statistical software</strong> (ver. 3.2.4). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('#rdoc h1').addClass('wiki_paragraph1');
    jQuery('#rdoc h2').addClass('wiki_paragraph2');
    jQuery('#rdoc h3').addClass('wiki_paragraph3');
    jQuery('#rdoc h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->


<!-- END HTML -->]]></description>
			<pubDate>Sat, 08 Oct 2016 09:23:19 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Two-Proportions Z-Test in R]]></title>
			<link>https://www.sthda.com/english/wiki/two-proportions-z-test-in-r</link>
			<guid>https://www.sthda.com/english/wiki/two-proportions-z-test-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">

<div id="TOC">
<ul>
<li><a href="#what-is-two-proportions-z-test">What is two-proportions z-test?</a></li>
<li><a href="#research-questions-and-statistical-hypotheses">Research questions and statistical hypotheses</a></li>
<li><a href="#formula-of-the-test-statistic">Formula of the test statistic</a><ul>
<li><a href="#case-of-large-sample-sizes">Case of large sample sizes</a></li>
<li><a href="#case-of-small-sample-sizes">Case of small sample sizes</a></li>
</ul></li>
<li><a href="#compute-two-proportions-z-test-in-r">Compute two-proportions z-test in R</a><ul>
<li><a href="#r-functions-prop.test">R functions: prop.test()</a></li>
<li><a href="#compute-two-proportions-z-test">Compute two-proportions z-test</a></li>
<li><a href="#interpretation-of-the-result">Interpretation of the result</a></li>
<li><a href="#access-to-the-values-returned-by-prop.test-function">Access to the values returned by prop.test() function</a></li>
</ul></li>
<li><a href="#see-also">See also</a></li>
<li><a href="#infos">Infos</a></li>
</ul>
</div>

<p><br/></p>
<div id="what-is-two-proportions-z-test" class="section level1">
<h1>What is two-proportions z-test?</h1>
<br/>
<div class="block">
The <strong>two-proportions z-test</strong> is used to compare two observed proportions. This article describes the basics of the <strong>two-proportions z-test</strong> and provides practical examples using <strong>R software</strong>.
</div>
<p><br/></p>
<p>For example, we have two groups of individuals:</p>
<ul>
<li>Group A with lung cancer: n = 500</li>
<li>Group B, healthy individuals: n = 500</li>
</ul>
<p>The number of smokers in each group is as follows:</p>
<ul>
<li>Group A with lung cancer: n = 500, 490 smokers, <span class="math">\(p_A = 490/500 = 98\%\)</span></li>
<li>Group B, healthy individuals: n = 500, 400 smokers, <span class="math">\(p_B = 400/500 = 80\%\)</span></li>
</ul>
<p>In this setting:</p>
<ul>
<li>The overall proportion of smokers is <span class="math">\(p = \frac{490 + 400}{500 + 500} = 89\%\)</span></li>
<li>The overall proportion of non-smokers is <span class="math">\(q = 1-p = 11\%\)</span></li>
</ul>
<p><span class="question">We want to know whether the proportions of smokers are the same in the two groups of individuals.</span></p>
<p><br/> <img src="https://www.sthda.com/english/sthda/RDoc/images/two-proportions-z-test.png" alt="Two Proportions Z-Test in R" /> <br/></p>
</div>
<div id="research-questions-and-statistical-hypotheses" class="section level1">
<h1>Research questions and statistical hypotheses</h1>
<p>Typical research questions are:</p>
<br/>
<div class="question">
<ol style="list-style-type: decimal">
<li>whether the observed proportion of smokers in group A (<span class="math">\(p_A\)</span>) <em>is equal</em> to the observed proportion of smokers in group B (<span class="math">\(p_B\)</span>)?</li>
<li>whether the observed proportion of smokers in group A (<span class="math">\(p_A\)</span>) <em>is less than</em> the observed proportion of smokers in group B (<span class="math">\(p_B\)</span>)?</li>
<li>whether the observed proportion of smokers in group A (<span class="math">\(p_A\)</span>) <em>is greater than</em> the observed proportion of smokers in group B (<span class="math">\(p_B\)</span>)?</li>
</ol>
</div>
<p><br/></p>
<p>In statistics, we can define the corresponding <em>null hypotheses</em> (<span class="math">\(H_0\)</span>) as follows:</p>
<ol style="list-style-type: decimal">
<li><span class="math">\(H_0: p_A = p_B\)</span></li>
<li><span class="math">\(H_0: p_A \geq p_B\)</span></li>
<li><span class="math">\(H_0: p_A \leq p_B\)</span></li>
</ol>
<p>The corresponding <em>alternative hypotheses</em> (<span class="math">\(H_a\)</span>) are as follows:</p>
<ol style="list-style-type: decimal">
<li><span class="math">\(H_a: p_A \ne p_B\)</span> (different)</li>
<li><span class="math">\(H_a: p_A < p_B\)</span> (less)</li>
<li><span class="math">\(H_a: p_A > p_B\)</span> (greater)</li>
</ol>
<div class="notice">
<p>Note that:</p>
<ul>
<li>Hypotheses 1) are called <strong>two-tailed tests</strong></li>
<li>Hypotheses 2) and 3) are called <strong>one-tailed tests</strong></li>
</ul>
</div>
</div>
<div id="formula-of-the-test-statistic" class="section level1">
<h1>Formula of the test statistic</h1>
<div id="case-of-large-sample-sizes" class="section level2">
<h2>Case of large sample sizes</h2>
<p>The test statistic (also known as the <strong>z-statistic</strong>) can be calculated as follows:</p>
<p><span class="math">\[
z = \frac{p_A-p_B}{\sqrt{pq/n_A+pq/n_B}}
\]</span></p>
<p>where,</p>
<ul>
<li><span class="math">\(p_A\)</span> is the proportion observed in group A with size <span class="math">\(n_A\)</span></li>
<li><span class="math">\(p_B\)</span> is the proportion observed in group B with size <span class="math">\(n_B\)</span></li>
<li><span class="math">\(p\)</span> and <span class="math">\(q = 1 - p\)</span> are the overall (pooled) proportions of successes and failures across the two groups</li>
</ul>
<div class="success">
<ul>
<li>if <span class="math">\(|z| < 1.96\)</span>, then the difference <strong>is not significant</strong> at 5%</li>
<li>if <span class="math">\(|z| \geq 1.96\)</span>, then the difference <strong>is significant</strong> at 5%</li>
<li>The significance level (p-value) corresponding to the z-statistic can be read in the z-table. We’ll see how to compute it in R.</li>
</ul>
</div>
<p><span class="error">Note that the formula for the z-statistic is valid only when the sample sizes are large enough: <span class="math">\(n_Ap\)</span>, <span class="math">\(n_Aq\)</span>, <span class="math">\(n_Bp\)</span> and <span class="math">\(n_Bq\)</span> should all be <span class="math">\(\geq\)</span> 5.</span></p>
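<p>As a rough illustration of the formula above, the z-statistic for the smokers example can be computed by hand in R. This is only a minimal sketch: the variable names (<em>x_A</em>, <em>n_A</em>, and so on) are chosen here for illustration and do not come from any package.</p>
<pre class="r"><code># Counts from the smokers example
x_A <- 490; n_A <- 500   # group A: number of smokers, group size
x_B <- 400; n_B <- 500   # group B: number of smokers, group size

p_A <- x_A / n_A                 # observed proportion in group A
p_B <- x_B / n_B                 # observed proportion in group B
p <- (x_A + x_B) / (n_A + n_B)   # overall (pooled) proportion of smokers
q <- 1 - p                       # overall proportion of non-smokers

# z-statistic and its two-sided p-value from the standard normal distribution
z <- (p_A - p_B) / sqrt(p * q / n_A + p * q / n_B)
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

z
p_value</code></pre>
<p>Here z is about 9.1, well above 1.96, so the difference between the two proportions is significant at 5%.</p>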
</div>
<div id="case-of-small-sample-sizes" class="section level2">
<h2>Case of small sample sizes</h2>
<p>The <strong>Fisher Exact probability test</strong> is an excellent non-parametric technique for comparing proportions, when the two independent samples are small in size.</p>
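<p>A minimal sketch of how this could look in R, using the base function <strong>fisher.test</strong>() on a 2 x 2 table of counts. The counts below are made up purely for illustration and are not taken from this article.</p>
<pre class="r"><code># Hypothetical small samples: 10 individuals per group (illustrative counts)
small_tab <- matrix(c(8, 2,
                      3, 7),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(group  = c("A", "B"),
                                    smoker = c("yes", "no")))

# Fisher's exact test of equal proportions of smokers in the two groups
res_fisher <- fisher.test(small_tab)
res_fisher$p.value</code></pre>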
</div>
</div>
<div id="compute-two-proportions-z-test-in-r" class="section level1">
<h1>Compute two-proportions z-test in R</h1>
<div id="r-functions-prop.test" class="section level2">
<h2>R functions: prop.test()</h2>
<p>The R function <strong>prop.test</strong>() can be used as follows:</p>
<pre class="r"><code>prop.test(x, n, p = NULL, alternative = "two.sided",
          correct = TRUE)</code></pre>
<br/>
<div class="block">
<ul>
<li><strong>x</strong>: a vector of counts of successes</li>
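<li><strong>n</strong>: a vector of counts of trials</li>
<li><strong>p</strong>: an optional vector of probabilities of success to test against. It can be left as NULL when comparing two groups</li>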
<li><strong>alternative</strong>: a character string specifying the alternative hypothesis</li>
<li><strong>correct</strong>: a logical indicating whether Yates’ continuity correction should be applied where possible</li>
</ul>
</div>
<p><br/></p>
<p><span class="error">Note that, by default, the function <strong>prop.test()</strong> applies the Yates continuity correction, which is really important if either the expected number of successes or failures is below 5. If you don’t want the correction, use the additional argument <em>correct = FALSE</em> in the prop.test() function. The default value is TRUE. (This option must be set to FALSE to make the test mathematically equivalent to the uncorrected z-test of a proportion.) </span></p>
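<p>The equivalence mentioned in the note can be checked on the smokers example. The sketch below assumes the pooled z formula from the previous section and compares its square with the X-squared value that <strong>prop.test</strong>() reports when <em>correct = FALSE</em>. The object names are illustrative.</p>
<pre class="r"><code># Two-proportions test without the continuity correction
res_nc <- prop.test(x = c(490, 400), n = c(500, 500), correct = FALSE)

# z-statistic computed by hand with the pooled proportion
p <- (490 + 400) / (500 + 500)
z <- (490/500 - 400/500) / sqrt(p * (1 - p) / 500 + p * (1 - p) / 500)

# The chi-squared statistic should equal the squared z-statistic
unname(res_nc$statistic)
z^2</code></pre>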
</div>
<div id="compute-two-proportions-z-test" class="section level2">
<h2>Compute two-proportions z-test</h2>
<p><span class="question">We want to know whether the proportions of smokers are the same in the two groups of individuals.</span></p>
<pre class="r"><code>res <- prop.test(x = c(490, 400), n = c(500, 500))

# Printing the results
res </code></pre>
<pre><code>
    2-sample test for equality of proportions with continuity correction

data:  c(490, 400) out of c(500, 500)
X-squared = 80.909, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
 0.1408536 0.2191464
sample estimates:
prop 1 prop 2 
  0.98   0.80 </code></pre>
<br/>
<div class="block">
<p>The function returns:</p>
<ul>
<li>the value of Pearson’s chi-squared test statistic</li>
<li>a p-value</li>
<li>a 95% confidence interval for the difference between the two proportions</li>
<li>the estimated probabilities of success (the proportion of smokers in each of the two groups)</li>
</ul>
</div>
<p><br/></p>
<br/>
<div class="notice">
<p>Note that:</p>
<ul>
<li>if you want to test whether the observed proportion of smokers in group A (<span class="math">\(p_A\)</span>) <em>is less than</em> the observed proportion of smokers in group B (<span class="math">\(p_B\)</span>), type this:</li>
</ul>
<pre class="r"><code>prop.test(x = c(490, 400), n = c(500, 500),
           alternative = "less")</code></pre>
<ul>
<li>Or, if you want to test whether the observed proportion of smokers in group A (<span class="math">\(p_A\)</span>) <em>is greater than</em> the observed proportion of smokers in group B (<span class="math">\(p_B\)</span>), type this:</li>
</ul>
<pre class="r"><code>prop.test(x = c(490, 400), n = c(500, 500),
              alternative = "greater")</code></pre>
</div>
<p><br/></p>
</div>
<div id="interpretation-of-the-result" class="section level2">
<h2>Interpretation of the result</h2>
<p><span class="success"> The <strong>p-value</strong> of the test is <span class="math">\(2.363 \times 10^{-19}\)</span>, which is less than the significance level alpha = 0.05. We can conclude that the proportion of smokers is significantly different in the two groups (<strong>p-value</strong> = <span class="math">\(2.363 \times 10^{-19}\)</span>). </span></p>
<p><span class="warning">Note that, for 2 x 2 table, the standard chi-square test in <strong>chisq.test</strong>() is exactly equivalent to <strong>prop.test</strong>() but it works with data in matrix form.</span></p>
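<p>A short sketch of this equivalence, assuming the smokers data are arranged as a 2 x 2 matrix with one row per group and one column each for smokers and non-smokers. The object names are illustrative.</p>
<pre class="r"><code># Smokers data in matrix form: rows = groups, columns = smoker yes / no
smokers <- matrix(c(490,  10,
                    400, 100),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(group  = c("A", "B"),
                                  smoker = c("yes", "no")))

res_chisq <- chisq.test(smokers)                             # chi-square test
res_prop  <- prop.test(x = c(490, 400), n = c(500, 500))     # two-proportions test

# Both tests apply the continuity correction by default for a 2 x 2 table
res_chisq$p.value
res_prop$p.value</code></pre>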
</div>
<div id="access-to-the-values-returned-by-prop.test-function" class="section level2">
<h2>Access to the values returned by prop.test() function</h2>
<p>The result of <strong>prop.test()</strong> function is a list containing the following components:</p>
<br/>
<div class="block">
<ul>
<li><strong>statistic</strong>: the value of Pearson’s chi-squared test statistic</li>
<li><strong>parameter</strong>: the degrees of freedom of the approximate chi-squared distribution of the test statistic</li>
<li><strong>p.value</strong>: the <strong>p-value</strong> of the test</li>
<li><strong>conf.int</strong>: a confidence interval for the difference in proportions between the two groups.</li>
<li><strong>estimate</strong>: the estimated proportion of successes in each group.</li>
</ul>
</div>
<p><br/></p>
<p>The format of the <strong>R</strong> code to use for getting these values is as follows:</p>
<pre class="r"><code># printing the p-value
res$p.value</code></pre>
<pre><code>[1] 2.363439e-19</code></pre>
<pre class="r"><code># printing the estimated proportions
res$estimate</code></pre>
<pre><code>prop 1 prop 2 
  0.98   0.80 </code></pre>
<pre class="r"><code># printing the confidence interval
res$conf.int</code></pre>
<pre><code>[1] 0.1408536 0.2191464
attr(,"conf.level")
[1] 0.95</code></pre>
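<p>The remaining components can be read in the same way. The sketch below re-runs the test so that it is self-contained, then extracts the test statistic and its degrees of freedom and checks that the observed difference between the two proportions lies inside the confidence interval.</p>
<pre class="r"><code>res <- prop.test(x = c(490, 400), n = c(500, 500))

# test statistic (X-squared) and degrees of freedom
res$statistic
res$parameter

# observed difference between the two proportions
diff_prop <- unname(res$estimate[1] - res$estimate[2])
diff_prop

# is the difference inside the reported confidence interval?
res$conf.int[1] < diff_prop && diff_prop < res$conf.int[2]</code></pre>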
</div>
</div>
<div id="see-also" class="section level1">
<h1>See also</h1>
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/one-proportion-z-test-in-r">One Proportion Z-Test in R: Compare an Observed Proportion to an Expected One</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/chi-square-goodness-of-fit-test-in-r">Chi-Square Goodness of Fit Test in R: Compare Multiple Observed Proportions to Expected Probabilities</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/chi-square-test-of-independence-in-r.Rmd">Chi-Square Test of Independence in R: Evaluate The Association Between Two Categorical Variables</a></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1>Infos</h1>
<p><span class="warning"> This analysis has been performed using <strong>R software</strong> (ver. 3.2.4). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('#rdoc h1').addClass('wiki_paragraph1');
    jQuery('#rdoc h2').addClass('wiki_paragraph2');
    jQuery('#rdoc h3').addClass('wiki_paragraph3');
    jQuery('#rdoc h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>
<!--====================== stop here when you copy to sthda================-->


<!-- END HTML -->]]></description>
			<pubDate>Thu, 06 Oct 2016 10:57:51 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[One-Proportion Z-Test in R]]></title>
			<link>https://www.sthda.com/english/wiki/one-proportion-z-test-in-r</link>
			<guid>https://www.sthda.com/english/wiki/one-proportion-z-test-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">

<div id="TOC">
<ul>
<li><a href="#what-is-one-proportion-z-test">What is one-proportion Z-test?</a></li>
<li><a href="#research-questions-and-statistical-hypotheses">Research questions and statistical hypotheses</a></li>
<li><a href="#formula-of-the-test-statistic">Formula of the test statistic</a></li>
<li><a href="#compute-one-proportion-z-test-in-r">Compute one proportion z-test in R</a><ul>
<li><a href="#r-functions-binom.test-prop.test">R functions: binom.test() &amp; prop.test()</a></li>
<li><a href="#compute-one-proportion-z-test">Compute one-proportion z-test</a></li>
<li><a href="#interpretation-of-the-result">Interpretation of the result</a></li>
<li><a href="#access-to-the-values-returned-by-prop.test">Access to the values returned by prop.test()</a></li>
</ul></li>
<li><a href="#see-also">See also</a></li>
<li><a href="#infos">Infos</a></li>
</ul>
</div>

<p><br/></p>
<div id="what-is-one-proportion-z-test" class="section level1">
<h1>What is one-proportion Z-test?</h1>
<br/>
<div class="block">
The <strong>One proportion</strong> <strong>Z-test</strong> is used to compare an observed proportion to a theoretical one, when there are only two categories. This article describes the basics of <strong>one-proportion z-test</strong> and provides practical examples using <strong>R software</strong>.
</div>
<p><br/></p>
<p>For example, we have a population of mice containing half males and half females (p = 0.5 = 50%). Some of these mice (n = 160) have developed a spontaneous cancer, including 95 males and 65 females.</p>
<p><span class="question">We want to know whether the cancer affects more males than females.</span></p>
<p>In this setting:</p>
<ul>
<li>The number of successes (males with cancer) is 95</li>
<li>The observed proportion (<span class="math">\(p_o\)</span>) of males is 95/160</li>
<li>The observed proportion (<span class="math">\(q\)</span>) of females is <span class="math">\(1 - p_o\)</span></li>
<li>The expected proportion (<span class="math">\(p_e\)</span>) of males is 0.5 (50%)</li>
<li>The number of observations (<span class="math">\(n\)</span>) is 160</li>
</ul>
<p><br/> <img src="https://www.sthda.com/english/sthda/RDoc/images/one-proportion-z-test.png" alt="One Proportion Z-Test in R" /> <br/></p>
</div>
<div id="research-questions-and-statistical-hypotheses" class="section level1">
<h1>Research questions and statistical hypotheses</h1>
<p>Typical research questions are:</p>
<br/>
<div class="question">
<ol style="list-style-type: decimal">
<li>whether the observed proportion of males (<span class="math">\(p_o\)</span>) <em>is equal</em> to the expected proportion (<span class="math">\(p_e\)</span>)?</li>
<li>whether the observed proportion of males (<span class="math">\(p_o\)</span>) <em>is less than</em> the expected proportion (<span class="math">\(p_e\)</span>)?</li>
<li>whether the observed proportion of males (<span class="math">\(p_o\)</span>) <em>is greater than</em> the expected proportion (<span class="math">\(p_e\)</span>)?</li>
</ol>
</div>
<p><br/></p>
<p>In statistics, we can define the corresponding <em>null hypotheses</em> (<span class="math">\(H_0\)</span>) as follows:</p>
<ol style="list-style-type: decimal">
<li><span class="math">\(H_0: p_o = p_e\)</span></li>
<li><span class="math">\(H_0: p_o \geq p_e\)</span></li>
<li><span class="math">\(H_0: p_o \leq p_e\)</span></li>
</ol>
<p>The corresponding <em>alternative hypotheses</em> (<span class="math">\(H_a\)</span>) are as follows:</p>
<ol style="list-style-type: decimal">
<li><span class="math">\(H_a: p_o \ne p_e\)</span> (different)</li>
<li><span class="math">\(H_a: p_o < p_e\)</span> (less)</li>
<li><span class="math">\(H_a: p_o > p_e\)</span> (greater)</li>
</ol>
<div class="notice">
<p>Note that:</p>
<ul>
<li>Hypotheses 1) are called <strong>two-tailed tests</strong></li>
<li>Hypotheses 2) and 3) are called <strong>one-tailed tests</strong></li>
</ul>
</div>
</div>
<div id="formula-of-the-test-statistic" class="section level1">
<h1>Formula of the test statistic</h1>
<p>The test statistic (also known as the <strong>z-statistic</strong>) can be calculated as follows:</p>
<p><span class="math">\[
z = \frac{p_o-p_e}{\sqrt{p_oq/n}}
\]</span></p>
<p>where,</p>
<ul>
<li><span class="math">\(p_o\)</span> is the observed proportion</li>
<li><span class="math">\(q = 1-p_o\)</span></li>
<li><span class="math">\(p_e\)</span> is the expected proportion</li>
<li><span class="math">\(n\)</span> is the sample size</li>
</ul>
<div class="success">
<ul>
<li>if <span class="math">\(|z| < 1.96\)</span>, then the difference <strong>is not significant</strong> at 5%</li>
<li>if <span class="math">\(|z| \geq 1.96\)</span>, then the difference <strong>is significant</strong> at 5%</li>
<li>The significance level (p-value) corresponding to the <strong>z-statistic</strong> can be read in the z-table. We’ll see how to compute it in R.</li>
</ul>
</div>
<p>The confidence interval of <span class="math">\(p_o\)</span> at 95% is defined as follow:</p>
<p><span class="math">\[
p_o \pm 1.96\sqrt{\frac{p_oq}{n}}
\]</span></p>
<p><span class="error">Note that the formula for the z-statistic is valid only when the sample size (<span class="math">\(n\)</span>) is large enough: <span class="math">\(np_o\)</span> and <span class="math">\(nq\)</span> should both be <span class="math">\(\geq\)</span> 5. For example, if <span class="math">\(p_o = 0.1\)</span>, then <span class="math">\(n\)</span> should be at least 50.</span></p>
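<p>The formulas above can be sketched in R for the mice example. The variable names are illustrative. The block also computes, as a side note, the variant of the z-statistic that uses the expected proportion <span class="math">\(p_e\)</span> in the standard error: its square is the X-squared value (5.625) that <strong>prop.test</strong>() reports for these data without continuity correction, so the statistic from the formula above will differ slightly from the R output.</p>
<pre class="r"><code>x <- 95      # males with cancer
n <- 160     # mice with cancer
p_e <- 0.5   # expected proportion of males

p_o <- x / n     # observed proportion of males
q <- 1 - p_o     # observed proportion of females

# z-statistic and 95% confidence interval, as defined in the formulas above
z <- (p_o - p_e) / sqrt(p_o * q / n)
ci <- p_o + c(-1, 1) * 1.96 * sqrt(p_o * q / n)

# Variant using the expected proportion in the standard error
z_null <- (p_o - p_e) / sqrt(p_e * (1 - p_e) / n)

z
ci
z_null^2</code></pre>
<p>Both versions of the statistic exceed 1.96 here, so they lead to the same conclusion at the 5% level.</p>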
</div>
<div id="compute-one-proportion-z-test-in-r" class="section level1">
<h1>Compute one proportion z-test in R</h1>
<div id="r-functions-binom.test-prop.test" class="section level2">
<h2>R functions: binom.test() &amp; prop.test()</h2>
<p>The R functions <strong>binom.test</strong>() and <strong>prop.test</strong>() can be used to perform a one-proportion test:</p>
<ul>
<li><strong>binom.test</strong>(): computes the exact <strong>binomial test</strong>. Recommended when the sample size is small</li>
<li><strong>prop.test</strong>(): can be used when the sample size is large ( N > 30). It uses a normal approximation to the binomial distribution</li>
</ul>
<p>The syntax of the two functions is nearly the same; <strong>prop.test</strong>() additionally takes the <em>correct</em> argument. The simplified format is as follows:</p>
<pre class="r"><code>binom.test(x, n, p = 0.5, alternative = "two.sided")

prop.test(x, n, p = NULL, alternative = "two.sided",
          correct = TRUE)</code></pre>
<br/>
<div class="block">
<ul>
<li><strong>x</strong>: the number of successes</li>
<li><strong>n</strong>: the total number of trials</li>
<li><strong>p</strong>: the probability to test against.</li>
<li><strong>correct</strong>: a logical indicating whether Yates’ continuity correction should be applied where possible.</li>
</ul>
</div>
<p><br/></p>
<p><span class="error">Note that, by default, the function <strong>prop.test()</strong> applies the Yates continuity correction, which is really important if either the expected number of successes or failures is below 5. If you don’t want the correction, use the additional argument <em>correct = FALSE</em> in the prop.test() function. The default value is TRUE. (This option must be set to FALSE to make the test mathematically equivalent to the uncorrected z-test of a proportion.) </span></p>
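<p>For comparison, the exact binomial test can be run on the same mice data with <strong>binom.test</strong>(). This is a minimal sketch; the object name <em>res_exact</em> is illustrative.</p>
<pre class="r"><code># Exact binomial test: 95 males among 160 mice with cancer, tested against 0.5
res_exact <- binom.test(x = 95, n = 160, p = 0.5)

res_exact$p.value    # exact two-sided p-value
res_exact$estimate   # observed proportion of males
res_exact$conf.int   # exact confidence interval for the proportion</code></pre>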
</div>
<div id="compute-one-proportion-z-test" class="section level2">
<h2>Compute one-proportion z-test</h2>
<p><span class="question">We want to know whether the cancer affects more males than females.</span></p>
<p>We’ll use the function <strong>prop.test</strong>()</p>
<pre class="r"><code>res <- prop.test(x = 95, n = 160, p = 0.5, 
                 correct = FALSE)

# Printing the results
res </code></pre>
<pre><code>
    1-sample proportions test without continuity correction

data:  95 out of 160, null probability 0.5
X-squared = 5.625, df = 1, p-value = 0.01771
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.5163169 0.6667870
sample estimates:
      p 
0.59375 </code></pre>
<br/>
<div class="block">
<p>The function returns:</p>
<ul>
<li>the value of Pearson’s chi-squared test statistic</li>
<li>a p-value</li>
<li>a 95% confidence interval for the proportion</li>
<li>an estimated probability of success (the proportion of males among the mice with cancer)</li>
</ul>
</div>
<p><br/></p>
<br/>
<div class="notice">
<p>Note that:</p>
<ul>
<li>if you want to test whether the proportion of males with cancer is less than 0.5 (one-tailed test), type this:</li>
</ul>
<pre class="r"><code>prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
           alternative = "less")</code></pre>
<ul>
<li>Or, if you want to test whether the proportion of males with cancer is greater than 0.5 (one-tailed test), type this:</li>
</ul>
<pre class="r"><code>prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
              alternative = "greater")</code></pre>
</div>
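<p>To see how the three alternatives relate, the sketch below runs all of them on the mice data and collects the p-values. Because the observed proportion is above 0.5, the p-value for the "greater" alternative is half of the two-sided one, and the p-value for "less" is its complement. The object names are illustrative.</p>
<pre class="r"><code>two_sided <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE)
greater   <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
                       alternative = "greater")
less      <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
                       alternative = "less")

# p-values of the three tests side by side
c(two.sided = two_sided$p.value,
  greater   = greater$p.value,
  less      = less$p.value)</code></pre>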
<p><br/></p>
</div>
<div id="interpretation-of-the-result" class="section level2">
<h2>Interpretation of the result</h2>
<p><span class="success"> The <strong>p-value</strong> of the test is 0.01771, which is less than the significance level alpha = 0.05. We can conclude that the proportion of males with cancer is significantly different from 0.5 (<strong>p-value</strong> = 0.01771). </span></p>
</div>
<div id="access-to-the-values-returned-by-prop.test" class="section level2">
<h2>Access to the values returned by prop.test()</h2>
<p>The result of <strong>prop.test()</strong> function is a list containing the following components:</p>
<br/>
<div class="block">
<ul>
<li><strong>statistic</strong>: the value of Pearson’s chi-squared test statistic</li>
<li><strong>parameter</strong>: the degrees of freedom of the approximate chi-squared distribution of the test statistic</li>
<li><strong>p.value</strong>: the <strong>p-value</strong> of the test</li>
<li><strong>conf.int</strong>: a confidence interval for the probability of success.</li>
<li><strong>estimate</strong>: the estimated probability of success.</li>
</ul>
</div>
<p><br/></p>
<p>The format of the <strong>R</strong> code to use for getting these values is as follows:</p>
<pre class="r"><code># printing the p-value
res$p.value</code></pre>
<pre><code>[1] 0.01770607</code></pre>
<pre class="r"><code># printing the estimated proportion
res$estimate</code></pre>
<pre><code>      p 
0.59375 </code></pre>
<pre class="r"><code># printing the confidence interval
res$conf.int</code></pre>
<pre><code>[1] 0.5163169 0.6667870
attr(,"conf.level")
[1] 0.95</code></pre>
</div>
</div>
<div id="see-also" class="section level1">
<h1>See also</h1>
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/two-proportions-z-test-in-r">Two Proportions Z-test in R: Compare Two Observed Proportions</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/chi-square-goodness-of-fit-test-in-r">Chi-Square Goodness of Fit Test in R: Compare Multiple Observed Proportions to Expected Probabilities</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/chi-square-test-of-independence-in-r.Rmd">Chi-Square Test of Independence in R: Evaluate The Association Between Two Categorical Variables</a></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1>Infos</h1>
<p><span class="warning"> This analysis has been performed using <strong>R software</strong> (ver. 3.2.4). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('#rdoc h1').addClass('wiki_paragraph1');
    jQuery('#rdoc h2').addClass('wiki_paragraph2');
    jQuery('#rdoc h3').addClass('wiki_paragraph3');
    jQuery('#rdoc h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>
<!--====================== stop here when you copy to sthda================-->

<!-- END HTML -->]]></description>
			<pubDate>Thu, 06 Oct 2016 10:43:04 +0200</pubDate>
			
		</item>
		
	</channel>
</rss>
