Easy Guides

Chi-Square Test of Independence in R

Fri, 25 Nov 2016 08:43:37 +0100

The chi-square test of independence is used to analyze the frequency table (i.e. contingency table) formed by two categorical variables. The chi-square test evaluates whether there is a significant association between the categories of the two variables. This article describes the basics of the chi-square test and provides practical examples using R software.

Data format: Contingency tables
Graphical display of contingency tables
Chi-square test basics
Compute chi-square test in R
Nature of the dependence between the row and the column variables
Access to the values returned by chisq.test() function
See also
Infos

Data format: Contingency tables

We’ll use the housetasks data set from STHDA: https://www.sthda.com/sthda/RDoc/data/housetasks.txt.

# Import the data
file_path <- "https://www.sthda.com/sthda/RDoc/data/housetasks.txt"
housetasks <- read.delim(file_path, row.names = 1)
# head(housetasks)

An image of the data is displayed below:

[Figure: data format for correspondence analysis]

The data is a contingency table containing 13 housetasks and their distribution in the couple:

rows are the different tasks
values are the frequencies of the tasks done:
by the wife only
alternately
by the husband only
or jointly

Graphical display of contingency tables

A contingency table can be visualized using the function balloonplot() [in gplots package]. This function draws a graphical matrix where each cell contains a dot whose size reflects the relative magnitude of the corresponding component.

To execute the R code below, you should install the package gplots: install.packages("gplots").

library("gplots")
# 1. convert the data as a table
dt <- as.table(as.matrix(housetasks))
# 2. Graph
balloonplot(t(dt), main ="housetasks", xlab ="", ylab="",
            label = FALSE, show.margins = FALSE)


Note that, row and column sums are printed by default in the bottom and right margins, respectively. These values can be hidden using the argument show.margins = FALSE.

It’s also possible to visualize a contingency table as a mosaic plot. This is done using the function mosaicplot() from the built-in R package graphics:

library("graphics")
mosaicplot(dt, shade = TRUE, las=2,
           main = "housetasks")


The argument shade is used to color the graph
The argument las = 2 produces vertical labels

Note that the surface of an element of the mosaic reflects the relative magnitude of its value.

Blue color indicates that the observed value is higher than the expected value if the data were random
Red color specifies that the observed value is lower than the expected value if the data were random

From this mosaic plot, it can be seen that the housetasks Laundry, Main_meal, Dinner and Breakfeast (blue color) are mainly done by the wife in our example.

There is another package named vcd, which can be used to make a mosaic plot (function mosaic()) or an association plot (function assoc()).

# install.packages("vcd")
library("vcd")
# plot just a subset of the table
assoc(head(dt, 5), shade = TRUE, las=3)


Chi-square test basics

The chi-square test examines whether rows and columns of a contingency table are statistically significantly associated.

Null hypothesis (H0): the row and the column variables of the contingency table are independent.
Alternative hypothesis (H1): row and column variables are dependent

For each cell of the table, we have to calculate the expected value under null hypothesis.

For a given cell, the expected value is calculated as follows:

\[ e = \frac{row.sum \times col.sum}{grand.total} \]

The Chi-square statistic is calculated as follows:

\[ \chi^2 = \sum{\frac{(o - e)^2}{e}} \]

o is the observed value
e is the expected value

This calculated Chi-square statistic is compared to the critical value (obtained from statistical tables) with \(df = (r - 1)(c - 1)\) degrees of freedom at the significance level alpha = 0.05.

r is the number of rows in the contingency table
c is the number of columns in the contingency table

If the calculated Chi-square statistic is greater than the critical value, then we must conclude that the row and the column variables are not independent of each other. This implies that they are significantly associated.

Note that the Chi-square test should only be applied when the expected frequency of every cell is at least 5.
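The steps described in this section can be sketched by hand in R. The small 2 x 3 table below is hypothetical (its counts and labels are illustrative only), and the critical value is obtained with qchisq() instead of a printed statistical table. This is a minimal sketch, not a replacement for chisq.test().

```r
# Hypothetical 2 x 3 table of counts (illustrative values only)
obs <- matrix(c(20, 30, 50,
                30, 30, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              answer = c("yes", "maybe", "no")))

# Expected count of each cell: row sum * column sum / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Check the rule of thumb: every expected count should be at least 5
all(expected >= 5)

# Chi-square statistic, degrees of freedom and p-value
chi2 <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

# Critical value at alpha = 0.05
critical <- qchisq(0.95, df = df)

# H0 is rejected only if the statistic exceeds the critical value
chi2 > critical

# The built-in function should give the same statistic
builtin <- chisq.test(obs)
builtin$statistic
```

Comparing chi2 with critical is equivalent to comparing p_value with 0.05.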

Compute chi-square test in R

The chi-square statistic can be easily computed using the function chisq.test() as follows:

chisq <- chisq.test(housetasks)
chisq


    Pearson's Chi-squared test

data:  housetasks
X-squared = 1944.5, df = 36, p-value < 2.2e-16

In our example, the row and the column variables are statistically significantly associated (p-value < 2.2e-16).

The observed and the expected counts can be extracted from the result of the test as follows:

# Observed counts
chisq$observed

           Wife Alternating Husband Jointly
Laundry     156          14       2       4
Main_meal   124          20       5       4
Dinner       77          11       7      13
Breakfeast   82          36      15       7
Tidying      53          11       1      57
Dishes       32          24       4      53
Shopping     33          23       9      55
Official     12          46      23      15
Driving      10          51      75       3
Finances     13          13      21      66
Insurance     8           1      53      77
Repairs       0           3     160       2
Holidays      0           1       6     153

# Expected counts
round(chisq$expected,2)

            Wife Alternating Husband Jointly
Laundry    60.55       25.63   38.45   51.37
Main_meal  52.64       22.28   33.42   44.65
Dinner     37.16       15.73   23.59   31.52
Breakfeast 48.17       20.39   30.58   40.86
Tidying    41.97       17.77   26.65   35.61
Dishes     38.88       16.46   24.69   32.98
Shopping   41.28       17.48   26.22   35.02
Official   33.03       13.98   20.97   28.02
Driving    47.82       20.24   30.37   40.57
Finances   38.88       16.46   24.69   32.98
Insurance  47.82       20.24   30.37   40.57
Repairs    56.77       24.03   36.05   48.16
Holidays   55.05       23.30   34.95   46.70

Nature of the dependence between the row and the column variables

As mentioned above, the total Chi-square statistic is 1944.456196.

If you want to know which cells contribute most to the total Chi-square score, you just have to calculate the Pearson residual of each cell:

\[ r = \frac{o - e}{\sqrt{e}} \]

The above formula returns the so-called Pearson residuals (r) for each cell. The squared residuals sum to the total Chi-square statistic.

Cells with the highest absolute Pearson residuals contribute the most to the total Chi-square score.

Pearson residuals can be easily extracted from the output of the function chisq.test():

round(chisq$residuals, 3)

             Wife Alternating Husband Jointly
Laundry    12.266      -2.298  -5.878  -6.609
Main_meal   9.836      -0.484  -4.917  -6.084
Dinner      6.537      -1.192  -3.416  -3.299
Breakfeast  4.875       3.457  -2.818  -5.297
Tidying     1.702      -1.606  -4.969   3.585
Dishes     -1.103       1.859  -4.163   3.486
Shopping   -1.289       1.321  -3.362   3.376
Official   -3.659       8.563   0.443  -2.459
Driving    -5.469       6.836   8.100  -5.898
Finances   -4.150      -0.852  -0.742   5.750
Insurance  -5.758      -4.277   4.107   5.720
Repairs    -7.534      -4.290  20.646  -6.651
Holidays   -7.419      -4.620  -4.897  15.556
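As a quick check of the residual formula, the sketch below recomputes the Pearson residuals by hand on a small hypothetical 3 x 2 table (the counts and the names tab and fit are illustrative assumptions) and compares them with the residuals component. Note that chisq.test() also returns a stdres component, which is a different quantity.

```r
# Hypothetical 3 x 2 table of counts (illustrative values only)
tab <- matrix(c(30, 10,
                20, 20,
                10, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("r1", "r2", "r3"), c("c1", "c2")))
fit <- chisq.test(tab)

# Pearson residuals from the formula (o - e) / sqrt(e)
pearson <- (fit$observed - fit$expected) / sqrt(fit$expected)

# They match the residuals component returned by chisq.test()
all.equal(pearson, fit$residuals)

# The squared Pearson residuals sum to the Chi-square statistic
sum(fit$residuals^2)
fit$statistic

# stdres holds residuals further scaled by their estimated standard deviation
fit$stdres
```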

Let’s visualize Pearson residuals using the package corrplot:

library(corrplot)
corrplot(chisq$residuals, is.cor = FALSE)


For a given cell, the size of the circle is proportional to the amount of the cell contribution.

The sign of the residuals is also very important for interpreting the association between rows and columns, as explained below.

Positive residuals are in blue. Positive values in cells specify an attraction (positive association) between the corresponding row and column variables.

In the image above, it’s evident that there is an association between the column Wife and the rows Laundry and Main_meal.
There is a strong positive association between the column Husband and the row Repairs.

Negative residuals are in red. This implies a repulsion (negative association) between the corresponding row and column variables. For example, the column Wife is negatively associated with the row Repairs. There is also a repulsion between the column Husband and the rows Laundry and Main_meal.

The contribution (in %) of a given cell to the total Chi-square score is calculated as follows:

\[ contrib = 100 \times \frac{r^2}{\chi^2} \]

r is the residual of the cell

# Contribution in percentage (%)
contrib <- 100*chisq$residuals^2/chisq$statistic
round(contrib, 3)

            Wife Alternating Husband Jointly
Laundry    7.738       0.272   1.777   2.246
Main_meal  4.976       0.012   1.243   1.903
Dinner     2.197       0.073   0.600   0.560
Breakfeast 1.222       0.615   0.408   1.443
Tidying    0.149       0.133   1.270   0.661
Dishes     0.063       0.178   0.891   0.625
Shopping   0.085       0.090   0.581   0.586
Official   0.688       3.771   0.010   0.311
Driving    1.538       2.403   3.374   1.789
Finances   0.886       0.037   0.028   1.700
Insurance  1.705       0.941   0.868   1.683
Repairs    2.919       0.947  21.921   2.275
Holidays   2.831       1.098   1.233  12.445

# Visualize the contribution
corrplot(contrib, is.cor = FALSE)


The relative contribution of each cell to the total Chi-square score gives some indication of the nature of the dependency between rows and columns of the contingency table.

It can be seen that:

The column “Wife” is strongly associated with Laundry, Main_meal, Dinner
The column “Husband” is strongly associated with the row Repairs
The column “Jointly” is strongly associated with the row Holidays

From the image above, it can be seen that the most contributing cells to the Chi-square are Wife/Laundry (7.74%), Wife/Main_meal (4.98%), Husband/Repairs (21.92%), Jointly/Holidays (12.44%).

These cells contribute about 47.1% to the total Chi-square score and thus account for a large share of the difference between expected and observed values.
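When the table is large, the most contributing cells can also be listed programmatically rather than read off a plot. The sketch below rebuilds the table from the observed counts printed earlier, so that it runs without downloading the file; the column names task, who and contrib and the object ranked are illustrative choices.

```r
# Observed counts copied from the chisq$observed output shown earlier
housetasks <- matrix(
  c(156, 14,   2,   4,
    124, 20,   5,   4,
     77, 11,   7,  13,
     82, 36,  15,   7,
     53, 11,   1,  57,
     32, 24,   4,  53,
     33, 23,   9,  55,
     12, 46,  23,  15,
     10, 51,  75,   3,
     13, 13,  21,  66,
      8,  1,  53,  77,
      0,  3, 160,   2,
      0,  1,   6, 153),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("Laundry", "Main_meal", "Dinner", "Breakfeast", "Tidying", "Dishes",
      "Shopping", "Official", "Driving", "Finances", "Insurance",
      "Repairs", "Holidays"),
    c("Wife", "Alternating", "Husband", "Jointly")))

chisq <- chisq.test(housetasks)
contrib <- 100 * chisq$residuals^2 / chisq$statistic

# Flatten to one row per cell and sort by decreasing contribution
ranked <- as.data.frame(as.table(contrib))
names(ranked) <- c("task", "who", "contrib")
ranked <- ranked[order(-ranked$contrib), ]

# The four largest contributions and their combined share (in %)
head(ranked, 4)
sum(head(ranked$contrib, 4))
```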

This confirms the earlier visual interpretation of the data. As stated earlier, visual interpretation may be complex when the contingency table is very large. In this case, the contribution of one cell to the total Chi-square score becomes a useful way of establishing the nature of dependency.

Access to the values returned by chisq.test() function

The result of chisq.test() function is a list containing the following components:

statistic: the value of the chi-squared test statistic.
parameter: the degrees of freedom
p.value: the p-value of the test
observed: the observed count
expected: the expected count

The format of the R code to use for getting these values is as follows:

# printing the p-value
chisq$p.value
# printing the test statistic
chisq$statistic

Infos

This analysis has been performed using R software (ver. 3.2.4).

Chi-square Goodness of Fit Test in R

Tue, 11 Oct 2016 10:45:30 +0200

What is chi-square goodness of fit test?
Example data and questions
Statistical hypotheses
R function: chisq.test()
See also
Infos

What is chi-square goodness of fit test?

The chi-square goodness of fit test is used to compare the observed distribution to an expected distribution, in a situation where we have two or more categories in discrete data. In other words, it compares multiple observed proportions to expected probabilities.

Example data and questions

For example, we collected wild tulips and found that 81 were red, 50 were yellow and 27 were white.

Question 1:

Are these colors equally common?

If these colors were equally distributed, the expected proportion would be 1/3 for each color.

Question 2:

Suppose that, in the region where you collected the data, the ratio of red, yellow and white tulips is 3:2:1 (3+2+1 = 6). This means that the expected proportion is:

3/6 (= 1/2) for red
2/6 ( = 1/3) for yellow
1/6 for white

We want to know whether there is any significant difference between the observed proportions and the expected proportions.

Statistical hypotheses

Null hypothesis (\(H_0\)): There is no significant difference between the observed and the expected value.
Alternative hypothesis (\(H_a\)): There is a significant difference between the observed and the expected value.

R function: chisq.test()

The R function chisq.test() can be used as follows:

chisq.test(x, p)

x: a numeric vector
p: a vector of probabilities of the same length as x.

Answer to Q1: Are the colors equally common?

tulip <- c(81, 50, 27)
res <- chisq.test(tulip, p = c(1/3, 1/3, 1/3))
res


    Chi-squared test for given probabilities

data:  tulip
X-squared = 27.886, df = 2, p-value = 8.803e-07

The function returns the value of the chi-square test statistic (“X-squared”) and a p-value.

The p-value of the test is \(8.803 \times 10^{-7}\), which is less than the significance level alpha = 0.05. We can conclude that the colors are not equally common, with a p-value = \(8.803 \times 10^{-7}\).

Note that, the chi-square test should be used only when all calculated expected values are greater than 5.

# Access to the expected values
res$expected

[1] 52.66667 52.66667 52.66667

Answer to Q2 comparing observed to expected proportions

tulip <- c(81, 50, 27)
res <- chisq.test(tulip, p = c(1/2, 1/3, 1/6))
res


    Chi-squared test for given probabilities

data:  tulip
X-squared = 0.20253, df = 2, p-value = 0.9037

The p-value of the test is 0.9037, which is greater than the significance level alpha = 0.05. We can conclude that the observed proportions are not significantly different from the expected proportions.
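To see where these numbers come from, the statistic for Question 2 can be recomputed by hand. This is a minimal sketch: the names p_expected, chi2 and res_ratio are illustrative, and the last call uses the rescale.p argument of chisq.test() so that the 3:2:1 ratio can be passed directly.

```r
tulip <- c(red = 81, yellow = 50, white = 27)
p_expected <- c(1/2, 1/3, 1/6)

# Expected counts under the 3:2:1 ratio
expected <- sum(tulip) * p_expected

# Chi-square statistic with (number of categories - 1) degrees of freedom
chi2 <- sum((tulip - expected)^2 / expected)
df <- length(tulip) - 1
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

round(c(chi2 = chi2, df = df, p_value = p_value), 4)

# Same test, giving the ratio itself and letting R rescale it to sum to 1
res_ratio <- chisq.test(tulip, p = c(3, 2, 1), rescale.p = TRUE)
res_ratio$statistic
```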

Access to the values returned by chisq.test() function

The result of chisq.test() function is a list containing the following components:

statistic: the value of the chi-squared test statistic.
parameter: the degrees of freedom
p.value: the p-value of the test
observed: the observed count
expected: the expected count

The format of the R code to use for getting these values is as follows:

# printing the p-value
res$p.value

[1] 0.9036928

# printing the test statistic
res$statistic

X-squared 
0.2025316 

Infos

This analysis has been performed using R software (ver. 3.2.4).

Comparing Proportions in R

Sat, 08 Oct 2016 09:23:19 +0200

Previously, we described the essentials of R programming and provided quick start guides for importing data into R. Additionally, we described how to compute descriptive or summary statistics, correlation analysis, as well as, how to compare sample means and variances using R software.

This chapter contains articles describing statistical tests to use for comparing proportions.

1 How this chapter is organized?

2 One-proportion z-Test

Compare an observed proportion to an expected one.

Read more: —> One-Proportion Z-Test in R.

3 Two-proportions z-Test

Compare two observed proportions.

4 Chi-square goodness of fit test in R

Compare multiple observed proportions to expected probabilities.

Read more: —> Chi-square goodness of fit test in R.

5 Chi-Square test of independence in R

Evaluate the association between two categorical variables.

Read more: —> Chi-Square Test of Independence in R.

6 See also

7 Infos

This analysis has been performed using R statistical software (ver. 3.2.4).

Two-Proportions Z-Test in R

Thu, 06 Oct 2016 10:57:51 +0200

What is two-proportions z-test?
Research questions and statistical hypotheses
Formula of the test statistic
- Case of large sample sizes
- Case of small sample sizes
Compute two-proportions z-test in R
See also
Infos

What is two-proportions z-test?

The two-proportions z-test is used to compare two observed proportions. This article describes the basics of the two-proportions z-test and provides practical examples using R software.

For example, we have two groups of individuals:

Group A with lung cancer: n = 500
Group B, healthy individuals: n = 500

The number of smokers in each group is as follows:

Group A with lung cancer: n = 500, 490 smokers, \(p_A = 490/500 = 98\%\)
Group B, healthy individuals: n = 500, 400 smokers, \(p_B = 400/500 = 80\%\)

In this setting:

The overall proportion of smokers is \(p = \frac{490 + 400}{500 + 500} = 89\%\)
The overall proportion of non-smokers is \(q = 1-p = 11\%\)

We want to know whether the proportions of smokers are the same in the two groups of individuals.

Research questions and statistical hypotheses

Typical research questions are:

whether the observed proportion of smokers in group A (\(p_A\)) is equal to the observed proportion of smokers in group B (\(p_B\))?
whether the observed proportion of smokers in group A (\(p_A\)) is less than the observed proportion of smokers in group B (\(p_B\))?
whether the observed proportion of smokers in group A (\(p_A\)) is greater than the observed proportion of smokers in group B (\(p_B\))?

In statistics, we can define the corresponding null hypothesis (\(H_0\)) as follows:

\(H_0: p_A = p_B\)
\(H_0: p_A \leq p_B\)
\(H_0: p_A \geq p_B\)

The corresponding alternative hypotheses (\(H_a\)) are as follows:

\(H_a: p_A \ne p_B\) (different)
\(H_a: p_A > p_B\) (greater)
\(H_a: p_A < p_B\) (less)

Note that:

Hypotheses 1) are called two-tailed tests
Hypotheses 2) and 3) are called one-tailed tests

Formula of the test statistic

Case of large sample sizes

The test statistic (also known as the z-statistic) can be calculated as follows:

\[ z = \frac{p_A-p_B}{\sqrt{pq/n_A+pq/n_B}} \]

where,

\(p_A\) is the proportion observed in group A with size \(n_A\)
\(p_B\) is the proportion observed in group B with size \(n_B\)
\(p\) and \(q\) are the overall proportions

if \(|z| < 1.96\), then the difference is not significant at 5%
if \(|z| \geq 1.96\), then the difference is significant at 5%
The significance level (p-value) corresponding to the z-statistic can be read in the z-table. We’ll see how to compute it in R.

Note that, the formula of z-statistic is valid only when sample size (\(n\)) is large enough. \(n_Ap\), \(n_Aq\), \(n_Bp\) and \(n_Bq\) should be \(\geq\) 5.
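The formula can be checked by hand on the smokers example. The sketch below is illustrative (the names p_hat, z and uncorrected are not from the original); it also verifies the sample size condition and shows that the squared z-statistic matches the X-squared value reported by prop.test() when the continuity correction is switched off.

```r
x <- c(A = 490, B = 400)   # smokers in each group
n <- c(A = 500, B = 500)   # group sizes

p_hat <- x / n             # observed proportions p_A and p_B
p <- sum(x) / sum(n)       # overall (pooled) proportion of smokers
q <- 1 - p

# z-statistic from the formula above
z <- (p_hat[["A"]] - p_hat[["B"]]) /
  sqrt(p * q / n[["A"]] + p * q / n[["B"]])

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Validity condition: n_A p, n_A q, n_B p and n_B q should all be >= 5
all(c(n * p, n * q) >= 5)

# z^2 equals X-squared from prop.test() without continuity correction
uncorrected <- prop.test(x, n, correct = FALSE)
c(z_squared = z^2, X_squared = unname(uncorrected$statistic))
```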

Case of small sample sizes

The Fisher Exact probability test is an excellent non-parametric technique for comparing proportions, when the two independent samples are small in size.
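In R, this test is available through the function fisher.test(). The example below is a minimal sketch on hypothetical small samples (the counts and labels are illustrative only).

```r
# Hypothetical small samples: 7 successes out of 10 in group A,
# 2 successes out of 10 in group B
small <- matrix(c(7, 3,
                  2, 8),
                nrow = 2, byrow = TRUE,
                dimnames = list(group = c("A", "B"),
                                outcome = c("success", "failure")))

fisher_res <- fisher.test(small)

# Exact two-sided p-value
fisher_res$p.value
```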

Compute two-proportions z-test in R

R functions: prop.test()

The R function prop.test() can be used as follows:

prop.test(x, n, p = NULL, alternative = "two.sided",
          correct = TRUE)

x: a vector of counts of successes
n: a vector of counts of trials
alternative: a character string specifying the alternative hypothesis
correct: a logical indicating whether Yates’ continuity correction should be applied where possible

Note that, by default, the function prop.test() uses the Yates continuity correction, which is really important if either the expected successes or failures is < 5. If you don’t want the correction, use the additional argument correct = FALSE in prop.test() function. The default value is TRUE. (This option must be set to FALSE to make the test mathematically equivalent to the uncorrected z-test of a proportion.)

Compute two-proportions z-test

We want to know whether the proportions of smokers are the same in the two groups of individuals.

res <- prop.test(x = c(490, 400), n = c(500, 500))

# Printing the results
res


    2-sample test for equality of proportions with continuity correction

data:  c(490, 400) out of c(500, 500)
X-squared = 80.909, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
 0.1408536 0.2191464
sample estimates:
prop 1 prop 2 
  0.98   0.80

The function returns:

the value of Pearson’s chi-squared test statistic.
a p-value
a 95% confidence interval for the difference between the two proportions
the estimated probabilities of success (the proportion of smokers in each of the two groups)

Note that:

if you want to test whether the observed proportion of smokers in group A (\(p_A\)) is less than the observed proportion of smokers in group B (\(p_B\)), type this:

prop.test(x = c(490, 400), n = c(500, 500),
          alternative = "less")

Or, if you want to test whether the observed proportion of smokers in group A (\(p_A\)) is greater than the observed proportion of smokers in group B (\(p_B\)), type this:

prop.test(x = c(490, 400), n = c(500, 500),
          alternative = "greater")

Interpretation of the result

The p-value of the test is \(2.363 \times 10^{-19}\), which is less than the significance level alpha = 0.05. We can conclude that the proportion of smokers is significantly different in the two groups with a p-value = \(2.363 \times 10^{-19}\).

Note that, for a 2 x 2 table, the standard chi-square test in chisq.test() is exactly equivalent to prop.test() but it works with data in matrix form.
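This equivalence can be illustrated on the smokers example. In the sketch below, the 2 x 2 matrix tab and the object names are illustrative; both functions are called with their default settings, so both apply the Yates continuity correction.

```r
smokers <- c(490, 400)
totals  <- c(500, 500)

# 2 x 2 table: rows are the groups, columns are smokers / non-smokers
tab <- cbind(smoker = smokers, non_smoker = totals - smokers)
rownames(tab) <- c("A", "B")

from_chisq <- chisq.test(tab)              # data in matrix form
from_prop  <- prop.test(smokers, totals)   # data as counts and trials

# Both functions report the same statistic and p-value
c(chisq = unname(from_chisq$statistic), prop = unname(from_prop$statistic))
c(chisq = from_chisq$p.value, prop = from_prop$p.value)
```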

Access to the values returned by prop.test() function

The result of prop.test() function is a list containing the following components:

statistic: the value of Pearson’s chi-squared test statistic
parameter: the degrees of freedom of the test
p.value: the p-value of the test
conf.int: a confidence interval for the difference between the two proportions
estimate: the estimated proportions in the two groups

The format of the R code to use for getting these values is as follows:

# printing the p-value
res$p.value

[1] 2.363439e-19

# printing the estimated proportions
res$estimate

prop 1 prop 2 
  0.98   0.80

# printing the confidence interval
res$conf.int

[1] 0.1408536 0.2191464
attr(,"conf.level")
[1] 0.95

Infos

This analysis has been performed using R software (ver. 3.2.4).

One-Proportion Z-Test in R

Thu, 06 Oct 2016 10:43:04 +0200

What is one-proportion Z-test?
Research questions and statistical hypotheses
Formula of the test statistic
Compute one proportion z-test in R
See also
Infos

What is one-proportion Z-test?

The One proportion Z-test is used to compare an observed proportion to a theoretical one, when there are only two categories. This article describes the basics of one-proportion z-test and provides practical examples using R software.

For example, we have a population of mice containing half males and half females (p = 0.5 = 50%). Some of these mice (n = 160) have developed a spontaneous cancer, including 95 males and 65 females.

We want to know whether the cancer affects more males than females.

In this setting:

the number of successes (male with cancer) is 95
The observed proportion (\(p_o\)) of male is 95/160
The observed proportion (\(q\)) of female is \(1 - p_o\)
The expected proportion (\(p_e\)) of male is 0.5 (50%)
The number of observations (\(n\)) is 160

Research questions and statistical hypotheses

Typical research questions are:

whether the observed proportion of male (\(p_o\)) is equal to the expected proportion (\(p_e\))?
whether the observed proportion of male (\(p_o\)) is less than the expected proportion (\(p_e\))?
whether the observed proportion of male (\(p_o\)) is greater than the expected proportion (\(p_e\))?

In statistics, we can define the corresponding null hypothesis (\(H_0\)) as follows:

\(H_0: p_o = p_e\)
\(H_0: p_o \leq p_e\)
\(H_0: p_o \geq p_e\)

The corresponding alternative hypotheses (\(H_a\)) are as follows:

\(H_a: p_o \ne p_e\) (different)
\(H_a: p_o > p_e\) (greater)
\(H_a: p_o < p_e\) (less)

Note that:

Hypotheses 1) are called two-tailed tests
Hypotheses 2) and 3) are called one-tailed tests

Formula of the test statistic

The test statistic (also known as the z-statistic) can be calculated as follows:

\[ z = \frac{p_o-p_e}{\sqrt{p_oq/n}} \]

where,

\(p_o\) is the observed proportion
\(q = 1-p_o\)
\(p_e\) is the expected proportion
\(n\) is the sample size

if \(|z| < 1.96\), then the difference is not significant at 5%
if \(|z| \geq 1.96\), then the difference is significant at 5%
The significance level (p-value) corresponding to the z-statistic can be read in the z-table. We’ll see how to compute it in R.

The confidence interval of \(p_o\) at 95% is defined as follows:

\[ p_o \pm 1.96\sqrt{\frac{p_oq}{n}} \]

Note that, the formula of z-statistic is valid only when sample size (\(n\)) is large enough. \(np_o\) and \(nq\) should be \(\geq\) 5. For example, if \(p_o = 0.1\), then \(n\) should be at least 50.
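The formulas above can be evaluated by hand on the mice example. This is a minimal sketch with illustrative names (z_obs, z_null, ci). Note that prop.test() bases the standard error on the expected proportion \(p_e\) rather than on \(p_o\), so the sketch also computes that variant: its square is the X-squared value that prop.test() reports without continuity correction. The interval below uses qnorm(0.975) in place of the rounded value 1.96, and it is not the same interval as the one printed by prop.test(), which is obtained by inverting the score test.

```r
x <- 95      # males with cancer
n <- 160     # mice with cancer
p_e <- 0.5   # expected proportion of males

p_o <- x / n
q <- 1 - p_o

# z-statistic as written above (standard error based on p_o)
z_obs <- (p_o - p_e) / sqrt(p_o * q / n)

# Variant with the standard error computed under H0 (based on p_e)
z_null <- (p_o - p_e) / sqrt(p_e * (1 - p_e) / n)

# Two-sided p-value for the H0-based variant
p_value <- 2 * pnorm(abs(z_null), lower.tail = FALSE)

# 95% confidence interval of p_o from the formula above
ci <- p_o + c(-1, 1) * qnorm(0.975) * sqrt(p_o * q / n)

# Sample size condition: n * p_o and n * q should be >= 5
all(c(n * p_o, n * q) >= 5)

c(z_obs = z_obs, z_null = z_null, z_null_squared = z_null^2, p_value = p_value)
ci
```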

Compute one proportion z-test in R

R functions: binom.test() & prop.test()

The R functions binom.test() and prop.test() can be used to perform one-proportion test:

binom.test(): compute exact binomial test. Recommended when sample size is small
prop.test(): can be used when sample size is large ( N > 30). It uses a normal approximation to binomial

The syntax of the two functions is nearly the same. The simplified format is as follows:

binom.test(x, n, p = 0.5, alternative = "two.sided")

prop.test(x, n, p = NULL, alternative = "two.sided",
          correct = TRUE)

x: the number of successes
n: the total number of trials
p: the probability to test against.
correct: a logical indicating whether Yates’ continuity correction should be applied where possible.
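For comparison, the exact binomial test can be run on the same counts as the mice example. This is a minimal sketch; the object name exact is illustrative.

```r
# Exact binomial test: 95 males out of 160 mice with cancer, against p = 0.5
exact <- binom.test(x = 95, n = 160, p = 0.5)

exact$p.value    # exact two-sided p-value
exact$conf.int   # exact confidence interval for the proportion
exact$estimate   # observed proportion of males
```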

Compute one-proportion z-test

We want to know whether the cancer affects more males than females.

We’ll use the function prop.test()

res <- prop.test(x = 95, n = 160, p = 0.5, 
                 correct = FALSE)

# Printing the results
res


    1-sample proportions test without continuity correction

data:  95 out of 160, null probability 0.5
X-squared = 5.625, df = 1, p-value = 0.01771
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.5163169 0.6667870
sample estimates:
      p 
0.59375

The function returns:

the value of Pearson’s chi-squared test statistic.
a p-value
a 95% confidence interval
an estimated probability of success (the proportion of male with cancer)

Note that:

if you want to test whether the proportion of male with cancer is less than 0.5 (one-tailed test), type this:

prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
           alternative = "less")

Or, if you want to test whether the proportion of male with cancer is greater than 0.5 (one-tailed test), type this:

prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
              alternative = "greater")

Interpretation of the result

The p-value of the test is 0.01771, which is less than the significance level alpha = 0.05. We can conclude that the proportion of male with cancer is significantly different from 0.5 with a p-value = 0.01771.

Access to the values returned by prop.test()

The result of prop.test() function is a list containing the following components:

statistic: the value of Pearson’s chi-squared test statistic
parameter: the degrees of freedom of the test
p.value: the p-value of the test
conf.int: a confidence interval for the probability of success.
estimate: the estimated probability of success.

The format of the R code to use for getting these values is as follows:

# printing the p-value
res$p.value

[1] 0.01770607

# printing the estimated proportion
res$estimate

      p 
0.59375

# printing the confidence interval
res$conf.int

[1] 0.5163169 0.6667870
attr(,"conf.level")
[1] 0.95

Infos

This analysis has been performed using R software (ver. 3.2.4).
