This article describes how to make a **graph of correlation matrix** in **R**. The R **symnum()** function is used. It takes the **correlation table** as an argument. The result is a table in which **correlation coefficients** are replaced by symbols according to the **degree of correlation**.

Note that online software is also available here to compute a **correlation matrix** and to plot a **correlogram** without any installation.

The R function **symnum** can be used to easily highlight the highly correlated variables. It replaces correlation coefficients by symbols according to the value.

The simplified format of the function is :

```
symnum(x, cutpoints = c(0.3, 0.6, 0.8, 0.9, 0.95),
       symbols = c(" ", ".", ",", "+", "*", "B"))
```

- **x** : the correlation matrix to visualize

- **cutpoints** : the **correlation coefficient** cutpoints. Coefficients between 0 and 0.3 are replaced by a space (" "); coefficients between 0.3 and 0.6 are replaced by "."; and so on.

- **symbols** : the symbols to use.
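The cutpoints and symbols can be customized. The sketch below is not from the original tutorial; it assumes a coarser scale with two cutpoints and three symbols on a small subset of the mtcars correlation matrix:

```r
# Hypothetical example: two cutpoints (0.5, 0.8) require three symbols,
# one per interval [0, 0.5), [0.5, 0.8) and [0.8, 1]
m <- cor(mtcars[, c("mpg", "wt", "hp")])
s <- symnum(m, cutpoints = c(0.5, 0.8), symbols = c(" ", ".", "*"))
s
```

The diagonal is printed as "1", and strongly correlated pairs (such as mpg and wt, r = -0.87) get the "*" symbol.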

The following R code performs a **correlation analysis** and displays a **graph of the correlation matrix** :

```
## Correlation matrix
corMat <- cor(mtcars)
head(round(corMat, 2))
```

```
       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.66  0.60  0.48 -0.55
cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.81 -0.52 -0.49  0.53
disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.71 -0.59 -0.56  0.39
hp   -0.78  0.83  0.79  1.00 -0.45  0.66 -0.71 -0.72 -0.24 -0.13  0.75
drat  0.68 -0.70 -0.71 -0.45  1.00 -0.71  0.09  0.44  0.71  0.70 -0.09
wt   -0.87  0.78  0.89  0.66 -0.71  1.00 -0.17 -0.55 -0.69 -0.58  0.43
```

```
## Correlation graph for visualization
## abbr.colnames = FALSE to avoid abbreviation of column names
symnum(corMat, abbr.colnames = FALSE)
```

```
     mpg cyl disp hp drat wt qsec vs am gear carb
mpg  1
cyl  +   1
disp +   *   1
hp   ,   +   ,    1
drat ,   ,   ,    .  1
wt   +   ,   +    ,  ,    1
qsec .   .   .    ,          1
vs   ,   +   ,    ,  .    .  ,    1
am   .   .   .       ,    ,          1
gear .   .   .       ,    .          ,  1
carb .   .   .    ,       .  ,    .          1
attr(,"legend")
[1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
```

The **symnum()** function is one of the easiest ways to visualize a correlation matrix and to quickly spot the most highly correlated variables.

`This analysis was performed using R (ver. 3.1.0).`


Correlation matrix analysis is an important method to find **dependence** between variables. Computing **correlation matrix** and drawing **correlogram** is explained here. The aim of this article is to show you how to get the **lower and the upper triangular part of a correlation matrix**. We will also use the **xtable R package** to display a nice **correlation table** in html or latex formats.

Note that online software is also available here to compute **correlation matrix** and to plot a **correlogram** without any installation.


The following R code computes a **correlation matrix** using the mtcars data.

```
mcor<-round(cor(mtcars),2)
mcor
```

```
mpg cyl disp hp drat wt qsec vs am gear carb
mpg 1.00 -0.85 -0.85 -0.78 0.68 -0.87 0.42 0.66 0.60 0.48 -0.55
cyl -0.85 1.00 0.90 0.83 -0.70 0.78 -0.59 -0.81 -0.52 -0.49 0.53
disp -0.85 0.90 1.00 0.79 -0.71 0.89 -0.43 -0.71 -0.59 -0.56 0.39
hp -0.78 0.83 0.79 1.00 -0.45 0.66 -0.71 -0.72 -0.24 -0.13 0.75
drat 0.68 -0.70 -0.71 -0.45 1.00 -0.71 0.09 0.44 0.71 0.70 -0.09
wt -0.87 0.78 0.89 0.66 -0.71 1.00 -0.17 -0.55 -0.69 -0.58 0.43
qsec 0.42 -0.59 -0.43 -0.71 0.09 -0.17 1.00 0.74 -0.23 -0.21 -0.66
vs 0.66 -0.81 -0.71 -0.72 0.44 -0.55 0.74 1.00 0.17 0.21 -0.57
am 0.60 -0.52 -0.59 -0.24 0.71 -0.69 -0.23 0.17 1.00 0.79 0.06
gear 0.48 -0.49 -0.56 -0.13 0.70 -0.58 -0.21 0.21 0.79 1.00 0.27
carb -0.55 0.53 0.39 0.75 -0.09 0.43 -0.66 -0.57 0.06 0.27 1.00
```

The result is a table of **correlation coefficients** between all possible pairs of variables.
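As a quick sanity check (a small sketch, not part of the original code): a correlation matrix is always symmetric, since cor(x, y) equals cor(y, x), and its diagonal entries are all 1:

```r
m <- cor(mtcars)
isSymmetric(m)  # TRUE: cor(x, y) == cor(y, x)
range(diag(m))  # the diagonal entries are all 1
```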

To get the lower or the upper part of a **correlation matrix**, the R function **lower.tri()** or **upper.tri()** can be used. The formats of the functions are :

```
lower.tri(x, diag = FALSE)
upper.tri(x, diag = FALSE)
```

- **x** : the **correlation matrix**

- **diag** : logical. If TRUE, the diagonal is included in the result.

The two functions above return a matrix of logicals of the same size as the **correlation matrix**. The entries are TRUE in the lower or upper triangle, respectively :

`upper.tri(mcor)`

```
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[2,] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[3,] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[4,] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[5,] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
[6,] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
[7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
[8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
[9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
[10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
[11,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
```
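The effect of the `diag` argument can be seen on a small toy matrix (a minimal sketch; with `diag = TRUE` the diagonal entries are TRUE as well):

```r
m <- matrix(1:9, nrow = 3)
upper.tri(m)               # diagonal is FALSE by default
upper.tri(m, diag = TRUE)  # diagonal entries are also TRUE
```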

```
# Hide upper triangle
upper<-mcor
upper[upper.tri(mcor)]<-""
upper<-as.data.frame(upper)
upper
```

```
mpg cyl disp hp drat wt qsec vs am gear carb
mpg 1
cyl -0.85 1
disp -0.85 0.9 1
hp -0.78 0.83 0.79 1
drat 0.68 -0.7 -0.71 -0.45 1
wt -0.87 0.78 0.89 0.66 -0.71 1
qsec 0.42 -0.59 -0.43 -0.71 0.09 -0.17 1
vs 0.66 -0.81 -0.71 -0.72 0.44 -0.55 0.74 1
am 0.6 -0.52 -0.59 -0.24 0.71 -0.69 -0.23 0.17 1
gear 0.48 -0.49 -0.56 -0.13 0.7 -0.58 -0.21 0.21 0.79 1
carb -0.55 0.53 0.39 0.75 -0.09 0.43 -0.66 -0.57 0.06 0.27 1
```

```
#Hide lower triangle
lower<-mcor
lower[lower.tri(mcor, diag=TRUE)]<-""
lower<-as.data.frame(lower)
lower
```

```
mpg cyl disp hp drat wt qsec vs am gear carb
mpg -0.85 -0.85 -0.78 0.68 -0.87 0.42 0.66 0.6 0.48 -0.55
cyl 0.9 0.83 -0.7 0.78 -0.59 -0.81 -0.52 -0.49 0.53
disp 0.79 -0.71 0.89 -0.43 -0.71 -0.59 -0.56 0.39
hp -0.45 0.66 -0.71 -0.72 -0.24 -0.13 0.75
drat -0.71 0.09 0.44 0.71 0.7 -0.09
wt -0.17 -0.55 -0.69 -0.58 0.43
qsec 0.74 -0.23 -0.21 -0.66
vs 0.17 0.21 -0.57
am 0.79 0.06
gear 0.27
carb
```

```
library(xtable)
print(xtable(upper), type="html")
```

|      | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|------|-----|-----|------|----|------|----|------|----|----|------|------|
| mpg  | 1 | | | | | | | | | | |
| cyl  | -0.85 | 1 | | | | | | | | | |
| disp | -0.85 | 0.9 | 1 | | | | | | | | |
| hp   | -0.78 | 0.83 | 0.79 | 1 | | | | | | | |
| drat | 0.68 | -0.7 | -0.71 | -0.45 | 1 | | | | | | |
| wt   | -0.87 | 0.78 | 0.89 | 0.66 | -0.71 | 1 | | | | | |
| qsec | 0.42 | -0.59 | -0.43 | -0.71 | 0.09 | -0.17 | 1 | | | | |
| vs   | 0.66 | -0.81 | -0.71 | -0.72 | 0.44 | -0.55 | 0.74 | 1 | | | |
| am   | 0.6 | -0.52 | -0.59 | -0.24 | 0.71 | -0.69 | -0.23 | 0.17 | 1 | | |
| gear | 0.48 | -0.49 | -0.56 | -0.13 | 0.7 | -0.58 | -0.21 | 0.21 | 0.79 | 1 | |
| carb | -0.55 | 0.53 | 0.39 | 0.75 | -0.09 | 0.43 | -0.66 | -0.57 | 0.06 | 0.27 | 1 |

Custom function **corstars()** is used to combine the **correlation coefficients** and the level of **significance**. The R code of the function is provided at the end of this article. It requires 2 packages :

- The **Hmisc R package** to compute the **matrix of correlation coefficients** and the corresponding **p-values**.
- The **xtable R package** for displaying the result in HTML or Latex format.

Before continuing with the following exercises, you should first copy and paste the source code of the function **corstars**(), which you can find at the bottom of this article.

`corstars(mtcars[,1:7], result="html")`

|      | mpg | cyl | disp | hp | drat | wt |
|------|-----|-----|------|----|------|----|
| mpg  | | | | | | |
| cyl  | -0.85**** | | | | | |
| disp | -0.85**** | 0.90**** | | | | |
| hp   | -0.78**** | 0.83**** | 0.79**** | | | |
| drat | 0.68**** | -0.70**** | -0.71**** | -0.45** | | |
| wt   | -0.87**** | 0.78**** | 0.89**** | 0.66**** | -0.71**** | |
| qsec | 0.42* | -0.59*** | -0.43* | -0.71**** | 0.09 | -0.17 |
p < .0001 ‘****’; p < .001 ‘***’, p < .01 ‘**’, p < .05 ‘*’

**The code of corstars function** (The code is adapted from the one posted on this forum and on this blog ):

```
# x : a matrix containing the data
# method : correlation method. "pearson" or "spearman" is supported
# removeTriangle : remove the upper or lower triangle
# result : if "html" or "latex",
#   the results will be displayed in html or latex format
corstars <- function(x, method=c("pearson", "spearman"),
                     removeTriangle=c("upper", "lower"),
                     result=c("none", "html", "latex")){
  # Compute the correlation matrix
  require(Hmisc)
  x <- as.matrix(x)
  correlation_matrix <- rcorr(x, type=method[1])
  R <- correlation_matrix$r # Matrix of correlation coefficients
  p <- correlation_matrix$P # Matrix of p-values
  ## Define notations for significance levels; spacing is important.
  mystars <- ifelse(p < .0001, "****",
             ifelse(p < .001,  "*** ",
             ifelse(p < .01,   "** ",
             ifelse(p < .05,   "* ", " "))))
  ## Truncate the correlation matrix to two decimals
  R <- format(round(cbind(rep(-1.11, ncol(x)), R), 2))[,-1]
  ## Build a new matrix that combines the correlations with their appropriate stars
  Rnew <- matrix(paste(R, mystars, sep=""), ncol=ncol(x))
  diag(Rnew) <- paste(diag(R), " ", sep="")
  rownames(Rnew) <- colnames(x)
  colnames(Rnew) <- paste(colnames(x), "", sep="")
  ## Remove the upper triangle of the correlation matrix
  if(removeTriangle[1]=="upper"){
    Rnew <- as.matrix(Rnew)
    Rnew[upper.tri(Rnew, diag = TRUE)] <- ""
    Rnew <- as.data.frame(Rnew)
  }
  ## Remove the lower triangle of the correlation matrix
  else if(removeTriangle[1]=="lower"){
    Rnew <- as.matrix(Rnew)
    Rnew[lower.tri(Rnew, diag = TRUE)] <- ""
    Rnew <- as.data.frame(Rnew)
  }
  ## Remove the last column and return the correlation matrix
  Rnew <- cbind(Rnew[1:length(Rnew)-1])
  if (result[1]=="none") return(Rnew)
  else{
    if(result[1]=="html") print(xtable(Rnew), type="html")
    else print(xtable(Rnew), type="latex")
  }
}
```

- Use the **cor()** function to compute a **correlation matrix**.
- Use the **lower.tri()** and **upper.tri()** functions to get the lower or upper part of the **correlation matrix**.
- Use the **xtable** R function to display a nice correlation matrix in latex or html format.

`This analysis was performed using R (ver. 3.3.2).`

**Correlation matrix** analysis is very useful to study **dependences** or **associations** between variables. This article provides a custom **R function**, **rquery.cormat**(), for **calculating** and **visualizing** easily a **correlation matrix**. The result is a list containing the **correlation coefficient tables** and the **p-values** of the **correlations**. In the result, the variables are reordered according to the level of the **correlation**, which can help to quickly identify the most associated variables. A graph is also generated to visualize the **correlation matrix** using a **correlogram** or a **heatmap**.

The **rquery.cormat** function requires the installation of the **corrplot** package. Before proceeding, install it using the following R code :

`install.packages("corrplot")`

To use the **rquery.cormat** function, you can source it as follows :

`source("http://www.sthda.com/upload/rquery_cormat.r")`

The R code of **rquery.cormat** function is provided at the end of this document.

The `mtcars` data is used in the following examples :

```
mydata <- mtcars[, c(1,3,4,5,6,7)]
head(mydata)
```

```
mpg disp hp drat wt qsec
Mazda RX4 21.0 160 110 3.90 2.620 16.46
Mazda RX4 Wag 21.0 160 110 3.90 2.875 17.02
Datsun 710 22.8 108 93 3.85 2.320 18.61
Hornet 4 Drive 21.4 258 110 3.08 3.215 19.44
Hornet Sportabout 18.7 360 175 3.15 3.440 17.02
Valiant 18.1 225 105 2.76 3.460 20.22
```

`rquery.cormat(mydata)`

```
$r
hp disp wt qsec mpg drat
hp 1
disp 0.79 1
wt 0.66 0.89 1
qsec -0.71 -0.43 -0.17 1
mpg -0.78 -0.85 -0.87 0.42 1
drat -0.45 -0.71 -0.71 0.091 0.68 1
$p
hp disp wt qsec mpg drat
hp 0
disp 7.1e-08 0
wt 4.1e-05 1.2e-11 0
qsec 5.8e-06 0.013 0.34 0
mpg 1.8e-07 9.4e-10 1.3e-10 0.017 0
drat 0.01 5.3e-06 4.8e-06 0.62 1.8e-05 0
$sym
hp disp wt qsec mpg drat
hp 1
disp , 1
wt , + 1
qsec , . 1
mpg , + + . 1
drat . , , , 1
attr(,"legend")
[1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
```

The result of **rquery.cormat** function is a list containing the following components :

- **r** : the table of correlation coefficients
- **p** : the table of p-values corresponding to the significance levels of the correlations
- **sym** : a representation of the correlation matrix in which coefficients are replaced by symbols according to the strength of the dependence. For more description, see this article: Visualize correlation matrix using symnum function

In the generated graph, negative correlations are in blue and positive ones in red.

Note that in the result above, only the lower triangle of the correlation matrix is shown by default. You can use the following R script to get the upper triangle or the full correlation matrix.

`rquery.cormat(mydata, type="upper")`

```
$r
hp disp wt qsec mpg drat
hp 1 0.79 0.66 -0.71 -0.78 -0.45
disp 1 0.89 -0.43 -0.85 -0.71
wt 1 -0.17 -0.87 -0.71
qsec 1 0.42 0.091
mpg 1 0.68
drat 1
$p
hp disp wt qsec mpg drat
hp 0 7.1e-08 4.1e-05 5.8e-06 1.8e-07 0.01
disp 0 1.2e-11 0.013 9.4e-10 5.3e-06
wt 0 0.34 1.3e-10 4.8e-06
qsec 0 0.017 0.62
mpg 0 1.8e-05
drat 0
$sym
hp disp wt qsec mpg drat
hp 1 , , , , .
disp 1 + . + ,
wt 1 + ,
qsec 1 .
mpg 1 ,
drat 1
attr(,"legend")
[1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
```

`rquery.cormat(mydata, type="full")`

```
$r
hp disp wt qsec mpg drat
hp 1.00 0.79 0.66 -0.710 -0.78 -0.450
disp 0.79 1.00 0.89 -0.430 -0.85 -0.710
wt 0.66 0.89 1.00 -0.170 -0.87 -0.710
qsec -0.71 -0.43 -0.17 1.000 0.42 0.091
mpg -0.78 -0.85 -0.87 0.420 1.00 0.680
drat -0.45 -0.71 -0.71 0.091 0.68 1.000
$p
hp disp wt qsec mpg drat
hp 0.0e+00 7.1e-08 4.1e-05 5.8e-06 1.8e-07 1.0e-02
disp 7.1e-08 0.0e+00 1.2e-11 1.3e-02 9.4e-10 5.3e-06
wt 4.1e-05 1.2e-11 0.0e+00 3.4e-01 1.3e-10 4.8e-06
qsec 5.8e-06 1.3e-02 3.4e-01 0.0e+00 1.7e-02 6.2e-01
mpg 1.8e-07 9.4e-10 1.3e-10 1.7e-02 0.0e+00 1.8e-05
drat 1.0e-02 5.3e-06 4.8e-06 6.2e-01 1.8e-05 0.0e+00
$sym
hp disp wt qsec mpg drat
hp 1
disp , 1
wt , + 1
qsec , . 1
mpg , + + . 1
drat . , , , 1
attr(,"legend")
[1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
```

```
col<- colorRampPalette(c("blue", "white", "red"))(20)
cormat<-rquery.cormat(mydata, type="full", col=col)
```

`cormat<-rquery.cormat(mydata, graphType="heatmap")`

To calculate the **correlation matrix** without plotting the **graph**, you can use the following **R** script :

`rquery.cormat(mydata, graph=FALSE)`

The **R code** below can be used to format the **correlation matrix** into a table of four columns containing :

- The names of rows/columns
- The correlation coefficients
- The p-values

To this end, use the argument **type = "flatten"** :

`rquery.cormat(mydata, type="flatten", graph=FALSE)`

```
$r
row column cor p
1 hp disp 0.790 7.1e-08
2 hp wt 0.660 4.1e-05
3 disp wt 0.890 1.2e-11
4 hp qsec -0.710 5.8e-06
5 disp qsec -0.430 1.3e-02
6 wt qsec -0.170 3.4e-01
7 hp mpg -0.780 1.8e-07
8 disp mpg -0.850 9.4e-10
9 wt mpg -0.870 1.3e-10
10 qsec mpg 0.420 1.7e-02
11 hp drat -0.450 1.0e-02
12 disp drat -0.710 5.3e-06
13 wt drat -0.710 4.8e-06
14 qsec drat 0.091 6.2e-01
15 mpg drat 0.680 1.8e-05
$p
NULL
$sym
NULL
```

**A simplified format of the function is** :

```
rquery.cormat(x, type=c('lower', 'upper', 'full', 'flatten'),
graph=TRUE, graphType=c("correlogram", "heatmap"),
col=NULL, ...)
```

**Description of the arguments**:

- **x** : a **matrix** of data values
- **type** : possible values are "lower" (default), "upper", "full" or "flatten". Displays the lower or upper triangle of the matrix, the full matrix, or a flattened matrix.
- **graph** : if TRUE, a correlogram or heatmap is generated to visualize the correlation matrix.
- **graphType** : type of graph. Possible values are "correlogram" or "heatmap".
- **col** : colors to use for the correlogram or the heatmap.
- **…** : further arguments to be passed to the **cor()** or **cor.test()** functions.

**R code of the rquery.cormat function**:

```
#+++++++++++++++++++++++++
# Computing of correlation matrix
#+++++++++++++++++++++++++
# Required package : corrplot
# x : matrix
# type : possible values are "lower" (default), "upper", "full" or "flatten";
#   display the lower or upper triangular of the matrix, full or flatten matrix
# graph : if TRUE, a correlogram or heatmap is plotted
# graphType : possible values are "correlogram" or "heatmap"
# col : colors to use for the correlogram
# ... : further arguments to be passed to the cor or cor.test function
# Result is a list including the following components :
#   r : correlation matrix, p : p-values
#   sym : symbolic number coding of the correlation matrix
rquery.cormat <- function(x,
                          type=c('lower', 'upper', 'full', 'flatten'),
                          graph=TRUE,
                          graphType=c("correlogram", "heatmap"),
                          col=NULL, ...)
{
  library(corrplot)
  # Helper functions
  #+++++++++++++++++
  # Compute the matrix of correlation p-values
  cor.pmat <- function(x, ...) {
    mat <- as.matrix(x)
    n <- ncol(mat)
    p.mat <- matrix(NA, n, n)
    diag(p.mat) <- 0
    for (i in 1:(n - 1)) {
      for (j in (i + 1):n) {
        tmp <- cor.test(mat[, i], mat[, j], ...)
        p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
      }
    }
    colnames(p.mat) <- rownames(p.mat) <- colnames(mat)
    p.mat
  }
  # Get lower triangle of the matrix
  getLower.tri <- function(mat){
    upper <- mat
    upper[upper.tri(mat)] <- ""
    mat <- as.data.frame(upper)
    mat
  }
  # Get upper triangle of the matrix
  getUpper.tri <- function(mat){
    lt <- mat
    lt[lower.tri(mat)] <- ""
    mat <- as.data.frame(lt)
    mat
  }
  # Get flatten matrix
  flattenCorrMatrix <- function(cormat, pmat) {
    ut <- upper.tri(cormat)
    data.frame(
      row = rownames(cormat)[row(cormat)[ut]],
      column = rownames(cormat)[col(cormat)[ut]],
      cor = (cormat)[ut],
      p = pmat[ut]
    )
  }
  # Define colors
  if (is.null(col)) {
    col <- colorRampPalette(
      c("#67001F", "#B2182B", "#D6604D", "#F4A582",
        "#FDDBC7", "#FFFFFF", "#D1E5F0", "#92C5DE",
        "#4393C3", "#2166AC", "#053061"))(200)
    col <- rev(col)
  }
  # Correlation matrix
  cormat <- signif(cor(x, use = "complete.obs", ...), 2)
  pmat <- signif(cor.pmat(x, ...), 2)
  # Reorder the correlation matrix
  ord <- corrMatOrder(cormat, order="hclust")
  cormat <- cormat[ord, ord]
  pmat <- pmat[ord, ord]
  # Replace correlation coefficients by symbols
  sym <- symnum(cormat, abbr.colnames=FALSE)
  # Correlogram
  if(graph & graphType[1]=="correlogram"){
    corrplot(cormat, type=ifelse(type[1]=="flatten", "lower", type[1]),
             tl.col="black", tl.srt=45, col=col, ...)
  }
  else if(graphType[1]=="heatmap")
    heatmap(cormat, col=col, symm=TRUE)
  # Get lower/upper triangle
  if(type[1]=="lower"){
    cormat <- getLower.tri(cormat)
    pmat <- getLower.tri(pmat)
  }
  else if(type[1]=="upper"){
    cormat <- getUpper.tri(cormat)
    pmat <- getUpper.tri(pmat)
    sym <- t(sym)
  }
  else if(type[1]=="flatten"){
    cormat <- flattenCorrMatrix(cormat, pmat)
    pmat <- NULL
    sym <- NULL
  }
  list(r=cormat, p=pmat, sym=sym)
}

This analysis has been performed using R (ver. 3.2.4).

For instance, if we are interested to know whether there is a relationship between the heights of fathers and sons, a **correlation coefficient** can be calculated to answer this question.

If there is no relationship between the two variables (father and son heights), the average height of son should be the same regardless of the height of the fathers and vice versa.

Here, we’ll describe the different correlation methods and provide practical examples using **R** software.

We’ll use the **ggpubr** R package for easy ggplot2-based data visualization.

- Install the latest version from GitHub as follow (recommended):

```
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
```

- Or, install from CRAN as follow:

`install.packages("ggpubr")`

- Load ggpubr as follow:

`library("ggpubr")`

There are different methods to perform **correlation analysis**:

- **Pearson correlation (r)**, which measures the linear dependence between two variables (x and y). It is also known as a **parametric correlation** test because it depends on the distribution of the data. It can be used only when x and y come from a normal distribution. The plot of y = f(x) is named the **linear regression** curve.
- **Kendall tau** and **Spearman rho**, which are rank-based correlation coefficients (non-parametric).

The most commonly used method is the **Pearson correlation** method.

In the formula below,

- **x** and **y** are two vectors of length **n**
- \(m_x\) and \(m_y\) correspond to the means of x and y, respectively.

\[ r = \frac{\sum{(x-m_x)(y-m_y)}}{\sqrt{\sum{(x-m_x)^2}\sum{(y-m_y)^2}}} \]

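The formula can be checked directly in R (a small sketch using the mpg and wt variables of mtcars as example data):

```r
x <- mtcars$mpg
y <- mtcars$wt
m_x <- mean(x)
m_y <- mean(y)
# Pearson correlation computed from the formula above
r_manual <- sum((x - m_x) * (y - m_y)) /
  sqrt(sum((x - m_x)^2) * sum((y - m_y)^2))
r_manual  # same value as cor(x, y), about -0.87
```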

The p-value (significance level) of the correlation can be determined :

1. by using the correlation coefficient table for the degrees of freedom \(df = n-2\), where \(n\) is the number of observations in the x and y variables;

2. or by calculating the **t value** as follows:

\[ t = \frac{r}{\sqrt{1-r^2}}\sqrt{n-2} \]

In case 2), the corresponding p-value is determined using the **t distribution table** for \(df = n-2\).

If the p-value is < 5%, then the correlation between x and y is significant.
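Both the t value and its p-value can be computed by hand and compared with cor.test() (a sketch using mpg and wt from mtcars as example data):

```r
x <- mtcars$mpg
y <- mtcars$wt
n <- length(x)
r <- cor(x, y)
# t statistic from the formula above, with df = n - 2
t_stat <- r / sqrt(1 - r^2) * sqrt(n - 2)
# two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2)
round(t_stat, 3)  # -9.559, same as the statistic reported by cor.test(x, y)
p_value           # about 1.29e-10
```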

The **Spearman correlation** method computes the correlation between the rank of x and the rank of y variables.

\[ rho = \frac{\sum{(x' - m_{x'})(y' - m_{y'})}}{\sqrt{\sum{(x' - m_{x'})^2}\sum{(y' - m_{y'})^2}}} \]

Where \(x' = rank(x)\) and \(y' = rank(y)\), and \(m_{x'}\) and \(m_{y'}\) are the means of the ranks.
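In other words, Spearman's rho is simply the Pearson correlation computed on the ranks, which can be verified in R (a minimal sketch using mtcars):

```r
x <- mtcars$mpg
y <- mtcars$wt
# Pearson correlation of the ranks equals Spearman's rho
rho_manual <- cor(rank(x), rank(y))
rho_manual  # same value as cor(x, y, method = "spearman")
```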

The **Kendall correlation** method measures the correspondence between the ranking of x and y variables. The total number of possible pairings of x with y observations is \(n(n-1)/2\), where n is the size of x and y.

The procedure is as follows:

1. Begin by ordering the pairs by the x values. If x and y are correlated, they will have the same relative rank orders.

2. Now, for each \(y_i\), count the number of \(y_j > y_i\) (**concordant pairs (c)**) and the number of \(y_j < y_i\) (**discordant pairs (d)**).

**Kendall correlation distance** is defined as follow:

\[ tau = \frac{n_c - n_d}{\frac{1}{2}n(n-1)} \]

Where,

- \(n_c\): total number of concordant pairs
- \(n_d\): total number of discordant pairs
- \(n\): size of x and y
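This pair-counting definition can be translated directly into R. The helper below (a naive O(n²) sketch with a hypothetical name, valid for tie-free data) reproduces cor(..., method = "kendall"):

```r
# Naive Kendall tau: concordant minus discordant pairs,
# divided by the total number of pairs (assumes no ties in x or y)
kendall_tau <- function(x, y) {
  n <- length(x)
  s <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      # +1 for a concordant pair, -1 for a discordant pair
      s <- s + sign(x[j] - x[i]) * sign(y[j] - y[i])
    }
  }
  s / (n * (n - 1) / 2)
}
x <- c(1, 3, 2, 5, 4)
y <- c(2, 1, 4, 3, 5)
kendall_tau(x, y)  # 0.2, same as cor(x, y, method = "kendall")
```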

Correlation coefficient can be computed using the functions **cor()** or **cor.test()**:

- **cor()** computes the **correlation coefficient**.
- **cor.test()** tests for association/correlation between paired samples. It returns both the **correlation coefficient** and the **significance level** (or p-value) of the correlation.

The simplified formats are:

```
cor(x, y, method = c("pearson", "kendall", "spearman"))
cor.test(x, y, method=c("pearson", "kendall", "spearman"))
```

- **x, y** : numeric vectors with the same length
- **method** : correlation method

If your data contain missing values, use the following R code to handle missing values by case-wise deletion.

`cor(x, y, method = "pearson", use = "complete.obs")`
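A minimal sketch of this behavior with hypothetical toy vectors:

```r
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 6, NA, 10)
cor(x, y)                        # NA: missing values propagate by default
cor(x, y, use = "complete.obs")  # only the 3 complete pairs are used
```

Here the three complete pairs lie exactly on a line, so the correlation computed with case-wise deletion is 1.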

- **Prepare your data** as specified here: Best practices for preparing your data set for R
- **Save your data** in an external .txt (tab-delimited) or .csv file
- **Import your data into R** as follows:

```
# If .txt tab file, use this
my_data <- read.delim(file.choose())
# Or, if .csv file, use this
my_data <- read.csv(file.choose())
```

Here, we’ll use the built-in R data set *mtcars* as an example.

The R code below computes the correlation between mpg and wt variables in mtcars data set:

```
my_data <- mtcars
head(my_data, 6)
```

```
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
```

We want to compute the correlation between mpg and wt variables.

To use R base graphs, click this link: scatter plot - R base graphs. Here, we’ll use the **ggpubr** R package.

```
library("ggpubr")
ggscatter(my_data, x = "mpg", y = "wt",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
xlab = "Miles/(US) gallon", ylab = "Weight (1000 lbs)")
```

- **Is the covariation linear?** Yes, from the plot above, the relationship is linear. In the situation where the scatter plots show curved patterns, we are dealing with a nonlinear association between the two variables.
- **Do the data from each of the 2 variables (x, y) follow a normal distribution?**
  - Use the Shapiro-Wilk normality test (R function: **shapiro.test**()) and look at the normality plot (R function: **ggpubr::ggqqplot**()).
  - The **Shapiro-Wilk test** can be performed as follows:
    - Null hypothesis: the data are normally distributed
    - Alternative hypothesis: the data are not normally distributed

```
# Shapiro-Wilk normality test for mpg
shapiro.test(my_data$mpg) # => p = 0.1229
# Shapiro-Wilk normality test for wt
shapiro.test(my_data$wt) # => p = 0.09
```

From the output, the two p-values are greater than the significance level 0.05, implying that the distribution of the data is not significantly different from the normal distribution. In other words, we can assume normality.

**Visual inspection**of the data normality using**Q-Q plots**(quantile-quantile plots). Q-Q plot draws the correlation between a given sample and the normal distribution.

```
library("ggpubr")
# mpg
ggqqplot(my_data$mpg, ylab = "MPG")
# wt
ggqqplot(my_data$wt, ylab = "WT")
```

From the normality plots, we conclude that both populations may come from normal distributions.

Note that, if the data are not normally distributed, it’s recommended to use the non-parametric correlation, including Spearman and Kendall rank-based correlation tests.

Correlation test between mpg and wt variables:

```
res <- cor.test(my_data$wt, my_data$mpg,
method = "pearson")
res
```

```
Pearson's product-moment correlation
data: my_data$wt and my_data$mpg
t = -9.559, df = 30, p-value = 1.294e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.9338264 -0.7440872
sample estimates:
cor
-0.8676594
```

In the result above :

- **t** is the **t-test statistic** value (t = -9.559),
- **df** is the degrees of freedom (df = 30),
- **p-value** is the significance level of the **t-test** (p-value = 1.294 × 10^{-10}),
- **conf.int** is the **confidence interval** of the correlation coefficient at 95% (conf.int = [-0.9338, -0.7441]),
- **sample estimates** is the correlation coefficient (Cor.coeff = -0.87).

The **p-value** of the test is 1.294 × 10^{-10}, which is less than the significance level alpha = 0.05. We can conclude that wt and mpg are significantly correlated, with a correlation coefficient of -0.87.

The function **cor.test**() returns a list containing the following components:

- **p.value** : the p-value of the test
- **estimate** : the correlation coefficient

```
# Extract the p.value
res$p.value
```

`[1] 1.293959e-10`

```
# Extract the correlation coefficient
res$estimate
```

```
cor
-0.8676594
```

The **Kendall rank correlation coefficient** or **Kendall’s tau** statistic is used to estimate a rank-based measure of association. This test may be used if the data do not necessarily come from a bivariate normal distribution.

```
res2 <- cor.test(my_data$wt, my_data$mpg, method="kendall")
res2
```

```
Kendall's rank correlation tau
data: my_data$wt and my_data$mpg
z = -5.7981, p-value = 6.706e-09
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
-0.7278321
```

**tau** is the **Kendall correlation coefficient**.

The **correlation coefficient** between x and y is -0.7278 and the *p-value* is 6.706 × 10^{-9}.

Spearman’s **rho** statistic is also used to estimate a rank-based measure of association. This test may be used if the data do not come from a bivariate normal distribution.

```
res2 <-cor.test(my_data$wt, my_data$mpg, method = "spearman")
res2
```

```
Spearman's rank correlation rho
data: my_data$wt and my_data$mpg
S = 10292, p-value = 1.488e-11
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.886422
```

**rho** is the **Spearman’s correlation coefficient**.

The **correlation coefficient** between x and y is -0.8864 and the *p-value* is 1.488 × 10^{-11}.

The correlation coefficient ranges between **-1** and **1**:

- **-1** indicates a strong **negative correlation** : every time **x increases**, **y decreases** (left panel figure)
- **0** means that there is no **association** between the two variables (x and y) (middle panel figure)
- **1** indicates a strong **positive correlation** : **y increases** with **x** (right panel figure)
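These extremes are easy to reproduce (a small sketch with a perfectly linear toy variable):

```r
x <- 1:10
cor(x, 2 * x + 3)  # y increases perfectly linearly with x: coefficient +1
cor(x, -x + 20)    # y decreases perfectly linearly with x: coefficient -1
```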

You can also compute a correlation test between two variables online, without any installation.

- Use the function **cor.test**(x, y) to analyze the correlation coefficient between two variables and to get the significance level of the correlation.
- Three correlation methods are possible with the function **cor.test**(x, y): pearson, kendall and spearman.

This analysis has been performed using **R software** (ver. 3.2.4).

Previously, we described the essentials of R programming and provided quick start guides for importing data into **R**. Additionally, we described how to compute descriptive or summary statistics using R software.

This chapter contains articles for computing and visualizing **correlation analyses** in R. Recall that, **correlation analysis** is used to investigate the association between two or more variables. A simple example, is to evaluate whether there is a link between maternal age and child’s weight at birth.

Brief outline:

- What is correlation test?
- Methods for correlation analyses
- Correlation formula
- Pearson correlation formula
- Spearman correlation formula
- Kendall correlation formula

- Compute correlation in R
- R functions
- Import your data into R
- Visualize your data using scatter plots
- Preliminary test to check the test assumptions
- Pearson correlation test
- Kendall rank correlation test
- Spearman rank correlation coefficient

- Interpret correlation coefficient

Read more: —> Correlation Test Between Two Variables in R.

**Correlation matrix** is used to analyze the correlation between multiple variables at the same time.

Brief outline:

- What is correlation matrix?
- Compute correlation matrix in R
- R functions
- Compute correlation matrix
- Correlation matrix with significance levels (p-value)
- A simple function to format the correlation matrix
- Visualize correlation matrix
- Use symnum() function: Symbolic number coding
- Use corrplot() function: Draw a correlogram
- Use chart.Correlation(): Draw scatter plots
- Use heatmap()

Read more: —> Correlation Matrix: Analyze, Format and Visualize.

A **correlogram** is a **graph of a correlation matrix**. It is useful to highlight the most correlated variables in a data table. In this plot, **correlation coefficients** are colored according to the value. The **correlation matrix** can also be reordered according to the degree of association between variables.

Brief outline:

- Install R corrplot package
- Data for correlation analysis
- Computing correlation matrix
- Correlogram : Visualizing the correlation matrix
- Visualization methods
- Types of correlogram layout
- Reordering the correlation matrix
- Changing the color of the correlogram
- Changing the color and the rotation of text labels
- Combining correlogram with the significance test
- Customize the correlogram

```
library(corrplot)
library(RColorBrewer)
M <-cor(mtcars)
corrplot(M, type="upper", order="hclust",
col=brewer.pal(n=8, name="RdYlBu"))
```

Read more: —> Visualize Correlation Matrix using Correlogram.

The aim of this article is to show you how to get the **lower and the upper triangular part of a correlation matrix**. We will use also **xtable R package** to display a nice **correlation table**.

Brief outline:

- Correlation matrix analysis
- Lower and upper triangular part of a correlation matrix
- Use xtable R package to display nice correlation table in html format
- Combine matrix of correlation coefficients and significance levels

Read more: —> Elegant correlation table using xtable R package.

The goal of this article is to provide you a custom **R function**, named **rquery.cormat**(), for **calculating** and **visualizing** easily a **correlation matrix** in a single line R code.

Brief outline:

- Computing the correlation matrix using rquery.cormat()
- Upper triangle of the correlation matrix
- Full correlation matrix
- Change the colors of the correlogram
- Draw a heatmap

- Format the correlation table
- Description of rquery.cormat() function

```
source("http://www.sthda.com/upload/rquery_cormat.r")
mydata <- mtcars[, c(1,3,4,5,6,7)]
require("corrplot")
rquery.cormat(mydata)
```

```
$r
        hp  disp    wt  qsec   mpg drat
hp       1
disp  0.79     1
wt    0.66  0.89     1
qsec -0.71 -0.43 -0.17     1
mpg  -0.78 -0.85 -0.87  0.42     1
drat -0.45 -0.71 -0.71 0.091  0.68    1

$p
          hp    disp      wt  qsec     mpg drat
hp         0
disp 7.1e-08       0
wt   4.1e-05 1.2e-11       0
qsec 5.8e-06   0.013    0.34     0
mpg  1.8e-07 9.4e-10 1.3e-10 0.017       0
drat    0.01 5.3e-06 4.8e-06  0.62 1.8e-05    0

$sym
     hp disp wt qsec mpg drat
hp   1
disp ,  1
wt   ,  +    1
qsec ,  .       1
mpg  ,  +    +  .    1
drat .  ,    ,       ,   1
attr(,"legend")
[1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
```

Read more: —> Correlation Matrix : An R Function to Do All You Need.

This analysis has been performed using **R statistical software** (ver. 3.2.4).

- What is correlation matrix?
- Compute correlation matrix in R
- Online software to analyze and visualize a correlation matrix
- Summary
- Infos

Previously, we described how to perform **correlation test** between two variables. In this article, you’ll learn how to compute a **correlation matrix**, which is used to investigate the dependence between multiple variables at the same time. The result is a table containing the **correlation coefficients** between each variable and the others.

There are different methods for **correlation analysis** : **Pearson parametric correlation test**, **Spearman** and **Kendall** rank-based **correlation analysis**. These methods are discussed in the next sections.

The aim of this **R tutorial** is to show you how to compute and visualize a **correlation matrix in R**. We provide also an online software for computing and visualizing a correlation matrix.

As you may know, the **R** function **cor()** can be used to compute a **correlation matrix**. A simplified format of the function is :

`cor(x, method = c("pearson", "kendall", "spearman"))`

- **x** : numeric matrix or a data frame.
- **method** : indicates the **correlation coefficient** to be computed. The default is the pearson correlation coefficient, which measures the linear **dependence** between two variables. The kendall and spearman correlation methods are non-parametric **rank-based correlation tests**.

If your data contain missing values, use the following R code to handle missing values by case-wise deletion.

`cor(x, method = "pearson", use = "complete.obs")`
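For instance, the effect of case-wise deletion can be checked by injecting a missing value into a copy of the data (a minimal sketch; the position of the NA is arbitrary):

```r
# Copy a few mtcars columns and introduce a missing value
dat <- mtcars[, c("mpg", "wt", "hp")]
dat$wt[1] <- NA

cor(dat)                        # pairs involving wt are reported as NA
cor(dat, use = "complete.obs")  # incomplete rows are dropped before computing
```

Another option is `use = "pairwise.complete.obs"`, which drops incomplete cases pair by pair instead of globally.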

- **Prepare your data** as specified here: Best practices for preparing your data set for R
- **Save your data** in an external .txt (tab-delimited) or .csv file
- **Import your data into R** as follows:

```
# If .txt tab file, use this
my_data <- read.delim(file.choose())
# Or, if .csv file, use this
my_data <- read.csv(file.choose())
```

Here, we’ll use data derived from the built-in R data set *mtcars* as an example:

```
# Load data
data("mtcars")
my_data <- mtcars[, c(1,3,4,5,6,7)]
# print the first 6 rows
head(my_data, 6)
```

```
                   mpg disp  hp drat    wt  qsec
Mazda RX4         21.0  160 110 3.90 2.620 16.46
Mazda RX4 Wag     21.0  160 110 3.90 2.875 17.02
Datsun 710        22.8  108  93 3.85 2.320 18.61
Hornet 4 Drive    21.4  258 110 3.08 3.215 19.44
Hornet Sportabout 18.7  360 175 3.15 3.440 17.02
Valiant           18.1  225 105 2.76 3.460 20.22
```

```
res <- cor(my_data)
round(res, 2)
```

```
      mpg  disp    hp  drat    wt  qsec
mpg  1.00 -0.85 -0.78  0.68 -0.87  0.42
disp -0.85  1.00  0.79 -0.71  0.89 -0.43
hp   -0.78  0.79  1.00 -0.45  0.66 -0.71
drat  0.68 -0.71 -0.45  1.00 -0.71  0.09
wt   -0.87  0.89  0.66 -0.71  1.00 -0.17
qsec  0.42 -0.43 -0.71  0.09 -0.17  1.00
```

The table above shows the **correlation coefficients** between all possible pairs of variables.

Note that, if your data contain missing values, use the following R code to handle missing values by case-wise deletion.

`cor(my_data, use = "complete.obs")`

Unfortunately, the function **cor()** returns only the **correlation coefficients** between variables. In the next section, we will use **Hmisc R package** to calculate the **correlation p-values**.

The function **rcorr()** [in **Hmisc** package] can be used to compute the **significance levels** for **pearson** and **spearman correlations**. It returns both the correlation coefficients and the p-value of the correlation for all possible pairs of columns in the data table.

- Simplified format:

`rcorr(x, type = c("pearson","spearman"))`

**x** should be a matrix. The **correlation type** can be either **pearson** or **spearman**.

- Install the **Hmisc** package :

`install.packages("Hmisc")`

- Use the **rcorr**() function :

```
library("Hmisc")
res2 <- rcorr(as.matrix(my_data))
res2
```

```
      mpg  disp    hp  drat    wt  qsec
mpg  1.00 -0.85 -0.78  0.68 -0.87  0.42
disp -0.85  1.00  0.79 -0.71  0.89 -0.43
hp   -0.78  0.79  1.00 -0.45  0.66 -0.71
drat  0.68 -0.71 -0.45  1.00 -0.71  0.09
wt   -0.87  0.89  0.66 -0.71  1.00 -0.17
qsec  0.42 -0.43 -0.71  0.09 -0.17  1.00

n= 32

P
     mpg    disp   hp     drat   wt     qsec
mpg         0.0000 0.0000 0.0000 0.0000 0.0171
disp 0.0000        0.0000 0.0000 0.0000 0.0131
hp   0.0000 0.0000        0.0100 0.0000 0.0000
drat 0.0000 0.0000 0.0100        0.0000 0.6196
wt   0.0000 0.0000 0.0000 0.0000        0.3389
qsec 0.0171 0.0131 0.0000 0.6196 0.3389
```

The output of the function **rcorr()** is a list containing the following elements :
- **r** : the **correlation matrix**
- **n** : the matrix of the number of observations used in analyzing each pair of variables
- **P** : the **p-values** corresponding to the **significance levels** of **correlations**.

If you want to extract the p-values or the correlation coefficients from the output, use this:

```
# Extract the correlation coefficients
res2$r
# Extract p-values
res2$P
```

This section provides a simple function for formatting a **correlation matrix** into a table with 4 columns containing :

- Column 1 : row names (variable 1 for the correlation test)
- Column 2 : column names (variable 2 for the correlation test)
- Column 3 : the correlation coefficients
- Column 4 : the p-values of the correlations

The custom function below can be used :

```
# ++++++++++++++++++++++++++++
# flattenCorrMatrix
# ++++++++++++++++++++++++++++
# cormat : matrix of the correlation coefficients
# pmat : matrix of the correlation p-values
flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)
  data.frame(
    row = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor = (cormat)[ut],
    p = pmat[ut]
  )
}
```

Example of usage :

```
library(Hmisc)
res2<-rcorr(as.matrix(mtcars[,1:7]))
flattenCorrMatrix(res2$r, res2$P)
```

```
row column cor p
1 mpg cyl -0.85216194 6.112697e-10
2 mpg disp -0.84755135 9.380354e-10
3 cyl disp 0.90203285 1.803002e-12
4 mpg hp -0.77616835 1.787838e-07
5 cyl hp 0.83244747 3.477856e-09
6 disp hp 0.79094857 7.142686e-08
7 mpg drat 0.68117189 1.776241e-05
8 cyl drat -0.69993812 8.244635e-06
9 disp drat -0.71021390 5.282028e-06
10 hp drat -0.44875914 9.988768e-03
11 mpg wt -0.86765939 1.293956e-10
12 cyl wt 0.78249580 1.217567e-07
13 disp wt 0.88797992 1.222311e-11
14 hp wt 0.65874785 4.145833e-05
15 drat wt -0.71244061 4.784268e-06
16 mpg qsec 0.41868404 1.708199e-02
17 cyl qsec -0.59124213 3.660527e-04
18 disp qsec -0.43369791 1.314403e-02
19 hp qsec -0.70822340 5.766250e-06
20 drat qsec 0.09120482 6.195823e-01
21 wt qsec -0.17471591 3.388682e-01
```
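Because the flattened result is an ordinary data frame, it can be filtered and sorted like any other. As a sketch (reusing the flattenCorrMatrix() function defined above), here is how to keep only the pairs significant at the 1% level, strongest correlations first:

```r
library(Hmisc)

res2 <- rcorr(as.matrix(mtcars[, 1:7]))
flat <- flattenCorrMatrix(res2$r, res2$P)

# Keep significant pairs (p < 0.01) and sort by absolute correlation
sig <- flat[flat$p < 0.01, ]
sig[order(-abs(sig$cor)), ]
```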

There are different ways for visualizing a **correlation matrix** in R software :

- symnum() function
- corrplot() function to plot a **correlogram**
- scatter plots
- heatmap

The R function symnum() replaces **correlation coefficients** by symbols according to the level of the correlation. It takes the **correlation matrix** as an argument :

**Simplified format**:

```
symnum(x, cutpoints = c(0.3, 0.6, 0.8, 0.9, 0.95),
       symbols = c(" ", ".", ",", "+", "*", "B"),
       abbr.colnames = TRUE)
```

- **x** : the correlation matrix to visualize
- **cutpoints** : **correlation coefficient** cutpoints. **Correlation coefficients** between 0 and 0.3 are replaced by a space (" "); **correlation coefficients** between 0.3 and 0.6 are replaced by "."; etc.
- **symbols** : the symbols to use.
- **abbr.colnames** : logical value. If TRUE, column names are abbreviated.

**Example of usage**:

`symnum(res, abbr.colnames = FALSE)`

```
     mpg disp hp drat wt qsec
mpg  1
disp +   1
hp   ,   ,    1
drat ,   ,    .  1
wt   +   +    ,  ,    1
qsec .   .    ,          1
attr(,"legend")
[1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
```

As indicated in the legend, **correlation coefficients** between **0** and **0.3** are replaced by a space (" "); **correlation coefficients** between 0.3 and 0.6 are replaced by "."; etc.
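The cutpoints and symbols can also be customized. The sketch below flags only the strong correlations; note that corr = TRUE is set explicitly because, when custom cutpoints are supplied, symnum() no longer assumes the input is a correlation matrix:

```r
# Flag only |r| > 0.7 ("+") and |r| > 0.9 ("*")
symnum(res, cutpoints = c(0.7, 0.9),
       symbols = c(" ", "+", "*"),
       corr = TRUE, abbr.colnames = FALSE)
```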

The function **corrplot**(), in the package of the same name, creates a **graphical** display of a correlation matrix, highlighting the most correlated variables in a data table.

In this plot, correlation coefficients are colored according to their value. The correlation matrix can also be reordered according to the degree of association between variables.

**Install corrplot**:

`install.packages("corrplot")`

**Use corrplot**() to create a **correlogram** :

The function **corrplot()** takes the **correlation matrix** as the first argument. The second argument (type = "upper") is used to display only the upper triangle of the **correlation matrix**.

```
library(corrplot)
corrplot(res, type = "upper", order = "hclust",
         tl.col = "black", tl.srt = 45)
```

**Positive correlations** are displayed in blue and **negative correlations** in red. Color intensity and the size of the circle are proportional to the **correlation coefficients**. On the right side of the **correlogram**, the legend color bar shows the **correlation coefficients** and the corresponding colors.

- The **correlation matrix** is reordered according to the **correlation coefficient** using the **"hclust"** method.
- **tl.col** (for text label color) and **tl.srt** (for text label string rotation) are used to change text colors and rotations.
- Possible values for the argument **type** are : "upper", "lower", "full"

Read more : visualize a correlation matrix using corrplot.

It’s also possible to **combine a correlogram with the significance test**. We’ll use the result *res2* generated in the previous section with the **rcorr**() function [in **Hmisc** package]:

```
# Insignificant correlations are crossed out
corrplot(res2$r, type = "upper", order = "hclust",
         p.mat = res2$P, sig.level = 0.01, insig = "pch")
# Insignificant correlations are left blank
corrplot(res2$r, type = "upper", order = "hclust",
         p.mat = res2$P, sig.level = 0.01, insig = "blank")
```

In the above plots, correlations with a p-value > 0.01 are considered insignificant. In this case the corresponding cells are either crossed out or left blank.

The function *chart.Correlation()* [in the **PerformanceAnalytics** package] can be used to display a chart of a correlation matrix.

**Install PerformanceAnalytics**:

`install.packages("PerformanceAnalytics")`

**Use chart.Correlation()**:

```
library("PerformanceAnalytics")
my_data <- mtcars[, c(1,3,4,5,6,7)]
chart.Correlation(my_data, histogram=TRUE, pch=19)
```

In the above plot:

- The distribution of each variable is shown on the diagonal.
- On the bottom of the diagonal : the bivariate scatter plots with a fitted line are displayed
- On the top of the diagonal : the value of the correlation plus the significance level as stars
- Each significance level is associated with a symbol : p-values (0, 0.001, 0.01, 0.05, 0.1, 1) <=> symbols ("***", "**", "*", ".", " ")

The R base function **heatmap()** can also be used to visualize a correlation matrix :

```
# Get some colors
col<- colorRampPalette(c("blue", "white", "red"))(20)
heatmap(x = res, col = col, symm = TRUE)
```
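By default, heatmap() reorders the rows and columns by hierarchical clustering and draws dendrograms. If the original variable order should be kept, the reordering can be suppressed (a sketch using the correlation matrix res computed above):

```r
# Same heatmap, but without clustering or dendrograms
res <- cor(mtcars[, c(1, 3, 4, 5, 6, 7)])
col <- colorRampPalette(c("blue", "white", "red"))(20)
heatmap(x = res, col = col, symm = TRUE,
        Rowv = NA, Colv = NA)
```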

- **x** : the correlation matrix to be plotted
- **col** : color palettes
- **symm** : logical indicating if x should be treated symmetrically; can only be true when x is a square matrix.

A web application for computing and visualizing a correlation matrix is available here without any installation : online software for correlation matrix.

Take me to the correlation matrix calculator

The software can be used as follows :

- **Go to the web application** : correlation matrix calculator
- **Upload a .txt tab or a CSV file** containing your data (columns are variables). The supported file formats are described here. You can use the demo data available on the calculator web page by clicking on the corresponding link.
- After uploading, an **overview of a part of your file** is shown to check that the data are correctly imported. If the data are not correctly displayed, please make sure that the format of your file is OK here.
- **Click on the ‘Analyze’ button** and **select at least 2 variables** to calculate the correlation matrix. By default, all variables are selected. **Please, deselect the columns containing text**. You can also **select the correlation method** (Pearson, Spearman or Kendall). Default is the Pearson method.
- Click the **OK** button
- Results : the output of the software includes :
- The correlation matrix
- The visualization of the correlation matrix as a correlogram
- A web link to export the results as .txt tab file

Note that, you can specify the alternative hypothesis to use for the correlation test by clicking on the button “Advanced options”.

Choose one of the 3 options :

- Two-sided
- Correlation < 0 for “less”
- Correlation > 0 for “greater”

- Use the **cor**() function for simple **correlation analysis**
- Use the **rcorr**() function from the **Hmisc** package to compute the **matrix of correlation coefficients** and the **matrix of p-values** in a single step.
- Use the **symnum**(), **corrplot**() [from the **corrplot** package], **chart.Correlation**() [from the **PerformanceAnalytics** package], or **heatmap**() functions to visualize a **correlation matrix**.

This analysis has been performed using **R software** (ver. 3.2.4).

**Correlation test** is used to study the dependence between two or more variables. The goal of this article is to describe briefly the different correlation methods and to provide an online **correlation coefficient calculator**.

There are :

- the **Pearson correlation** method, which is a parametric correlation test as it depends on the distribution of the data. This method measures the linear dependence between two variables.
- the **Kendall** and **Spearman** rank-based correlation analyses (non-parametric methods). These two methods are recommended if the data do not come from a bivariate normal distribution.

Pearson correlation test is the most commonly used method to calculate the **correlation coefficient** between two variables. The formula is shown in the next section.

The Pearson correlation coefficient between two variables, x and y, can be calculated using the formula below :

\[ r = \frac{\sum{(x-m_x)(y-m_y)}}{\sqrt{\sum{(x-m_x)^2}\sum{(y-m_y)^2}}} \]

\(m_x\) and \(m_y\) are the means of x and y variables.
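The formula can be verified numerically against R’s cor() function (a minimal sketch with two short example vectors):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

mx <- mean(x); my <- mean(y)
r <- sum((x - mx) * (y - my)) /
  sqrt(sum((x - mx)^2) * sum((y - my)^2))

all.equal(r, cor(x, y))  # TRUE
```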

The significance level of the correlation can be determined by reading the table of critical values for the degrees of freedom : \(df = n-2\)

Read more : correlation formula

Correlation coefficient can be easily computed in R using the function **cor()** or **cor.test()**. The simplified formats are :

```
cor(x, y, method = c("pearson", "kendall", "spearman"))
cor.test(x, y, method=c("pearson", "kendall", "spearman"))
```

In the R code above x and y are two numeric vectors with the same length.

The main difference between the two correlation functions is that :

- the function **cor()** returns only the **correlation coefficient**
- the function **cor.test()** returns both the **correlation coefficient** and the **significance level** (or p-value) of the correlation

These functions can be used as follows :

```
# Define two numeric vectors
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
# Pearson correlation coefficient between x and y
cor(x, y)
```

`[1] 0.5712`

```
# Pearson correlation test
cor.test(x, y)
```

```
Pearson's product-moment correlation
data: x and y
t = 1.841, df = 7, p-value = 0.1082
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.1497 0.8956
sample estimates:
cor
0.5712
```

The **correlation coefficient** is 0.5712 and the *p-value* is 0.1082.
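When the numbers are needed programmatically rather than printed, the components of the cor.test() result can be extracted by name:

```r
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)

ct <- cor.test(x, y)
ct$estimate   # correlation coefficient: 0.5712
ct$p.value    # significance level: 0.1082
ct$conf.int   # 95 percent confidence interval
```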

Read more : correlation analysis using R software

Spearman and Kendall non-parametric correlation methods can be used as follows :

```
# Spearman non-parametric correlation test
cor.test(x, y, method="spearman")
# Kendall non-parametric correlation test
cor.test(x, y, method="kendall")
```

The value of the correlation coefficient is between **-1** and **1** :

- **-1** corresponds to a strong **negative correlation** : this means that every time **x increases**, **y decreases** (left panel figure)
- **0** means that there is no **association** between the two variables (x and y) (middle panel figure)
- **1** corresponds to a strong **positive correlation** : this means that **y increases** with **x** (right panel figure)

A web application, for computing the different correlation coefficients, is available at this link : correlation coefficient calculator.

It can be used online without any installation to calculate Pearson, Kendall or Spearman correlation coefficient.

Go to the correlation coefficient calculator

The correlation coefficient calculator can be used as follows :

- **Go to the application** available at this link : correlation coefficient calculator
- **Copy and paste your data** from Excel into the calculator. You can use the demo data available on the calculator web page by clicking on the link.
- **Select the correlation method** (Pearson, Spearman or Kendall). Default is the Pearson method.
- Click the **OK** button

Note that, you can specify the alternative hypothesis to use for the correlation test by clicking on the button “Advanced options”.

Choose one of the 3 options :

- Two-sided
- Correlation < 0 for “less”
- Correlation > 0 for “greater”

**Correlation coefficient** is a quantity that measures the strength of the **association** (or **dependence**) between two variables (x and y). For example, if we want to know whether there is a **relationship** between the heights of fathers and sons, a **correlation coefficient** can be calculated.

There are different correlation coefficients :

- **Pearson r** : linear dependence
- **Kendall tau** : rank-based correlation coefficient
- **Spearman rho** : rank-based correlation coefficient

**Pearson correlation** is the most commonly used one.

The **correlation formula** is described here.

The value of the **correlation coefficient** can be **negative** or **positive** and lies between **-1** and **1** (see the plots below).

- **-1** means strong **negative correlation** : in this case y decreases when x increases (left panel figure)
- **0** means that there is no **relationship** between the two variables (x and y) (middle panel figure)
- **1** means strong **positive correlation** : in this case y increases when x increases (right panel figure)


**Correlation** is very helpful to investigate the **dependence** between two or more variables. As an example, we may be interested to know whether there is an association between the weights of fathers and sons. A **correlation coefficient** can be calculated to answer this question.

If there is no relationship between the two variables (father and son weights), the average weight of the sons should be the same regardless of the weight of the fathers, and vice versa.

There are different methods to perform **correlation analysis** : **Pearson**, **Kendall** and **Spearman** correlation tests.

The most commonly used is **Pearson correlation**. The aim of this article is to describe Pearson **correlation formula**.

**Pearson correlation** measures the linear dependence between two variables (x and y). It’s also known as a **parametric correlation** test because it depends on the distribution of the data. The plot of y = f(x) is called the **linear regression** curve.

The **pearson correlation** formula is :

\[ r = \frac{\sum{(x-m_x)(y-m_y)}}{\sqrt{\sum{(x-m_x)^2}\sum{(y-m_y)^2}}} \]

\(m_x\) and \(m_y\) are the means of x and y variables.

The p-value (significance level) of the correlation can be determined :

- by using the correlation coefficient table for the degrees of freedom : \(df = n-2\)
- or by calculating the **t value** :

\[ t=\frac{r}{\sqrt{1-r^2}}\sqrt{n-2} \]

In this case the corresponding p-value is determined using the **t distribution table** for \(df = n-2\)
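As a numerical check, the **t value** and the corresponding p-value can be computed by hand in R with the formula above, using the same example vectors as in the correlation test section earlier; the results match the cor.test(x, y) output shown there (t = 1.841, p = 0.1082):

```r
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)

r <- cor(x, y)
n <- length(x)
t <- r / sqrt(1 - r^2) * sqrt(n - 2)
p <- 2 * pt(-abs(t), df = n - 2)  # two-sided p-value
round(c(t = t, p = p), 4)
```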

If the p-value is less than 5%, then the correlation is significant.

The correlation coefficient is between -1 (strong **negative correlation**) and 1 (strong **positive correlation**).

Note that an online **correlation coefficient calculator** is also available by following this link : **Correlation coefficient calculator**.