## Correlation plots in R

*Author: Lenka Fiřtová*

This article describes how to visualize computed correlation matrices in a clear, easily presentable way.

How to compute correlations in R is covered in a separate article.

## Initial calculations

In this article we use the *corrplot* package, which creates clear, readable visualizations of correlation matrices. We can install the package with the *install.packages* command (the package name must be wrapped in quotation marks) and then load it with the *library* command. Installation needs to be done only once, but *library* must be called in every new *R* session in which we want to use the package.

> install.packages("corrplot")

> library(corrplot)
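Because installation is needed only once, a script that is run repeatedly can first check whether the package is already present. The following is a minimal sketch of that pattern using base R's `requireNamespace`; the check and the explicit CRAN mirror are additions for illustration, not part of the workflow described above.

```r
# Install corrplot only if it is not already available on this machine.
# The mirror URL is an assumption; any CRAN mirror works.
if (!requireNamespace("corrplot", quietly = TRUE)) {
  install.packages("corrplot", repos = "https://cloud.r-project.org")
}

# Loading is still required in every new R session
library(corrplot)
```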

We are going to work with the *mtcars* dataset, which contains 11 variables (fuel consumption and 10 aspects of design and performance) for 32 cars. We can access the description of the data using the following command:

> ?mtcars

Let us take a look at the first few rows:

> head(mtcars)

                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The *corrplot* package requires a computed correlation matrix. Let us first compute the correlation matrix of the variables in the *mtcars* dataset and name it *k*.

> k = cor(mtcars)
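Before plotting, it can help to check what the matrix looks like. The sketch below uses only base R functions; the choice of which corner of the matrix to print is arbitrary.

```r
# Correlation matrix of all variables in the built-in mtcars dataset
k <- cor(mtcars)

# One row and one column per variable
dim(k)  # 11 11

# Every variable correlates perfectly with itself, so the diagonal is all ones
isTRUE(all.equal(unname(diag(k)), rep(1, ncol(k))))

# Rounded view of the top-left corner of the matrix
round(k[1:4, 1:4], 2)
```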

To visualize the correlation matrix, we use the *corrplot* function. Its only required argument is the correlation matrix; the many remaining arguments are optional. Here we explain the *method* and *type* arguments.

## The corrplot function – argument *method*

The *method* argument determines what the plot looks like. The following options are available: "circle" (the default), "square", "ellipse", "number", "pie", "shade", "color".
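To compare the options side by side, one possibility is to draw all of them in a single grid. This is only a sketch: the 2 x 4 layout, the margins and the label sizes (`mar`, `tl.cex`, `number.cex`) are illustrative choices rather than recommended settings.

```r
library(corrplot)

k <- cor(mtcars)
methods <- c("circle", "square", "ellipse", "number", "pie", "shade", "color")

# Arrange the plots in a 2 x 4 grid and remember the previous layout
old_par <- par(mfrow = c(2, 4))
for (m in methods) {
  # Smaller labels keep the small panels readable; mar leaves room for the title
  corrplot(k, method = m, title = m, mar = c(0, 0, 2, 0),
           tl.cex = 0.6, number.cex = 0.5)
}
# Restore the previous layout
par(old_par)
```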

For example, the *color* method returns a plot with the negative correlations marked in red, positive correlations in blue, and the stronger the correlation, the more intense the colour.

> corrplot(k, method = "color")

From the plot we can estimate that the correlation between *mpg* and *hp* is around –0.7 and the correlation between *mpg* and *drat* is around 0.7. The exact values cannot be read off, though: the plot serves more as an overview of which variables correlate strongly and which ones weakly.

The *circle* method can help estimate the correlations more precisely: apart from colour, it also displays the strength of each correlation through the size of its circle.

> corrplot(k, method = "circle")

The *pie* method produces a plot where the strength of the correlation is displayed (apart from colour) as the portion of a circle that is filled in.

> corrplot(k, method = "pie")

For example, if we look at the correlation between *mpg* and *hp* (first row, fourth column), we can see that more than three quarters of the circle is coloured, so the correlation must be stronger than –0.75 (that is, its absolute value exceeds 0.75). By contrast, for the correlation between *mpg* and *drat* (first row, fifth column), less than three quarters of the circle is coloured, so the correlation must be weaker than 0.75.
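The reading from the pie plot can be checked against the matrix itself. The following sketch uses base R indexing by row and column name; the 0.75 threshold simply mirrors the three-quarter mark used in the example.

```r
k <- cor(mtcars)

# Exact coefficients for the two pairs discussed above
k["mpg", "hp"]
k["mpg", "drat"]

# Compare the strength (absolute value) with the three-quarter mark
abs(k["mpg", "hp"]) > 0.75    # TRUE
abs(k["mpg", "drat"]) > 0.75  # FALSE
```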

Of the available methods, the *number* method gives the most precise values, because it displays the actual correlation coefficients. It shows us that the correlation between *mpg* and *hp* is –0.78, and between *mpg* and *drat* it is 0.68.

> corrplot(k, method = "number")
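Colour and exact coefficients can also be combined in one plot. The sketch below goes slightly beyond the arguments explained in this article: it assumes the optional `addCoef.col` and `number.cex` arguments of *corrplot*, and the colour and size values are arbitrary choices.

```r
library(corrplot)

k <- cor(mtcars)

# Coloured cells with the coefficients printed on top in black;
# number.cex shrinks the digits so that they fit inside the cells
corrplot(k, method = "color", addCoef.col = "black", number.cex = 0.7)
```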

## The corrplot function – argument *type*

Every correlation matrix is symmetric: the numbers above the diagonal are the same as those in the mirrored positions below it. It is thus possible to display only the numbers above the diagonal, or only the ones below it, without any loss of information.
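The symmetry is easy to confirm in base R. This short sketch also counts how many distinct pairs of variables the matrix actually describes.

```r
k <- cor(mtcars)

# TRUE when the matrix equals its own transpose
isSymmetric(k)

# Number of distinct variable pairs, i.e. entries strictly above the diagonal
sum(upper.tri(k))  # 55 for 11 variables
```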

To do this, we can use the *type* argument. Let us show how it works with the *circle* method.

> corrplot(k, method = "circle", type = "upper")

> corrplot(k, method = "circle", type = "lower")
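Since the diagonal always contains ones, it carries no information either. As a sketch that assumes the optional `diag` argument of *corrplot* (not covered above), the diagonal can be left out of the triangular plot as well.

```r
library(corrplot)

k <- cor(mtcars)

# Upper triangle only, with the uninformative diagonal hidden
corrplot(k, method = "circle", type = "upper", diag = FALSE)
```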