Author: Lenka Fiřtová

This article describes how to visualize computed correlation matrices in a clear, easily presentable way.

How to compute correlation in R is covered in a separate article.

 

Initial calculations

In this article we are going to use the corrplot package, which allows us to create clear, readable visualizations of correlation matrices. We can install the package using the install.packages command (the name of the package needs to be wrapped in quotation marks). Then we can load the package using the library command. The installation needs to be done only once, but the library command has to be run in every new R session in which we want to use the package.
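A minimal sketch of these two steps follows. The requireNamespace check is an addition that is not part of the description above; it only skips the download when the package is already installed.

```r
# Install corrplot only if it is missing (installation is needed only once).
# The package name must be given as a quoted string.
if (!requireNamespace("corrplot", quietly = TRUE)) {
  install.packages("corrplot")
}

# Load the package; this has to be repeated in every new R session.
library(corrplot)
```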

We are going to work with the mtcars dataset, which is built into R and records fuel consumption and 10 other aspects of design and performance for 32 cars (11 variables in total). We can access the description of the data using the following command:
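One way to open the description, assuming a standard R installation where mtcars ships with the built-in datasets package:

```r
# Open the help page that documents the built-in mtcars dataset
help("mtcars")   # equivalent to typing ?mtcars in the console

# Check the size of the data: rows are cars, columns are variables
dim(mtcars)
```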

Let us take a look at the first few rows:
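The first rows can be printed with the base R function head, which shows six rows unless told otherwise:

```r
# Print the first six rows of the dataset
head(mtcars)
```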

The corrplot function takes an already computed correlation matrix as its input. Let’s first compute the correlation matrix of the variables in the mtcars dataset and name it k.
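A short sketch of this step, assuming the default settings of cor (Pearson correlation) are what we want. It works directly on mtcars because all of its columns are numeric.

```r
# Compute the correlation matrix of all variables in mtcars
k <- cor(mtcars)

# Show the matrix rounded to two decimal places for readability
round(k, 2)
```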

To visualize the correlation matrix, we use the corrplot function. Its only required argument is the correlation matrix; all other arguments are optional. We are going to explain two of them: the method argument and the type argument.
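The simplest call passes only the correlation matrix and leaves every optional argument at its default. This sketch assumes corrplot is already installed.

```r
library(corrplot)

# Correlation matrix of the mtcars variables
k <- cor(mtcars)

# Plot it with all optional arguments left at their defaults
corrplot(k)
```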

The corrplot function – argument method

The method argument determines what the plot should look like. The available options are "circle" (the default), "square", "ellipse", "number", "pie", "shade", and "color".

For example, the color method returns a plot with the negative correlations marked in red, positive correlations in blue, and the stronger the correlation, the more intense the colour.
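A plot of this kind can be produced as follows, assuming corrplot is installed and its default colour palette is left unchanged:

```r
library(corrplot)

k <- cor(mtcars)

# Each cell is filled with a colour whose hue shows the sign
# and whose intensity shows the strength of the correlation
corrplot(k, method = "color")
```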

From the plot we can easily estimate that the correlation between mpg and hp is around –0.7 and the correlation between mpg and drat is around 0.7. The exact values cannot be read off the plot, though; it serves more as an overview of which variables correlate strongly and which ones weakly.

The circle method can help estimate the correlations more precisely: it produces a plot which, apart from colour, displays the strength of the correlations using the size of circles.
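Since "circle" is the default, naming it explicitly is optional. It is spelled out in this sketch for clarity.

```r
library(corrplot)

k <- cor(mtcars)

# Circle size and colour intensity both grow with the strength of the correlation
corrplot(k, method = "circle")
```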

The pie method produces a plot where the strength of the correlation is (apart from colour) displayed as the portion of a circle which is in colour.
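The corresponding call, again assuming corrplot is installed:

```r
library(corrplot)

k <- cor(mtcars)

# The filled portion of each pie corresponds to the absolute value of the correlation
corrplot(k, method = "pie")
```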

For example, if we take a look at the correlation between mpg and hp (first row, fourth column), we can see that more than three quarters of the circle is in colour, so the correlation must be stronger than –0.75. By contrast, if we take a look at the correlation between mpg and drat (first row, fifth column), we can see that less than three quarters of the circle is in colour, so the correlation must be weaker than 0.75.

Of the available methods, the number method gives the most precise values, because it displays the actual correlation coefficients. It shows us that the correlation between mpg and hp is –0.78, and between mpg and drat it is 0.68.
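The sketch below draws the number plot and also reads the two coefficients straight from the matrix, which is a quick way to confirm what the plot shows. Indexing k by name is an addition for illustration.

```r
library(corrplot)

k <- cor(mtcars)

# Print the correlation coefficients inside the cells of the plot
corrplot(k, method = "number")

# Read the two coefficients discussed above directly from the matrix
round(k["mpg", c("hp", "drat")], 2)
#    hp  drat
# -0.78  0.68
```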

The corrplot function – argument type

Each correlation matrix is symmetrical: it has the same numbers above and below the diagonal (in the corresponding positions). It is thus possible to display only the numbers above the diagonal, or only the ones below it, without any loss of information.
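The symmetry can be checked directly in base R. The sketch compares one pair of mirrored cells and then tests the whole matrix.

```r
k <- cor(mtcars)

# The cell (mpg, hp) mirrors the cell (hp, mpg)
all.equal(k["mpg", "hp"], k["hp", "mpg"])

# Test the whole matrix at once
isSymmetric(k)
```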

To do this, we can use the type argument, which accepts "full" (the default), "upper", and "lower". Let us show how it works using the plot produced by the circle method.
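A sketch of both variants, assuming corrplot is installed. Each call draws a separate plot.

```r
library(corrplot)

k <- cor(mtcars)

# Show only the part of the matrix above the diagonal
corrplot(k, method = "circle", type = "upper")

# Show only the part of the matrix below the diagonal
corrplot(k, method = "circle", type = "lower")
```

Either plot carries the same information as the full matrix, so the choice between them is a matter of layout.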

 
