One of the simplest and most common statistical Excel calculations you might make is correlation. It’s a simple statistic, but it can be very informative when you want to see if two variables are related. If you know the right commands, find the correlation coefficient in Excel is extremely easy.
We’ll take a look at what correlation is to give you an idea of the information that it gives you. Then we’ll move on to finding correlation coefficient in Excel using two methods and a good graph for looking at correlations. Finally, I’ll give you a very quick introduction to linear regression, another statistical function that might prove useful when you’re looking at correlations.
What Is Correlation?
Before we get started, let’s discuss the definition of correlation. It’s a simple measure of how things are related. Let’s take a look at two variables that have no correlation whatsoever.
These two variables (one plotted on the X axis, one on the Y) are totally random, and are not closely related.
The two variables below, however, are correlated:
In general, as one variable rises, so does the other. That’s correlation. (Note that it can be the other way around, too; if one goes up and the other goes down, that’s a negative correlation.)
Understanding Correlation Coefficient
The correlation coefficient tells you how related two variables are. The coefficient is between -1 and 1. A correlation coefficient of 0 means that there’s absolutely no correlation between two variables. This is what you should get when you have two sets of random numbers.
A coefficient of -1 means you have a perfect negative correlation: as one variable increases, the other decreases proportionally. A coefficient of 1 is a perfect positive correlation: as one variable increases, the other also increases proportionally.
Any number between those represents a scale. A correlation of .5, for example, is a moderate positive correlation.
As you can see in the graphic below, correlation only looks for a linear relationship. Two variables can be strongly related in another way and still have a correlation coefficient of zero:
How to Find Correlation Coefficient in Excel Using CORREL
There’s a built-in function for correlation in Excel. The CORREL function has a very simple syntax:
array1 is your first group of numbers, and array2 is the second group. Excel will spit out a number, and that’s your correlation coefficient. Let’s look at an example.
In this spreadsheet, we have a list of cars, with model and year, and their values. I used the CORREL function to see if the model year and value were related:
There’s a very weak positive correlation; so as the year goes up, so does the value of the vehicle. But not by very much.
When you’re running correlations, it’s a good idea to use a scatter plot to get a visual understanding of how your datasets are related. Go to Charts > Scatter to see what your data looks like:
You can see that in this data, the year of the car doesn’t affect the value very much. There’s a slight positive trend, but it’s weak. That’s what we found with our CORREL function.
Another useful element in a scatterplot is a trendline, which looks like this:
The trendline can be useful when you want to make a correlation clear in your scatterplot. In Windows, click on Chart Tools > Design > Add Chart Element and select trendline. On a Mac, you’ll need to go to Chart Layout or Chart Design, depending on the edition of Excel.
And don’t forget to check out our guide to making great charts in Excel before you present any findings!
Correlating Multiple Variables With the Data Analysis Toolpak
If you have many different sets of numbers and you want to find correlations between them, you need to run the CORREL function on each combination. Using the Data Analysis Toolpak, however, you can select a number of datasets and see where correlations lie.
Not sure if you have the Data Analysis Toolpak? Check out our walkthrough of the basics to download and get the hang of it.
To fire up the Toolpak, go to Data > Data Analysis. You’ll see a list of choices:
Select Correlation and hit OK.
In the resulting window, select all of your data sets in the Input Range box, and tell Excel where you want your results placed:
Here’s what you’ll get when you hit OK:
In the image above, we’ve run correlations on four different datasets: the year, the world population, and two sets of random numbers.
Every dataset’s correlation with itself is 1. The year and world population have an extremely strong correlation, while there are very weak correlations elsewhere, as we’d expect with random numbers.
Correlation vs. Linear Regression in Excel
Correlation is a simple measure: how closely related are two variables? This measure, however, doesn’t have any predictive or causative value. Just because two variables are correlated doesn’t mean one is causing changes in the other. That’s a crucial thing to understand about correlation.
If you’re interested in making a claim about causality, you’ll need to use linear regression. You can also access this through the Data Analysis Toolpak. (This article won’t cover the details of how linear regression works, but there are plenty of free statistics resources that can walk you through the basics.)
Open the Data Analysis Toolpak, select Regression, and click OK.
Fill the X and Y ranges (the X value is the explanatory variable, and the Y value is the value you’re trying to predict). Then select where you want your output to go, and click OK again.
The number you’ll want to focus on here is the p-value for your explanatory variable:
If it’s less than 0.05, you have a strong argument that changes in your X variable are causing changes in your Y variable. In the image above, we’ve shown that the year is a significant predictor of world population.
Linear regression is also useful in that it can look at multiple values. Here, we’ve used regression to see if year and population are significant predictors of the price of crude oil:
Both p-values are less than 0.05, so we can conclude that both year and world population are significant predictors of the price of crude oil. (Though strong correlations between X variables can cause their own problems.)
Again, this is a very simplistic explanation of linear regression, and if you’re interested in causality, you should read up on some stats tutorials.
But now you have an idea of how to go beyond simple correlation if you’re looking for more statistical information!
Get Better Data Out of Excel
Understanding basic statistical functions in Excel can help you get a lot more useful information from your data. Correlation is a simple measure, but it can be a big help when you’re trying to make claims about the numbers in your spreadsheet.
Of course, you can run plenty of other more complicated measures. But unless you’re comfortable with statistics, you’ll want to start with the basics.
Do you use Excel’s correlation functionality regularly? What other statistical functions would you like to learn about?