One advantage of knowing the correlation between two variables is that
Often, when we start analyzing new data, one of the very first things we look at is whether certain pairs of variables are correlated. Correlation can tell us whether two variables have a linear relationship, and how strong that relationship is. This makes sense as a starting point, since we're usually looking for relationships and correlation is an easy way to get a quick handle on the data set we're working with.
How do we define correlation? We can think of it in terms of a simple question: when X increases, what does Y tend to do? In general, if Y tends to increase along with X, there's a positive relationship. If Y tends to decrease as X increases, that's a negative relationship.
Correlation is defined numerically by a correlation coefficient, a value that ranges from -1 to 1. A coefficient of -1 is perfect negative linear correlation: a straight line trending downward. A coefficient of +1 is, conversely, perfect positive linear correlation. A coefficient of 0 means no linear correlation at all.
Making a scatterplot in Minitab can give you a quick visualization of the correlation between variables, and you can get the correlation coefficient by going to Stat > Basic Statistics > Correlation...
Here are a few examples of data sets that a correlation coefficient can accurately assess. This graph shows a positive correlation of 0.7, fairly close to 1. As you can see from the scatterplot, it's a fairly strong linear relationship: as the values of X increase, Y tends to increase as well. Below is a similar plot, but here the relationship shows a negative direction.
Correlation's Limits
However, there are some drawbacks and limitations to simple linear correlation. A correlation coefficient can only tell whether your two variables have a linear relationship.
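To make the linear-only limitation concrete, here is a minimal sketch in plain Python. It computes the Pearson correlation coefficient by hand for three small data sets. The helper name `pearson_r` and the data values are assumptions made up for illustration; they are not taken from the article's graphs, and in practice Minitab computes the coefficient for you.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative data only (hypothetical, not from the article)
xs = [-3, -2, -1, 0, 1, 2, 3]
linear_up = [2 * x + 1 for x in xs]   # perfect increasing line
linear_down = [-x for x in xs]        # perfect decreasing line
parabola = [x ** 2 for x in xs]       # obvious but non-linear relationship

r_up = pearson_r(xs, linear_up)
r_down = pearson_r(xs, linear_down)
r_parabola = pearson_r(xs, parabola)

print(f"increasing line: r = {r_up:.2f}")
print(f"decreasing line: r = {r_down:.2f}")
print(f"parabola:        r = {r_parabola:.2f}")
```

The two straight lines give coefficients of +1 and -1, while the parabola gives 0 even though Y is completely determined by X. A coefficient near 0 rules out a linear relationship, not every relationship.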
Take, for example, the following chart, which has a correlation coefficient of about 0; we can pretty easily see that there isn't much of a relationship at all. Now take a look at this graph, in which there is an obvious relationship, but not a linear one. Notice that the correlation coefficient is also 0 in this case.
This is what you have to keep in mind when interpreting correlations. The correlation coefficient will only detect linear relationships. Just because the correlation coefficient is near 0 doesn't mean that there isn't some type of relationship there.
The other thing to remember is something most of us hear soon after we begin exploring data: correlation does not imply causation. Just because X and Y are correlated in some way does not mean that X causes a change in Y, or vice versa.
Here's my favorite example for this. If we look at two variables, shark attacks and ice cream sales, we know intuitively that there's no way one variable has a cause-and-effect impact on the other. However, both shark attacks and ice cream sales will have greater numbers in summer months, so they will be strongly correlated with each other. Be careful not to fall into this trap with your data!
Correlation has a lot of benefits, and it is still a good starting point in a number of different cases, but it's important to know its limitations as well.
When investigating the relationship between two or more numeric variables, it is important to know the difference between correlation and regression. The similarities/differences and advantages/disadvantages of these tools are discussed here along with examples of each.
Correlation quantifies the direction and strength of the relationship between two numeric variables, X and Y, and always lies between -1.0 and 1.0. Simple linear regression relates X to Y through an equation of the form Y = a + bX.
Key similarities
Key differences
*The X variable can be fixed with correlation, but confidence intervals and statistical tests are no longer appropriate. Typically, regression is used when X is fixed.
Key advantage of correlation
Key advantage of regression
Correlation Example
As an example, let’s go through the Prism tutorial on correlation matrix, which contains an automotive dataset with Cost in USD, MPG, Horsepower, and Weight in Pounds as the variables. Instead of just looking at the correlation between one X and one Y, we can generate all pairwise correlations using Prism’s correlation matrix. These are the steps in Prism:
The Prism correlation matrix displays all the pairwise correlations for this set of variables.
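For readers without Prism, the idea of a pairwise correlation matrix can be sketched in plain Python. Everything below is an assumption for illustration: the column names loosely mirror the tutorial's variables, but the numbers are invented and are not the Prism tutorial data, so the coefficients will not match Prism's output.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only (NOT the Prism tutorial data)
cars = {
    "cost_usd": [18000, 22000, 27000, 35000, 41000],
    "mpg": [38, 33, 30, 24, 21],
    "horsepower": [110, 140, 170, 240, 300],
    "weight_lb": [2500, 2900, 3200, 3800, 4100],
}

names = list(cars)
# Correlate every variable with every other variable, including itself
matrix = {a: {b: pearson_r(cars[a], cars[b]) for b in names} for a in names}

# Print one row per variable, columns in the same order as the rows
for a in names:
    row = "  ".join(f"{matrix[a][b]:6.2f}" for b in names)
    print(f"{a:<12}{row}")
```

Each cell depends only on the pair of variables, not on their order, so the entry for (A, B) equals the entry for (B, A), and every variable correlated with itself gives 1.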
Key findings:
Note that the matrix is symmetric. For example, the correlation between “weight in pounds” and “cost in USD” in the lower left corner (0.52) is the same as the correlation between “cost in USD” and “weight in pounds” in the upper right corner (0.52). This reinforces the fact that X and Y are interchangeable with regard to correlation. The correlations along the diagonal will always be 1.00 because a variable is always perfectly correlated with itself. When interpreting correlations, you should be aware of the four possible explanations for a strong correlation:
Regression Example
The strength of UV rays varies by latitude. The higher the latitude, the less exposure to the sun, which corresponds to a lower skin cancer risk. So where you live can have an impact on your skin cancer risk. Two variables, cancer mortality rate and latitude, were entered into Prism’s XY table. The Prism graph (right) shows the relationship between skin cancer mortality rate (Y) and latitude at the center of a state (X). It makes sense to compute the correlation between these variables, but taking it a step further, let’s perform a regression analysis and get a predictive equation.
The relationship between X and Y is summarized by the fitted regression line on the graph with equation: mortality rate = 389.2 - 5.98*latitude. Based on the slope of -5.98, each 1 degree increase in latitude is associated with approximately 6 fewer deaths due to skin cancer per 10 million people. Since regression analysis produces an equation, unlike correlation, it can be used for prediction. For example, a city at latitude 40 would be expected to have 389.2 - 5.98*40 = 150 deaths per 10 million due to skin cancer each year.
Regression also allows for the interpretation of the model coefficients:
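One way to read the coefficients and reproduce the prediction above is a short Python sketch. The two constants come from the fitted equation quoted in the text; the function and variable names are hypothetical and chosen only for this example.

```python
# Coefficients of the fitted line quoted in the text:
# mortality rate = 389.2 - 5.98 * latitude  (deaths per 10 million)
INTERCEPT = 389.2   # predicted rate at latitude 0
SLOPE = -5.98       # change in predicted rate per 1 degree of latitude

def predicted_mortality(latitude_deg):
    """Predicted skin cancer deaths per 10 million at a given latitude."""
    return INTERCEPT + SLOPE * latitude_deg

at_40 = predicted_mortality(40)
at_41 = predicted_mortality(41)

print(round(at_40, 1))          # prints 150.0
print(round(at_41 - at_40, 2))  # prints -5.98, the slope
```

Moving one degree north changes the prediction by exactly the slope, which is what "about 6 fewer deaths per 10 million" means. The intercept is an extrapolation to latitude 0, outside the range of states the line was fitted to, so it is best read as an anchor for the line, not as a realistic estimate.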
Summary and Additional Information
In summary, correlation and regression have many similarities and some important differences. Regression is primarily used to build models/equations to predict a key response, Y, from a set of predictor (X) variables. Correlation is primarily used to quickly and concisely summarize the direction and strength of the relationships between a set of 2 or more numeric variables. The table below summarizes the key similarities and differences between correlation and regression.
Test your understanding of Correlation and Regression
Which tool, correlation or regression, would you use in each of these scenarios:
Answers:
What is the advantage of knowing the correlation between variables?
Knowing how two variables are correlated helps you anticipate future trends, because you understand how the variables move together, or learn that there is no linear relationship at all.
What is one advantage of a correlational study?
Findings from correlational research can be used to determine prevalence and relationships among variables, and to forecast events from current data and knowledge.
What is one advantage of the use of a correlational design?
One strength is that correlational designs can be used when experimental research is not possible because the predictor variables cannot be manipulated. Correlational designs also have the advantage of allowing the researcher to study behaviour as it occurs in everyday life.