Correlation
In statistics, correlation is any statistical relationship between two random variables or bivariate data. Usually it refers to the degree to which a pair of variables are linearly related. More general relationships between variables are called associations: the degree to which some of the variability of one variable can be accounted for by the other.
The presence of a correlation is not sufficient to infer the presence of a causal relationship (i.e., correlation does not imply causation). Furthermore, the concept of correlation is not the same as dependence: if two variables are independent, then they are uncorrelated, but the opposite is not necessarily true – even if two variables are uncorrelated, they might be dependent on each other.
Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example, there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling.
There are several correlation coefficients, often denoted ρ or r, measuring the degree of correlation. The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables (which may be present even when one variable is a nonlinear function of the other). Other correlation coefficients – such as Spearman's rank correlation coefficient – have been developed to be more robust than Pearson's and to detect less structured relationships between variables.
The concept has been generalized to other forms of association between two variables, such as mutual information and distance covariance.
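To show how such a generalization catches dependence that correlation misses, here is a small mutual-information computation for discrete data (my own sketch, assuming variables with finitely many observed values). Revisiting Y = X²: the correlation is zero, but the mutual information is positive.

```python
# Mutual information I(X;Y) in bits, estimated from joint counts of
# discrete observations. For Y = X**2 with X in {-1, 0, 1}, the
# Pearson correlation is zero but I(X;Y) > 0, exposing the dependence.
import math
from collections import Counter

def mutual_information(xs, ys):
    n = len(xs)
    pxy = Counter(zip(xs, ys))          # joint counts
    px, py = Counter(xs), Counter(ys)   # marginal counts
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

xs = [-1, 0, 1]
ys = [x ** 2 for x in xs]               # ys = [1, 0, 1]
print(mutual_information(xs, ys))       # positive: X and Y are dependent
```

Because Y is a deterministic function of X here, the mutual information equals the entropy of Y; it is zero exactly when the joint distribution factorizes, i.e. when the variables are independent.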