Correlation and dependence

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data.

Human height

Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a limited-supply product and its price.
A particular genetic profile in men called Y haplotype I-M170 is correlated with height.

Correlation does not imply causation

In statistics, many statistical tests calculate correlations between variables, and when two variables are found to be correlated, it is tempting to assume that this shows that one variable causes the other. However, in general, the presence of a correlation is not sufficient to infer the presence of a causal relationship (i.e., correlation does not imply causation).

Statistics

Statistical inferences may take the form of: answering yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within the data (correlation), and modeling relationships within the data (for example, using regression analysis).

Pearson correlation coefficient

The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables (which may be present even when one variable is a nonlinear function of the other).
In statistics, the Pearson correlation coefficient (PCC), also referred to as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), or the bivariate correlation, is a measure of the linear correlation between two variables X and Y.
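As a minimal sketch, the coefficient can be computed directly from its definition; this is Python with NumPy, and the name pearson_r is illustrative rather than any library API:

    import numpy as np

    def pearson_r(x, y):
        # Sample Pearson r: covariance divided by the product of standard deviations.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        xm, ym = x - x.mean(), y - y.mean()
        return (xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym))

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = 2.0 * x + rng.normal(size=1000)  # linear relationship plus noise
    print(pearson_r(x, y))               # close to +1
    print(np.corrcoef(x, y)[0, 1])       # NumPy's built-in estimate agrees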

Causality

Alternative methods of structure learning search through the many possible causal structures among the variables, and remove ones which are strongly incompatible with the observed correlations.

Correlation coefficient

There are several correlation coefficients, often denoted \rho or r, measuring the degree of correlation.
A correlation coefficient is a numerical measure of some type of correlation, meaning a statistical relationship between two variables.

Covariance

The Pearson correlation coefficient is obtained by dividing the covariance of the two variables by the product of their standard deviations.
The sign of the covariance therefore shows the tendency in the linear relationship between the variables.
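In symbols, with E denoting expectation and \mu and \sigma the means and standard deviations, the two statements above amount to:

    \operatorname{cov}(X, Y) = \mathrm{E}\big[(X - \mu_X)(Y - \mu_Y)\big],
    \qquad
    \rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}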

Francis Galton

Karl Pearson developed the coefficient from a similar but slightly different idea by Francis Galton.
He also created the statistical concept of correlation and widely promoted regression toward the mean.

Spearman's rank correlation coefficient

Rank correlation coefficients, such as Spearman's rank correlation coefficient and Kendall's rank correlation coefficient, measure the extent to which, as one variable increases, the other variable tends to increase, without requiring that increase to be represented by a linear relationship.
In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter \rho (rho) or as r_s, is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables).
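A minimal sketch of the rank-then-correlate idea (Python with NumPy and SciPy; spearman_rho is an illustrative name):

    import numpy as np
    from scipy.stats import rankdata, spearmanr

    def spearman_rho(x, y):
        # Spearman's rho is the Pearson correlation of the ranks (ties get average ranks).
        rx, ry = rankdata(x), rankdata(y)
        return np.corrcoef(rx, ry)[0, 1]

    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = np.exp(x)                 # monotone but highly nonlinear in x
    print(spearman_rho(x, y))     # exactly 1.0: a perfect monotone relationship
    print(spearmanr(x, y)[0])     # SciPy's estimate agrees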

Random variable

The underlying probability space \Omega is a technical device used to guarantee the existence of random variables, sometimes to construct them, and to define notions such as correlation and dependence or independence based on a joint distribution of two or more random variables on the same probability space.

Mutual information

Mutual information can also be applied to measure dependence between two variables; for a contingency table, it is one of the measures of association between the row and column variables.
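A rough histogram-based estimator is sketched below in Python with NumPy; the bin count is an arbitrary assumption, and mutual_information is an illustrative name:

    import numpy as np

    def mutual_information(x, y, bins=16):
        # Estimate I(X; Y) in nats from a 2-D histogram of the samples.
        pxy, _, _ = np.histogram2d(x, y, bins=bins)
        pxy /= pxy.sum()                     # joint probabilities
        px = pxy.sum(axis=1, keepdims=True)  # marginal of X
        py = pxy.sum(axis=0, keepdims=True)  # marginal of Y
        nz = pxy > 0                         # avoid log(0)
        return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

    rng = np.random.default_rng(2)
    x = rng.normal(size=10000)
    print(mutual_information(x, x + 0.1 * rng.normal(size=10000)))  # large: strong dependence
    print(mutual_information(x, rng.normal(size=10000)))            # near 0: independent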

Karl Pearson

Pearson's techniques, which are widely used today for statistical analysis, include the chi-squared test, standard deviation, and correlation and regression coefficients.

Distance correlation

Distance correlation was introduced in 2005 by Gábor J. Székely to address a deficiency of Pearson's correlation: it can be zero for dependent random variables. Zero distance correlation, by contrast, implies independence.
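A minimal NumPy sketch for univariate samples, following the usual double-centering construction (distance_correlation is an illustrative name, not a library function):

    import numpy as np

    def distance_correlation(x, y):
        # Sample distance correlation; the population version is 0 only under independence.
        x = np.asarray(x, dtype=float)[:, None]
        y = np.asarray(y, dtype=float)[:, None]
        a = np.abs(x - x.T)  # pairwise distance matrices
        b = np.abs(y - y.T)
        # Double-center each distance matrix.
        A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
        B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
        dcov2 = (A * B).mean()
        return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

    rng = np.random.default_rng(3)
    x = rng.uniform(-1, 1, size=1000)
    y = x ** 2                         # dependent, yet linearly uncorrelated
    print(np.corrcoef(x, y)[0, 1])     # near 0
    print(distance_correlation(x, y))  # clearly positive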

Bivariate data

Correlations between the two variables are described as strong or weak and are rated on a scale of −1 to 1, where 1 is a perfect direct correlation, −1 is a perfect inverse correlation, and 0 is no correlation.

Polychoric correlation

In statistics, polychoric correlation is a technique for estimating the correlation between two theorised normally distributed continuous latent variables from two observed ordinal variables.

Rank correlation

Some of the more popular rank correlation statistics include Spearman's \rho and Kendall's \tau; a naive implementation of the latter is sketched below.
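For illustration, here is an O(n^2) sketch of Kendall's tau (the tau-a form, with no tie correction; kendall_tau is an illustrative name):

    import numpy as np
    from itertools import combinations

    def kendall_tau(x, y):
        # (concordant pairs - discordant pairs) / total pairs.
        n = len(x)
        s = sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
                for i, j in combinations(range(n), 2))
        return s / (n * (n - 1) / 2)

    rng = np.random.default_rng(4)
    x = rng.normal(size=200)
    print(kendall_tau(x, x ** 3))  # 1.0: x -> x**3 is strictly increasing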

Scaled correlation

In statistics, scaled correlation is a form of correlation coefficient applicable to data with a temporal component, such as time series. Scaled correlation exploits the coefficient's sensitivity to the range of the data in order to pick out correlations between the fast components of the time series.
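A minimal sketch of the idea, assuming non-overlapping segments of a fixed length s and ignoring edge effects (scaled_correlation is an illustrative name):

    import numpy as np

    def scaled_correlation(x, y, s):
        # Average the Pearson correlation over segments of length s, so slow
        # trends shared across segments cannot dominate the estimate.
        rs = [np.corrcoef(x[i*s:(i+1)*s], y[i*s:(i+1)*s])[0, 1]
              for i in range(len(x) // s)]
        return float(np.mean(rs))

    rng = np.random.default_rng(5)
    t = np.arange(2000)
    fast = rng.normal(size=t.size)
    x = 3 * np.sin(t / 300) + fast         # shared slow trend + fast component
    y = 3 * np.sin(t / 300) - fast         # same trend, anti-correlated fast part
    print(np.corrcoef(x, y)[0, 1])         # positive: dominated by the slow trend
    print(scaled_correlation(x, y, s=20))  # near -1: the fast components disagree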

Coefficient of determination

For the case of a linear model with a single independent variable, the coefficient of determination (R squared) is the square of r_{xy}, Pearson's product-moment coefficient. The coefficient of determination generalizes the correlation coefficient for relationships beyond simple linear regression to multiple regression.
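A quick numerical check of that identity, sketched in Python with NumPy:

    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.normal(size=300)
    y = 1.5 * x + rng.normal(size=300)

    b, a = np.polyfit(x, y, deg=1)   # least-squares slope and intercept
    resid = y - (a + b * x)
    r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

    r = np.corrcoef(x, y)[0, 1]
    print(r2, r ** 2)                # the two agree for a single-predictor linear fit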
A caution that applies to R^2, as to other statistical descriptions of correlation and association, is that "correlation does not imply causation."

Copula (probability theory)

The Randomized Dependence Coefficient is a computationally efficient, copula-based measure of dependence between multivariate random variables.
For a given correlation matrix R, the Gaussian copula with parameter matrix R can be written as

    C_R^{\text{Gauss}}(u) = \Phi_R\left(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d)\right),

where \Phi^{-1} is the quantile function of the standard normal distribution and \Phi_R is the joint cumulative distribution function of a multivariate normal distribution with zero mean vector and covariance matrix equal to R.
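For intuition, one can sample from a Gaussian copula by drawing correlated normals and pushing each coordinate through the standard normal CDF; a sketch in Python with NumPy and SciPy:

    import numpy as np
    from scipy.stats import norm

    R = np.array([[1.0, 0.8],
                  [0.8, 1.0]])  # parameter (correlation) matrix
    rng = np.random.default_rng(7)
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=R, size=10000)
    u = norm.cdf(z)             # each column is Uniform(0, 1)
    print(np.corrcoef(u[:, 0], u[:, 1])[0, 1])  # dependence survives the transform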

Autocorrelation

Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of delay.
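A minimal sketch of the naive estimator in Python with NumPy, normalizing each lag by the overall variance (autocorrelation is an illustrative name):

    import numpy as np

    def autocorrelation(x, max_lag):
        # Correlation of the signal with a copy of itself delayed by each lag.
        x = np.asarray(x, dtype=float) - np.mean(x)
        var = (x * x).mean()
        return np.array([(x[:x.size - k] * x[k:]).mean() / var
                         for k in range(max_lag + 1)])

    rng = np.random.default_rng(8)
    t = np.arange(500)
    signal = np.sin(2 * np.pi * t / 50) + 0.3 * rng.normal(size=t.size)
    acf = autocorrelation(signal, max_lag=100)
    print(acf[0], acf[50], acf[25])  # 1.0 at lag 0, a peak near the 50-sample period, a dip at the half period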

Multivariate normal distribution

The correlation coefficient completely defines the dependence structure only in very particular cases, for example when the distribution is a multivariate normal distribution.
The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables, each of which clusters around a mean value.

Canonical correlation

If we have two vectors X = (X 1, ..., X n ) and Y = (Y 1, ..., Y m ) of random variables, and there are correlations among the variables, then canonical-correlation analysis will find linear combinations of X and Y which have maximum correlation with each other.
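A brief sketch using scikit-learn's CCA; the two blocks below are noisy views of one shared latent signal, which the first canonical pair should recover:

    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(9)
    shared = rng.normal(size=(500, 1))  # latent signal common to both blocks
    X = np.hstack([shared + 0.5 * rng.normal(size=(500, 1)) for _ in range(3)])
    Y = np.hstack([shared + 0.5 * rng.normal(size=(500, 1)) for _ in range(2)])

    cca = CCA(n_components=1)
    x_scores, y_scores = cca.fit_transform(X, Y)
    print(np.corrcoef(x_scores[:, 0], y_scores[:, 0])[0, 1])  # first canonical correlation: high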

Uncorrelatedness (probability theory)

For example, suppose the random variable X is symmetrically distributed about zero and Y = X^2. Then Y is completely determined by X, so that X and Y are perfectly dependent, but their correlation is zero; they are uncorrelated.
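A quick numerical illustration of this example in Python with NumPy:

    import numpy as np

    rng = np.random.default_rng(10)
    x = rng.normal(size=100_000)    # symmetric about 0, so E[X^3] = 0
    y = x ** 2                      # completely determined by x

    print(np.corrcoef(x, y)[0, 1])  # ~0: perfectly dependent yet uncorrelated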

Simple linear regression

In simple linear regression, the fitted slope equals the sample correlation coefficient multiplied by the ratio of the standard deviations of y and x. See sample correlation coefficient for additional details.

Scatter plot

A classic illustration is Anscombe's quartet, a set of four different pairs of variables created by Francis Anscombe: the four scatter plots reveal strikingly different relationships even though the pairs share nearly identical correlation coefficients.
If no dependent variable exists, either type of variable can be plotted on either axis and a scatter plot will illustrate only the degree of correlation (not causation) between two variables.
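A small sketch with NumPy and Matplotlib, plotting synthetic pairs with strong positive, near-zero, and strong negative correlation:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(11)
    x = rng.normal(size=200)

    fig, axes = plt.subplots(1, 3, figsize=(9, 3))
    for ax, r in zip(axes, (0.9, 0.0, -0.9)):
        # Build y so its population correlation with x is r.
        y = r * x + np.sqrt(1 - r ** 2) * rng.normal(size=x.size)
        ax.scatter(x, y, s=8)
        ax.set_title(f"r = {np.corrcoef(x, y)[0, 1]:.2f}")
    plt.tight_layout()
    plt.show()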