Correlation and dependence

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data.
Related articles

Human height

Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a limited supply product and its price.
A particular genetic profile in men called Y haplotype I-M170 is correlated with height.

Correlation does not imply causation

In general, the presence of a correlation is not sufficient to infer the presence of a causal relationship (i.e., correlation does not imply causation).
In statistics, the phrase "correlation does not imply causation" refers to the inability to legitimately deduce a cause-and-effect relationship between two variables solely on the basis of an observed association or correlation between them.

Statistics

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data.
Statistical inferences may take the form of: answering yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within the data (correlation), and modeling relationships within the data (for example, using regression analysis).

Pearson correlation coefficient

The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables (which may be present even when one variable is a nonlinear function of the other).
In statistics, the Pearson correlation coefficient (PCC), also referred to as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC) or the bivariate correlation, is a measure of the linear correlation between two variables X and Y.
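
A minimal sketch of computing Pearson's r, assuming SciPy is available; the data arrays are invented for illustration.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # roughly linear in x

r, p_value = stats.pearsonr(x, y)  # Pearson's r and a two-sided p-value
print(r)  # close to +1: a strong positive linear relationship
```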

Correlation coefficient

There are several correlation coefficients, often denoted ρ or r, measuring the degree of correlation.
A correlation coefficient is a numerical measure of some type of correlation, meaning a statistical relationship between two variables.

Causality

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data.
Alternative methods of structure learning search through the many possible causal structures among the variables, and remove ones which are strongly incompatible with the observed correlations.

Covariance

Mathematically, the correlation coefficient is obtained by dividing the covariance of the two variables by the product of their standard deviations.
The sign of the covariance therefore shows the tendency in the linear relationship between the variables.
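
A short sketch of the normalization described above, with invented data: dividing the sample covariance by the product of the sample standard deviations gives Pearson's r.

```python
import numpy as np

x = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
y = np.array([3.0, 4.5, 2.5, 6.0, 3.5])

cov_xy = np.cov(x, y, ddof=1)[0, 1]  # sample covariance of x and y
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(np.sign(cov_xy))  # the sign alone gives the direction of the linear trend
print(r)                # the normalized value always lies in [-1, 1]
```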

Spearman's rank correlation coefficient

Other correlation coefficients, such as Spearman's rank correlation, have been developed to be more robust than the Pearson correlation, that is, more sensitive to nonlinear relationships. Rank correlation coefficients, such as Spearman's rank correlation coefficient and Kendall's rank correlation coefficient, measure the extent to which, as one variable increases, the other variable tends to increase, without requiring that increase to be represented by a linear relationship.
In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as r_s, is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables).
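
A brief sketch contrasting Spearman's rho with Pearson's r on monotone but nonlinear data, assuming SciPy; the data are invented.

```python
import numpy as np
from scipy import stats

x = np.arange(1.0, 9.0)
y = x ** 3  # nonlinear, but perfectly monotone in x

rho, _ = stats.spearmanr(x, y)
r, _ = stats.pearsonr(x, y)
print(rho)  # 1.0: the rankings agree exactly
print(r)    # below 1.0: Pearson's r is reduced by the nonlinearity
```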

Francis Galton

Karl Pearson developed the coefficient from a similar but slightly different idea by Francis Galton.
He also created the statistical concept of correlation and widely promoted regression toward the mean.

Odds ratio

For two binary variables, the odds ratio measures their dependence and takes values in the non-negative numbers, possibly infinity.
An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of A in the presence of B to the odds of A in the absence of B, or equivalently (due to symmetry), the ratio of the odds of B in the presence of A to the odds of B in the absence of A. Two events are independent if and only if the OR equals 1, i.e., the odds of one event are the same in either the presence or absence of the other event.
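
A minimal sketch of the definition above for a hypothetical 2x2 table of counts; all numbers are invented.

```python
# Rows: event A present / absent; columns: event B present / absent.
table = [[20.0, 10.0],   # A
         [ 5.0, 15.0]]   # not A

odds_A_given_B     = table[0][0] / table[1][0]  # odds of A when B is present
odds_A_given_not_B = table[0][1] / table[1][1]  # odds of A when B is absent

OR = odds_A_given_B / odds_A_given_not_B
print(OR)  # 6.0 here; OR = 1 would mean A and B are independent
```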

Random variable

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data.
The underlying probability space \Omega is a technical device used to guarantee the existence of random variables, sometimes to construct them, and to define notions such as correlation and dependence or independence based on a joint distribution of two or more random variables on the same probability space.

Mutual information

Mutual information can also be applied to measure dependence between two variables. The correlation ratio, entropy-based mutual information, total correlation, dual total correlation and polychoric correlation are all also capable of detecting more general dependencies, as is consideration of the copula between them, while the coefficient of determination generalizes the correlation coefficient to multiple regression.
Mutual information is one of the measures of association or correlation between the row and column variables.
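
A rough sketch of mutual information between two discrete variables, assuming scikit-learn is available; the label sequences are invented.

```python
from sklearn.metrics import mutual_info_score

x = [0, 0, 1, 1, 2, 2, 0, 1]
y = [0, 0, 1, 1, 2, 2, 1, 0]  # mostly tracks x, with two mismatches

mi = mutual_info_score(x, y)  # mutual information, in nats
print(mi)  # 0 would indicate no detected dependence; larger means stronger
```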

Goodman and Kruskal's gamma

Related statistics such as Yule's Y and Yule's Q normalize the odds ratio to the correlation-like range [–1, 1].
It measures the strength of association of the cross tabulated data when both variables are measured at the ordinal level.
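
A sketch of Goodman and Kruskal's gamma computed from concordant and discordant pairs; the ordinal data are invented, and tied pairs contribute to neither count.

```python
from itertools import combinations

x = [1, 1, 2, 2, 3, 3]
y = [1, 2, 1, 3, 2, 3]

nc = nd = 0
for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
    s = (xi - xj) * (yi - yj)
    if s > 0:
        nc += 1  # concordant: both variables move in the same direction
    elif s < 0:
        nd += 1  # discordant: they move in opposite directions

gamma = (nc - nd) / (nc + nd)
print(gamma)  # +1 is perfect agreement, -1 is perfect disagreement
```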

Distance correlation

Distance correlation was introduced to address the deficiency of Pearson's correlation that it can be zero for dependent random variables; zero distance correlation implies independence.
Distance correlation was introduced in 2005 by Gábor J. Székely in several lectures to address this deficiency of Pearson’s correlation, namely that it can easily be zero for dependent variables.
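
A compact NumPy sketch of the sample distance correlation (the simple V-statistic version), with a helper written here for illustration, applied to invented data where Pearson's r is near zero despite a clear dependence.

```python
import numpy as np

def distance_correlation(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])  # pairwise distances within x
    b = np.abs(y[:, None] - y[None, :])  # pairwise distances within y
    # Double-center both distance matrices.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()  # squared sample distance covariance
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

x = np.linspace(-1.0, 1.0, 101)
y = x ** 2                          # dependent on x, but not linearly

print(np.corrcoef(x, y)[0, 1])      # essentially 0
print(distance_correlation(x, y))   # clearly positive: dependence detected
```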

Karl Pearson

Karl Pearson developed the coefficient from a similar but slightly different idea by Francis Galton.
These techniques, which are widely used today for statistical analysis, include the chi-squared test, standard deviation, and correlation and regression coefficients.

Bivariate data

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data.
Correlations between the two variables are classified as strong or weak and are rated on a scale of –1 to 1, where 1 is a perfect direct correlation, –1 is a perfect inverse correlation, and 0 is no correlation.

Polychoric correlation

The correlation ratio, entropy-based mutual information, total correlation, dual total correlation and polychoric correlation are all also capable of detecting more general dependencies, as is consideration of the copula between them, while the coefficient of determination generalizes the correlation coefficient to multiple regression.
In statistics, polychoric correlation is a technique for estimating the correlation between two theorised normally distributed continuous latent variables, from two observed ordinal variables.

Rank correlation

Rank correlation coefficients, such as Spearman's rank correlation coefficient and Kendall's rank correlation coefficient, measure the extent to which, as one variable increases, the other variable tends to increase, without requiring that increase to be represented by a linear relationship.
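
A quick sketch of both rank correlations named above, assuming SciPy; the data are invented, increasing overall with a few local swaps.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 1, 4, 3, 6, 5])  # tends to increase with x, with local swaps

tau, _ = stats.kendalltau(x, y)
rho, _ = stats.spearmanr(x, y)
print(tau, rho)  # both positive: y tends to increase as x does
```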

Scaled correlation

Scaled correlation, for example, is designed to exploit the sensitivity of the correlation coefficient to the range of the data in order to pick out correlations between fast components of time series.
In statistics, scaled correlation is a form of correlation coefficient applicable to data that have a temporal component, such as time series.
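
A rough sketch of the idea, under my reading of the method: split both series into short segments, compute Pearson's r within each segment, and average, so that slow trends drop out. The series, the segment length, and the helper name are all invented.

```python
import numpy as np

def scaled_correlation(x, y, scale):
    """Average Pearson's r over non-overlapping segments of length `scale`."""
    rs = []
    for start in range(0, len(x) - scale + 1, scale):
        xs, ys = x[start:start + scale], y[start:start + scale]
        if np.std(xs) > 0 and np.std(ys) > 0:  # skip degenerate segments
            rs.append(np.corrcoef(xs, ys)[0, 1])
    return float(np.mean(rs))

t = np.arange(1000)
fast = np.sin(t / 3.0)
x = fast + 0.01 * t                  # shared fast component, opposite slow trends
y = fast - 0.01 * t

print(np.corrcoef(x, y)[0, 1])       # slow trends dominate: strongly negative
print(scaled_correlation(x, y, 30))  # fast component recovered: strongly positive
```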

Coefficient of determination

The correlation ratio, entropy-based mutual information, total correlation, dual total correlation and polychoric correlation are all also capable of detecting more general dependencies, as is consideration of the copula between them, while the coefficient of determination generalizes the correlation coefficient to multiple regression. For the case of a linear model with a single independent variable, the coefficient of determination (R²) is the square of r_xy, Pearson's product-moment coefficient.
A caution that applies to R², as to other statistical descriptions of correlation and association, is that "correlation does not imply causation."
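
A small numerical check of that single-predictor claim, with invented data: for a least-squares line with one independent variable, R² equals the square of Pearson's r.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

r = np.corrcoef(x, y)[0, 1]
slope, intercept = np.polyfit(x, y, 1)    # least-squares fit
residuals = y - (slope * x + intercept)
r2 = 1.0 - residuals.var() / y.var()      # coefficient of determination

print(r ** 2, r2)  # the two values agree
```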

Copula (probability theory)

The correlation ratio, entropy-based mutual information, total correlation, dual total correlation and polychoric correlation are all also capable of detecting more general dependencies, as is consideration of the copula between them, while the coefficient of determination generalizes the correlation coefficient to multiple regression. The Randomized Dependence Coefficient is a computationally efficient, copula-based measure of dependence between multivariate random variables.
For a given correlation matrix R, the Gaussian copula with parameter matrix R can be written as C_R^Gauss(u) = Φ_R(Φ⁻¹(u_1), ..., Φ⁻¹(u_d)), where Φ⁻¹ is the inverse cumulative distribution function of a standard normal and Φ_R is the joint cumulative distribution function of a multivariate normal distribution with mean vector zero and covariance matrix equal to R.
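
A sketch of sampling from a bivariate Gaussian copula by pushing correlated normals through Φ, assuming SciPy; the parameter matrix and sample size are invented.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
R = np.array([[1.0, 0.7],
              [0.7, 1.0]])  # parameter (correlation) matrix

z = rng.multivariate_normal(np.zeros(2), R, size=5000)  # correlated normals
u = norm.cdf(z)  # each margin is uniform on [0, 1]; the joint law is the copula

print(np.corrcoef(u[:, 0], u[:, 1])[0, 1])  # dependence carried over from R
```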

Autocorrelation

Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of delay.
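
A minimal sketch of estimating the autocorrelation of a noisy periodic signal at each lag with NumPy; the signal is invented.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20) + 0.3 * rng.standard_normal(200)

s = signal - signal.mean()
acf = np.correlate(s, s, mode="full")[len(s) - 1:]  # lags 0, 1, 2, ...
acf = acf / acf[0]                                  # normalize so acf[0] == 1

print(acf[20])  # clearly positive: the signal resembles itself one period later
```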

Multivariate normal distribution

However, in the special case when X and Y are jointly normal, uncorrelatedness is equivalent to independence.
The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.
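
A sketch of the classic caveat behind that equivalence, using a standard textbook construction: X and Y below are each normal and uncorrelated, yet dependent, precisely because they are not jointly normal. The sample size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(100_000)
s = rng.choice([-1.0, 1.0], size=100_000)  # random sign, independent of x
y = s * x                                  # y is also standard normal

print(np.corrcoef(x, y)[0, 1])                  # ~0: uncorrelated
print(np.corrcoef(np.abs(x), np.abs(y))[0, 1])  # 1.0: clearly dependent
```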

Canonical correlation

If we have two vectors X = (X_1, ..., X_n) and Y = (Y_1, ..., Y_m) of random variables, and there are correlations among the variables, then canonical-correlation analysis will find linear combinations of X and Y which have maximum correlation with each other.
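
A sketch of canonical-correlation analysis with scikit-learn's CCA; the shared latent signal and noise levels are invented.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(3)
latent = rng.standard_normal((500, 1))  # signal shared by both blocks

X = np.hstack([latent + 0.1 * rng.standard_normal((500, 1)) for _ in range(3)])
Y = np.hstack([latent + 0.1 * rng.standard_normal((500, 1)) for _ in range(2)])

cca = CCA(n_components=1)
Xc, Yc = cca.fit_transform(X, Y)  # maximally correlated linear combinations

print(np.corrcoef(Xc[:, 0], Yc[:, 0])[0, 1])  # close to 1
```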

Correlation function

A correlation function is a function that gives the statistical correlation between random variables, contingent on the spatial or temporal distance between those variables.