# Correlation and dependence

**correlation; correlated; correlate; correlations; association; correlation matrix; associated; positive correlation; uncorrelated; linear relationship**

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data.

### Human height

**height; growth spurt; stature**

Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product in limited supply and its price.

A particular genetic profile in men called Y haplotype I-M170 is correlated with height.

### Correlation does not imply causation

**causation; correlation; correlation implies causation**

However, in general, the presence of a correlation is not sufficient to infer the presence of a causal relationship (i.e., correlation does not imply causation).

In statistics, many tests calculate correlations between variables, and when two variables are found to be correlated, it is tempting to assume that one variable causes the other.

### Statistics

**statistical; statistical analysis; statistician**

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data.

These inferences may take the form of: answering yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within the data (correlation) and modeling relationships within the data (for example, using regression analysis).

### Pearson correlation coefficient

**correlation coefficient; correlation; Pearson correlation**

The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables (which may be present even when one variable is a nonlinear function of the other).

In statistics, the Pearson correlation coefficient (PCC), also referred to as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), or the bivariate correlation, is a measure of the linear correlation between two variables X and Y.

### Causality

**causal; causation; cause and effect**

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data.

Alternative methods of structure learning search through the many possible causal structures among the variables, and remove ones which are strongly incompatible with the observed correlations.

### Correlation coefficient

**correlation; correlated; correlation coefficients**

There are several correlation coefficients, often denoted \rho or r, measuring the degree of correlation.

A correlation coefficient is a numerical measure of some type of correlation, meaning a statistical relationship between two variables.

### Covariance

**covariant; covariation; covary**

It is obtained by dividing the covariance of the two variables by the product of their standard deviations.

The sign of the covariance therefore shows the tendency in the linear relationship between the variables.
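The "covariance divided by the product of the standard deviations" recipe can be sketched in a few lines of plain Python. The function name `pearson_r` and the sample numbers below are illustrative assumptions, not taken from the source, and the sketch uses the sample (n - 1) convention throughout.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation: cov(x, y) / (std(x) * std(y))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and variances, all with the same n - 1 divisor.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    # Undefined (division by zero) if either sample is constant.
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative data: an increasing and a decreasing linear relation.
xs = [1.0, 2.0, 3.0, 4.0]
r_up = pearson_r(xs, [2.0, 4.0, 6.0, 8.0])
r_down = pearson_r(xs, [8.0, 6.0, 4.0, 2.0])
print(r_up, r_down)
```

The sign of the result matches the sign of the covariance, since the denominator is always positive.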

### Francis Galton

**Sir Francis Galton; Galton; Galton, Francis**

Karl Pearson developed the coefficient from a similar but slightly different idea by Francis Galton.

He also created the statistical concept of correlation and widely promoted regression toward the mean.

### Spearman's rank correlation coefficient

**rank correlation coefficient; Spearman; Spearman's rho**

Rank correlation coefficients, such as Spearman's rank correlation coefficient and Kendall's rank correlation coefficient, measure the extent to which, as one variable increases, the other variable tends to increase, without requiring that increase to be represented by a linear relationship.

In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter \rho (rho) or as r_s, is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables).
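One common way to compute Spearman's rho is to replace each value by its rank and then take the Pearson correlation of the ranks. The sketch below assumes that approach, with tied values receiving the average of their positions; the helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Pearson correlation applied to the ranks of the data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotone but nonlinear relationship (y = x cubed).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

Because the relationship is strictly increasing, the ranks agree exactly and rho is 1, while Pearson's r on the raw values stays below 1.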

### Random variable

**random variables; random variation; random**

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data.

The underlying probability space \Omega is a technical device used to guarantee the existence of random variables, sometimes to construct them, and to define notions such as correlation and dependence or independence based on a joint distribution of two or more random variables on the same probability space.

### Mutual information

**information; algorithmic mutual information; an analogue of mutual information for Kolmogorov complexity**

Mutual information can also be applied to measure dependence between two variables.

Mutual information is one of the measures of association or correlation between the row and column variables.
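For a contingency table of counts, mutual information between the row and column variables can be estimated by plugging the observed frequencies into the usual definition, I = sum over cells of p(x, y) * log2(p(x, y) / (p(x) p(y))). The following is a minimal sketch under that assumption; the function name `mutual_information` and the example tables are illustrative only.

```python
import math

def mutual_information(table):
    """Plug-in mutual information (in bits) of a table of joint counts."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count == 0:
                continue  # an empty cell contributes nothing to the sum
            p_xy = count / total
            p_x = row_sums[i] / total
            p_y = col_sums[j] / total
            mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

# Rows and columns independent: every cell equals the product of its margins.
independent = [[10, 10], [10, 10]]
# Row fully determines column: all mass on the diagonal.
dependent = [[50, 0], [0, 50]]
print(mutual_information(independent), mutual_information(dependent))
```

Unlike a correlation coefficient, this quantity has no sign; it only reports how far the joint distribution is from the product of its margins.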

### Karl Pearson

**Pearson; Pearson, Karl; Carl Pearson**

Karl Pearson developed the coefficient from a similar but slightly different idea by Francis Galton.

These techniques, which are widely used today for statistical analysis, include the chi-squared test, standard deviation, and correlation and regression coefficients.

### Distance correlation

**distance standard deviation; distance covariance**

Distance correlation was introduced to address the deficiency of Pearson's correlation that it can be zero for dependent random variables; zero distance correlation implies independence.

Distance correlation was introduced in 2005 by Gábor J. Székely in several lectures to address this deficiency of Pearson’s correlation, namely that it can easily be zero for dependent variables.
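As a rough illustration, the sample distance correlation of two one-dimensional samples can be computed from double-centered pairwise distance matrices. The sketch below assumes that standard sample formula; the function names and the data (x symmetric about zero, y equal to x squared) are illustrative choices, not part of the source.

```python
import math

def _centered_distances(values):
    """Pairwise |a - b| matrix with row, column, and grand means removed."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def distance_correlation(xs, ys):
    n = len(xs)
    a = _centered_distances(xs)
    b = _centered_distances(ys)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x2 = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvar_y2 = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    denom = math.sqrt(dvar_x2 * dvar_y2)
    if denom == 0.0:
        return 0.0  # at least one sample is constant
    return math.sqrt(max(dcov2, 0.0) / denom)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]  # fully determined by xs, yet Pearson's r is 0 here
dcor = distance_correlation(xs, ys)
print(dcor)
```

For this sample Pearson's r is exactly zero, while the distance correlation is clearly positive, which is the deficiency the measure was designed to address.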

### Bivariate data

**bivariate; two-**

Correlations between the two variables are classified as strong or weak and are rated on a scale of –1 to 1, where 1 is a perfect direct correlation, –1 is a perfect inverse correlation, and 0 is no correlation.

### Polychoric correlation

**Tetrachoric correlation; tetrachoric correlation coefficient**

The polychoric correlation is another correlation applied to ordinal data that aims to estimate the correlation between theorised latent variables.

In statistics, polychoric correlation is a technique for estimating the correlation between two theorised normally distributed continuous latent variables, from two observed ordinal variables.

### Rank correlation

**ordinal association; rank correlation coefficient; rank regression**

Rank correlation coefficients, such as Spearman's rank correlation coefficient and Kendall's rank correlation coefficient, measure the extent to which, as one variable increases, the other variable tends to increase, without requiring that increase to be represented by a linear relationship.

Some of the more popular rank correlation statistics include

### Scaled correlation

For example, scaled correlation is designed to use the sensitivity to the range in order to pick out correlations between fast components of time series.

In statistics, scaled correlation is a form of a coefficient of correlation applicable to data that have a temporal component such as time series.

### Coefficient of determination

**R^2; explained**

For the case of a linear model with a single independent variable, the coefficient of determination (R squared) is the square of r_{xy}, Pearson's product-moment coefficient. The coefficient of determination generalizes the correlation coefficient for relationships beyond simple linear regression to multiple regression.

A caution that applies to R^2, as to other statistical descriptions of correlation and association, is that "correlation does not imply causation."
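The identity between R squared and the square of Pearson's r in simple linear regression can be checked numerically. The sketch below fits a least-squares line with an intercept, computes R^2 as 1 - SS_res / SS_tot, and compares it with r squared; the function names and the data points are made up for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """Coefficient of determination of the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative noisy, roughly linear data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(r_squared(xs, ys), pearson_r(xs, ys) ** 2)
```

The two printed values agree up to floating-point rounding; the equality relies on the model including an intercept and a single explanatory variable.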

### Copula (probability theory)

**copula; copulas; Gaussian copula**

The Randomized Dependence Coefficient is a computationally efficient, copula-based measure of dependence between multivariate random variables.

For a given correlation matrix, the Gaussian copula with parameter matrix R can be written as

### Autocorrelation

**autocorrelation function; serial correlation; autocorrelated**

Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of delay.
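A simple estimate of the autocorrelation at a given lag correlates the mean-removed series with a shifted copy of itself and normalizes by the lag-0 term. The sketch below assumes that common (biased) sample estimator; the function name `autocorrelation` and the alternating test signal are illustrative.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at a non-negative lag, normalized so lag 0 gives 1."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    # Products of each point with the point `lag` steps later.
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom

# An alternating signal: strongly negative at odd lags, positive at even lags.
signal = [1.0, -1.0] * 4
for k in range(3):
    print(k, autocorrelation(signal, k))
```

Evaluating the function over a range of lags gives the autocorrelation as a function of delay.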

### Multivariate normal distribution

**multivariate normal; bivariate normal distribution; jointly normally distributed**

The correlation coefficient completely defines the dependence structure only in very particular cases, for example when the distribution is a multivariate normal distribution.

The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.

### Canonical correlation

**canonical correlation analysis; CCA; canonical correlation analysis (CCA)**

If we have two vectors X = (X_1, ..., X_n) and Y = (Y_1, ..., Y_m) of random variables, and there are correlations among the variables, then canonical-correlation analysis will find linear combinations of X and Y which have maximum correlation with each other.

### Uncorrelatedness (probability theory)

**uncorrelated**

Then Y is completely determined by X, so that X and Y are perfectly dependent, but their correlation is zero; they are uncorrelated.
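A small numerical check makes this concrete. The sketch below assumes one familiar setup that fits the statement: X takes values placed symmetrically about zero and Y equals X squared. That setup, the function name `pearson_r`, and the values are assumptions for illustration, since the excerpt does not spell out the construction.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# X is symmetric about zero; Y is a deterministic function of X (assumed setup).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
print(r)  # contributions from negative and positive x cancel
```

Knowing X pins down Y exactly, yet the linear measure reports no relationship because the positive and negative halves cancel in the covariance.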

### Simple linear regression

**simple regression; i.e. regression line; linear least squares regression with an intercept term and a single explanator**

The coefficient of determination generalizes the correlation coefficient for relationships beyond simple linear regression to multiple regression.

See sample correlation coefficient for additional details.

### Scatter plot

**scatterplot; scatter plots; scatter diagram**

The adjacent image shows scatter plots of Anscombe's quartet, a set of four different pairs of variables created by Francis Anscombe.

If no dependent variable exists, either type of variable can be plotted on either axis and a scatter plot will illustrate only the degree of correlation (not causation) between two variables.