Pearson correlation coefficient

In statistics, the Pearson correlation coefficient (PCC), also referred to as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC) or the bivariate correlation, is a measure of the linear correlation between two variables X and Y.

Covariance

Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. The population Pearson correlation coefficient is defined in terms of moments, and therefore exists for any bivariate probability distribution for which the population covariance is defined and the marginal population variances are defined and are non-zero.
Unlike the covariance, whose magnitude depends on the units in which the variables are measured, its normalized version, the correlation coefficient, shows by its magnitude the strength of the linear relation.
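Below is a minimal sketch of the sample version of this definition in plain Python. The function name `pearson_r` is a hypothetical helper, not a library API. The 1/(n − 1) factors in the covariance and in both standard deviations cancel, so the code works directly with sums of deviations from the means.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation: covariance over the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations: (n - 1) times the sample covariance.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations: (n - 1) times each sample variance.
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable has zero variance")
    # The (n - 1) factors cancel between numerator and denominator.
    return s_xy / math.sqrt(s_xx * s_yy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # about 0.7746
```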

Negative relationship

The correlation coefficient is negative (anti-correlation) if $X_i$ and $Y_i$ tend to lie on opposite sides of their respective means.
Negative correlation can be seen geometrically when two normalized random vectors are viewed as points on a sphere, and the correlation between them is the cosine of the arc of separation of the points on the sphere.

Statistics

Pearson developed the Pearson product-moment correlation coefficient (defined as a product moment), the method of moments for fitting distributions to samples, and the Pearson distribution, among many other things.

Fisher transformation

In practice, confidence intervals and hypothesis tests relating to ρ are usually carried out using the Fisher transformation, the inverse hyperbolic tangent (artanh) of r: $F(r) = \tfrac{1}{2}\ln\frac{1+r}{1-r} = \operatorname{artanh}(r)$.
In statistics, hypotheses about the value of the population correlation coefficient ρ between variables X and Y can be tested using the Fisher transformation (also known as the Fisher z-transformation) applied to the sample correlation coefficient.
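Below is a minimal sketch of a Fisher-transformation confidence interval for ρ. It assumes roughly bivariate normal data, |r| < 1 and n > 3, and uses the usual large-sample approximation that artanh(r) is normally distributed with standard error 1/√(n − 3). The helper name `fisher_ci` and the default 95% critical value are illustrative choices.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for rho via the Fisher transformation.

    Assumes roughly bivariate normal data, |r| < 1 and n > 3; z_crit = 1.959964
    is the standard normal quantile for a two-sided 95% interval.
    """
    if not -1 < r < 1 or n <= 3:
        raise ValueError("need |r| < 1 and n > 3")
    z = math.atanh(r)            # F(r) = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)    # approximate standard error of F(r)
    # Build the interval on the z scale, then map it back with tanh.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


low, high = fisher_ci(0.5, 50)
print(low, high)
```

Because tanh is nonlinear, the resulting interval is not symmetric around r unless r = 0.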

Karl Pearson

It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s.
The correlation coefficient (first conceived by Francis Galton) was defined as a product moment, and its relationship with linear regression was studied.

Multivariate normal distribution

For pairs from an uncorrelated bivariate normal distribution, the sampling distribution of the statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$, a function of Pearson's correlation coefficient r computed from n pairs, follows Student's t-distribution with n − 2 degrees of freedom.
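Below is a sketch of this test statistic. It assumes the n pairs come from a bivariate normal distribution, and `t_statistic` is a hypothetical helper name. Under the null hypothesis ρ = 0, the result can be compared against Student's t-distribution with n − 2 degrees of freedom.

```python
import math


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2) for a sample correlation r from n pairs."""
    if not -1 < r < 1 or n < 3:
        raise ValueError("need |r| < 1 and n >= 3")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


print(t_statistic(0.5, 14))  # approximately 2.0
```

If SciPy is available, a two-sided p-value could be obtained as `2 * scipy.stats.t.sf(abs(t), n - 2)`.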
where ρ is the correlation between X and Y and

Coefficient of determination

The square of the sample correlation coefficient is typically denoted $r^2$ and is a special case of the coefficient of determination.
One class of such cases is simple linear regression, where $r^2$ is used instead of $R^2$. When an intercept is included, $r^2$ is simply the square of the sample correlation coefficient r between the observed outcomes and the observed predictor values.
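The equality between $r^2$ and the coefficient of determination can be checked numerically. The sketch below uses a hypothetical helper `r_squared_two_ways` and made-up data. It fits the least-squares line with an intercept and compares $R^2 = 1 - SS_\text{res}/SS_\text{tot}$ with the squared correlation.

```python
import math


def r_squared_two_ways(x, y):
    """Return (r**2, R**2) for a least-squares line with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)   # total sum of squares, SS_tot
    r = s_xy / math.sqrt(s_xx * s_yy)
    # Least-squares fit y_hat = intercept + slope * x.
    slope = s_xy / s_xx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return r * r, 1 - ss_res / s_yy


print(r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # both about 0.6
```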

Cosine similarity

The uncentered correlation coefficient, computed without subtracting the means, is identical to the cosine similarity.
If the attribute vectors are normalized by subtracting the vector means (e.g., $A - \bar{A}$), the measure is called the centered cosine similarity and is equivalent to the Pearson correlation coefficient.
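Below is a small sketch contrasting the two measures. The helper names are hypothetical and the data are arbitrary. Centering both vectors before taking the cosine reproduces Pearson's r, while the uncentered cosine generally does not.

```python
import math


def cosine_similarity(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


def centered_cosine_similarity(a, b):
    # Subtract each vector's mean, then take the ordinary cosine.
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([p - mean_a for p in a], [q - mean_b for q in b])


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(cosine_similarity(x, y))           # uncentered, about 0.96
print(centered_cosine_similarity(x, y))  # about 0.7746, equal to Pearson's r here
```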

Resampling (statistics)

In some situations, the bootstrap can be applied to construct confidence intervals, and permutation tests can be applied to carry out hypothesis tests.
Bootstrapping is a statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample, most often with the purpose of deriving robust estimates of standard errors and confidence intervals of a population parameter like a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient.
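Below is a sketch of both resampling approaches for r, using only the standard library:

- a percentile bootstrap interval, which resamples (x, y) pairs with replacement;
- a two-sided permutation test, which shuffles y against x.

Both assume the pairs are exchangeable (independent and identically distributed). The function names, the number of resamples and the fixed seeds are illustrative assumptions. Bootstrap resamples in which a variable is constant are skipped because r is undefined there, so each variable must contain at least two distinct values.

```python
import math
import random


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for r from resampled (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # r is undefined when a resampled variable is constant
        stats.append(pearson_r(xs, ys))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null hypothesis of no association."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any pairing between x and y
        if abs(pearson_r(x, y_perm)) >= observed:
            extreme += 1
    # Adding 1 to both counts keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_perm + 1)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 7, 5, 8, 9, 8, 11]
print(bootstrap_ci(x, y))
print(permutation_p_value(x, y))
```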

Correlation and dependence

The most common of these correlation coefficients is the Pearson correlation coefficient. It is sensitive only to a linear relationship between two variables, and such a relationship may be present even when one variable is a nonlinear function of the other.
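Below is a sketch illustrating this sensitivity on a symmetric grid of x values, with a hypothetical helper `pearson_r`. It covers three cases:

- An exact linear relation gives r = 1.
- A cubic relation still gives a large r because it has a strong linear component.
- y = x², although completely determined by x, gives r = 0.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [-3, -2, -1, 0, 1, 2, 3]
r_linear = pearson_r(xs, [2 * v + 1 for v in xs])
r_cubic = pearson_r(xs, [v ** 3 for v in xs])
r_square = pearson_r(xs, [v ** 2 for v in xs])
print(r_linear)  # 1.0: exact linear relation
print(r_cubic)   # about 0.93: nonlinear but with a strong linear trend
print(r_square)  # 0.0: y is a function of x, yet there is no linear association
```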

Distance correlation

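The population distance correlation is zero only when the two random variables are independent (assuming finite first moments). This is in contrast to Pearson's correlation, which can only detect linear association between two random variables.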
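Below is a sketch of the sample distance correlation of two numeric samples. It follows the usual construction: double-center the pairwise distance matrices, then correlate them. The function names are hypothetical, and the O(n²) memory use only suits small samples. On the y = x² example, where Pearson's r is exactly 0, the distance correlation is clearly positive.

```python
import math


def _double_centered_distances(v):
    """Pairwise |v_j - v_k| with row, column and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row_means = [sum(row) / n for row in d]  # symmetric, so column means are the same
    grand_mean = sum(row_means) / n
    return [[d[j][k] - row_means[j] - row_means[k] + grand_mean for k in range(n)]
            for j in range(n)]


def distance_correlation(x, y):
    n = len(x)
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n ** 2
    dvar_x2 = sum(v * v for row in a for v in row) / n ** 2
    dvar_y2 = sum(v * v for row in b for v in row) / n ** 2
    if dvar_x2 == 0 or dvar_y2 == 0:
        return 0.0
    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(0.0, dcov2) / math.sqrt(dvar_x2 * dvar_y2))


xs = [-3, -2, -1, 0, 1, 2, 3]
print(distance_correlation(xs, [v ** 2 for v in xs]))  # roughly 0.5, although Pearson's r is 0
```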

Normally distributed and uncorrelated does not imply independent

In probability theory, although simple examples illustrate that linear uncorrelatedness of two random variables does not in general imply their independence, it is sometimes mistakenly thought that it does imply that when the two random variables are normally distributed.
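Below is a simulation sketch of the standard counterexample, using only the standard library. X is standard normal and Y = WX, where W is +1 or −1 with equal probability, independently of X.

- Y is then also standard normal.
- The population correlation is E[W]E[X²] = 0.
- Yet |Y| = |X| always, so the two are clearly dependent.

The pair (X, Y) is not jointly normal. The seed and sample size are arbitrary.

```python
import math
import random


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


rng = random.Random(12345)
n = 100_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Flip the sign of each X with probability 1/2, independently of X.
y = [v if rng.random() < 0.5 else -v for v in x]

print(pearson_r(x, y))                              # close to 0
print(all(abs(a) == abs(b) for a, b in zip(x, y)))  # True: |Y| equals |X| every time
```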

Multiple correlation

The coefficient of multiple correlation is the correlation between the values of the dependent variable and the best predictions that can be computed linearly from the predictive variables.
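Below is a sketch for the two-predictor case, assuming ordinary least squares with an intercept. The helper names are hypothetical. After centering, the intercept drops out and the slopes solve a 2 × 2 system, solved here by Cramer's rule. The multiple correlation is then the ordinary Pearson correlation between y and the fitted values.

```python
import math


def _centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def multiple_correlation(y, x1, x2):
    """Multiple correlation R of y on two predictors, least squares with an intercept."""
    yc, c1, c2 = _centered(y), _centered(x1), _centered(x2)
    s11, s12, s22 = _dot(c1, c1), _dot(c1, c2), _dot(c2, c2)
    s1y, s2y = _dot(c1, yc), _dot(c2, yc)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear")
    # Cramer's rule for the two slopes; centering removes the intercept.
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    fitted = [b1 * p + b2 * q for p, q in zip(c1, c2)]  # centered predictions
    # R is the Pearson correlation between y and its best linear predictions.
    return _dot(yc, fitted) / math.sqrt(_dot(yc, yc) * _dot(fitted, fitted))


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
print(multiple_correlation([3, 1, 4, 1, 5], x1, x2))
```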

RV coefficient

The RV coefficient is a multivariate generalization of the squared Pearson correlation coefficient (because the RV coefficient takes values between 0 and 1). It measures the closeness of two sets of points that may each be represented in a matrix.
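Below is a sketch of the RV coefficient for two column-centered data matrices X and Y with the same rows. It uses the Frobenius-norm identity RV = ‖XᵀY‖²_F / (‖XᵀX‖_F ‖YᵀY‖_F). This identity is equivalent to the trace form tr(XXᵀYYᵀ) / √(tr((XXᵀ)²) tr((YYᵀ)²)). Helper names are hypothetical. With a single column in each matrix, the coefficient reduces to the squared Pearson correlation, which the example checks.

```python
import math


def _center_columns(m):
    """Subtract each column's mean from a matrix given as a list of rows."""
    n, p = len(m), len(m[0])
    means = [sum(row[j] for row in m) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in m]


def _t_times(a, b):
    """Return A^T B for two matrices with the same number of rows."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(len(b[0]))]
            for i in range(len(a[0]))]


def _frobenius_sq(m):
    return sum(v * v for row in m for v in row)


def rv_coefficient(x, y):
    xc, yc = _center_columns(x), _center_columns(y)
    numerator = _frobenius_sq(_t_times(xc, yc))  # equals tr(X X^T Y Y^T)
    ssx = _frobenius_sq(_t_times(xc, xc))        # equals tr((X X^T)^2)
    ssy = _frobenius_sq(_t_times(yc, yc))        # equals tr((Y Y^T)^2)
    return numerator / math.sqrt(ssx * ssy)


x = [[1], [2], [3], [4], [5]]
y = [[2], [4], [5], [4], [5]]
print(rv_coefficient(x, y))  # 0.6 (up to rounding), the squared Pearson r of the columns
```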

Probability distribution

F-distribution, the distribution of the ratio of two scaled chi-squared variables; useful, e.g., for inferences that involve comparing variances or that involve R-squared (the squared correlation coefficient)

Simple linear regression

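In this case, the squared correlation coefficient estimates the fraction of the variance in Y that is explained by X in a simple linear regression.
The product-moment correlation coefficient might also be calculated: $r_{xy} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$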
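Below is a sketch of this single-pass formula in plain Python, with a hypothetical helper name and arbitrary data. It needs only running sums. However, the subtractions can lose precision when the values are large relative to their spread, so the mean-centered form is usually preferred numerically.

```python
import math


def pearson_r_from_sums(x, y):
    """Computational formula using only n, the sums, sums of squares and sum of products."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(n * sum_yy - sum_y ** 2)
    return numerator / denominator


print(pearson_r_from_sums([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # 30 / sqrt(1500), about 0.7746
```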

Partial correlation

If a population or data-set is characterized by more than two variables, a partial correlation coefficient measures the strength of dependence between a pair of variables that is not accounted for by the way in which they both change in response to variations in a selected subset of the other variables.
If we compute the Pearson correlation coefficient between variables X and Y, the result is approximately 0.969, while the partial correlation between X and Y, computed with the partial correlation formula, is 0.919.
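Below is a sketch of two equivalent ways to compute a first-order partial correlation of X and Y, controlling for one variable Z:

- the recursive formula based on the three pairwise correlations;
- the correlation of the residuals left after regressing X and Y each on Z.

The small dataset is an illustrative assumption that happens to reproduce the two figures quoted above. Helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def partial_corr_formula(x, y, z):
    """First-order partial correlation from the three pairwise correlations."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def _residuals(v, z):
    """Residuals of v after a least-squares line (with intercept) on z."""
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    slope = sum((c - mz) * (a - mv) for c, a in zip(z, v)) / sum((c - mz) ** 2 for c in z)
    return [a - (mv + slope * (c - mz)) for c, a in zip(z, v)]


def partial_corr_residuals(x, y, z):
    """Correlation of what is left of x and y once z is linearly removed."""
    return pearson_r(_residuals(x, z), _residuals(y, z))


x = [2, 4, 15, 20]
y = [1, 2, 3, 4]
z = [0, 0, 1, 1]
print(pearson_r(x, y))                  # about 0.969
print(partial_corr_formula(x, y, z))    # about 0.919
print(partial_corr_residuals(x, y, z))  # same value, computed from residuals
```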

Spearman's rank correlation coefficient

The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not).
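Below is a sketch of Spearman's coefficient computed as the Pearson correlation of ranks. Tied values are given the average of the ranks they span, a common convention assumed here, and the helper names are hypothetical. For a monotonic but nonlinear relation such as y = x³ on positive x, Spearman's ρ is 1 while Pearson's r is below 1.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def average_ranks(v):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and v[order[end + 1]] == v[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))  # 1.0 up to rounding: the relation is monotonic
print(pearson_r(x, y))     # below 1: the relation is not linear
```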

Quadrant count ratio

The QCR is not commonly used in the practice of statistics; rather, it is a useful tool in statistics education because it can be used as an intermediate step in the development of Pearson's correlation coefficient.
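Below is a sketch of the quadrant count ratio, QCR = (n_I + n_III − n_II − n_IV) / N, where the quadrants are formed by lines through the means of x and y. Points lying exactly on either mean line are counted in N but in no quadrant, which is an assumption of this sketch. The QCR keeps only the sign of each product of deviations, whereas Pearson's r also weights each product by its size.

```python
def quadrant_count_ratio(x, y):
    """(n_I + n_III - n_II - n_IV) / N with quadrants centered at the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    same_sign = opposite_sign = 0
    for a, b in zip(x, y):
        product = (a - mean_x) * (b - mean_y)
        if product > 0:       # quadrant I or III
            same_sign += 1
        elif product < 0:     # quadrant II or IV
            opposite_sign += 1
        # product == 0: the point lies on a mean line and is in no quadrant
    return (same_sign - opposite_sign) / n


x = [1, 2, 3, 4, 5, 6]
print(quadrant_count_ratio(x, [1, 3, 2, 5, 4, 6]))  # 1.0
print(quadrant_count_ratio(x, [6, 4, 5, 2, 3, 1]))  # -1.0
```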

Exchangeable random variables

However, the standard versions of these approaches rely on exchangeability of the data. This means there is no ordering or grouping of the data pairs being analyzed that might affect the behavior of the correlation estimate.
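Let (X, Y) have a bivariate normal distribution with parameters $\mu = 0$, $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, and an arbitrary correlation coefficient $\rho \in (-1, 1)$. The random variables X and Y are then exchangeable, but independent only if $\rho = 0$. The density function is $p(x, y) = p(y, x) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left(-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}\right)$.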
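Below is a sketch of the bivariate normal density with zero means and unit variances. It checks two things numerically:

- Swapping the arguments leaves the density unchanged, which is the exchangeability property.
- The density factorizes into two standard normal densities only when ρ = 0.

The function names are hypothetical.

```python
import math


def std_bivariate_normal_pdf(x, y, rho):
    """Bivariate normal density with zero means, unit variances and correlation rho."""
    quad = (x * x - 2 * rho * x * y + y * y) / (1 - rho * rho)
    return math.exp(-quad / 2) / (2 * math.pi * math.sqrt(1 - rho * rho))


def std_normal_pdf(t):
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


# Swapping the arguments leaves the density unchanged (exchangeability).
print(std_bivariate_normal_pdf(1.3, -0.4, 0.6), std_bivariate_normal_pdf(-0.4, 1.3, 0.6))
# With rho = 0 the density factorizes into two standard normal densities.
print(std_bivariate_normal_pdf(1.3, -0.4, 0.0), std_normal_pdf(1.3) * std_normal_pdf(-0.4))
```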

Anscombe's quartet

The second graph (top right) is not distributed normally; while a relationship between the two variables is obvious, it is not linear, and the Pearson correlation coefficient is not relevant. A more general regression and the corresponding coefficient of determination would be more appropriate.

Cauchy–Schwarz inequality

According to the Cauchy–Schwarz inequality, the coefficient has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.
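The bound can be derived by applying the Cauchy–Schwarz inequality to the vectors of deviations from the means. For the sample coefficient, assuming neither variable is constant:

```latex
\left|\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right|
  \le \sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}
\quad\Longrightarrow\quad
|r| = \frac{\left|\sum_{i}(x_i-\bar{x})(y_i-\bar{y})\right|}
           {\sqrt{\sum_{i}(x_i-\bar{x})^2}\,\sqrt{\sum_{i}(y_i-\bar{y})^2}} \le 1
```

Equality holds exactly when one deviation vector is a multiple of the other, that is, when all points lie on a line. This gives r = 1 for a positive slope and r = −1 for a negative slope.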

Francis Galton

It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s.

Moment (mathematics)

The form of the definition of the population Pearson correlation coefficient involves a "product moment", that is, the mean (the first moment about the origin) of the product of the mean-adjusted random variables; hence the modifier product-moment in the name.
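In symbols, let $\mu_X = \operatorname{E}[X]$ and $\mu_Y = \operatorname{E}[Y]$, with standard deviations $\sigma_X$ and $\sigma_Y$. The population coefficient divides this product moment by the product of the standard deviations:

```latex
\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}
           = \frac{\operatorname{E}\!\left[(X-\mu_X)(Y-\mu_Y)\right]}{\sigma_X \sigma_Y}
```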

Statistical population

Pearson's correlation coefficient, when applied to a population, is commonly represented by the Greek letter ρ (rho) and may be referred to as the population correlation coefficient or the population Pearson correlation coefficient.