Statistical dispersion

In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range. Dispersion is contrasted with location or central tendency, and together they are the most commonly used properties of distributions. A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the same and increases as the data become more diverse. Most measures of dispersion have the same units as the quantity being measured.

Probability distribution

distribution · continuous probability distribution · discrete probability distribution


Variance

sample variance · population variance · variability
In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of (random) numbers are spread out from their average value. Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. Variance is an important tool in the sciences, where statistical analysis of data is common.
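As a minimal sketch (the data here are invented for illustration), the "expected squared deviation from the mean" definition can be written out directly and checked against the standard library:

```python
# Sketch: variance as the average squared deviation from the mean,
# using only the standard-library statistics module.
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu = statistics.fmean(data)

# Population variance: mean of the squared deviations from the mean.
var_manual = sum((x - mu) ** 2 for x in data) / len(data)

# Agrees with the library's population variance.
assert abs(var_manual - statistics.pvariance(data)) < 1e-12
```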

Mean squared error

mean square error · squared error loss · MSE
Two or more statistical models may be compared using their MSEs as a measure of how well they explain a given set of observations. An unbiased estimator (estimated from a statistical model) with the smallest variance among all unbiased estimators is the best unbiased estimator, or MVUE (minimum-variance unbiased estimator). Both linear regression techniques and analysis of variance estimate the MSE as part of the analysis and use the estimated MSE to determine the statistical significance of the factors or predictors under study.
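A hedged sketch of the model-comparison idea; the observations and the two sets of "model" predictions below are invented for illustration:

```python
# Comparing two models by MSE on the same set of observations.
def mse(y_true, y_pred):
    """Mean squared error between observations and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.1, 1.9, 3.2, 3.8]   # hypothetical model A
pred_b = [1.5, 2.5, 2.5, 4.5]   # hypothetical model B

# The model with the smaller MSE explains these observations better.
assert mse(y, pred_a) < mse(y, pred_b)
```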

Data set

dataset · datasets · data sets
These include the number and types of the attributes or variables, and various statistical measures applicable to them, such as standard deviation and kurtosis. The values may be numbers, such as real numbers or integers, for example representing a person's height in centimeters, but may also be nominal data (i.e., not consisting of numerical values), for example representing a person's ethnicity. More generally, values may be of any of the kinds described as a level of measurement. For each variable, the values are normally all of the same kind. However, there may also be missing values, which must be indicated in some way.
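The points above can be sketched with a tiny invented data set (the column names are hypothetical): numeric attributes admit measures like the standard deviation, nominal attributes can only be counted, and missing values need an explicit marker:

```python
from collections import Counter
import statistics

# A tiny data set with one numeric and one nominal attribute;
# None marks a missing value.
rows = [
    {"height_cm": 172.0, "ethnicity": "A"},
    {"height_cm": None,  "ethnicity": "B"},   # missing height
    {"height_cm": 165.5, "ethnicity": "A"},
]

# Numeric summaries apply only to the non-missing numeric values.
heights = [r["height_cm"] for r in rows if r["height_cm"] is not None]
mean_h = statistics.fmean(heights)
sd_h = statistics.pstdev(heights)

# Nominal values can only be tallied, not averaged.
ethnicity_counts = Counter(r["ethnicity"] for r in rows)
```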

Bias of an estimator

unbiased · unbiased estimator · bias
Further, mean-unbiasedness is not preserved under non-linear transformations, though median-unbiasedness is; for example, the sample variance is an unbiased estimator for the population variance, but its square root, the sample standard deviation, is a biased estimator for the population standard deviation. These are all illustrated below. Suppose we have a statistical model, parameterized by a real number θ, giving rise to a probability distribution for observed data, and a statistic \hat\theta which serves as an estimator of θ based on any observed data x.
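The square-root example can be seen in a short simulation; this is only an illustrative sketch (sample size, seed, and trial count are arbitrary choices): the Bessel-corrected sample variance is centred on \sigma^2, while the sample standard deviation comes out systematically low:

```python
# Simulation: sample variance is unbiased, but its square root
# (the sample standard deviation) is downward biased.
import random
import statistics

random.seed(0)
n, trials, sigma = 5, 20000, 1.0
var_estimates, sd_estimates = [], []
for _ in range(trials):
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    var_estimates.append(statistics.variance(sample))  # Bessel-corrected
    sd_estimates.append(statistics.stdev(sample))

mean_var = statistics.fmean(var_estimates)  # close to sigma**2 = 1
mean_sd = statistics.fmean(sd_estimates)    # noticeably below sigma = 1
assert abs(mean_var - 1.0) < 0.05
assert mean_sd < 0.98  # downward bias of the sample standard deviation
```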


The standard deviation of an estimator of \theta (the square root of the variance), or an estimate of that standard deviation, is called the standard error of \hat\theta.

Consistent estimator

Important examples include the sample variance and sample standard deviation. Without Bessel's correction (that is, when using the sample size n instead of the degrees of freedom n-1), these are both negatively biased but consistent estimators. With the correction, the corrected sample variance is unbiased, while the corrected sample standard deviation is still biased, but less so, and both are still consistent: the correction factor converges to 1 as sample size grows.

Descriptive statistics

descriptive · descriptive statistic · statistics
Measures of central tendency include the mean, median and mode, while measures of variability include the standard deviation (or variance), the minimum and maximum values of the variables, kurtosis and skewness. Descriptive statistics provide simple summaries about the sample and about the observations that have been made. Such summaries may be either quantitative, i.e. summary statistics, or visual, i.e. simple-to-understand graphs. These summaries may either form the basis of the initial description of the data as part of a more extensive statistical analysis, or they may be sufficient in and of themselves for a particular investigation.
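The quantitative summaries named above can be collected with the standard library alone; this is a minimal sketch on invented data:

```python
# Numerical descriptive summaries: central tendency and variability.
import statistics

scores = [12, 15, 15, 18, 20, 22, 30]
summary = {
    "mean": statistics.fmean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": statistics.stdev(scores),   # sample standard deviation
    "min": min(scores),
    "max": max(scores),
}
```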

Confidence interval

confidence intervals · confidence level · confidence
Identify whether the population standard deviation is known, \sigma, or is unknown and estimated by the sample standard deviation s. If the population standard deviation is known, the interval is \bar{x} \pm z^* \sigma/\sqrt{n}, where the critical value z^* = \Phi^{-1}((1+\gamma)/2), \gamma is the confidence level, and \Phi is the CDF of the standard normal distribution. This critical value depends only on the confidence level for the test. Typical two-sided confidence levels are:

C      z*
99%    2.576
98%    2.326
95%    1.96
90%    1.645

If the population standard deviation is unknown, then the Student's t distribution is used to obtain the critical value.
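The known-\sigma case can be sketched directly from the table of z* values (the sample mean, \sigma, and n below are invented):

```python
# Two-sided confidence interval for the mean with known population
# standard deviation: xbar +/- z* * sigma / sqrt(n).
import math

def z_confidence_interval(xbar, sigma, n, z_star):
    """Return (lower, upper) for xbar +/- z* * sigma / sqrt(n)."""
    half_width = z_star * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# 95% interval (z* = 1.96) for a hypothetical sample mean of 10.0.
lo, hi = z_confidence_interval(xbar=10.0, sigma=2.0, n=100, z_star=1.96)
# half-width = 1.96 * 2 / 10 = 0.392
```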

Standard score

Computing a z-score requires knowing the mean and standard deviation of the complete population to which a data point belongs; if one only has a sample of observations from the population, then the analogous computation with sample mean and sample standard deviation yields the t-statistic. If the population mean and population standard deviation are known, the standard score of a raw score x is calculated as z = (x - \mu)/\sigma, where \mu is the population mean and \sigma is the population standard deviation. The absolute value of z represents the distance between the raw score and the population mean in units of the standard deviation. z is negative when the raw score is below the mean, positive when above.
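A one-line sketch of the computation, with invented population parameters:

```python
# Standard score: z = (x - mu) / sigma, with mu and sigma assumed known
# for the whole population.
def z_score(x, mu, sigma):
    return (x - mu) / sigma

# A raw score two standard deviations above the mean has z = 2.
z = z_score(x=130.0, mu=100.0, sigma=15.0)
assert z == 2.0
```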


Statistic

sample statistic · empirical · measure
In this example, "5.6 days" is a statistic, namely the mean length of stay for our sample of 20 hotel guests. The population is the set of all guests of this hotel, and the parameter is the mean length of stay for all guests. Common statistics include the sample mean and sample median; the sample variance and sample standard deviation; sample quantiles besides the median, e.g., quartiles and percentiles; test statistics, such as t statistics, chi-squared statistics, and F statistics; order statistics, including the sample maximum and minimum; sample moments and functions thereof, including kurtosis and skewness; and various functionals of the empirical distribution function.

Statistical population

Alternatively, given two subpopulations with the same mean and different standard deviations, the overall population will exhibit high kurtosis, with a sharper peak and heavier tails (and correspondingly shallower shoulders) than a single distribution.
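The kurtosis claim can be checked exactly from the moments of a 50/50 mixture of two zero-mean normals (the subpopulation standard deviations below are arbitrary); for a zero-mean normal, E[X^2] = s^2 and E[X^4] = 3 s^4:

```python
# Kurtosis of a 50/50 mixture of two zero-mean normal subpopulations
# with different standard deviations, computed from exact moments.
s1, s2 = 1.0, 3.0                       # arbitrary subpopulation stdevs
m2 = 0.5 * s1**2 + 0.5 * s2**2          # second moment of the mixture
m4 = 0.5 * 3 * s1**4 + 0.5 * 3 * s2**4  # fourth moment of the mixture
kurtosis = m4 / m2**2

# A single normal has kurtosis 3; the mixture exceeds it (heavier tails).
assert kurtosis > 3.0
```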



Sample (statistics)

sample · samples · statistical sample
In other words, X_i is a function representing the measurement at the i-th experiment, and x_i is the value obtained when making the measurement.

Statistical significance

statistically significant · significant · significance level
An effect size measure quantifies the strength of an effect, such as the distance between two means in units of standard deviation (cf. Cohen's d), the correlation coefficient between two variables or its square, and other measures. A statistically significant result may not be easy to reproduce. In particular, some statistically significant results will in fact be false positives. Each failed attempt to reproduce a result increases the likelihood that the result was a false positive.
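The first effect-size measure mentioned, Cohen's d, is simple to sketch: the difference between two group means divided by a pooled standard deviation (the group data below are invented):

```python
# Cohen's d: distance between two group means in units of the
# pooled (Bessel-corrected) standard deviation.
import math
import statistics

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    v1 = statistics.variance(group1)
    v2 = statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.fmean(group1) - statistics.fmean(group2)) / pooled_sd

# Invented data whose means differ by one pooled standard deviation.
d = cohens_d([5.0, 6.0, 7.0], [4.0, 5.0, 6.0])
```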

Random variable

random variables · random variation · random
Once the "average value" is known, one could then ask how far from this average value the values of X typically are, a question that is answered by the variance and standard deviation of a random variable. The expected value can be viewed intuitively as an average obtained from an infinite population, the members of which are particular evaluations of X. Mathematically, this is known as the (generalised) problem of moments: for a given class of random variables X, find a collection \{f_i\} of functions such that the expectation values E[f_i(X)] fully characterise the distribution of the random variable X. Moments can only be defined for real-valued functions of random variables (or complex-valued, etc.).

Expected value

For a different example, in statistics, where one seeks estimates for unknown parameters based on available data, the estimate itself is a random variable. In such settings, a desirable criterion for a "good" estimator is that it is unbiased – that is, the expected value of the estimate is equal to the true value of the underlying parameter. The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem of points, which seeks to divide the stakes in a fair way between two players who have to end their game before it's properly finished.

Efficient estimator

efficient · efficiency · efficient estimators
In statistics, an efficient estimator is an estimator that estimates the quantity of interest in some "best possible" manner. The notion of "best possible" relies upon the choice of a particular loss function: the function which quantifies the relative degree of undesirability of estimation errors of different magnitudes. The most common choice of loss function is quadratic, resulting in the mean squared error criterion of optimality. Suppose \{P_\theta\} is a parametric model and X = (X_1, \ldots, X_n) are the data sampled from this model. Let T = T(X) be an estimator for the parameter θ.
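As an illustrative simulation (seed, sample size, and trial count are arbitrary), relative efficiency under quadratic loss can be seen by comparing two estimators of a normal location parameter: the sample mean has a smaller sampling variance than the sample median:

```python
# Relative efficiency: under a normal model, the sample mean estimates
# the location parameter with smaller variance than the sample median.
import random
import statistics

random.seed(1)
n, trials = 25, 5000
means, medians = [], []
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Asymptotically Var(median)/Var(mean) approaches pi/2 for normal data.
assert statistics.pvariance(medians) > statistics.pvariance(means)
```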


Calculus

infinitesimal calculus · differential and integral calculus · classical calculus
Calculus is used in every branch of the physical sciences, actuarial science, computer science, statistics, engineering, economics, business, medicine, demography, and in other fields wherever a problem can be mathematically modeled and an optimal solution is desired. It allows one to go from (non-constant) rates of change to the total change or vice versa, and many times in studying a problem we know one and are trying to find the other. Physics makes particular use of calculus; all concepts in classical mechanics and electromagnetism are related through calculus.

Sample size determination

sample size · sampling sizes · sample
For a fixed total sample size n, that is \sum_h n_h = n, the variance can be made a minimum if the sampling rate within each stratum is made proportional to the standard deviation within each stratum: n_h/N_h = k S_h, where N_h is the size of stratum h, S_h its standard deviation, and k is a constant such that \sum_h n_h = n. An "optimum allocation" is reached when the sampling rates within the strata are made directly proportional to the standard deviations within the strata and inversely proportional to the square root of the sampling cost per element within the strata, C_h: n_h/N_h = K S_h/\sqrt{C_h}, where K is a constant such that \sum_h n_h = n. Sample size determination in qualitative studies takes a different approach: it is generally a subjective judgment, taken as the research proceeds.
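A sketch of the cost-adjusted optimum allocation rule; the stratum sizes, standard deviations, and per-element costs below are invented:

```python
# Optimum allocation: sampling rate in stratum h proportional to
# S_h / sqrt(C_h), scaled so the allocations sum to the total size n.
import math

def optimum_allocation(n, N, S, C):
    """Allocate total sample size n across strata with sizes N,
    standard deviations S, and per-element sampling costs C."""
    weights = [N_h * S_h / math.sqrt(C_h) for N_h, S_h, C_h in zip(N, S, C)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Stratum 2 is 4x as variable but 2x as costly per element, so its
# weight (8/2 = 4) is double stratum 1's (2/1 = 2).
alloc = optimum_allocation(n=100, N=[1000, 1000], S=[2.0, 8.0], C=[1.0, 4.0])
```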

Karl Pearson

Pearson · Pearson, Karl · Carl Pearson
In fact, Pearson devoted much time during 1893 to 1904 to developing statistical techniques for biometry. These techniques, which are widely used today for statistical analysis, include the chi-squared test, standard deviation, and correlation and regression coefficients. Pearson's Law of Ancestral Heredity stated that germ plasm consisted of heritable elements inherited from the parents as well as from more distant ancestors, the proportion of which varied for different traits.

Standard error

SE · standard errors · standard error of the mean
Thus, it is common to see the standard deviation of the mean alternatively defined as \sigma_{\bar{x}} = \sigma/\sqrt{n}. The standard deviation of the sample mean is equivalent to the standard deviation of the error in the sample mean with respect to the true mean, since the sample mean is an unbiased estimator. Therefore, the standard error of the mean can also be understood as the standard deviation of the error in the sample mean with respect to the true mean (or an estimate of that statistic).
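In practice \sigma is usually unknown and the standard error of the mean is estimated as s/\sqrt{n}; a minimal sketch on invented data:

```python
# Estimated standard error of the mean: s / sqrt(n), with s the
# Bessel-corrected sample standard deviation.
import math
import statistics

sample = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0]
s = statistics.stdev(sample)         # sample standard deviation
sem = s / math.sqrt(len(sample))     # estimated standard error of the mean
```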

Robust statistics

robust · breakdown point · robustness
Panel (a) shows the distribution of the standard deviation, (b) of the MAD and (c) of Qn. The distribution of the standard deviation is erratic and wide, a result of the outliers. The MAD is better behaved, and Qn is a little more efficient than the MAD. This simple example demonstrates that when outliers are present, the standard deviation cannot be recommended as an estimate of scale. Traditionally, statisticians would manually screen data for outliers and remove them, usually checking the source of the data to see whether the outliers were erroneously recorded.
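The point can be sketched numerically (the data and the single outlier below are invented; the MAD is computed by hand since the standard library has no built-in for it):

```python
# One gross outlier inflates the standard deviation badly, while the
# MAD (median absolute deviation) barely moves.
import statistics

def mad(data):
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)

clean = [9.0, 10.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0]
dirty = clean + [100.0]  # a single erroneously recorded value

sd_jump = statistics.pstdev(dirty) / statistics.pstdev(clean)
mad_jump = mad(dirty) / mad(clean)
assert sd_jump > 10   # standard deviation explodes
assert mad_jump < 3   # MAD stays nearly unchanged
```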