Errors and residuals

residuals, error term, residual, error, errors, Errors and residuals in statistics, statistical error, error terms, model errors, additive error term
In statistics and optimization, errors and residuals are two closely related and easily confused measures of the deviation of an observed value of an element of a statistical sample from its "theoretical value".
Related Articles

Expected value

expectation, expected, mean
A statistical error (or disturbance) is the amount by which an observation differs from its expected value, the latter being based on the whole population from which the statistical unit was chosen randomly.
If the expected value exists, this procedure estimates the true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals (the sum of the squared differences between the observations and the estimate).
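
As an illustration that is not part of the article itself (the data, seed, and population mean are assumptions of mine), the following Python sketch contrasts errors with residuals for the sample mean and checks numerically that the sample mean minimizes the sum of squared residuals:

    import numpy as np

    rng = np.random.default_rng(0)
    mu = 10.0                      # population mean (normally unobservable)
    x = rng.normal(mu, 2.0, size=50)

    x_bar = x.mean()               # estimate of the expected value
    errors = x - mu                # statistical errors: deviations from the population mean
    residuals = x - x_bar          # residuals: deviations from the sample mean

    # The sample mean minimizes the sum of squared residuals:
    # any other candidate value c gives a larger sum of squares.
    ssr = np.sum(residuals ** 2)
    for c in (x_bar - 0.5, x_bar + 0.5):
        assert np.sum((x - c) ** 2) > ssr

    print(residuals.sum())         # ~0 by construction; the errors need not sum to zero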

Arithmetic mean

mean, average, arithmetic
The expected value, being the mean of the entire population, is typically unobservable, and hence the statistical error cannot be observed either.

Studentized residual

studentized residuals, externally, studentization of residuals
One can standardize statistical errors (especially of a normal distribution) in a z-score (or "standard score"), and standardize residuals in a t-statistic, or more generally studentized residuals. In regression analysis, the distinction between errors and residuals is subtle and important, and leads to the concept of studentized residuals.
In statistics, a studentized residual is the quotient resulting from the division of a residual by an estimate of its standard deviation.
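
As a rough sketch of that definition (the simple linear fit, the synthetic data, and the use of the hat matrix for the standard-deviation estimate are my own illustration, not the article's), internally studentized residuals can be computed as follows:

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0, 10, 30)
    y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=x.size)

    X = np.column_stack([np.ones_like(x), x])        # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares fit
    resid = y - X @ beta

    H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat (projection) matrix
    h = np.diag(H)                                   # leverages
    n, p = X.shape
    s2 = resid @ resid / (n - p)                     # residual variance estimate

    studentized = resid / np.sqrt(s2 * (1.0 - h))    # internally studentized residuals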

Heteroscedasticity

heteroscedastic, heteroskedasticity, heteroskedastic
If the residuals are random, or have no trend, but "fan out", they exhibit a phenomenon called heteroscedasticity.
The existence of heteroscedasticity is a major concern in the application of regression analysis, including the analysis of variance, as it can invalidate statistical tests of significance that assume that the modelling errors are uncorrelated and uniform—hence that their variances do not vary with the effects being modeled.
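
A minimal simulation, with numbers chosen only for illustration, shows the "fan out" pattern: when the error spread grows with the predictor, residuals in the upper part of the domain are visibly more variable than in the lower part.

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(1, 10, 200)
    # Error spread grows with x, so residuals "fan out" across the domain.
    y = 1.0 + 2.0 * x + rng.normal(0, 0.3 * x)

    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Residual spread in the lower vs upper half of the domain.
    lo, hi = resid[x < 5.5].std(), resid[x >= 5.5].std()
    print(lo, hi)   # hi is clearly larger: heteroscedastic residuals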

Degrees of freedom (statistics)

degrees of freedom, degree of freedom, Effective degrees of freedom
The sum of squares of the statistical errors, divided by σ^2, has a chi-squared distribution with n degrees of freedom:

\frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2 \sim \chi^2_n

The quantities X_i - \overline{X} are residuals that may be considered estimates of the errors X_i − μ.
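
A small simulation sketch (my own, assuming normal data with a known σ) illustrates the degrees-of-freedom bookkeeping: squared errors use the true mean and average to n, while squared residuals use the sample mean and average to n − 1.

    import numpy as np

    rng = np.random.default_rng(3)
    mu, sigma, n, reps = 5.0, 2.0, 10, 100_000

    X = rng.normal(mu, sigma, size=(reps, n))
    ss_errors = ((X - mu) ** 2).sum(axis=1) / sigma**2            # uses the true mean
    ss_residuals = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / sigma**2

    print(ss_errors.mean())      # ~ n      (chi-squared with n degrees of freedom)
    print(ss_residuals.mean())   # ~ n - 1  (one degree of freedom spent on the mean)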

Regression analysis

regression, multiple regression, regression model
In regression analysis, the distinction between errors and residuals is subtle and important, and leads to the concept of studentized residuals.
Most regression models propose that Y_i is a function of X_i and \beta, with e_i representing an additive error term that may stand in for un-modeled determinants of Y_i or random statistical noise: Y_i = f(X_i, \beta) + e_i.
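
As a hedged illustration of that formulation (the linear form of f and all numbers are assumptions of mine, not the article's), a least-squares fit recovers estimates of \beta, and its residuals serve as observable stand-ins for the unobservable e_i:

    import numpy as np

    rng = np.random.default_rng(4)
    beta_true = np.array([1.5, -0.8])                 # unknown in practice
    x = rng.uniform(0, 5, size=100)
    X = np.column_stack([np.ones_like(x), x])
    e = rng.normal(0, 0.5, size=x.size)               # additive error term
    y = X @ beta_true + e                             # Y_i = f(X_i, beta) + e_i

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta_hat                      # observable stand-ins for e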

T-statistic

Student's t-statistic, t-statistic
One can standardize statistical errors (especially of a normal distribution) in a z-score (or "standard score"), and standardize residuals in a t-statistic, or more generally studentized residuals.
One can also divide a residual by the sample standard deviation.

Bessel's correction

Bessel-corrected, Bessel corrected variance
This difference between n and n − 1 degrees of freedom results in Bessel's correction for the estimation of sample variance of a population with unknown mean and unknown variance.
One can understand Bessel's correction as the degrees of freedom in the residuals vector (residuals, not errors, because the population mean is unknown): (X_1 - \overline{X}, \ldots, X_n - \overline{X}), where \overline{X} is the sample mean.
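
A quick simulation sketch, with parameters chosen only for illustration, shows the effect of the correction: dividing the residual sum of squares by n underestimates the variance, while dividing by n − 1 is unbiased.

    import numpy as np

    rng = np.random.default_rng(5)
    sigma2, n, reps = 4.0, 8, 200_000
    X = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

    ss_resid = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    print((ss_resid / n).mean())        # biased: underestimates sigma^2
    print((ss_resid / (n - 1)).mean())  # Bessel-corrected: ~ sigma^2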

Mean squared error

mean square error, squared error loss, MSE
However, a terminological difference arises in the expression mean squared error (MSE). Mean square error or mean squared error (abbreviated MSE) and root mean square error (RMSE) refer to the amount by which the values predicted by an estimator differ from the quantities being estimated (typically outside the sample from which the model was estimated).
In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference between the estimated values and the actual value.
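
As an illustrative sketch (the choice of estimator, the sample mean, and all numbers are mine), the MSE of an estimator can be approximated by averaging squared estimation errors over many simulated samples; for the sample mean it should come out near σ^2/n.

    import numpy as np

    rng = np.random.default_rng(6)
    mu, sigma, n, reps = 3.0, 1.0, 20, 100_000

    X = rng.normal(mu, sigma, size=(reps, n))
    estimates = X.mean(axis=1)                 # the estimator: the sample mean
    mse = np.mean((estimates - mu) ** 2)       # average squared estimation error
    print(mse, sigma**2 / n)                   # MSE of the sample mean is sigma^2 / n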

Normal distribution

normally distributed, Gaussian distribution, normal
One can standardize statistical errors (especially of a normal distribution) in a z-score (or "standard score"), and standardize residuals in a t-statistic, or more generally studentized residuals.
In regression analysis, lack of normality in residuals simply indicates that the model postulated is inadequate in accounting for the tendency in the data and needs to be augmented; in other words, normality in residuals can always be achieved given a properly constructed model.

Analysis of variance

ANOVA, analysis of variance (ANOVA), corrected the means
In another method to calculate the mean square of error, used when analyzing the variance of a linear regression with a technique like that used in ANOVA (they are the same because ANOVA is a type of regression), the sum of squares of the residuals (also known as the sum of squares of the error) is divided by the degrees of freedom, where the degrees of freedom equal n − p − 1 and p is the number of parameters estimated in the model (one for each variable in the regression equation).
The separate assumptions of the textbook model imply that the errors are independently, identically, and normally distributed for fixed effects models, that is, that the errors (\varepsilon) are independent and \varepsilon \sim N(0, \sigma^2).
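
A minimal sketch of that calculation, assuming a small synthetic linear regression (design, coefficients, and noise level are mine), divides the residual sum of squares by n − p − 1 to estimate the error variance:

    import numpy as np

    rng = np.random.default_rng(7)
    n, p = 50, 2                                       # p predictors plus an intercept
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
    y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.5, size=n)

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    ss_res = resid @ resid                             # sum of squares of the residuals
    mse = ss_res / (n - p - 1)                         # degrees of freedom: n - p - 1
    print(mse)                                         # estimates the error variance (~2.25 here)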

Root-mean-square deviation

root mean squared error, root mean square deviation, RMSD
Mean square error or mean squared error (abbreviated MSE) and root mean square error (RMSE) refer to the amount by which the values predicted by an estimator differ from the quantities being estimated (typically outside the sample from which the model was estimated).
These deviations are called residuals when the calculations are performed over the data sample that was used for estimation and are called errors (or prediction errors) when computed out-of-sample.
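
A short sketch of that distinction, using an assumed train/test split on synthetic data (the split and numbers are my own), computes the RMSE over residuals (in-sample) and over prediction errors (out-of-sample):

    import numpy as np

    rng = np.random.default_rng(8)
    x = rng.uniform(0, 10, size=200)
    y = 3.0 + 1.2 * x + rng.normal(0, 1.0, size=x.size)

    train, test = slice(0, 150), slice(150, 200)
    X = np.column_stack([np.ones_like(x), x])
    beta_hat, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

    residuals = y[train] - X[train] @ beta_hat         # in-sample: residuals
    pred_errors = y[test] - X[test] @ beta_hat         # out-of-sample: prediction errors

    rmse_in = np.sqrt(np.mean(residuals ** 2))
    rmse_out = np.sqrt(np.mean(pred_errors ** 2))      # on average slightly larger than rmse_in
    print(rmse_in, rmse_out)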

Residual sum of squares

sum of squared residuals, sum of squares of residuals, residual sum-of-squares
Sum of squared errors, typically abbreviated SSE or SS e, refers to the residual sum of squares (the sum of squared residuals) of a regression; this is the sum of the squares of the deviations of the actual values from the predicted values, within the sample used for estimation.
In statistics, the residual sum of squares (RSS), also known as the sum of squared residuals (SSR) or the sum of squared estimate of errors (SSE), is the sum of the squares of residuals (deviations of predicted from actual empirical values of data).

Squared deviations from the mean

Squared deviations, Sum of squared differences, sum of squared deviations
It is remarkable that the sum of squares of the residuals and the sample mean can be shown to be independent of each other, using, e.g. Basu's theorem.
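
The independence claim can at least be checked informally by simulation; in this sketch (normal data, parameters mine) the correlation between the sample mean and the residual sum of squares comes out near zero, consistent with the independence that holds for normal samples:

    import numpy as np

    rng = np.random.default_rng(9)
    X = rng.normal(0.0, 1.0, size=(100_000, 10))

    means = X.mean(axis=1)
    ss_resid = ((X - means[:, None]) ** 2).sum(axis=1)

    # Near-zero correlation: consistent with (though not a proof of) independence.
    print(np.corrcoef(means, ss_resid)[0, 1])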

Student's t-distribution

Student's t-distribution, t-distribution
That is fortunate because it means that even though we do not know σ, we know the probability distribution of this quotient: it has a Student's t-distribution with n − 1 degrees of freedom.
Student's t-distribution arises in a variety of statistical estimation problems where the goal is to estimate an unknown parameter, such as a mean value, in a setting where the data are observed with additive errors.
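
As a sketch of that setting (the single normal sample and the hypothesized mean μ0 are assumptions of mine), the usual quotient uses the sample standard deviation in place of the unknown σ:

    import numpy as np

    rng = np.random.default_rng(10)
    mu0, n = 5.0, 12
    x = rng.normal(mu0, 2.0, size=n)

    t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
    # Under the null hypothesis, t follows a Student's t-distribution
    # with n - 1 degrees of freedom, even though sigma is unknown.
    print(t)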

Propagation of uncertainty

error propagation, theory of errors, propagation of error
In statistics, propagation of uncertainty (or propagation of error) is the effect of variables' uncertainties (or errors, more specifically random errors) on the uncertainty of a function based on them.
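
A minimal illustration, not taken from the article, for a linear combination of independent measurements (coefficients and uncertainties are my choices): the simulated spread of the derived quantity matches the standard propagation-of-uncertainty formula for this case.

    import numpy as np

    rng = np.random.default_rng(12)
    # Two independent measurements with known uncertainties (standard deviations).
    x = rng.normal(10.0, 0.3, size=1_000_000)
    y = rng.normal(4.0, 0.2, size=1_000_000)

    z = 2.0 * x - 3.0 * y                        # derived quantity f(x, y) = 2x - 3y
    propagated = np.sqrt((2.0 * 0.3) ** 2 + (3.0 * 0.2) ** 2)
    print(z.std(), propagated)                   # simulation agrees with the linear formula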

Sampling error

sampling variability, sampling variation, less reliable
Since sampling is typically done to determine the characteristics of a whole population, the difference between the sample and population values is considered an error.

Lack-of-fit sum of squares

error sum of squares, sum of squared errors
The residuals are observable estimates of the unobservable values of the error term ε_ij.

Observational error

systematic error, measurement error, systematic bias
When either randomness or uncertainty modeled by probability theory is attributed to such errors, they are "errors" in the sense in which that term is used in statistics; see errors and residuals in statistics.

Linear regression

regression coefficient, multiple linear regression, regression
Concretely, in a linear regression where the errors are identically distributed, the variability of residuals of inputs in the middle of the domain will be higher than the variability of residuals at the ends of the domain: linear regressions fit endpoints better than the middle.
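
This can be checked with a small simulation sketch (design and numbers are mine): residual standard deviations across replicates are proportional to sqrt(1 − h_ii), where the leverage h_ii is largest at the endpoints, so endpoint residuals vary less than those in the middle.

    import numpy as np

    rng = np.random.default_rng(11)
    n, reps = 21, 50_000
    x = np.linspace(-1, 1, n)
    X = np.column_stack([np.ones(n), x])

    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)                          # leverages: largest at the endpoints

    Y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=(reps, n))
    resid = Y - Y @ H                       # residuals for every replicate (H is symmetric)

    print(resid.std(axis=0)[[0, n // 2]])   # endpoint spread < middle spread
    print(np.sqrt(1 - h)[[0, n // 2]])      # matches sigma * sqrt(1 - h_ii)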

Univariate distribution

univariate, uni
Suppose there is a series of observations from a univariate distribution and we want to estimate the mean of that distribution (the so-called location model).

Mean

mean value, average, population mean
Suppose there is a series of observations from a univariate distribution and we want to estimate the mean of that distribution (the so-called location model).

Location parameter

location, location model, location parameters
Suppose there is a series of observations from a univariate distribution and we want to estimate the mean of that distribution (the so-called location model).

Statistical population

population, subpopulation, subpopulations
A statistical error (or disturbance) is the amount by which an observation differs from its expected value, the latter being based on the whole population from which the statistical unit was chosen randomly.