# Errors and residuals

**residuals, error term, residual, error, errors, errors and residuals in statistics, statistical error, error terms, model errors, additive error term**

In statistics and optimization, errors and residuals are two closely related and easily confused measures of the deviation of an observed value of an element of a statistical sample from its "theoretical value".

## Related articles

### Expected value

**expectation, expected, mean**

A statistical error (or disturbance) is the amount by which an observation differs from its expected value, the latter being based on the whole population from which the statistical unit was chosen randomly.

If the expected value exists, this procedure estimates the true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals (the sum of the squared differences between the observations and the estimate).
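
The minimization property can be checked numerically. The following sketch (Python with numpy; all numbers are invented for illustration) compares the sample mean against a grid of candidate estimates of the location:

```python
import numpy as np

# A small invented sample.
x = np.array([2.0, 3.5, 1.0, 4.5, 3.0])

def ssr(m):
    """Sum of squared residuals around a candidate estimate m."""
    return np.sum((x - m) ** 2)

# Scan a grid of candidate estimates; the minimizer is the sample mean.
grid = np.linspace(x.min(), x.max(), 1001)
best = grid[np.argmin([ssr(m) for m in grid])]
print(best, x.mean())  # the two values agree up to grid resolution
```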

### Arithmetic mean

**mean, average, arithmetic**

The expected value, being the mean of the entire population, is typically unobservable, and hence the statistical error cannot be observed either.

### Studentized residual

**studentized residuals, externally studentized, studentization of residuals**

One can standardize statistical errors (especially of a normal distribution) in a z-score (or "standard score"), and standardize residuals in a t-statistic, or more generally studentized residuals. In regression analysis, the distinction between errors and residuals is subtle and important, and leads to the concept of studentized residuals.

In statistics, a studentized residual is the quotient resulting from the division of a residual by an estimate of its standard deviation.
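
A minimal sketch of internally studentized residuals, assuming an ordinary least-squares straight-line fit and using the hat-matrix diagonal to estimate each residual's standard deviation (the data and parameters here are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=x.size)

# Design matrix with intercept; OLS fit via least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Hat-matrix diagonal gives each point's leverage h_ii;
# Var(residual_i) = sigma^2 * (1 - h_ii), so each residual is rescaled.
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
n, p = X.shape
sigma_hat = np.sqrt(resid @ resid / (n - p))  # residual standard error

# Internally studentized residuals: residual / estimate of its own sd.
t = resid / (sigma_hat * np.sqrt(1 - h))
print(t[:5])
```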

### Heteroscedasticity

**heteroscedastic, heteroskedasticity, heteroskedastic**

If the residuals are random, or have no trend, but "fan out", they exhibit a phenomenon called heteroscedasticity.

The existence of heteroscedasticity is a major concern in the application of regression analysis, including the analysis of variance, as it can invalidate statistical tests of significance that assume that the modelling errors are uncorrelated and uniform—hence that their variances do not vary with the effects being modeled.
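
The fanning-out pattern can be simulated directly. In this sketch (synthetic data; the growth rate of the error scale is an arbitrary choice) the error standard deviation increases with the input, so the residual spread differs across the domain:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)
# Error sd grows with x, so residuals "fan out" along the domain.
y = 2.0 + 0.5 * x + rng.normal(scale=0.2 * x)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Residual spread in the lower vs. upper half of the domain:
print(resid[:100].std(), resid[100:].std())  # the second value is larger
```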

### Degrees of freedom (statistics)

**degrees of freedom, degree of freedom, effective degrees of freedom**

The sum of squares of the statistical errors, divided by σ², has a chi-squared distribution with n degrees of freedom:

\frac{1}{\sigma^2} \sum_{i=1}^n \varepsilon_i^2 \sim \chi^2_n

The differences X_i − \overline{X} are residuals that may be considered estimates of the errors X_i − μ.
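
A Monte Carlo sketch of this distributional fact, assuming normally distributed errors with a known σ (the sample size, σ, and trial count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, trials = 5, 2.0, 100_000

# Sum of squared errors divided by sigma^2, across many simulated samples.
errors = rng.normal(scale=sigma, size=(trials, n))
stat = (errors ** 2).sum(axis=1) / sigma ** 2

# A chi-squared variable with n degrees of freedom has mean n and
# variance 2n; the simulated statistic matches both.
print(stat.mean(), stat.var())  # close to 5 and 10
```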

### Regression analysis

**regression, multiple regression, regression model**

Most regression models propose that Y_i is a function of X_i and \beta, with e_i representing an additive error term that may stand in for un-modeled determinants of Y_i or random statistical noise:

Y_i = f(X_i, \beta) + e_i
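
A sketch of this setup for a simple linear f, showing the residuals from the fit acting as observable stand-ins for the unobservable errors (the "true" parameters below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
beta_true = np.array([1.0, 2.0])     # hypothetical "true" parameters

x = rng.uniform(0, 10, size=100)
e = rng.normal(scale=1.0, size=100)        # unobservable error term e_i
y = beta_true[0] + beta_true[1] * x + e    # Y_i = f(X_i, beta) + e_i

# Fitting recovers an estimate of beta; the residuals y - X @ beta_hat
# are observable estimates of the unobservable errors e.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
print(beta_hat, residuals[:3], e[:3])
```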

### T-statistic

**Student's t-statistic, t-statistic**

One can also divide a residual by the sample standard deviation s, giving the quotient e_i / s.

### Bessel's correction

**Bessel-corrected, Bessel corrected variance**

This difference between n and n − 1 degrees of freedom results in Bessel's correction for the estimation of sample variance of a population with unknown mean and unknown variance.

One can understand Bessel's correction as the degrees of freedom in the residuals vector (residuals, not errors, because the population mean is unknown):

(X_1 − \overline{X}, \ldots, X_n − \overline{X}),

where \overline{X} is the sample mean. Because the residuals sum to zero, only n − 1 of them are free to vary.
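
The following sketch illustrates both points with simulated normal samples: residuals in each sample sum to zero (hence n − 1 degrees of freedom), and dividing the sum of squared residuals by n − 1 rather than n removes the bias (all parameters are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, trials = 5.0, 2.0, 10, 100_000

samples = rng.normal(mu, sigma, size=(trials, n))
residuals = samples - samples.mean(axis=1, keepdims=True)

# Residuals sum to zero within each sample: only n - 1 are free.
print(np.abs(residuals.sum(axis=1)).max())  # ~0 up to float error

# Dividing the squared residuals by n underestimates sigma^2 = 4;
# dividing by n - 1 (Bessel's correction) is unbiased.
print((residuals ** 2).sum(axis=1).mean() / n)        # below 4
print((residuals ** 2).sum(axis=1).mean() / (n - 1))  # close to 4
```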

### Mean squared error

**mean square error, squared error loss, MSE**

A terminological difference arises in the expression mean squared error (MSE). Mean square error or mean squared error (abbreviated MSE) and root mean square error (RMSE) refer to the amount by which the values predicted by an estimator differ from the quantities being estimated (typically outside the sample from which the model was estimated).

In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference between the estimated values and the actual value.
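
A minimal worked example of MSE and RMSE for a handful of invented observed/predicted pairs:

```python
import numpy as np

# Invented observed values and model predictions.
observed = np.array([3.1, 4.8, 2.2, 5.5, 4.1])
predicted = np.array([3.0, 5.0, 2.5, 5.2, 4.4])

mse = np.mean((observed - predicted) ** 2)  # mean squared error
rmse = np.sqrt(mse)                         # root mean square error
print(mse, rmse)
```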

### Normal distribution

**normally distributed, Gaussian distribution, normal**

In regression analysis, lack of normality in residuals simply indicates that the model postulated is inadequate in accounting for the tendency in the data and needs to be augmented; in other words, normality in residuals can always be achieved given a properly constructed model.

### Analysis of variance

**ANOVA, analysis of variance (ANOVA)**

Another method of calculating the mean square of error arises when analyzing the variance of linear regression using a technique like that used in ANOVA (they are the same because ANOVA is a type of regression): the sum of squares of the residuals (also called the sum of squares of the error) is divided by the degrees of freedom n − p − 1, where p is the number of parameters estimated in the model (one for each variable in the regression equation).
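
A sketch of this computation, assuming an ordinary least-squares fit with p = 2 predictor variables plus an intercept (the data are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

p = 2                    # number of predictor variables
ssr = resid @ resid      # sum of squares of the residuals
mse = ssr / (n - p - 1)  # divide by the degrees of freedom
print(mse)               # estimates the error variance 0.5**2 = 0.25
```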

The separate assumptions of the textbook model imply that the errors are independently, identically, and normally distributed for fixed effects models, that is, that the errors (\varepsilon) are independent and \varepsilon \sim N(0, \sigma^2).

### Root-mean-square deviation

**root mean squared error, root mean square deviation, RMSD**

These deviations are called residuals when the calculations are performed over the data sample that was used for estimation and are called errors (or prediction errors) when computed out-of-sample.
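
The in-sample/out-of-sample distinction can be made concrete by fitting on one half of a simulated dataset and computing deviations on both halves (the split and all parameters below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

# Fit on the first half, evaluate on both halves.
X = np.column_stack([np.ones_like(x), x])
train, test = slice(0, 100), slice(100, 200)
beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

in_sample = y[train] - X[train] @ beta   # residuals
out_sample = y[test] - X[test] @ beta    # prediction errors

print(np.sqrt(np.mean(in_sample ** 2)))   # RMSE over the fitting sample
print(np.sqrt(np.mean(out_sample ** 2)))  # out-of-sample RMSE, usually larger
```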

### Residual sum of squares

**sum of squared residuals, sum of squares of residuals, residual sum-of-squares**

Sum of squared errors, typically abbreviated SSE or SSe, refers to the residual sum of squares (the sum of squared residuals) of a regression; this is the sum of the squares of the deviations of the actual values from the predicted values, within the sample used for estimation.

In statistics, the residual sum of squares (RSS), also known as the sum of squared residuals (SSR) or the sum of squared estimate of errors (SSE), is the sum of the squares of residuals (deviations of predicted values from actual empirical values of data).

### Squared deviations from the mean

**squared deviations, sum of squared differences, sum of squared deviations**

It is remarkable that the sum of squares of the residuals and the sample mean can be shown to be independent of each other, using, e.g., Basu's theorem.
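
This independence can be probed numerically. The near-zero sample correlation below is only a consistency check, not a proof (the sample size and trial count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
samples = rng.normal(size=(100_000, 10))

means = samples.mean(axis=1)
ssr = ((samples - means[:, None]) ** 2).sum(axis=1)

# For normal data the sample mean and the residual sum of squares are
# independent (Basu's theorem); their sample correlation is near zero.
print(np.corrcoef(means, ssr)[0, 1])
```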

### Student's t-distribution

**Student's t-distribution, t-distribution**

That is fortunate because it means that even though we do not know σ, we know the probability distribution of the quotient (\overline{X}_n − μ) / (S_n / \sqrt{n}): it has a Student's t-distribution with n − 1 degrees of freedom.

Student's t-distribution arises in a variety of statistical estimation problems where the goal is to estimate an unknown parameter, such as a mean value, in a setting where the data are observed with additive errors.
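
A Monte Carlo sketch of the quotient above, assuming normal data with a known μ used only to form the statistic; the variance of a t-distribution with ν degrees of freedom is ν/(ν − 2), which the simulation reproduces:

```python
import numpy as np

rng = np.random.default_rng(8)
mu, n, trials = 3.0, 8, 100_000

samples = rng.normal(mu, 2.0, size=(trials, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)     # uses n - 1 (Bessel's correction)
t = (xbar - mu) / (s / np.sqrt(n))  # the quotient from the text

# A t-distribution with nu = n - 1 = 7 degrees of freedom has
# variance 7 / 5 = 1.4.
print(t.var())  # close to 1.4
```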

### Propagation of uncertainty

**error propagation, theory of errors, propagation of error**

In statistics, propagation of uncertainty (or propagation of error) is the effect of variables' uncertainties (or errors, more specifically random errors) on the uncertainty of a function based on them.

### Sampling error

**sampling variability, sampling variation**

Since sampling is typically done to determine the characteristics of a whole population, the difference between the sample and population values is considered an error.

### Lack-of-fit sum of squares

**error sum of squares, sum of squared errors**

The differences between the observed values and the fitted values are the residuals, which are observable estimates of the unobservable values of the error term ε_ij.

### Observational error

**systematic error, measurement error, systematic bias**

When either randomness or uncertainty modeled by probability theory is attributed to such errors, they are "errors" in the sense in which that term is used in statistics; see errors and residuals in statistics.

### Linear regression

**regression coefficient, multiple linear regression, regression**

Concretely, in a linear regression where the errors are identically distributed, the variability of residuals of inputs in the middle of the domain will be higher than the variability of residuals at the ends of the domain: linear regressions fit endpoints better than the middle.
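
This can be seen from the hat-matrix diagonal of a straight-line fit: the variance of residual i is proportional to 1 − h_ii, which dips at the endpoints (a small numpy sketch with arbitrary inputs):

```python
import numpy as np

x = np.linspace(0, 10, 11)
X = np.column_stack([np.ones_like(x), x])

# Diagonal of the hat matrix: the leverage h_ii of each input.
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# Var(residual_i) is proportional to 1 - h_ii: smallest at the ends of
# the domain, largest in the middle.
print(1 - h)  # values dip at the endpoints and peak at the center
```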

### Innovation (signal processing)

**innovation, innovations, innovations vector**

### Univariate distribution

**univariate**

Suppose there is a series of observations from a univariate distribution and we want to estimate the mean of that distribution (the so-called location model).

### Mean

**mean value, average, population mean**

### Location parameter

**location, location model, location parameters**

### Statistical population

**population, subpopulation, subpopulations**
