# Outlier

**outliers, conservative estimate, irregularities, outlier detection, statistical outliers**

In statistics, an outlier is an observation point that is distant from other observations. (Wikipedia)

214 Related Articles

### Robust statistics

**robust, breakdown point, robustness**

When outliers stem from measurement error, one wishes to discard them or use statistics that are robust to outliers; when they stem from a heavy-tailed distribution, they indicate that the distribution has high skewness and that one should be very cautious in using tools or intuitions that assume a normal distribution.

One motivation is to produce statistical methods that are not unduly affected by outliers.

### Normal distribution

**normally distributed, normal, Gaussian**

When outliers stem from measurement error, one wishes to discard them or use statistics that are robust to outliers; when they stem from a heavy-tailed distribution, they indicate that the distribution has high skewness and that one should be very cautious in using tools or intuitions that assume a normal distribution. In the case of normally distributed data, the three sigma rule means that roughly 1 in 22 observations will differ by twice the standard deviation or more from the mean, and 1 in 370 will deviate by three times the standard deviation.

Therefore, it may not be an appropriate model when one expects a significant fraction of outliers—values that lie many standard deviations away from the mean—and least squares and other statistical inference methods that are optimal for normally distributed variables often become highly unreliable when applied to such data.

### Arithmetic mean

**mean, average, arithmetic**

In most larger samplings of data, some data points will be further away from the sample mean than what is deemed reasonable.

While the arithmetic mean is often used to report central tendencies, it is not a robust statistic, meaning that it is greatly influenced by outliers (values that are very much larger or smaller than most of the values).

### Median

**average, sample median, median-unbiased estimator**

For example, if one is calculating the average temperature of 10 objects in a room, and nine of them are between 20 and 25 degrees Celsius, but an oven is at 175 °C, the median of the data will be between 20 and 25 °C but the mean temperature will be between 35.5 and 40 °C. In this case, the median better reflects the temperature of a randomly sampled object (but not the temperature in the room) than the mean; naively interpreting the mean as "a typical sample", equivalent to the median, is incorrect.

The median is a popular summary statistic used in descriptive statistics, since it is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.
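
The oven example above can be checked numerically. The sketch below uses hypothetical readings (the nine room-temperature values are made up for illustration; only the 20 to 25 °C range and the 175 °C oven come from the text) and compares the two summaries.

```python
from statistics import mean, median

# Hypothetical readings: nine objects between 20 and 25 °C, plus one oven at 175 °C.
temps_c = [20, 21, 21, 22, 22, 23, 23, 24, 25, 175]

avg = mean(temps_c)    # pulled upward by the single oven reading
mid = median(temps_c)  # stays inside the 20-25 °C cluster

print(f"mean = {avg:.1f} °C, median = {mid:.1f} °C")
```

Whatever nine values are chosen inside the stated range, the mean lands between 35.5 and 40 °C while the median stays between 20 and 25 °C.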

### Sample maximum and minimum

**sample maximum, sample minimum, Maximum**

Outliers, being the most extreme observations, may include the sample maximum or sample minimum, or both, depending on whether they are extremely high or low.

If the sample has outliers, they necessarily include the sample maximum or sample minimum, or both, depending on whether they are extremely high or low.

### Box plot

**boxplot, box and whisker plot, adjusted boxplots**

Among outlier-detection methods, some are graphical and others are model-based; box plots are a hybrid.

Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram. Outliers may be plotted as individual points.

### Normal probability plot

Some outlier-detection methods are graphical, such as normal probability plots.

Normal probability plots help identify departures from normality, including outliers, skewness, kurtosis, a need for transformations, and mixtures.

### King effect

Additionally, the pathological appearance of outliers of a certain form appears in a variety of datasets, indicating that the causative mechanism for the data might differ at the extreme end (King effect).

In statistics, economics, and econophysics, the King effect refers to the phenomenon where the top one or two members of a ranked set show up as outliers.

### 68–95–99.7 rule

**3-sigma, 68-95-99.7 rule, three sigma rule**

In the case of normally distributed data, the three sigma rule means that roughly 1 in 22 observations will differ by twice the standard deviation or more from the mean, and 1 in 370 will deviate by three times the standard deviation.

It is also used as a simple test for outliers if the population is assumed normal, and as a normality test if the population is potentially not normal.
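
The "1 in 22" and "1 in 370" figures follow from the two-sided tail probability of a normal distribution. As a minimal sketch (the function name `two_sided_tail` is an illustrative choice, not taken from the source), the tail can be computed with the complementary error function:

```python
import math

def two_sided_tail(k: float) -> float:
    """P(|X - mu| >= k * sigma) for a normally distributed X."""
    return math.erfc(k / math.sqrt(2))

for k in (2, 3):
    p = two_sided_tail(k)
    # 1 / p is the average number of observations per such deviation.
    print(f"k = {k}: p = {p:.5f}, roughly 1 in {1 / p:.0f}")
```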

### Mixture model

**mixture models, Gaussian mixture model, GMM**

A frequent cause of outliers is a mixture of two distributions, which may be two distinct sub-populations, or may indicate 'correct trial' versus 'measurement error'; this is modeled by a mixture model.

### Grubbs' test for outliers

Grubbs' test (named after Frank E. Grubbs, who published the test in 1950), also known as the maximum normalized residual test or extreme studentized deviate test, is a statistical test used to detect outliers in a univariate data set assumed to come from a normally distributed population.
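
The "maximum normalized residual" name describes the test statistic directly: the largest absolute deviation from the sample mean, divided by the sample standard deviation. The sketch below computes only that statistic on hypothetical data; the comparison against a critical value (which depends on the sample size and significance level) is deliberately left out, and the function name is an illustrative choice.

```python
from statistics import mean, stdev

def grubbs_statistic(data):
    """Return (G, index), where G = max |x_i - mean| / s and index locates that point."""
    m = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 in the denominator)
    idx = max(range(len(data)), key=lambda i: abs(data[i] - m))
    return abs(data[idx] - m) / s, idx

# Hypothetical sample with one suspicious value.
sample = [20, 21, 21, 22, 22, 23, 23, 24, 25, 175]
g, suspect = grubbs_statistic(sample)
print(f"G = {g:.3f} for value {sample[suspect]}")
```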

### Chauvenet's criterion

In statistical theory, Chauvenet's criterion (named for William Chauvenet) is a means of assessing whether one piece of experimental data (an outlier) from a set of observations is likely to be spurious.

### Heavy-tailed distribution

**heavy tails, heavy-tailed, heavy tail**

Outliers can occur by chance in any distribution, but they often indicate either measurement error or that the population has a heavy-tailed distribution.

### Dixon's Q test

In statistics, Dixon's Q test, or simply the Q test, is used for identification and rejection of outliers.
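
In its simplest form, the Q statistic is the gap between a suspect extreme value and its nearest neighbour, divided by the range of the data. The sketch below assumes that basic gap-over-range form (other variants of the ratio exist for larger samples) and does not reproduce the published critical-value tables against which Q is compared; the function name and data are illustrative.

```python
def dixon_q(data):
    """Return (Q_low, Q_high): gap / range for the smallest and largest values."""
    xs = sorted(data)
    if len(xs) < 3:
        raise ValueError("need at least three observations")
    data_range = xs[-1] - xs[0]
    if data_range == 0:
        raise ValueError("all observations are identical")
    q_low = (xs[1] - xs[0]) / data_range     # suspect is the minimum
    q_high = (xs[-1] - xs[-2]) / data_range  # suspect is the maximum
    return q_low, q_high

# Hypothetical sample with one suspicious high value.
sample = [20, 21, 21, 22, 22, 23, 23, 24, 25, 175]
q_low, q_high = dixon_q(sample)
print(f"Q_low = {q_low:.4f}, Q_high = {q_high:.4f}")
```

A value of Q close to 1 means the suspect point accounts for nearly the whole range of the sample.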

### Estimation of covariance matrices

**Shrinkage estimation, covariance estimation, estimate of the covariance**

Some estimators are highly sensitive to outliers, notably estimation of covariance matrices.

Another issue is the robustness to outliers, to which sample covariance matrices are highly sensitive.

### Interquartile range

**inter-quartile range, below, interquartile**

Other methods flag observations based on measures such as the interquartile range.

The IQR can be used to identify outliers (see below).

### Local outlier factor

**LOF, Local Outlier Factor (LOF)**

In the data mining task of anomaly detection, other approaches are distance-based and density-based such as Local Outlier Factor (LOF), and most of them use the distance to the k-nearest neighbors to label observations as outliers or non-outliers.

Points that have a substantially lower density than their neighbors are considered to be outliers.
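
The sketch below is not LOF itself. It illustrates the simpler building block mentioned above, scoring each point by the distance to its k-th nearest neighbour, so that isolated points receive large scores. The function name, the data, and the choice k = 2 are all assumptions made for illustration.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    if len(points) <= k:
        raise ValueError("need more than k points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical 2-D data: a tight cluster plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = knn_distance_scores(points, k=2)
most_isolated = points[scores.index(max(scores))]
print(most_isolated, [round(s, 2) for s in scores])
```

LOF goes one step further by comparing each point's local density with the densities of its neighbours, which makes the score relative rather than absolute.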

### Quartile

**quartiles, lower quartile, lower and upper quartiles**

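For example, if Q_1 and Q_3 are the lower and upper quartiles respectively, then one could define an outlier to be any observation outside the range [Q_1 − k(Q_3 − Q_1), Q_3 + k(Q_3 − Q_1)] for some nonnegative constant k; John Tukey proposed k = 1.5.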

Statistics and statistical analysis provide methods for checking for outliers.
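
A quartile-based rule of this kind can be sketched as follows, flagging anything outside Q_1 − k·IQR to Q_3 + k·IQR. The default k = 1.5 is Tukey's conventional choice; the function name, the sample data, and the "inclusive" quantile method are assumptions of this sketch, and other quantile conventions give slightly different fences.

```python
from statistics import quantiles

def tukey_outliers(data, k=1.5):
    """Return observations outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical sample with one extreme value.
temps_c = [20, 21, 21, 22, 22, 23, 23, 24, 25, 175]
flagged = tukey_outliers(temps_c)
print(flagged)
```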

### Cook's distance

**Cook's D**

In regression problems, an alternative approach may be to only exclude points which exhibit a large degree of influence on the estimated coefficients, using a measure such as Cook's distance.

Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression.
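
For simple linear regression, Cook's distance combines exactly these two ingredients: the residual e_i and the leverage h_i, as D_i = e_i² / (p·s²) · h_i / (1 − h_i)², with p = 2 fitted coefficients and s² the residual mean square. The sketch below applies that formula to hypothetical data; the function name and the data are illustrative, and no cut-off for "large" D is assumed.

```python
def cooks_distances(xs, ys):
    """Cook's distance for each point of a one-predictor least-squares fit."""
    n, p = len(xs), 2  # p counts the intercept and the slope
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e * e for e in residuals) / (n - p)
    leverages = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    return [(e * e) / (p * mse) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverages)]

# Hypothetical data: five points near a line, one far-out point that breaks the trend.
xs = [1, 2, 3, 4, 5, 10]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]
distances = cooks_distances(xs, ys)
most_influential = distances.index(max(distances))
print(most_influential, [round(d, 2) for d in distances])
```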

### Robust regression

**robust estimation, Robust, robust linear model**

In particular, least squares estimates for regression models are highly sensitive to (i.e. not robust against) outliers.
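
To make that sensitivity concrete, the sketch below fits the same hypothetical data twice: once by ordinary least squares and once with the Theil-Sen estimator (the median of all pairwise slopes). Theil-Sen is chosen here only as one simple example of a robust alternative; the source does not name a specific method, and the function names and data are illustrative.

```python
from itertools import combinations
from statistics import median

def least_squares(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def theil_sen(xs, ys):
    """Theil-Sen fit: median pairwise slope, then median residual intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept

# Hypothetical data on the line y = 2x + 1, with the last value corrupted.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[-1] = 100

ols_slope, ols_intercept = least_squares(xs, ys)
ts_slope, ts_intercept = theil_sen(xs, ys)
print(f"least squares: slope = {ols_slope:.2f}")
print(f"Theil-Sen:     slope = {ts_slope:.2f}")
```

A single corrupted point more than triples the least-squares slope, while the median-based fit recovers the underlying line.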

### Winsorizing

**Winsorised estimators, winsorized**

The two common approaches to exclude outliers are truncation (or trimming) and Winsorising.

Winsorizing or winsorization is the transformation of statistics by limiting extreme values in the statistical data to reduce the effect of possibly spurious outliers.
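
The difference between the two approaches can be sketched on a small sample: trimming drops the k smallest and k largest values, whereas Winsorizing keeps the sample size and replaces them with the nearest retained values. The function names, the count-based parameter k, and the data below are assumptions for illustration; percentile-based limits are an equally common way to specify the cut.

```python
def trim(data, k=1):
    """Drop the k smallest and k largest observations."""
    if 2 * k >= len(data):
        raise ValueError("k is too large for this sample")
    xs = sorted(data)
    return xs[k:len(xs) - k]

def winsorize(data, k=1):
    """Clamp the k smallest and k largest observations to the nearest kept value."""
    if 2 * k >= len(data):
        raise ValueError("k is too large for this sample")
    xs = sorted(data)
    low, high = xs[k], xs[-k - 1]
    return [min(max(x, low), high) for x in data]

# Hypothetical sample with one extreme value.
sample = [20, 21, 21, 22, 22, 23, 23, 24, 25, 175]
trimmed = trim(sample)
winsorized = winsorize(sample)
print(trimmed)
print(winsorized)
```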

### Mahalanobis distance

**Chi distance, Mahalanobis metric, Mahalanobis**

Mahalanobis distance and leverage are often used to detect outliers, especially in the development of linear regression models.
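
The Mahalanobis distance of a point x from a data set with mean μ and covariance matrix Σ is the square root of (x − μ)ᵀ Σ⁻¹ (x − μ). As a minimal sketch, restricted to two dimensions so that the 2×2 covariance matrix can be inverted by hand, it might be computed as follows; the function name and data are illustrative.

```python
import math

def mahalanobis_2d(point, data):
    """Mahalanobis distance of `point` from the 2-D sample `data`."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Sample covariance matrix entries (n - 1 in the denominator).
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # (dx, dy) * inverse(covariance) * (dx, dy)^T, written out for the 2x2 case.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Hypothetical sample: four corners of a square centred on (1, 1).
data = [(0, 0), (2, 0), (0, 2), (2, 2)]
d_centre = mahalanobis_2d((1, 1), data)
d_far = mahalanobis_2d((3, 1), data)
print(f"{d_centre:.3f} {d_far:.3f}")
```

Unlike plain Euclidean distance, the result is scaled by the spread of the data along each direction, so a point is only "far" relative to how variable the sample is.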

### Studentized residual

**studentized residuals, externally, studentization of residuals**

Studentization of residuals is an important technique in the detection of outliers.

### Anomaly (natural sciences)

**anomaly time series**

Robust statistics, resistant to the effects of outliers, are sometimes used as the basis of the transformation.

### Data analysis

**data analytics, analysis, data analyst**

If a data point (or points) is excluded from the data analysis, this should be clearly stated on any subsequent report.

In the case of outliers: should one use robust analysis techniques?