# Robust statistics

**robust, breakdown point, robustness, robust statistic, robust estimator, influence functions, resistant statistic, robust estimation, statistically resistant, influence**

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal.

176 Related Articles

### Normal distribution

**normally distributed, normal, Gaussian**

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. For example, robust methods work well for mixtures of two normal distributions with different standard-deviations; under this model, non-robust methods like a t-test work poorly.

In those cases, a more heavy-tailed distribution should be assumed and the appropriate robust statistical inference methods applied.

### Standard deviation

**standard deviations, sample standard deviation, sigma**

For example, robust methods work well for mixtures of two normal distributions with different standard-deviations; under this model, non-robust methods like a t-test work poorly. The median absolute deviation and interquartile range are robust measures of statistical dispersion, while the standard deviation and range are not.

The standard deviation is algebraically simpler, though in practice less robust, than the average absolute deviation.

### L-estimator

**L-estimation**

L-estimators are a general class of simple statistics, often robust, while M-estimators are a general class of robust statistics, and are now the preferred solution, though they can be quite involved to calculate.

The main benefits of L-estimators are that they are often extremely simple, and often robust statistics: assuming sorted data, they are very easy to calculate and interpret, and are often resistant to outliers.
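As a rough illustration of why L-estimators are easy to compute once the data are sorted, the sketch below treats an L-estimator as a weighted sum of order statistics. The helper name `l_estimator` and the sample values are hypothetical, chosen only for this example.

```python
def l_estimator(data, weights):
    """Weighted sum of the order statistics (the sorted data).

    `weights[i]` multiplies the i-th smallest observation.
    """
    xs = sorted(data)
    if len(weights) != len(xs):
        raise ValueError("need exactly one weight per observation")
    return sum(w * x for w, x in zip(weights, xs))


sample = [7, 1, 5, 3, 9]  # illustrative data; sorted: 1, 3, 5, 7, 9

# Median of an odd-sized sample: all weight on the middle order statistic.
sample_median = l_estimator(sample, [0, 0, 1, 0, 0])

# Midrange: half the weight on each extreme (simple, but not robust).
midrange = l_estimator(sample, [0.5, 0, 0, 0, 0.5])

# Range: largest minus smallest (also simple, and also not robust).
sample_range = l_estimator(sample, [-1, 0, 0, 0, 1])

print(sample_median, midrange, sample_range)
```

Whether such a statistic is robust depends on where the weights sit: weights concentrated on central order statistics ignore the extremes, while weights on the smallest and largest values do not.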

### Trimmed estimator

**trimmed, trimming**

Trimmed estimators and Winsorised estimators are general methods to make statistics more robust.

Trimming is generally done to obtain a more robust statistic, and the extreme values are considered outliers.
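A minimal sketch of the two ideas, assuming symmetric trimming by a fixed proportion at each end; the function names and the sample are illustrative and do not come from any particular library. Trimming discards the extreme values, whereas Winsorising replaces them with the nearest retained value.

```python
from statistics import mean


def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    xs = sorted(data)
    k = int(proportion * len(xs))  # number dropped at each end
    if 2 * k >= len(xs):
        raise ValueError("proportion too large for this sample size")
    return mean(xs[k:len(xs) - k])


def winsorized_mean(data, proportion):
    """Mean after clamping the extremes to the nearest retained values."""
    xs = sorted(data)
    k = int(proportion * len(xs))
    if 2 * k >= len(xs):
        raise ValueError("proportion too large for this sample size")
    lo, hi = xs[k], xs[-k - 1]
    return mean([min(max(x, lo), hi) for x in xs])


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # one gross outlier

print(mean(sample))                   # pulled far upward by the outlier
print(trimmed_mean(sample, 0.10))     # drops 1 and 1000
print(winsorized_mean(sample, 0.10))  # replaces 1 with 2 and 1000 with 9
```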

### Median

**average, sample median, median-unbiased estimator**

The median is a robust measure of central tendency, while the mean is not. The median has a breakdown point of 50%, while the mean has a breakdown point of 0% (a single large observation can throw it off).

Because of this, the median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data are contaminated, the median will not give an arbitrarily large or small result.
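The breakdown behaviour can be checked on a small made-up sample. In this sketch the helper `contaminate` is hypothetical: it replaces the first `k` observations with an arbitrarily large value so that the effect on the mean and the median can be compared.

```python
from statistics import mean, median

clean = [2.1, 2.4, 2.2, 2.5, 2.3, 2.6, 2.0]  # illustrative data, n = 7


def contaminate(data, k, value=1e6):
    """Return a copy of `data` with its first k observations set to `value`."""
    return [value] * k + list(data[k:])


for k in range(5):
    bad = contaminate(clean, k)
    print(k, mean(bad), median(bad))
```

With 7 observations, the median stays within the range of the clean data for up to 3 contaminated values (fewer than half), and is carried off once 4 are contaminated. The mean is ruined by a single one.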

### Median absolute deviation

**MAD**

The median absolute deviation and interquartile range are robust measures of statistical dispersion, while the standard deviation and range are not. The plots below show the bootstrap distributions of the standard deviation, median absolute deviation (MAD) and Qn estimator of scale.

In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data.
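As a sketch, the MAD can be computed as the median of the absolute deviations from the sample median. The function name `mad` and the data are illustrative, and no consistency constant is applied here.

```python
from statistics import median, stdev


def mad(data):
    """Median absolute deviation from the median (unscaled)."""
    m = median(data)
    return median([abs(x - m) for x in data])


sample = [1, 1, 2, 2, 4, 6, 9]
with_outlier = [1, 1, 2, 2, 4, 6, 9000]  # same data, largest value corrupted

print(mad(sample), stdev(sample))
print(mad(with_outlier), stdev(with_outlier))
```

In this example the MAD is unchanged by the corrupted value, while the standard deviation grows by several orders of magnitude.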

### Parametric statistics

**parametric, parametric test, parametric inference**

Another motivation is to provide methods with good performance when there are small departures from parametric distributions.

However, as more is assumed by parametric methods, when the assumptions are not correct they have a greater chance of failing, and for this reason are not robust statistical methods.

### Estimator

**estimators, estimate, estimates**

Unfortunately, when there are outliers in the data, classical estimators often have very poor performance, when judged using the breakdown point and the influence function, described below. This means that if the assumptions are only approximately met, the robust estimator will still have a reasonable efficiency, and reasonably small bias, as well as being asymptotically unbiased, meaning having a bias tending towards 0 as the sample size tends towards infinity.

However, in robust statistics, statistical theory goes on to consider the balance between having good properties, if tightly defined assumptions hold, and having less good properties that hold under wider conditions.

### Arithmetic mean

**mean, average, arithmetic**

The median is a robust measure of central tendency, while the mean is not. The median has a breakdown point of 50%, while the mean has a breakdown point of 0% (a single large observation can throw it off).

While the arithmetic mean is often used to report central tendencies, it is not a robust statistic, meaning that it is greatly influenced by outliers (values that are very much larger or smaller than most of the values).

### Statistic

**sample statistic, empirical, measure**

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal.

Important potential properties of statistics include completeness, consistency, sufficiency, unbiasedness, minimum mean square error, low variance, robustness, and computational convenience.

### Interquartile range

**inter-quartile range, below, interquartile**

The median absolute deviation and interquartile range are robust measures of statistical dispersion, while the standard deviation and range are not.

Unlike total range, the interquartile range has a breakdown point of 25%, and is thus often preferred to the total range.
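A short sketch comparing the two measures on made-up data. It uses `statistics.quantiles` from the Python standard library with its default method; other quartile conventions can give slightly different values, so the numbers here are specific to that choice.

```python
from statistics import quantiles


def iqr(data):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1


def total_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)


sample = list(range(1, 12))          # 1, 2, ..., 11
with_outlier = sample[:-1] + [1100]  # largest value corrupted

print(iqr(sample), total_range(sample))
print(iqr(with_outlier), total_range(with_outlier))
```

The corrupted value lies outside the middle half of the data, so the interquartile range does not move, while the total range follows the outlier.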

### Robust measures of scale

**Qn estimator, robust estimator of dispersion, robust measure of scale**

The plots below show the bootstrap distributions of the standard deviation, median absolute deviation (MAD) and Qn estimator of scale.

In statistics, a robust measure of scale is a robust statistic that quantifies the statistical dispersion in a set of numerical data.
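One such measure is the Qn estimator, which is based on absolute pairwise differences. The sketch below is a naive quadratic-time version, assuming the usual definition (the k-th smallest pairwise distance, with k = h(h-1)/2 and h = floor(n/2) + 1) and the commonly quoted factor 2.2219 for consistency with the standard deviation under normality. It applies no finite-sample correction, and the name `qn_scale` is hypothetical.

```python
from itertools import combinations


def qn_scale(data, constant=2.2219):
    """Naive O(n^2) Qn scale estimate from absolute pairwise differences.

    `constant` is assumed to be the normal-consistency factor; pass 1.0
    to get the raw order statistic of the pairwise distances.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    h = n // 2 + 1
    k = h * (h - 1) // 2  # which pairwise distance to pick (1-based)
    diffs = sorted(abs(a - b) for a, b in combinations(data, 2))
    return constant * diffs[k - 1]


print(qn_scale([1, 2, 3, 4, 5]))
print(qn_scale([1, 2, 3, 4, 500]))  # one corrupted value
```

Because only a low order statistic of the pairwise distances is used, a single wild observation leaves the estimate unchanged in this example.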

### Outlier

**outliers, conservative estimate, irregularities**

One motivation is to produce statistical methods that are not unduly affected by outliers.

In the former case one wishes to discard them or use statistics that are robust to outliers, while in the latter case they indicate that the distribution has high skewness and that one should be very cautious in using tools or intuitions that assume a normal distribution.

### Statistical assumption

**assumptions, model assumptions, statistical assumptions**

Robust statistics seek to provide methods that emulate popular statistical methods, but which are not unduly affected by outliers or other small departures from model assumptions.

### M-estimator

**M-estimation, estimation**

In fact, the mean, median and trimmed mean are all special cases of M-estimators.

The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators.
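To make the idea concrete, here is a sketch of one well-known M-estimator of location, the Huber estimator, computed by iteratively reweighted averaging. Everything specific in it is an assumption of this example rather than something stated above: the tuning constant `c=1.345`, the use of a fixed scale equal to 1.4826 times the MAD, the stopping rule, and the function name.

```python
from statistics import mean, median


def huber_location(data, c=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location with a fixed, MAD-based scale."""
    mu = median(data)  # robust starting value
    scale = 1.4826 * median([abs(x - mu) for x in data])
    if scale == 0:
        return mu  # more than half the data are identical
    for _ in range(max_iter):
        # Weight 1 near the current estimate, shrinking for distant points.
        weights = [
            1.0 if x == mu else min(1.0, c * scale / abs(x - mu))
            for x in data
        ]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


clean = [2.1, 2.4, 2.2, 2.5, 2.3, 2.6, 2.0]  # illustrative data
contaminated = clean[:-1] + [1e6]

print(mean(contaminated), median(contaminated), huber_location(contaminated))
```

With a very large `c` every weight equals 1 and the procedure returns the ordinary mean, which shows how the mean arises as a limiting case of this family.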

### Mixture distribution

**mixture, mixture density, density mixture**

Robust parametric statistics can proceed by replacing estimators that are optimal under the assumption of a normal distribution with estimators that are optimal for, or at least derived for, other distributions: for example, using the t-distribution with low degrees of freedom (high kurtosis; degrees of freedom between 4 and 6 have often been found to be useful in practice) or a mixture of two or more distributions.

Parametric statistics that assume no error often fail on such mixture densities – for example, statistics that assume normality often fail disastrously in the presence of even a few outliers – and instead one uses robust statistics.

### Student's t-distribution

**Student's t-distribution, t-distribution**

Robust parametric statistics can proceed by replacing estimators that are optimal under the assumption of a normal distribution with estimators that are optimal for, or at least derived for, other distributions: for example, using the t-distribution with low degrees of freedom (high kurtosis; degrees of freedom between 4 and 6 have often been found to be useful in practice) or a mixture of two or more distributions.

However, it is not always easy to identify outliers (especially in high dimensions), and the t-distribution is a natural choice of model for such data and provides a parametric approach to robust statistics.

### Efficiency (statistics)

**efficient, efficiency, inefficient**

This means that if the assumptions are only approximately met, the robust estimator will still have a reasonable efficiency, and reasonably small bias, as well as being asymptotically unbiased, meaning having a bias tending towards 0 as the sample size tends towards infinity.

For example, the median is far more robust to outliers, so that if the Gaussian model is questionable or approximate, there may be advantages to using the median (see Robust statistics).

### Truncated mean

**trimmed mean, modified mean**

Panels (c) and (d) of the plot show the bootstrap distribution of the mean (c) and the 10% trimmed mean (d).

In this regard the truncated mean is referred to as a robust estimator.

### Robust regression

**robust estimation, robust, robust linear model**

In robust statistics, robust regression is a form of regression analysis designed to overcome some limitations of traditional parametric and non-parametric methods.

### Robust confidence intervals

In statistics a robust confidence interval is a robust modification of confidence intervals, meaning that one modifies the non-robust calculations of the confidence interval so that they are not badly affected by outlying or aberrant observations in a data-set.

### Unit-weighted regression

**unit weights**

In statistics, unit-weighted regression is a simplified and robust version (Wainer & Thissen, 1976) of multiple regression analysis where only the intercept term is estimated.

### Data set

**dataset, datasets, data**

The data sets for that book can be found via the Classic data sets page, and the book's website contains more information on the data.

Robust statistics – Data sets used in Robust Regression and Outlier Detection (Rousseeuw and Leroy, 1986). Provided on-line at the University of Cologne.

### Missing data

**missing values, incomplete data, missing at random**

Replacing missing data is called imputation.

In situations where missing values are likely to occur, the researcher is often advised to plan to use methods of data analysis that are robust to missingness.

### Bootstrapping (statistics)

**bootstrap, bootstrapping, bootstrap support**

The analysis was performed in R and 10,000 bootstrap samples were used for each of the raw and trimmed means.
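The original analysis was done in R and its data are not reproduced here. The following is only a Python sketch of the same kind of procedure, with a made-up sample, a hypothetical `bootstrap` helper, a fixed seed, and fewer resamples than the 10,000 mentioned above so that it runs quickly.

```python
import random
from statistics import mean, stdev


def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    xs = sorted(data)
    k = int(proportion * len(xs))
    return mean(xs[k:len(xs) - k])


def bootstrap(data, statistic, n_resamples=10_000, seed=0):
    """Values of `statistic` on resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


# Illustrative sample with one outlying value (not the data from the text).
sample = [2.1, 2.4, 2.2, 2.5, 2.3, 2.6, 2.0, 2.2, 2.4, 30.0]

boot_means = bootstrap(sample, mean, n_resamples=2000)
boot_trimmed = bootstrap(
    sample, lambda xs: trimmed_mean(xs, 0.10), n_resamples=2000
)

# Spread of each bootstrap distribution.
print(stdev(boot_means), stdev(boot_trimmed))
```

Comparing the spread of the two bootstrap distributions gives a rough sense of how strongly each statistic depends on whether the outlying value happens to be resampled.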

(Note that the sample mean need not be a consistent estimator for any population mean, because no mean need exist for a heavy-tailed distribution.) A well-defined and robust statistic for central tendency is the sample median, which is consistent and median-unbiased for the population median.