Robust statistics

robust, breakdown point, robustness, robust statistic, robust estimator, influence functions, resistant statistic, robust estimation, statistically resistant, influence
Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal.
176 Related Articles

Normal distribution

normally distributed, normal, Gaussian
Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. For example, robust methods work well for mixtures of two normal distributions with different standard-deviations; under this model, non-robust methods like a t-test work poorly.
In those cases, a more heavy-tailed distribution should be assumed and the appropriate robust statistical inference methods applied.

Standard deviation

standard deviations, sample standard deviation, sigma
For example, robust methods work well for mixtures of two normal distributions with different standard-deviations; under this model, non-robust methods like a t-test work poorly. The median absolute deviation and interquartile range are robust measures of statistical dispersion, while the standard deviation and range are not.
The standard deviation is algebraically simpler, though in practice less robust, than the average absolute deviation.

L-estimator

L-estimation
L-estimators are a general class of simple statistics, often robust, while M-estimators are a general class of robust statistics, and are now the preferred solution, though they can be quite involved to calculate.
The main benefits of L-estimators are that they are often extremely simple, and often robust statistics: assuming sorted data, they are very easy to calculate and interpret, and are often resistant to outliers.
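As an illustration of why L-estimators are easy to calculate once the data are sorted, the sketch below treats an L-estimator as a weighted sum of order statistics. The helper name `l_estimator`, the weight vectors, and the sample values are assumptions made up for this example, not taken from the source.

```python
def l_estimator(data, weights):
    """Weighted sum of order statistics (hypothetical helper).

    weights[i] multiplies the i-th smallest observation.
    """
    ordered = sorted(data)
    if len(weights) != len(ordered):
        raise ValueError("need exactly one weight per observation")
    return sum(w * x for w, x in zip(weights, ordered))


# Made-up sample with one gross outlier.
sample = [2.0, 3.0, 5.0, 7.0, 1000.0]
n = len(sample)

mean_weights = [1.0 / n] * n               # equal weights: the sample mean
median_weights = [0.0, 0.0, 1.0, 0.0, 0.0]  # all weight on the middle value
midrange_weights = [0.5, 0.0, 0.0, 0.0, 0.5]  # average of min and max

print("mean    :", l_estimator(sample, mean_weights))
print("median  :", l_estimator(sample, median_weights))
print("midrange:", l_estimator(sample, midrange_weights))
```

The median puts no weight on the extreme order statistics, so the outlier does not move it. The mean and midrange do weight the largest value, which is why an L-estimator is often, but not always, robust.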

Trimmed estimator

trimmed, trimming
Trimmed estimators and Winsorised estimators are general methods to make statistics more robust.
This is generally done to obtain a more robust statistic, and the extreme values are considered outliers.
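A minimal sketch of the two approaches, assuming a symmetric cut of the same proportion from each end. The function names, the `proportion` parameter, and the sample are hypothetical. Trimming discards the extreme values, while Winsorising replaces them with the nearest retained value.

```python
def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the points from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(data)
    k = int(len(ordered) * proportion)  # points removed from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def winsorised_mean(data, proportion):
    """Mean after clipping each tail to the nearest retained value."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(data)
    n = len(ordered)
    k = int(n * proportion)
    low, high = ordered[k], ordered[n - k - 1]
    clipped = [min(max(x, low), high) for x in ordered]
    return sum(clipped) / n


# Made-up sample: nine well-behaved values and one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print("raw mean       :", sum(sample) / len(sample))
print("10% trimmed    :", trimmed_mean(sample, 0.1))
print("10% Winsorised :", winsorised_mean(sample, 0.1))
```

With this sample, both robust versions ignore the size of the value 100 and depend only on the fact that it is the largest observation.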

Median

average, sample median, median-unbiased estimator
The median is a robust measure of central tendency, while the mean is not. The median has a breakdown point of 50%, while the mean has a breakdown point of 0% (a single large observation can throw it off).
Because of this, the median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data are contaminated, the median will not give an arbitrarily large or small result.
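The breakdown behaviour described above can be checked on a small example. This is only a sketch: the sample values, the `contaminate` helper, and the choice of 1e9 as the "bad" value are assumptions for illustration.

```python
from statistics import mean, median

# Made-up clean sample of nine measurements near 10.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1]


def contaminate(data, n_bad, bad_value=1e9):
    """Replace the first n_bad observations with an arbitrarily large value."""
    return [bad_value] * n_bad + list(data[n_bad:])


for n_bad in (0, 1, 4, 5):
    dirty = contaminate(clean, n_bad)
    print(f"{n_bad} bad of {len(dirty)}: "
          f"mean={mean(dirty):.4g}  median={median(dirty):.4g}")
```

A single bad value is enough to carry the mean away, whereas the median stays within the range of the clean data until more than half of the nine observations (five or more) are replaced.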

Median absolute deviation

MAD
The median absolute deviation and interquartile range are robust measures of statistical dispersion, while the standard deviation and range are not. The plots below show the bootstrap distributions of the standard deviation, median absolute deviation (MAD) and Qn estimator of scale.
In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data.
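As a sketch of the computation, the MAD is the median of the absolute deviations from the sample median. The function name `mad`, its `scale` parameter, and the sample below are assumptions for this example. The factor 1.4826 is the usual constant that makes the MAD comparable to the standard deviation for normally distributed data.

```python
from statistics import median, stdev


def mad(data, scale=1.0):
    """Median absolute deviation from the median, times an optional scale."""
    centre = median(data)
    return scale * median([abs(x - centre) for x in data])


# Made-up sample with one outlier.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]

print("standard deviation:", stdev(sample))
print("MAD               :", mad(sample))
print("scaled MAD        :", mad(sample, scale=1.4826))
```

Here the deviations from the median 3.0 are 2, 1, 0, 1 and 97, so the MAD is 1.0 regardless of how large the outlier is, while the standard deviation is dominated by it.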

Parametric statistics

parametric, parametric test, parametric inference
Another motivation is to provide methods with good performance when there are small departures from parametric distributions.
However, as more is assumed by parametric methods, when the assumptions are not correct they have a greater chance of failing, and for this reason are not robust statistical methods.

Estimator

estimators, estimate, estimates
Unfortunately, when there are outliers in the data, classical estimators often have very poor performance, when judged using the breakdown point and the influence function, described below. This means that if the assumptions are only approximately met, the robust estimator will still have a reasonable efficiency, and reasonably small bias, as well as being asymptotically unbiased, meaning having a bias tending towards 0 as the sample size tends towards infinity.
However, in robust statistics, statistical theory goes on to consider the balance between having good properties, if tightly defined assumptions hold, and having less good properties that hold under wider conditions.

Arithmetic mean

mean, average, arithmetic
The median is a robust measure of central tendency, while the mean is not. The median has a breakdown point of 50%, while the mean has a breakdown point of 0% (a single large observation can throw it off).
While the arithmetic mean is often used to report central tendencies, it is not a robust statistic, meaning that it is greatly influenced by outliers (values that are very much larger or smaller than most of the values).

Statistic

sample statistic, empirical, measure
Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal.
Important potential properties of statistics include completeness, consistency, sufficiency, unbiasedness, minimum mean square error, low variance, robustness, and computational convenience.

Interquartile range

inter-quartile range, below, interquartile
The median absolute deviation and interquartile range are robust measures of statistical dispersion, while the standard deviation and range are not.
Unlike total range, the interquartile range has a breakdown point of 25%, and is thus often preferred to the total range.
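The contrast between the two ranges can be sketched with Python's standard library. The helper names and the data are made up for this example, and `statistics.quantiles` is used with its default "exclusive" method, so other quartile conventions may give slightly different numbers.

```python
from statistics import quantiles


def total_range(data):
    """Largest minus smallest observation."""
    return max(data) - min(data)


def interquartile_range(data):
    """Third quartile minus first quartile (default 'exclusive' method)."""
    q1, _q2, q3 = quantiles(data, n=4)
    return q3 - q1


clean = list(range(1, 13))   # 1, 2, ..., 12
dirty = clean[:-1] + [10**6]  # largest value replaced by an outlier

for name, data in (("clean", clean), ("dirty", dirty)):
    print(name, "range =", total_range(data),
          " IQR =", interquartile_range(data))
```

Replacing the largest value changes the total range from 11 to 999999 but leaves the interquartile range unchanged, because the quartiles are computed from interior order statistics.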

Robust measures of scale

Qn estimator, robust estimator of dispersion, robust measure of scale
The plots below show the bootstrap distributions of the standard deviation, median absolute deviation (MAD) and Qn estimator of scale.
In statistics, a robust measure of scale is a robust statistic that quantifies the statistical dispersion in a set of numerical data.

Outlier

outliers, conservative estimate, irregularities
One motivation is to produce statistical methods that are not unduly affected by outliers.
In the former case one wishes to discard them or use statistics that are robust to outliers, while in the latter case they indicate that the distribution has high skewness and that one should be very cautious in using tools or intuitions that assume a normal distribution.

Statistical assumption

assumptions, model assumptions, statistical assumptions
Robust statistics seek to provide methods that emulate popular statistical methods, but which are not unduly affected by outliers or other small departures from model assumptions.
Robust statistics

M-estimator

M-estimation, estimation
In fact, the mean, median and trimmed mean are all special cases of M-estimators.
The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators.
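To give a concrete feel for an M-estimator that lies between the mean and the median, the sketch below computes a Huber-type location estimate by iteratively reweighted averaging. The choice of the Huber estimator is an assumption of this example and is not named in the text above. The function name, the tuning constant `k=1.345`, the use of a fixed scaled-MAD scale, and the sample are likewise illustrative.

```python
from statistics import mean, median


def huber_location(data, k=1.345, tol=1e-9, max_iter=100):
    """Huber-type M-estimate of location (illustrative sketch).

    Points within k scale units of the current estimate get weight 1;
    points further away are down-weighted in proportion to their distance.
    The scale is held fixed at the scaled MAD of the data.
    """
    mu = median(data)
    scale = 1.4826 * median([abs(x - mu) for x in data])
    if scale == 0:
        return mu  # more than half the points coincide with the median
    for _ in range(max_iter):
        weights = []
        for x in data:
            r = abs(x - mu) / scale
            weights.append(1.0 if r <= k else k / r)
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Made-up sample with one gross outlier.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 1000.0]

print("mean  :", mean(sample))
print("median:", median(sample))
print("Huber :", huber_location(sample))
```

As `k` grows the weights all become 1 and the estimate approaches the mean. As `k` shrinks towards 0 it behaves more like the median.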

Mixture distribution

mixture, mixture density, density mixture
by replacing estimators that are optimal under the assumption of a normal distribution with estimators that are optimal for, or at least derived for, other distributions: for example using the t-distribution with low degrees of freedom (high kurtosis; degrees of freedom between 4 and 6 have often been found to be useful in practice) or with a mixture of two or more distributions.
Parametric statistics that assume no error often fail on such mixture densities – for example, statistics that assume normality often fail disastrously in the presence of even a few outliers – and instead one uses robust statistics.

Student's t-distribution

Student's t-distribution, t-distribution
by replacing estimators that are optimal under the assumption of a normal distribution with estimators that are optimal for, or at least derived for, other distributions: for example using the t-distribution with low degrees of freedom (high kurtosis; degrees of freedom between 4 and 6 have often been found to be useful in practice) or with a mixture of two or more distributions.
However, it is not always easy to identify outliers (especially in high dimensions), and the t-distribution is a natural choice of model for such data and provides a parametric approach to robust statistics.

Efficiency (statistics)

efficient, efficiency, inefficient
This means that if the assumptions are only approximately met, the robust estimator will still have a reasonable efficiency, and reasonably small bias, as well as being asymptotically unbiased, meaning having a bias tending towards 0 as the sample size tends towards infinity.
For example, the median is far more robust to outliers, so that if the Gaussian model is questionable or approximate, there may be advantages to using the median (see Robust statistics).

Truncated mean

trimmed mean, modified mean
Panels (c) and (d) of the plot show the bootstrap distribution of the mean (c) and the 10% trimmed mean (d).
In this regard it is referred to as a robust estimator.

Robust regression

robust estimation, Robust, robust linear model
Robust regression
In robust statistics, robust regression is a form of regression analysis designed to overcome some limitations of traditional parametric and non-parametric methods.

Robust confidence intervals

Robust confidence intervals
In statistics a robust confidence interval is a robust modification of confidence intervals, meaning that one modifies the non-robust calculations of the confidence interval so that they are not badly affected by outlying or aberrant observations in a data-set.

Unit-weighted regression

unit weights
Unit-weighted regression
In statistics, unit-weighted regression is a simplified and robust version (Wainer & Thissen, 1976) of multiple regression analysis where only the intercept term is estimated.

Data set

dataset, datasets, data
The data sets for that book can be found via the Classic data sets page, and the book's website contains more information on the data.
Robust statistics – Data sets used in Robust Regression and Outlier Detection (Rousseeuw and Leroy, 1986). Provided on-line at the University of Cologne.

Missing data

missing values, incomplete data, missing at random
Replacing missing data is called imputation.
In situations where missing values are likely to occur, the researcher is often advised to plan to use methods of data analysis that are robust to missingness.

Bootstrapping (statistics)

bootstrap, bootstrapping, bootstrap support
The analysis was performed in R and 10,000 bootstrap samples were used for each of the raw and trimmed means.
(Note that the sample mean need not be a consistent estimator for any population mean, because no mean need exist for a heavy-tailed distribution.) A well-defined and robust statistic for central tendency is the sample median, which is consistent and median-unbiased for the population median.
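The kind of comparison described above, bootstrapping the raw mean and a trimmed mean, can be sketched as follows. This is not the original analysis, which was done in R with 10,000 resamples. The data, the seed, the smaller number of resamples, and the function names are all assumptions for illustration.

```python
import random
from statistics import mean, stdev


def trimmed_mean(data, proportion=0.1):
    """Mean after dropping `proportion` of the points from each end."""
    ordered = sorted(data)
    k = int(len(ordered) * proportion)
    return mean(ordered[k:len(ordered) - k])


def bootstrap(data, statistic, n_resamples=2000, seed=0):
    """Recompute `statistic` on resamples drawn with replacement."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


# Made-up sample: 19 values near 10 and one gross outlier.
sample = [9.6, 9.7, 9.8, 9.8, 9.9, 9.9, 10.0, 10.0, 10.0, 10.0,
          10.1, 10.1, 10.1, 10.2, 10.2, 10.3, 10.3, 10.4, 10.5, 1000.0]

boot_mean = bootstrap(sample, mean)
boot_trimmed = bootstrap(sample, trimmed_mean)

print("spread of bootstrap means        :", stdev(boot_mean))
print("spread of bootstrap trimmed means:", stdev(boot_trimmed))
```

The bootstrap mean shifts every time the outlier is drawn into a resample, so its distribution is much wider than that of the 10% trimmed mean, which is affected only when the outlier is drawn more often than the trimming can absorb.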