Robust statistics

robust, breakdown point, robustness, robust statistic, robust estimator, influence functions, resistant statistic, robust estimation, statistically resistant, Empirical influence function
Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal.
182 Related Articles

Normal distribution

normally distributed, Gaussian distribution, normal
Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. For example, robust methods work well for mixtures of two normal distributions with different standard-deviations; under this model, non-robust methods like a t-test work poorly.
In those cases, a more heavy-tailed distribution should be assumed and the appropriate robust statistical inference methods applied.

Standard deviation

standard deviations, sample standard deviation, SD
For example, robust methods work well for mixtures of two normal distributions with different standard-deviations; under this model, non-robust methods like a t-test work poorly. The median absolute deviation and interquartile range are robust measures of statistical dispersion, while the standard deviation and range are not.
The standard deviation is algebraically simpler, though in practice less robust, than the average absolute deviation.

Estimator

estimators, estimate, estimates
Unfortunately, when there are outliers in the data, classical estimators often have very poor performance, when judged using the breakdown point and the influence function, described below. This means that if the assumptions are only approximately met, the robust estimator will still have a reasonable efficiency, and reasonably small bias, as well as being asymptotically unbiased, meaning having a bias tending towards 0 as the sample size tends towards infinity.
However, in robust statistics, statistical theory goes on to consider the balance between having good properties, if tightly defined assumptions hold, and having less good properties that hold under wider conditions.

Median absolute deviation

MAD
The median absolute deviation and interquartile range are robust measures of statistical dispersion, while the standard deviation and range are not. The plots below show the bootstrap distributions of the standard deviation, the median absolute deviation (MAD) and the Rousseeuw–Croux (Qn) estimator of scale.
In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data.
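As a rough illustration of the definition, the MAD can be computed as the median of the absolute deviations from the sample median. The sketch below uses only Python's standard library; the function name `mad` and the sample values are assumptions made for the example, and no consistency constant (such as 1.4826 for normal data) is applied.

```python
from statistics import median, stdev


def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    center = median(values)
    return median(abs(x - center) for x in values)


# Hypothetical sample, then the same sample with one gross outlier.
clean = [2, 3, 3, 4, 5, 5, 6]
contaminated = [2, 3, 3, 4, 5, 5, 600]

print("MAD:  ", mad(clean), mad(contaminated))
print("stdev:", stdev(clean), stdev(contaminated))
```

In this example the single outlier inflates the standard deviation by orders of magnitude while leaving the MAD unchanged.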

Median

average, sample median, median-unbiased estimator
The median is a robust measure of central tendency.
Because of this, the median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data are contaminated, the median will not give an arbitrarily large or small result.
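The 50% breakdown point can be checked on a small example: as long as fewer than half of the observations are replaced by an arbitrarily large value, the median stays within the range of the uncontaminated data. The helper `contaminate` and the data below are hypothetical and serve only as a sketch.

```python
from statistics import mean, median


def contaminate(values, n_bad, bad_value):
    """Return a copy with the last n_bad observations replaced by bad_value."""
    return values[: len(values) - n_bad] + [bad_value] * n_bad


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

for n_bad in (0, 1, 4, 5):
    sample = contaminate(data, n_bad, 1e9)
    print(n_bad, "bad values -> median", median(sample), "mean", mean(sample))
```

With 4 of 9 values contaminated the median is still an original data value; with 5 of 9 it jumps to the contaminating value. The mean is already dominated by a single bad value.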

L-estimator

L-estimation
L-estimators are a general class of simple statistics, often robust, while M-estimators are a general class of robust statistics, and are now the preferred solution, though they can be quite involved to calculate.
The main benefits of L-estimators are that they are often extremely simple, and often robust statistics: assuming sorted data, they are very easy to calculate and interpret, and are often resistant to outliers.

Trimmed estimator

trimmed, trimming
Trimmed estimators and Winsorised estimators are general methods to make statistics more robust.
This is generally done to obtain a more robust statistic, and the extreme values are considered outliers.
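A trimmed mean is one common instance of a trimmed estimator: sort the data, discard a fixed proportion from each end, and average what remains. The following is a minimal sketch; the name `trimmed_mean` and the convention of rounding the trim count down are assumptions, and other conventions exist.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted data from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations dropped per tail
    kept = ordered[k: len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical sample with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(trimmed_mean(data, 0.0))  # ordinary mean, pulled up by the outlier
print(trimmed_mean(data, 0.1))  # 10% trimmed mean
```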

Arithmetic mean

mean, average, arithmetic
The mean is not a robust measure of central tendency.
While the arithmetic mean is often used to report central tendencies, it is not a robust statistic, meaning that it is greatly influenced by outliers (values that are very much larger or smaller than most of the values).

Statistic

sample statistic, empirical, measure
Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal.
Important potential properties of statistics include completeness, consistency, sufficiency, unbiasedness, minimum mean square error, low variance, robustness, and computational convenience.

Interquartile range

inter-quartile range, below, interquartile
The median absolute deviation and interquartile range are robust measures of statistical dispersion, while the standard deviation and range are not.
Unlike total range, the interquartile range has a breakdown point of 25%, and is thus often preferred to the total range.
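To make the contrast with the total range concrete, the sketch below computes both on a sample before and after one value is made extreme. It relies on `statistics.quantiles` from the Python standard library with its default quartile convention; other quartile definitions give slightly different numbers. The function names `iqr` and `total_range` and the data are assumptions for the example.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def total_range(values):
    """Total range: maximum minus minimum."""
    return max(values) - min(values)


clean = list(range(1, 13))          # 1, 2, ..., 12
contaminated = clean[:-1] + [1200]  # largest value replaced by an outlier

print("IQR:  ", iqr(clean), iqr(contaminated))
print("range:", total_range(clean), total_range(contaminated))
```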

Robust measures of scale

Robust standard deviation, (Qn) estimator, robust estimator of dispersion
The plots below show the bootstrap distributions of the standard deviation, the median absolute deviation (MAD) and the Rousseeuw–Croux (Qn) estimator of scale.
In statistics, a robust measure of scale is a robust statistic that quantifies the statistical dispersion in a set of numerical data.

Efficiency (statistics)

efficient, efficiency, inefficient
This means that if the assumptions are only approximately met, the robust estimator will still have a reasonable efficiency, and reasonably small bias, as well as being asymptotically unbiased, meaning having a bias tending towards 0 as the sample size tends towards infinity.
For example, the median is far more robust to outliers, so that if the Gaussian model is questionable or approximate, there may be advantages to using the median (see Robust statistics).

Outlier

outliers, statistical outliers, conservative estimate
One motivation is to produce statistical methods that are not unduly affected by outliers.
In the former case one wishes to discard them or use statistics that are robust to outliers, while in the latter case they indicate that the distribution has high skewness and that one should be very cautious in using tools or intuitions that assume a normal distribution.

Statistical assumption

assumptions, Statistical assumptions, distributional assumption
Robust statistics seek to provide methods that emulate popular statistical methods, but which are not unduly affected by outliers or other small departures from model assumptions.

M-estimator

M-estimation, M-estimators, estimation
In fact, the mean, median and trimmed mean are all special cases of M-estimators.
The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators.
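One way to see how an M-estimator of location sits between the mean and the median is Huber's estimator, which is not named in the text above and is used here only as an illustrative choice. The sketch solves it by iteratively reweighted averaging: observations within `k` scale units of the current estimate get full weight, and more distant ones are down-weighted. The function name, the default tuning constant 1.345, and the fixed MAD-based scale (rescaled by 1.4826) are assumptions of this example.

```python
from statistics import median


def huber_location(values, k=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging."""
    center = median(values)
    # Robust scale held fixed during iteration: MAD rescaled for normal data.
    scale = 1.4826 * median(abs(x - center) for x in values)
    if scale == 0:
        return center  # more than half the data coincide; nothing to iterate
    mu = center
    for _ in range(max_iter):
        cutoff = k * scale
        weights = [1.0 if abs(x - mu) <= cutoff else cutoff / abs(x - mu)
                   for x in values]
        new_mu = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical sample with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(huber_location(data))          # stays near the bulk of the data
print(huber_location(data, k=1e9))   # huge k: every weight is 1, i.e. the mean
```

A very large `k` recovers the arithmetic mean, while a very small `k` pushes the estimate towards the median, which is one informal way to read the statement that both are special cases.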

Mixture distribution

mixture density, mixture, density mixture
Parametric statistics that assume no error often fail on such mixture densities – for example, statistics that assume normality often fail disastrously in the presence of even a few outliers – and instead one uses robust statistics.

Student's t-distribution

Student's t-distribution, t-distribution
However, it is not always easy to identify outliers (especially in high dimensions), and the t-distribution is a natural choice of model for such data and provides a parametric approach to robust statistics.

Truncated mean

trimmed mean, Modified mean, Olympic average
Panels (c) and (d) of the plot show the bootstrap distribution of the mean (c) and the 10% trimmed mean (d).
In this regard it is referred to as a robust estimator.

Data set

dataset, datasets, data sets
The data sets for that book can be found via the Classic data sets page, and the book's website contains more information on the data.

Robust regression

robust estimation, Robust, robust linear model
In robust statistics, robust regression is a form of regression analysis designed to overcome some limitations of traditional parametric and non-parametric methods.

Unit-weighted regression

unit weights
In statistics, unit-weighted regression is a simplified and robust version (Wainer & Thissen, 1976) of multiple regression analysis where only the intercept term is estimated.

Robust confidence intervals

In statistics, a robust confidence interval is a robust modification of a confidence interval: the non-robust calculations of the interval are modified so that they are not badly affected by outlying or aberrant observations in a data set.

Missing data

missing values, missing at random, incomplete data
Replacing missing data is called imputation.
In situations where missing values are likely to occur, the researcher is often advised to plan to use methods of data analysis that are robust to missingness.

Bootstrapping (statistics)

bootstrap, bootstrapping, bootstrap support
The analysis was performed in R and 10,000 bootstrap samples were used for each of the raw and trimmed means.
(The sample mean need not be a consistent estimator for any population mean, because no mean need exist for a heavy-tailed distribution.) A well-defined and robust statistic for central tendency is the sample median, which is consistent and median-unbiased for the population median.
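The bootstrap procedure itself is short: resample the data with replacement, recompute the statistic on each resample, and look at the spread of the results. The analysis quoted above was done in R; the sketch below is a Python stand-in, with the function name `bootstrap`, the seed, the resample count, and the data all chosen for illustration.

```python
import random
from statistics import mean, median


def bootstrap(values, statistic, n_resamples=1000, seed=0):
    """Return `statistic` evaluated on n_resamples resamples with replacement."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(values)
    return [statistic(rng.choices(values, k=n)) for _ in range(n_resamples)]


# Hypothetical sample with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]

boot_means = bootstrap(data, mean)
boot_medians = bootstrap(data, median)

# The centre of each bootstrap distribution shows how strongly the outlier
# drives the mean compared with the median.
print("median of bootstrap means:  ", median(boot_means))
print("median of bootstrap medians:", median(boot_medians))
```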

Winsorizing

Winsorising, winsorization, Winsorised estimators
Trimmed estimators and Winsorised estimators are general methods to make statistics more robust.
Winsorized estimators are usually more robust to outliers than their more standard forms, although there are alternatives, such as trimming, that will achieve a similar effect.
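Where trimming discards the extreme observations, Winsorizing keeps the sample size and instead pulls each extreme value in to the nearest retained value. A minimal sketch, assuming a symmetric proportion per tail and rounding the count down (the name `winsorize` and the data are hypothetical):

```python
from statistics import mean


def winsorize(values, proportion):
    """Clamp the lowest and highest `proportion` of values to the nearest kept value."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations clamped per tail
    if k == 0:
        return list(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(x, low), high) for x in values]


# Hypothetical sample with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(winsorize(data, 0.1))
print(mean(winsorize(data, 0.1)))  # Winsorized mean
```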