# Overfitting

**overfit, over-fit, over-fitted, overfitted, excessive number of parameters, over-trained, overfits, Overfitting (machine learning), robust machine learning, Underfitting**

In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably".

## Related Articles

### Machine learning

**machine-learning, learning, statistical learning**

Overfitting and underfitting can occur in machine learning in particular.

But if the hypothesis is too complex, then the model is subject to overfitting and generalization will be poorer.

### Cross-validation (statistics)

**cross-validation, cross validation, Leave-one-out cross-validation**

To lessen the chance of, or amount of, overfitting, several techniques are available (e.g. model comparison, cross-validation, regularization, early stopping, pruning, Bayesian priors, or dropout).

The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give an insight on how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem).
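As a minimal sketch of the idea (pure Python, with hypothetical function names), k-fold cross-validation partitions the data into k disjoint folds and holds each fold out in turn, so every sample is used for testing exactly once:

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Partition sample indices into k disjoint folds; yield (train, test)
    index lists, holding out each fold in turn."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]                                   # held-out fold
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx

# Each sample appears in exactly one test fold across the k rounds.
splits = list(k_fold_splits(10, 5))
```

A model would be refit on each training split and scored on the corresponding held-out fold; the averaged score estimates performance on unseen data.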

### Regularization (mathematics)

**regularization, regularized, regularize**

To lessen the chance of, or amount of, overfitting, several techniques are available (e.g. model comparison, cross-validation, regularization, early stopping, pruning, Bayesian priors, or dropout).

In mathematics, statistics, and computer science, particularly in machine learning and inverse problems, regularization is the process of adding information in order to solve an ill-posed problem or to prevent overfitting.
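To make the "adding information" concrete, here is a hedged one-dimensional sketch of ridge (L2) regularization: minimizing sum((y - w*x)^2) + lam * w^2 has the closed form w = Σxy / (Σx² + lam), so the penalty shrinks the fitted coefficient toward zero. The function name and data are illustrative, not from the source:

```python
def ridge_slope(xs, ys, lam):
    """Slope of a least-squares line through the origin with an L2 penalty.
    Minimizes sum((y - w*x)^2) + lam * w^2; the closed form follows from
    setting the derivative in w to zero."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [1.1, 1.9, 3.2]
w_plain = ridge_slope(xs, ys, 0.0)   # ordinary least squares
w_ridge = ridge_slope(xs, ys, 5.0)   # penalty shrinks the slope toward 0
```

Larger `lam` means stronger shrinkage, trading a little bias for reduced sensitivity to noise in the training data.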

### Early stopping


In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent.
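A minimal sketch of the stopping rule (illustrative names, not from any particular library): track the validation error per epoch and stop once it has failed to improve for a fixed number of epochs, returning the epoch with the best validation error:

```python
def early_stop(val_errors, patience=2):
    """Return the epoch index with the best validation error, scanning until
    the error has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Validation error falls, then rises as the model starts to overfit.
errors = [0.9, 0.6, 0.4, 0.35, 0.4, 0.5, 0.7]
stop_at = early_stop(errors)
```

In practice one also keeps a checkpoint of the model weights at `best_epoch` so training can be rolled back to the best point.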

### Shrinkage (statistics)

**shrinkage, shrink, shrinking**

In particular, the value of the coefficient of determination will shrink relative to the original data.

### Training, validation, and test sets

**training set, training data, test set**

For example, a model might be selected by maximizing its performance on some set of training data, and yet its suitability might be determined by its ability to perform well on unseen data; then overfitting occurs when a model begins to "memorize" training data rather than "learning" to generalize from a trend.

Validation datasets can be used for regularization by early stopping: stop training when the error on the validation dataset increases, as this is a sign of overfitting to the training dataset.
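A hedged sketch of how such disjoint sets are typically produced (pure Python; the function name and fractions are illustrative): shuffle once, then slice into test, validation, and training portions:

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle, then partition data into disjoint train/validation/test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
```

The training set fits the parameters, the validation set drives decisions like early stopping, and the test set is touched only once for the final performance estimate.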

### Decision tree pruning

**pruning, pruned, Pruning (decision trees)**

Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting.

### Linear regression

**regression coefficient, multiple linear regression, regression**

As an extreme example, if there are p variables in a linear regression with p data points, the fitted line can go exactly through every point.
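The same extreme case can be shown numerically (a sketch assuming NumPy is available): a polynomial with as many coefficients as there are data points interpolates every point exactly, driving the training residuals to zero regardless of any noise in the targets:

```python
import numpy as np

# 4 data points and a degree-3 polynomial: 4 coefficients, so the fit
# passes through every point exactly, noise and all.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])        # arbitrary, noisy-looking targets

coeffs = np.polyfit(x, y, deg=len(x) - 1)
fitted = np.polyval(coeffs, x)

residuals = np.abs(fitted - y)            # numerically zero everywhere
```

Zero training error here says nothing about predictive accuracy: the curve has absorbed the noise along with the signal.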

### Dropout (neural networks)

**dropout**

Dropout is a regularization technique patented by Google for reducing overfitting in neural networks by preventing complex co-adaptations on training data.
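A minimal sketch of the mechanism in its common "inverted dropout" form (pure Python, illustrative names): during training, each activation is zeroed with probability p and the survivors are scaled by 1/(1-p) so the expected activation is unchanged; at inference time the layer is a no-op:

```python
import random

def dropout(activations, p=0.5, training=True, seed=None):
    """Inverted dropout: during training, zero each activation with
    probability p and scale survivors by 1/(1-p) so the expected value
    is unchanged. At inference time, pass activations through unchanged."""
    if not training or p == 0.0:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [0.2, 0.9, 0.4, 0.7]
out = dropout(acts, p=0.5)   # each unit is either zeroed or doubled
```

Because different random subsets of units are active on each pass, no unit can rely on the exact presence of any other, which is what discourages the co-adaptations mentioned above.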

### Bias–variance tradeoff

**Bias-variance dilemma, bias-variance tradeoff, variance**

The bias–variance tradeoff is often used to analyze and overcome overfit models: accepting a small increase in bias can yield a large reduction in variance, and thus in the model's sensitivity to noise in the training data.

### Robustness (computer science)

**robustness, robust, Numerical robustness**

A learning algorithm that can reduce the chance of fitting noise is called "robust."

Robustness can encompass many areas of computer science, such as robust programming, robust machine learning, and Robust Security Network.

### Occam's razor

**parsimony, parsimonious, Ockham's razor**

Burnham & Anderson, in their much-cited text on model selection, argue that to avoid overfitting, we should adhere to the "Principle of Parsimony".

In the related concept of overfitting, excessively complex models are affected by statistical noise (a problem also known as the bias–variance trade-off), whereas simpler models may capture the underlying structure better and may thus have better predictive performance.

### One in ten rule

For logistic regression or Cox proportional hazards models, there are a variety of rules of thumb (e.g. 5–9, 10, and 10–15 observations per independent variable); the guideline of 10 observations per independent variable is known as the "one in ten rule".

In statistics, the one in ten rule is a rule of thumb for how many predictor parameters can be estimated from data when doing regression analysis (in particular proportional hazards models in survival analysis and logistic regression) while keeping the risk of overfitting low.
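The arithmetic of the rule is simple enough to state directly (a sketch with an illustrative function name; for logistic regression, "events" counts the rarer outcome class):

```python
def max_predictors(n_events, events_per_param=10):
    """One in ten rule: allow at most one candidate predictor parameter
    per ten events in the data."""
    return n_events // events_per_param

# A study with 120 events supports about 12 candidate predictors.
limit = max_predictors(120)
```

With the stricter 10–15 variants mentioned above, `events_per_param` would simply be raised accordingly.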

### Feature selection

**variable selection, features, selecting**

### Generalization error

**generalization**

Generalization error can be minimized by avoiding overfitting in the learning algorithm.

### Vapnik–Chervonenkis dimension

**VC dimension, VC-Dimension, Vapnik Chervonenkis dimension**

A hypothesis class with a high VC dimension can fit its training sample exactly yet still generalize poorly; this is due to overfitting.

### Data dredging

**p-hacking, data snooping**

### Goodness of fit

**goodness-of-fit, fit, goodness-of-fit test**

### Statistical model

**model, probabilistic model, statistical modeling**

An overfitted model is a statistical model that contains more parameters than can be justified by the data.

### Parameter

**parameters, parametric, argument**


### Fraction of variance unexplained

**statistical noise, noise, noisy**

The essence of overfitting is to have unknowingly extracted some of the residual variation (i.e. the noise) as if that variation represented underlying model structure.

### Model selection

**statistical model selection, selecting, choose a model**

The possibility of overfitting exists because the criterion used for selecting the model is not the same as the criterion used to judge the suitability of a model.

### Coefficient of determination

**R-squared, R²**


### Prior probability

**prior distribution, prior, prior probabilities**