Random forest

Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.
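
A minimal sketch of that voting rule, assuming scikit-learn (discussed further below): each fitted tree casts a vote, and the ensemble's class is the mode of the votes; for regression the forest returns the mean of the trees' predictions instead.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Collect each tree's vote. The sub-estimators predict encoded class
    # indices, so map them back through forest.classes_.
    votes = np.stack([tree.predict(X).astype(int) for tree in forest.estimators_])
    majority = forest.classes_[np.apply_along_axis(
        lambda v: np.bincount(v).argmax(), 0, votes)]

    # scikit-learn itself averages the trees' class probabilities rather
    # than taking a hard vote, but the two rules agree on almost every sample.
    print(np.mean(majority == forest.predict(X)))
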
Related Articles

Ensemble learning

Fast algorithms such as decision trees are commonly used in ensemble methods (for example, random forests), although slower algorithms can benefit from ensemble techniques as well.

Adele Cutler

An extension of the algorithm was developed by Leo Breiman and Adele Cutler, who registered "Random Forests" as a trademark (owned by Minitab, Inc.).
Adele Cutler is a statistician known as one of the developers of archetypal analysis and of the random forest technique for ensemble learning.

Tin Kam Ho

The first algorithm for random decision forests was created by Tin Kam Ho using the random subspace method, which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg.
Ho is noted for introducing random decision forests in 1995, and for her pioneering work in ensemble learning and data complexity analysis.

Out-of-bag error

An optimal number of trees B can be found using cross-validation, or by observing the out-of-bag error: the mean prediction error on each training sample xᵢ, using only the trees that did not have xᵢ in their bootstrap sample.
Out-of-bag (OOB) error, also called out-of-bag estimate, is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models utilizing bootstrap aggregating (bagging) to sub-sample data samples used for training.
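
A sketch of this procedure with scikit-learn, which exposes the OOB estimate via oob_score=True: the OOB error can guide the choice of the number of trees B without requiring a held-out test set.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # OOB error for increasing forest sizes; very small forests may warn
    # that some samples were never left out of any bootstrap draw.
    for n_trees in (25, 100, 400):
        forest = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                        bootstrap=True, random_state=0).fit(X, y)
        print(n_trees, 1.0 - forest.oob_score_)
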

Random subspace method

The random subspace method has been used for decision trees; when combined with "ordinary" bagging of decision trees, the resulting models are called random forests.
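
A sketch of that combination using scikit-learn's generic BaggingClassifier (an illustration of the idea rather than a reference implementation):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Random subspace per tree: every tree is trained on a bootstrap sample
    # ("ordinary" bagging) and sees only a random half of the features.
    subspace_forest = BaggingClassifier(
        DecisionTreeClassifier(),
        n_estimators=100,
        bootstrap=True,      # bag the samples
        max_features=0.5,    # random feature subspace per estimator
        random_state=0,
    ).fit(X, y)

    # Breiman-style random forests redraw the feature subset at every split
    # instead, which corresponds to max_features on the tree itself.
    breiman_style = BaggingClassifier(
        DecisionTreeClassifier(max_features="sqrt"),
        n_estimators=100, bootstrap=True, random_state=0,
    ).fit(X, y)
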

Leo Breiman

Another of Breiman's ensemble approaches is the random forest.

Donald Geman

The extension combines Breiman's "bagging" idea and random selection of features, introduced first by Ho and later independently by Amit and Geman in order to construct a collection of decision trees with controlled variance.
In another milestone paper, written in collaboration with Y. Amit, he introduced the notion of randomized decision trees, which came to be known as random forests and were popularized by Leo Breiman.

Naive Bayes classifier

Instead of decision trees, linear models have been proposed and evaluated as base estimators in random forests, in particular multinomial logistic regression and naive Bayes classifiers.
Still, a comprehensive comparison with other classification algorithms in 2006 showed that Bayes classification is outperformed by other approaches, such as boosted trees or random forests.
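
One way to sketch such a "random naive Bayes" with scikit-learn's generic bagging machinery (an illustration of the idea, not the proposers' exact method): naive Bayes base estimators replace the decision trees, each trained on a bootstrap sample and a random subset of features.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Naive Bayes in place of decision trees, with bootstrap samples and
    # random feature subspaces as in a random forest.
    random_nb = BaggingClassifier(
        GaussianNB(),
        n_estimators=50,
        bootstrap=True,
        max_features=0.5,
        random_state=0,
    ).fit(X, y)
    print(random_nb.score(X, y))
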

Scikit-learn

It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
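
A minimal usage sketch on plain NumPy arrays, checking that the forest's regression output is the mean of the individual trees' predictions:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=300)

    forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    # For regression, the forest's prediction is the mean over its trees.
    pred = forest.predict(X[:5])
    mean_of_trees = np.mean([t.predict(X[:5]) for t in forest.estimators_], axis=0)
    assert np.allclose(pred, mean_of_trees)
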

Overfitting

Random decision forests correct for decision trees' habit of overfitting to their training set.
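
A sketch that makes this correction visible, assuming scikit-learn: compare training and held-out accuracy for a single tree and for a forest.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # A single unpruned tree fits the training set (nearly) perfectly but
    # generalizes worse; averaging many randomized trees narrows the gap.
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("tree:  ", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
    print("forest:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))
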

Feature (machine learning)

Ho established that forests of trees splitting with oblique hyperplanes can gain accuracy as they grow without suffering from overtraining, as long as the forests are randomly restricted to be sensitive to only selected feature dimensions.

Linear subspace

In Ho's method, variation among the trees of the forest is introduced by projecting the training data into a randomly chosen subspace before fitting each tree or each node.

Generalization error

Breiman's 2001 paper also offers the first theoretical result for random forests, in the form of a bound on the generalization error which depends on the strength of the individual trees in the forest and the correlation between them.
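
As commonly stated (following Breiman's 2001 paper), the bound reads, with s the strength of the individual trees and \bar{\rho} their mean correlation:

    % Breiman's generalization bound, as commonly stated:
    % \bar{\rho} is the mean correlation between trees, s their strength.
    PE^{*} \le \frac{\bar{\rho}\,(1 - s^{2})}{s^{2}}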