# Statistical learning theory

Learning theory (statistics), statistical machine learning
Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis.

### Empirical risk minimization

empirical risk, empirical risk functional, minimize empirical risk
Choosing the function that minimizes the empirical risk is called empirical risk minimization.
Empirical risk minimization (ERM) is a principle in statistical learning theory which defines a family of learning algorithms and is used to give theoretical bounds on their performance.
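
As a minimal sketch of the ERM principle, the snippet below picks, from a small finite family of threshold classifiers, the one with the lowest average 0-1 loss on a training set. The hypothesis family, the helper names (`make_threshold_classifier`, `erm`), and the toy data are assumptions made for illustration; they are not taken from the article.

```python
def zero_one_loss(prediction, label):
    """Return 1 for a wrong prediction and 0 for a correct one."""
    return 0 if prediction == label else 1


def empirical_risk(hypothesis, samples):
    """Average loss of `hypothesis` over the (x, y) pairs in `samples`."""
    return sum(zero_one_loss(hypothesis(x), y) for x, y in samples) / len(samples)


def make_threshold_classifier(threshold):
    """Hypothetical hypothesis: predict 1 when x >= threshold, else 0."""
    return lambda x: 1 if x >= threshold else 0


def erm(thresholds, samples):
    """Pick the threshold whose classifier has the lowest empirical risk."""
    return min(
        thresholds,
        key=lambda t: empirical_risk(make_threshold_classifier(t), samples),
    )


# Toy training set (made up for illustration): the label is 1 for larger x.
training_set = [(0.5, 0), (1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1), (3.5, 1)]
candidate_thresholds = [0.0, 1.0, 2.0, 3.0, 4.0]

best_threshold = erm(candidate_thresholds, training_set)
best_risk = empirical_risk(make_threshold_classifier(best_threshold), training_set)
```

On this toy data the threshold 2.0 separates the two labels, so its empirical risk is zero. A low empirical risk on the training set does not by itself guarantee a low risk on unseen data, which is the gap the theoretical bounds address.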

### Proximal gradient methods for learning

Proximal gradient (forward backward splitting) methods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable.
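
One common instance of this class is L1-regularized least squares (the lasso), where the penalty is not differentiable at zero. The sketch below applies a basic forward-backward iteration (often called ISTA) to it: a gradient step on the smooth squared-error term, followed by the proximal map of the L1 penalty, which is soft-thresholding. The function names, step size, and toy data are assumptions chosen for illustration.

```python
def soft_threshold(value, amount):
    """Proximal operator of amount * |.|: shrink value toward zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def ista(features, targets, penalty, step, iterations):
    """Minimize (1/2n) * ||Xw - y||^2 + penalty * ||w||_1 by forward-backward steps."""
    n = len(features)
    d = len(features[0])
    weights = [0.0] * d
    for _ in range(iterations):
        # Forward step: gradient of the smooth least-squares term.
        residuals = [
            sum(w * x for w, x in zip(weights, row)) - y
            for row, y in zip(features, targets)
        ]
        gradient = [
            sum(r * row[j] for r, row in zip(residuals, features)) / n
            for j in range(d)
        ]
        # Backward step: proximal map of the non-differentiable L1 penalty.
        weights = [
            soft_threshold(w - step * g, step * penalty)
            for w, g in zip(weights, gradient)
        ]
    return weights


# Toy data (made up): the target depends only on the first feature.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
y = [2.0, 0.0, 2.0, 4.0]
w = ista(X, y, penalty=0.1, step=0.5, iterations=500)
```

The step size 0.5 is chosen to be below the reciprocal of the largest eigenvalue of X^T X / n for this data, which is the usual condition for this iteration to converge. The penalty drives the weight of the irrelevant second feature to zero and shrinks the first weight slightly below 2.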

### Reproducing kernel Hilbert space

reproducing kernel, Reproducing kernel Hilbert spaces, Bergman spaces
Reproducing kernel Hilbert spaces are particularly important in the field of statistical learning theory because of the celebrated representer theorem which states that every function in an RKHS that minimises an empirical risk functional can be written as a linear combination of the kernel function evaluated at the training points.
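
To illustrate the form of solution the representer theorem describes, the sketch below fits kernel ridge regression, a regularized variant of empirical risk minimization, and evaluates the result as a linear combination of kernel functions centered at the training points. The Gaussian kernel, the ridge parameter, the small linear solver, and the toy data are all assumptions made for this example.

```python
import math


def gaussian_kernel(a, b, width=1.0):
    """Gaussian (RBF) kernel on real numbers."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_kernel_ridge(xs, ys, ridge):
    """Return coefficients alpha solving (K + n * ridge * I) alpha = y."""
    n = len(xs)
    gram = [[gaussian_kernel(a, b) for b in xs] for a in xs]
    for i in range(n):
        gram[i][i] += n * ridge
    return solve_linear_system(gram, ys)


def predict(xs, alpha, x_new):
    """Evaluate f(x) = sum_i alpha_i * k(x_i, x), a kernel expansion at the training points."""
    return sum(a * gaussian_kernel(x_i, x_new) for a, x_i in zip(alpha, xs))


# Toy training points (made up for illustration).
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 0.8, 0.9, 0.1]
alpha = fit_kernel_ridge(train_x, train_y, ridge=1e-3)
fitted = [predict(train_x, alpha, x) for x in train_x]
```

The learned function is fully described by one coefficient per training point, so the search over an infinite-dimensional function space reduces to solving a finite linear system.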

### Machine learning

machine-learning, learning, statistical learning
Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis.

### Statistics

statistical, statistical analysis, statistician
Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis.

### Functional analysis

functional, functional analytic, algebraic function theory
Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis.

### Computer vision

vision, image classification, Image recognition
Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, bioinformatics, and baseball.

### Speech recognition

voice recognition, automatic speech recognition, voice command
Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, bioinformatics, and baseball.

### Bioinformatics

bioinformatic, bioinformatician, bio-informatics
Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, bioinformatics, and baseball.

### Baseball

player, baseball player, baseball team
Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, bioinformatics, and baseball.

### Supervised learning

supervised, supervised machine learning, supervised classification
Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning.

### Unsupervised learning

unsupervised, unsupervised classification, unsupervised machine learning
Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning.

### Online machine learning

online learning, on-line learning, online
Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning.

### Reinforcement learning

reward function, Inverse reinforcement learning, reinforcement
Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning.

### Training, validation, and test sets

training set, training data, test set
Supervised learning involves learning from a training set of data.

### Regression analysis

regression, multiple regression, regression model
Depending on the type of output, supervised learning problems are either problems of regression or problems of classification.

### Statistical classification

classification, classifier, classifiers
Depending on the type of output, supervised learning problems are either problems of regression or problems of classification.

### Ohm's law

ohmic, Ohm, ohmic losses
Using Ohm's Law as an example, a regression could be performed with voltage as input and current as an output.
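
A minimal sketch of such a regression: under Ohm's law the current is proportional to the voltage, so a least-squares line through the origin can be fitted with voltage as input and current as output, and its slope estimates the conductance 1/R. The measurements below are made up for illustration, and `fit_through_origin` is a hypothetical helper name.

```python
def fit_through_origin(inputs, outputs):
    """Least-squares slope for the model output = slope * input."""
    numerator = sum(x * y for x, y in zip(inputs, outputs))
    denominator = sum(x * x for x in inputs)
    return numerator / denominator


# Made-up measurements for a nominal 100-ohm resistor, with small errors.
voltages = [1.0, 2.0, 3.0, 4.0, 5.0]                 # volts (input)
currents = [0.0101, 0.0198, 0.0302, 0.0399, 0.0500]  # amperes (output)

conductance = fit_through_origin(voltages, currents)  # slope estimates 1 / R
resistance = 1.0 / conductance
```

Because the output is a continuous quantity, this is a regression problem; the fitted resistance comes out close to the nominal 100 ohms despite the measurement noise.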

### Facial recognition system

facial recognition, face recognition, facial recognition technology
In facial recognition, for instance, a picture of a person's face would be the input, and the output label would be that person's name.

### Vector space

vector, vector spaces, vectors
Take X to be the vector space of all possible inputs, and Y to be the vector space of all possible outputs.

### Probability distribution

distribution, continuous probability distribution, discrete probability distribution
Statistical learning theory takes the perspective that there is some unknown probability distribution over the product space of inputs and outputs, i.e. there exists some unknown p(\vec{x}, y).

### Loss function

objective function, cost function, risk function
Let V(f(\vec{x}), y) be the loss function, a metric for the difference between the predicted value f(\vec{x}) and the actual value y.

### Norm (mathematics)

norm, Euclidean norm, seminorm
The most common loss function for regression is the square loss function (also known as the L2-norm).

### Taxicab geometry

Manhattan distance, L1 norm, taxicab metric
The absolute value loss (also known as the L1-norm) is also sometimes used.

### Indicator function

characteristic function, membership function, indicator
In some sense the 0-1 indicator function is the most natural loss function for classification.
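
The square, absolute value, and 0-1 losses mentioned in the entries above can be sketched as small functions of a prediction and the actual value. The function names and the sample values are assumptions made for illustration.

```python
def square_loss(prediction, target):
    """Squared difference between the actual and predicted values."""
    return (target - prediction) ** 2


def absolute_loss(prediction, target):
    """Absolute difference between the actual and predicted values."""
    return abs(target - prediction)


def zero_one_loss(predicted_label, true_label):
    """0 when the predicted label equals the true label, 1 otherwise."""
    return 0 if predicted_label == true_label else 1


# Regression example: a prediction of 2.5 against an actual value of 4.0.
regression_square = square_loss(2.5, 4.0)      # 1.5 squared
regression_absolute = absolute_loss(2.5, 4.0)  # distance of 1.5

# Classification example with made-up labels.
classification_hit = zero_one_loss("cat", "cat")
classification_miss = zero_one_loss("cat", "dog")
```

The square loss penalizes large errors more heavily than the absolute value loss, while the 0-1 loss only records whether a classification was right or wrong.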