Feedforward neural network

Also known as: feedforward network, feed-forward network, feed forward network, feed-forward neural network (FFNN).
A feedforward neural network is an artificial neural network wherein connections between the nodes do not form a cycle.
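To make the definition concrete, here is a minimal forward pass in Python/NumPy; the layer sizes, tanh activation, and random parameters are illustrative assumptions rather than a reference implementation:

import numpy as np

def forward(x, weights, biases):
    """Pass an input vector through successive layers; information flows
    in one direction only, with no cycles."""
    a = x
    for W, b in zip(weights, biases):
        a = np.tanh(W @ a + b)   # affine transform followed by a nonlinearity
    return a

# Illustrative 3-4-2 network with random parameters.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(forward(np.array([1.0, 0.5, -0.2]), weights, biases))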
Related Articles

Recurrent neural network

As such, it is different from recurrent neural networks.
Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs.
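A minimal sketch of the contrast in Python/NumPy (the tanh nonlinearity and the weight matrices are illustrative assumptions):

import numpy as np

def feedforward_step(x, W):
    # The output depends only on the current input.
    return np.tanh(W @ x)

def recurrent_step(x, h_prev, W, U):
    # The output also depends on the hidden state carried over from the
    # previous time step, which gives the network a form of memory.
    return np.tanh(W @ x + U @ h_prev)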

Perceptron

In the literature the term perceptron often refers to networks consisting of just one of these units.
This caused the field of neural network research to stagnate for many years, before it was recognised that a feedforward neural network with two or more layers (also called a multilayer perceptron) had far greater processing power than perceptrons with one layer (also called a single-layer perceptron).
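A minimal sketch of one such unit in Python, with hand-chosen weights implementing logical AND (the weights and threshold are illustrative assumptions):

import numpy as np

def perceptron(x, w, b):
    """A single threshold unit: output 1 if the weighted sum of the inputs
    exceeds the threshold encoded in the bias, otherwise 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Example: a unit computing logical AND of two binary inputs.
w, b = np.array([1.0, 1.0]), -1.5
print([perceptron(np.array(p), w, b) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 0, 0, 1]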

Delta rule

Perceptrons can be trained by a simple learning algorithm that is usually called the delta rule.
In machine learning, the delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network.
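A hedged sketch of a delta-rule update for a single linear unit (the general rule also includes the derivative of the activation function; the linear case, the learning rate, and the toy target below are illustrative assumptions):

import numpy as np

def delta_rule_update(w, x, target, lr=0.1):
    """One delta-rule step for a single linear unit: move the weights in
    proportion to the error times the input, a gradient descent step on
    the squared error."""
    y = np.dot(w, x)             # the unit's output
    error = target - y           # difference from the desired output
    return w + lr * error * x    # weight adjustment

# Illustrative use: learn y = 2*x1 - x2 from random samples.
rng = np.random.default_rng(1)
w = np.zeros(2)
for _ in range(200):
    x = rng.uniform(-1, 1, size=2)
    w = delta_rule_update(w, x, target=2 * x[0] - x[1])
print(w)   # approaches [2, -1]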

Artificial neuron

Neurons with this kind of activation function are also called artificial neurons or linear threshold units.
Researchers also soon realized that cyclic networks, with feedback through neurons, could define dynamical systems with memory, but most of the research concentrated (and still does) on strictly feed-forward networks because they present fewer difficulties.

Universal approximation theorem

Although a single threshold unit is quite limited in its computational power, it has been shown that networks of parallel threshold units can approximate any continuous function from a compact interval of the real numbers into the interval [-1,1].
In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of ℝⁿ, under mild assumptions on the activation function.
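The following Python sketch illustrates (but does not prove) the idea: a single hidden layer of sigmoid units with randomly chosen hidden weights, and output weights fitted by least squares, closely approximates a continuous target on an interval. The unit count, weight scale, and target function are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200)[:, None]
target = np.sin(x).ravel()

n_hidden = 50
W = rng.standard_normal((1, n_hidden)) * 3.0   # hidden weights (assumed scale)
b = rng.standard_normal(n_hidden) * 3.0        # hidden biases
H = 1.0 / (1.0 + np.exp(-(x @ W + b)))         # hidden-layer activations (sigmoid)
v, *_ = np.linalg.lstsq(H, target, rcond=None) # output-layer weights by least squares
print(np.max(np.abs(H @ v - target)))          # maximum approximation error, typically small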

Backpropagation

It has a continuous derivative, which allows it to be used in backpropagation.
In machine learning, specifically deep learning, backpropagation (backprop, BP) is an algorithm widely used in the training of feedforward neural networks for supervised learning; generalizations exist for other artificial neural networks (ANNs), and for functions generally.
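A minimal sketch of one backpropagation step for a two-layer network with a sigmoid hidden layer, a linear output, and squared-error loss (the architecture, loss, and learning rate are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, target, W1, b1, W2, b2, lr=0.5):
    """One backpropagation step for a two-layer network with a sigmoid
    hidden layer, a linear output, and squared-error loss."""
    # Forward pass.
    h = sigmoid(W1 @ x + b1)             # hidden activations
    y = W2 @ h + b2                      # network output
    # Backward pass: propagate the error from the output toward the input.
    dy = y - target                      # dE/dy for E = 0.5 * ||y - target||^2
    dW2 = np.outer(dy, h)
    db2 = dy
    dh = W2.T @ dy
    dz1 = dh * h * (1.0 - h)             # chain rule through the sigmoid
    dW1 = np.outer(dz1, x)
    db1 = dz1
    # Gradient-descent update of all parameters.
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

# One illustrative update on a single training example:
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 2)), np.zeros(3)
W2, b2 = rng.standard_normal((1, 3)), np.zeros(1)
W1, b1, W2, b2 = backprop_step(np.array([0.2, -0.4]), np.array([1.0]), W1, b1, W2, b2)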

Rprop

Rprop, short for resilient backpropagation, is a learning heuristic for supervised learning in feedforward artificial neural networks.
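A sketch of the core Rprop idea, roughly following the variant without weight backtracking; the increase/decrease factors and step bounds are the commonly cited defaults but should be treated as assumptions here:

import numpy as np

def rprop_update(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One Rprop update: every weight keeps its own step size, which grows
    while the gradient keeps its sign and shrinks when the sign flips; only
    the sign of the gradient, not its magnitude, sets the update direction."""
    sign_change = grad * prev_grad
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)   # after a sign flip, skip this update
    w = w - np.sign(grad) * step
    return w, grad, step

# Illustrative use on a one-dimensional quadratic error E(w) = (w - 2)^2:
w, prev_grad, step = np.array([10.0]), np.array([0.0]), np.array([0.1])
for _ in range(30):
    grad = 2 * (w - 2.0)
    w, prev_grad, step = rprop_update(w, grad, prev_grad, step)
print(w)   # close to the minimum at 2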

Activation function


Warren Sturgis McCulloch

A similar neuron was described by Warren McCulloch and Walter Pitts in the 1940s.

Walter Pitts


Gradient descent

The training procedure calculates the error between the calculated output and the sample output data, and uses this error to adjust the weights; adjusting the weights in this way is an instance of gradient descent, a general method for non-linear optimization.
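A minimal sketch of gradient descent on a simple error surface (the quadratic error and the learning rate are illustrative assumptions):

import numpy as np

def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of the error surface."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Illustrative: minimize E(w) = (w1 - 3)^2 + (w2 + 1)^2.
print(gradient_descent(lambda w: 2 * (w - np.array([3.0, -1.0])), [0.0, 0.0]))
# -> approximately [3, -1]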

Linear separability

Single-layer perceptrons are only capable of learning linearly separable patterns; in 1969 in a famous monograph entitled Perceptrons, Marvin Minsky and Seymour Papert showed that it was impossible for a single-layer perceptron network to learn an XOR function (nonetheless, it was known that multi-layer perceptrons are capable of producing any possible Boolean function).
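For illustration, here is a two-layer network of threshold units with hand-chosen weights that computes XOR, which no single threshold unit can do:

def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    """Hand-chosen weights: the hidden layer computes OR and AND, and the
    output unit fires when OR is true but AND is not."""
    h_or = step(x1 + x2 - 0.5)       # fires on OR
    h_and = step(x1 + x2 - 1.5)      # fires on AND
    return step(h_or - h_and - 0.5)  # OR and not AND, i.e. XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))   # -> 0, 1, 1, 0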

Monograph


Perceptrons (book)


Marvin Minsky


Seymour Papert


Exclusive or


Step function

A single-layer neural network can compute a continuous output instead of a step function.

Logistic function

A common choice is the so-called logistic function: f(x) = 1 / (1 + e^(−x)).

Logistic regression

With this choice, the single-layer network is identical to the logistic regression model, widely used in statistical modeling.
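A minimal sketch of that equivalence: a single sigmoid output unit computes exactly the predicted probability of a logistic regression model with the same weights (the example parameters are assumptions, not fitted values):

import numpy as np

def single_layer_logistic(x, w, b):
    """A single-layer network with one sigmoid output unit; its output is
    the predicted probability of a logistic regression model with the same
    weights and intercept."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

# Illustrative parameters:
print(single_layer_logistic(np.array([1.0, 2.0]), np.array([0.5, -0.25]), b=0.1))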

Statistical model


Sigmoid function

The logistic function is also known as the sigmoid function.

Chain rule

(The fact that the logistic function f satisfies the differential equation f′(x) = f(x)(1 − f(x)) can easily be shown by applying the chain rule.)
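For reference, a short derivation in LaTeX notation, writing f(x) = 1/(1 + e^{-x}):

f'(x) = \frac{d}{dx}\,\frac{1}{1+e^{-x}}
      = \frac{e^{-x}}{\left(1+e^{-x}\right)^{2}}
      = \frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}}
      = f(x)\,\bigl(1-f(x)\bigr)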

Mathematical optimization

To adjust weights properly, one applies a general method for non-linear optimization that is called gradient descent.

Overfitting

The danger is that the network overfits the training data and fails to capture the true statistical process generating the data.
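A hedged illustration of the phenomenon using a held-out validation set; the polynomial models stand in for networks of different capacities, and the noise level, sample sizes, and degrees are assumptions chosen only to make the effect visible:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=30)
y = np.sin(2 * x) + 0.3 * rng.standard_normal(30)   # noisy samples of a true process
x_train, y_train, x_val, y_val = x[:20], y[:20], x[20:], y[20:]

for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)    # fit a model of the given capacity
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(degree, float(train_err), float(val_err))
# Typically the highest-degree fit has the lowest training error but the
# highest validation error: it has overfitted the training data instead of
# capturing the process that generated it.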