# Feedforward neural network

**feedforward, feedforward neural networks, feedforward networks, feed-forward network, feed-forward, feed forward, feed forward network, feed-forward neural network, feed-forward neural network (FFNN), feedforward neural**

A feedforward neural network is an artificial neural network wherein connections between the nodes do not form a cycle.

## Related Articles

### Recurrent neural network

**recurrent neural networks, recurrent, Simple recurrent network**

As such, it is different from recurrent neural networks.

Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs.

### Perceptron

**Perceptrons, perceptron algorithm, Feedforward Neural Network (Perceptron)**

In the literature the term perceptron often refers to networks consisting of just one of these units.

This caused the field of neural network research to stagnate for many years, before it was recognised that a feedforward neural network with two or more layers (also called a multilayer perceptron) had far greater processing power than perceptrons with one layer (also called a single layer perceptron).

### Delta rule

Perceptrons can be trained by a simple learning algorithm that is usually called the delta rule.

In machine learning, the Delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network.
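As a minimal sketch (the function names, learning rate, and training samples below are our own illustrative choices, not from any library), the delta rule for a single linear unit updates each input weight in proportion to the error and the input:

```python
# Delta-rule update for a single linear unit (illustrative sketch;
# names and the learning rate are our own, not from any library).

def delta_rule_step(weights, x, target, lr=0.1):
    """One update: w_i <- w_i + lr * (target - y) * x_i."""
    y = sum(w * xi for w, xi in zip(weights, x))      # linear output
    error = target - y                                # the "delta"
    return [w + lr * error * xi for w, xi in zip(weights, x)]

# Learn the mapping y = 2 * x from three samples.
w = [0.0]
for _ in range(100):
    for x, t in [([1.0], 2.0), ([2.0], 4.0), ([0.5], 1.0)]:
        w = delta_rule_step(w, x, t)
```

Repeated over the samples, the update drives `w[0]` toward the true coefficient 2.0, which is the gradient-descent behaviour the delta rule implements.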

### Artificial neuron

**artificial neurons, McCulloch–Pitts neuron, neurons**

Neurons with this kind of activation function are also called artificial neurons or linear threshold units.

Researchers also soon realized that cyclic networks, with feedback through neurons, could define dynamical systems with memory, but most of the research concentrated (and still does) on strictly feed-forward networks because they present fewer difficulties.

### Universal approximation theorem

**universal approximation, universal approximator, approximate**

Although a single threshold unit is quite limited in its computational power, it has been shown that networks of parallel threshold units can approximate any continuous function from a compact interval of the real numbers into the interval [-1,1].

In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function.
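To make the theorem concrete, here is a hand-constructed (not learned) single-hidden-layer sigmoid network that approximates sin on [0, 1]: each pair of steep sigmoids forms a localized "bump", and summing bumps weighted by the target's grid values approximates the target. All constants and names are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Each pair of steep sigmoids in the hidden layer forms a localized
# "bump" of width 2 * width around a grid point c; summing bumps
# weighted by the target's value there approximates the target.
# Constants are hand-picked for illustration, not learned.
def one_hidden_layer(x, grid, width=0.05, steep=100.0):
    out = 0.0
    for c, h in grid:   # h is the target's height at grid point c
        out += h * (sigmoid(steep * (x - (c - width))) -
                    sigmoid(steep * (x - (c + width))))
    return out

grid = [(c / 10.0, math.sin(c / 10.0)) for c in range(11)]  # 11 points on [0, 1]
approx = one_hidden_layer(0.52, grid)                       # close to sin(0.52)
```

Refining the grid (more hidden units) shrinks the approximation error, which is the qualitative content of the theorem.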

### Backpropagation

**back-propagation, back propagation, backpropagate**

It has a continuous derivative, which allows it to be used in backpropagation.

In machine learning, specifically deep learning, backpropagation (backprop, BP) is an algorithm widely used in the training of feedforward neural networks for supervised learning; generalizations exist for other artificial neural networks (ANNs), and for functions generally.
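A minimal sketch of backpropagation for a toy 1-2-1 network, assuming sigmoid hidden units and squared-error loss (the function and variable names are our own illustration, not a standard API):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-2-1 network: one input, two sigmoid hidden units, one linear
# output. forward_backward returns the loss and the gradient of every
# weight, computed layer by layer with the chain rule.
def forward_backward(params, x, target):
    w1a, w1b, w2a, w2b = params
    ha, hb = sigmoid(w1a * x), sigmoid(w1b * x)   # hidden activations
    y = w2a * ha + w2b * hb                       # linear output unit
    loss = 0.5 * (y - target) ** 2
    dy = y - target                               # dL/dy
    dw2a, dw2b = dy * ha, dy * hb                 # output-layer gradients
    dha, dhb = dy * w2a, dy * w2b                 # error sent back to hidden layer
    dw1a = dha * ha * (1.0 - ha) * x              # sigmoid'(z) = s(z) * (1 - s(z))
    dw1b = dhb * hb * (1.0 - hb) * x
    return loss, [dw1a, dw1b, dw2a, dw2b]

loss, grads = forward_backward([0.5, -0.3, 0.8, 0.2], 1.5, 1.0)
```

The gradients can be checked against finite differences of the loss, the usual sanity test for a backpropagation implementation.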

### Rprop

**resilient backpropagation, Resilient Propagation (RProp)**

Rprop, short for resilient backpropagation, is a learning heuristic for supervised learning in feedforward artificial neural networks.

### Activation function

**activation map, nonlinearity**


### Warren Sturgis McCulloch

**Warren McCulloch, Warren S. McCulloch, McCulloch**

A similar neuron was described by Warren McCulloch and Walter Pitts in the 1940s.

### Walter Pitts

**Pitts, Walter H. Pitts**


### Gradient descent

**steepest descent, gradient ascent, gradient**

It calculates the error between the calculated output and the sample output data, and uses this to adjust the weights, thus implementing a form of gradient descent. To adjust the weights properly, one applies a general method for non-linear optimization called gradient descent.
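The idea of stepping against the gradient can be shown on a one-dimensional toy objective (the objective, learning rate, and iteration count below are arbitrary illustrative choices):

```python
# Gradient descent on a toy objective f(w) = (w - 3)^2, whose minimum
# is at w = 3. Each step moves the weight against the derivative.
def grad(w):
    return 2.0 * (w - 3.0)          # derivative f'(w)

w, lr = 0.0, 0.1
for _ in range(200):
    w -= lr * grad(w)               # step in the downhill direction
```

In a network the same rule is applied to every weight at once, with the partial derivatives supplied by backpropagation.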

### Linear separability

**linearly separable, linear separator, linearly separated**

Single-layer perceptrons are only capable of learning linearly separable patterns. In 1969, in a famous monograph entitled Perceptrons, Marvin Minsky and Seymour Papert showed that it is impossible for a single-layer perceptron network to learn an XOR function (nonetheless, it was known that multi-layer perceptrons can compute any Boolean function).
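The XOR limitation, and its resolution by a second layer, can be seen with hand-picked threshold units (the weights below are chosen by hand for illustration, not learned):

```python
# XOR is not linearly separable, so no single threshold unit computes
# it; a two-layer network of threshold units does. Weights hand-picked.
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)        # hidden unit: OR
    h2 = step(x1 + x2 - 1.5)        # hidden unit: AND
    return step(h1 - h2 - 0.5)      # output: OR and not AND = XOR
```

The hidden layer remaps the inputs so that the final decision becomes linearly separable, which is exactly what a single layer cannot do on its own.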

### Monograph

**monographs, monography, monographic**


### Perceptrons (book)

**Perceptrons, devastating criticism, Perceptron**


### Marvin Minsky

**Minsky, Marvin L. Minsky, artificial intelligence**

### Seymour Papert

**Papert; Papert, Seymour; Papert, S.**

### Exclusive or

**XOR, exclusive-or, exclusive disjunction**

### Step function

**Piecewise constant function, piecewise-constant, piecewise constant**

A single-layer neural network can compute a continuous output instead of a step function.

### Logistic function

**logistic, logistic curve, logistic growth**

A common choice is the so-called logistic function f(x) = 1 / (1 + e^(−x)). The logistic function is also known as the sigmoid function.

### Logistic regression

**logit model, logistic, logistic model**

With this choice, the single-layer network is identical to the logistic regression model, widely used in statistical modeling.
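Assuming a logistic output activation, the equivalence is direct: the single-layer network computes the logistic-regression predictor p = logistic(w·x + b). A minimal sketch with illustrative names:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# A single-layer network whose output unit uses the logistic activation
# computes p = logistic(w . x + b), i.e. the logistic-regression model.
def predict(weights, bias, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias   # weighted sum
    return logistic(z)                                    # squash to (0, 1)

p = predict([1.0, -2.0], 0.5, [0.3, 0.1])   # a probability in (0, 1)
```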

### Statistical model

**model, probabilistic model, statistical modeling**


### Sigmoid function

**sigmoidal, sigmoid, sigmoid curve**

The logistic function is also known as the sigmoid function.

### Chain rule

**multivariate chain rule, chain, chain rule for several variables**

(The fact that the logistic function f satisfies the differential equation f′ = f (1 − f) can easily be shown by applying the chain rule.)
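The identity f′(x) = f(x) (1 − f(x)) for the logistic function can be checked numerically with a central difference (an illustrative sketch; names are our own):

```python
import math

def f(x):                            # the logistic function
    return 1.0 / (1.0 + math.exp(-x))

# By the chain rule, f'(x) = f(x) * (1 - f(x)); the central difference
# below checks this identity numerically at a sample point.
def numeric_derivative(g, x, eps=1e-6):
    return (g(x + eps) - g(x - eps)) / (2.0 * eps)

check = abs(numeric_derivative(f, 0.7) - f(0.7) * (1.0 - f(0.7)))
```

This simple derivative form is one reason the logistic activation is convenient in backpropagation.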

### Mathematical optimization

**optimization, mathematical programming, optimal**


### Overfitting

**overfit, over-fit, over-fitted**

The danger is that the network overfits the training data and fails to capture the true statistical process generating the data.