Feedforward Neural Networks
Feedforward neural networks are called networks because they are typically represented by composing together many different functions.
The model is associated with a directed acyclic graph describing how the functions are composed together. For example, we might have three functions $f^{(1)}$, $f^{(2)}$, and $f^{(3)}$ connected in a chain, to form $f(x) = f^{(3)}(f^{(2)}(f^{(1)}(x)))$. These chain structures are the most commonly used structures of neural networks. In this case, $f^{(1)}$ is called the first layer of the network, $f^{(2)}$ is called the second layer, and so on.
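As a toy illustration of such a chain (a sketch only; the particular functions f1, f2, and f3 below are arbitrary placeholders, not prescribed by anything above), the composition can be written directly as nested function calls:

```python
import numpy as np

# Three hypothetical layer functions, chosen arbitrarily for illustration.
def f1(x):
    return np.tanh(x)          # first layer

def f2(x):
    return np.tanh(2.0 * x)    # second layer

def f3(x):
    return 0.5 * x             # third (output) layer

def f(x):
    # The chain structure: f(x) = f3(f2(f1(x))).
    # The depth of the network is the number of composed functions.
    return f3(f2(f1(x)))

print(f(np.array([0.1, -0.3])))
```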
The final layer of a feedforward network is called the output layer. During neural network training, we drive $f(x)$ to match $f^*(x)$.
The training data provides us with noisy, approximate examples of $f^*(x)$ evaluated at different training points.
Each example $x$ is accompanied by a label $y \approx f^*(x)$. The training examples specify directly what the output layer must do at each point $x$; it must produce a value that is close to $y$.
The behavior of the other layers is not directly specified by the training data; the data do not say what each individual layer should do. Instead, the learning algorithm must decide how to use these layers to best implement an approximation of $f^*$. Because the training data do not show the desired output for each of these layers, they are called hidden layers.
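A minimal sketch of this setup, assuming $\sin(x)$ as a stand-in for the unknown $f^*$ and a single hidden layer of width 16 (both choices arbitrary, not from the text): the loss compares only the network's final output to the labels $y$, so the hidden layer's behavior emerges from training rather than being specified directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy training examples y ≈ f*(x); here f*(x) = sin(x) is an assumed
# stand-in for the unknown target function.
x = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(x) + 0.1 * rng.normal(size=x.shape)

# One hidden layer of width 16; the data constrain only the final output,
# never the hidden activations h.
W1 = rng.normal(scale=0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(2000):
    h = np.tanh(x @ W1 + b1)        # hidden layer: learned, not specified
    y_hat = h @ W2 + b2             # output layer: driven to match y
    err = y_hat - y                 # the training signal exists only here

    # Backpropagate the mean squared error by hand.
    g_y = 2 * err / len(x)
    g_W2 = h.T @ g_y; g_b2 = g_y.sum(0)
    g_h = g_y @ W2.T * (1 - h**2)
    g_W1 = x.T @ g_h; g_b1 = g_h.sum(0)

    W2 -= lr * g_W2; b2 -= lr * g_b2
    W1 -= lr * g_W1; b1 -= lr * g_b1
```

Note that nothing in the loss term refers to `h`; the gradient flowing through the hidden layer is the only way training shapes it.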
Each hidden layer of the network is typically vector valued. The dimensionality of these hidden layers determines the width of the model.
Rather than thinking of the layer as representing a single vector-to-vector function, we can also think of the layer as consisting of many units that act in parallel, each representing a vector-to-scalar function.
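The equivalence of these two views can be checked directly. In the sketch below, the width-3 layer, its random weights, and the tanh activation are all arbitrary assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))   # 3 units, each reading a 4-dimensional input
b = rng.normal(size=3)
x = rng.normal(size=4)

# View 1: the layer as a single vector-to-vector function.
h_vector = np.tanh(W @ x + b)

# View 2: the layer as 3 units acting in parallel, each a
# vector-to-scalar function of the same input x.
h_units = np.array([np.tanh(W[i] @ x + b[i]) for i in range(3)])

assert np.allclose(h_vector, h_units)
```

Here the width of the layer is 3: one scalar-valued unit per dimension of the layer's output vector.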