Convolutional Neural Networks
In this section, we discuss convolutional neural networks for visual
recognition tasks in detail. The first subsection briefly introduces
convolutional neural networks and explains why deep learning was selected for
the task of drowsiness detection. The next subsection describes the
convolutional neural network architectures currently available, the
architecture we selected for drowsiness detection, and the reasons for that
choice.
Machine learning is the idea that generic algorithms can tell us something
interesting about a set of data without humans writing code specific to the
problem. Instead of devising the logic and writing code to solve a specific
problem, you present a training dataset to the generic algorithm, which then
builds its own logic from the training data.
Neural networks are one family of machine learning algorithms, inspired by the
human brain. The basic computational unit of the brain is the neuron. A neuron
takes in an input, processes it and transmits it to other neurons. Hence, a
brain cell (a neuron) consists of the following parts:
Dendrites: accept the input.
Soma (cell body): processes the input.
Axon: converts the processed input into a form that can be accepted by the next neuron.
Synapses: electrochemical contacts between neurons. The output of one neuron is
transmitted to the input of another neuron through synapses.
In the computational model of a neuron (the artificial neurons that make up
neural networks), the inputs are represented by $x_i$. Each input interacts
multiplicatively ($w_i x_i$) with the dendrites of the receiving neuron
according to the synaptic strength $w_i$. The idea is that the synaptic weights
are learnable and control the strength of influence of one neuron on another.
The weighted inputs are then carried to the cell body, where they are summed.
The activation function then converts the summed input into an output, which is
carried across the synapse to the next neuron. Various activation functions can
be used; ideally, the activation function is differentiable (at least almost
everywhere), non-saturating, and has zero-centered outputs. ReLU, which
computes max(0, z), is a popular activation function today, although its
outputs are not zero-centered.
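The computational neuron described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in this work: the function names `relu` and `neuron_output`, the example values, and the bias term (which the text does not mention) are assumptions made for the example.

```python
def relu(z):
    """Rectified linear unit: returns max(0, z)."""
    return max(0.0, z)


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs (plus an assumed bias), passed through ReLU."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)


# Hypothetical inputs and synaptic weights, chosen only for illustration.
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.5]
print(neuron_output(x, w, bias=0.1))
```

A negative weighted sum is clipped to zero by ReLU, which is why this activation does not saturate for positive inputs but also never produces negative outputs.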
Figure 2.1-1 below shows the biological model of a neuron alongside its
mathematical (artificial) counterpart.
Left: Biological neuron model. Right: Mathematical neuron model. Image Source: http://cs231n.github.io/neural-networks-1/
Neural networks consist of a collection of neurons that are connected in an
acyclic graph. The neurons are usually arranged in a layer-wise organization.
The most common layer in a neural network is called a “fully-connected” layer.
In this layer type, every neuron in one layer is connected to every neuron in
the next layer. Figure 2.1-2 below shows a typical neural network architecture.
A 3-layer neural network architecture with fully connected layers. Notice that there are no connections within a layer, just
between adjacent layers. Image Source: http://cs231n.github.io/neural-networks-1/
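A forward pass through stacked fully connected layers can be sketched as follows. The layer sizes, weights, and the names `dense_layer`, `forward`, and `layers` are hypothetical and chosen so the arithmetic is easy to follow by hand; they do not correspond to the network in the figure or to the one used in this work.

```python
def relu(z):
    return max(0.0, z)


def dense_layer(inputs, weights, biases, activation=None):
    """One fully connected layer: every input feeds every neuron of the layer."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def forward(inputs, layers):
    """Pass the input through the layers in order (adjacent layers only)."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense_layer(activations, weights, biases, activation)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden -> 2 hidden -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),   # first hidden layer
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.5], relu),   # second hidden layer
    ([[2.0, -1.0]], [0.0], None),                    # linear output layer
]
print(forward([2.0, 1.0], layers))
```

Each row of a weight matrix holds the incoming weights of one neuron, so a layer with two neurons and two inputs has a 2 x 2 matrix.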
So how do we select a neural network architecture, i.e. how many layers should
it have? Increasing the number of neurons or hidden layers enables the neural
network to represent more complicated functions. However, this added capacity
can also be a problem: overfitting occurs when a model with high capacity fits
the noise in the data instead of the (assumed) underlying relationship, so the
network performs worse on unseen data. That does not mean one should use
smaller networks. In fact, it is recommended to use a neural network as large
as the computational budget allows and to control overfitting with other
techniques such as regularization, dropout, and input noise.
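As one illustration of regularization, the sketch below adds an L2 penalty on the weights to the data loss. The text does not say which form of regularization is meant, so the choice of L2, the names `l2_penalty` and `regularized_loss`, and the default strength are assumptions made for this example.

```python
def l2_penalty(weights, strength):
    """Sum of squared weights, scaled by the regularization strength."""
    return strength * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, strength=0.01):
    """Data loss plus an (assumed) L2 penalty that discourages large weights."""
    return data_loss + l2_penalty(weights, strength)


# Hypothetical values: a data loss of 1.0 and two weights.
print(regularized_loss(1.0, [1.0, 2.0], strength=0.1))
```

Because the penalty grows with the squared magnitude of the weights, minimizing the combined loss favors smaller weights, which tends to make a large network less prone to fitting noise.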
Neural networks are a class of machine learning algorithms.
They learn tasks by considering examples, generally without task-specific
programming. For example, in image recognition, they might learn to identify
images that contain cats by analyzing example images that have been manually
labeled as “cat” or “no cat” and using the results to
identify cats in other images. They do this without any a priori knowledge
about cats, e.g., that they have fur, tails, whiskers and cat-like faces.
Instead, they evolve their own set of relevant characteristics from the
learning material that they process.
Hence, before a neural network can predict an output, it
needs to be trained. The training data set consists of input and output (labels
or targets) pairs.
Training starts with a random initialization of the weights in the neural
network. The input is then passed through the network to obtain the final
activations of the output layer. Initially, the output is extremely inaccurate
because the weights are random. The goal of training is to arrive at weight
values such that the neural network can predict the output (given some input)
with reasonable accuracy. The algorithm used to compute the required weight
adjustments is known as “backpropagation”.
In backpropagation, the error in the output is calculated first. Because the
weights are randomly initialized, the initial output is inaccurate and this
error is large. The function that measures the error is known as the “cost
function”. We want to adjust the weights such that the cost function is
minimized, and the technique used to minimize it is called “gradient descent”.
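A cost function can be as simple as the mean squared error between predictions and targets. The sketch below uses that choice purely as an assumption for illustration; the text does not specify which cost function is used, and the name `mean_squared_error` is hypothetical.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    n = len(targets)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n


# Hypothetical outputs of an untrained network versus the true labels.
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))
```

The cost is zero only when every prediction matches its target, and it grows as predictions move away from the targets, which is the property gradient descent relies on.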
Gradient descent is an optimization algorithm for finding a minimum of a
function. To find a local minimum, one takes steps proportional to the negative
of the gradient (or of the approximate gradient) of the function at the current
point. In the case of neural networks, the function to minimize is the cost
function. The weights are updated in the direction of the negative gradient so
as to move toward a minimum of the cost function. The gradient is calculated at
the output layer and flows back through the hidden layers so that their weights
can be updated. This is where the name “backpropagation” comes from.
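The backward flow of the gradient can be made concrete on a tiny network with one hidden neuron and one output neuron. This is a hand-written sketch under simplifying assumptions (scalar input, no biases, squared-error cost, ReLU hidden unit); the names `forward`, `backward`, and `gradient_step` and all numeric values are hypothetical.

```python
def forward(x, w1, w2):
    """Forward pass: input -> hidden neuron (ReLU) -> linear output."""
    h_pre = w1 * x            # hidden pre-activation
    h = max(0.0, h_pre)       # hidden activation (ReLU)
    y = w2 * h                # network output
    return h_pre, h, y


def backward(x, target, w1, w2):
    """Backpropagation: apply the chain rule from the output back to w1."""
    h_pre, h, y = forward(x, w1, w2)
    loss = (y - target) ** 2
    dloss_dy = 2.0 * (y - target)            # gradient at the output
    dloss_dw2 = dloss_dy * h                 # gradient for the output weight
    dloss_dh = dloss_dy * w2                 # gradient flowing into the hidden layer
    dh_dhpre = 1.0 if h_pre > 0 else 0.0     # derivative of ReLU
    dloss_dw1 = dloss_dh * dh_dhpre * x      # gradient for the hidden weight
    return loss, dloss_dw1, dloss_dw2


def gradient_step(x, target, w1, w2, learning_rate=0.1):
    """One gradient descent update: move each weight against its gradient."""
    loss, g1, g2 = backward(x, target, w1, w2)
    return w1 - learning_rate * g1, w2 - learning_rate * g2, loss


w1, w2 = 0.5, 0.5
for _ in range(3):
    w1, w2, loss = gradient_step(1.0, 1.0, w1, w2)
    print(loss)
```

The printed loss shrinks from one step to the next, since each update moves the weights a small distance in the direction that locally reduces the cost.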
Once training is complete, i.e. a set number of training epochs has been
completed or the loss has fallen below a chosen threshold, the neural network
can be used to predict the output for new inputs.
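The complete procedure, from random initialization to the two stopping criteria, can be sketched on a single linear neuron. Everything here is an assumption made for illustration: the toy dataset (points on the line y = 2x + 1), the mean squared error cost, the hyperparameter values, and the function name `train`.

```python
import random


def train(data, learning_rate=0.05, max_epochs=5000, loss_threshold=1e-6, seed=0):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)    # random weight initialization
    b = rng.uniform(-1.0, 1.0)
    n = len(data)
    loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        grad_w = grad_b = 0.0
        loss = 0.0
        for x, y in data:
            error = (w * x + b) - y           # prediction error for this pair
            loss += error ** 2 / n
            grad_w += 2.0 * error * x / n     # d(loss)/dw
            grad_b += 2.0 * error / n         # d(loss)/db
        if loss < loss_threshold:             # stop: loss fell below threshold
            break
        w -= learning_rate * grad_w           # step against the gradient
        b -= learning_rate * grad_b
    return w, b, loss, epoch                  # otherwise stop after max_epochs


# Hypothetical training set of (input, target) pairs.
data = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b, loss, epochs = train(data)
print(w, b, loss, epochs)
```

After training, a prediction for a new input is simply `w * x + b`; for a full network the same idea applies, with the forward pass taking the place of this single expression.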