1 Convolutional Neural Networks

In this section, we will discuss convolutional neural networks for visual recognition tasks in detail. The first subsection gives a brief introduction to convolutional neural networks and succinctly explains why deep learning was selected for the given task of drowsiness detection. The next subsection describes the various convolutional neural network architectures available today, and explains which architecture we selected for drowsiness detection and why.

1.1 Neural Networks

Machine learning is the idea that there are generic algorithms that can tell us something interesting about a set of data without humans writing code for a specific problem. Instead of devising the logic and writing the code to solve a specific problem, you present a training dataset to the generic algorithm, which then builds its own logic from that dataset.

Neural networks are one such machine learning algorithm, inspired by the human brain. The basic computational unit of the brain is the neuron. A neuron takes in input, processes it, and transmits it to other neurons. A brain cell (a neuron) consists of the following parts:

1. Dendrites: accept the input.

2. Soma: processes the input.

3. Axon: converts the processed input into a form that can be accepted by the next neuron.

4. Synapses: electrochemical contacts between neurons; the output of one neuron is transmitted to the input of another neuron through synapses.

In the computational model of a neuron (the artificial neurons that make up neural networks), the inputs are represented by x_i. The inputs interact multiplicatively (w_i * x_i) with the dendrites of other neurons according to the synaptic strength w_i. The idea is that the synaptic weights are learnable and control the strength of influence of one neuron on another. The weighted inputs are carried to the cell body, where they are summed. An activation function f then converts the summed input into the output f(sum_i w_i * x_i + b), which is carried by the synapse to the next neuron. Various activation functions can be used; ideally, the activation function should be differentiable, non-saturating, and have zero-centered outputs. ReLU (the max(0, x) function) is a popular activation function nowadays.

Figure 2.1-1 below shows the biological and the mathematical (computational) model of a neuron.

Figure 2.1-1: Left: biological neuron model. Right: mathematical neuron model. Image source: http://cs231n.github.io/neural-networks-1/
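The computational neuron model described above can be sketched in a few lines of code. This is a minimal illustration, not part of any particular library; the input, weight, and bias values are made up for the example.

```python
import numpy as np

def relu(z):
    # ReLU activation: max(0, z), applied to the summed, weighted input.
    return np.maximum(0.0, z)

def neuron(x, w, b):
    # Weighted sum of inputs (synaptic strengths w_i times inputs x_i),
    # plus a bias b, passed through the activation function.
    return relu(np.dot(w, x) + b)

# Illustrative values only.
x = np.array([0.5, -1.0, 2.0])   # inputs x_i
w = np.array([0.4, 0.2, 0.1])    # learnable weights w_i
b = 0.1
print(neuron(x, w, b))           # 0.2 - 0.2 + 0.2 + 0.1 ≈ 0.3
```

Note that because ReLU clamps negative sums to zero, a neuron whose weighted input is negative produces no output at all.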

1.1.1 Neural Network Architecture

Neural networks consist of a collection of neurons connected in an acyclic graph. The neurons are usually arranged in a layer-wise organization. The most common layer type in a neural network is the "fully-connected" layer, in which every neuron in the preceding layer is connected to every neuron in the following layer. Figure 2.1-2 below shows a typical neural network architecture.

Figure 2.1-2: A 3-layer neural network (all fully connected layers). Notice that there are no connections within a layer, only between adjacent layers. Image source: http://cs231n.github.io/neural-networks-1/
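A forward pass through a small fully-connected network can be sketched as below. The layer sizes (3 inputs, two hidden layers of 4 units, 1 output) are illustrative, chosen to resemble the figure, and the weights are random since the network is untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def dense(x, W, b):
    # Fully-connected layer: every input unit feeds every output unit,
    # so the layer is a single matrix-vector product plus a bias.
    return relu(W @ x + b)

# Illustrative sizes: 3 inputs -> 4 hidden -> 4 hidden -> 1 output.
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((4, 4)), np.zeros(4)
W3, b3 = rng.standard_normal((1, 4)), np.zeros(1)

x = np.array([1.0, 0.5, -0.5])
h1 = dense(x, W1, b1)        # first hidden layer
h2 = dense(h1, W2, b2)       # second hidden layer
out = W3 @ h2 + b3           # output layer, no activation here
print(out.shape)             # (1,)
```

Note that each layer only reads the previous layer's activations, matching the acyclic, adjacent-layers-only connectivity described above.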

So how do we select a neural network architecture, i.e. how many layers should it have? Increasing the number of neurons or hidden layers enables the neural network to represent more complicated functions, which is generally beneficial. However, this can also be a problem. Overfitting occurs when a model with high capacity fits the noise in the data instead of the (assumed) underlying relationship, and the network's performance on unseen data is adversely affected. That does not mean one should use smaller networks. In fact, it is recommended to use a neural network as big as your computational budget allows and to address overfitting with other techniques such as regularization, dropout, and input noise.
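Of the techniques mentioned, dropout is easy to sketch: during training, each activation is randomly zeroed with some probability so the network cannot rely on any single neuron. This is a minimal illustration using the common "inverted dropout" scaling, not a complete training setup.

```python
import numpy as np

def dropout(h, p, rng, train=True):
    # During training, zero each activation with probability p and
    # rescale the survivors by 1/(1-p) ("inverted dropout") so the
    # expected activation is unchanged; at test time, do nothing.
    if not train:
        return h
    mask = (rng.random(h.shape) >= p) / (1.0 - p)
    return h * mask

rng = np.random.default_rng(42)
h = np.ones(10)                 # pretend these are hidden activations
print(dropout(h, 0.5, rng))     # roughly half zeroed, the rest scaled to 2.0
```

The rescaling is a design choice that keeps the test-time forward pass identical to an ordinary, dropout-free pass.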

1.1.2 Working

Neural networks are a class of machine learning algorithms.

They learn tasks by considering examples, generally without task-specific

programming. For example, in image recognition, they might learn to identify

images that contain cats by analyzing example images that have been manually

labeled as “cat” or “no cat” and using the results to

identify cats in other images. They do this without any a priori knowledge

about cats, e.g., that they have fur, tails, whiskers and cat-like faces.

Instead, they evolve their own set of relevant characteristics from the

learning material that they process.

Hence, before a neural network can predict an output, it needs to be trained. The training dataset consists of input and output (label or target) pairs.

Training starts with a random initialization of the weights in the neural network. An input is then passed through the network to obtain the final activations of the output layer. Initially, the output is extremely inaccurate because of the random initialization. The goal of training is to arrive at optimal values of the weights, such that the network can predict the output (given some input) with reasonable accuracy. The algorithm used to do this is known as "backpropagation".

In backpropagation, the error in the output is calculated first. The function that measures this error is known as the "cost function". We want to adjust the weights so that the cost function is minimized, and the technique used to minimize it is called "gradient descent".
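The mean squared error is one common choice of cost function and makes a compact example; it is used here purely for illustration, since the text does not commit to a particular cost function.

```python
import numpy as np

def mse_cost(y_pred, y_true):
    # Mean squared error: average of the squared differences between
    # the network's predictions and the true targets.
    return np.mean((y_pred - y_true) ** 2)

# Illustrative values: errors of 0.1 and 0.2 give (0.01 + 0.04) / 2.
print(mse_cost(np.array([0.9, 0.2]), np.array([1.0, 0.0])))  # ≈ 0.025
```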

Gradient descent is an optimization algorithm that helps in finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of an approximate gradient) of the function at the current point. In the case of neural networks, we want to find the minimum of the cost function, so the weights are updated in the negative direction of the gradient. The gradient is calculated at the output layer and flows back into the hidden layers of the network so that their weights can be updated as well. This is where the name "backpropagation" comes from.
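The update rule can be shown on the simplest possible "network": a single weight w fit to data generated by y = 2x, with the squared-error cost from before. This is a toy sketch of gradient descent, with made-up data and learning rate; real networks repeat the same update for every weight in every layer.

```python
import numpy as np

# Toy dataset generated by the target relationship y = 2x.
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x

w = 0.0        # arbitrary initialization (stands in for random init)
lr = 0.1       # learning rate: the step size along the negative gradient
for _ in range(100):
    y_pred = w * x
    # Gradient of the cost C(w) = mean((w*x - y)^2) with respect to w.
    grad = np.mean(2.0 * (y_pred - y) * x)
    # Step in the negative gradient direction.
    w -= lr * grad

print(round(w, 4))   # converges toward 2.0
```

Stepping against the gradient shrinks the cost at each iteration, which is exactly the behaviour the paragraph above describes for the full network.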

Once training is complete, i.e. a certain number of training epochs have elapsed or the loss has fallen below a certain threshold, the neural network can be used to predict outputs.