1 Convolutional Neural Networks

In this section, we discuss convolutional neural networks for visual recognition tasks in detail.

The first subsection is a brief introduction to convolutional neural networks and succinctly describes why deep learning was selected for the given task of drowsiness detection. The next subsection describes the various convolutional neural network architectures available nowadays, the architecture we have selected for our task of drowsiness detection, and why.

1.1 Neural Networks

Machine learning is the idea that there are generic algorithms that can tell us something interesting about a set of data without humans coding for a specific problem. Instead of devising the logic and writing code to solve a specific problem, you present a training dataset to the generic algorithm, which then builds its own logic from that dataset.

Neural networks are just one of the machine learning algorithms inspired by the human brain. The basic computational unit of the brain is the neuron. A neuron takes in input, processes it and transmits the result to other neurons. A brain cell (a neuron) consists of the following parts:

1. Dendrites: accept the input.
2. Soma: processes the input.
3. Axon: converts the processed input into a form that can be accepted by the next neuron.
4. Synapses: electrochemical contacts between neurons. Output from one neuron is transmitted to the input of another neuron through synapses.

In the computational model of a neuron (the artificial neurons that make up neural networks), the inputs are represented by x_i. The inputs interact multiplicatively (w_i * x_i) with the dendrites of other neurons based on the synaptic strength w_i.

The idea is that the synaptic weights are learnable and control the strength of influence of one neuron on another. The weighted inputs are carried to the cell body, where they are summed. An activation function then converts the summed input into an output, which is carried by the synapse to the next neuron. Various activation functions can be used; ideally, an activation function is differentiable, non-saturating and has zero-centered outputs. ReLU (the max(0, x) function) is a popular activation function nowadays. Figure 2.1-1 below shows the biological and the mathematical/computational model of a neuron.
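As a minimal illustration of this computational model (the weights and inputs below are hypothetical values chosen only for the example, not part of the drowsiness detector), a single artificial neuron with a ReLU activation can be sketched as:

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def neuron_output(x, w, b):
    """Weighted sum of inputs plus a bias, passed through the activation.

    x: input vector, w: synaptic weights, b: bias term.
    """
    return relu(np.dot(w, x) + b)

# Example: three inputs with hand-picked weights (illustrative values only).
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, 0.1])
b = 0.1
print(neuron_output(x, w, b))  # weighted sum 0.1 plus bias 0.1, ReLU leaves it positive
```

If the weighted sum plus bias were negative, ReLU would clamp the output to zero, which is what makes the unit non-linear.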

Figure 2.1-1: Left: biological neuron model. Right: mathematical neuron model. Image source: http://cs231n.github.io/neural-networks-1/

1.1.1 Neural Network Architecture

Neural networks consist of a collection of neurons connected in an acyclic graph. The neurons are usually arranged in a layer-wise organization.

The most common layer in a neural network is the "fully-connected" layer. In this layer type, every neuron in the preceding layer is connected to every neuron in the following layer. Figure 2.1-2 below shows a typical neural network architecture.

Figure 2.1-2: A 3-layer neural network architecture built from fully-connected layers. Notice that there are no connections within a layer, just between adjacent layers. Image source: http://cs231n.github.io/neural-networks-1/

So how do we select a neural network architecture, i.e. how many layers should it have? Increasing the number of neurons or hidden layers enables the neural network to represent more complicated functions, which is generally beneficial. However, this can also be a problem: overfitting occurs when a model with high capacity fits the noise in the data instead of the (assumed) underlying relationship.

Hence, the performance of the neural network can be adversely affected. But that does not mean one should use smaller networks. In fact, it is recommended that you use a neural network as big as your computational budget allows and tackle overfitting with other techniques such as regularization, dropout, and input noise.

1.1.2 Working

Neural networks are a class of machine learning algorithms.

They learn tasks by considering examples, generally without task-specific programming. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as "cat" or "no cat" and using the results to identify cats in other images. They do this without any a priori knowledge about cats, e.g., that they have fur, tails, whiskers and cat-like faces. Instead, they evolve their own set of relevant characteristics from the learning material that they process.

Hence, before a neural network can predict an output, it needs to be trained. The training dataset consists of input and output (label or target) pairs. Training starts with a random initialization of the weights in the neural network. The input is then passed through the network to obtain the final activations of the output layer. Initially, the output is extremely inaccurate due to the random initialization of the weights. The goal of training is to arrive at optimum values of the weights, such that the neural network can predict the output (given some input) with reasonable accuracy. The algorithm used to do this is known as "backpropagation".

In backpropagation, the error in the output is calculated.

Since the weights are randomly initialized, the output in the beginning is not very accurate, so the error in the output is calculated. The function that measures this error is known as the "cost function". We want to adjust the weights such that the cost function is minimized, and the technique used to minimize the cost function is called "gradient descent".

Gradient descent is an optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of an approximate gradient) of the function at the current point. In the case of neural networks, we want to find the minimum of the cost function.
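As an illustrative sketch of this update rule (the cost function, starting point and learning rate below are hypothetical, chosen only to demonstrate the idea), gradient descent on a simple one-dimensional function looks like:

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step in the direction of the negative gradient.

    grad: function returning the derivative at a point.
    x0: starting point for the search.
    """
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # step proportional to -gradient
    return x

# Example cost function: f(x) = (x - 3)^2, whose minimum is at x = 3.
grad_f = lambda x: 2.0 * (x - 3.0)

print(gradient_descent(grad_f, x0=0.0))  # converges close to 3.0
```

For a neural network, x is replaced by the whole weight vector and grad is computed by backpropagation, but the update rule is the same.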

The weights are updated in the negative direction of the gradient so as to move toward the minimum of the cost function. The gradient is calculated at the output layer and flows back into the hidden layers of the neural network so that the weight updates can take place. This is where the name "backpropagation" comes from.

Once training is complete, i.e. once a certain number of training epochs have been completed or the output loss has dropped below a certain threshold, the neural network can be used to predict the output.
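Putting the pieces together, a minimal training loop for a tiny fully-connected network can be sketched as follows. This is only an illustration: the toy data (learning y = x1 + x2), the layer sizes, the learning rate and the epoch count are all hypothetical choices, not the network used for drowsiness detection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training pairs: learn y = x1 + x2 (hypothetical data for illustration).
X = rng.uniform(-1, 1, size=(64, 2))
y = X.sum(axis=1, keepdims=True)

# Random initialization of the weights (input -> hidden -> output).
W1 = rng.normal(0, 0.5, size=(2, 8))
W2 = rng.normal(0, 0.5, size=(8, 1))

learning_rate = 0.1
for epoch in range(500):
    # Forward pass: fully-connected hidden layer with ReLU, then linear output.
    h = np.maximum(0.0, X @ W1)
    y_pred = h @ W2

    # Cost function: mean squared error over the batch.
    loss = np.mean((y_pred - y) ** 2)

    # Backpropagation: the gradient is computed at the output layer
    # and flows back toward the hidden layer.
    d_out = 2.0 * (y_pred - y) / len(X)
    dW2 = h.T @ d_out
    d_h = (d_out @ W2.T) * (h > 0)   # ReLU only passes gradient where it was active
    dW1 = X.T @ d_h

    # Gradient descent: step in the negative gradient direction.
    W1 -= learning_rate * dW1
    W2 -= learning_rate * dW2

print(f"final loss: {loss:.6f}")  # should be small after training
```

Here the stopping criterion is simply a fixed number of epochs; a loss threshold, as mentioned above, would work just as well.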