Back Error Propagation Simulator

In this report I shall use BPS (Back Error Propagation Simulator), a program that simulates a multi-layer neural network using back error propagation as its learning algorithm. I will use BPS to create two multi-layer perceptron networks which, once built and trained, will attempt to recognise five characters formed on a 5 x 5 pixel grid.

The first network will consist of two layers (an input layer and an output layer). The second network will consist of three layers (an input layer, a hidden layer and an output layer). I will train and test both networks; to test them thoroughly, I will add noise to each of the five characters, record the output from both networks, and compare and contrast the results.

1. Representation of the digits and letters

The following letters and digits were chosen to represent the input data: J, G, B, 1 and 2. They were chosen because they are fairly different from one another; the letters J and G, for example, differ significantly. When I feed these into both networks I will be able to add noise in various places, which will allow me to test both networks more thoroughly. The digits 1 and 2 were chosen for the same reason: plenty of noise can be added to them. When these digits are put through the networks they should produce interesting data, because with noise added they change slightly yet still resemble the original character, and this will allow me to assess the performance of both networks effectively.
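The noise procedure described above can be sketched as randomly inverting a few pixels of a character pattern. The function below is an illustrative assumption about how that might be done, not the exact procedure used in BPS:

```python
import random

def add_noise(pattern, n_flips, seed=None):
    """Return a copy of a 25-value pixel pattern with n_flips randomly
    chosen pixels inverted (1.0 <-> 0.0). Illustrative sketch only."""
    rng = random.Random(seed)
    noisy = list(pattern)
    for i in rng.sample(range(len(noisy)), n_flips):
        noisy[i] = 1.0 - noisy[i]  # flip this pixel
    return noisy
```

Flipping, say, 2 or 3 of the 25 pixels leaves the character recognisable to a human while forcing the network to generalise beyond the exact training patterns.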

All of these characters will be represented on a 5 x 5 grid as capital letters, displayed the same way they appear on a keyboard. Where a pixel is set on the grid it is represented by 1.0, and 0.0 represents where there is no pixel. The outputs take the form 0.9 and 0.1: 0.9 represents the presence of a character and 0.1 its absence. This is done because the closer an output is to 0.9, the more strongly the network has recognised the character, whereas an output closer to 0.1 means the network has not recognised it well.

The reason I represent the grid values as 0.0 and 1.0 but the outputs as 0.1 and 0.9 is that using different forms for the two allows a clearer distinction between input and output.
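The encoding convention above can be sketched in a few lines. The helper names below are illustrative assumptions, not part of BPS:

```python
def encode_grid(rows):
    """Flatten a 5x5 grid of '#'/'.' characters into 25 input values,
    using 1.0 where a pixel is set and 0.0 where it is not."""
    return [1.0 if ch == '#' else 0.0 for row in rows for ch in row]

def encode_target(index, n_classes=5):
    """Target vector for the five outputs: 0.9 for the wanted
    character's node and 0.1 for every other node."""
    return [0.9 if i == index else 0.1 for i in range(n_classes)]
```

So each training pair is 25 input values in {0.0, 1.0} and 5 target values in {0.1, 0.9}.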

Below is an example of how the characters are represented in the training set file (the full file can be seen in appendix A):

Training set
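As a sketch of what one training-set entry might look like, here is the digit 1 on a 5 x 5 grid. The pixel pattern and the class ordering below are illustrative assumptions; the actual patterns are in appendix A:

```python
# Illustrative 5x5 pattern for the digit 1 ('#' = pixel on).
ONE = [
    "..#..",
    ".##..",
    "..#..",
    "..#..",
    ".###.",
]

# 25 input values: 1.0 where a pixel is set, 0.0 elsewhere.
inputs = [1.0 if ch == '#' else 0.0 for row in ONE for ch in row]

# 5 target values: 0.9 for this character's output node, 0.1 for the rest
# (assuming the class order J, G, B, 1, 2).
targets = [0.1, 0.1, 0.1, 0.9, 0.1]
```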

2. Architecture of the two networks

I have produced two multi-layer perceptron networks; both recognise a set of five alphanumeric characters formed on a 5 x 5 pixel grid. The first network has two layers: the first layer (input) consists of 25 nodes and the second layer (output) has 5 nodes.

The second network consists of three layers and differs from network 1 in that it contains a hidden layer. The first layer (input) has 25 nodes, the second layer (hidden) has 6 nodes and the final layer (output) has 5 nodes. To work out the number of nodes in the hidden layer, the following calculation was used as a rough guide, where 25 is the number of nodes in the input layer and 5 is the number of nodes in the output layer:

Approximate number of nodes in hidden layer = √(25 + 5) ≈ 5.5, rounded up to 6
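Assuming the heuristic is the square root of the total node count (which agrees with the 6 hidden nodes used), the calculation is:

```python
import math

n_input, n_output = 25, 5

# Rough guide: hidden nodes ~ sqrt(input nodes + output nodes),
# rounded up to the next whole node.
n_hidden = math.ceil(math.sqrt(n_input + n_output))
```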

3. Training Parameters

The following training parameters were used for both of the networks:

* Present ETA 0.010000

* Present alpha 0.900000

* ETA bounded by 0.700000

* ETA increasing rate 0.010000

* Number of epochs per cycle 200

Both networks were trained in runs of 200 epochs per cycle until the final error level reached a constant zero. Once the final error level reached zero, each network was trained once more to confirm that the error level was in fact zero. A small value of 200 epochs per cycle was chosen because it gave finer control over training and allowed me to record each value after running each set of epochs. Another reason for choosing such a small number of epochs is that, combined with the small random initial weights (between -0.33 and 0.33), it prevents the weights from becoming too large, allowing the network to learn the mapping efficiently.

The other four parameters were left at their default values; I used these because they allow the networks to learn the patterns slowly but reliably. The value of eta was kept low because a large value would cause instabilities within the network: the weights would become too large and it would be nearly impossible to train the network properly.
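The role of eta and alpha in the update can be sketched as a gradient-descent step with momentum for a single output node. This is an illustration of the kind of update BPS applies, under the parameters listed above, not BPS itself:

```python
import random

ETA, ALPHA = 0.01, 0.9   # learning rate and momentum from the parameter list
rng = random.Random(0)

# Initial weights for one output node, drawn from [-0.33, 0.33] as above.
w = [rng.uniform(-0.33, 0.33) for _ in range(25)]
dw_prev = [0.0] * 25

def update(inputs, error):
    """One weight update: eta * error * input, plus alpha times the
    previous change (the momentum term)."""
    global dw_prev
    dw = [ETA * error * x + ALPHA * d for x, d in zip(inputs, dw_prev)]
    w[:] = [wi + di for wi, di in zip(w, dw)]
    dw_prev = dw
```

With eta at 0.01 each step changes the weights only slightly, which is why training is slow but stable; the momentum term (alpha = 0.9) smooths successive steps without letting individual weights jump.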