
Optical Character Recognition
on Bank Cheques Using a 2D Convolutional Neural Network

                                                                                                                             

 

Abstract— Banking systems worldwide suffer from a heavy dependence on manpower and paper documents, which makes conventional banking processes tedious and time consuming. Existing methods for processing cheque transactions cause delays because the details have to be entered manually. Recent developments in image processing and computer vision enable machines to analyse data and extract information from images, and automation has become an important goal in the banking sector; automatic cheque processing has been a vision of researchers for decades. Handwritten character recognition combines pattern recognition with machine learning to build an optical character recognizer. The machine learning model used here for automatic cheque processing is a 2D convolutional neural network, trained on the EMNIST dataset for letters and digits. Training yielded validation accuracies of 97.45% for letters and 98.87% for digits. When the same model was tested on the input cheque images, the OCR accuracy was 95.71%.

 

 

Keywords— machine learning, optical character recognition, 2D convolutional neural network, image processing, handwritten character recognition, pattern recognition.

 

1.    Introduction

 

Handwritten
character recognition is a field of research in artificial intelligence,
computer vision, and pattern recognition. A computer performing handwriting recognition acquires and detects characters in paper documents, pictures, touch-screen devices and other sources and converts them into a machine-encoded form. Its applications are found in optical character
recognition and more advanced intelligent character recognition systems. Most
of these systems nowadays implement machine learning mechanisms such as neural
networks.

 

Optical character recognition (also optical
character reader, OCR) is the mechanical or electronic conversion of images of
typed, handwritten or printed text into machine-encoded text, whether from a
scanned document, a photo of a document, a scene-photo (for example the text on
signs and billboards in a landscape photo) or from subtitle text superimposed
on an image (for example from a television broadcast).

 

Advanced systems capable of producing a high degree of recognition accuracy for most fonts are now common, with support for a variety of digital image file formats as input [13].

 

Machine learning is
a branch of artificial intelligence inspired by psychology and biology that
deals with learning from a set of data and can be applied to solve a wide
spectrum of problems. A supervised machine learning model is given instances of data specific to a problem domain and an answer that solves the problem for each instance. When learning is complete, the model is able to provide answers not only for the data it has learned from but also for yet unseen data, with high precision [1].

 

Neural Networks are
collections of mathematical models that represent some of the observed
properties of biological nervous systems and draw on the analogies of adaptive
biological learning. Neural networks can be applied in various fields such as OCR, pattern recognition, and language translation. Unlike the original
Perceptron model, shown by Minsky and Papert to have limited computational
capability, the neural network of today consists of a large number of highly
interconnected processing elements (nodes) that are tied together with weighted
connections (links). Learning in biological systems involves adjustments to the
synaptic connections that exist between the neurons. This is true for neural
networks as well. Learning typically occurs by example, through training on or exposure to a set of input/output data (patterns), where the training algorithm adjusts the link weights. The link weights store the knowledge necessary to solve specific problems [11].
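As a minimal, illustrative sketch of this weight-adjustment idea (not taken from the paper), the following NumPy snippet trains a single-layer perceptron on a toy AND-gate pattern; the data, learning rate, and number of epochs are arbitrary choices.

import numpy as np

# Toy input/output patterns (AND gate) for a single-layer perceptron.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # link weights
b = 0.0                  # bias
lr = 0.1                 # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = 1.0 if xi @ w + b > 0 else 0.0
        error = target - pred
        # The training algorithm adjusts the link weights from the error
        # between the desired output and the network's output.
        w += lr * error * xi
        b += lr * error

print("learned weights:", w, "bias:", b)
print("outputs:", [1.0 if xi @ w + b > 0 else 0.0 for xi in X])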

 

In banking
applications, neural networks can be used to recognize characters on bank
cheques. The important text fields on the cheque are segmented out and passed to the machine learning OCR, which stores the output in text format. The text obtained can be fed directly into the bank's system, removing the need to type the banking details into it manually. In short, the cheque is scanned and the required information is obtained as text that can be fed to the banking software.

2.    Literature Survey

 

Ivor Uhliarik et al. [1] primarily discuss machine learning methods, with the scope limited to the corresponding algorithms. Specifically, the multilayer perceptron (MLP) is chosen for feature extraction and classification, so the work deals with the algorithms required for an MLP implementation. Its main advantage is that it achieves increased accuracy with top performance compared to other systems. The drawback of this work is the high cost of computation.
Yichang Shih et al. [9] recognize handwritten Sanskrit characters using machine learning, combining the robustness of nonparametric modelling and the expressiveness of parametric modelling with computational efficiency during training and testing, respectively. The key aspect of this work is that handwriting recognition is done for a language other than English. However, its limitation is that the accuracy was 85%, which could still be improved.

Y. LeCun et al. [3] demonstrate how learning networks and backpropagation can be applied to the recognition of handwritten zip codes. The main feature of this work is that the constrained backpropagation used helps with shift invariance and also vastly reduces the entropy. The defect of this model is that the image segmentation performed was not able to separate characters that overlap.

Adam Coates et al. [2] perform character recognition using unsupervised feature learning. The main advantage is that the work makes use of a large-scale algorithm for learning the features automatically from unlabeled data. However, the disadvantage is that the image has not been filtered to a great extent.

OCR in banking applications using neural networks is another useful application of machine learning, enabling characters on bank cheques to be recognized efficiently and accurately so that the whole banking procedure can be automated. The method trains on the dataset given to it and uses its prior training experience to recognize characters on bank cheques. The efficiency of the method lies in the fact that the software is trained repeatedly under optimal conditions, and the model with the best accuracy is retained and used. From then on, the neural network recognizes characters using the experience it gained from the characters it was trained on.

3.    Proposed Work

The objective of the machine learning model is to provide an efficient system for recognizing characters on bank cheques and to do away with the manual data-entry methods prevalent in banks. The neural network ensures high accuracy and efficiency in recognizing the handwritten characters. The image is decomposed into segments of fixed pixel size, and these segmented characters are fed to the neural network, which predicts the characters from the cheque and stores them in text format.

3.1 Algorithm

Input: Bank Cheque Image

Step 1: Segmentation: The text parts of the bank cheque, such as the name, amount in words, amount in digits, and date, are segmented by cropping those regions from the cheque image.

Step 2: Character extraction: Each word or number obtained from Step 1 is segmented into individual characters.

 

Step 3: Prediction: The segmented characters are fed into the 2D convolutional neural network for character recognition. In this step the neural network, trained on the EMNIST character and digit datasets, recognizes the characters. The letter-recognition network was trained for 15 epochs and the digit-recognition network for 9 epochs.

 

Output: The result obtained from the neural network is concatenated and stored
in a text file from where it can be fed to the banking system.
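A minimal end-to-end sketch of these three steps is given below, assuming an OpenCV and Keras implementation (the paper does not name its libraries); the field coordinates, model file names, and label strings are illustrative placeholders rather than values from the paper.

import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Placeholder field coordinates (x1, y1, x2, y2) for a 1601 x 719 cheque scan.
FIELDS = {
    "amount_in_words":  (120, 260, 1100, 330),
    "amount_in_digits": (1180, 300, 1520, 380),
}

def crop_fields(cheque):
    # Step 1: segment the text regions of the cheque by cropping.
    return {name: cheque[y1:y2, x1:x2]
            for name, (x1, y1, x2, y2) in FIELDS.items()}

def extract_characters(field):
    # Step 2: Otsu thresholding plus connected component labeling, then each
    # component is resized to the 28 x 28 EMNIST input size. The binarized
    # (inverted) crops are used so characters are white on black, as in EMNIST.
    _, binary = cv2.threshold(field, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary)
    boxes = sorted(stats[1:], key=lambda s: s[cv2.CC_STAT_LEFT])  # left-to-right
    return [cv2.resize(binary[y:y + h, x:x + w], (28, 28))
            for x, y, w, h, _ in boxes]

def predict_text(chars, model, labels):
    # Step 3: feed the 28 x 28 characters to the trained 2D CNN.
    batch = np.stack(chars).astype("float32").reshape(-1, 28, 28, 1) / 255.0
    return "".join(labels[i] for i in model.predict(batch).argmax(axis=1))

if __name__ == "__main__":
    cheque = cv2.imread("cheque.png", cv2.IMREAD_GRAYSCALE)
    letters = load_model("emnist_letters_cnn.h5")   # hypothetical model files
    digits = load_model("emnist_digits_cnn.h5")
    fields = crop_fields(cheque)
    with open("cheque_output.txt", "w") as out:
        out.write(predict_text(extract_characters(fields["amount_in_words"]),
                               letters, "ABCDEFGHIJKLMNOPQRSTUVWXYZ") + "\n")
        out.write(predict_text(extract_characters(fields["amount_in_digits"]),
                               digits, "0123456789") + "\n")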

 

 

Table 1: Description of the Architectural Model

ARCHITECTURE | DESCRIPTION
Input layer | A bank cheque image is captured at 1601 x 719 pixels.
Segmentation and extraction layer: Cropping | The input image is cropped into rectangular fields that capture the payable amount (in words and in digits), the account number, date, cheque number, signature, and recipient's name.
Segmentation and extraction layer: Otsu thresholding | Otsu thresholding is used to remove the background from these regions.
Segmentation and extraction layer: Connected component labeling | Connected component labeling is performed on the resulting regions, and the boundary locations of the components are extracted.
Segmentation and extraction layer: Component extraction | The characters are extracted from the original image at the boundary locations obtained in the previous step and are passed to the CNN for prediction.
Prediction layer | A 2D convolutional neural network recognizes each character extracted in the previous step and outputs the recognized character.
Output | The characters recognized by the prediction layer are stored in a text file for further use.

The EMNIST dataset is a set of handwritten character
digits derived from the NIST Special Database 19 and converted to a 28×28 pixel
image format and dataset structure that directly matches the MNIST dataset.
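As a sketch of how such a network might be trained, the snippet below assumes a Keras implementation and the third-party emnist package for loading the data; the exact layer configuration is an assumption, since the paper does not describe its architecture in detail, but the 15 epochs for the letter network follow Section 3.1 (9 epochs for the digit network).

import numpy as np
from emnist import extract_training_samples   # third-party loader, an assumed choice
from tensorflow.keras import layers, models
from tensorflow.keras.utils import to_categorical

# Load the EMNIST letters split (28 x 28 grayscale images, labels 1-26).
x_train, y_train = extract_training_samples("letters")
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_train = to_categorical(y_train - 1, num_classes=26)

# A small 2D CNN; the specific layers here are illustrative.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(26, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# The letter network is trained for 15 epochs (the digit network for 9).
model.fit(x_train, y_train, epochs=15, batch_size=128, validation_split=0.1)
model.save("emnist_letters_cnn.h5")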

It is important in image processing to select an adequate gray-level threshold for extracting objects from their background. This work makes use of Otsu thresholding, an automatic threshold-selection, region-based segmentation method. The Otsu method is a type of global thresholding that depends only on the gray values of the image; it was proposed by Otsu in 1979 and is widely used because it is simple and effective [10]. The proposed work also uses a 2D convolutional neural network. Convolution is an important operation in signal and image processing: it operates on two signals (in 1D) or two images (in 2D), and it is a fundamental concept in many areas of mathematics and engineering.
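To make the 2D convolution operation concrete, the short sketch below convolves a small synthetic image with a hand-chosen 3 x 3 kernel using SciPy; the image and kernel are illustrative only, whereas a CNN learns its kernel values during training.

import numpy as np
from scipy.signal import convolve2d

# A tiny 5 x 5 "image" with a bright vertical stripe in the middle.
image = np.zeros((5, 5))
image[:, 2] = 1.0

# A 3 x 3 horizontal-gradient (edge detection) kernel.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

# convolve2d flips the kernel and slides it over the image, summing the
# products at each position; the result highlights the stripe's edges.
response = convolve2d(image, kernel, mode="same")
print(response)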

 

3.3    Performance Analysis

 

Trained Data | Accuracy in %
Letters | 97.45
Digits | 98.87

 

The convolutional model employed for the character recognition task was trained on the EMNIST dataset, which consists of handwritten digits and letters. The model was trained separately for letters and digits: the letter training reached a validation accuracy of 97.45% and the digit training reached a validation accuracy of 98.87%. The same model was applied to character and digit recognition on the bank cheques, and an accuracy of about 95% was obtained on the test cases.

 

The machine learning model was tested for its performance on the cheques, as shown in Figure 2. The accuracy varied from cheque to cheque, as different writing styles were given as input to the learning model. The best-case accuracy was 100%, when every letter and digit was predicted accurately; the worst-case accuracy was 89.74%; and the average accuracy, computed over the characters and digits in all the cheques, was about 95.71%.
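One plausible way to compute such character-level accuracies is sketched below; the counting procedure and the example strings are assumptions, since the paper does not state exactly how the per-cheque figures were obtained.

def character_accuracy(predicted, expected):
    # Fraction of positions where the predicted character matches the expected one.
    matches = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * matches / max(len(expected), 1)

# Hypothetical per-cheque results; the real figures come from the test cheques.
cheques = [("SEVENTY FIVE", "SEVENTY FIVE"),   # every character correct: 100%
           ("4O2I", "4021")]                   # typical OCR confusions (O/0, I/1)
per_cheque = [character_accuracy(p, e) for p, e in cheques]
print(per_cheque, sum(per_cheque) / len(per_cheque))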

4. Conclusion and Future Scope

The neural network currently used in the system performs handwritten character recognition for block letters and digits using pattern recognition. This gives us an optical character recognition tool built using machine learning and a 2D convolutional neural network. This OCR can be used in banking applications to automate the whole banking transaction procedure, reducing human effort and speeding up the entire banking process. The proposed work produced an accuracy of about 95.71% when tested on the bank cheques, as shown in Table 2, while the model's validation accuracy during training was 97.45% for letters and 98.87% for digits.

Future scope of the system includes increasing the accuracy of the character recognition, adding support for cursive handwriting, and improving the image segmentation. The system can also be scaled up to other applications, in other sectors, whose main operation is reading text.