Abstract—Sign language is a language that uses visually transmitted sign patterns, formed by simultaneously combining hand shapes; orientation and movement of the hands, arms or body; and facial expressions, to express one's thoughts fluently or to communicate with others. It is chiefly used by hearing and speech impaired people. An automatic sign language recognition system needs fast and accurate methods for identifying static signs, or a sequence of produced signs, in order to interpret their meaning.
Hand gestures are a major component of sign languages. In this paper, a robust approach for recognition of static, bare handed sign language is presented, using a novel combination of features. These include Local Binary Pattern histogram features based on colour and depth information, together with geometric features of the hand. Linear binary Support Vector Machine classifiers are used for recognition, coupled with template matching in the case of multiple matches. The research aims at hand gesture recognition for sign language interpretation as a Human Computer Interaction application.
Keywords—Indian Sign Language, Support Vector Machine, Linear Discriminant Analysis, Local Binary Pattern.
I. INTRODUCTION
Sign language is a language used by hearing and speech impaired persons. It uses hand gestures to convey meaning, as opposed to the acoustically conveyed sound patterns of speech. It is analogous to spoken languages, which is why linguists consider it one of the natural languages, although there are notable differences from spoken languages. Though sign language is used all over the globe, it is not universal. Several hundred sign languages are in use, varying from place to place, and they lie at the core of local deaf cultures. Some sign languages have achieved legal recognition, while others have no legal status at all.
Regional sign languages such as American Sign Language, German Sign Language, French Sign Language, British Sign Language and Indian Sign Language have evolved. Indian Sign Language is one of the oldest known sign languages and is considered extremely important in its history, but it is rarely used nowadays. In linguistic terms, despite the common misconception that they are not real languages, sign languages are as rich and complex as any spoken language. Studies of these languages by professional linguists have found that many sign languages exhibit the fundamental properties of all spoken languages. The elements of a sign are Handshape, palm Orientation, Location, Movement and facial Expression, summarized in the acronym HOLME.
The core concept behind the proposed method is to exploit a novel combination of colour, depth and geometric information about the hand sign to increase recognition performance, whereas most approaches attempt to use only two or fewer of these cues. This enables the system to recognise a wide range of signs, even ones that appear very similar to one another. Fig. 1. Overview of the proposed hand pose recognition system.
II.
LITERATURE SURVEY
Researchers face the very challenging problem of devising a vision based human computer interaction system for interpreting sign languages. This survey presents the theoretical and literature foundation. Research on sign languages and the challenges faced is reviewed. One problem is that the spoken and written language of a country differs from those of other countries. The syntax and semantics of a language vary from one region to another, in spite of the fact that the same language is used by several countries. For instance, English is the official language of many nations, including the UK and the USA, yet its usage differs at the country level. Likewise, sign language also varies from one country to another.
The focus of this survey is on the state of sign language interpretation at a global level. Earlier, data gloves and accelerometers were used to obtain data for sign language interpretation by capturing the configuration of the hand. Orientation and velocity, in addition to location, were measured using trackers and/or data gloves. These methods gave exact positions, but they had the disadvantages of high cost and restricted movement, which distorted the signs.
These disadvantages brought vision based systems into the picture and helped them gain popularity. Sequences of images captured from a combination of cameras form the input of vision based systems; monocular, stereo and/or orthogonal cameras are used to capture the image sequences. Feris and team used external light sources to illuminate the scene, together with multi-view geometry, to construct a depth image. Xiaolong Zhu and team proposed advances in hybrid classification architectures that consider both hand gesture and face recognition.
They built the hybrid architecture using an ensemble of connectionist networks (radial basis functions) and inductive decision trees, which combines the merits of holistic template matching with abstractive matching based on discrete properties, subject to both positive and negative learning. C. Huang and team investigated expressive body gestures in video sequences, beyond facial reactions, and proposed fusing body gestures and facial expressions at the feature level using Canonical Correlation Analysis. Z. Ren and team proposed an integration of hand gesture and face recognition, arguing that the face recognition rate could be improved by recognizing hand gestures, and proposed a security lift scenario.
They made it clear that the combination of the two search engines they proposed is generic and is not restricted to face and hand gesture recognition alone.
III. HAND GESTURE RECOGNITION
In a sign language, a sign consists of three main parts: manual features, non-manual features and finger spelling. To interpret the meaning of a sign, all of these parameters have to be analysed simultaneously. Sign language thus poses the important challenge of being multichannel. Every channel in the system is built and analysed separately, and the corresponding outputs are combined at the final level to reach a conclusion. Research in Sign Language Interpretation started with Hand Gesture Recognition.
Hand gestures are the form of non-verbal communication most commonly used by hearing impaired and speech impaired persons. Sometimes hearing people also use sign languages to communicate, but sign language is still not universal; sign languages exist wherever hearing impaired people live. To make communication between them and hearing people simple and effective, it is essential that this process be automated.
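The automation argued for here proceeds in stages, hand acquisition, feature extraction and classification, which the next section details. As a minimal end-to-end illustration of how such stages chain together, the sketch below runs on synthetic data; the threshold segmentation, row-profile feature and nearest-template classifier are simplifying assumptions for illustration, not the methods used in this paper.

```python
import numpy as np

def segment_hand(gray, threshold=50):
    """Hand acquisition: keep pixels brighter than the dark background."""
    return (gray > threshold).astype(np.uint8)

def extract_features(mask, bins=16):
    """Feature extraction: a normalized row-profile of the hand mask."""
    profile = mask.sum(axis=1).astype(float)
    # Pool rows into a fixed number of bins so images of any height compare.
    pooled = np.array([chunk.sum() for chunk in np.array_split(profile, bins)])
    total = pooled.sum()
    return pooled / total if total > 0 else pooled

def classify(feature, templates):
    """Classification: nearest stored template by Euclidean distance."""
    labels = list(templates)
    dists = [np.linalg.norm(feature - templates[k]) for k in labels]
    return labels[int(np.argmin(dists))]

# Two synthetic "signs": a bright region high in the frame vs. low in the frame.
img_a = np.zeros((60, 60)); img_a[5:25, 20:40] = 200
img_b = np.zeros((60, 60)); img_b[35:55, 20:40] = 200

templates = {"A": extract_features(segment_hand(img_a)),
             "B": extract_features(segment_hand(img_b))}
print(classify(extract_features(segment_hand(img_b)), templates))  # prints B
```

Any real system replaces each stub with a stronger method (the paper uses LBP features and SVM classifiers), but the three-stage structure stays the same.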
A number of methodologies have been developed for automating Hand Gesture Recognition (HGR). The overall process of a Hand Gesture Recognition system is shown as a block diagram in figure 2. There are three main steps in HGR:
1. Hand acquisition, which deals with hand extraction from a given static image, and with tracking and hand extraction from a video.
2.
Feature extraction, which deals with a compressed representation of the data that enables recognition of the hand gesture.
3. Classification/recognition of the hand gesture following some rules.
Fig. 2. Block Diagram for the Process of Hand Gesture Recognition
IV.
DATA SETS ACQUISITION
Two different data sets are used in the ISL recognition system in this survey: ISL digits (0-9) and single handed ISL alphabets (A-Z). For data set acquisition, a dark background is preferred, for uniformity and for ease of manipulating the images during feature extraction and segmentation. A digital camera, a Cyber-shot H70, is used for capturing the images. All images are captured with flash in intelligent auto mode, in the usual JPEG file format.
Each original image is 4608×3456 pixels and requires roughly 5.5 MB of storage space. To create an efficient data set of reasonable size, the images are cropped to 200×300 RGB pixels, after which barely 25 KB of memory is required per image. The data set is collected from 100 signers: 69 male and 31 female, with an average age of 27. The average height of a signer is about 66 inches.
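The downscaling step just described can be imitated as follows. This is a hedged sketch that assumes the raw capture is already in memory as a NumPy array; nearest-neighbour sampling stands in for whatever resampling filter the authors' actual tooling applied.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Downscale an H x W x 3 image by nearest-neighbour index sampling."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows[:, None], cols[None, :]]

def to_gray(rgb):
    """Standard luminance weighting for RGB -> grayscale."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

# A stand-in for one raw 4608x3456 capture (random pixels, not a real sign).
raw = np.random.randint(0, 256, size=(3456, 4608, 3), dtype=np.uint8)
small = resize_nearest(raw, 300, 200)   # 300 rows x 200 columns, as in the paper
print(small.shape)  # (300, 200, 3)
```

The grayscale conversion mirrors the RGB-to-gray preprocessing shown later in figure 5.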
The data set contains isolated ISL numerical signs (0-9). Five images per ISL digit sign are captured from each signer, so a total of 5,000 images are available in the data set. Sample images from the data set are shown in figure 3. Fig. 3. The ISL Digit Signs Data Set
For the single handed alphabet data set, a total of 2,600 images cropped to 200×300 RGB pixels are available. The images are collected from four males and six females.
The backgrounds of the sign images are dark, as only the hand orientations are required for the feature extraction process. The images are stored in JPEG format because it can be easily exported and manipulated in various software and hardware environments. Each preprocessed ISL sign image, 200×300 pixels at 72 dpi, requires nearly 25 KB of storage. The skin colours in these images are neither of very dark nor of very light complexion, because the application is proposed with only the Indian subcontinent in consideration.
Colours corresponding to human skin are the main cue used in capturing the sign images. The sample data set is shown in figure 4. Fig. 4. The ISL Single Handed Alphabet Signs Data Sets
V.
HAND GESTURE SEGMENTATION USING LINEAR DISCRIMINANT ANALYSIS AND LOCAL BINARY PATTERN
Segmentation is used to separate the hand from the background. The experimentation in this work is carried out using two datasets of hand gestures performed with one hand, for the alphabets A to Z of Indian Sign Language. Images from this dataset before and after the preprocessing stage are shown in figure 5. (a) (b) Figure 5. (a) Original images of the alphabet 'A'; (b) images after RGB to gray conversion and resizing.
Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is used to perform class specific dimension reduction. It finds the linear combination of features that best separates the different classes. To find this separation, LDA maximizes the between-class scatter while minimizing the within-class scatter, instead of maximizing the overall scatter.
As a result, members of the same class group together and members of different classes stay far from each other in the lower-dimensional space. Let X be the data, with samples drawn from c classes. The between-class and within-class scatter matrices, SB and SW, are calculated as

SB = Σ (i = 1 to c) Ni (μi − μ)(μi − μ)T
SW = Σ (i = 1 to c) Σ (x ∈ Xi) (x − μi)(x − μi)T

where μ is the mean of all the data, μi is the mean of class i (i = 1, ..., c), and Ni is the number of samples in class i. LDA finds a projection W that maximizes the class separation criterion

W = arg max |WT SB W| / |WT SW W|.

Fig. 6. Example of LBP code generation
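The scatter matrices and projection above can be computed directly. A toy NumPy sketch follows; the synthetic two-class data and the use of a pseudo-inverse (for numerical safety when SW is singular) are assumptions for illustration, not details from the paper.

```python
import numpy as np

def lda_fit(X, y):
    """Compute SB, SW and the projection maximizing |W'SB W| / |W'SW W|."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)   # between-class scatter
        S_W += (Xc - mu_c).T @ (Xc - mu_c)                # within-class scatter
    # Solve the generalized eigenproblem SB w = lambda SW w.
    vals, vecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(vals.real)[::-1]
    k = len(classes) - 1                 # at most (c - 1) useful directions
    return vecs[:, order[:k]].real

# Two well separated 2-D classes; LDA should find one discriminant direction.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
W = lda_fit(X, y)
proj = X @ W
print(proj[:20].mean(), proj[20:].mean())  # class means clearly separated
```

On image data the same code applies after flattening each image to a vector, which is why the dimensionality issues discussed next arise.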
The rank of SW is at most (N − c), where c is the number of classes and N is the number of samples. Most of the time the number of samples is smaller than the dimension of the image data in pixels, so Principal Component Analysis (PCA) is first performed on the image data, which is projected onto an (N − c)-dimensional space; LDA is then performed on this reduced data. The transformation matrix W projecting a sample into the (c − 1)-dimensional space is the product of the PCA and LDA projections, W = Wpca Wlda.
Local Binary Pattern (LBP)
Ong and team proposed the Local Binary Pattern (LBP). It performs local operations on the neighborhood of an image pixel, the neighborhood being the pixels adjacent to that pixel.
In LBP, an 8-bit binary code for the 3×3 pixel neighborhood of a pixel (xc, yc) in image I is

LBP(xc, yc) = Σ (p = 0 to 7) s(ip − ic) 2^p, with s(z) = 1 if z ≥ 0 and s(z) = 0 otherwise,

where ic is the intensity of the centre pixel and ip that of its p-th neighbor. Fig. 7. Example of gesture model generation using LDA features
The Local Binary Pattern has proved to be a very efficient means of image representation and has been applied in many kinds of analysis. LBPs are tolerant of monotonic illumination changes and are able to detect various texture primitives such as corners, line ends, spots and edges. The most popular and efficient version of LBP, block LBP (figure 8) with uniform/non-uniform patterns, is used as the first methodology for extracting hand features.
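The per-pixel code and the block histogram of figure 8 can be sketched as below; the clockwise neighbour ordering and the 4×4 grid are conventional choices assumed here rather than taken from the paper.

```python
import numpy as np

def lbp_image(gray):
    """8-bit LBP code per pixel: threshold the 8 neighbours of each 3x3
    window against the centre and weight them by powers of two."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = gray[1:-1, 1:-1]
    # Clockwise neighbour offsets starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for p, (dy, dx) in enumerate(offsets):
        neigh = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neigh >= center).astype(np.uint8) << p
    return out

def block_lbp_histogram(gray, grid=(4, 4)):
    """Split the LBP image into 4x4 = 16 regions and concatenate the
    256-bin histogram of each region, as in figure 8 (d)-(e)."""
    lbp = lbp_image(gray)
    feats = []
    for rows in np.array_split(lbp, grid[0], axis=0):
        for block in np.array_split(rows, grid[1], axis=1):
            hist, _ = np.histogram(block, bins=256, range=(0, 256))
            feats.append(hist)
    return np.concatenate(feats)   # 16 * 256 = 4096-dimensional vector

gray = np.random.randint(0, 256, size=(300, 200), dtype=np.uint8)
phi_c = block_lbp_histogram(gray)
print(phi_c.shape)  # (4096,)
```

The per-region histograms preserve coarse spatial layout, which a single global histogram would discard; this is what makes block LBP effective for hand shapes.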
Figure 8. Local binary pattern histogram generation for a colour image; outputs in order from (a) to (e): (a) colour image, (b) gray scale version of the colour image, (c) LBP image representing the colour information (LBPcolour), (d) LBPcolour divided into 16 regions, (e) concatenated histogram output for LBPcolour (colour feature vector φc).
VI. FEATURE EXTRACTION USING SUPPORT VECTOR MACHINE CLASSIFIER
Feature extraction approaches in image processing acquire the valuable information present in an image. This involves converting a high dimensional data space into a lower dimensional one. The lower dimensional data extracted from the images should contain accurate and precise information that represents the actual image.
The image can be reconstructed from the lower dimensional data space. The lower dimensional data is required as input to any classification methodology, since it is not possible to process higher dimensional data with both accuracy and speed. The inputs to an automatic sign language recognition system are either static signs (images) or dynamic signs (video frames). In order to classify the input signs, valuable features must first be acquired from them. The algorithms used for facial feature extraction can be used for hand feature extraction as well. Classification is an essential part of machine learning: the technique is used to assign each item in a data set to one of a predefined set of groups. Classification methods use mathematical models including decision trees, linear programming, neural networks and statistical models for pattern classification.
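Since linear binary SVM classifiers do the recognition here, the sketch below shows one standard way to train such a classifier, Pegasos-style stochastic sub-gradient descent on the regularized hinge loss. The solver choice and the toy data are assumptions for illustration; the paper does not specify its training procedure.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimize hinge loss + L2 penalty with stochastic sub-gradients.
    Labels y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                      # point inside the margin: push
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:                               # otherwise only shrink weights
                w = (1 - eta * lam) * w
    return w, b

def predict(X, w, b):
    return np.sign(X @ w + b)

# Linearly separable toy problem standing in for one one-vs-rest sign class.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.3, (30, 2)), rng.normal(2, 0.3, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)
w, b = train_linear_svm(X, y)
print((predict(X, w, b) == y).mean())  # training accuracy
```

A multiclass recognizer trains one such binary classifier per sign, which is exactly why several signs can return a positive match for one test image and a tie-breaking step becomes necessary.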
In classification, a software module is created that learns to divide the data items into different groups. In initial experimentation using multiclass SVM and decision trees, a large number of misclassifications were identified, so these classifiers were not used in the final recognition experiments. During SVM classification, if more than one sign returns a positive match for a test image pair, a template matching process is executed. First, for each sign that returned a positive match, the test image pair is checked to see whether it falls within that sign's range of height-to-width ratios, defined by rmin and rmax. If the test image pair's ratio does not fall within a sign's range, that sign is no longer considered a positive match in the subsequent template matching steps. The cosine distance dcosine is then calculated between the feature vector φ of the test image pair and the average feature vector φavg of each sign that returned a positive match.
An edge template similarity metric sedge is also calculated: a bitwise AND operation is performed between the edge template of the test image pair, Xtest, and the edge template of each sign that returned a positive match, Xsign. The number of white pixels in the resulting image is taken as sedge. Although the image pairs may be of different sizes, resizing the edge templates to a standard size allows the bitwise AND operation to be performed directly.
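The tie-breaking chain just described (ratio filter, then cosine distance, then edge-template overlap) can be sketched as follows. Because equation (4) is not reproduced in this text, the way the two scores are combined below is an illustrative assumption, as are the toy candidates.

```python
import numpy as np

def ratio_ok(h, w, r_min, r_max):
    """Keep a candidate sign only if the test pair's height-to-width
    ratio falls inside that sign's [r_min, r_max]."""
    return r_min <= h / w <= r_max

def cosine_distance(a, b):
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def edge_similarity(edge_test, edge_sign):
    """Bitwise AND of two binary edge templates (already resized to a common
    size); the count of surviving white pixels is the similarity."""
    return int(np.sum(edge_test & edge_sign))

def pick_sign(test, candidates):
    """candidates: {name: dict(r_min, r_max, phi_avg, edge)}; test likewise."""
    best, best_score = None, -np.inf
    for name, c in candidates.items():
        if not ratio_ok(test["h"], test["w"], c["r_min"], c["r_max"]):
            continue                            # fails the ratio filter
        d = cosine_distance(test["phi"], c["phi_avg"])
        s_edge = edge_similarity(test["edge"], c["edge"])
        # Illustrative combination, standing in for equation (4).
        score = (1.0 - d) + s_edge / test["edge"].size
        if score > best_score:
            best, best_score = name, score
    return best

edge_a = np.zeros((8, 8), np.uint8); edge_a[2] = 1      # horizontal stroke
edge_b = np.zeros((8, 8), np.uint8); edge_b[:, 2] = 1   # vertical stroke
candidates = {
    "A": dict(r_min=1.0, r_max=2.0, phi_avg=np.array([1.0, 0.0]), edge=edge_a),
    "B": dict(r_min=1.0, r_max=2.0, phi_avg=np.array([0.0, 1.0]), edge=edge_b),
}
test = dict(h=300, w=200, phi=np.array([0.9, 0.1]), edge=edge_a)
print(pick_sign(test, candidates))  # prints A
```

The ratio filter is cheap and runs first, so the more expensive distance and overlap computations are only applied to plausible candidates.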
The total similarity metric stot is then defined according to (4); here α = 0.001 and β = 1.2 were chosen as they produced optimal results. The sign for which the similarity metric stot is maximal is taken as the final output sign.
Figure 9. Output of ISL Digits Produced by the System
Although 26 classes are present in the ISL single handed alphabet, the system is able to predict single handed characters with more than 95% accuracy.
This is possible with the LBP and SVM feature extraction technique. A sample output is shown for the single handed ISL sign 'B' in figure 9. The input sign image is processed through the system and the prediction is shown on the right-hand side of the output screen. The sign is interpreted as the single handed 'B', which is the correct prediction.
Performance of the Sign Language Interpretation System
For sign language interpretation, N-fold cross validation was used with N = 5. For a single hand (left or right), each fold consists of 200 images. The system is trained using 800 images from four of the five folds and tested against the remaining fold of 200 images. For both hands, each fold has 400 images.
The system is trained using 1600 images from four of the five folds and tested against the remaining fold of 400 images. The system was tested under three criteria: sign gestures performed by the left hand, by the right hand and by both hands. The accuracy under each criterion is measured as Accuracy = (NC / N) × 100%, where NC is the number of correctly classified sign gestures and N is the total number of test sign gestures.
Table 1. Comparison with other work using the same dataset
Method      Training Samples   Overall Accuracy
LDA         40                 75%
LBP         40                 88.94%
LBP - SVM   40                 92.14%
An overall accuracy of 92.14% was obtained with a relatively small training dataset. It can be seen that the system coped well with the variation in individual signs caused by different users, as well as with the similarity that exists among different signs.
VII. CONCLUSION
A vision-based automatic sign language recognition system that can recognize sentences in Indian Sign Language was presented in this work. Several features, and different methods of combining them, were investigated experimentally. Tracking algorithms with applications to hand and head tracking were presented, and experiments were carried out to determine the parameters of these algorithms. Emphasis was placed on appearance-based features that use the images themselves to represent signs. Other systems for automatic sign language recognition usually require a segmentation of the input images in order to calculate features for the segmented image parts.
The algorithm was designed to run in real time without requiring excessive computational power. The results reveal that it is possible to train the system to recognize more static Indian Sign Language hand signs while maintaining high accuracy. It is also feasible to build on this framework to recognize dynamic sign language. Future depth sensor technology, with higher depth and colour resolution and more accurate skeletal tracking, has the potential to improve the results of the proposed algorithm to an even greater extent. The results presented in this work show that the use of appearance-based features yields promising recognition performance.