EECS 349, Machine Learning

Instructor: Prof. Doug Downey
By Majed Valad Beigi
majed.beigi@northwestern.edu
Northwestern University

Offline Handwritten Character Recognition
 

You can download the full project report and source code here:
FullReport_Source Code

Introduction

The development of handwriting recognition systems began in the 1950s, when human operators converted data from various documents into electronic format, a process that was slow and error-prone. Automatic text recognition aims to limit these errors by using image preprocessing techniques that bring increased speed and precision to the entire recognition process. Handwriting recognition has been one of the most fascinating and challenging research areas in the field of image processing and pattern recognition in recent years. It contributes immensely to the advancement of automation and improves the interface between man and machine in numerous applications. Optical character recognition is a field of study that can encompass many different solution techniques. Neural networks (Sandhu & Leon, 2009), support vector machines and statistical classifiers seem to be the preferred solutions to the problem due to their proven accuracy in classifying new data [1].

An Optical Character Recognizer is a converter that translates images of handwritten text into machine-readable text. In general, handwriting recognition is classified into two types: off-line and on-line. In off-line recognition, the writing is usually captured optically by a scanner, and the completed writing is available as an image. In an on-line system, by contrast, the two-dimensional coordinates of successive points are represented as a function of time, together with the order of the strokes made by the writer. In other words, the system records X-Y coordinates that give the location of the pen, along with the force applied by the user and the writing speed. On-line handwritten text is typically written with a stylus on a tablet. There is also a third, less well-known category, in which the text is machine-printed by devices such as laser or inkjet printers [2].
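The difference between the two input types can be made concrete with a small sketch. The Python below is only an illustration, not part of the project's MATLAB code: the names `PenSample` and `rasterize` are hypothetical, and pen force is left out for brevity. On-line data is modeled as time-stamped pen samples grouped into strokes. The off-line image is what remains once timing and stroke order are discarded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PenSample:
    """One on-line measurement: pen position plus a time stamp (hypothetical type)."""
    x: int
    y: int
    t: float  # seconds since the writer started


Stroke = List[PenSample]


def rasterize(strokes: List[Stroke], width: int, height: int) -> List[List[int]]:
    """Build an off-line style binary image from on-line strokes.

    Timing and stroke order are thrown away; only the touched pixels remain.
    """
    image = [[0] * width for _ in range(height)]
    for stroke in strokes:
        for s in stroke:
            if 0 <= s.x < width and 0 <= s.y < height:
                image[s.y][s.x] = 1
    return image


# A tiny "+" written as two strokes: vertical bar first, then horizontal bar.
strokes = [
    [PenSample(1, 0, 0.00), PenSample(1, 1, 0.02), PenSample(1, 2, 0.04)],
    [PenSample(0, 1, 0.30), PenSample(2, 1, 0.34)],
]
image = rasterize(strokes, width=3, height=3)
for row in image:
    print(row)
```

Swapping the order of the two strokes would produce the same image, which is one way to see why an off-line recognizer has less information to work with than an on-line one.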

There is extensive work in the field of handwriting recognition, and a number of reviews exist. On-line methods have been shown to be superior to their off-line counterparts in recognizing handwritten characters because of the temporal information available to them [3] [4]. However, several applications, including mail sorting, bank processing, document reading and postal address recognition, require off-line handwriting recognition systems. Moreover, in off-line systems, neural networks and support vector machines have been used successfully to yield comparably high recognition accuracy. As a result, off-line handwriting recognition continues to be an active area of research exploring newer techniques that would improve recognition accuracy [5] [6]. Therefore, for this report, I have decided to work on an off-line handwritten alphabetical character recognition system using a Back Propagation neural network, a LAMSTAR neural network and a Support Vector Machine (SVM).

An Artificial Neural Network (ANN) is a computing model of the brain, with parallel distributed processing elements that learn by adjusting the connection weights between neurons. Due to its flexibility and strength, it is now broadly used in fields such as pattern recognition, decision-making optimization, market analysis and robot intelligence [7]. ANNs can serve as computational processors for tasks such as data compression, classification, combinatorial optimization and pattern recognition [8]. Despite their computational complexity, ANNs offer many advantages over classical methods in pattern recognition while needing very little context from human intelligence [9]. In off-line recognition systems, neural networks have emerged as fast and reliable classification tools for achieving high recognition accuracy [10]. Classification techniques have been applied to handwritten character recognition since the 1990s. These methods include statistical methods based on the Bayes decision rule, Artificial Neural Networks (ANNs), kernel methods including Support Vector Machines (SVMs), and multiple classifier combination [11], [12].

I have taken the main idea of this project from [13]. As the authors of [13] did, I have chosen to use the Image Processing Toolbox of MATLAB for the image pre-processing stage of the handwritten character recognition problem. In [13], a back propagation Artificial Neural Network is used for classification and recognition. I have additionally evaluated the performance of the LAMSTAR neural network and a Support Vector Machine classifier on this problem. Moreover, the authors of [13] only calculated the average value in the 10*10 sub-matrices of the larger original matrix obtained from the image of each character, whereas in this work I first resized the character images to two different sizes (50*70 pixels and 90*120 pixels) and then took the average value in the 10*10 sub-matrices for the former and in the 15*15 sub-matrices for the latter.
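The sub-matrix averaging described above can be sketched in a few lines. This is a minimal Python illustration, not the project's MATLAB code, and it rests on stated assumptions: the function name `block_means` is hypothetical, the images are already resized and binarized, "50*70" is read as 50 columns by 70 rows, and the blocks do not overlap. Under those assumptions the 50*70 image yields 5 x 7 = 35 averages and the 90*120 image yields 6 x 8 = 48.

```python
def block_means(image, block):
    """Average each non-overlapping block x block tile of a 2-D list.

    Returns the averages in row-major order as one flat feature vector.
    Assumes the image dimensions are exact multiples of `block`.
    """
    rows, cols = len(image), len(image[0])
    if rows % block or cols % block:
        raise ValueError("image size must be a multiple of the block size")
    features = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            total = sum(
                image[i][j]
                for i in range(r, r + block)
                for j in range(c, c + block)
            )
            features.append(total / (block * block))
    return features


# Hypothetical test images (not real character data):
# 70 rows x 50 columns with ink in the left 25 columns, and a blank 120 x 90 image.
small = [[1 if c < 25 else 0 for c in range(50)] for _ in range(70)]
large = [[0] * 90 for _ in range(120)]

small_features = block_means(small, 10)  # 7 x 5 tiles
large_features = block_means(large, 15)  # 8 x 6 tiles
print(len(small_features), len(large_features))
print(small_features[:5])
```

Each average is the fraction of ink pixels in its tile, so the feature vector keeps the coarse shape of the character while being far shorter than the raw pixel matrix.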