Handwriting Recognition System Using Optical Character Recognition

Received: 26/Mar/2018, Revised: 03/Apr/2018, Accepted: 14/Apr/2018, Online: 30/Jun/2018

Abstract: This is an overview of the most recent published approaches to solving the handwriting recognition problem. This paper aims to clarify the role of handwriting recognition in light of today's maturing technologies. It lists and explains the components that make up handwriting recognition and related technologies such as OCR (Optical Character Recognition) and Signature Verification. This paper can also be regarded as a survey of handwriting recognition and related topics, with a rich list of references for the interested reader. The practicality of using this technology for different languages and cultures is also discussed.


I. INTRODUCTION
One of the most obvious reasons that handwriting recognition capabilities are important for future personal systems is that, in crowded rooms or public places, one might not wish to speak to a computer because of the confidential or personal nature of the data.
Another reason for the practicality of a system that accepts hand input is that, with today's technology, handwriting recognition can run on very small handheld computers, whereas speech systems cannot yet be made small enough for a standalone handheld machine. On desktop computers, by contrast, it is much easier to dictate something than to write it.
Character recognition deserves attention because of the wide variety of writing styles associated with each character. With advances in technology, it is now possible to scan a page of structured handwritten text and have OCR handwriting recognition software quickly convert it to a word processing document.

II. LITERATURE SURVEY
The overwhelming volume of paper-based data in corporations and offices challenges their ability to manage documents and records. Computers, working faster and more efficiently than human operators, can be used to perform many of the tasks required for efficient document and content management. Computers handle alphanumeric characters typed on a keyboard as ASCII codes, where each character or letter is represented by a recognizable code.
An optical character recognition (OCR) system allows us to convert a document into electronic text that we can edit and search. It is performed off-line, after the writing or printing has been completed, as opposed to on-line recognition, where the computer recognizes the characters as they are written. For these systems to recognize hand-printed or machine-printed forms effectively, individual characters must be well separated. This is why most typical administrative forms require people to enter data into neatly spaced boxes, which forces spaces between the letters entered on a form. Without these boxes, conventional technologies reject fields when people do not follow the structure while filling out forms, resulting in a significant administrative overhead.
Optical character recognition for English has become one of the most successful applications of technology in pattern recognition and artificial intelligence. OCR is the machine replication of human reading and has been the subject of intensive research for more than five decades. To understand how OCR systems have evolved in response to their challenges, and to appreciate the present state of OCR, a brief historical survey is in order.
In one approach, a scheme based on a modified quadratic classifier is proposed to recognize offline handwritten numerals of six popular Indian scripts. In another, a multilayer perceptron has been used to recognize handwritten English characters using optical character recognition. The features are extracted by boundary tracing and from the Fourier descriptors of the traced boundary. Each character is identified by analyzing its shape and comparing the features that distinguish it from other characters. An analysis has also been carried out to determine the number of hidden-layer nodes needed to achieve high performance of the back-propagation network. [2] A recognition accuracy of 94% has been reported for handwritten English characters with less training time and space.
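As an illustration of the Fourier-descriptor features mentioned above, the following is a minimal sketch and not the implementation of the cited work. It assumes the boundary has already been traced into an ordered list of (x, y) points; the function name `fourier_descriptors` and the parameter `count` are hypothetical. The boundary is treated as a sequence of complex numbers, the zero-frequency term is dropped to remove the effect of translation, and the remaining magnitudes are divided by the magnitude of the first harmonic to remove the effect of scale.

```python
import cmath


def fourier_descriptors(boundary, count=4):
    """Return `count` normalized Fourier descriptor magnitudes of a closed boundary.

    `boundary` is an ordered list of (x, y) points along the traced outline.
    This is an illustrative sketch; names and normalization are assumptions.
    """
    n = len(boundary)
    z = [complex(x, y) for x, y in boundary]
    # Plain discrete Fourier transform of the complex boundary signal.
    coeffs = [
        sum(z[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n)) / n
        for u in range(n)
    ]
    # coeffs[0] only encodes position, so it is skipped (translation invariance).
    scale = abs(coeffs[1])
    if scale == 0:
        raise ValueError("degenerate boundary: first harmonic is zero")
    # Dividing by the first harmonic removes the effect of size.
    return [abs(c) / scale for c in coeffs[2:2 + count]]
```

Because only magnitudes are kept, the same outline drawn at a different position or size yields the same feature vector, which is the property that makes such descriptors convenient inputs for a classifier such as a multilayer perceptron.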

III. PROPOSED APPROACH

a: Generic Handwriting Recognition Process
In most systems, the data signal first undergoes some preprocessing (for example, filtering). The signal is then normalized to a standard size, and its slant and slope are corrected. After normalization, the writing is usually segmented into basic units, and each segment is classified and labeled. Using a search algorithm in the context of a language model, the most likely path is then returned to the user as the intended string.
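The final search step can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular system: each segment is assumed to come with a hypothesis list mapping candidate characters to shape-classifier probabilities, the language model is assumed to be a table of character-pair (bigram) probabilities, and the names `best_string`, `bigram`, and `floor` are hypothetical. The search is a standard Viterbi-style dynamic program over the hypothesis lists.

```python
import math


def best_string(hypotheses, bigram, floor=1e-6):
    """Return the most likely string given per-segment character hypotheses.

    hypotheses: list of dicts, one per segment, mapping character -> probability.
    bigram: dict mapping (previous_char, char) -> probability; pairs that are
            missing fall back to the small constant `floor` (an assumption).
    """
    if not hypotheses:
        return ""
    # best[c] holds (log score, text) of the best path that ends in character c.
    best = {c: (math.log(p), c) for c, p in hypotheses[0].items()}
    for segment in hypotheses[1:]:
        extended = {}
        for c, p in segment.items():
            extended[c] = max(
                (score + math.log(bigram.get((prev, c), floor)) + math.log(p), text + c)
                for prev, (score, text) in best.items()
            )
        best = extended
    return max(best.values())[1]


# Usage: the shape classifier slightly prefers "o" for the middle segment,
# but the character-pair statistics favor "a".
segments = [{"c": 1.0}, {"a": 0.45, "o": 0.55}, {"t": 1.0}]
pairs = {("c", "a"): 0.5, ("c", "o"): 0.5, ("a", "t"): 0.6, ("o", "t"): 0.1}
print(best_string(segments, pairs))
```

Working in log probabilities avoids numerical underflow on long strings, and keeping only the best path per final character keeps the cost linear in the number of segments.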

b: Principal lines of a word
The area surrounded by the base-line and the mid-line is the only part of any word which is always non-empty. [3] This makes this area the most reliable portion of the data for usage in size normalization. Once accurate estimates of the base-line and the mid-line are given, a magnification factor could be computed from the ratio of the nominal mid-portion size and that of the input. The entire input data may then be magnified using the obtained magnification factor.
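The size normalization described above can be sketched as follows, assuming the base-line and mid-line have already been estimated as horizontal lines at known y coordinates. The function names, the default nominal size, and the choice to move the base-line to y = 0 are assumptions made for illustration only.

```python
def magnification_factor(baseline_y, midline_y, nominal_height):
    """Ratio of the nominal mid-portion size to the mid-portion size of the input."""
    input_height = abs(midline_y - baseline_y)
    if input_height == 0:
        raise ValueError("base-line and mid-line coincide")
    return nominal_height / input_height


def normalize_size(points, baseline_y, midline_y, nominal_height=1.0):
    """Magnify all (x, y) points by the factor above.

    The base-line is also shifted to y = 0; this shift is an illustrative
    convention, not something prescribed by the text.
    """
    factor = magnification_factor(baseline_y, midline_y, nominal_height)
    return [(x * factor, (y - baseline_y) * factor) for x, y in points]
```

For example, an input whose mid-portion is 20 units tall and a nominal size of 40 units give a magnification factor of 2, which is then applied to the entire input, including ascenders and descenders.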
Other possible normalizations are slant and slope correction. In slant correction, a mean dominant slope of the near-vertical strokes is usually computed. The slant of the data is then adjusted using the difference between the slope of the vertical axis and the computed slope. This correction is usually done through shearing, since for small deformations shearing is a good approximation of rotation. [5] Slope correction is usually an iterative process that uses both of the above normalizations to estimate and re-estimate the slope of the base-line; the data is then slope-corrected by shearing it along the vertical axis so that the base-line becomes horizontal.
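Both corrections can be written as shear transforms. The sketch below is one possible reading, under the assumptions that the input is a list of strokes, each a list of (x, y) points, that the dominant slant is estimated from segments steeper than a chosen angle, and that the base-line slope has been estimated elsewhere. All function names and the 45-degree default are hypothetical.

```python
import math


def estimate_slant(strokes, min_angle_deg=45.0):
    """Estimate the dominant slant as total dx over total dy of steep segments."""
    sum_dx = sum_dy = 0.0
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            dx, dy = x1 - x0, y1 - y0
            if dy < 0:  # orient every segment upward so signs are comparable
                dx, dy = -dx, -dy
            if dy == 0:
                continue
            # Keep only segments that are closer to vertical than to horizontal.
            if math.degrees(math.atan2(dy, abs(dx))) >= min_angle_deg:
                sum_dx += dx
                sum_dy += dy
    return sum_dx / sum_dy if sum_dy else 0.0


def deslant(strokes, slant):
    """Shear along the horizontal axis so the dominant strokes become vertical."""
    return [[(x - slant * y, y) for x, y in stroke] for stroke in strokes]


def deslope(strokes, baseline_slope):
    """Shear along the vertical axis so the base-line becomes horizontal."""
    return [[(x, y - baseline_slope * x) for x, y in stroke] for stroke in strokes]
```

A shear leaves one coordinate untouched, which is why it only approximates a rotation and is appropriate for the small deformations mentioned above.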

c: Segmentation
Hand input is classified into different types. In the case of boxed-discrete input, the writer is in effect segmenting the writing into separate characters. This is probably the simplest form of writing to recognize. In the second type, the writer once again aids the recognizer in segmenting the writing into individual characters. [2,5] In this case, the problem of segmenting the data into separate characters is solved by finding those gaps between successive chunks of data in the horizontal direction that are greater than a statistically obtained threshold. In run-on writing, the problem of segmenting the word into characters becomes nontrivial. [3] In this case, the characters could even overlap, so that gap information is no longer sufficient for character segmentation. The only restriction imposed on run-on writing is that the pen should be lifted from the surface of the digitizer after each individual character is entered.
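The gap-based segmentation used for the second input type can be sketched as follows. This is a minimal illustration that assumes each stroke is a list of (x, y) points and that the threshold has already been obtained statistically; the function name `segment_by_gaps` and its parameter are hypothetical. It does not handle run-on writing, where characters may overlap horizontally.

```python
def segment_by_gaps(strokes, gap_threshold):
    """Group strokes into characters, splitting at wide horizontal gaps.

    A new character starts whenever the horizontal gap between the right edge
    of the data seen so far and the left edge of the next stroke exceeds
    `gap_threshold`.
    """
    if not strokes:
        return []
    ordered = sorted(strokes, key=lambda s: min(x for x, _ in s))
    groups = [[ordered[0]]]
    right_edge = max(x for x, _ in ordered[0])
    for stroke in ordered[1:]:
        left = min(x for x, _ in stroke)
        if left - right_edge > gap_threshold:
            groups.append([stroke])      # gap is wide enough: new character
        else:
            groups[-1].append(stroke)    # overlapping or close: same character
        right_edge = max(right_edge, max(x for x, _ in stroke))
    return groups
```

Strokes are sorted by their left edge rather than by writing order, so a delayed stroke such as the cross of a "t" is still grouped with the character it overlaps.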

d: Feature Extraction and Shape Classification
Once the writing is segmented into smaller units, these units are passed to a module that extracts the features of the data required by the shape classification algorithm in use.

e: Error Analysis
• Current system errors stem from writing style differences.
• Additional data from Boeing will help address the problem.

f (1): Predictable Models
It is possible to reduce the number of hypotheses explored by the search process. One way is to take advantage of the statistics available on the likelihood of certain characters following a given string of characters. Word-level language models work in the same manner as the character model above, using statistics of certain words following a string of given words together with grammatical restrictions. These ideas have started being used in handwriting recognition. [3] Word-level language models might not be very useful for everyday usage in handwriting recognition. Given the slow nature of writing, most people use handwriting recognition in an ungrammatical manner, with many abbreviated and broken sentences. It is very difficult to handle these situations if a grammatical restriction is imposed. [2] The idea behind this choice is that electronic mail is usually very informal and portrays the same style of writing as might be used on a pen-computer. In some cases, the recognizer is expecting an entry with a special syntax, or it is expecting a subset of the normally supported characters.
Using this information, many search paths could be pruned out in favor of faster recognition speeds and higher accuracy. [2,6] An example of such a case is an input field that expects an auto license number. Most countries have a set format for their licenses.
This format could be used in re-estimating the probabilities of the members of a character hypothesis list. [4] In general, the points of interest are the start and end points of the letter, the points where the tangent is horizontal or vertical, and the corner points. Where the tangent is ambiguous, separate the arc segments of the template from.
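The format-based re-estimation of a character hypothesis list mentioned in this section can be sketched as follows. The format notation ("A" for a letter, "9" for a digit), the table `CLASSES`, and the function name `reestimate` are assumptions introduced for illustration; real license formats vary by country and are not specified in the text.

```python
import string

# Hypothetical format alphabet: "A" allows an uppercase letter, "9" a digit.
CLASSES = {"A": set(string.ascii_uppercase), "9": set(string.digits)}


def reestimate(hypotheses, fmt):
    """Drop characters the format forbids at each position and renormalize.

    hypotheses: list of dicts, one per position, mapping character -> probability.
    fmt: string over the keys of CLASSES, one symbol per position.
    """
    result = []
    for scores, symbol in zip(hypotheses, fmt):
        allowed = CLASSES[symbol]
        kept = {c: p for c, p in scores.items() if c in allowed}
        total = sum(kept.values())
        # An empty dict signals that no hypothesis fits the format here.
        result.append({c: p / total for c, p in kept.items()} if total else {})
    return result
```

With such a mask, visually ambiguous pairs such as the letter "O" and the digit "0" are resolved by position alone, and the pruned paths never need to be explored by the search.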

g: Applications:
Handwriting recognition allows more efficient drafting and document generation, as well as applications such as form filling and a keyboardless interface with a computer. As handwriting recognition technology becomes more mature, applications such as longhand note-taking in the classroom are going to become more of a reality. [3,6] Professionals could write their documents and have them converted to text instantly, without several iterations of having a secretary type the document for them. OCR could speed up postal delivery of mail pieces. Check amounts could be recognized automatically, without any human intervention.
Signature verification could be done on-line and through a communication link while the credit-card user is purchasing merchandise.

V. CONCLUSION
The training set contained 245 characters with 10 samples from each character class. All the data were obtained from four writers on the preformatted papers. Writers were allowed to write freely with a varying frequency of characters of each class. The segmentation procedure was able to correctly identify all the reference lines.

VI. ACKNOWLEDGEMENT
During this ongoing research I have been lucky to have a supportive partner who helped me a great deal with the mathematical calculations. For their depth of knowledge, I would like to thank Ms. Preeti Gangania, who helped us understand the domain; Prof. Anshu Sharma, who helped us understand the automated system through Automata Theory; and Prof. Gaurav Chaudhary, who believes in us, supports us, and has always stood behind us to remove every hurdle. Finally, I would like to thank my mother and God for their blessings, and Ms. Shreshtha Garg and Ms. Sonam Agarwal for always supporting, understanding, and trusting me with such dedication. Thanks to all for believing in us.