A 63-Way Character Recognition / Ismail Mohamed Sobhy

By:

Ismail Mohamed Sobhy

Material type: Text

Language: English
Summary language: English
Publication details: 2014
Description: 96 p. ill. 21 cm
Subject(s):

Genre/Form:

Dissertation, Academic

DDC classification:


Dissertation note: Thesis (M.A.), Nile University, Egypt, 2014.

Holdings
Item type	Current library	Call number	Status	Date due	Barcode
Thesis	Main library	610/ IS.A 2014	Not For Loan

Supervisor: Mohamed A. El-Helw

"Includes bibliographical references"

Contents:
1 Introduction
1.1 Motivation
1.1.1 Advantages Of Acquiring Text
1.1.2 Types of Text
1.1.3 Text Extraction in Videos and Images
1.2 Aim of the Thesis
1.3 Challenges
1.4 Thesis Overview
2 Background and Related Work
2.1 Preface
2.2 Natural Scene Text
2.3 Text Extraction
2.4 The Traditional Text Extraction Pipeline
2.4.1 Text Detection and Localization
2.4.1.1 Region Based Text Detection and Localization Technique: Stroke Width Transform [1]
2.4.1.2 Texture Based Detection: A Laplacian Method for Video Text Detection [2]
2.4.2 Text Frame Selection/Classification
2.4.2.1 Text Tracking: An Effective Video Text Tracking Algorithm Based on SIFT Feature and Geometric Constraint [3]
2.4.3 Text Enhancement
2.4.3.1 Text Enhancement: Edge based Binarization for Video Text Images [4]
2.4.4 Character Recognition
2.4.4.1 Tesseract
2.5 Alternative Methods
2.5.1 Character Detection
2.5.2 Non-Maximal Suppression
2.5.3 Character Recognition
2.6 Dictionary
2.7 Summary
3 Algorithms Behind The Proposed Character and Background Recognition
3.1 Preface
3.2 Algorithms For Feature Extraction
3.2.1 Histogram of Oriented Gradients (HOG)
3.2.2 Felzenszwalb's HOG [5]
3.3 Used Classifier: Support Vector Machines (SVMs)
3.3.1 Hard Margin
3.3.1.1 Primal Form
3.3.1.2 Dual Form
3.3.2 Kernels
3.3.2.1 Linear Kernel
3.3.2.2 Radial Basis Function
3.3.3 Soft Margin
3.3.3.1 Updated Primal Form
3.3.3.2 Updated Dual Form
3.4 Grid Search
3.5 Data Synthesization: Synthetic Minority Over-sampling TEchnique (SMOTE) [6]
3.6 Summary
4 Results and Discussion
4.1 Preface
4.2 Methodology
4.2.1 Histogram of Oriented Gradients (HOG) and Felzenszwalb's HOG Configuration
4.2.2 Grid Search Configuration
4.2.3 Support Vector Machines (SVMs) Configuration
4.2.4 Available Datasets
4.2.5 Datasets' Requirements
4.2.6 Datasets Combinations
4.2.6.1 First Combination
4.2.6.2 Second Combination
4.3 Evaluation Metrics
4.3.1 Confusion Matrix
4.3.2 F-score
4.4 Experiments & Results
4.4.1 63 Classes with HOG
4.4.2 63 Classes with FHOG
4.4.3 62 Classes
4.4.4 62 and 63 Classes vs [7]
4.4.5 56 Classes with HOG
5 Conclusion and Future work
5.1 Conclusion
5.2 Future Work
Bibliography

Abstract:
Text extraction from documents is an essential task that has proved its
value in many applications. The same benefits can be gained when it
comes to images and videos. However, text extraction from images and
videos is not as advanced as it is for documents. It is trailing
behind because of the characteristics of natural scene text. One of
the most important operations in any text extraction pipeline is the
character recognition module. Character recognition is a substantial
phase that needs to be improved so that its dependence on other phases
decreases. Its refinement means better results for the whole text
extraction process. With the many advances in object detection
methods, there is an opportunity to apply the same methodologies to
text that appears in images and videos. The thesis aims to build a
combined 63-way character recognition that deals with characters and
background at the same time. Moreover, the thesis deals with preparing
datasets and synthesizing samples to train character recognition. The
final results are compared against conventional 62-way character
recognition, which works only for characters.
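
The difference between the 62-way and 63-way label sets described in the abstract can be illustrated with a small sketch. It assumes, as is common in scene-text benchmarks but not stated in this record, that the 62 character classes are the digits 0-9 plus upper- and lower-case Latin letters. The names `BACKGROUND`, `LABELS_62`, `LABELS_63`, `label_to_index` and `is_text` are hypothetical and are not taken from the thesis.

```python
import string

# Assumption: 62 character classes = 10 digits + 26 upper-case + 26 lower-case
# letters. The catalog record does not list the classes explicitly.
CHARACTER_CLASSES = list(
    string.digits + string.ascii_uppercase + string.ascii_lowercase
)

# Hypothetical label for the extra class that absorbs non-text image patches.
BACKGROUND = "<background>"

LABELS_62 = CHARACTER_CLASSES                 # characters only
LABELS_63 = CHARACTER_CLASSES + [BACKGROUND]  # characters plus background


def label_to_index(label, labels=LABELS_63):
    """Map a class label to the integer index a classifier would be trained on."""
    return labels.index(label)


def is_text(index, labels=LABELS_63):
    """A 63-way prediction counts as text unless it is the background class."""
    return labels[index] != BACKGROUND
```

With a label set like this, a single classifier can reject background patches and name characters in one step, whereas a 62-way classifier has to rely on an earlier detection stage to filter out non-text regions.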

Text in English, abstracts in English.
