Ahmed Mamdouh Abd el-kariem 

Sequence to Sequence Learning for Unconstrained Scene Text Recognition / Ahmed Mamdouh Abd el-kariem. - 2015. - 49 p. : ill. ; 21 cm.

Supervisor: Mohamed A. El-Helw

Thesis (M.A.)—Nile University, Egypt, 2015.

"Includes bibliographical references"

Contents:
Abstract ........................................................................................................................................................................... vi
Keywords......................................................................................................................................................................... vi
List of Figures .................................................................................................................................................................. vii
Introduction .............................................................................................................................................. 1
1.1 Motivation ........................................................................................................................................................ 1
1.2 Objectives ......................................................................................................................................................... 1
Background ............................................................................................................................................... 3
2.1 Convolutional Neural Networks (CNNs) ............................................................................................... 4
2.2 Convolutional Neural Networks ........................................................................................................................ 4
2.2.1 Convolutional Layer .............................................................................................................................. 5
2.2.2 Pooling Layer ........................................................................................................................................ 5
2.3 Logistic Regression ‐ Softmax ............................................................................................................................ 6
2.3.1 Binary Classification .............................................................................................................................. 7
2.3.2 Multiclass Classification ........................................................................................................................ 7
2.4 Long Short‐Term Memory ‐ LSTM.................................................................................................................... 10
2.4.1 Recurrent Neural Networks ................................................................................................................ 10
2.4.2 Constant Error Carrousels ................................................................................................................... 12
State‐of‐the‐Art Approaches .................................................................................................................... 14
3.1 Lexicon‐Based CNN Model .............................................................................................................................. 14
3.2 Character Sequence Encoding ......................................................................................................................... 15
3.3 N‐gram Encoding ............................................................................................................................................ 15
3.4 The Joint Model .............................................................................................................................................. 16
3.5 Sequence‐to‐Sequence Learning with Neural Networks ................................................................................. 17
Sequence‐to‐Sequence Learning for Unconstrained Scene Text Recognition ............................................ 18
4.1 Arbitrary Length Sequence to Sequence Modeling ......................................................................................... 18
4.2 Training ........................................................................................................................................................... 19
4.3 Lasagne ........................................................................................................................................................... 20
Experiments ............................................................................................................................................ 21
5.1 Extending CNN Model with LSTM for error correction .................................................................................... 21
5.2 LSTM Architecture Experiments ...................................................................................................................... 23
5.2.1 The models’ architecture .................................................................................................................... 23
5.3 Extending CNN Model with optimized LSTM Architecture for error correction .............................................. 28
5.4 Generalisation Experiment .............................................................................................................................. 31
Traffic Sense ............................................................................................................................ 33
6.1 Extracting suitable corner points and their trajectories .................................................................................. 33
6.2 Initial Clustering .............................................................................................................................................. 34
6.3 Adaptive Background Construction ................................................................................................................. 34
6.4 Extracting bounding boxes around vehicles .................................................................................................... 35
6.5 Experimental results and analysis.................................................................................................................... 35
Conclusion .............................................................................................................................................. 37
7.1 Achieved results ............................................................................................................................................. 37
7.2 Future development ....................................................................................................................................... 37
Appendix A ..................................................................................................................................................................... 38
Appendix B ..................................................................................................................................................................... 39
Datasets ......................................................................................................................................................................... 39
References ...................................................................................................................................................................... 41

In this work we present a state-of-the-art approach for unconstrained natural scene text recognition.
We propose a cascade approach that incorporates a convolutional neural network (CNN) architecture
followed by a long short-term memory (LSTM) model. The CNN learns visual features for the characters
and uses them with a softmax layer to detect a sequence of characters. While the CNN gives very
good recognition results, it does not model the relations between characters, and hence gives rise to
false positive and false negative cases (confusing characters due to visual similarities, like "g" and "9",
or confusing background patches with characters, thereby removing existing characters or adding
non-existing ones). To alleviate these problems we leverage recent developments in LSTM architectures
to encode contextual information. We show that the LSTM can dramatically reduce such errors and
achieve state-of-the-art accuracy in the task of unconstrained natural scene text recognition. Moreover,
we manually remove all occurrences of the words that exist in the test set from our training set, to test
whether our approach will generalize to unseen data. We use the ICDAR 13 test set for evaluation and
compare the results with the state-of-the-art approaches [11, 18]. We also present an algorithm for
traffic monitoring.
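
The cascade described above can be pictured with a few layers of Lasagne, the library the thesis
builds on (Section 4.3). The following is a minimal sketch only: the input size, layer widths, the
maximum word length MAX_LEN, and the alphabet size N_CLASSES are illustrative assumptions, not
the thesis's actual hyper-parameters.

from lasagne.layers import (InputLayer, Conv2DLayer, MaxPool2DLayer,
                            DenseLayer, LSTMLayer, ReshapeLayer)
from lasagne.nonlinearities import rectify, softmax

MAX_LEN, N_CLASSES = 23, 37   # assumed: max word length; letters + digits + blank

# CNN front end: learns visual features from a cropped word image.
net = InputLayer(shape=(None, 1, 32, 100))   # grey-scale word patch (assumed size)
net = Conv2DLayer(net, num_filters=64, filter_size=(3, 3), nonlinearity=rectify)
net = MaxPool2DLayer(net, pool_size=(2, 2))
net = Conv2DLayer(net, num_filters=128, filter_size=(3, 3), nonlinearity=rectify)
net = MaxPool2DLayer(net, pool_size=(2, 2))
net = DenseLayer(net, num_units=MAX_LEN * 128, nonlinearity=rectify)

# Turn the CNN code into a MAX_LEN-step feature sequence so the LSTM can
# model dependencies between neighbouring characters, i.e. the context
# that the plain CNN-plus-softmax stage lacks.
net = ReshapeLayer(net, (-1, MAX_LEN, 128))
net = LSTMLayer(net, num_units=256)

# Per-time-step softmax over the character alphabet: fold time into the
# batch axis, classify each step, then restore the sequence shape.
net = ReshapeLayer(net, (-1, 256))
net = DenseLayer(net, num_units=N_CLASSES, nonlinearity=softmax)
probs = ReshapeLayer(net, (-1, MAX_LEN, N_CLASSES))

At test time the highest-probability class at each step is read off as a character (or a blank,
no-character label); it is at this stage that contextual information lets the LSTM correct the
"g" versus "9" style confusions mentioned above.
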
Keywords
CNN: Convolutional Neural Networks
LSTM: Long Short Term Memory
SVM: Support Vector Machines
HOG: Histogram of Oriented Gradients
ICDAR: International Conference on Document Analysis and Recognition
RNN: Recurrent Neural Networks
BPTT: Back Propagation Through Time
FNN: Feedforward Neural Networks
BLSTML: Bidirectional Long Short Term Memory layer
OCR: Optical Character Recognition
SIFT: Scale Invariant Feature Transform
JOINT‐CNN: A model that joins the character sequence encoding model with the n‐gram model
JOINT‐LSTM: A model that joins the output of our proposed model with the n‐gram model


Text in English, abstracts in English.


Informatics-IFM


Dissertation, Academic

610