
Sequence to Sequence Learning for Unconstrained Scene Text Recognition / Ahmed Mamdouh Abd el-kariem 

By: Ahmed Mamdouh Abd el-kariem
Material type: Text
Language: English
Summary language: English
Publication details: 2015
Description: 49 p. : ill. ; 21 cm
DDC classification: 610
Holdings:
Item type: Thesis
Current library: Main library
Call number: 610/ A.M.S 2015
Status: Not for loan

Supervisor: Mohamed A. El-Helw

Thesis (M.A.)—Nile University, Egypt, 2015.

"Includes bibliographical references"

Contents:
Acknowledgements ............................................................................................................................................................ iv
Abstract ........................................................................................................................................................................... vi
Keywords......................................................................................................................................................................... vi
List of Figures .................................................................................................................................................................. vii
Introduction .............................................................................................................................................. 1
1.1 Motivation ........................................................................................................................................................ 1
1.2 Objectives ......................................................................................................................................................... 1
Background ............................................................................................................................................... 3
2.1 Convolutional Neural Networks (CNNs) ............................................................................................................. 4
2.2 Convolutional Neural Networks ........................................................................................................................ 4
2.2.1 Convolutional Layer .............................................................................................................................. 5
2.2.2 Pooling Layer ........................................................................................................................................ 5
2.3 Logistic Regression ‐ Softmax ............................................................................................................................ 6
2.3.1 Binary Classification .............................................................................................................................. 7
2.3.2 Multiclass Classification ........................................................................................................................ 7
2.4 Long Short‐Term Memory ‐ LSTM.................................................................................................................... 10
2.4.1 Recurrent Neural Networks ................................................................................................................ 10
2.4.2 Constant Error Carrousels ................................................................................................................... 12
State‐of‐the‐Art Approaches .................................................................................................................... 14
3.1 Lexicon‐Based CNN Model .............................................................................................................................. 14
3.2 Character Sequence Encoding ......................................................................................................................... 15
3.3 N‐gram Encoding ............................................................................................................................................ 15
3.4 The Joint Model .............................................................................................................................................. 16
3.5 Sequence‐to‐Sequence Learning with Neural Networks ................................................................................. 17
Sequence‐to‐Sequence Learning for Unconstrained Scene Text Recognition ............................................ 18
4.1 Arbitrary Length Sequence to Sequence Modeling ......................................................................................... 18
4.2 Training ........................................................................................................................................................... 19
4.3 Lasagne ........................................................................................................................................................... 20
Experiments ............................................................................................................................................ 21
5.1 Extending CNN Model with LSTM for error correction .................................................................................... 21
5.2 LSTM Architecture Experiments ...................................................................................................................... 23
5.2.1 The models’ architecture .................................................................................................................... 23
5.3 Extending CNN Model with optimized LSTM Architecture for error correction .............................................. 28
5.4 Generalisation Experiment .............................................................................................................................. 31
Traffic Sense ............................................................................................................................................ 33
6.1 Extracting suitable corner points and their trajectories .................................................................................. 33
6.2 Initial Clustering .............................................................................................................................................. 34
6.3 Adaptive Background Construction ................................................................................................................. 34
6.4 Extracting bounding boxes around vehicles .................................................................................................... 35
6.5 Experimental results and analysis.................................................................................................................... 35
Conclusion .............................................................................................................................................. 37
7.1 Achieved results ............................................................................................................................................. 37
7.2 Future development ....................................................................................................................................... 37
Appendix A ..................................................................................................................................................................... 38
Appendix B ..................................................................................................................................................................... 39
Datasets ......................................................................................................................................................................... 39
References ...................................................................................................................................................................... 41

In this work we present a state‐of‐the‐art approach for unconstrained natural scene text recognition.
We propose a cascade approach that incorporates a convolutional neural network (CNN) architecture
followed by a long short‐term memory (LSTM) model. The CNN learns visual features for the characters
and uses them with a softmax layer to detect a sequence of characters. While the CNN gives very
good recognition results, it does not model the relations between characters, and hence gives rise to
false positive and false negative cases (confusing characters due to visual similarities, like "g" and
"9", or confusing background patches with characters, thereby removing existing characters or adding
non‐existing ones). To alleviate these problems we leverage recent developments in LSTM architectures
to encode contextual information. We show that the LSTM can dramatically reduce such errors and achieve
state‐of‐the‐art accuracy in the task of unconstrained natural scene text recognition. Moreover, we
manually remove all occurrences of the words that exist in the test set from our training set, to test
whether our approach generalizes to unseen data. We use the ICDAR 2013 test set for evaluation and
compare the results with the state‐of‐the‐art approaches [11, 18]. We also present an algorithm for
traffic monitoring.
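
The cascade the abstract describes (a CNN with a softmax layer producing per‐position character scores, followed by an LSTM that re‐scores the sequence using context) can be summarized in a short sketch. The following is a minimal illustrative sketch in PyTorch, not the author's implementation (the thesis was built on Lasagne, per section 4.3); the layer sizes, the 37‐class alphabet, the 32×100 input size, and the fixed maximum word length are all assumptions made for illustration.

import torch
import torch.nn as nn

NUM_CLASSES = 37   # assumed alphabet: a-z, 0-9, plus a "no character" class
MAX_LEN = 23       # assumed maximum word length

class CNNCharacterDetector(nn.Module):
    """CNN + softmax: visual features -> per-position character scores."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 32x100 input -> 8x25 feature map after two 2x2 poolings
        self.classifier = nn.Linear(128 * 8 * 25, MAX_LEN * NUM_CLASSES)

    def forward(self, img):                       # img: (B, 1, 32, 100)
        f = self.features(img).flatten(1)
        logits = self.classifier(f)
        return logits.view(-1, MAX_LEN, NUM_CLASSES)

class LSTMCorrector(nn.Module):
    """LSTM re-scores the CNN's character sequence using context,
    modeling the inter-character relations the CNN alone ignores."""
    def __init__(self, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(NUM_CLASSES, hidden, batch_first=True)
        self.out = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, char_probs):                # (B, MAX_LEN, NUM_CLASSES)
        h, _ = self.lstm(char_probs)
        return self.out(h)                        # corrected logits

cnn, lstm = CNNCharacterDetector(), LSTMCorrector()
img = torch.randn(2, 1, 32, 100)                  # dummy batch of word images
char_logits = cnn(img)
corrected = lstm(char_logits.softmax(dim=-1))     # context-based correction
print(corrected.argmax(dim=-1).shape)             # (2, 23) predicted indices

In this arrangement the LSTM stage is what corrects visually driven confusions such as "g" versus "9": it sees the whole character sequence and can exploit which characters plausibly co‐occur, which the per‐position CNN softmax cannot.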
Keywords
CNN: Convolutional Neural Networks
LSTM: Long Short Term Memory
SVM: Support Vector Machines
HOG: Histogram of Oriented Gradients
ICDAR: International Conference on Document Analysis and Recognition
RNN: Recurrent Neural Networks
BPTT: Back Propagation Through Time
FNN: Feedforward Neural Networks
BLSTML: Bidirectional Long Short Term Memory layer
OCR: Optical Character Recognition
SIFT: Scale Invariant Feature Transform
JOINT‐CNN: A model that joins the character sequence encoding model with the n‐gram model
JOINT‐LSTM: A model that joins the output of our proposed model with the n‐gram model

Text in English, abstract in English.
