
Sequence to Sequence Learning for Unconstrained Scene Text Recognition / Ahmed Mamdouh Abd el-kariem 

By: Ahmed Mamdouh Abd el-kariem
Material type: Text
Language: English
Summary language: English
Publication details: 2015
Description: 49 p. : ill. ; 21 cm
DDC classification: 610
Holdings:
Item type: Thesis
Current library: Main library
Call number: 610/ A.M.S 2015
Status: Not for loan

Supervisor: Mohamed A. El-Helw

Thesis (M.A.)—Nile University, Egypt, 2015.

"Includes bibliographical references"

Contents:
Acknowledgements ............................................................................................................................................................ iv
Abstract ........................................................................................................................................................................... vi
Keywords......................................................................................................................................................................... vi
List of Figures .................................................................................................................................................................. vii
Introduction .............................................................................................................................................. 1
1.1 Motivation ........................................................................................................................................................ 1
1.2 Objectives ......................................................................................................................................................... 1
Background ............................................................................................................................................... 3
2.1 Convolutional Neural Networks (CNNs) ............................................................................................................. 4
2.2 Convolutional Neural Networks ........................................................................................................................ 4
2.2.1 Convolutional Layer .............................................................................................................................. 5
2.2.2 Pooling Layer ........................................................................................................................................ 5
2.3 Logistic Regression ‐ Softmax ............................................................................................................................ 6
2.3.1 Binary Classification .............................................................................................................................. 7
2.3.2 Multiclass Classification ........................................................................................................................ 7
2.4 Long Short‐Term Memory ‐ LSTM.................................................................................................................... 10
2.4.1 Recurrent Neural Networks ................................................................................................................ 10
2.4.2 Constant Error Carrousels ................................................................................................................... 12
State‐of‐the‐Art Approaches .................................................................................................................... 14
3.1 Lexicon‐Based CNN Model .............................................................................................................................. 14
3.2 Character Sequence Encoding ......................................................................................................................... 15
3.3 N‐gram Encoding ............................................................................................................................................ 15
3.4 The Joint Model .............................................................................................................................................. 16
3.5 Sequence‐to‐Sequence Learning with Neural Networks ................................................................................. 17
Sequence‐to‐Sequence Learning for Unconstrained Scene Text Recognition ............................................ 18
4.1 Arbitrary Length Sequence to Sequence Modeling ......................................................................................... 18
4.2 Training ........................................................................................................................................................... 19
4.3 Lasagne ........................................................................................................................................................... 20
Experiments ............................................................................................................................................ 21
5.1 Extending CNN Model with LSTM for error correction .................................................................................... 21
5.2 LSTM Architecture Experiments ...................................................................................................................... 23
5.2.1 The models’ architecture .................................................................................................................... 23
5.3 Extending CNN Model with optimized LSTM Architecture for error correction .............................................. 28
5.4 Generalisation Experiment .............................................................................................................................. 31
Traffic Sense ............................................................................................................................................ 33
6.1 Extracting suitable corner points and their trajectories .................................................................................. 33
6.2 Initial Clustering .............................................................................................................................................. 34
6.3 Adaptive Background Construction ................................................................................................................. 34
6.4 Extracting bounding boxes around vehicles .................................................................................................... 35
6.5 Experimental results and analysis.................................................................................................................... 35
Conclusion .............................................................................................................................................. 37
7.1 Achieved results ............................................................................................................................................. 37
7.2 Future development ....................................................................................................................................... 37
Appendix A ..................................................................................................................................................................... 38
Appendix B ..................................................................................................................................................................... 39
Datasets ......................................................................................................................................................................... 39
References ...................................................................................................................................................................... 41

In this work we present a state‐of‐the‐art approach for unconstrained natural scene text recognition.
We propose a cascade approach that incorporates a convolutional neural network (CNN) architecture
followed by a long short‐term memory (LSTM) model. The CNN learns visual features for the characters
and uses them with a softmax layer to detect a sequence of characters. While the CNN gives very
good recognition results, it does not model the relations between characters, and hence gives rise to
false positive and false negative cases (confusing characters due to visual similarities, like "g" and
"9", or confusing background patches with characters, thereby removing existing characters or adding
non‐existing ones). To alleviate these problems we leverage recent developments in LSTM architectures
to encode contextual information. We show that the LSTM can dramatically reduce such errors and achieve
state‐of‐the‐art accuracy in the task of unconstrained natural scene text recognition. Moreover, we
manually remove all occurrences of the words that exist in the test set from our training set, to test
whether our approach generalizes to unseen data. We use the ICDAR 2013 test set for evaluation and
compare the results with the state‐of‐the‐art approaches [11, 18]. We also present an algorithm for
traffic monitoring.
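
The cascade the abstract describes (a CNN with a softmax layer producing per‐position character scores, followed by an LSTM that re‐scores the sequence using context) can be summarized in a short sketch. The following is a minimal illustrative sketch in PyTorch, not the author's implementation (the thesis was built on Lasagne, per section 4.3); the layer sizes, the 37‐class alphabet, the 32×100 input size, and the fixed maximum word length are all assumptions made for illustration.

import torch
import torch.nn as nn

NUM_CLASSES = 37   # assumed alphabet: a-z, 0-9, plus a "no character" class
MAX_LEN = 23       # assumed maximum word length

class CNNCharacterDetector(nn.Module):
    """CNN + softmax: visual features -> per-position character scores."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 32x100 input -> 8x25 feature map after two 2x2 poolings
        self.classifier = nn.Linear(128 * 8 * 25, MAX_LEN * NUM_CLASSES)

    def forward(self, img):                       # img: (B, 1, 32, 100)
        f = self.features(img).flatten(1)
        logits = self.classifier(f)
        return logits.view(-1, MAX_LEN, NUM_CLASSES)

class LSTMCorrector(nn.Module):
    """LSTM re-scores the CNN's character sequence using context,
    modeling the inter-character relations the CNN alone ignores."""
    def __init__(self, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(NUM_CLASSES, hidden, batch_first=True)
        self.out = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, char_probs):                # (B, MAX_LEN, NUM_CLASSES)
        h, _ = self.lstm(char_probs)
        return self.out(h)                        # corrected logits

cnn, lstm = CNNCharacterDetector(), LSTMCorrector()
img = torch.randn(2, 1, 32, 100)                  # dummy batch of word images
char_logits = cnn(img)
corrected = lstm(char_logits.softmax(dim=-1))     # context-based correction
print(corrected.argmax(dim=-1).shape)             # (2, 23) predicted indices

In this arrangement the LSTM stage is what corrects visually driven confusions such as "g" versus "9": it sees the whole character sequence and can exploit which characters plausibly co‐occur, which the per‐position CNN softmax cannot.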
Keywords
CNN: Convolutional Neural Networks
LSTM: Long Short Term Memory
SVM: Support Vector Machines
HOG: Histogram of Oriented Gradients
ICDAR: International Conference on Document Analysis and Recognition
RNN: Recurrent Neural Networks
BPTT: Back Propagation Through Time
FNN: Feedforward Neural Networks
BLSTML: Bidirectional Long Short Term Memory layer
OCR: Optical Character Recognition
SIFT: Scale Invariant Feature Transform
JOINT‐CNN: A model that joins the character sequence encoding model with the n‐gram model
JOINT‐LSTM: A model that joins the output of our proposed model with the n‐gram model

Text in English, abstract in English.
