000 10257nam a22002537a 4500
008 210830s2015 ||||f mb|| 00| 0 eng d
040 _aEG-CaNU
_cEG-CaNU
041 0 _aeng
_beng
082 _a610
100 0 _aAhmed Mamdouh Abd el-kariem 
_9559
245 1 _aSequence to Sequence Learning for Unconstrained Scene Text Recognition /
_cAhmed Mamdouh Abd el-kariem 
260 _c2015
300 _a49 p.
_bill.
_c21 cm.
500 _aSupervisor: Mohamed A. El-Helw
502 _aThesis (M.A.)--Nile University, Egypt, 2015.
504 _a"Includes bibliographical references"
505 0 _aAbstract -- Keywords -- List of Figures -- 1. Introduction: 1.1 Motivation; 1.2 Objectives -- 2. Background: 2.1 Convolutional Neural Networks (CNNs); 2.2 Convolutional Neural Networks: 2.2.1 Convolutional Layer, 2.2.2 Pooling Layer; 2.3 Logistic Regression - Softmax: 2.3.1 Binary Classification, 2.3.2 Multiclass Classification; 2.4 Long Short-Term Memory (LSTM): 2.4.1 Recurrent Neural Networks, 2.4.2 Constant Error Carrousels -- 3. State-of-the-Art Approaches: 3.1 Lexicon-Based CNN Model; 3.2 Character Sequence Encoding; 3.3 N-gram Encoding; 3.4 The Joint Model; 3.5 Sequence-to-Sequence Learning with Neural Networks -- 4. Sequence-to-Sequence Learning for Unconstrained Scene Text Recognition: 4.1 Arbitrary Length Sequence to Sequence Modeling; 4.2 Training; 4.3 Lasagne -- 5. Experiments: 5.1 Extending CNN Model with LSTM for Error Correction; 5.2 LSTM Architecture Experiments: 5.2.1 The Models' Architecture; 5.3 Extending CNN Model with Optimized LSTM Architecture for Error Correction; 5.4 Generalisation Experiment -- 6. Traffic Sense: 6.1 Extracting Suitable Corner Points and Their Trajectories; 6.2 Initial Clustering; 6.3 Adaptive Background Construction; 6.4 Extracting Bounding Boxes around Vehicles; 6.5 Experimental Results and Analysis -- 7. Conclusion: 7.1 Achieved Results; 7.2 Future Development -- Appendix A -- Appendix B: Datasets -- References.
520 3 _aIn this work we present a state-of-the-art approach for unconstrained natural scene text recognition. We propose a cascade approach that incorporates a convolutional neural network (CNN) architecture followed by a long short-term memory (LSTM) model. The CNN learns visual features for the characters and uses them with a softmax layer to detect sequences of characters. While the CNN gives very good recognition results, it does not model the relations between characters, and hence gives rise to false positive and false negative cases (confusing characters with similar visual appearance, such as "g" and "9", or confusing background patches with characters, thereby removing existing characters or adding non-existing ones). To alleviate these problems we leverage recent developments in LSTM architectures to encode contextual information. We show that the LSTM can dramatically reduce such errors and achieve state-of-the-art accuracy in the task of unconstrained natural scene text recognition. Moreover, to test whether our approach generalizes to unseen data, we manually remove from our training set all occurrences of the words that appear in the test set. We use the ICDAR 13 test set for evaluation and compare the results with the state-of-the-art approaches [11, 18]. We also present an algorithm for traffic monitoring. Keywords: CNN: Convolutional Neural Networks; LSTM: Long Short-Term Memory; SVM: Support Vector Machines; HOG: Histogram of Oriented Gradients; ICDAR: International Conference on Document Analysis and Recognition; RNN: Recurrent Neural Networks; BPTT: Backpropagation Through Time; FNN: Feedforward Neural Networks; BLSTML: Bidirectional Long Short-Term Memory Layer; OCR: Optical Character Recognition; SIFT: Scale-Invariant Feature Transform; JOINT-CNN: a model that joins the character sequence encoding model with the n-gram model; JOINT-LSTM: a model that joins the output of our proposed model with the n-gram model.
546 _aText in English; abstract in English.
650 4 _aInformatics-IFM
_9266
655 7 _2NULIB
_aDissertation, Academic
_9187
690 _aInformatics-IFM
_9266
942 _2ddc
_cTH
999 _c9065
_d9065