000 10257nam a22002537a 4500
008 210830s2015 ||||f mb|| 00| 0 eng d
040 _aEG-CaNU
_cEG-CaNU
041 0 _aeng
_beng
082 _a610
100 0 _aAhmed Mamdouh Abd el-kariem 
_9559
245 1 _aSequence to Sequence Learning for Unconstrained Scene Text Recognition /
_cAhmed Mamdouh Abd el-kariem 
260 _c2015
300 _a49 p.
_bill.
_c21 cm.
500 _aSupervisor: Mohamed A. El-Helw
502 _aThesis (M.A.)--Nile University, Egypt, 2015.
504 _a"Includes bibliographical references"
505 0 _aAbstract -- Keywords -- List of Figures -- 1. Introduction: 1.1 Motivation; 1.2 Objectives -- 2. Background: 2.1 Convolutional Neural Networks (CNNs); 2.2 Convolutional Neural Networks: 2.2.1 Convolutional Layer, 2.2.2 Pooling Layer; 2.3 Logistic Regression - Softmax: 2.3.1 Binary Classification, 2.3.2 Multiclass Classification; 2.4 Long Short-Term Memory (LSTM): 2.4.1 Recurrent Neural Networks, 2.4.2 Constant Error Carrousels -- 3. State-of-the-Art Approaches: 3.1 Lexicon-Based CNN Model; 3.2 Character Sequence Encoding; 3.3 N-gram Encoding; 3.4 The Joint Model; 3.5 Sequence-to-Sequence Learning with Neural Networks -- 4. Sequence-to-Sequence Learning for Unconstrained Scene Text Recognition: 4.1 Arbitrary Length Sequence to Sequence Modeling; 4.2 Training; 4.3 Lasagne -- 5. Experiments: 5.1 Extending CNN Model with LSTM for Error Correction; 5.2 LSTM Architecture Experiments: 5.2.1 The Models' Architecture; 5.3 Extending CNN Model with Optimized LSTM Architecture for Error Correction; 5.4 Generalisation Experiment -- 6. Traffic Sense: 6.1 Extracting Suitable Corner Points and Their Trajectories; 6.2 Initial Clustering; 6.3 Adaptive Background Construction; 6.4 Extracting Bounding Boxes around Vehicles; 6.5 Experimental Results and Analysis -- 7. Conclusion: 7.1 Achieved Results; 7.2 Future Development -- Appendix A -- Appendix B: Datasets -- References.
520 3 _aIn this work we present a state-of-the-art approach for unconstrained natural scene text recognition. We propose a cascade approach that incorporates a convolutional neural network (CNN) architecture followed by a long short-term memory (LSTM) model. The CNN learns visual features for the characters and uses them with a softmax layer to detect sequences of characters. While the CNN gives very good recognition results, it does not model the relations between characters, and hence gives rise to false positive and false negative cases (confusing characters with similar visual appearance, such as "g" and "9", or confusing background patches with characters, thereby removing existing characters or adding non-existing ones). To alleviate these problems we leverage recent developments in LSTM architectures to encode contextual information. We show that the LSTM can dramatically reduce such errors and achieve state-of-the-art accuracy in the task of unconstrained natural scene text recognition. Moreover, to test whether our approach generalizes to unseen data, we manually remove from our training set all occurrences of the words that appear in the test set. We use the ICDAR 13 test set for evaluation and compare the results with the state-of-the-art approaches [11, 18]. We also present an algorithm for traffic monitoring. Keywords: CNN: Convolutional Neural Networks; LSTM: Long Short-Term Memory; SVM: Support Vector Machines; HOG: Histogram of Oriented Gradients; ICDAR: International Conference on Document Analysis and Recognition; RNN: Recurrent Neural Networks; BPTT: Backpropagation Through Time; FNN: Feedforward Neural Networks; BLSTML: Bidirectional Long Short-Term Memory Layer; OCR: Optical Character Recognition; SIFT: Scale-Invariant Feature Transform; JOINT-CNN: a model that joins the character sequence encoding model with the n-gram model; JOINT-LSTM: a model that joins the output of our proposed model with the n-gram model.
546 _aText in English; abstract in English.
650 4 _aInformatics-IFM
_9266
655 7 _2NULIB
_aDissertation, Academic
_9187
690 _aInformatics-IFM
_9266
942 _2ddc
_cTH
999 _c9065
_d9065