Sequence to Sequence Learning for Unconstrained Scene Text Recognition / (Record no. 9065)
| 000 - LEADER | |
|---|---|
| fixed length control field | 10257nam a22002537a 4500 |
| 008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION | |
| fixed length control field | 210830s2015 ||||f mb|| 00| 0 eng d |
| 040 ## - CATALOGING SOURCE | |
| Original cataloging agency | EG-CaNU |
| Transcribing agency | EG-CaNU |
| 041 0# - Language Code | |
| Language code of text | eng |
| Language code of abstract | eng |
| 082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER | |
| Classification number | 610 |
| 100 0# - MAIN ENTRY--PERSONAL NAME | |
| Personal name | Ahmed Mamdouh Abd el-kariem |
| 245 1# - TITLE STATEMENT | |
| Title | Sequence to Sequence Learning for Unconstrained Scene Text Recognition / |
| Statement of responsibility, etc. | Ahmed Mamdouh Abd el-kariem |
| 260 ## - PUBLICATION, DISTRIBUTION, ETC. | |
| Date of publication, distribution, etc. | 2015 |
| 300 ## - PHYSICAL DESCRIPTION | |
| Extent | 49 p. |
| Other physical details | ill. |
| Dimensions | 21 cm. |
| 500 ## - GENERAL NOTE | |
| General note | Supervisor: Mohamed A. El-Helw |
| 502 ## - Dissertation Note | |
| Dissertation type | Thesis (M.A.) -- Nile University, Egypt, 2015. |
| 504 ## - Bibliography | |
| Bibliography | "Includes bibliographical references" |
| 505 0# - Contents | |
| Formatted contents note | Contents:<br/>Abstract<br/>Keywords<br/>List of Figures<br/>Introduction<br/>1.1 Motivation<br/>1.2 Objectives<br/>Background<br/>2.1 Convolutional Neural Networks (CNNs)<br/>2.2 Convolutional Neural Networks<br/>2.2.1 Convolutional Layer<br/>2.2.2 Pooling Layer<br/>2.3 Logistic Regression - Softmax<br/>2.3.1 Binary Classification<br/>2.3.2 Multiclass Classification<br/>2.4 Long Short-Term Memory - LSTM<br/>2.4.1 Recurrent Neural Networks<br/>2.4.2 Constant Error Carrousels<br/>State-of-the-Art Approaches<br/>3.1 Lexicon-Based CNN Model<br/>3.2 Character Sequence Encoding<br/>3.3 N-gram Encoding<br/>3.4 The Joint Model<br/>3.5 Sequence-to-Sequence Learning with Neural Networks<br/>Sequence-to-Sequence Learning for Unconstrained Scene Text Recognition<br/>4.1 Arbitrary Length Sequence to Sequence Modeling<br/>4.2 Training<br/>4.3 Lasagne<br/>Experiments<br/>5.1 Extending CNN Model with LSTM for error correction<br/>5.2 LSTM Architecture Experiments<br/>5.2.1 The models' architecture<br/>5.3 Extending CNN Model with optimized LSTM Architecture for error correction<br/>5.4 Generalisation Experiment<br/>Traffic sense<br/>6.1 Extracting suitable corner points and their trajectories<br/>6.2 Initial Clustering<br/>6.3 Adaptive Background Construction<br/>6.4 Extracting bounding boxes around vehicles<br/>6.5 Experimental results and analysis<br/>Conclusion<br/>7.1 Achieved results<br/>7.2 Future development<br/>Appendix A<br/>Appendix B<br/>Datasets<br/>References |
| 520 3# - Abstract | |
| Abstract | In this work we present a state-of-the-art approach for unconstrained natural scene text recognition. We propose a cascade approach that incorporates a convolutional neural network (CNN) architecture followed by a long short-term memory (LSTM) model. The CNN learns visual features for the characters and uses them with a softmax layer to detect sequences of characters. While the CNN gives very good recognition results, it does not model the relations between characters, and hence gives rise to false positives and false negatives: it confuses characters due to visual similarities, such as “g” and “9”, or confuses background patches with characters, either removing existing characters or adding non-existing ones. To alleviate these problems we leverage recent developments in LSTM architectures to encode contextual information. We show that the LSTM can dramatically reduce such errors and achieve state-of-the-art accuracy in the task of unconstrained natural scene text recognition. Moreover, we manually remove all occurrences of the words that exist in the test set from our training set to test whether our approach will generalize to unseen data. We use the ICDAR 13 test set for evaluation and compare the results with the state-of-the-art approaches [11, 18]. We also present an algorithm for traffic monitoring.<br/>Keywords<br/>CNN: Convolutional Neural Networks<br/>LSTM: Long Short-Term Memory<br/>SVM: Support Vector Machines<br/>HOG: Histogram of Oriented Gradients<br/>ICDAR: International Conference on Document Analysis and Recognition<br/>RNN: Recurrent Neural Networks<br/>BPTT: Backpropagation Through Time<br/>FNN: Feedforward Neural Networks<br/>BLSTML: Bidirectional Long Short-Term Memory Layer<br/>OCR: Optical Character Recognition<br/>SIFT: Scale-Invariant Feature Transform<br/>JOINT-CNN: A model that joins the character sequence encoding model with the n-gram model<br/>JOINT-LSTM: A model that joins the output of our proposed model with the n-gram model |
| 546 ## - Language Note | |
| Language Note | Text in English, abstracts in English. |
| 650 #4 - Subject | |
| Subject | Informatics-IFM |
| 655 #7 - Index Term-Genre/Form | |
| Source of term | NULIB |
| Focus term | Dissertation, Academic |
| 690 ## - Subject | |
| School | Informatics-IFM |
| 942 ## - ADDED ENTRY ELEMENTS (KOHA) | |
| Source of classification or shelving scheme | Dewey Decimal Classification |
| Koha item type | Thesis |
| Withdrawn status | Lost status | Source of classification or shelving scheme | Damaged status | Not for loan | Home library | Current library | Date acquired | Total Checkouts | Full call number | Date last seen | Price effective from | Koha item type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | Dewey Decimal Classification | | | Main library | Main library | 08/30/2021 | | 610/ A.M.S 2015 | 08/30/2021 | 08/30/2021 | Thesis |