Enhanced Transformer-based Deep Semantic Segmentation Architecture for Lidar 3D Point Clouds / (Record no. 9580)

MARC details
000 - LEADER
fixed length control field 05835nam a22002537a 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 220426b2022 |||ad||| mb|| 00| 0 eng d
040 ## - CATALOGING SOURCE
Original cataloging agency EG-CaNU
Transcribing agency EG-CaNU
041 0# - LANGUAGE CODE
Language code of text eng
Language code of abstract eng
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 610
100 0# - MAIN ENTRY--PERSONAL NAME
Personal name Mohammed Moustafa Mohamed Hassoubah
245 1# - TITLE STATEMENT
Title Enhanced Transformer-based Deep Semantic Segmentation Architecture for Lidar 3D Point Clouds /
Statement of responsibility, etc. Mohammed Moustafa Mohamed Hassoubah
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc. 2022
300 ## - PHYSICAL DESCRIPTION
Extent 94 p.
Other physical details ill.
Dimensions 21 cm.
500 ## - GENERAL NOTE
General note Supervisor: Mohamed Elhelw
502 ## - DISSERTATION NOTE
Dissertation note Thesis (M.A.)--Nile University, Egypt, 2022.
504 ## - BIBLIOGRAPHY
Bibliography Includes bibliographical references.
505 0# - FORMATTED CONTENTS NOTE
Formatted contents note Contents:
Abstract
Dedication
List of Tables
List of Figures
Chapters:
1. Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Thesis Outline and Summary of Contributions
2. Literature Survey
2.1 KITTI Velodyne Dataset
2.2 Point cloud segmentation
2.3 Transformer Mechanism
2.4 Transformer applications for 3D point cloud
2.5 Transformer and Attention mechanisms applications for 2D images
2.6 Self-Supervised Learning
2.7 Uncertainty estimation in Deep neural networks applications
3. Methodology
3.1 Spherical projection
3.2 Semantic Segmentation Network
3.3 Self-Supervision pre-training
3.3.1 Data Augmentation and Corruption
3.3.2 Pre-Training Tasks
3.3.3 Noise-Contrastive estimation
3.3.4 Learning Process Smoothness
3.4 Estimating the Model Uncertainty
3.5 Summary
4. Results and Discussions
4.1 Experiments and Results
4.1.1 Datasets
4.1.2 Evaluation metrics
4.1.3 Training configuration
4.1.4 Results
4.1.5 Ablation Studies
4.2 Discussion
4.3 Summary
5. Conclusions and Future work
5.1 Conclusion
5.2 Future directions
Bibliography
520 3# - ABSTRACT
Abstract Abstract: For the task of semantic segmentation of 2D or 3D inputs, the transformer architecture suffers from limited localization ability because it lacks low-level detail. In addition, a transformer must be pre-trained to perform well, and pre-training transformers remains an open area of research. In this work we introduce a novel architecture for semantic segmentation of 3D point clouds generated by Light Detection and Ranging (LiDAR) sensors. A transformer is integrated into the U-Net 2D segmentation network [1], and the new architecture is trained to conduct semantic segmentation of 2D spherical images generated by projecting the 3D LiDAR point clouds. This integration captures local and region-level dependencies through the CNN backbone's processing of the input, followed by transformer processing that captures long-range dependencies. The obtained results demonstrate that the new architecture provides enhanced segmentation results over existing state-of-the-art approaches. Furthermore, to determine the best pre-training settings, multiple ablations were conducted on the network architecture, the self-training loss function, and the self-training procedure. We show that the integrated architecture, pre-trained on an augmented version of the training dataset to reconstruct the original data from corrupted input, with its batch normalization layers randomly initialized during fine-tuning, outperforms SalsaNext [2] (to our knowledge the best projection-based semantic segmentation network); results are reported on the SemanticKITTI [3] validation dataset with a 2D input dimension of 1024 × 64. Our evaluation also finds that self-supervised pre-training substantially reduces the epistemic uncertainty of the segmentation model's output, i.e., the uncertainty over the model weights (for example, presenting the model with an input unlike those seen in the training dataset would increase such uncertainty).
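The abstract's pipeline hinges on projecting each LiDAR scan into a 2D spherical (range) image before segmentation, as is standard in projection-based networks such as SalsaNext. The following is a minimal illustrative sketch of that projection, not the thesis's own code: the function name, the NumPy implementation, and the Velodyne-style vertical field of view (+3° up, -25° down) are assumptions, while the 1024 × 64 resolution is taken from the abstract.

```python
import numpy as np

def spherical_projection(points, H=64, W=1024,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image.

    The field-of-view defaults approximate a Velodyne HDL-64E
    (the SemanticKITTI sensor); other sensors need other values.
    """
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)            # range of each point
    yaw = np.arctan2(y, x)                        # horizontal angle in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    # Normalize both angles to [0, 1], then scale to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * W             # column index
    v = (1.0 - (pitch - fov_down) / fov) * H      # row index (top row = fov_up)

    u = np.clip(np.floor(u), 0, W - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int64)

    image = np.zeros((H, W), dtype=np.float32)
    # Later points overwrite earlier ones; production pipelines
    # usually sort by range so the closest return is kept.
    image[v, u] = r
    return image
```

A per-pixel label image can be built the same way by scattering per-point labels with the same (v, u) indices, which is also how per-point predictions are recovered from the network's 2D output.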
546 ## - LANGUAGE NOTE
Language Note Text in English, abstracts in English.
650 #4 - SUBJECT
Subject Informatics-IFM
655 #7 - INDEX TERM--GENRE/FORM
Source of term NULIB
Focus term Dissertation, Academic
690 ## - SUBJECT
School Informatics-IFM
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Thesis
Holdings
Source of classification or shelving scheme Dewey Decimal Classification
Home library Main library
Current library Main library
Date acquired 04/26/2022
Full call number 610/ M.M.E/ 2022
Date last seen 04/26/2022
Price effective from 04/26/2022
Koha item type Thesis