MARC details
| 000 - LEADER |
| fixed length control field |
05835nam a22002537a 4500 |
| 008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
| fixed length control field |
220426b2022 |||ad||| mb|| 00| 0 eng d |
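The leader (000) and 008 values above are positional fixed fields defined by MARC 21, so each character carries meaning by its offset. As a minimal sketch (not part of the record), the following Python decodes a few well-defined leader positions of this record; a real application would use a library such as pymarc instead.

LEADER = "05835nam a22002537a 4500"  # copied from field 000 above

record_length = int(LEADER[0:5])    # 05835: total record length in bytes
record_status = LEADER[5]           # 'n' = new record
record_type   = LEADER[6]           # 'a' = language material
bib_level     = LEADER[7]           # 'm' = monograph/item
base_address  = int(LEADER[12:17])  # 00253: offset where the data fields begin
encoding_lvl  = LEADER[17]          # '7' = minimal-level cataloging

print(record_length, record_status, record_type, bib_level, base_address, encoding_lvl)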
| 040 ## - CATALOGING SOURCE |
| Original cataloging agency |
EG-CaNU |
| Transcribing agency |
EG-CaNU |
| 041 0# - LANGUAGE CODE |
| Language code of text |
eng |
| Language code of abstract |
eng |
| 082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER |
| Classification number |
610 |
| 100 0# - MAIN ENTRY--PERSONAL NAME |
| Personal name |
Mohammed Moustafa Mohamed Hassoubah |
| 245 1# - TITLE STATEMENT |
| Title |
Enhanced Transformer-based Deep Semantic Segmentation Architecture for Lidar 3D Point Clouds / |
| Statement of responsibility, etc. |
Mohammed Moustafa Mohamed Hassoubah |
| 260 ## - PUBLICATION, DISTRIBUTION, ETC. |
| Date of publication, distribution, etc. |
2022 |
| 300 ## - PHYSICAL DESCRIPTION |
| Extent |
94 p. |
| Other physical details |
ill. |
| Dimensions |
21 cm. |
| 500 ## - GENERAL NOTE |
| General note |
Supervisor: Mohamed Elhelw |
| 502 ## - DISSERTATION NOTE |
| Dissertation type |
Thesis (M.A.)--Nile University, Egypt, 2022.
| 504 ## - BIBLIOGRAPHY |
| Bibliography |
"Includes bibliographical references" |
| 505 0# - CONTENTS |
| Formatted contents note |
Abstract -- Dedication -- List of Tables -- List of Figures -- 1. Introduction: 1.1 Motivation; 1.2 Problem Statement; 1.3 Thesis Outline and Summary of Contributions -- 2. Literature Survey: 2.1 KITTI Velodyne Dataset; 2.2 Point cloud segmentation; 2.3 Transformer Mechanism; 2.4 Transformer applications for 3D point cloud; 2.5 Transformer and Attention mechanisms applications for 2D images; 2.6 Self-Supervised Learning; 2.7 Uncertainty estimation in Deep neural networks applications -- 3. Methodology: 3.1 Spherical projection; 3.2 Semantic Segmentation Network; 3.3 Self-Supervision pre-training (3.3.1 Data Augmentation and Corruption; 3.3.2 Pre-Training Tasks; 3.3.3 Noise-Contrastive estimation; 3.3.4 Learning Process Smoothness); 3.4 Estimating the Model Uncertainty; 3.5 Summary -- 4. Results and Discussions: 4.1 Experiments and Results (4.1.1 Datasets; 4.1.2 Evaluation metrics; 4.1.3 Training configuration; 4.1.4 Results; 4.1.5 Ablation Studies); 4.2 Discussion; 4.3 Summary -- 5. Conclusions and Future work: 5.1 Conclusion; 5.2 Future directions -- Bibliography |
| 520 3# - ABSTRACT |
| Abstract |
For the task of semantic segmentation of 2D or 3D inputs, the transformer architecture suffers from limited localization ability because it lacks low-level details. Moreover, for a transformer to function well it has to be pre-trained first, and pre-training transformers is still an open area of research. In this work, we introduce a novel architecture for semantic segmentation of 3D point clouds generated from Light Detection and Ranging (LiDAR) sensors. A transformer is integrated into the U-Net 2D segmentation network [1], and the new architecture is trained to conduct semantic segmentation of 2D spherical images generated by projecting 3D LiDAR point clouds. Such integration captures local and region-level dependencies through the CNN backbone's processing of the input, followed by transformer processing that captures the long-range dependencies. The obtained results demonstrate that the new architecture provides enhanced segmentation results over existing state-of-the-art approaches. Furthermore, to define the best pre-training settings, multiple ablations were conducted on the network architecture, the self-training loss function, and the self-training procedure. The integrated architecture, pre-trained over the augmented version of the training dataset to reconstruct the original data from the corrupted input, with the batch normalization layers randomly initialized during fine-tuning, is shown to outperform SalsaNext [2] (to our knowledge the best projection-based semantic segmentation network), where results are reported on the SemanticKITTI [3] validation dataset with a 2D input dimension of 1024 × 64. In our evaluation it is found that self-supervision pre-training greatly reduces the epistemic uncertainty of the segmentation model's output, i.e., the uncertainty in the model weights; for example, introducing an input example different from those seen in the training dataset would increase such uncertainty. |
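As an illustration of the projection step the abstract describes, the sketch below implements the standard spherical range-image projection used by projection-based LiDAR segmentation networks such as SalsaNext; it is a sketch under assumptions, not code from the thesis. The 1024 × 64 image size matches the input dimension reported above, while the vertical field-of-view values and the function name spherical_projection are assumptions based on the usual Velodyne HDL-64E (SemanticKITTI) setup.

import numpy as np

def spherical_projection(points, H=64, W=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)             # range of each point

    yaw = np.arctan2(y, x)                         # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))     # elevation angle

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down                        # total vertical field of view

    u = 0.5 * (1.0 - yaw / np.pi) * W              # column from azimuth
    v = (1.0 - (pitch - fov_down) / fov) * H       # row from elevation

    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    # Write farthest points first so closer points overwrite them.
    image = np.full((H, W), -1.0, dtype=np.float32)
    order = np.argsort(r)[::-1]
    image[v[order], u[order]] = r[order]
    return image

The resulting H × W range image (in practice stacked with x, y, z, and intensity channels) is what the CNN-plus-transformer network segments; the per-pixel labels are then mapped back to the original 3D points.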
| 546 ## - LANGUAGE NOTE |
| Language Note |
Text in English, abstract in English. |
| 650 #4 - SUBJECT |
| Subject |
Informatics-IFM |
| 655 #7 - INDEX TERM-GENRE/FORM |
| Source of term |
NULIB |
| Focus term |
Dissertations, Academic
| 690 ## - SUBJECT |
| School |
Informatics-IFM |
| 942 ## - ADDED ENTRY ELEMENTS (KOHA) |
| Source of classification or shelving scheme |
Dewey Decimal Classification |
| Koha item type |
Thesis |