Enhanced Transformer-based Deep Semantic Segmentation Architecture for Lidar 3D Point Clouds / Mohammed Moustafa Mohamed Hassoubah
Material type: Text
Language: English. Summary language: English.
Publication details: 2022
Description: 94 p. : ill. ; 21 cm
DDC classification: 610
| Item type | Current library | Call number | Status | Date due | Barcode |
|---|---|---|---|---|---|
| Thesis | Main library | 610/ M.M.E/ 2022 | Not for loan | | |
Supervisor: Mohamed Elhelw
Thesis (M.A.)—Nile University, Egypt, 2022.
"Includes bibliographical references"
Contents:
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Chapters:
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Thesis Outline and Summary of Contributions . . . . . . . . . 3
2. Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 KITTI Velodyne Dataset . . . . . . . . . . . . . . . . . . . . 5
2.2 Point cloud segmentation . . . . . . . . . . . . . . . . . . . . 6
2.3 Transformer Mechanism . . . . . . . . . . . . . . . . . . . . . 9
2.4 Transformer applications for 3D point cloud . . . . . . . . . . 13
2.5 Transformer and Attention mechanisms applications for 2D images . . . . . . 16
2.6 Self-Supervised Learning . . . . . . . . . . . . . . . . . . . . . 21
2.7 Uncertainty estimation in Deep neural networks applications . 27
3. Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1 Spherical projection . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Semantic Segmentation Network . . . . . . . . . . . . . . . . 32
3.3 Self-Supervision pre-training . . . . . . . . . . . . . . . . . . . 35
3.3.1 Data Augmentation and Corruption . . . . . . . . . . 36
3.3.2 Pre-Training Tasks . . . . . . . . . . . . . . . . . . . . 36
3.3.3 Noise-Contrastive estimation . . . . . . . . . . . . . . 38
3.3.4 Learning Process Smoothness . . . . . . . . . . . . . . 40
3.4 Estimating the Model Uncertainty . . . . . . . . . . . . . . . 41
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4. Results and Discussions . . . . . . . . . . . . . . . . . . . . . . . . 44
4.1 Experiments and Results . . . . . . . . . . . . . . . . . . . . . 44
4.1.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.2 Evaluation metrics . . . . . . . . . . . . . . . . . . . . 45
4.1.3 Training configuration . . . . . . . . . . . . . . . . . . 46
4.1.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.1.5 Ablation Studies . . . . . . . . . . . . . . . . . . . . . 51
4.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5. Conclusions and Future work . . . . . . . . . . . . . . . . . . . . . 67
5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2 Future directions . . . . . . . . . . . . . . . . . . . . . . . . . 68
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Abstract:
For the task of semantic segmentation of 2D or 3D inputs, the transformer
architecture suffers from limited localization ability because it lacks
low-level detail. In addition, a transformer must be pre-trained to perform
well, and pre-training transformers remains an open area of research.
In this work, we introduce a novel architecture for semantic segmentation of
3D point clouds generated by Light Detection and Ranging (LiDAR) sensors.
A transformer is integrated into the U-Net 2D segmentation network [1], and
the combined architecture is trained to conduct semantic segmentation of 2D
spherical images generated by projecting the 3D LiDAR point clouds. This
integration captures local and region-level dependencies through the CNN
backbone's processing of the input, followed by transformer processing that
captures the long-range dependencies.
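To make the projection step concrete, the following is a minimal sketch of mapping a LiDAR point cloud onto a 2D spherical (range) image; the field-of-view bounds and the 1024 × 64 image size are assumptions based on common SemanticKITTI-style settings, not details taken from the thesis itself:

```python
# Hedged sketch: spherical projection of a LiDAR point cloud onto a 2D
# range image. FOV bounds and image size are assumed common defaults.
import numpy as np

def spherical_projection(points, height=64, width=1024,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """points: (N, 3) array of x, y, z -> (height, width) range image."""
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    fov = fov_up - fov_down

    depth = np.linalg.norm(points, axis=1)
    valid = depth > 1e-6                      # drop degenerate points at the origin
    points, depth = points[valid], depth[valid]
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    yaw = np.arctan2(y, x)                    # horizontal angle in [-pi, pi]
    pitch = np.arcsin(np.clip(z / depth, -1.0, 1.0))   # vertical angle

    u = 0.5 * (1.0 - yaw / np.pi) * width     # column: normalize yaw to [0, 1]
    v = (1.0 - (pitch - fov_down) / fov) * height      # row: top row = fov_up
    u = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int64)

    image = np.zeros((height, width), dtype=np.float32)
    image[v, u] = depth                       # last projected point wins per pixel
    return image
```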
The obtained results demonstrate that the new architecture provides enhanced
segmentation results over existing state-of-the-art approaches. Furthermore,
to determine the best pre-training settings, multiple ablations were conducted
on the network architecture, the self-training loss function, and the
self-training procedure. We show that the integrated architecture, pre-trained
on the augmented version of the training dataset to reconstruct the original
data from the corrupted input, with the batch normalization layers randomly
re-initialized when fine-tuning, outperforms SalsaNext [2] (to our knowledge
the best projection-based semantic segmentation network), where results are
reported on the SemanticKITTI [3] validation dataset with 2D input dimension
1024 × 64.
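As a rough sketch of the reported pre-training and fine-tuning recipe, the snippet below pre-trains the network to reconstruct the clean input from a corrupted version, then randomly re-initializes the batch normalization layers before fine-tuning. The corruption function and L2 reconstruction loss are illustrative assumptions; the thesis also ablates a noise-contrastive loss, not shown here:

```python
# Hedged sketch of the pre-training / fine-tuning recipe: denoising
# reconstruction pre-training, then batch-norm re-initialization.
# Corruption and loss choices here are assumptions, not the thesis spec.
import torch

def corrupt(batch, noise_std=0.1, drop_prob=0.2):
    # Additive Gaussian noise plus random pixel dropout as a simple corruption.
    keep = (torch.rand_like(batch) > drop_prob).float()
    return (batch + noise_std * torch.randn_like(batch)) * keep

def pretrain_step(model, batch, optimizer):
    # `model` is assumed to end in a reconstruction head during pre-training;
    # the task is to recover the clean range image from its corrupted version.
    recon = model(corrupt(batch))
    loss = torch.nn.functional.mse_loss(recon, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def reset_batchnorm(model):
    # Randomly re-initialize batch-norm layers before fine-tuning on
    # segmentation, matching the recipe the abstract reports as best.
    bn_types = (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d)
    for m in model.modules():
        if isinstance(m, bn_types):
            m.reset_parameters()        # re-init affine weight and bias
            m.reset_running_stats()     # reset running mean and variance
```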
In our evaluation, it is found that self-supervised pre-training greatly
reduces the epistemic uncertainty of the segmentation model's output (i.e.,
uncertainty over the model weights; for example, presenting the model with an
input unlike those seen in the training dataset would increase such
uncertainty).
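The abstract does not name the uncertainty estimator used; as one hedged illustration only, Monte Carlo dropout is a common way to approximate epistemic uncertainty in a segmentation network (the estimator choice, sample count, and tensor shapes below are assumptions):

```python
# Hedged sketch: Monte Carlo dropout as one common epistemic-uncertainty
# estimator, assumed here for illustration; the thesis may use another method.
import torch

def mc_dropout_uncertainty(model, image, n_samples=20):
    """image: (1, C, H, W) tensor -> mean class probs and per-pixel entropy."""
    model.eval()
    # Keep dropout active at inference so each forward pass samples a
    # different sub-network, approximating uncertainty over the weights.
    for m in model.modules():
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d)):
            m.train()
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(image), dim=1) for _ in range(n_samples)]
        )                                   # (n_samples, 1, classes, H, W)
    mean = probs.mean(dim=0)                # averaged class probabilities
    entropy = -(mean * torch.log(mean + 1e-8)).sum(dim=1)  # predictive entropy
    return mean, entropy
```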
Text in English, abstracts in English.