MARC details
| 000 - LEADER |
| fixed length control field |
08157nam a22002657a 4500 |
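The leader is positionally coded. As a minimal sketch (assuming only the MARC 21 position definitions, not any particular library), the following Python decodes a few positions of the value above:

```python
# Decode selected MARC 21 leader positions of the record above.
leader = "08157nam a22002657a 4500"

print(leader[0:5])  # '08157' -> logical record length in bytes
print(leader[5])    # 'n' -> record status: new record
print(leader[6])    # 'a' -> type of record: language material
print(leader[7])    # 'm' -> bibliographic level: monograph
print(leader[17])   # '7' -> encoding level: minimal level
```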
| 008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
| fixed length control field |
201210s2024 a|||f bm|| 00| 0 eng d |
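A companion sketch for the 008 field, again assuming only the MARC 21 definitions for books. Only the position-stable leading bytes are decoded, since the blank padding in the value shown above (including the language code at positions 35-37) may have been collapsed by the web display:

```python
# Decode the leading, position-stable parts of the 008 above.
field_008 = "201210s2024 a|||f bm|| 00| 0 eng d"

print(field_008[0:6])   # '201210' -> entered on file 2020-12-10 (YYMMDD)
print(field_008[6])     # 's' -> single known publication date
print(field_008[7:11])  # '2024' -> date of publication
```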
| 024 7# - Author Identifier |
| Standard number or code |
0009-0001-2561-9884 |
| Source of number or code |
ORCID |
| 040 ## - CATALOGING SOURCE |
| Original cataloging agency |
EG-CaNU |
| Transcribing agency |
EG-CaNU |
| 041 0# - Language Code |
| Language code of text |
eng |
| Language code of abstract |
eng |
| -- |
ara |
| 082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER |
| Classification number |
621 |
| 100 0# - MAIN ENTRY--PERSONAL NAME |
| Personal name |
Mahmoud Kamel Ismail Hasabrabou |
| 245 1# - TITLE STATEMENT |
| Title |
Novel Edge AI with Power-Efficient Re-configurable MAC Processing Elements |
| Statement of responsibility, etc. |
/Mahmoud Kamel Ismail Hasabrabou |
| 260 ## - PUBLICATION, DISTRIBUTION, ETC. |
| Date of publication, distribution, etc. |
2024 |
| 300 ## - PHYSICAL DESCRIPTION |
| Extent |
80 p. |
| Other physical details |
ill. |
| Dimensions |
21 cm. |
| 500 ## - GENERAL NOTE |
| General note |
Supervisor: Prof. Dr. Ahmed H. Madian |
| 502 ## - Dissertation Note |
| Dissertation note |
Thesis (M.A.)--Nile University, Egypt, 2024. |
| 504 ## - Bibliography |
| Bibliography |
"Includes bibliographical references" |
| 505 0# - Contents |
| Formatted contents note |
Contents:<br/>Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4<br/>List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9<br/>List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12<br/>List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13<br/>Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15<br/>Chapters:<br/>1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1<br/>1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1<br/>1.2 Thesis Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3<br/>1.3 Research Organization . . . . . . . . . . . . . . . . . . . . . . . . . 4<br/>2. Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5<br/>2.1 Background and Related Work Survey . . . . . . . . . . . . . . . . 5<br/>2.1.1 Common Strategies to Decrease Power in AI Edge devices . 9<br/>2.1.2 Neural Networks architecture optimizing techniques . . . . . 11<br/>2.1.3 Lightweight Networks . . . . . . . . . . . . . . . . . . . . . 15<br/>2.1.4 Neural network with reduced precision (int8) . . . . . . . . 17<br/>2.1.5 State-of-the-Art MAC Architectures and Vector Multiplications 17<br/>2.1.6 Research examples of MAC architectures and Vector Multiplications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19<br/>6<br/>3. LP-MAC Algorithm & Implementation . . . . . . . . . . . . . . . . . . 25<br/>3.1 Algorithm General Idea Introduction . . . . . . . . . . . . . . . . . 25<br/>3.1.1 Problem Description . . . . . . . . . . . . . . . . . . . . . . 25<br/>3.1.2 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25<br/>3.1.3 Binary Multiplication Example . . . . . . . . . . . . . . . . 26<br/>3.1.4 Algorithm for Binary Multiplication with Bitsnum Bits . . 26<br/>3.1.5 Binary Multiplication of Vectors Example . . . . . . . . . . 26<br/>3.1.6 Binary Multiplication of One Vector with Two Vectors Example 27<br/>3.1.7 Step-by-Step Multiplication . . . . . . . . . . . . . . . . . . 28<br/>3.1.8 Algorithm for Binary Vector Multiplication with N Elements 29<br/>3.1.9 Optimization opportunities in Binary Vector Multiplication 29<br/>3.2 Low Power MAC Novel Algorithm . . . . . . . . . . . . . . . . . . 32<br/>3.2.1 LP-MAC Algorithm Overview . . . . . . . . . . . . . . . . . 32<br/>3.2.2 LP-MAC algorithm example . . . . . . . . . . . . . . . . . . 33<br/>3.2.3 Detailed example for LP-MAC algorithm . . . . . . . . . . . 35<br/>3.3 Applications in Neural Networks . . . . . . . . . . . . . . . . . . . 38<br/>3.3.1 Problem Description . . . . . . . . . . . . . . . . . . . . . . 38<br/>3.3.2 Steps to Optimize Multiplications Using Partial Products . 38<br/>3.3.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39<br/>3.3.4 General Algorithm in Pseudocode for high vector length . . 39<br/>3.4 LP-MAC Vector Multiplication Generic Implementation . . . . . . 40<br/>3.4.1 Special Case if Vector Inputs are signed . . . . . . . . . . . 40<br/>3.5 Algorithm Working Conditions & Limitations . . . . . . . . . . . . 42<br/>3.5.1 Power Cost of Vector Multiplications in Normal MAC and<br/>LP-MAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42<br/>3.5.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43<br/>4. Architecture for common Neural networks using LP-MAC . . . . . . . . 
45<br/>4.1 Low Power MAC Deployment in Fully Connected Layers . . . . . . 46<br/>4.2 Math operations model for CONV Layer . . . . . . . . . . . . . . 50<br/>4.3 Convolutional Layer with LP-MAC . . . . . . . . . . . . . . . . . . 51<br/>4.4 Low Power MAC Deployment in Attention Networks . . . . . . . . 53<br/>5. Low Power MAC Co-processor for Embedded Devices with AHB interface 55<br/>5.1 DNN AHB Co-processors for Embedded Systems . . . . . . . . . . 55<br/>5.2 LP-MAC Co-processor with Tunable Activation Function . . . . . . 57<br/>5.2.1 LP-MAC Array with AHB Interface . . . . . . . . . . . . . 57<br/>5.2.2 Tunable Activation Function . . . . . . . . . . . . . . . . . 58<br/>7<br/>5.2.3 Tunable ReLU Function . . . . . . . . . . . . . . . . . . . . 59<br/>5.2.4 Tunable ReLU Implementation . . . . . . . . . . . . . . . . 60<br/>5.2.5 Novel Fixed-Point Implementation for Power Function . . . 61<br/>6. Power Comparison & Verification Results . . . . . . . . . . . . . . . . . . 65<br/>6.1 Example Implementation and Simulation of an Eyeriss CNN, Both<br/>with the Inclusion of Low Power MAC and Without It . . . . . . . 66<br/>6.2 Simulations and Verification Results . . . . . . . . . . . . . . . . . 68<br/>7. Future Work and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . 73<br/>7.1 Working Conditions and Limitations . . . . . . . . . . . . . . . . . 73<br/>7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73<br/>7.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74<br/>References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77<br/>Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81<br/>Chapters:<br/>A. RTL Verilog Code for LP-MAC . . . . . . . . . . . . . . . . . . . . . . . 1<br/>B. Synthesis scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 |
| 520 3# - Abstract |
| Abstract |
Abstract:<br/>Deep learning has gained importance in multiple fields, including robotics, speech<br/>recognition, and image processing. Nevertheless, deploying deep learning models on<br/>edge and embedded devices with limited power and area resources can be difficult<br/>due to their high computational demands. This thesis introduces a new optimized<br/>power approach for deployment deep learning networks on edge devices, named LPMAC (Low Power Multiply Accumulate). Low power MAC is a low-power technique<br/>introduced to target fixed-point format numbers and optimized for reusing vectors of<br/>inputs for Multiply-Accumulate (MAC) operations. Unlike conventional MAC units<br/>that use multipliers, LP-MAC uses only adders, shifters, and multiplexers, which consume less power. This results in over 30% power reduction, making LP-MAC more<br/>power-efficient. LP-MAC also offers efficient dynamically precision control of MAC<br/>operations, which results in low latency and real-time performance. The characteristics of LP-MAC position it as a highly suitable option for deploying deep learning<br/>architectures on edge devices that have limited power resources. This is especially<br/>true for models such as Convolutional Neural Networks (CNNs), Fully Connected<br/>networks, and networks that utilize Transformer-based attention mechanisms. The<br/>dissertation outlines the hardware realizations for these types of networks and proposes a systematic process for transitioning current networks to leverage LP-MAC<br/>over the traditional MAC units. Furthermore, the thesis details the creation of an<br/>15<br/>AHB-SLAVE co-processor equipped with an LP-MAC array, specifically tailored for<br/>embedded system applications.<br/>Keywords— Artificial intelligence, Hardware accelerators, Machine learning, Deep learning, Neural networks, High performance computing, FPGA, ASIC, GPU, Edge computing,<br/>Cloud computing, Energy efficiency, Performance optimization, System architecture,Parallel<br/>processing, Training and inference, Model compression, Quantization, Sparsity, Pruning,<br/>Multiply Accumulate |
| 546 ## - Language Note |
| Language Note |
Text in English, abstracts in English and Arabic |
| 650 #4 - Subject |
| Subject |
MSD |
| 655 #7 - Index Term-Genre/Form |
| Source of term |
NULIB |
| Focus term |
Dissertations, Academic |
| 690 ## - Subject |
| School |
MSD |
| 942 ## - ADDED ENTRY ELEMENTS (KOHA) |
| Source of classification or shelving scheme |
Dewey Decimal Classification |
| Koha item type |
Thesis |