Novel Edge AI with Power-Efficient Re-configurable MAC Processing Elements (Record no. 10892)

MARC details
000 -LEADER
fixed length control field 08157nam a22002657a 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 201210s2024 a|||f bm|| 00| 0 eng d
024 7# - Author Identifier
Standard number or code 0009-0001-2561-9884
Source of number or code ORCID
040 ## - CATALOGING SOURCE
Original cataloging agency EG-CaNU
Transcribing agency EG-CaNU
041 0# - Language Code
Language code of text eng
Language code of abstract eng
-- ara
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 621
100 0# - MAIN ENTRY--PERSONAL NAME
Personal name Mahmoud Kamel Ismail Hasabrabou
245 1# - TITLE STATEMENT
Title Novel Edge AI with Power-Efficient Re-configurable MAC Processing Elements
Statement of responsibility, etc. / Mahmoud Kamel Ismail Hasabrabou
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc. 2024
300 ## - PHYSICAL DESCRIPTION
Extent 80 p.
Other physical details ill.
Dimensions 21 cm.
500 ## - GENERAL NOTE
Materials specified Supervisor: Prof. Dr. Ahmed H. Madian
502 ## - Dissertation Note
Dissertation type Thesis (M.A.)--Nile University, Egypt, 2024.
504 ## - Bibliography
Bibliography "Includes bibliographical references"
505 0# - Contents
Formatted contents note Contents:
Acknowledgments
List of Figures
List of Tables
List of Abbreviations
Abstract
Chapters:
1. Introduction
1.1 Introduction
1.2 Thesis Objectives
1.3 Research Organization
2. Literature Survey
2.1 Background and Related Work Survey
2.1.1 Common Strategies to Decrease Power in AI Edge Devices
2.1.2 Neural Network Architecture Optimizing Techniques
2.1.3 Lightweight Networks
2.1.4 Neural Network with Reduced Precision (int8)
2.1.5 State-of-the-Art MAC Architectures and Vector Multiplications
2.1.6 Research Examples of MAC Architectures and Vector Multiplications
3. LP-MAC Algorithm & Implementation
3.1 Algorithm General Idea Introduction
3.1.1 Problem Description
3.1.2 Objective
3.1.3 Binary Multiplication Example
3.1.4 Algorithm for Binary Multiplication with Bitsnum Bits
3.1.5 Binary Multiplication of Vectors Example
3.1.6 Binary Multiplication of One Vector with Two Vectors Example
3.1.7 Step-by-Step Multiplication
3.1.8 Algorithm for Binary Vector Multiplication with N Elements
3.1.9 Optimization Opportunities in Binary Vector Multiplication
3.2 Low Power MAC Novel Algorithm
3.2.1 LP-MAC Algorithm Overview
3.2.2 LP-MAC Algorithm Example
3.2.3 Detailed Example for LP-MAC Algorithm
3.3 Applications in Neural Networks
3.3.1 Problem Description
3.3.2 Steps to Optimize Multiplications Using Partial Products
3.3.3 Example
3.3.4 General Algorithm in Pseudocode for High Vector Length
3.4 LP-MAC Vector Multiplication Generic Implementation
3.4.1 Special Case if Vector Inputs Are Signed
3.5 Algorithm Working Conditions & Limitations
3.5.1 Power Cost of Vector Multiplications in Normal MAC and LP-MAC
3.5.2 Summary
4. Architecture for Common Neural Networks Using LP-MAC
4.1 Low Power MAC Deployment in Fully Connected Layers
4.2 Math Operations Model for CONV Layer
4.3 Convolutional Layer with LP-MAC
4.4 Low Power MAC Deployment in Attention Networks
5. Low Power MAC Co-processor for Embedded Devices with AHB Interface
5.1 DNN AHB Co-processors for Embedded Systems
5.2 LP-MAC Co-processor with Tunable Activation Function
5.2.1 LP-MAC Array with AHB Interface
5.2.2 Tunable Activation Function
5.2.3 Tunable ReLU Function
5.2.4 Tunable ReLU Implementation
5.2.5 Novel Fixed-Point Implementation for Power Function
6. Power Comparison & Verification Results
6.1 Example Implementation and Simulation of an Eyeriss CNN, Both with the Inclusion of Low Power MAC and Without It
6.2 Simulations and Verification Results
7. Future Work and Conclusions
7.1 Working Conditions and Limitations
7.2 Future Work
7.3 Conclusion
References
Appendices:
A. RTL Verilog Code for LP-MAC
B. Synthesis Scripts
520 3# - Abstract
Abstract Abstract:
Deep learning has gained importance in multiple fields, including robotics, speech recognition, and image processing. Nevertheless, deploying deep learning models on edge and embedded devices with limited power and area resources can be difficult due to their high computational demands. This thesis introduces a new power-optimized approach for deploying deep learning networks on edge devices, named LP-MAC (Low Power Multiply Accumulate). LP-MAC is a low-power technique that targets fixed-point numbers and is optimized for reusing vectors of inputs across Multiply-Accumulate (MAC) operations. Unlike conventional MAC units that use multipliers, LP-MAC uses only adders, shifters, and multiplexers, which consume less power. This results in over 30% power reduction, making LP-MAC more power-efficient. LP-MAC also offers efficient dynamic precision control of MAC operations, which results in low latency and real-time performance. These characteristics position LP-MAC as a highly suitable option for deploying deep learning architectures on edge devices with limited power resources, especially for models such as Convolutional Neural Networks (CNNs), fully connected networks, and networks that utilize Transformer-based attention mechanisms. The dissertation outlines the hardware realizations for these types of networks and proposes a systematic process for transitioning current networks from traditional MAC units to LP-MAC. Furthermore, the thesis details the creation of an AHB-SLAVE co-processor equipped with an LP-MAC array, specifically tailored for embedded system applications.
Keywords: Artificial intelligence, Hardware accelerators, Machine learning, Deep learning, Neural networks, High performance computing, FPGA, ASIC, GPU, Edge computing, Cloud computing, Energy efficiency, Performance optimization, System architecture, Parallel processing, Training and inference, Model compression, Quantization, Sparsity, Pruning, Multiply Accumulate
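Illustrative note: the abstract's claim that a MAC unit can be built from adders, shifters, and multiplexers alone follows the classic shift-and-add multiplication principle. The Python sketch below shows that general principle only; it is not the thesis's LP-MAC algorithm (whose partial-product reuse and vector scheduling are covered in Chapter 3 of the thesis), and the function name and the bits precision parameter are illustrative assumptions, not taken from the thesis.

# Minimal sketch (NOT the thesis's LP-MAC): a multiplier-free
# multiply-accumulate step using only the shift/add/select operations
# the abstract names. Lowering `bits` mimics, in spirit, the precision
# control the abstract mentions: fewer weight bits mean fewer steps.
def shift_add_mac(acc: int, x: int, w: int, bits: int = 8) -> int:
    """Accumulate x * w into acc using only shifts, adds, and selects."""
    neg = w < 0
    w = -w if neg else w
    for i in range(bits):
        if (w >> i) & 1:                      # multiplexer: select partial product
            acc += (-x if neg else x) << i    # shifter + adder, no multiplier
    return acc

# Usage: dot product of two small fixed-point vectors, no '*' operator.
xs, ws = [3, -2, 5], [4, 7, 1]
acc = 0
for x, w in zip(xs, ws):
    acc = shift_add_mac(acc, x, w)
assert acc == 3 * 4 + (-2) * 7 + 5 * 1  # == 3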
546 ## - Language Note
Language Note Text in English, abstracts in English and Arabic
650 #4 - Subject
Subject MSD
655 #7 - Index Term-Genre/Form
Source of term NULIB
Focus term Dissertation, Academic
690 ## - Subject
School MSD
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Thesis
Holdings
Source of classification or shelving scheme: Dewey Decimal Classification
Home library: Main library
Current library: Main library
Date acquired: 08/27/2024
Full call number: 621/M.H.N/2024
Date last seen: 08/27/2024
Price effective from: 08/27/2024
Koha item type: Thesis