Novel Edge AI with Power-Efficient Re-configurable MAC Processing Elements (Record no. 10892)

MARC details
000 -LEADER
fixed length control field 08157nam a22002657a 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 201210s2024 a|||f bm|| 00| 0 eng d
024 7# - Author Identifier
Standard number or code 0009-0001-2561-9884
Source of number or code ORCID
040 ## - CATALOGING SOURCE
Original cataloging agency EG-CaNU
Transcribing agency EG-CaNU
041 0# - Language Code
Language code of text eng
Language code of abstract eng
-- ara
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 621
100 0# - MAIN ENTRY--PERSONAL NAME
Personal name Mahmoud Kamel Ismail Hasabrabou
245 1# - TITLE STATEMENT
Title Novel Edge AI with Power-Efficient Re-configurable MAC Processing Elements
Statement of responsibility, etc. / Mahmoud Kamel Ismail Hasabrabou
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc. 2024
300 ## - PHYSICAL DESCRIPTION
Extent 80 p.
Other physical details ill.
Dimensions 21 cm.
500 ## - GENERAL NOTE
Materials specified Supervisor: Prof. Dr. Ahmed H. Madian
502 ## - Dissertation Note
Dissertation type Thesis (M.A.)--Nile University, Egypt, 2024.
504 ## - Bibliography
Bibliography "Includes bibliographical references"
505 0# - Contents
Formatted contents note Contents:
Acknowledgments
List of Figures
List of Tables
List of Abbreviations
Abstract
Chapters:
1. Introduction
1.1 Introduction
1.2 Thesis Objectives
1.3 Research Organization
2. Literature Survey
2.1 Background and Related Work Survey
2.1.1 Common Strategies to Decrease Power in AI Edge Devices
2.1.2 Neural Network Architecture Optimizing Techniques
2.1.3 Lightweight Networks
2.1.4 Neural Network with Reduced Precision (int8)
2.1.5 State-of-the-Art MAC Architectures and Vector Multiplications
2.1.6 Research Examples of MAC Architectures and Vector Multiplications
3. LP-MAC Algorithm & Implementation
3.1 Algorithm General Idea Introduction
3.1.1 Problem Description
3.1.2 Objective
3.1.3 Binary Multiplication Example
3.1.4 Algorithm for Binary Multiplication with Bitsnum Bits
3.1.5 Binary Multiplication of Vectors Example
3.1.6 Binary Multiplication of One Vector with Two Vectors Example
3.1.7 Step-by-Step Multiplication
3.1.8 Algorithm for Binary Vector Multiplication with N Elements
3.1.9 Optimization Opportunities in Binary Vector Multiplication
3.2 Low Power MAC Novel Algorithm
3.2.1 LP-MAC Algorithm Overview
3.2.2 LP-MAC Algorithm Example
3.2.3 Detailed Example for LP-MAC Algorithm
3.3 Applications in Neural Networks
3.3.1 Problem Description
3.3.2 Steps to Optimize Multiplications Using Partial Products
3.3.3 Example
3.3.4 General Algorithm in Pseudocode for High Vector Length
3.4 LP-MAC Vector Multiplication Generic Implementation
3.4.1 Special Case if Vector Inputs Are Signed
3.5 Algorithm Working Conditions & Limitations
3.5.1 Power Cost of Vector Multiplications in Normal MAC and LP-MAC
3.5.2 Summary
4. Architecture for Common Neural Networks Using LP-MAC
4.1 Low Power MAC Deployment in Fully Connected Layers
4.2 Math Operations Model for CONV Layer
4.3 Convolutional Layer with LP-MAC
4.4 Low Power MAC Deployment in Attention Networks
5. Low Power MAC Co-processor for Embedded Devices with AHB Interface
5.1 DNN AHB Co-processors for Embedded Systems
5.2 LP-MAC Co-processor with Tunable Activation Function
5.2.1 LP-MAC Array with AHB Interface
5.2.2 Tunable Activation Function
5.2.3 Tunable ReLU Function
5.2.4 Tunable ReLU Implementation
5.2.5 Novel Fixed-Point Implementation for Power Function
6. Power Comparison & Verification Results
6.1 Example Implementation and Simulation of an Eyeriss CNN, Both with the Inclusion of Low Power MAC and Without It
6.2 Simulations and Verification Results
7. Future Work and Conclusions
7.1 Working Conditions and Limitations
7.2 Future Work
7.3 Conclusion
References
Appendices:
A. RTL Verilog Code for LP-MAC
B. Synthesis Scripts
520 3# - Abstract
Abstract Abstract:
Deep learning has gained importance in multiple fields, including robotics, speech recognition, and image processing. Nevertheless, deploying deep learning models on edge and embedded devices with limited power and area resources can be difficult due to their high computational demands. This thesis introduces a new power-optimized approach for deploying deep learning networks on edge devices, named LP-MAC (Low Power Multiply Accumulate). LP-MAC is a low-power technique that targets fixed-point numbers and is optimized for reusing vectors of inputs across Multiply-Accumulate (MAC) operations. Unlike conventional MAC units that use multipliers, LP-MAC uses only adders, shifters, and multiplexers, which consume less power. This results in over 30% power reduction, making LP-MAC more power-efficient. LP-MAC also offers efficient dynamic precision control of MAC operations, which results in low latency and real-time performance. These characteristics position LP-MAC as a highly suitable option for deploying deep learning architectures on edge devices with limited power resources, especially for models such as Convolutional Neural Networks (CNNs), fully connected networks, and networks that utilize Transformer-based attention mechanisms. The dissertation outlines the hardware realizations for these types of networks and proposes a systematic process for transitioning current networks from traditional MAC units to LP-MAC. Furthermore, the thesis details the creation of an AHB-SLAVE co-processor equipped with an LP-MAC array, specifically tailored for embedded system applications.
Keywords: Artificial intelligence, Hardware accelerators, Machine learning, Deep learning, Neural networks, High performance computing, FPGA, ASIC, GPU, Edge computing, Cloud computing, Energy efficiency, Performance optimization, System architecture, Parallel processing, Training and inference, Model compression, Quantization, Sparsity, Pruning, Multiply Accumulate
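Illustrative note: the abstract's claim that a MAC unit can be built from adders, shifters, and multiplexers alone follows the classic shift-and-add multiplication principle. The Python sketch below shows that general principle only; it is not the thesis's LP-MAC algorithm (whose partial-product reuse and vector scheduling are covered in Chapter 3 of the thesis), and the function name and the bits precision parameter are illustrative assumptions, not taken from the thesis.

# Minimal sketch (NOT the thesis's LP-MAC): a multiplier-free
# multiply-accumulate step using only the shift/add/select operations
# the abstract names. Lowering `bits` mimics, in spirit, the precision
# control the abstract mentions: fewer weight bits mean fewer steps.
def shift_add_mac(acc: int, x: int, w: int, bits: int = 8) -> int:
    """Accumulate x * w into acc using only shifts, adds, and selects."""
    neg = w < 0
    w = -w if neg else w
    for i in range(bits):
        if (w >> i) & 1:                      # multiplexer: select partial product
            acc += (-x if neg else x) << i    # shifter + adder, no multiplier
    return acc

# Usage: dot product of two small fixed-point vectors, no '*' operator.
xs, ws = [3, -2, 5], [4, 7, 1]
acc = 0
for x, w in zip(xs, ws):
    acc = shift_add_mac(acc, x, w)
assert acc == 3 * 4 + (-2) * 7 + 5 * 1  # == 3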
546 ## - Language Note
Language Note Text in English, abstracts in English and Arabic
650 #4 - Subject
Subject MSD
655 #7 - Index Term-Genre/Form
Source of term NULIB
Focus term Dissertation, Academic
690 ## - Subject
School MSD
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Thesis
Holdings
Source of classification or shelving scheme: Dewey Decimal Classification
Home library: Main library
Current library: Main library
Date acquired: 08/27/2024
Full call number: 621/M.H.N/2024
Date last seen: 08/27/2024
Price effective from: 08/27/2024
Koha item type: Thesis