Keyed Watermarks (Record no. 10897)

MARC details
000 -LEADER
fixed length control field 08218nam a22002657a 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 201210s2024 a|||f bm|| 00| 0 eng d
024 7# - Author Identifier
Standard number or code 0009-0007-1846-825X
Source of number or code ORCID
040 ## - CATALOGING SOURCE
Original cataloging agency EG-CaNU
Transcribing agency EG-CaNU
041 0# - Language Code
Language code of text eng
Language code of abstract eng
-- ara
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 610
100 0# - MAIN ENTRY--PERSONAL NAME
Personal name Tawfik Yasser Tawfik Abseif
245 1# - TITLE STATEMENT
Title Keyed Watermarks
Remainder of title : A Fine-grained Watermark Generation for Apache Flink
Statement of responsibility, etc. /Tawfik Yasser Tawfik Abseif
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc. 2024
300 ## - PHYSICAL DESCRIPTION
Extent 66 p.
Other physical details ill.
Dimensions 21 cm.
500 ## - GENERAL NOTE
Materials specified Supervisor: Mohamed ElHelw
502 ## - Dissertation Note
Dissertation type Thesis (M.A.)--Nile University, Egypt, 2024.
504 ## - Bibliography
Bibliography Includes bibliographical references.
505 0# - Contents
Formatted contents note Contents:
Abstract
0.1 Keywords
Dedication
Acknowledgments
List of Tables
List of Figures
Chapters:
1. Introduction
1.1 Problem Statement
1.2 Organization of the Thesis
1.2.1 Background and Concepts
1.2.2 Related Work
1.2.3 Keyed Watermarks
1.2.4 Experimental Evaluation
1.2.5 Conclusions and Future Work
1.3 Publications in the Course of this Thesis
2. Background and Concepts
2.1 Stream Processing
2.1.1 Latency
2.1.2 Throughput
2.1.3 Stateful Processing
2.2 Notions of Time
2.2.1 Processing-time
2.2.2 Event-time
2.2.3 Out-of-order Processing
2.3 Windowing
2.3.1 Tumbling window
2.3.2 Sliding window
2.3.3 Session window
2.4 Watermarking
3. Related Work
3.1 Punctuations
3.2 Buffering
3.3 Watermarks
3.4 Order-agnostic
3.5 Ordered
3.6 Timestamp Frontiers
3.7 Keyed Watermark
3.8 Extended Approaches
4. Keyed Watermarks
4.1 Watermarks in Apache Flink
4.2 Overview of Flink Pipelines
4.3 Vanilla Watermark Generation
4.4 Keyed Watermark Generation
5. Experimental Evaluation
5.1 Experimental Setup
5.1.1 How we built the cluster
5.2 Datasets
5.3 Results
6. Conclusions and Future Work
6.1 Conclusion
6.1.1 Problem Statement
6.1.2 Proposed Solution
6.1.3 Findings
6.2 Future Work
Bibliography
520 3# - Abstract
Abstract Abstract:
Big data stream processing engines, exemplified by Apache Flink, employ windowing techniques to manage unbounded streams of events. Aggregating relevant data within windows is important for event-time windowing due to its impact on result accuracy. A pivotal role in this process is attributed to watermarks, unique timestamps signifying event progression in time. The existing watermark generation method within Apache Flink, operating at the input stream level, exhibits a bias towards faster sub-streams, causing the omission of events from slower counterparts. Our analysis determined that Apache Flink’s standard watermark generation approach results in approximately 33% data loss when 50% of median-proximate keys experience delays. Furthermore, this loss exceeds 37% in cases where 50% of randomly selected keys encounter delays.
In this thesis, we introduce an approach termed keyed watermarks to address data loss concerns and enhance data processing precision to a minimum of 99% in most scenarios. Moreover, our proposed solution reduces the latency of watermark processing across multiple degrees of parallelism for the watermark generator, which enhances the scalability of stream processing in Apache Flink. Our strategy facilitates distinct progress monitoring by creating individualized watermarks for each logical sub-stream (key). Within our investigation, we delineate the essential architectural and API modifications requisite for integrating keyed watermarks while also highlighting our experience in navigating the expansion of Apache Flink’s extensive codebase. Moreover, we conduct a comparative evaluation between the efficacy of our approach and the conventional watermark generation technique concerning the accuracy of event-time tracking, the latency of watermark processing, and the growth of Flink’s maintained state.
Keywords: Keyed Watermarks, Big Data Stream Processing, Event-Time Tracking, Apache Flink
546 ## - Language Note
Language Note Text in English, abstracts in English and Arabic
650 #4 - Subject
Subject informatics
655 #7 - Index Term-Genre/Form
Source of term NULIB
focus term Dissertation, Academic
690 ## - Subject
School informatics
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Thesis
655 #7 - Index Term-Genre/Form
-- 187
Holdings
Withdrawn status
Lost status
Source of classification or shelving scheme Dewey Decimal Classification
Damaged status
Not for loan
Home library Main library
Current library Main library
Date acquired 08/27/2024
Total Checkouts
Full call number 610/T.Y.K/2024
Date last seen 08/27/2024
Price effective from 08/27/2024
Koha item type Thesis