Incorporating Perfect Hashing in the Enhanced Suffix Array to Improve Exact Pattern Matching / Ayat Ahmed Reda Dawood Hatem

By:

Ayat Ahmed Reda Dawood Hatem

Material type: Text

TextLanguage: English Summary language: English Publication details: 2009Description: 75 p. ill. 21 cmSubject(s):

Genre/Form:

Dissertation, Academic

DDC classification:

Contents:

Contents: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Exact Pattern Matching Problem . . . . . . . . . . . . . . . . . . . 1 1.2 Our Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Exact Pattern Matching Over Indexing data structures . . . . . . . . . . 4 2.1 Basic Notions and Definitions . . . . . . . . . . . . . . . . . . . . . 4 2.2 Suffix Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2.2 Exact Pattern Matching over The Suffix Tree . . . . . . . . 6 2.2.3 Drawbacks of The Suffix Tree . . . . . . . . . . . . . . . . . 6 2.3 Suffix Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.3.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . 7 2.3.2 Exact Pattern Matching Over The Suffix Array . . . . . . . 8 2.3.3 Drawbacks of The Suffix Array . . . . . . . . . . . . . . . . 10 2.4 The Enhanced Suffix Array . . . . 2.4.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . 11 2.4.2 Exact Pattern Matching Over The Enhanced Suffix Array . 13 3. Minimal Perfect Hash Functions . . . . . . . . . . . . . . . . . . . . . . . 21 3.1 Minimal perfect hash functions (MPHF) . . . . . . . . . . . . . . . 21 3.2 Methods for Constructing Perfect Hashing . . . . . . . . . . . . . . 22 3.2.1 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.2.2 Reducing the search space . . . . . . . . . . . . . . . . . . . 24 3.2.3 Probabilistic perfect hashing . . . . . . . . . . . . . . . . . 26 3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 28 4. Exact Pattern Matching in O(m) . . . . . . . . . . . . . . . . . . . . . . 32 4.1 Proposed Searching Method . . . . . . . . . . . . . . . . . . . . . . 32 4.2 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . 36 4.2.1 Lcp-interval tree pruning . . . . . . . . . . . . . . . . . . . 37 4.2.2 Replacing the home table with the index tree . . . . . . . . 38 4.2.3 Binary search instead of sequential search . . . . . . . . . . 40 4.2.4 Lcp-table in Manber Myer’s . . . . . . . . . . . . . . . . . . 40 4.2.5 Other variations . . . . . . . . . . . . . . . . . . . . . . . . 40 5. Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5.1 Algorithm variations . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.2 Running Time for The Different Algorithms . . . . . . . . . . . . . 46 5.2.1 Enumerative Search . . . . . . . . . . . . . . . . . . . . . . 49 5.2.2 Counting Search . . . . . . . . . . . . . . . . . . . . . . . . 54 6. Conclusions and Future Work . . . . .

Dissertation note: Thesis (M.A.)—Nile University, Egypt, 2009 . Abstract: Abstract: The exact pattern matching problem is the problem of finding the exact match for a pattern in a text. This problem is a basic problem in many applications in computer science. Many algorithms recently presented to solve this problem are based on indexing data structures. A major indexing data structure used by these algorithms is the enhanced suffix array, which is basically the suffix array enhanced with extra tables that give further information. The best theoretical time bound based on this data structure is O(mlog ||), where m is the pattern length and || is the alphabet length. Although this complexity is theoretically considered the best achieved, it could not practically outperform the performance of the algorithm of Manber and Myers based only on the suffix array that takes O(mlog n), where n is the length of the text. In fact, Manber and Myers based algorithm is faster 1.5 up to 6.5 times compared to the other algorithms. In this thesis, we present a modification to the enhanced suffix array which enables us to achieve O(m) time to find the pattern, i.e., independent of . The basic idea behind this improvement is the extensive use of the minimal perfect hashing technique, by which a static set of n elements can be accessed in O(1) time while maintaining a linear space. Furthermore, we explored different implementation tricks to speed up our searching algorithm. Interestingly, our algorithm became faster than Manber and Myers’ algorithm for certain alphabet size and range of query lengths

Tags from this library: No tags from this library for this title. Log in to add tags.

Average rating: 0.0 (0 votes)

Holdings
Item type	Current library	Call number	Status	Date due	Barcode
Thesis	Main library	627/ A.H.I 2009 (Browse shelf(Opens below))	Not For Loan

Browsing Main library shelves Close shelf browser (Hides shelf browser)

Previous	No cover image available No cover image available	No cover image available No cover image available	No cover image available No cover image available	No cover image available No cover image available	No cover image available No cover image available	No cover image available No cover image available	No cover image available No cover image available	Next
Previous	627/ A.E.V 2016 Visual Analytics of Firewall Log Events/	627/ A.F.P 2012 Perspectives on Improving Software Maintenance Based on ITIL/	627/ A.H.E 2018 EMBEDDED LOGIC ANLYZER FOR STORAGE PROTOCOLS /	627/ A.H.I 2009 Incorporating Perfect Hashing in the Enhanced Suffix Array to Improve Exact Pattern Matching /	627/A.H.S/2025 Semi formally verified enhanced code translators of mobile applications programming languages	627/ A.I.R 2022 A Reliable Firmware-Over-The-Air Architecture for Industrial Internet of Things	627/A.M.A/2025 Automotive Cybersecurity Engineering Standardization and Regulation	Next

Supervisor: Mohamed Abouelhoda

Thesis (M.A.)—Nile University, Egypt, 2009 .

"Includes bibliographical references"

Contents:
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Exact Pattern Matching Problem . . . . . . . . . . . . . . . . . . . 1
1.2 Our Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Exact Pattern Matching Over Indexing data structures . . . . . . . . . . 4
2.1 Basic Notions and Definitions . . . . . . . . . . . . . . . . . . . . . 4
2.2 Suffix Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.2 Exact Pattern Matching over The Suffix Tree . . . . . . . . 6
2.2.3 Drawbacks of The Suffix Tree . . . . . . . . . . . . . . . . . 6
2.3 Suffix Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.2 Exact Pattern Matching Over The Suffix Array . . . . . . . 8
2.3.3 Drawbacks of The Suffix Array . . . . . . . . . . . . . . . . 10
2.4 The Enhanced Suffix Array . . . .
2.4.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.2 Exact Pattern Matching Over The Enhanced Suffix Array . 13
3. Minimal Perfect Hash Functions . . . . . . . . . . . . . . . . . . . . . . . 21
3.1 Minimal perfect hash functions (MPHF) . . . . . . . . . . . . . . . 21
3.2 Methods for Constructing Perfect Hashing . . . . . . . . . . . . . . 22
3.2.1 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 Reducing the search space . . . . . . . . . . . . . . . . . . . 24
3.2.3 Probabilistic perfect hashing . . . . . . . . . . . . . . . . . 26
3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4. Exact Pattern Matching in O(m) . . . . . . . . . . . . . . . . . . . . . . 32
4.1 Proposed Searching Method . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2.1 Lcp-interval tree pruning . . . . . . . . . . . . . . . . . . . 37
4.2.2 Replacing the home table with the index tree . . . . . . . . 38
4.2.3 Binary search instead of sequential search . . . . . . . . . . 40
4.2.4 Lcp-table in Manber Myer’s . . . . . . . . . . . . . . . . . . 40
4.2.5 Other variations . . . . . . . . . . . . . . . . . . . . . . . . 40
5. Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.1 Algorithm variations . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 Running Time for The Different Algorithms . . . . . . . . . . . . . 46
5.2.1 Enumerative Search . . . . . . . . . . . . . . . . . . . . . . 49
5.2.2 Counting Search . . . . . . . . . . . . . . . . . . . . . . . . 54
6. Conclusions and Future Work . . . . .

Abstract:
The exact pattern matching problem is the problem of finding the exact match
for a pattern in a text. This problem is a basic problem in many applications in
computer science. Many algorithms recently presented to solve this problem are
based on indexing data structures. A major indexing data structure used by these
algorithms is the enhanced suffix array, which is basically the suffix array enhanced
with extra tables that give further information. The best theoretical time bound
based on this data structure is O(mlog ||), where m is the pattern length and ||
is the alphabet length. Although this complexity is theoretically considered the best
achieved, it could not practically outperform the performance of the algorithm of
Manber and Myers based only on the suffix array that takes O(mlog n), where n is
the length of the text. In fact, Manber and Myers based algorithm is faster 1.5 up to
6.5 times compared to the other algorithms.
In this thesis, we present a modification to the enhanced suffix array which enables
us to achieve O(m) time to find the pattern, i.e., independent of . The basic idea
behind this improvement is the extensive use of the minimal perfect hashing technique,
by which a static set of n elements can be accessed in O(1) time while maintaining
a linear space. Furthermore, we explored different implementation tricks to speed up
our searching algorithm. Interestingly, our algorithm became faster than Manber and
Myers’ algorithm for certain alphabet size and range of query lengths

Text in English, abstracts in English.

There are no comments on this title.

to post a comment.