Incorporating Perfect Hashing in the Enhanced Suffix Array to Improve Exact Pattern Matching / Ayat Ahmed Reda Dawood Hatem
Material type:
TextLanguage: English Summary language: English Publication details: 2009Description: 75 p. ill. 21 cmSubject(s): Genre/Form: DDC classification: - 627
| Item type | Current library | Call number | Status | Date due | Barcode | |
|---|---|---|---|---|---|---|
Thesis
|
Main library | 627/ A.H.I 2009 (Browse shelf(Opens below)) | Not For Loan |
Browsing Main library shelves Close shelf browser (Hides shelf browser)
Supervisor: Mohamed Abouelhoda
Thesis (M.A.)—Nile University, Egypt, 2009 .
"Includes bibliographical references"
Contents:
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Exact Pattern Matching Problem . . . . . . . . . . . . . . . . . . . 1
1.2 Our Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Exact Pattern Matching Over Indexing data structures . . . . . . . . . . 4
2.1 Basic Notions and Definitions . . . . . . . . . . . . . . . . . . . . . 4
2.2 Suffix Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.2 Exact Pattern Matching over The Suffix Tree . . . . . . . . 6
2.2.3 Drawbacks of The Suffix Tree . . . . . . . . . . . . . . . . . 6
2.3 Suffix Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.2 Exact Pattern Matching Over The Suffix Array . . . . . . . 8
2.3.3 Drawbacks of The Suffix Array . . . . . . . . . . . . . . . . 10
2.4 The Enhanced Suffix Array . . . .
2.4.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.2 Exact Pattern Matching Over The Enhanced Suffix Array . 13
3. Minimal Perfect Hash Functions . . . . . . . . . . . . . . . . . . . . . . . 21
3.1 Minimal perfect hash functions (MPHF) . . . . . . . . . . . . . . . 21
3.2 Methods for Constructing Perfect Hashing . . . . . . . . . . . . . . 22
3.2.1 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 Reducing the search space . . . . . . . . . . . . . . . . . . . 24
3.2.3 Probabilistic perfect hashing . . . . . . . . . . . . . . . . . 26
3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4. Exact Pattern Matching in O(m) . . . . . . . . . . . . . . . . . . . . . . 32
4.1 Proposed Searching Method . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2.1 Lcp-interval tree pruning . . . . . . . . . . . . . . . . . . . 37
4.2.2 Replacing the home table with the index tree . . . . . . . . 38
4.2.3 Binary search instead of sequential search . . . . . . . . . . 40
4.2.4 Lcp-table in Manber Myer’s . . . . . . . . . . . . . . . . . . 40
4.2.5 Other variations . . . . . . . . . . . . . . . . . . . . . . . . 40
5. Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.1 Algorithm variations . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 Running Time for The Different Algorithms . . . . . . . . . . . . . 46
5.2.1 Enumerative Search . . . . . . . . . . . . . . . . . . . . . . 49
5.2.2 Counting Search . . . . . . . . . . . . . . . . . . . . . . . . 54
6. Conclusions and Future Work . . . . .
Abstract:
The exact pattern matching problem is the problem of finding the exact match
for a pattern in a text. This problem is a basic problem in many applications in
computer science. Many algorithms recently presented to solve this problem are
based on indexing data structures. A major indexing data structure used by these
algorithms is the enhanced suffix array, which is basically the suffix array enhanced
with extra tables that give further information. The best theoretical time bound
based on this data structure is O(mlog ||), where m is the pattern length and ||
is the alphabet length. Although this complexity is theoretically considered the best
achieved, it could not practically outperform the performance of the algorithm of
Manber and Myers based only on the suffix array that takes O(mlog n), where n is
the length of the text. In fact, Manber and Myers based algorithm is faster 1.5 up to
6.5 times compared to the other algorithms.
In this thesis, we present a modification to the enhanced suffix array which enables
us to achieve O(m) time to find the pattern, i.e., independent of . The basic idea
behind this improvement is the extensive use of the minimal perfect hashing technique,
by which a static set of n elements can be accessed in O(1) time while maintaining
a linear space. Furthermore, we explored different implementation tricks to speed up
our searching algorithm. Interestingly, our algorithm became faster than Manber and
Myers’ algorithm for certain alphabet size and range of query lengths
Text in English, abstracts in English.
There are no comments on this title.