Implementation of Work Flow Integration Techniques in Tavaxy for Bioinformatics Applications / (Record no. 8819)
| 000 -LEADER | |
|---|---|
| fixed length control field | 12515nam a22002537a 4500 |
| 008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION | |
| fixed length control field | 210112b2013 a|||f mb|| 00| 0 eng d |
| 040 ## - CATALOGING SOURCE | |
| Original cataloging agency | EG-CaNU |
| Transcribing agency | EG-CaNU |
| 041 0# - Language Code | |
| Language code of text | eng |
| Language code of abstract | eng |
| 082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER | |
| Classification number | 610 |
| 100 0# - MAIN ENTRY--PERSONAL NAME | |
| Personal name | Shady Alaa Eldin Eissa |
| 245 1# - TITLE STATEMENT | |
| Title | Implementation of Work Flow Integration Techniques in Tavaxy for Bioinformatics Applications / |
| Statement of responsibility, etc. | Shady Alaa Eldin Eissa |
| 260 ## - PUBLICATION, DISTRIBUTION, ETC. | |
| Date of publication, distribution, etc. | 2013 |
| 300 ## - PHYSICAL DESCRIPTION | |
| Extent | 155 p. |
| Other physical details | ill. |
| Dimensions | 21 cm. |
| 500 ## - GENERAL NOTE | |
| Materials specified | Supervisor: Mohamed Abou El-Hoda |
| 502 ## - Dissertation Note | |
| Dissertation type | Thesis (M.A.)--Nile University, Egypt, 2013. |
| 504 ## - Bibliography | |
| Bibliography | Includes bibliographical references. |
| 505 0# - Contents | |
| Formatted contents note | Contents:<br/>1 Introduction<br/>1.1 Biological Background<br/>1.2 Bioinformatics<br/>1.3 Sequence Analysis<br/>1.4 Workflows and workflow management systems<br/>1.5 Cloud Computing<br/>1.6 Challenge<br/>1.7 Publications<br/>1.8 Thesis outline<br/>2 Sequence Analysis Introduction<br/>2.1 Introduction<br/>2.2 Sequencing technologies<br/>2.2.1 Sequencing technologies comparison<br/>2.3 Data formats<br/>2.3.1 FASTA<br/>2.3.2 FASTQ<br/>2.3.3 SAM/BAM<br/>2.3.4 VCF<br/>2.4 Software tools<br/>2.4.1 Utility Tools<br/>2.4.2 BLAST<br/>2.4.3 Alignment<br/>2.4.4 Mapping<br/>2.4.5 Variant Calling<br/>3 Related Work<br/>3.1 Introduction<br/>3.2 Workflow Systems<br/>3.2.1 Formalism<br/>3.2.2 Formal description<br/>3.2.3 Workflow editor<br/>3.2.4 Application domain<br/>3.2.5 Control flow<br/>3.2.6 Loops<br/>3.2.7 Data flow<br/>3.2.8 Storage<br/>3.2.9 Execution engine<br/>3.2.10 Execution middleware for grid or cluster<br/>3.2.11 Enactment layer<br/>3.2.12 Support for legacy code<br/>3.2.13 Hierarchical workflows<br/>3.2.14 Error handling<br/>3.2.15 Workflow repository<br/>3.3 Workflow interoperability<br/>3.3.1 Interoperability between Taverna and Galaxy<br/>3.4 Workflow patterns<br/>3.5 Workflows and Cloud computing<br/>3.5.1 Galaxy and the cloud<br/>4 Taverna and Galaxy<br/>4.1 Introduction<br/>4.2 Taverna<br/>4.2.1 Architecture<br/>4.2.2 Workflow description language<br/>4.2.3 Workflow execution engine<br/>4.3 Galaxy<br/>4.3.1 Architecture<br/>4.3.2 Workflow description language<br/>4.3.3 Workflow execution engine<br/>4.4 Comparison of Taverna and Galaxy<br/>5 Tavaxy<br/>5.1 Introduction<br/>5.2 Tavaxy architecture<br/>5.3 Workflow pattern database<br/>5.3.1 Control patterns<br/>5.3.2 Advanced data patterns<br/>5.4 Workflow authoring tool<br/>5.5 Workflow description language<br/>5.5.1 Tool description file<br/>5.6 Workflow mapper<br/>5.6.1 Executable workflow<br/>5.6.2 Translating workflow description languages<br/>5.6.3 Workflow execution optimization<br/>5.7 Integrating Galaxy and Taverna workflows in Tavaxy<br/>5.8 Workflow engine<br/>6 Cloud computing support inside Tavaxy<br/>6.1 Introduction<br/>6.2 Amazon Web Services<br/>6.3 elasticHPC<br/>6.4 Scenarios<br/>6.5 Tavaxy on the Cloud<br/>6.6 Tool on the Cloud<br/>6.7 Subworkflow on the Cloud<br/>7 Case studies and Experiments<br/>7.1 Overview<br/>7.2 Case study I<br/>7.2.1 Composing heterogeneous sub-workflows on Tavaxy<br/>7.2.2 Measuring the performance<br/>7.3 Case study II<br/>7.3.1 Metagenomics workflow<br/>7.3.2 Measuring the performance<br/>7.4 Use of cloud computing<br/>8 Conclusion and future work<br/>8.1 Conclusion<br/>8.1.1 Pattern based workflows interoperability<br/>8.1.2 Use of patterns<br/>8.1.3 Cloud computing support<br/>8.2 Publications<br/>8.3 Future work<br/>A Translating SCUFL to tSCUFL<br/>A.1 Introduction<br/>A.1.1 Taverna language<br/>A.1.2 Tavaxy language<br/>A.1.3 Galaxy language<br/>A.2 Simple Workflow example<br/>A.2.1 SCUFL representation<br/>A.2.2 Galaxy JSON representation<br/>A.3 Translating processors and links<br/>A.3.1 Translation with replacement<br/>A.3.2 Translation without replacement<br/>A.4 Translating dependency links<br/>A.5 Translating conditionals<br/>A.6 Translating Iterations<br/>B Workflow management systems survey<br/>B.1 Introduction<br/>B.2 General features<br/>B.3 Supported workflows<br/>B.4 Workflow execution<br/>B.5 Storage and sharing<br/>C Patterns implementation<br/>C.1 Introduction<br/>C.2 Switch pattern<br/>C.3 Data select pattern<br/>C.4 Data merge pattern<br/>C.5 Iteration pattern<br/>Bibliography |
| 520 3# - Abstract | |
| Abstract | Abstract:<br/>Sequence analysis is one of the most important domains within bioinformatics, because the sequence of nucleotides in DNA determines the function and structure of the produced protein, which in turn controls the functions and reproduction of the whole cell.<br/>The sequence analysis domain includes diverse tasks such as sequence alignment, protein structure prediction, and database search, among others. These tasks are addressed using a growing set of tools that differ in usage and computational requirements. Recent analysis endeavors require the use of multiple tools in a pipelined fashion. Workflow management systems have emerged to facilitate the design and execution of such pipelines.<br/>In the last few years, many workflow systems have been developed. Within the bioinformatics community, Taverna and Galaxy are the most popular. However, there is an interoperability problem between them: a workflow developed for one system cannot run on the other. Furthermore, not all of these systems can easily handle large amounts of biological data, such as those produced by Next Generation Sequencing technologies, and cloud computing, an important emerging paradigm, is not yet fully utilized in an easy-to-use way.<br/>In this thesis, we present Tavaxy, a pattern-based bioinformatics workflow management system with cloud computing support. It brings together the features of Taverna and Galaxy: a workflow written in either Taverna or Galaxy can be imported into Tavaxy, modified within its environment, and executed. In addition, workflows within Tavaxy can run on cloud computing infrastructure. We also provide a set of data patterns that facilitate the large-scale parallel processing of large datasets. |
| 546 ## - Language Note | |
| Language Note | Text in English, abstracts in English. |
| 650 #4 - Subject | |
| Subject | Informatics-IFM |
| 655 #7 - Index Term-Genre/Form | |
| Source of term | NULIB |
| focus term | Dissertation, Academic |
| 690 ## - Subject | |
| School | Informatics-IFM |
| 942 ## - ADDED ENTRY ELEMENTS (KOHA) | |
| Source of classification or shelving scheme | Dewey Decimal Classification |
| Koha item type | Thesis |
| Withdrawn status | Lost status | Source of classification or shelving scheme | Damaged status | Not for loan | Home library | Current library | Date acquired | Total Checkouts | Full call number | Date last seen | Price effective from | Koha item type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  | Dewey Decimal Classification |  | Not For Loan | Main library | Main library | 01/12/2021 |  | 610/ SE.I 2013 | 01/12/2021 | 01/12/2021 | Thesis |