Implementation of Work Flow Integration Techniques in Tavaxy for Bioinformatics Applications / (Record no. 8819)

MARC details
000 -LEADER
fixed length control field 12515nam a22002537a 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 210112b2013 a|||f mb|| 00| 0 eng d
040 ## - CATALOGING SOURCE
Original cataloging agency EG-CaNU
Transcribing agency EG-CaNU
041 0# - Language Code
Language code of text eng
Language code of abstract eng
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 610
100 0# - MAIN ENTRY--PERSONAL NAME
Personal name Shady Alaa Eldin Eissa
245 1# - TITLE STATEMENT
Title Implementation of Work Flow Integration Techniques in Tavaxy for Bioinformatics Applications /
Statement of responsibility, etc. Shady Alaa Eldin Eissa
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc. 2013
300 ## - PHYSICAL DESCRIPTION
Extent 155 p.
Other physical details ill.
Dimensions 21 cm.
500 ## - GENERAL NOTE
Materials specified Supervisor: Mohamed Abou El-Hoda
502 ## - Dissertation Note
Dissertation type Thesis (M.A.)--Nile University, Egypt, 2013.
504 ## - Bibliography
Bibliography Includes bibliographical references.
505 0# - Contents
Formatted contents note Contents:
1 Introduction
1.1 Biological Background
1.2 Bioinformatics
1.3 Sequence Analysis
1.4 Workflows and workflow management systems
1.5 Cloud Computing
1.6 Challenge
1.7 Publications
1.8 Thesis outline
2 Sequence Analysis Introduction
2.1 Introduction
2.2 Sequencing technologies
2.2.1 Sequencing technologies comparison
2.3 Data formats
2.3.1 FASTA
2.3.2 FASTQ
2.3.3 SAM/BAM
2.3.4 VCF
2.4 Software tools
2.4.1 Utility Tools
2.4.2 BLAST
2.4.3 Alignment
2.4.4 Mapping
2.4.5 Variant Calling
3 Related Work
3.1 Introduction
3.2 Workflow Systems
3.2.1 Formalism
3.2.2 Formal description
3.2.3 Workflow editor
3.2.4 Application domain
3.2.5 Control flow
3.2.6 Loops
3.2.7 Data flow
3.2.8 Storage
3.2.9 Execution engine
3.2.10 Execution middleware for grid or cluster
3.2.11 Enactment layer
3.2.12 Support for legacy code
3.2.13 Hierarchical workflows
3.2.14 Error handling
3.2.15 Workflow repository
3.3 Workflow interoperability
3.3.1 Interoperability between Taverna and Galaxy
3.4 Workflow patterns
3.5 Workflows and Cloud computing
3.5.1 Galaxy and the cloud
4 Taverna and Galaxy
4.1 Introduction
4.2 Taverna
4.2.1 Architecture
4.2.2 Workflow description language
4.2.3 Workflow execution engine
4.3 Galaxy
4.3.1 Architecture
4.3.2 Workflow description language
4.3.3 Workflow execution engine
4.4 Comparison of Taverna and Galaxy
5 Tavaxy
5.1 Introduction
5.2 Tavaxy architecture
5.3 Workflow pattern database
5.3.1 Control patterns
5.3.2 Advanced data patterns
5.4 Workflow authoring tool
5.5 Workflow description language
5.5.1 Tool description file
5.6 Workflow mapper
5.6.1 Executable workflow
5.6.2 Translating workflow description languages
5.6.3 Workflow execution optimization
5.7 Integrating Galaxy and Taverna workflows in Tavaxy
5.8 Workflow engine
6 Cloud computing support inside Tavaxy
6.1 Introduction
6.2 Amazon Web Services
6.3 elasticHPC
6.4 Scenarios
6.5 Tavaxy on the Cloud
6.6 Tool on the Cloud
6.7 Subworkflow on the Cloud
7 Case studies and Experiments
7.1 Overview
7.2 Case study I
7.2.1 Composing heterogeneous sub-workflows on Tavaxy
7.2.2 Measuring the performance
7.3 Case study II
7.3.1 Metagenomics workflow
7.3.2 Measuring the performance
7.4 Use of cloud computing
8 Conclusion and future work
8.1 Conclusion
8.1.1 Pattern based workflows interoperability
8.1.2 Use of patterns
8.1.3 Cloud computing support
8.2 Publications
8.3 Future work
A Translating SCUFL to tSCUFL
A.1 Introduction
A.1.1 Taverna language
A.1.2 Tavaxy language
A.1.3 Galaxy language
A.2 Simple Workflow example
A.2.1 SCUFL representation
A.2.2 Galaxy JSON representation
A.3 Translating processors and links
A.3.1 Translation with replacement
A.3.2 Translation without replacement
A.4 Translating dependency links
A.5 Translating conditionals
A.6 Translating Iterations
B Workflow management systems survey
B.1 Introduction
B.2 General features
B.3 Supported workflows
B.4 Workflow execution
B.5 Storage and sharing
C Patterns implementation
C.1 Introduction
C.2 Switch pattern
C.3 Data select pattern
C.4 Data merge pattern
C.5 Iteration pattern
Bibliography
520 3# - Abstract
Abstract Sequence analysis is one of the most important domains within bioinformatics, because the nucleotide sequence of DNA determines the structure and function of the proteins it encodes, which in turn control the functions and reproduction of the whole cell.
The sequence analysis domain includes diverse tasks such as sequence alignment, protein structure prediction, and database search. These tasks are addressed by a growing set of tools that differ in usage and computational requirements, and recent analyses often require running multiple tools in a pipelined fashion. Workflow management systems have emerged to facilitate the design and execution of such pipelines.
In the last few years, many workflow systems have been developed; within the bioinformatics community, Taverna and Galaxy are the most popular. However, the two are not interoperable: a workflow developed for one system cannot run on the other. Furthermore, not all of these systems can easily handle large volumes of biological data, such as those produced by Next Generation Sequencing technologies, and cloud computing, an important emerging paradigm, is not yet fully exploited in an easy-to-use way.
In this thesis, we present Tavaxy, a pattern-based bioinformatics workflow management system with cloud computing support. It brings together the features of Taverna and Galaxy: a workflow written for either system can be imported into Tavaxy, modified within its environment, and executed. In addition, workflows within Tavaxy can run on cloud computing infrastructure. We also provide a set of data patterns that facilitate large-scale parallel processing of large datasets.
546 ## - Language Note
Language Note Text in English, abstracts in English.
650 #4 - Subject
Subject Informatics-IFM
655 #7 - Index Term-Genre/Form
Source of term NULIB
focus term Dissertation, Academic
690 ## - Subject
School Informatics-IFM
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Thesis
Holdings
Source of classification or shelving scheme Dewey Decimal Classification
Not for loan Not For Loan
Home library Main library
Current library Main library
Date acquired 01/12/2021
Full call number 610/ SE.I 2013
Date last seen 01/12/2021
Price effective from 01/12/2021
Koha item type Thesis