Training at CyberInfrastructure Partnership

Event Announcement

Cyberinfrastructure Seminar Series

Tuesday, October 17, 2006

Accelerating the Scientific Exploration Process with Scientific Workflows

Ilkay Altintas, SDSC
11:00 AM - 12:30 PM (PDT)
1:00 PM - 2:30 PM (CDT)
Speaker Location: SDSC
NCSA Room 3000 Via Access Grid
AG Venue: http://agschedule.ncsa.uiuc.edu/meetingdetails.asp?MID=17602

Although an increasing amount of middleware has emerged in the last few years for remote data access, distributed job execution, and data management, orchestrating these technologies with minimal overhead remains a difficult task for scientists. Scientific workflow systems improve this situation by creating interfaces to a variety of technologies and automating the execution and monitoring of workflows. They provide domain-independent, customizable interfaces and tools that combine different cyberinfrastructure technologies with efficient methods for using them. As simulations and experiments move into the petascale regime, the orchestration of long-running, data- and compute-intensive tasks is becoming a major requirement for the successful steering and completion of scientific investigations.

A scientific workflow is the process of combining data and processes into a configurable, structured set of steps that implement semi-automated computational solutions to a scientific problem. Kepler (http://kepler-project.org) is a cross-project collaboration whose purpose is to develop a domain-independent scientific workflow system. It provides a workflow environment in which scientists design and execute scientific workflows by specifying the desired sequence of computational actions and the appropriate data flow, including required data transformations, between these steps. Currently deployed workflows range from local analytical pipelines to distributed, high-performance and high-throughput applications, which can be both data- and compute-intensive. The scientific workflow approach offers a number of advantages over traditional scripting-based approaches, including ease of configuration, improved reusability and maintenance of workflows and components (actors), automated provenance management, "smart" re-running of different versions of workflow instances, parameters that can be updated on the fly, monitoring of long-running tasks, and support for fault tolerance and recovery from failures.
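To make the idea concrete, the sketch below shows a workflow as a sequence of reusable steps (actors) with configurable parameters and a simple provenance record. It is a minimal illustration in Python and does not use or represent the Kepler API; the names Actor, Workflow, scale, threshold, and mean are hypothetical and were chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Actor:
    """A named, reusable workflow step (hypothetical; not a Kepler class)."""
    name: str
    func: Callable[..., Any]
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Workflow:
    """Runs actors in sequence, passing each actor's output to the next."""
    actors: List[Actor]
    provenance: List[Dict[str, Any]] = field(default_factory=list)

    def run(self, data: Any) -> Any:
        for actor in self.actors:
            result = actor.func(data, **actor.params)
            # Record which actor ran, with which parameters, on which data.
            self.provenance.append({
                "actor": actor.name,
                "params": dict(actor.params),
                "input": data,
                "output": result,
            })
            data = result
        return data


def scale(values, factor=1.0):
    """Multiply every value by a configurable factor."""
    return [v * factor for v in values]


def threshold(values, minimum=0.0):
    """Keep only values at or above a configurable minimum."""
    return [v for v in values if v >= minimum]


def mean(values):
    """Average the remaining values (0.0 for an empty list)."""
    return sum(values) / len(values) if values else 0.0


workflow = Workflow([
    Actor("scale", scale, {"factor": 2.0}),
    Actor("threshold", threshold, {"minimum": 3.0}),
    Actor("mean", mean),
])

result = workflow.run([1.0, 2.0, 3.0])
print(result)
for record in workflow.provenance:
    print(record["actor"], record["params"], record["output"])
```

Changing a parameter (for example, workflow.actors[1].params["minimum"] = 5.0) and calling run again reuses the same actors without editing their code. This is the kind of reconfiguration that is harder to manage in an ad hoc script.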

This seminar presents an overview of common scientific workflow requirements and their associated features that are lacking in current state-of-the-art workflow management systems. These features are then illustrated using the Kepler workflow system, from both a user's and a "workflow engineer's" point of view. In particular, the use of some of Kepler's current features in several scientific applications is highlighted, as well as upcoming extensions and improvements geared toward different user communities.

The Cyberinfrastructure Seminar Series is a set of presentations on cyberinfrastructure and related research organized by NCSA and SDSC. These seminars are available on site at the presenting institution and remotely via the Access Grid. All Access Grid sites are welcome to participate in this seminar. If you have any questions about this seminar, contact the NCSA Training Group.