Extending Grid-based Workflow Tools with Patterns/Operators

Cecilia Gomes (1), Omer F. Rana (2) and Jose Cunha (1)
(1) CITI Center, Universidade Nova de Lisboa, Portugal
(2) School of Computer Science/Welsh e-Science Center, Cardiff University, UK

Abstract

Many Grid applications involve combining computational and data access components into complex workflows. A distinction is generally made between mechanisms to compose components (referred to as build time functions) and subsequently mechanisms to execute these components on distributed resources (referred to as run time functions). An approach to supporting such build and run time functions using specialist patterns and operators is presented. "Structural" patterns may be treated as meta-components within a workflow system, and used within the composition process. Subsequently, such components may be scheduled for execution using "Behavioural" patterns via the enactment process. Application examples demonstrate how such patterns – and subsequently operators – may be used. Their implementation within the Triana Problem Solving Environment is also described.

1 Introduction and Motivation

The importance of workflow in Grid computing applications has been recognised by a number of researchers [18, 1]. Generally such workflow involves the orchestration of services which are hosted on different resources, often in different administrative domains. According to [2], Grid workflow management can be characterised into build time functions and run time functions. Build time functions relate to defining and modelling the workflow tasks/services and their dependencies. A variety of approaches exist for such build time specification, ranging from Directed Acyclic Graphs (DAGs) to language-based formalisms. Run time functions, on the other hand, relate to the management of workflow execution/enactment and interactions with Grid resources; examples include the use of constraint checking mechanisms [24]. This separation is useful because it allows a given build time function to be mapped to multiple run time functions and vice versa. The run time function is built over an existing Grid middleware system (such as the Globus toolkit or UNICORE).

Existing approaches to workflow construction (build time functions) involve connecting multiple components according to their dependencies. The granularity of each component can vary from a complete application, to a library component, to a subroutine. Given such a composition, it is possible to develop some general patterns identifying how components within a workflow may be combined. Hence, it is possible to identify a catalogue of such common patterns, as undertaken in [19] in the context of business services. Such patterns form the basis for structuring a particular application, and could also be included within a workflow system and instantiated to particular executable components at run time. A second approach to build time functions involves the dynamic construction of a workflow from application-level descriptions of a given data product. In this instance, given a data product, a planning engine is used for selecting appropriate components to achieve the required data product. The approach can also be extended to a run time function, requiring the planner to also discover suitable computational resources to execute these components. Such an approach is used to first create an "abstract" workflow – identifying suitable components based on the description of their capabilities, and combining these together using a pre-defined plan library. The abstract workflow is then mapped into a "concrete" workflow by associating each component with an executable task (identifying the location of input/output files, configuration parameters, and access rights) [3]. The use of planning approaches in composing Web Services has also been considered by other authors, such as with OWL-S [21] and more recently WSMO [22]. Such approaches involve the use of a rich semantic annotation with each service, and a process graph that represents relationships between services. OWL-S, for instance, makes use of a "service profile" (defining the I/O model of the service), a "service model" (identifying how the service works), and a "service grounding" (identifying how the service should be executed). A process model (atomic, simple or composite) is then used to combine services, using flow-control constructs such as sequence, iterate, repeat-until, etc. An alternative approach involves the use of planning as model checking. In this approach an executor, given a goal to achieve, identifies all sets of possible states that can be valid given the goal state and the current input data. These states are then pruned based on other knowledge (such as "beliefs") available to the executor. A key feature of this approach is that the executor can only partially observe the domain within which composition is taking place – therefore imprecise knowledge of the domain is assumed. However, it is necessary to build a model of the domain – involving a detailed understanding of the starting state and a description of all the interaction protocols involved. This requirement renders the approach difficult to utilize in the context of a distributed Grid system.

1.1 Extending Workflow Tools

Our approach lies between a user-defined workflow (both build and run time) and an automated workflow using the planning/semantic approaches identified above. We believe the semantic approaches are too complex for many existing Grid applications, as they require application users to provide annotations for their existing components. Furthermore, finding equivalence between semantically similar component descriptions is a non-trivial undertaking. On the other hand, model checking approaches require a detailed description of the protocols involved within a particular domain and could lead to a high computational complexity to identify the set of all valid states – in the context of one or more protocols. We therefore propose an approach that makes use of design patterns at the compositional level (referred to as "structural patterns") and at the behaviour/run time level (referred to as "behavioural patterns"). These patterns can then be manipulated using scripting tools through the use of "structural" and "behavioural operators". Hence we: (1) extend the capabilities of existing workflow systems with support for Patterns and Operators, and (2) map the "Behavioural" patterns and operators to a resource management system. In this way we address both build time and run time functions, and provide an option midway between a full semantic annotation of components for automated composition and a totally user-defined composition using a graphical tool.

Patterns allow the abstraction of common interactions between components, thereby enabling reuse of interactions that have proven useful in similar domains. A user may build an application in a structured fashion by selecting the most appropriate set of patterns, and combine them according to pre-defined operator semantics. Users may define new patterns and add these as standard components to a tool library for use by others. Patterns and Operators also provide additional capability that is not easily representable via visual components. Our approach treats patterns as first class entities but differs from other work [5, 6] in that the user may explicitly define structural constraints between components, separately from the behavioural constraints. Our approach is somewhat similar to that of van der Aalst et al. [17] – although they do not make a distinction between structural and behavioural patterns. Their work does not have a notion of "operators" as a means to manipulate such patterns. They also focus on Petri net models of their patterns, whereas our concern is to link patterns with particular resource managers and composition tools. Furthermore, the approach we adopt enables a workflow designer to identify constraints on the enactment process. The mapping of operators to a resource manager is therefore an aspect not considered by other Grid-based workflow systems.

The approach presented here is primarily aimed at computational scientists and developers, who have some understanding of the computational needs of their application domain. A scientist should be aware of the likely co-ordination and interaction types between components of the application (such as a database or numeric solver). The structural and behavioural patterns presented here will enable such scientists and developers to utilise common usage scenarios within a domain (either the use of particular components, such as database systems, or interactions between components, such as the use of streaming). The paper is structured as follows: section 2 introduces our concept of Patterns and Operators, and section 3 demonstrates how these are implemented in the Triana system [7]. The mapping of behavioural patterns and operators to the DRMAA API [8] is then provided in section 5.

2 Structuring Workflow with Patterns and Operators

A workflow pattern encodes a commonly recurring theme in component composition. A pattern is therefore defined in an application independent manner, and is particularly useful for configuring and specifying systems that are composed of independent sub-domains. Patterns are aimed at capturing some generic attributes of a system – which may be further refined (eventually) to lead to an implementation. These are important requirements for Grid computing applications, which generally need to operate in dynamic environments. When using patterns, Grid application developers may deploy previously generated pattern templates as an initial step, and then refine these based on our operators. The use of pattern operators is also particularly important to deal with dynamicity, because they provide the capability to modify a pattern at run time. Furthermore, pattern operators may be applied in an ordered combination – and may be shared between users. The presented structural and behavioural operators may be implemented using a number of different scripting languages (such as Python or Perl) – and therefore the specified semantics are not restricted to our Java implementation. A brief overview of our pattern templates and operators is provided here; details can be found in [10, 11]. These references also contain semantics of the patterns and operators discussed here, along with Petri net models of some of these operators.

Alternative related work has been undertaken by the parallel computing community, and is based on the use of algorithmic skeletons. The predominant motivation behind this has been the need to overcome the difficulty of constructing parallel programs – by capturing common algorithmic forms which may subsequently be used as components for building parallel programs [27], [28]. Such skeletons are expected to provide parameterizable abstractions that may be composed – generally using a functional programming language. A skeleton is expected to be transparent to an application user (and may come with a prepackaged implementation). Skeletons are viewed formally as polymorphic, higher-order functions – which may be repeatedly applied to achieve various transformations (on data structures such as lists). Herrmann and Lengauer [26] outline the use of a programming language "Higher-order Divide and Conquer" (HDC) based on a subset of the functional programming language Haskell. They suggest that the use of a powerful type system in functional languages makes them more suitable than other paradigms. Although useful for specifying programs in a concise syntax, we believe such approaches are limited in the context of Grid environments. This is primarily due to the absence of tools in such languages for connecting to Grid middleware, such as Globus or UNICORE, although skeleton-based approaches do provide a useful prototyping tool for analysis. Our use of "operators" borrows from the use of transformation techniques in skeleton based approaches, albeit our focus is on the use of object-oriented techniques. Furthermore, our design patterns and operators are aimed at supporting workflow-based systems, and are not focused on use within a particular programming environment.

Structural Pattern Templates encode component connectivity, representing topologies like a ring, a star or a pipeline, or design patterns like Facade, Proxy or Adapter [14]. The possibility of encoding these structural constraints allows, for example, the representation of common software architectures in high-performance computing applications. For example, the pipeline pattern may be used in a signal processing application where the first stage may consist of a signal generator service producing data to a set of intermediate stages for filtering. Frequently, the last stage consists of a visualization service for observing results. Similarly, the proxy pattern allows the local presence of an entity's surrogate, providing access to the remote entity.

Behavioural Pattern Templates capture recurring themes in component interactions, and define the temporal and the (control and data) flow dependencies between the components. Generally, these applications involve distribution of code from a master, the replication of a code segment (such as within a loop), or parameter sweeps over one or more indices. The separation of 'structure' from 'behaviour' allows the selection of the most adequate combinations for a particular application. We provide several behavioural patterns such as Master-Slave, Client-Server, Streaming, Peer-to-Peer, Mobile Agents/Itinerary, Remote Evaluation, Code-on-Demand, Contract, Observer/Publish-Subscriber, Parameter Sweep, Service Adapter, and so on. For example, the Service Adapter pattern attaches additional properties or behaviours to an existing application to enable it to be invoked as a service [15]. The Master-Slave pattern, in turn, can be mapped to many parallel programming libraries, and represents the division of a task into multiple (usually independent) sub-units – and shares some similarities with the Client-Server pattern, although the control flow in the latter is more complex.
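To make these notions concrete, the following listing sketches how a structural pattern template and two of its structural operators (Increase and Embed) might be modelled. It is only an illustrative sketch in Java – the class and method names are ours and do not correspond to the actual Triana classes:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a structural pattern template is an ordered collection of
// place-holders ("dummy units") whose connectivity is fixed by the chosen pattern;
// structural operators refine the template without breaking that connectivity.
class PatternTemplate {

    private final String patternName;             // e.g. "Pipeline", "Star", "Proxy"
    private final List<Object> slots = new ArrayList<>();

    PatternTemplate(String patternName, int elements) {
        this.patternName = patternName;
        for (int i = 0; i < elements; i++) {
            slots.add("DummyUnit" + i);            // empty component place-holder
        }
    }

    // Structural operator "Increase": add n place-holders, preserving the topology.
    void increase(int n) {
        int base = slots.size();
        for (int i = 0; i < n; i++) {
            slots.add("DummyUnit" + (base + i));
        }
    }

    // Structural operator "Embed": nest another template inside one place-holder.
    void embed(int slotIndex, PatternTemplate inner) {
        slots.set(slotIndex, inner);
    }

    // Instantiation: bind a place-holder to an executable component from a toolbox.
    void instantiate(int slotIndex, String componentName) {
        slots.set(slotIndex, componentName);
    }

    @Override
    public String toString() {
        return patternName + slots;
    }
}

A Proxy template, for instance, could be created with new PatternTemplate("Proxy", 2) and refined with increase(2), in the spirit of the Increase example shown in figure 1.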

Figure 1: The increase and extend structural operators.

Structural Operators support the composition of structural patterns, without modifying the structural constraints imposed on the pattern. These operators provide a user with a simple and flexible way to refine structural patterns. There are several structural operators such as increase, decrease, extend, reduce, rename, replace, replicate, embed, etc. For example, figure 1 shows the result of applying the increase and extend operators to the Proxy pattern. The semantics of these operators can be found in [10].

Behavioural Operators are applied over the structural pattern templates combined with the behavioural patterns, after instantiating the templates with specific runnable components. Behavioural operators act upon pattern instances for execution control and reconfiguration purposes. Behavioural operators include: Start (starts the execution of a specific pattern instance), Stop (stops the execution of a pattern instance, saving its current state), Resume (resumes the execution of a pattern instance from the point where it was stopped), Terminate (terminates the execution of a specific pattern instance), Restart (allows the periodic execution of a pattern instance), Limit (limits the execution of a specific pattern instance to a certain amount of time; when the time expires the execution is terminated), Repeat (allows the repetition of the execution of a specific pattern a certain number of times), etc. Both structural operators and behavioural operators can be combined into scripts which may later be reused in similar applications.

Figure 2: The Triana Graphical User Interface showing a ring and a star structural pattern

When using Patterns and Operators, a user would develop their application in the following way:

1. Provide a structure definition by selection of structural patterns and, if necessary, their refinement through structural operators;
2. Provide a behaviour definition (i.e. definition of data and control flows) by selection of behavioural patterns;
3. Manipulate actual data flows between components to support component execution by the use of behavioural operators;
4. If necessary, provide any dynamic reconfiguration of the workflow by behavioural and structural operators.

We implement these patterns and operators within the Triana Problem Solving Environment and the DRMAA API. Currently, we do not provide any support to a workflow developer about which structural or behavioural pattern/operator would be most suitable in a given context. This decision is left purely to the developer. Consequently, the same workflow outcome can be achieved in multiple possible ways. We therefore do not provide any "methodology" for the selection of patterns or operators. A summary of the patterns and operators that are currently supported can be found in Table 1.
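As a rough, self-contained illustration of the four steps above (and not of the real Triana API – every identifier below is invented for the example), a composition could be expressed programmatically as follows:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the four composition steps; no Triana classes are used.
public class CompositionStepsSketch {
    public static void main(String[] args) {
        // 1. Structure: a star with a nucleus and three satellites, refined with the
        //    Increase structural operator (one extra satellite).
        List<String> star = new ArrayList<>(List.of("nucleus", "sat1", "sat2", "sat3"));
        star.add("sat4");

        //    Embed structural operator: nest a three-stage pipeline in one satellite.
        List<String> pipeline = List.of("stage1", "stage2", "stage3");
        star.set(1, "pipeline(" + String.join(" -> ", pipeline) + ")");

        // 2. Behaviour: choose the behavioural pattern governing data/control flow.
        String behaviouralPattern = "Streaming";

        // 3. Instantiate the place-holders with executable components.
        Map<String, String> bindings = new LinkedHashMap<>();
        bindings.put("nucleus", "WaveGenerator");
        bindings.put("stage1", "Gaussian");
        bindings.put("stage2", "FFT");
        bindings.put("stage3", "SGTGrapher");

        // 4. Behavioural operators (Start, Restart, Terminate, ...) would then drive
        //    and reconfigure the instantiated pattern; here we only print it.
        System.out.println(behaviouralPattern + " workflow: " + star + " " + bindings);
    }
}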

3 Implementation over the Triana Workflow Environment

The Triana Problem Solving Environment provides a composition environment that allows a set of components to be combined, and subsequently a mechanism for distribution of components [16]. The composition environment is developed so that it can be used on its own, alongside a variety of different execution engines. An XML-based task graph is generated from the composition tool, and supports bindings for distributing components using Web Services or Peer-2-Peer technologies (based on JXTA). Triana also requires a Triana execution environment to exist on each node that is to host a Triana service. This is a significant difference from existing portal technologies [1], which assume the presence of a hosting environment on resources. Our prototype extends Triana [12], and allows developers to utilise a collection of pre-defined patterns from a library. Triana comes with components (called units) for signal processing, mathematical calculations, audio and image processing, etc., and provides a wizard for the creation of new components, which can then be added to the toolbox.

Structural Patterns appear as standard components that can be combined with other patterns or executable units. Triana provides both a composition editor and a deployment mechanism to support this. The Pattern library provided within Triana treats patterns as "group units" (i.e. units made up of others). Each element within such group units is a "dummy" component (or place holder) and can subsequently be instantiated with executables from the Triana toolbox. Hence, structural pattern templates are collections of dummy components that can be instantiated with other structural pattern templates or with executables. Every pattern has an associated 'pattern controller/executor' which enables refinement of structural patterns via operators, and allows enactment of the selected Behavioural patterns and operators.

A Galaxy simulation application with Triana is illustrated in figure 3. The Galaxy formation example may be represented by a star pattern template, where the nucleus contains the actions necessary to generate and control the animation execution, and the satellites represent image processing and analysis actions. Both the actions at the nucleus and at the satellites are supported by pipeline templates. Figure 4 shows how to embed a pipeline into the nucleus of the star (called DummyUnit).

Table 1: Pattern Templates and Operator Summary

Structural patterns: Pipeline, Star, Ring, Bus, Adapter, Proxy, Facade
Structural operators: Rename, Replace, Increase, Decrease, Extend, Reduce, Replicate, Embed, Group/Aggregate
Behavioural patterns: Master-Slave, Streaming, Client-Server, Peer-2-Peer, Mobile Agents/Itinerary, Remote Evaluation, Code-on-Demand, Contract, Observer/Subscribe-Publish, Parameter Sweep
Behavioural operators: IsEqual, IsRecursive, IsDisjoint, IsSubset, IsSuperset, IsComposite, IsInComposite, IsCompatible, IsOwner, Owner, OwnerGroup, AssignActivity, RemoveActivity, Start, Terminate, Stop, Log, Resume, Restart, Limit, Repeat, Steer, ChangeDependencies.Synchronise, ChangeDependencies.ChangeDataFlow, ChangeDependencies.ChangeControlFlow, ChangeDependencies.ChangeSharedDataDependencies

Figure 4: Combining a pipeline pattern template with a star pattern template

Figure 3: Final configuration: image processing in the "Galaxy Formation example"

4 Application Usage

The requirement for distributed data and computational resources has been demonstrated in a number of astrophysics applications, such as in the AstroGrid [23] and GEO-LIGO [25] projects. Applications in these projects require high-performance computing support for undertaking data analysis on various data types (text, images) and involve large quantities of time-based data. Data capture and analysis additionally involve scientists and instruments which are geographically distributed – requiring a workflow engine to coordinate execution across different sites. We demonstrate through two application scenarios the use of our patterns and operators within Triana.

4.1 GEO-LIGO Data Analysis and Visualisation

The GEO-LIGO project, in particular, is related to gravitational wave experiments [13] where data captured from laser interferometers (such as GEO600, LIGO and VIRGO) needs to be accessed and analysed. Similarly, galaxy and star formation using smoothed particle hydrodynamics generates large data files containing snapshots of an evolving system stored in 16 dimensions. Typically, a simplistic simulation would consist of around a million particles and may have raw data frame sizes of 60 Mbytes, with an overall data set size of the order of 6 GBytes. The dimensions describe particle positions, velocities and masses, the type of particle, and a smoothed particle hydrodynamics radius of influence. After calculation, each snapshot is entirely independent of the others, allowing distribution over the Grid for independent data processing and graphic generation.

Figure 5 shows a simple example where a wave detector is producing data to be analysed and displayed by several services – allowing multiple scientists to compare the results of analysis. The example illustrates two visualisation services, and a transformation and visualisation pipeline. To configure this application example, a user first identifies the relevant structural patterns: a star pattern with four elements is created to represent the connections between the Wave Detector service (figure 5) and the transformation and visualisation services. In turn, to support the Transformation and Visualisation service (figure 5) the user creates a pipeline pattern with three elements. To obtain the right number of component place holders in both pattern templates (PTs), the user may apply the Increase() or the Decrease() operators. Subsequently, the user combines both pattern templates by embedding the pipeline pattern into one of the star pattern elements. Finally, suitable components are instantiated into these elements.


Figure 5: Analysing gravitational wave data

This example is implemented in Triana by modelling the output of the gravitational wave detector with a component which generates a wave with parameterisable amplitude, type (sawtooth, sinusoid) and frequency. The first configuration step is the creation of the two required pattern templates: a Star and a Pipeline. In order to create a Star PT, the user drags and drops the DrawStar unit from Triana's Patterns toolbox and initialises it (figure 6). In this case, it is necessary to increase the number of elements (figure 7). For the creation of the Pipeline PT, the user selects the DrawPipeline unit and repeats the process.

Figure 6: Initialisation of a Star pattern template

Two PTs with Dummy Units representing component place holders are added in the workflow composition tool. The DummyUnit may be instantiated to a structural pattern template, or to an executable component from the Triana toolbox. The next configuration step is to structure the two templates so that the Transformation and Visualisation service is connected to the Wave Detector service. The Embed operator is used on the Star PT to associate the Pipeline PT with one of the DummyUnits. Finally, the user instantiates the pattern templates with the necessary executable components from the Triana toolbox. An example can be seen in figure 8.

Figure 7: Addition of one satellite to a Star PT

Figure 8: Instantiation of a Unit.

Figure 9 shows the final configuration after all the template slots have been instantiated. The gravitational wave detector (figure 5) is represented in this example by the Wave unit. Two graphical displaying units for rendering input signals are selected to represent the visualisation services: the SGTGrapher and the Histogrammer. The selected transformation services for instantiating the first two pipeline stages are the Gaussian unit (which adds noise to the data generated by the Wave) and the FFT unit (which performs a Fast Fourier transform).

Figure 9: Final configuration.

4.1.1 Execution and Configuration Scripts

The data analysis described above may also be automated through a script – which demonstrates a more powerful use of the behavioural operators. For instance, where a scientific instrument is constantly producing data, it would be useful to restart the data analysis application periodically to automatically analyse this data. To achieve this, we develop a simulation script that includes the "Restart" Behavioural operator, resulting in re-launching of the execution every 20000 milliseconds. The restarting operation can then be aborted at any time by calling the "Terminate" Behavioural operator. The script is created with a text editor in declarative style, and consists of references to the behavioural and structural operators – applied on instances of patterns within the Triana workflow. Each line in the script is used to launch a separate activity to manipulate the workflow, and the script interpreter blocks until a given activity completes. Elements such as Instantiate are used to map executable component instances to DummyUnits within a pattern template. It is also possible for the script to only partially specify the contents of DummyUnits, allowing a user to manually associate the remaining DummyUnits using a graphical editor. The interpreter of the script is referred to as the "Pattern Controller" and works alongside the workflow enactor in Triana.

1: Initialize
2: Increase 1
3: Create Pipeline TransfVisSrv
4: RunStructuralScript TransfVisSrv
5: Instantiate DummyUnit /toolboxes/SignalProc/Injection/Gaussian.xml
5: Instantiate DummyUnit1 /toolboxes/SignalProc/Algorithms/FFT.xml
6: Instantiate DummyUnit2 /toolboxes/SignalProc/Output/SGTGrapher.xml
7: EndStructuralScript
8: Embed TransfVisSrv DummyUnit1
9: Instantiate DummyUnit /toolboxes/SignalProc/Input/Wave.xml
10: Instantiate DummyUnit2 /toolboxes/SignalProc/Output/Histogrammer.xml
11: Instantiate DummyUnit3 /toolboxes/SignalProc/Output/SGTGrapher.xml
12: Restart 20000

Figure 10: Regular data production by the gravitational wave detection service

The script is run by the pattern controller of a Star PT, which performs the following steps (the number references relate to the lines in the script above): a) creates the Star (line 1); b) adds one satellite to the Star (line 2); c) creates a Pipeline PT (named TransfVisSrv – line 3) and instantiates all its slots (called DummyUnit(i) – lines 4-7); d) embeds the Transformation and Visualisation service (TransfVisSrv) into the first satellite (DummyUnit1 – line 8); e) instantiates the rest of the empty slots of the template (lines 9-11); and f) applies the behavioural operator Restart, in order to execute the instantiated Star every 20000 milliseconds (line 12). The behaviour relies on the use of the Streaming Behavioural pattern, which is supported by default.
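The Restart and Terminate operators used in this script can be pictured as a small scheduling loop wrapped around the pattern instance. The sketch below (plain Java, not the Triana pattern controller) only illustrates the intended semantics of re-launching every 20000 milliseconds until a Terminate is issued:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the Restart/Terminate behavioural operators: the pattern
// instance is re-executed at a fixed period until Terminate is applied.
public class RestartOperatorSketch {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Restart: launch the pattern instance periodically (period in milliseconds).
    void restart(Runnable patternInstance, long periodMillis) {
        scheduler.scheduleAtFixedRate(patternInstance, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Terminate: abort the periodic re-execution.
    void terminate() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        RestartOperatorSketch controller = new RestartOperatorSketch();
        // Stand-in for the instantiated Star pattern (wave generation and analysis).
        controller.restart(() -> System.out.println("executing pattern instance"), 20000);
        Thread.sleep(60000);      // let the instance run a few times
        controller.terminate();   // equivalent to applying the Terminate operator
    }
}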

Figure 11: Producing different waves every 10 seconds

4.1.2 Production Use

The gravitational wave data analysis example described in sub-section 4.1.1 does not take into account the regular production of data by the wave detector. To simulate such a situation, we describe a configuration (figure 10) where the Count tool produces different values, at each execution, for the frequency parameter of the Wave tool. The frequency starts at a value of 100 Hz, and is increased at each execution by 100 Hz up to a maximum of 4000 Hz. Consequently, the Wave tool produces different waves which can be visualised in the SGTGrapher tool.

Figure 14: The debug window showing the execution of the Terminate Behavioural operator

In the same way as described in section 4.1.1, the Restart Behavioural operator can be applied to the simulation in order to generate a sequence of different waves at a fixed time period (10 seconds). Figure 12 shows two consecutive snapshots of the SGTGrapher tool.

The automatic re-execution can be stopped at any time by applying the Terminate Behavioural operator (figures 13 and 14). The Count tool remembers the intermediate value of the frequency parameter from the last execution. Therefore, the user may, for example, repeat the execution a certain number of times, restarting from the previously saved frequency value. Figure 15 shows the selection of the Repeat Behavioural operator for repeatedly launching the execution of the simulation a certain number of times (in this case, 10 times). In this way, the user can see the result after each consecutive iteration. The debug window in the figure shows that the Repeat operator was repeatedly called.

The second simulation, involving frequent data production, could also be achieved by modifying the configuration in figure 9 – containing a pipeline PT as the nucleus of the star PT – connecting the Count and Wave components. Modification in this example could be achieved through the use of structural operators; for instance, the Increase structural operator could be used to extend the pipeline PT, thereby providing another DummyUnit to host the Count component. An alternative would be to use the Replace operator to swap the existing pipeline PT with another one containing two DummyUnits instead of one.

Figure 12: Two different waves produced at two consecutive execution steps

Figure 13: Selection of the Terminate Behavioural operator

4.2 Database Access

As in astrophysics, other scientific experiments in Environmental and Genetic Sciences, Nuclear Physics, or Earth/Ocean Surface Topography also require the distributed storage of data across different organisations, and their subsequent manipulation by many users. One common characteristic of such applications is database enquiry, where data may be spread over several databases. Such databases may be populated through simulation engines or directly via scientific instruments producing data in real time. We demonstrate how the Facade PT may be used to access several databases within a workflow, where the databases may either be replicated, or store different types of data that needs to be aggregated. The Facade PT therefore provides a uniform interface to send a query to these databases. Behaviour associated with such a pattern may involve the redirection of queries to a particular database (based on content or request type), or based on Quality of Service issues, such as response time.
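A minimal sketch of such a Facade behaviour is shown below; the interface and the response-time policy are invented purely for illustration and are not part of Triana or of the pattern library:

import java.util.List;

// Hypothetical sketch: a Facade that forwards a query either to the sub-workflow with
// the best recent response time (a Quality of Service policy), or to all sub-workflows
// (e.g. when the databases store different data that must be aggregated).
public class DatabaseFacadeSketch {

    interface SubWorkflow {
        String query(String sql);          // e.g. a pipeline: DBExplore -> analysis -> display
        long lastResponseTimeMillis();     // simple QoS measure kept by the sub-workflow
    }

    private final List<SubWorkflow> subWorkflows;

    DatabaseFacadeSketch(List<SubWorkflow> subWorkflows) {
        this.subWorkflows = subWorkflows;
    }

    // Redirect the request to the sub-workflow with the best recent response time.
    String queryFastest(String sql) {
        SubWorkflow best = subWorkflows.get(0);
        for (SubWorkflow s : subWorkflows) {
            if (s.lastResponseTimeMillis() < best.lastResponseTimeMillis()) {
                best = s;
            }
        }
        return best.query(sql);
    }

    // Broadcast the request to every sub-workflow and let each produce its own output.
    void queryAll(String sql) {
        for (SubWorkflow s : subWorkflows) {
            s.query(sql);
        }
    }
}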

Figure 16: Connecting the Facade pattern to two sub-workflows

Figure 16 shows the Pipeline Structural pattern connecting the client application (Requester) to the Facade Structural pattern in Triana. The latter redirects requests to two subsystems already instantiated with two structural patterns: Pipeline and Pipeline1. Both pipelines configure possible associations of databases to data analysis/processing tools. In terms of behaviour, a simple version of the Client/Server Behavioural Pattern represents the data and control flows between the Requester (client) and the Facade (server): the server analyses the requests and redirects them to the subsystems which, in turn, produce data. Additionally, the Master/Slave pattern is another eligible Behavioural pattern to represent the data and control flows between the Facade and the two sub-workflows.

Figure 17: Internal structure of the Pipeline sub-workflows connected to the Facade pattern

The sub-workflow represented by the Pipeline pattern in figure 17 contains, as its first element, the DBExplore component to query a database using SQL requests. The output is processed by the MakeCurve component and displayed with the GraceGrapher component. The Pipeline1 sub-workflow provides access to a different or a replicated database, combined with another visualisation tool (Histogrammer) for output data analysis. The data and control flow in both pipelines is provided by the Streaming Behavioural pattern.

4.2.1 Real Time Analysis

To access real time data, instead of data in a pre-populated database, a user can re-configure the operation in section 4.2 to interact with a Real Time Engine (RTE). To achieve this, the Stop Behavioural operator is applied to the Pipeline Structural pattern involved, followed by an application of the Extend Structural Operator to the Facade Structural pattern – to redirect requests to an RTE in addition to the database. As a result, the existing Facade becomes a sub-workflow of the new Facade pattern, and the pipeline containing the RTE becomes the other sub-workflow. In this way, behaviour associated with the outermost Facade may redirect requests to the innermost Facade pattern, to the RTE, or to both. The Client/Server Behavioural Pattern is used to define the data and control flows between the outermost Facade and its own sub-workflows. Figure 18 illustrates the resulting configuration within Triana.

Figure 18: Re-direction to a RTE in Triana

5 Mapping to the DRMAA API

Execution management associated with the behavioural patterns and operators needs to be supported via an enactment engine. To achieve this, we map our behavioural operators to a resource management API. Hence, behavioural patterns are implemented over the run-time system used to execute the components. There is no visual representation of these, as they are provided as a collection of scripts that need to be configured by a user prior to execution. Current implementation work is focused on the mapping of behavioural patterns over currently available APIs, such as the Java CoG Kit [9] and DRMAA [8]. In this section, we describe how structural and behavioural operators are mapped to the DRMAA API – which can be used with the Sun Grid Engine.

DRMAA provides a generalised API to execute jobs over Distributed Resource Management Systems (DRMSs). It includes common operations on jobs, like termination or suspension. A job is a running application on a DRMS and it is identified by a job id attribute that is passed back by the DRMS upon job submission. This attribute is used by the functions that support job control and monitoring. The DRMAA API uses an interface definition language (with IN, OUT and INOUT parameters), and also provides support for handling errors (via error codes).

Figure 19 shows the steps to configure and execute an application based on the pipeline structural pattern combined with the dataflow behavioural pattern. In step 1, a user defines a pipeline pattern template with three elements, and in step 2 adds an extra element to the pattern template. In step 3 the dataflow behavioural pattern to be applied to all the elements of the pattern template is selected. An entity at the pattern level, the pattern controller/executor, is defined and is responsible for enforcing the selected behavioural pattern at each element. In step 4 all component place-holders are instantiated with components (Applications) that may represent a unit or a group of units organized in a workflow. Step 5 represents the application of the behavioural operators to the pattern instance. The operators are supported by functions in the DRMAA API that manage the execution of the Applications by a resource manager. The execution of each Application is supported by a job (a running executable) in the resource manager. Figure 19 thus shows how behavioural operators act upon pattern instances – essentially pattern templates combined with some behavioural pattern and instantiated with executable applications.

Application execution using DRMAA requires the definition of attributes like the application's name, its initial input parameters, the necessary remote environment that has to be set up for the application to run, and so forth. These attributes are used to explicitly configure the task to be run via a particular resource manager. Although DRMAA has the notion of sessions, only one session can be active at a time. A single DRMAA session for all the operators is assumed. Hence the drmaa_init and drmaa_exit routines are called, respectively, after the pattern instance is created and at the end of the script program. As an example, we show how a pipeline pattern can be mapped to DRMAA.

Element pattern_elements[MAX_ELEMS] contains the Elements that compose a specific pattern instance. Similarly, job_identifiers[MAX_ELEMS] holds the identifiers returned by the drmaa_run_job routine for jobs created to support pattern elements. The order of the activities is preserved. DRMAA variables frequently used: INOUT jt is a job template (opaque handle), and INOUT drmaa_context_error_buf contains a context-sensitive error upon failed return.

Start Operator – to initiate execution of Pipeline Elements.

for( int index = Pipeline.pattern_elements.length - 1; index >= 0; index-- ) {
    /* launch all activities in the pipeline */
    int ret = drmaa_allocate_job_template( jt, drmaa_context_error_buf );
    process_error( ret, drmaa_context_error_buf );

    define_attributes( jt, Pipeline.pattern_elements[index] );

    /* Pipeline.startTime defines the time at which all elements in the
       pipeline instance should start running. */
    ret = drmaa_set_attribute( jt, drmaa_start_time, Pipeline.startTime,
                               drmaa_context_error_buf );
    process_error( ret, drmaa_context_error_buf );

    /* run job */
    ret = drmaa_run_job( job_id, jt, drmaa_context_error_buf );
    process_error( ret, drmaa_context_error_buf );
}


Figure 19: The necessary steps to configure and execute an application using patterns and pattern operators. Please read the figure starting from the bottom.

Repeat Operator – in this instance a single operator is used to re-execute an entire pattern instance a certain number of times ( “n” in the code).

for( int count = 0; count < n; count++ ) {
    Start( Pipeline );

    /* wait for all the jobs that compose the pipeline to terminate;
       timeout is bigger than all job execution times */
    drmaa_synchronize( Pipeline.job_identifiers, timeout, 0,
                       drmaa_context_error_buf );
}
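Other behavioural operators that act on running jobs – Stop, Resume and Terminate – map naturally onto DRMAA's job-control facility. The listings above use the C API; the sketch below shows the same idea with the DRMAA Java binding (org.ggf.drmaa) and is only indicative: the wrapper class and its method names are ours, and mapping Stop/Resume onto job suspension is an assumption rather than the paper's implementation.

import org.ggf.drmaa.DrmaaException;
import org.ggf.drmaa.Session;
import org.ggf.drmaa.SessionFactory;

// Indicative sketch: mapping Stop/Resume/Terminate behavioural operators onto DRMAA
// job control, using the job identifiers returned when the pattern elements were run.
public class JobControlOperators {

    private final Session session;

    public JobControlOperators() throws DrmaaException {
        session = SessionFactory.getFactory().getSession();
        session.init(null);                       // single DRMAA session, as assumed above
    }

    public void stop(Iterable<String> jobIds) throws DrmaaException {
        for (String id : jobIds) session.control(id, Session.SUSPEND);
    }

    public void resume(Iterable<String> jobIds) throws DrmaaException {
        for (String id : jobIds) session.control(id, Session.RESUME);
    }

    public void terminate(Iterable<String> jobIds) throws DrmaaException {
        for (String id : jobIds) session.control(id, Session.TERMINATE);
    }

    public void close() throws DrmaaException {
        session.exit();                           // called once, at the end of the script
    }
}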


6 Conclusion and Future Work

The extension of a Problem Solving Environment (Triana) with Patterns and Operators has been described. Composition is achieved using a pattern-extended graphical interface provided with Triana – whereas execution is managed by mapping Operators to the DRMAA API. We believe a Pattern based approach is particularly useful for reuse of component libraries and for mapping applications to a range of different execution environments. The DRMAA API was selected because of the significant focus it has received within the Grid community – and the availability of commercial resource management systems (such as Grid Engine from Sun Microsystems) that make use of it. We are also investigating alternatives to DRMAA (such as the Java CoG Kit [9]) – primarily as current versions of DRMAA are aimed at executing batch jobs. With the emerging focus on Web Services in the Grid community, the DRMAA API has also lagged behind other equivalent developments (such as the Java CoG kit).

Patterns and operators provide a useful extension to existing workflow engines, as they enable the capture of common software usage styles across different application communities. The pipeline and star structural patterns, for instance, are commonly found in scientific applications (such as integrating a data source with a mesh generator, followed by a visualiser). Describing such compositions in a more formal way (as we have attempted to do here) will enable practitioners to identify common software libraries and tools. This is particularly important as software that performs similar functionality is available from a variety of different vendors. Providing the right balance between tools that require users to possess programming skills, and those that are based on a visual interface, is difficult to achieve. By combining the visual interface of Triana with more advanced patterns and operators, we are attempting to enhance the functionality offered through (a variety of) existing workflow tools. Full usage of these ideas by the applications community is still a future aim for us.

Patterns and operators also provide an important middle ground between mechanisms to automatically compose workflows – using planning approaches – and workflows that are constructed manually by a user. The planning approach requires an application developer to semantically annotate components and develop complex models that represent possible interactions between such components. Often this is a challenging task and hard to achieve in practice. On the other hand, allowing a user to construct a workflow manually ignores the number of common interconnection structures that exist within scientific computing applications. Using a combination of both structural and behavioural operators, we are able to address both build and run time characteristics associated with workflow management.

Acknowledgement: We would like to thank Ian Wang and Matthew Shields of the Triana Group in Cardiff for their help with Triana internals.

References

[1] M. Li and M. A. Baker, "A Review of Grid Portal Technology", in "Grid Computing: Software Environment and Tools" (eds: Jose Cunha and O. F. Rana), Springer Verlag, 2006.

[2] J. Yu and R. Buyya, "A Taxonomy of Workflow Management Systems for Grid Computing", Journal of Grid Computing, Vol:3, pp 171-200, 2006.

[3] J. Blythe, E. Deelman, Y. Gil, C. Kesselman, A. Agarwal, G. Mehta and K. Vahi, "The Role of Planning in Grid Computing", 13th Int. Conf. on Automated Planning and Scheduling (ICAPS), Trento, Italy, 2003.

[4] G. Fox, D. Gannon and M. Thomas, "A Summary of Grid Computing Environments", Concurrency and Computation: Practice and Experience (Special Issue), 2003. Available at: http://communitygrids.iu.edu/cglpubs.htm

[5] B. Wydaeghe, W. Vanderperren, "Visual Composition Using Composition Patterns", Proc. Tools 2001, Santa Barbara, USA, July 2001.

[6] ObjectVenture, The ObjectAssembler Visual Development Environment, Java Sys.-Con. Journal, June 1, 2003. Article available at: http://java.sys-con.com/read/37562.htm. Last accessed: July 2006.

[7] The GridLab project. See Web site at: http://www.gridlab.org/. Last accessed: January 2004.

[8] Habri Rajic, Roger Brobst et al., “Distributed Resource Management Application API Specification 1.0”. Global Grid Forum DRMAA Working Group. See Web site at: http://drmaa.org/wiki/. Last accessed: July 2006.

[9] G. von Laszewski, I. Foster, J. Gawor, and P. Lane, "A Java Commodity Grid Kit", Concurrency and Computation: Practice and Experience, vol. 13, no. 8-9, pp 643-662, 2001. http://www.cogkits.org/

[10] M. C. Gomes, O. F. Rana, J. C. Cunha, "Pattern Operators for Grid Environments", Scientific Programming Journal, Volume 11, Number 3, 2003, IOS Press.

[11] M. C. Gomes, J. C. Cunha, O. F. Rana, "A Pattern-based Software Engineering Tool for Grid Environments", Concurrent Information Processing and Computing proceedings, NATO Advanced Research Workshop, Sinaia, Romania, June 2003, IOS Press.

[12] I. Taylor et al., "Triana" (http://www.trianacode.org/). Triana is the workflow engine for the EU GridLab project (http://www.gridlab.org/). Last visited: January 2004.

[13] The GEO600 project. See Web site at: http://www.geo600.uni-hannover.de/. Last accessed: July 2006.

[14] E. Gamma, R. Helm, R. Johnson, J. Vlissides, "Design Patterns: Elements of Reusable Object-Oriented Software", Addison-Wesley, 1994.

[15] O. F. Rana, D. W. Walker, "Service Design Patterns for Computational Grids", in "Patterns and Skeletons for Parallel and Distributed Computing", F. Rabhi and S. Gorlatch (eds), Springer, 2002.

[16] I. J. Taylor, M. S. Shields, I. Wang and O. F. Rana, "Triana Applications within Grid Computing and Peer to Peer Environments", Journal of Grid Computing, Vol:1, No:2, pp 199-217, 2003.

[17] W. M. P. van der Aalst, A. H. M. ter Hofstede, B. Kiepuszewski, and A. P. Barros, "Workflow Patterns", Distributed and Parallel Databases, 14(3), pp 5-51, July 2003.

[18] A. Slominski, D. Gannon and G. Fox, "Introduction to Workflows and Use of Workflows in Grids and Grid Portals", Global Grid Forum 9, presented in the "Grid Computing Environments" session, Chicago, October 2004.

[19] N. Russell, W. M. P. van der Aalst, A. H. M. ter Hofstede, D. Edmond, "Workflow Resource Patterns: Identification, Representation and Tool Support", Proceedings of CAiSE, pp 216-232, 2005.

[20] M. Pistore, F. Barbon, P. Bertoli, D. Shaparau and P. Traverso, "Planning and Monitoring Web Service Composition", Workshop on Planning and Scheduling for Web and Grid Services, held alongside the 14th International Conference on Automated Planning and Scheduling (ICAPS 2004), Whistler, British Columbia, Canada, June 3-7, 2004.

[21] N. Milanovic and M. Malek, "Current Solutions for Web Service Composition", IEEE Internet Computing, Vol:8, Issue 6, pp 51-59, Nov.-Dec. 2004.

[22] D. Roman, M. Dimitrov and M. Stolberg, "D14v0.1. Choreography in WSMO", DERI Working Draft, 17 July 2004. Available at: http://www.wsmo.org/2004/d14/v0.1/20040717/. Last accessed: July 2006.

[23] "AstroGrid and the Virtual Observatory" Project. Details at: http://www.astrogrid.org/. Last accessed: July 2006.

[24] J. Chen and Y. Yang, "Multiple States based Temporal Consistency for Dynamic Verification of Fixed-time Constraints in Grid Workflow Systems", Concurrency and Computation: Practice and Experience, 2006.

[25] "Laser Interferometer Gravitational Wave Observatory" Project. Details at: http://www.ligo.caltech.edu/. Last accessed: July 2006.

[26] C. A. Herrmann and C. Lengauer, "Transforming Rapid Prototypes to Efficient Parallel Programs", in "Patterns and Skeletons for Parallel and Distributed Computing", F. A. Rabhi and S. Gorlatch (eds), Springer Verlag, 2002.

[27] M. I. Cole and A. Zavanella, "Coordinating Heterogeneous Parallel Systems with Skeletons and Activity Graphs", Journal of Systems Integration, 10(2), pp 127-143, 2001.

[28] S. Gorlatch, "Extracting and implementing list homomorphisms in parallel program development", Science of Computer Programming, 33(1), pp 1-27, 1998.


Figure 15: Applying the Repeat Behaviour Pattern for launching the execution ten times
