IEICE TRANS. INF. & SYST., VOL.E83–D, NO.7 JULY 2000


PAPER

ICU/COWS: A Distributed Transactional Workflow System Supporting Multiple Workflow Types

Dongsoo HAN†, Jaeyong SHIM††, and Chansu YU†, Nonmembers

Manuscript received September 8, 1999. Manuscript revised February 17, 2000.
† The authors are with the Faculty of Information and Communications University (ICU), Taejon, 305–760 Korea.
†† The author is with the doctoral course, Information and Communications University (ICU), Taejon, 305–760 Korea.

SUMMARY In this paper, we describe a distributed transactional workflow system named ICU/COWS, which supports the multiple workflow types of large scale enterprises. The system aims to support the whole workflow of a large scale enterprise effectively within a single workflow system, and it is designed to satisfy several design goals such as availability, scalability, and reliability. Transactional tasks and special tasks such as alternative tasks and compensating tasks are developed and utilized to achieve the design goals at the task model level, and the system is constructed with distributed transactional objects to achieve the design goals in a distributed system environment. In this paper, structured ad hoc workflow is defined as a special type of ad hoc workflow that should be automated by a workflow management system, because many benefits can be obtained by automating it, and a connector facility is proposed as a means to support structured ad hoc workflow effectively. Some characteristics of a workflow system can be identified by monitoring the system behavior under different conditions such as workloads or system configurations. An early version of the system has been implemented, and its performance data are presented.
key words: distributed workflow system, multiple workflow types, execution object, connector facility

1. Introduction

In large scale enterprises conducting critical business processes, various kinds of tasks are connected and coordinated to achieve the goals of the organization [33], [42]. In a computerized environment, such tasks range from manual tasks performed on a web browser to automatic tasks updating databases through transactional operations. In a large scale enterprise, a process instance usually takes a very long time to complete all of its steps. In such environments, the application server for each step does not always reside at the same site as the workflow server, and a large number of workflow instances may be in processing at the same time. Even worse, the environment and the business logic may change during execution because of the long processing time. Conventional centralized workflow systems cannot support this situation effectively, because each activity step must be controlled by the centralized workflow server, which is often inevitably located apart from the application servers performing the activity steps.

We found that in large scale enterprises application servers already in use cannot be easily relocated for a particular situation. Since network disconnection is frequent in WANs (Wide Area Networks) and transient disconnections make the system unreliable, workflow servers are more reliable when placed near the application servers that are already deployed. That is, multiple workflow servers collaborating in a distributed fashion are more desirable for workflow instances that invoke applications on distributed application servers. Even when the workflow servers are physically distributed, the whole system should still be viewed as a single workflow system by the users. By maintaining the multiple workflow servers as a single system, all workflow instances can be readily monitored and the system can cope with partial system failures capably.

When distributed multiple workflow servers work together as a single system, an inconsistent internal state is frequently evident from the perspective of the whole system, because the state of one workflow server cannot be communicated to the other servers immediately. This situation arises more frequently in a distributed system than in a centralized system. Since a workflow system responds to external events based on its internal state information, maintaining a consistent internal state is indispensable for reacting to external events correctly, and special care is required to make distributed multiple workflow servers maintain a consistent internal state. Transactional operations or services can be used to keep the system consistent. For example, the distributed workflow system can keep its status consistent by treating a sequence of operations, which might otherwise expose an inconsistent state in between, as a single bundled transactional operation using the transactional services.

Workflow systems have been classified by the characteristics of the workflow instances they support. Production workflow follows defined workflow templates that are rarely changed, and its processes often update critical database information through transactional operations [35]. In ad hoc workflow [43], the processing path is so transient and changeable that sometimes even the process templates cannot be defined. Ad hoc workflow can be further classified into inherently ad hoc workflow and structured ad hoc workflow.


Inherently ad hoc workflow is workflow in which no pattern can be derived from the activities. On the other hand, structured ad hoc workflow is exposed and executed as ad hoc workflow, but its activities are connected by some rules and regulations and its connection path is rarely changed. The delivery of work items in structured ad hoc workflow is similar to that in inherently ad hoc workflow. But once a structured ad hoc workflow instance is completed, a flow can be derived, and the derived flow can be shared by other workflow instances of the same workflow class, because the same rules are applied to all instances of the class. A decision approval procedure is a typical example of structured ad hoc workflow. Conventional workflow vendors distinguish the workflow types they support rather clearly, but the real-world workflow of large scale enterprises often appears as a combination of production, administrative, and structured ad hoc workflow. Thus a workflow system for large scale enterprises should support multiple workflow types within a single workflow system.

In this paper, we describe the design of a distributed transactional workflow system supporting multiple workflow types to handle the business processes of large scale enterprises. The main purpose of the system is to support the whole workflow of a large scale enterprise effectively within a single workflow system. The system is designed to meet several design goals such as availability, scalability, and reliability, and at the same time it provides special functions for ad hoc workflow treatment. Some characteristics of distributed workflow systems can be identified by monitoring system behavior under different workload conditions and system configurations. An early version of the system has been implemented, and its performance data are presented in this paper.

The paper is organized as follows. In Sect. 2, we describe the design goals in more detail. In Sect. 3, we describe the task models supported in our system, and in Sect. 4, we explain the transactional features of the system. In Sect. 5, the system architecture and the components of the system are described. In Sect. 6, we introduce some features to support ad hoc workflow, and in Sect. 7, we show performance results for the early version of the system. Related work is described in Sect. 8, and we draw conclusions in Sect. 9.

2. Design Goals

In this section we describe how the design goals of ICU/COWS support the critical business processes of large scale enterprises.

Firstly, availability is one of the most prominent features of ICU/COWS. Since ICU/COWS has to accommodate continuously invoked workflow instances, it should keep serving regardless of the situation encountered. Although high system availability can be achieved by using high-cost reliable hardware, that approach is more suitable for a centralized system. Thus we achieve high availability by running multiple servers. The whole system gives a single system view to the users, and the servers provide high availability because they can keep serving as long as one server is running. Since the servers are distributed and located at the same or a close site to the application servers, they can achieve high system reliability as well as availability.

Secondly, ICU/COWS is designed to support scalability. Scalability in a workflow system implies that new servers or new functions can be added easily without disrupting the existing servers or functions. When a workflow system is scalable, it can cope efficiently with heavy loads of workflow instances and with the addition of new functions. To support scalability, all the servers are kept symmetrical with each other. Each server is ready to work together with a newly added server, and the server components are implemented as CORBA objects. New functions can be readily added to a server because the functions are implemented as methods of the CORBA objects.

Thirdly, ICU/COWS establishes a reliable workflow system. In a workflow system, reliability can be enhanced by equipping it with certain functions and facilities. With respect to modeling aspects, a reliable workflow system should provide facilities for defining reliable processes, such as alternative paths in case of failures or compensating tasks for the recovery of already completed tasks. System aspects of a reliable workflow system include maintaining a consistent internal state as long as possible. Since a workflow system responds to external events based on its internal state information, consistent internal state maintenance is indispensable in order to react to events correctly. ICU/COWS maintains a consistent internal state by using CORBA transaction services when changing system states and passing data between different steps. Failure recovery is another function provided for a reliable workflow system. ICU/COWS copes with failures by implementing the core workflow components as transactional objects and by providing both forward and backward recovery mechanisms [30], [32], [33].

Fourthly, dynamic reconfiguration is supported in ICU/COWS. Since the workflow instances which ICU/COWS supports may take a very long time (e.g., several months), the execution path may need to be changed during execution. Dynamic reconfiguration can be classified into two levels: process instance-level dynamic reconfiguration and process template-level dynamic reconfiguration [18], [24], [44]. Process instance-level dynamic reconfiguration changes only the corresponding process instance and has no influence on the other process instances at all.


Process template-level dynamic reconfiguration changes the process template during execution. In ICU/COWS both levels of dynamic reconfiguration are supported.

Finally, in large scale enterprises different kinds of workflow systems may have to be connected with each other, because organizations may automate their processes using different workflow systems. ICU/COWS supports not only interoperability between workflow systems, by following the WfMC standard Interface 4, but also interoperability between organizations, by installing the connector facility in each department. This connector plays the central role in making ICU/COWS act as a multi-purpose workflow system. Even though ICU/COWS largely has the features of a production workflow system, by using the connector and the dynamic reconfiguration facilities one can handle ad hoc workflow or combined workflow easily.

3. Task Models in ICU/COWS

In this section we introduce the task models used in ICU/COWS. In ICU/COWS, a workflow application is modeled as a collection of tasks, and a task is regarded as the unit of activity within a workflow application. Therefore, the structure of a workflow application is expressed through the interdependencies among its component tasks. A task usually has one or more input and output data sets; thus a task may terminate its processing with one of its output data sets. ICU/COWS provides four kinds of normal tasks for normal business processes and two kinds of special tasks for handling failures. In the following we briefly describe these tasks.

• Normal Tasks

– Human Task: Human intervention is required to process the task, which is managed by the worklist handler.

– Transactional Task: This includes all tasks that can define a compensating task; it applies to both database applications and general purpose applications.

– Non-transactional Task: This task accesses data controlled by a resource manager that has no transactional properties, such as a file system. In general, this kind of task cannot be safely recovered from failure.

– Compound Task: This task is used to model an activity consisting of multiple, logically related tasks.

• Special Tasks

– Alternative Task: This task takes care of a logical failure or an exception of a normal automatic task, whether it is a transactional task or a non-transactional task.

– Compensating Task: This task is used to undo what was already done by a task left incomplete by a system failure. If a task is a transactional task, the corresponding compensating task must be defined with it.

Based on the task model proposed above, diverse types of workflow applications can be defined satisfactorily. In addition, the task model provides a foundation for coping with any exceptional situations anticipated in the design stage of workflow applications, by providing measures against possible failures during process execution.
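To make the taxonomy concrete, the following Java sketch shows one way the task kinds could be arranged as a class hierarchy, with the transactional/compensating pairing enforced in the constructor and the alternative task attached as an optional field. All class and member names here are our own illustrative assumptions; the paper does not publish the actual ICU/COWS definitions.

// Illustrative sketch of the ICU/COWS task taxonomy; all names are assumed.
abstract class Task {
    private final String name;
    Task(String name) { this.name = name; }
    String getName() { return name; }
    abstract void execute() throws Exception;  // run with one of the input data sets
}

class HumanTask extends Task {
    HumanTask(String name) { super(name); }
    void execute() { /* push a work item to the worklist handler */ }
}

class NonTransactionalTask extends Task {
    NonTransactionalTask(String name) { super(name); }
    void execute() { /* e.g., write to a file system; not safely recoverable */ }
}

class CompensatingTask extends Task {
    CompensatingTask(String name) { super(name); }
    void execute() { /* undo the effects of the paired transactional task */ }
}

// A transactional task must be defined together with its compensating task;
// an alternative task may be attached to handle logical failures.
class TransactionalTask extends Task {
    final CompensatingTask compensator;  // mandatory
    Task alternative;                    // optional
    TransactionalTask(String name, CompensatingTask compensator) {
        super(name);
        if (compensator == null)
            throw new IllegalArgumentException(name + " needs a compensating task");
        this.compensator = compensator;
    }
    void execute() throws Exception { /* transactional application work */ }
}

// A compound task groups multiple, logically related tasks.
class CompoundTask extends Task {
    final java.util.List<Task> children = new java.util.ArrayList<Task>();
    CompoundTask(String name) { super(name); }
    void execute() throws Exception {
        for (Task t : children) t.execute();
    }
}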

4. Transactional Features in ICU/COWS

Transactions have been widely used in the context of (usually distributed) applications that require support for reliable execution. In a distributed environment, two aspects of a workflow system controlling a set of business processes utilize the transactional concept: one goal is to deliver events between the distributed workflow components without crash or loss, and the other is to maintain the workflow status safely against various runtime fault causes and to recover the workflow system when a failure has occurred. However, it is inappropriate to apply conventional transaction technologies, which have been developed for the database area, directly to a workflow system. Whereas a conventional transaction is a data-oriented activity in a homogeneous environment, workflow is a process-oriented activity in a HAD (heterogeneous, autonomous, distributed) environment. Workflow execution also depends on additional factors, such as humans, organizations and applications, and is usually much longer-lived than a conventional transaction. Therefore, we borrow only some of the transactional concepts from DBMSs, TP monitors and Advanced Transaction Models (ATMs), which lead transaction technology, and adapt them to the workflow environment appropriately.

In ICU/COWS, transactional features are largely utilized to enhance the correctness of workflow system reactions to events and to cope with system failure effectively. For example, the nested transaction concept is used to deliver the data and control information correctly between the workflow components in the distributed environment. In addition, by assigning alternative tasks to a task at workflow build time, the workflow can continue its execution even when failures occur. This can be very useful for a long-lived (i.e., several weeks or months) workflow.
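To illustrate how a state change and the delivery of control can be bundled into one atomic unit, the sketch below wraps a task completion and the start event for the next TMI in a single transaction. Since the WTS specification is not given in this paper, WtsTransaction and the other names are stand-ins we introduce for illustration only; this is a minimal sketch, not the actual ICU/COWS code.

// Minimal sketch: complete a task and hand control to the next TMI atomically.
// WtsTransaction is a stand-in for the (unspecified) WTS transaction API.
interface WtsTransaction {
    void begin();
    void commit() throws Exception;
    void rollback();
}

class TmiHandoff {
    private final WtsTransaction tx;
    TmiHandoff(WtsTransaction tx) { this.tx = tx; }

    // Either both the status update and the start event become visible,
    // or neither does, so other servers never observe a half-finished step.
    void completeAndTrigger(TmiState current, TmiStub next) {
        tx.begin();
        try {
            current.setStatus("COMPLETED");                 // durable state change
            next.sendStartEvent(current.getOutputData());   // control handoff
            tx.commit();
        } catch (Exception e) {
            tx.rollback();  // the step appears never to have completed
        }
    }
}

// Supporting stand-ins so the sketch is self-contained.
class TmiState {
    private String status = "ACTIVE";
    private final Object output = new Object();
    void setStatus(String s) { status = s; }
    Object getOutputData() { return output; }
}
class TmiStub {
    void sendStartEvent(Object data) { /* CORBA call to the next TMI */ }
}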


As a recovery policy, ICU/COWS provides a partial backward and re-execution mechanism based on compensation and logging. When a failure happens to some business process, withdrawing the workflow all the way back to the starting point is sometimes extremely inefficient; it is preferable to withdraw only the affected part of the workflow when the failure and/or the workflow application permit it. Based on the data dependency sets and the control conditions, the recovery scope for compensation is defined so as to minimize the recovery overhead of the ICU/COWS recovery mechanism. A human task or non-transactional task, which usually does not have a corresponding compensating task, is recovered manually through human intervention, along with reports of the execution to the workflow administrator or related persons. After the recovery procedure finishes, the workflow is re-executed from the point designated by the workflow developer at workflow build time.
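The paper does not spell out the scope computation, so the following Java sketch only illustrates the general idea under our own assumptions: starting from the failed task, walk its data dependencies backwards, compensate completed transactional tasks, and flag human or non-transactional tasks for manual recovery. All names are hypothetical.

import java.util.*;

// Hypothetical sketch of compensation-scope recovery based on data dependencies.
class RecoverySketch {
    static class TaskNode {
        final String name;
        final boolean transactional;  // does it have a compensating task?
        final List<TaskNode> dependsOn = new ArrayList<TaskNode>();
        boolean completed;
        TaskNode(String name, boolean transactional) {
            this.name = name; this.transactional = transactional;
        }
    }

    // Collect the completed tasks whose outputs the failed task (transitively)
    // consumed; only this scope is withdrawn, not the whole workflow.
    static List<TaskNode> recoveryScope(TaskNode failed) {
        List<TaskNode> scope = new ArrayList<TaskNode>();
        Deque<TaskNode> work = new ArrayDeque<TaskNode>(failed.dependsOn);
        Set<TaskNode> seen = new HashSet<TaskNode>();
        while (!work.isEmpty()) {
            TaskNode t = work.pop();
            if (seen.add(t) && t.completed) {
                scope.add(t);
                work.addAll(t.dependsOn);
            }
        }
        return scope;
    }

    static void recover(TaskNode failed) {
        for (TaskNode t : recoveryScope(failed)) {
            if (t.transactional) {
                System.out.println("compensate " + t.name);
            } else {
                System.out.println("report " + t.name + " for manual recovery");
            }
        }
        // Re-execution then restarts from the point designated at build time.
    }
}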

5. System Architecture and the Components of ICU/COWS

This section describes the system architecture and the components of ICU/COWS, which is designed to support multiple workflow system architectures within a single system. ICU/COWS is developed in the CORBA environment using WTS (Workflow Transaction Services), which was specially devised for ICU/COWS. WTS can be considered a kind of extension of the CORBA OTS (Object Transaction Service) that enables convenient workflow system development on top of it. Details of the WTS specification will be treated in other papers in the near future. The components of ICU/COWS include TMIF (Task Managing Instance Factory), GTMIG (Global Task Managing Instance Generator), Simulator, Process Builder, Admin/Monitoring Service, and Worklist Handler. Every component is built as a CORBA object, and the details of the core modules are described in the following subsections. Figure 1 shows the software architecture of ICU/COWS.

Fig. 1  Software architecture of ICU/COWS.

5.1 TMI and GTMI

Before we explain the TMI and GTMI, we introduce the ExecObject, which is the abstract object underlying both the TMI and GTMI objects. The ExecObject comprises the attributes, status information, and methods common to TMI and GTMI objects. Operations to create, manage, and access history information are also included among the ExecObject methods.

The TMI (Task Managing Instance), which manages the tasks of each activity, is created from a TMI object that inherits from the ExecObject. It either sends a work item to a worklist handler or invokes an application through the application agents. Application agents access the workflow relevant data via the TMI. The TMI also monitors the status of the invoked tasks by communicating with worklist handlers or application agents. When a task is completed, the TMI sends a start event to the next TMI, and the TMI that receives the event starts its task. In this way, control is transmitted as defined at process build time.

Five kinds of TMI objects exist in ICU/COWS to service the tasks of workflow activities; they are implemented by inheriting from the TMI object. RouteTMI is a TMI that controls routings such as splits or joins; it can invoke the script interpreter to evaluate a condition statement when necessary. ApplicationTMI is created when the task has to invoke a specific application, and it handles both automatic and manual tasks: for an automatic task it invokes the application automatically, and for a manual task it sends a work item to the worklist handlers. SubflowTMI manages subprocesses; it inherits from both the Requester object, which can ask for the creation of a new process, and the ImplementTMI object, because it must be a TMI and be able to create a new workflow process at the same time. In the ImplementTMI object, methods to connect worklist handlers and to manipulate workflow relevant data are implemented. LoopTMI implements a set of TMIs constructing a loop structure; attributes representing the set of TMIs and the loop conditions are prepared in the LoopTMI. NullTMI, a TMI for special purposes, is used either to process the start and end points of a workflow process or to send a request to users directly without application intervention.

GTMI controls the processing of a global process instance from the creation of the process instance to its end. To create a process instance, it first reads the workflow schema from the workflow schema database, allocates space for the workflow relevant data, prepares the data structures, and stores the workflow process information. Once the information is stored, the GTMI asks GTMIG, through the initial requester, to create the TMIs of the process.

When all the TMIs are created for the process, the GTMI starts the process instance by triggering the first TMI. During the processing it either receives status reports from the TMIs or transiently suspends TMIs to handle requests from the administrator, such as dynamic reconfiguration. Although a TMI has to report its status to its GTMI, a TMI can continue its execution even if the GTMI crashes, because it does not check whether the GTMI has received its report. This approach is effective for achieving availability in a distributed environment where network disconnection is frequent. When the last TMI reports the end of its processing, the GTMI erases all the TMIs that have not yet disappeared and deallocates the space for the process, including itself.

5.2 GTMIG and TMIF

Each server is equipped with one GTMIG and one or more TMIFs. The GTMIG creates a GTMI at the request of the initial requester and then creates TMIs at the requests of GTMIs, but the actual creation of the TMIs and GTMIs is carried out by TMIFs chosen by the GTMIG. Thus the GTMIG manages information on the hosts that are most feasible for creating TMIs and GTMIs. Since numerous TMIs must be generated and the TMIs may be created on different servers, when the GTMIG asks TMIFs to generate TMIs, it asks the TMIF resident at the same site where the TMIs are to be generated. To the GTMIG, local TMIFs and remote TMIFs look equivalent and are invoked in the same way, because they take the form of distributed objects; thus the workflow system operates in a fully distributed fashion. The site where a TMI is to be created is determined by user directives or by considering the system situation. When one server is down, the GTMIG searches for an alternative server and uses the selected server instead of the crashed one. In this way, the whole system can maintain high availability irrespective of a system failure. Moreover, the GTMIG assigns unique IDs to each TMI and GTMI, and it acts as the main connecting point for monitoring services.
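The inheritance relationships named above can be summarized in code. The Java sketch below reflects only the relationships stated in the text (LoopTMI is omitted for brevity); every member name is an assumption of ours, and Requester is rendered as an interface because Java lacks multiple class inheritance.

// Sketch of the execution-object hierarchy described above; members are assumed.
abstract class ExecObject {
    protected String id;        // unique ID assigned by the GTMIG
    protected String status;    // common status information
    void recordHistory(String event) { /* append to the history store */ }
}

abstract class TMI extends ExecObject {
    protected TMI next;         // successor fixed at process build time
    abstract void start();
    protected void completed() {
        recordHistory("completed");
        if (next != null) next.start();  // control passes along the chain
    }
}

interface Requester {           // anything that may ask for a new process
    void requestProcessCreation(String schemaName);
}

// ImplementTMI carries worklist-handler connection and workflow relevant
// data manipulation; SubflowTMI combines it with the Requester role.
abstract class ImplementTMI extends TMI {
    void connectWorklistHandler() { /* connect to a worklist handler */ }
}

class RouteTMI extends TMI {
    void start() { /* evaluate split/join conditions via the script interpreter */ }
}
class ApplicationTMI extends ImplementTMI {
    void start() { /* invoke the application, or push a work item if manual */ }
}
class SubflowTMI extends ImplementTMI implements Requester {
    public void requestProcessCreation(String schemaName) { /* new subprocess */ }
    void start() { requestProcessCreation("subflow-schema"); }
}
class NullTMI extends TMI {
    void start() { completed(); }  // start/end points do no application work
}
class GTMI extends ExecObject {
    // reads the schema, asks the GTMIG/TMIFs to create TMIs, triggers the first
}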

5.3 A Workflow Instance Life Cycle

In this subsection, we explain what happens in ICU/COWS when a user requests the creation of a process instance. The following is the normal sequence of steps from the creation of a process instance to the end of its execution:

1. A user asks to create a certain process instance.
2. GTMIG creates a GTMI for the process instance through a TMIF.
3. The GTMI asks GTMIG to create all the TMIs of the process instance through TMIFs.
4. The GTMI sends a start signal to the first TMI to start work.
5. The TMI starts work and sends a start signal to the next TMI once its work has finished successfully.
6. Step 5 is iterated until the last TMI is reached.
7. The last TMI sends an end signal to the GTMI when it finishes.
8. The GTMI destroys all the TMIs generated, and then itself.

Both GTMIG and TMIF are resident on all the servers; therefore, if one server crashes, another server can take over the GTMI and TMI creation job. When the GTMI creates TMIs, it can place them in several different ways based on user directives. A user can explicitly denote the server on which each TMI should be placed when defining a process template. If there are no user directives at all, TMIs are created in a distributed fashion by the GTMI, considering the locations of the application servers and load balancing. The GTMI creates a TMI through the TMIF installed on each workflow server. When the designated server of a TMI crashes, another server is selected for the TMI creation instead. We found that this creation method is very flexible and effective in achieving high availability and scalability of the workflow system. Figure 2 shows the runtime architecture of ICU/COWS.
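The eight steps above reduce to a chain of start signals. The sketch below walks through them in a single JVM as a rough illustration only (three activities instead of hundreds, with all distribution, transactions, and failure handling omitted); the names are our own.

import java.util.*;

// Hypothetical single-JVM walk-through of the instance life cycle above.
class LifecycleSketch {
    static class Tmi {
        final int step; Tmi next;
        Tmi(int step) { this.step = step; }
        void start() {                        // steps 4-6: run, then trigger successor
            System.out.println("activity " + step + " done");
            if (next != null) next.start();
        }
    }

    public static void main(String[] args) {
        // Steps 1-3: create the GTMI and all TMIs of the instance (via TMIFs).
        List<Tmi> tmis = new ArrayList<Tmi>();
        for (int i = 1; i <= 3; i++) tmis.add(new Tmi(i));
        for (int i = 0; i < tmis.size() - 1; i++) tmis.get(i).next = tmis.get(i + 1);

        tmis.get(0).start();                  // step 4: GTMI triggers the first TMI
        // Steps 7-8: on the end signal the GTMI would destroy the TMIs and itself.
        System.out.println("instance finished; TMIs destroyed");
    }
}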

Fig. 2  Run time architecture of ICU/COWS.

5.4 Application Servers and TMI Placement

It is advantageous for workflow servers to reside at the same or a close site as the application servers processing the workflow instance. However, in large enterprises, long-running workflow instances may need services from several application servers located at physically different sites. Thus, when multiple servers operate in a distributed fashion, they need to be deployed considering the locations of the application servers, and TMI generation should be conducted in the same manner. That is, a TMI that invokes an application program requiring services from an application server must be created in the workflow server at the same or a close site as that application server.

5.5 Worklist Handler and Client

The worklist handler performs the role of a bridge between TMIs and clients. Only one worklist handler can exist in a workflow server; however, multiple worklist handlers can exist in the overall system. A TMI sends work items to worklist handlers in push mode, and a worklist handler sends the work items to the corresponding clients in the same manner. Although a worklist handler can be preferred by a TMI or a client, there is no need for a worklist handler to be dedicated to a certain TMI or client. That is, a TMI can be connected to any worklist handler to send work items to clients, and a client can be connected to any worklist handler to receive work items.

Fig. 3  Conceptual view of connections among TMIs, worklist handlers, and clients.

Several worklist handlers can retain different worklists for a client, but the worklists of a client are usually maintained by its primary worklist handler. Figure 3 shows a conceptual view of these connections. Since a new worklist handler can be added to the system with only a slight change to a database, and the crash of a worklist handler does not imply disconnection of the TMIs from the clients, the system is very scalable and resilient.
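As a small illustration of this push-mode, many-to-many arrangement, the sketch below delivers a work item through an arbitrary handler. The names are ours, and registration, persistence, and the CORBA plumbing are all omitted.

import java.util.*;

// Sketch of push-mode work item delivery; all names are assumptions.
class WorklistSketch {
    static class WorkItem { final String desc; WorkItem(String d) { desc = d; } }

    static class Client {
        final String user;
        Client(String user) { this.user = user; }
        void receive(WorkItem w) { System.out.println(user + " <- " + w.desc); }
    }

    // One handler per server, but a TMI may push through any of them.
    static class WorklistHandler {
        private final Map<String, Client> clients = new HashMap<String, Client>();
        void register(Client c) { clients.put(c.user, c); }
        void push(String user, WorkItem w) {  // push mode: TMI -> handler -> client
            Client c = clients.get(user);
            if (c != null) c.receive(w);
        }
    }

    public static void main(String[] args) {
        WorklistHandler anyHandler = new WorklistHandler();  // need not be dedicated
        Client alice = new Client("alice");
        anyHandler.register(alice);
        anyHandler.push("alice", new WorkItem("approve purchase order"));
    }
}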

6. Features to Support Ad hoc Workflow

In this section we describe some features that enable a workflow system to support ad hoc workflow. Firstly, since ad hoc workflow frequently changes its path during execution, a workflow system should be equipped with dynamic reconfiguration facilities to support it. We then introduce structured ad hoc workflow, which cannot be supported well with dynamic reconfiguration alone, and propose the connector facility as a means to support structured ad hoc workflow.

6.1 Dynamic Reconfiguration

Two levels of dynamic reconfiguration need to be considered separately for complete support of dynamic reconfiguration [44]. In process template-level dynamic reconfiguration, a change of the execution path or attributes or both is reflected in the process template. Thus, once the change is reflected in the template, its effect is also reflected in all the instances generated from that template. Special care must be taken with process instances in execution, since the change can unexpectedly influence the running process instances. Supporting process template-level dynamic reconfiguration in a workflow system is rather straightforward if we do not consider the side effects of the template change; in fact, it can be done with the help of system administration to some degree. However, with only process template-level dynamic reconfiguration we cannot change the execution path or attributes of a particular process instance.

In process instance-level dynamic reconfiguration, the change is limited to a particular process instance. Therefore, more refined control of dynamic reconfiguration is possible. To support process instance-level dynamic reconfiguration, a workflow system must have control over each process instance, and each process instance also needs to cope with dynamic change during execution. In ICU/COWS, the GTMI performs the role of controlling a particular process instance and keeps a dynamic change flag, which is set to FALSE during normal execution. To reconfigure an instance, the GTMI sets the flag to TRUE and waits for all the activities in the active state to end.


Once all the active activities have ended, the GTMI starts to change the execution path or attributes of the instance according to the user requirements. Holding up the progression of the process instance is achieved with the help of the TMIs: a TMI always looks up the dynamic change flag before it starts its own work. When the value of the flag is FALSE the TMI does its job normally, but when the value is TRUE the TMI waits until it receives the signal from the GTMI to resume the job. The details of the procedure for process instance-level dynamic reconfiguration are beyond the scope of this paper. While process template-level dynamic reconfiguration is usually performed by a process designer, process instance-level dynamic reconfiguration should be performed directly by workflow participants. Thus, interfaces for process instance-level dynamic reconfiguration are provided in the ICU/COWS runtime client. Not all the functions of the process modeling tool need to be provided there; a simplified version of the process modeling tool is enough to support process instance-level dynamic reconfiguration.
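The flag protocol just described amounts to a check-and-wait on a shared flag. The following Java monitor is a minimal sketch of that protocol under our own naming and synchronization assumptions; the actual ICU/COWS coordination runs over CORBA rather than inside one JVM.

// Sketch of the dynamic-change-flag protocol; names and locking are assumptions.
class GtmiFlag {
    private boolean dynamicChange = false;  // FALSE during normal execution

    synchronized void beginReconfiguration() {
        dynamicChange = true;               // then wait for active activities to end
    }
    synchronized void endReconfiguration() {
        dynamicChange = false;
        notifyAll();                        // signal waiting TMIs to resume
    }
    // A TMI calls this before starting its own work.
    synchronized void awaitNormalExecution() throws InterruptedException {
        while (dynamicChange) wait();       // hold progression while the flag is TRUE
    }
}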

6.2 Structured Ad hoc Workflow

In an office environment, unexpected events frequently happen during workflow processing, so it is not always possible to define all the processes precisely in advance. The conventional workflow paradigm, in which processes are defined in advance, is not well suited to this situation. In a conventional workflow system, such processes are classified as ad hoc workflow, and the support for them is passive and limited. Dynamic reconfiguration is often cited as a means to cope with the situation, but dynamic reconfiguration alone is not enough. For instance, many activities of large scale enterprises form sequences under some rules even without the workflow concept. Such sequences seem incomplete and disconnected at a glance, but they are often connected according to some rules to achieve a certain goal. We define this workflow as structured ad hoc workflow, to differentiate it from inherently ad hoc workflow, in which no patterns can be derived from the activities. The delivery of work items in structured ad hoc workflow is similar to that in inherently ad hoc workflow. But when a structured ad hoc workflow instance is completed, a flow can be derived, and the derived flow can be shared by other workflow instances of the same workflow class, because the same rules are applied to all instances of the class. Note that workflow instances in the same workflow class of structured ad hoc workflow need not be exactly the same; rather, they share some parts that are the same. Therefore, in structured ad hoc workflow some part of the predefined process template can be reused, while in inherently ad hoc workflow the predefined workflow template can rarely be reused. Structured ad hoc workflow also differs from production or administrative workflow in that only part of the workflow template can be reused, whereas in production or administrative workflow the whole workflow template is defined and used to generate workflow instances. A decision approval procedure is a typical example of structured ad hoc workflow.

Although structured ad hoc workflow differs from administrative and production workflow, the benefit of automating it should not be underestimated: almost the same or even greater benefits can be obtained from structured ad hoc workflow automation than from administrative and production workflow automation. More flexible concepts and facilities have to be introduced for a workflow system to support structured ad hoc workflow effectively. In structured ad hoc workflow, workflow fragments are defined at run time, and the defined fragments are ultimately connected to form a workflow. Thus, accommodation of run time process definition, which is not fully supported by dynamic reconfiguration alone, is necessary. We propose the connector facility to accommodate run time process definition in a workflow management system.

6.3 Connector Facility

A connector is defined as a storage or service, either to store incoming work items or data from the other departments or to access that storage [27]. Together with the dynamic reconfiguration facility, the connector facility plays a key role in supporting ad hoc workflow. The main function of the connector is to connect inter-department workflow, through the following functions. Firstly, a connector has a storage service to store the work items handed over from several departments to one department. When it stores the work items, it keeps the workflow instance ID, so that items of the same workflow instance can be connected, and propagates the ID until the process instance is completed. Secondly, the connector provides means to access the data it stores: APIs (Application Program Interfaces) for application programs and user interfaces for users. Thirdly, the connector registers events and keeps a history of what happened in the connector using statistical information; this information not only shows the current situation but can also serve as basic data for extracting execution paths or for BPR (Business Process Reengineering). Fourthly, the connector provides a facility to define structured data easily. The structured data defined for each connector are manipulated and managed by the workflow system in an integrated manner; although this resembles the management of workflow relevant data, it differs in that the structured data can be defined at run time. Fifthly, the connector keeps the workflow fragments defined at run time and provides functionality for reusing those fragments easily. Using this facility, the whole workflow can be built from the defined workflow fragments in an incremental way.
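The five functions above suggest a compact interface. The Java sketch below is merely one hypothetical shape for it; none of these names come from the actual connector implementation of [27].

import java.util.*;

// Hypothetical shape of a department's connector (cf. the five functions above).
interface Connector {
    // (1) store incoming work items, keyed by the propagated instance ID
    void storeWorkItem(String instanceId, Object workItem);
    // (2) access to the stored data, for application programs or user interfaces
    List<Object> fetchWorkItems(String instanceId);
    // (3) event registration and history, for statistics and BPR
    void recordEvent(String instanceId, String event);
    // (4) structured data definable at run time
    void defineStructuredData(String name, Map<String, String> fields);
    // (5) keep run-time-defined workflow fragments for incremental reuse
    void storeFragment(String fragmentName, Object fragmentDefinition);
    Object reuseFragment(String fragmentName);
}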

7. Implementation and Evaluation

An early version of this system has been developed in the JDK 1.2 environment using basic CORBA naming services. Several tests have been performed on workflow servers running on different machines connected by a 10 Mbps LAN in our laboratory, in order to identify the characteristics of the system. Although the results of the tests may not represent the system characteristics exactly, and the results may change as additional features are added to the system, some important characteristics of distributed workflow systems have been identified.

Fig. 4  Average response time of each activity step.

7.1 Effects of Multiple Workflow Servers

Since multiple workflow servers collaborate with one another in the distributed workflow system, they may enhance the total performance of the workflow system. We tested the performance effect of multiple workflow servers from both the system viewpoint and the user viewpoint: from the system viewpoint the throughput (i.e., the number of jobs processed per second) is measured, and from the user viewpoint the average response time of the system reacting to user requests is measured. One hundred workflow instances, each with 300 sequentially connected activities, are generated every 10 seconds, and we assume that each activity step takes 1 second to complete its work. Currently, once the invocation of a workflow instance is requested, the workflow server that received the request creates all the TMIs (in this case 300 TMIs) to control each activity. The average response time of the created process instances has been measured on 1 to 4 hosts with one or more TMIFs in each server. Figure 4 shows the performance results of the test. We find that 4 servers provided 1.5 to 3 times the processing power of the single-server workflow system in congestion situations. From the performance results, it can be claimed that the number of hosts and the number of TMIFs are factors that affect the performance of the workflow system.
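For reference, the workload used above is easy to restate in code. The harness below is our own reconstruction of the setup (300 one-second sequential steps per instance, with the response time measured per instance), not the actual test driver.

// Illustrative reconstruction of the workload: 300 sequential 1-second
// activities per instance; response time = completion minus request time.
class LoadSketch {
    static final int ACTIVITIES = 300;
    static final long STEP_MILLIS = 1000;   // each activity step takes 1 second

    static long runInstance() throws InterruptedException {
        long requested = System.currentTimeMillis();
        for (int i = 0; i < ACTIVITIES; i++) {
            Thread.sleep(STEP_MILLIS);      // stand-in for executing one activity
        }
        return System.currentTimeMillis() - requested;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("response time (ms): " + runInstance());
    }
}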

Fig. 5 System throughput change according to TMI grouping methods.

7.2 Effects of the TMI Distribution Methods

Three different TMI distribution methods were applied to analyze the effect of each method. In fully distributed TMI placement, which we call the worst grouping method, the next TMI of a TMI is always placed on a different server; thus this placement inevitably incurs network traffic whenever a TMI completes and notifies its successor. In the chunk TMI distribution method, which we call the best grouping method, the TMIs are divided evenly among the servers and all the neighbors of a TMI are placed on the same server, except for the first and the last TMIs; therefore the completion of a TMI incurs no network traffic to notify its successor, except for the last TMI on each server. In the random TMI distribution method, TMI placement is decided randomly, so the network overhead can be considered medium compared with the other two methods. Figure 5 shows the throughput change according to the TMI grouping methods. As expected, the best grouping method showed 3 times better performance than the worst grouping method when the number of concurrent instances reached 100.
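The three placement policies compared here are easy to state precisely. The sketch below shows one way to compute the server index for the i-th of n TMIs over a given number of servers; this is our own formulation of the policies, not code from the system.

import java.util.Random;

// The three TMI placement policies from the experiment, stated as index math.
class GroupingSketch {
    // Worst grouping: successive TMIs always land on different servers,
    // so every completion notification crosses the network.
    static int worst(int i, int servers) { return i % servers; }

    // Best (chunk) grouping: TMIs divided evenly, neighbors share a server,
    // so only chunk boundaries incur network traffic.
    static int best(int i, int n, int servers) {
        int chunk = (n + servers - 1) / servers;  // ceiling division
        return i / chunk;
    }

    // Random grouping: expected network overhead between the two extremes.
    static int random(Random rng, int servers) { return rng.nextInt(servers); }
}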


Fig. 6 Comparison of average response time between compile time and run time TMI creation methods.

7.3 Effects of TMI Creation Methods

Two TMI creation methods were tested. In compile time TMI creation, all the TMIs are created before the first one starts; thus, compile time TMI creation has no TMI creation overhead at all once the process instance has started. However, compile time TMI creation may create TMIs that will never be executed. In run time TMI creation, only the TMIs that will be executed are created, right before their execution. Therefore, the TMI creation overhead is included in the run time overhead of a process instance; this may become a significant overhead in a production workflow system. Figure 6 compares the response times of the two methods. As can be seen from the figure, the overhead of the run time TMI creation method is considerable. Thus, for time-critical production workflow, the compile time TMI creation method is more desirable.
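The two creation strategies differ only in when the factory call happens. The sketch below contrasts them under assumed names (TmifStub stands in for a TMIF); it is an illustration of the trade-off, not the system's code.

// Contrast of the two creation strategies; TmifStub stands in for a TMIF.
class CreationSketch {
    static class TmifStub {
        Object createTmi(int step) { return new Object(); /* remote creation */ }
    }

    // Compile time creation: every TMI exists before the instance starts,
    // so no creation cost is paid on the critical path (but unused TMIs
    // may be created, e.g., on branches that are never taken).
    static Object[] createAllUpFront(TmifStub tmif, int n) {
        Object[] tmis = new Object[n];
        for (int i = 0; i < n; i++) tmis[i] = tmif.createTmi(i);
        return tmis;
    }

    // Run time creation: each TMI is created just before it executes,
    // adding creation latency to every step of the running instance.
    static void createOnDemand(TmifStub tmif, int n) {
        for (int i = 0; i < n; i++) {
            Object tmi = tmif.createTmi(i);  // cost lands inside the instance
            // ... execute the step with tmi ...
        }
    }
}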

7.4 Effects of Workflow Server Distribution

The configuration of the workflow servers also factors into the performance results. For example, when multiple servers are connected in a WAN environment, the network overhead will be much more pronounced, whereas when the workflow servers reside in a clustering environment in which the nodes are connected by faster networks (e.g., 150 Mbps), the results will differ from those in the WAN environment. We are planning to test the performance in these environments in the near future.

8. Related Work

Considerable work has been done on workflow systems [1], [23]–[26], [34], [35], but most workflow systems are centralized, monolithic systems; relatively little work has been done on distributed workflow systems. ORBWork [13], [36] is one of the best known distributed workflow systems in the CORBA environment, built for the METEOR2 workflow model [37], [38]. The system is organized with fixed task managers which manage the assigned tasks specified by an IDL interface. The task managers and tasks of ORBWork correspond to the TMIs and tasks, respectively, of our system.

Whereas the task managers of ORBWork are resident as workflow server components, the TMI is a transient object instance; thus our system is more reactive and flexible than ORBWork. The tasks and task managers of ORBWork are not implemented as transactional CORBA objects as in our system; therefore they need a special recovery facility to cope with system failures, which is not necessary in our system.

The work of Nortel and others [9], [14], whose main design goals are to achieve interoperability, scalability, flexible task composition, dependability, and dynamic reconfiguration, is very similar to our system. Their tasks and task controllers are implemented as CORBA transactional objects, and they argue that no special mechanism is needed for system recovery because the system is composed of CORBA transactional objects. Several kinds of tasks, such as simple tasks, compound tasks and genesis tasks, are introduced to represent various kinds of tasks effectively. The task controller corresponds to our TMI, but they do not have a central controller that manages the whole workflow like our GTMI; sometimes their compound task controllers work comparably to our GTMI.

RainMan [12] is a distributed workflow system developed in pure Java using Java RMI as the primary transport mechanism. A source-performer model is used, whereby sources generate service requests and performers execute the services. The service requesters can be workflows, and the service providers can be people, applications, or workflow systems. An attempt is made to derive a standard interoperability interface between a wide variety of heterogeneous workflow applications. However, the system lacks a means of recovery from system failure.

Numerous workflow systems [39]–[41] have used transactional features for various reasons, such as supporting forward recovery or adding reliability and consistency to the workflow system; the degree of incorporation of transactional features varies from system to system. Since our system components are implemented as CORBA transactional objects and use transactional services to satisfy the aforementioned goals, the degree of incorporation of transactional features is very high [16], [17].

Reviewing some characteristics of the ATMs from the long-lived transaction processing point of view, the hierarchical model of nested transactions [28] allows finer grained recovery and provides some degree of flexibility in transaction execution. In addition to database systems, nested transactions can be used to model reliable distributed systems [29]. Nested transactions provide a model for partitioning an application system into recoverable units, that is, subtransactions.

The Saga model [30] proposes compensation concepts for the recovery process.


Undoing incomplete transactions (backward recovery) is an accepted repair mechanism for aborted transactions. However, this concept is not directly applicable to most real-world workflow tasks, which are governed by long-lived (usually permanent) actions, e.g., human actions and legacy system processing. One can define a semantically inverse task (commonly referred to as a compensating task), or a chain of tasks, that can effectively undo or repair the damage incurred by a failed task within the workflow. Compensation has been applied to tasks and/or groups of tasks (spheres) to support partial backward recovery in the context of the FlowMark WFMS [31]. Transaction failure is often localized within such models using retries and alternative actions. The flexible transaction model [32], [33] discusses the role of alternative transactions that can be executed without sacrificing the atomicity of the overall global transaction. Tasks can be retried in the case of certain failures, e.g., failures related to unavailability of input data or inadequacy of resources for executing a task at a processing entity. Alternative tasks can also be scheduled to handle other, more serious errors that might cause a task to fail, e.g., when a certain number of retries fail, or when a task cannot be activated due to unavailability of a processing entity. These provide a very flexible and natural model for dealing with failures, and they can be applicable in workflow environments. It is important to point out that in transactional models the unit of recovery is also a transaction, and each transaction has a predefined set of semantics that are compliant with the transaction processing system. The model for recovery in a workflow system is more involved, since the recovery process should not only restore the state of the workflow system but should also proceed forward in a manner that is compliant with the overall organizational process.

Our work is the first attempt to support multiple workflow types in a distributed transactional workflow system. Together with the dynamic reconfiguration mechanism, the connector facility is the core facility for coping with structured ad hoc workflow. It enables a large, whole workflow to be derived incrementally from partial workflow fragments, and it can also function as a means to connect automated business processes with manual business processes. Few attempts have been made to show distributed workflow system characteristics through quantitative system performance measurements; we have shown that some important characteristics of the system can be determined from the quantitative performance data of a distributed workflow system.

9. Conclusion

In a large scale enterprise that spreads over large areas, application servers located at several sites need to be involved in completing a business process. To automate such a business process with a centralized workflow system, the workflow system must always be located apart from the application servers. Since network disconnection is common, making the whole system work reliably is not a simple matter. Hence, a workflow system supporting the business processes of a large scale enterprise operates better in a distributed fashion, whereby each server is located near the application servers. Although the servers are physically distributed, the whole workflow system should preserve a single system image to the users, because the single system image helps in coping with partial system failures and in monitoring and administering the whole workflow system. On the other hand, for a workflow system to support a large scale enterprise workflow, it must support multiple workflow types, because the workflow of a large scale enterprise is often a combination of production, administrative, and ad hoc workflow; a workflow system supporting only one workflow type is not suitable for this situation.

In this paper we designed and implemented a distributed transactional workflow system supporting multiple workflow types. Transactional features are used mainly to enhance the correctness of the workflow system's reactions to events and to cope with system failure effectively. Since the servers are distributed over the network, the consistency of the system is more important than in a centralized server system; transactional operations are used to keep the system out of inconsistent states. Also, the workflow system components are built as transactional objects to cope with system failure effectively, and no special mechanism is necessary for a transactional object to recover from a system failure. Several tests were performed with an early version of the system to determine the system characteristics. We found that distributed multiple servers are effective not only for reliability but also for the throughput of the system, and we believe the results reflect the general characteristics of distributed workflow systems.

The usefulness of the connector facility, provided to support ad hoc workflow together with dynamic reconfiguration, depends greatly on user attitudes, which are often determined by the decision approval system of the particular business culture using the system. Thus the facility may not be accepted in an organization with a different culture. But we believe that the facility can play a key role in an organization that needs an extremely flexible workflow system, and in changing the conventional workflow paradigm into a new paradigm in which the entire workflow can be derived by connecting partial workflow fragments.

References

[1] Workflow Management Coalition Specification Document, The Workflow Reference Model Version 1.1, Nov. 1994.


[2] Workflow Management Coalition Specification Document, Workflow Coalition Interface 1: Process Definition Interchange Process Model Document Number: WFMC TC1016-P, Aug. 1998. [3] Workflow Management Coalition Specification Document, Workflow Management Application Programming Interface (Interface 2, 3) Specification Version 2.0e Document Number: WFMC TC-1009, July 1998. [4] Workflow Management Coalition Specification Document, Workflow Client Application (Interface 2) Application Programming Interface (WAPI) Naming Conventions Version 1.4 Document Number: WFMC-TC-1013, Nov. 1997. [5] Workflow Management Coalition Specification Document, Workflow Standard - Interoperability Abstract Specification Version 1.0, Document Number: WFMC-TC-1012, Oct. 1996. [6] Workflow Management Coalition Specification Document, Workflow Standard - Interoperability Internet e-mail MIME binding Version 1.1 Document Number: WFMC-TC-1018, Sept. 1998. [7] Workflow Management Coalition Specification Document, Workflow Management Coalition Audit Data Specification Version 1.1 Document Number: WFMC-TC-1015, Sept. 1998. [8] Workflow Management Coalition Specification Document, Workflow management Coalition Terminology & Glossary Version 2.0 Document Number: WFMC-TC-1011, June 1996. [9] Nortel & University of Newcastle upon Tyne, Workflow Management Facility Specification Revised Submission OMG Document Number: bom/98-03-01, 1998. [10] Joint Submitters, Workflow Management Facility Revised Submission, OMG Document Number: bom/98-06-07, July 4 1998. [11] Electronic Data System Corporation, EDS Workflow Management Facility Initial Submission OMG Document Number: bom/97-08-06, Aug. 29 1997. [12] S. Paul, E. Park, and J. Chaar, “RainMan: A workflow system for the internet,” Proc. USENIX Symp. on Internet Technologies and Systems, 1997. [13] S. Das, K. Kochut, J. Miller, A. Seth, and D. Worah, “ORBWork: A reliable distributed CORBA-based workflow enactment system for METEOR2,” Tech. Report No.UGACS-TR 97-001, Dept. of Computer Science, University of Georgia, 1997. [14] G.D. Parrington, S.K. Shrivastava, S.M. Wheater, and M.C. Little, “The design and implemented of Arjuna,” USENIX Computing Systems Journal, vol.8, no.3, pp.255– 308, 1998. [15] S.K. Shrivastava and S.M. Wheater, “Architectural support for dynamic reconfiguration of large scale distributed application,” The 4th International Conference on Configurable Distributed Systems (CDS ’98), Annapolis, Maryland, USA, May 4–6 1998. [16] Z. Yang and K. Duddy, “CORBA: A platform for distributed object computing,” ACM Operating System Review, vol.30, no.2, pp.4–31, April 1996. [17] S.M. Wheater, S.K. Shrivastava, and F. Ranno, “A CORBA compliant transactional workflow system for internet applications,” Proc. IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing (MIDDLEWARE ’98), Sept. 1–18 1998. [18] S.K. Shrivastava and S.M. Wheater, “Architectural support for dynamic reconfiguration of large scale distributed application,” The 4th International Conference on Configurable Distributed Systems (CDS ’98), Annapolis, Maryland, USA, 1998.

[19] M.U. Kamath and K. Ramamritham, “Correctness issues in workflow management,” Distributed Systems Engineering (DSE) Journal: Special Issue on Workflow Management Systems, vol.3, no.4, Dec. 1996. [20] G. Alonso, D. Agrawal, A. El Abbadi, M Kamath, R. Gunthor, and C. Mohan, “Advanced transaction models in workflow contexts,” 12th International Conference on Data Engineering, New Orleans, Louisiana, Feb. 1996. [21] M.U. Kamath and K. Ramamritham, “Modeling, correctness & systems issues in supporting advanced database applications using workflow management systems,” University of Massachusetts Computer Science Technical Report 95-50, June 1995. [22] M.U. Kamath and K. Ramamritham, “Failure handling and coordinated execution of concurrent workflow,” Proc. 14th International Conference on Data Engineering, Orlando, Florida, IEEE Computer Society Press, Feb. 1998. [23] G. Alonso, R. Gunthor, M Kamath, D. Agrawal, A. El Abbadi, and C. Mohan, “Exotica/FMDC: Handling disconnected clients in a workflow management system,” Third International Conference on Cooperative Information Systems (CoopIS-95), May 1995. [24] A.E. Clarence, K. Keddara, and G. Rozenberg, “Dynamic change within workflow systems,” Proc. ACM SIGOIS Conference on Organizational Computing Systems, Milpitas, CA, 1995. [25] K.-H. Kim and C.A. Ellis, “A framework for workflow architectures,” University of Colorado/Department of Computer Science, Technical Reports, CU-CS-847-97, Dec. 1997. [26] D. Han and H. Park, “Design and implementation of web based business process automating HiFlow system,” J. KISS(C): Computing Practices, vol.4, no.1, Feb. 1998. [27] D. Han and J. Shim, “Connector-oriented workflow system for the support of structured Ad hoc workflow,” Proc. the 33rd Hawaii International Conference on System Sciences (HICSS-33), Maui, Hawaii (CD-ROM), Jan. 4–7, 2000. [28] J. Moss, “Nested transactions and reliable distributed computing,” Proc. 2nd Symposium on Reliability in Distributed Software and Database Systems, pp.33–39, IEEE CS Press, Pittsburgh, PA, July 1982. [29] J. Moss, “Nested transactions: An introduction,” B.K. Bhargava, ed., Concurrency Control and Reliability in Distributed Systems, Van Nostrand Reinhold Company, 1987. [30] H. Garcia-Molina and K. Salem, “Sagas,” Proc. ACM SIGMOD International Conference on Management of Data, pp.249–259, San Francisco, CA, May 1987. [31] F. Leymann, “Supporting business transactions via partial backward recovery in workflow management systems,” in GI-Fachtagung Datenbanken in Buro Technik und Wissenchaft, Springer-Verlag, Dresden, Germany, 1995. [32] A.K. Elmagarmid, Y. Leu, W. Litwin, and M. Rusinkiewicz, “A multidatabase transaction model for InterBase,” Proc. 16th International Conference on Very Large Data Bases, pp.507–518, Brisbane, Australia, Aug. 1990. [33] A. Zhang, M. Nodine, B. Bhargava, and O. Bukhres, “Ensuring relaxed atomicity for flexible transactions in multidatabase systems,” Proc. ACM SIGMOD International Conference on Management of Data, pp.67–78, 1994. [34] G. Alonso and H.J. Schek, “Research issues in large workflow management systems,” [She96], Athens, GA, May 1996. [35] B.R. Silver, “The BIS guide to workflow software: A visual comparison of today’s leading products,” Technical Report, BIS Strategic Decisions, Norwell, MA, Sept. 1995. [36] S. Das, “ORB work: A distributed CORBA-based engine for the METEOR2 workflow management system,” Master’s thesis, University of Georgia, Athens, GA, March 1997.


[37] X. Wang, “Implementation and performance evaluation of CORBA-based centralized workflow schedulers,” Master’s thesis, University of Georgia, Aug. 1995. [38] J.A. Miller, A.P. Sheth, K.J. Kochut, and X. Wang, “CORBA-based run-time architectures for workflow management systems,” Journal of Database Management, vol.7, no.1, pp.16–27, Winter 1996. [39] Y. Breitbart, A. Deacon, H. Schek, and A. Sheth, “Merging application-centric and data-centric approaches to support transaction-oriented multi-system workflows,” SIGMOD Record, vol.22, no.3, pp.23–30, Sept. 1993. [40] D. Georgakopoulos, M. Hornick, and A. Sheth, “An overview of workflow management: From process modeling to workflow automation infrastructure,” Distributed and Parallel Databases, vol.3, no.2, pp.119–154, April 1995. [41] J. Tang and J. Veijalainen, “Transaction-oriented workflow concepts in inter-organizational environments,” Proc. 4th International Conference on Information and Knowledge Management, Baltimore, MD, Nov. 1995. [42] S. Joosten, G. Aussems, M. Duitshof, R. Huffmeijer, and E. Mulder, “WA-12: An empirical study about the practice of workflow management,” University of Twente, Research Monograph, Enschede, The Netherlands, July 1994. [43] T. Schael and B. Zeller, “Design principles for cooperative office support system in distributed process management,” in Support Functionality in the Office Environment, A. Verrijn-Stuart, ed., North Holland, 1991. [44] H. Tarumi, K. Kida, Y. Ishiguro, K. Yoshifu, and T. Asakura, “WorkWeb system: Multi-workflow management with a multi-agent system,” Proc. ACM International Conference on Supporting Group Work (Group ’97), pp.299– 308, 1997.

Dongsoo Han received the B.S. and M.S. degrees in Computer Science and Statistics from Seoul National University, Seoul, Korea in 1989 and 1991, respectively, and the Ph.D. degree in Information Science from Kyoto University, Kyoto, Japan in 1996. He worked for Hyundai Information Technology, Korea from 1996 to 1997. He is currently an assistant professor with the School of Engineering, Information and Communications University (ICU), Taejon, Korea. His current research interests are in the areas of distributed workflow management systems, parallel compilers, and high performance computing. He is a member of ACM and IPSJ.

Jaeyong Shim received the B.S. and M.S. degrees in Computer Science from Sogang University, Seoul, Korea in 1994 and 1996, respectively. He worked for Hyundai Information Technology, Seoul, Korea from 1996 to 1997. Since 1998, he has been enrolled in the doctoral course of Information and Communications University (ICU), Taejon, Korea. His research interests are workflow technology, including interoperability, adaptability and correctness issues, and mobile systems.

Chansu Yu received the B.S. and M.S. degrees in Electrical Engineering from Seoul National University, Seoul, Korea in 1982 and 1984, respectively. He worked as a research engineer at the GoldStar Company until 1989. He received the Ph.D. degree in Computer Engineering from the Pennsylvania State University in 1994. Since 1997, he has been an assistant professor with the School of Engineering, Information and Communications University (ICU), Taejon, Korea. His areas of interest are computer architecture, parallel and cluster computing, performance evaluation, and mobile systems. He is a member of the IEEE and the IEEE Computer Society.