IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 2, NO. 1, JANUARY-MARCH 2009

Service Mining on the Web

George Zheng and Athman Bouguettaya, Senior Member, IEEE

Abstract—The Web is transforming from a Web of data to a Web of both Semantic data and services. This trend is providing us with increasing opportunities to compose potentially interesting and useful services from existing services. Although we may not always have the specific queries that top-down service composition approaches need to identify them, the early and proactive exposure of these opportunities will be key to harvesting the great potential of the large body of Web services. In this paper, we propose a Web service mining framework that allows unexpected and interesting service compositions to emerge automatically in a bottom-up fashion. We present several mining techniques aimed at the discovery of such service compositions. We also present measures for evaluating their interestingness and usefulness. As a novel application of this framework, we demonstrate its effectiveness and potential by applying it to service-oriented models of biological processes for the discovery of interesting and useful pathways.

Index Terms—Service mining, service recognition, interestingness, usefulness, pathway discovery.

G. Zheng is with the Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 94061. E-mail: [email protected].
A. Bouguettaya is with Commonwealth Scientific and Industrial Research Organisation (CSIRO) ICT Centre, Building 108, North Road, ANU Campus, ACTON ACT 2601, Australia. E-mail: [email protected].
Manuscript received 10 Aug. 2008; revised 15 Dec. 2008; accepted 18 Jan. 2009; published online 22 Jan. 2009. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TSC-2008-08-0072. Digital Object Identifier no. 10.1109/TSC.2009.2.
1939-1374/09/$25.00 © 2009 IEEE

1 INTRODUCTION

The Web is currently going through a transformation from a data-centric Web to a Semantic Web consisting of both self-describable data and Web services, which are a new type of first-class object. Deploying previously isolated applications as Web services allows such an application to be described and published by one organization (i.e., the service provider), and discovered and invoked later by other, independently developed applications (i.e., service consumers) [1], essentially making these applications interoperable on the Web. This unprecedented ease of application integration has contributed to the growing popularity of Web service composition, which aims at providing value-added services by composing existing services.

Web service composition has traditionally taken a top-down approach. The top-down approach requires a user to provide a goal containing specific search criteria that define the exact service functionality the user expects, as shown through an example of composing a travel service at the top left of Fig. 1. Often, the more specific the query and search criteria are, the smaller the search space and the more relevant the composition results will be. The specificity of the search criteria reflects the interest, and often the knowledge, of the service composer about the potential composability of existing Web services. Since the composer is typically aware of, and consequently interested in, only some specific types of compositions, the scope of such a search is usually very narrow. As a result, a search that starts with a set of specific criteria that do not match the Web services actually available will most likely end up empty-handed. The top-down approach can thus work well only if the service composer clearly knows what to look for and the component Web services needed to compose such services are available.

Aiming at exploring the full potential of the service space without prior knowledge of what exactly is in it, another view that approaches service composition from the bottom up has recently been gaining ground [2]. Instead of starting the search with a specific goal, a service engineer may be interested in discovering any interesting and useful service compositions that come up in the search process. For performance reasons, a general goal may be provided at the beginning to scope the initial search space down to a reasonable size. To illustrate, the bottom of Fig. 1 shows a service engineer who sets out to find any interesting and useful services with a general interest in Chinese medicine in mind. What comes out of the search process might be quite surprising. For example, in addition to discovering the possibility of composing a service for translating Tsalagi¹ to Chinese, the engineer also discovers, with the help of a service mining tool, a service composition that takes as input a biological sample from a subject, determines the corresponding genome and the possible diseases the subject is predisposed to, and finally generates a list of treatment recommendations and/or lifestyle suggestions. Thus, unlike the search process in the top-down approach, which is strictly driven by the search criteria, the search process in the bottom-up approach is serendipitous in nature, i.e., it has the potential to find interesting and useful service compositions that are unexpected.

As more diverse services are deployed to the Web at an accelerating rate, the collective opportunities for composing services will surpass anyone's imagination. Many of these opportunities will be hidden in the Web of available services and unexpected to most people.
While we may not always have the specific queries needed to search for them, being able to discover these opportunities early in today's business environment equates to gaining a competitive business advantage. For government agencies, doing so also means that citizens receive useful and potentially life-enhancing services in advance. It is thus essential to be able to proactively discover opportunities for composing useful services even when the goals are unspecified at the moment, or are simply hard to imagine or unknown.

Much like the easy access to a glut of data that has provided fertile ground for data mining research, we expect that the increasing availability of Web services will also spur both the need and the opportunities to break new ground in Web service mining. We define Web service mining as a bottom-up search process aimed at the proactive discovery of potentially interesting and useful Web services from existing services. Web service mining faces two main challenges, namely, combinatorial explosion and the evaluation of interestingness and usefulness.

A naive bottom-up approach would be to conduct an exhaustive search and full-blown composability analysis between every pair of Web services in the service registry. Once a potentially positive composition of two component services is identified, the analysis would expand to allow more component services in a composition. As the number of registered Web services increases at an accelerating rate, such an approach can quickly become infeasible due to the overwhelming computation resulting from a "combinatorial explosion."

The second challenge is determining the interestingness and usefulness of a composed service identified through mining. In top-down approaches, determining interestingness and usefulness is not a major concern, since the goal provided by the user already implies what types of compositions the user anticipates. In Web service mining, neither interestingness nor usefulness is so obvious when composed Web services are discovered without any specific goals. Useful composed services may be buried among frivolous or trivial ones that add little or no value. In addition, some of these useful services may already be known. It will thus be necessary to distinguish interesting and useful composed services from those that are trivial, already known, or useless.

In this paper, we propose a Web service mining framework that addresses these two challenges. We organize the remainder of the paper as follows: Section 2 first introduces the concept of Web service recognition, which forms the basis of many of our mining algorithms. We present our service mining framework and the details of all its phases in Sections 3-6. We show the application of this framework to pathway discovery in Section 7. In Section 8, we survey related work and contrast it with our research. We conclude the paper with a discussion of future work in Section 9.

Fig. 1. Top-down composition versus bottom-up mining.

¹ A language spoken by the Cherokee Indian tribe.

Published by the IEEE Computer Society

2 WEB SERVICE RECOGNITION

Much like molecules in the natural world, which can recognize each other and form bonds between them [3], Web services and operations can also recognize each other through both syntax and semantics. Consequently, potentially interesting and useful service compositions may emerge from the bottom up through such mechanisms. In the following, we first introduce our extensions to existing Web service ontologies (e.g., OWL-S [4] and WSMO [5]) that make this possible.

Operation interface. A construct used to specify a shared service capability. An operation can implement an operation interface or declare its need to invoke operations that implement an operation interface. Separating operations from operation interfaces allows the same interface to be implemented by multiple service operations. This construct allows Web services to declaratively plug into one another at the operation level.

Domain. A construct used to group relevant service capabilities, or operation interfaces, into the same category. A Web service's involvement with a domain is reflected by whether it supplies or consumes an implementation of an operation interface in that domain. Like OWL-S and WSMO, we rely on domain ontologies to define the types of operation parameters.

Based on these extensions, we identify four types of recognition between Web services and operations, as shown in Fig. 2.

1. Direct recognition. A direct recognition is established between operations op_a and op_b if op_a consumes the implementation of an operation interface op_intf, which is implemented by op_b. In addition, op_a and op_b must be mode, binding, and message composable [6]. Mode composability states that the following pairs are composable: notification with one-way, and solicit-response with request-response. Binding composability states that both operations should share the same protocol. Message composability states that the number of parameters and the type of each parameter should match between the two operations.
2. Indirect recognition. A target operation op_t indirectly recognizes a source operation op_s if op_s generates some or all input parameters of op_t. We use the term indirect to indicate that there is a potential need to relay parts of the output message from op_s to parts of the input message to op_t at the composition level.

The following two recognition mechanisms are relevant to service-oriented models of biological processes, as we later apply our mining framework to the discovery of biological pathways. Each service models the process(es) of a biological entity, also referred to later as a service providing entity.

3. Promotion. When operation op_1 of service s_a produces an entity (i.e., an output parameter) that in turn provides service s_b, we say that s_a:op_1 promotes s_b.
4. Inhibition. When operation op_1 of service s_a consumes an entity (i.e., an input parameter) that in turn provides service s_b, we say that s_a:op_1 inhibits s_b.

Fig. 2. Recognitions between services and operations.

Note that in order for Web services and operations to recognize one another using these mechanisms, additional pre- and postconditions may also need to be met.
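To make the four recognition mechanisms concrete, the following Python sketch encodes them as predicates over a deliberately simplified model. The `Operation` and `Service` classes, their field names, and the toy travel and biology data are hypothetical illustrations, not part of OWL-S, WSMO, or the framework itself; message composability and the additional pre- and postconditions mentioned above are not checked.

```python
from dataclasses import dataclass

# (consumer mode, provider mode) pairs that the text lists as mode composable.
COMPOSABLE_MODES = {
    ("notification", "one-way"),
    ("solicit-response", "request-response"),
}


@dataclass(frozen=True)
class Operation:
    """Hypothetical, simplified operation description."""
    name: str
    implements: frozenset = frozenset()  # operation interfaces this op implements
    consumes: frozenset = frozenset()    # interfaces whose implementations it invokes
    inputs: frozenset = frozenset()      # input parameter types (ontology concepts)
    outputs: frozenset = frozenset()     # output parameter types
    mode: str = "request-response"       # message exchange pattern
    protocol: str = "SOAP"               # binding protocol


@dataclass(frozen=True)
class Service:
    """Hypothetical service with the type of its service providing entity."""
    name: str
    provider_entity: str
    operations: tuple = ()


def direct(op_a: Operation, op_b: Operation) -> bool:
    """op_a consumes an interface op_b implements; mode and binding are checked.

    Message composability (parameter-by-parameter matching [6]) is omitted.
    """
    shares_interface = bool(op_a.consumes & op_b.implements)
    mode_ok = (op_a.mode, op_b.mode) in COMPOSABLE_MODES
    return shares_interface and mode_ok and op_a.protocol == op_b.protocol


def indirect(op_t: Operation, op_s: Operation) -> bool:
    """Target op_t indirectly recognizes source op_s if op_s yields op_t inputs."""
    return bool(op_s.outputs & op_t.inputs)


def promotes(s_a: Service, op: Operation, s_b: Service) -> bool:
    """s_a:op produces the entity that provides s_b."""
    return op in s_a.operations and s_b.provider_entity in op.outputs


def inhibits(s_a: Service, op: Operation, s_b: Service) -> bool:
    """s_a:op consumes the entity that provides s_b."""
    return op in s_a.operations and s_b.provider_entity in op.inputs


# Illustrative toy data (not from the paper).
book = Operation("bookFlight", implements=frozenset({"FlightBooking"}))
plan = Operation("planTrip", consumes=frozenset({"FlightBooking"}),
                 mode="solicit-response")
transcribe = Operation("transcribe", inputs=frozenset({"DNA"}),
                       outputs=frozenset({"mRNA"}))
translate = Operation("translate", inputs=frozenset({"mRNA"}),
                      outputs=frozenset({"Enzyme"}))
degrade = Operation("degrade", inputs=frozenset({"Enzyme"}))
ribosome = Service("RibosomeService", "Ribosome", (translate,))
protease = Service("ProteaseService", "Protease", (degrade,))
enzyme = Service("EnzymeService", "Enzyme")
```

Here `direct(plan, book)` holds because the trip planner consumes the flight-booking interface with a composable mode pair, `promotes(ribosome, translate, enzyme)` holds because the translation output is the entity that provides the enzyme service, and `inhibits(protease, degrade, enzyme)` holds because that operation consumes it.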

3 THE SERVICE MINING FRAMEWORK

Fig. 3 shows our Web service mining framework, which uses a multiphase approach inspired by the drug discovery process [7]. The idea is to keep computation simple in the early phases, when there are a large number of Web services to process. As the candidate pool shrinks toward the later phases, the framework applies increasingly complex computation in order to achieve better accuracy.

The framework starts with scope specification, a manual phase in which a domain expert defines the context of mining. We expect the domain expert to have a general idea about the "seeds" of Web service functional areas (e.g., travel, insurance, medicine) and, optionally, the locales of these functions that he/she is interested in mining. Such seeds are expected to grow into fruitful compositions as the mining progresses. Within this phase, a hierarchy of domain ontology indexes is established to speed up later phases of the mining process. Weights may be assigned to these seeds to differentiate the user's interest in them and to help retain compositions grown out of those the user is more interested in. Note that, rather than specifying the exact goal of the compositions being pursued, as a traditional Web service composition approach would, scope specification does not limit what that goal should be. Consequently, any composition leads that emerge within this scope will be pursued further.

Fig. 3. Service mining framework.

Scope specification is followed by several automatic phases. The first of these is search space determination. To help curb the problem of combinatorial explosion when faced with a large number of Web services, this phase uses the mining context to identify a focused library of existing Web services as the initial pool for further mining.

The next is the screening phase, which contains three subphases representing a grow-weed-grow cycle. In the filtering (first growing) subphase, Web services in the focused library go through filtering algorithms to identify potentially interesting leads of service compositions. This is achieved by establishing linkages between Web services based on the four recognition mechanisms at a "coarse-grained" level (i.e., involving only a subset of matching Web service characteristics, such as operation interfaces and parameter types), so that the filtering step can be completed quickly. In the static verification (weeding) subphase, the service composition leads identified earlier are semantically verified based on a subset of operation pre- and postconditions involving binary variables (e.g., whether the input to an operation is activated) and enumerated properties (e.g., the locale of an operation input). In the linking (last growing) subphase, verified service compositions are linked together to establish more comprehensive composition networks. When the mining framework is applied to the discovery of biological pathways,² such composition networks represent pathways linking service-oriented models of biological processes.

The composition networks are then input to the evaluation phase, which consists of four subphases. Objective evaluation identifies and highlights interesting segments of a composition network by checking whether such linkages are novel (i.e., previously unknown) and whether they are established in a surprising way (e.g., if they link segments not previously known to be related). An interactive session follows, in which the user takes hints from the highlighted interesting segments within a composition network and picks a handful of nodes to pursue further. These nodes are then automatically linked into a connected subgraph, to the extent possible, using a subset of nodes and edges in the original graph. This subgraph gives the user a basis for formulating hypotheses, which can then be tested via simulation. In the case of pathway discovery, the simulation invokes relevant service operations, changing the quantity/attribute values of the various entities involved in the composition network. Results from the simulation phase are expected to reveal hidden relationships among the corresponding processes. These results are then presented to the user, whose subjective evaluation finally determines whether the subgraph being pursued is actually useful. In some cases, the user may want to revise the initial simulation settings, rerun the simulation, and evaluate the new results. At the end, the user may introduce some of the discovered service composition subgraphs representing pathways into a pathway base for future reference. One use of such references may be in building models of biological entities at a more complex level. We present the details of the various phases of the mining process in the following sections.

² Pathways are represented as a network of interactions among biological entities such as cells, DNA, RNA, and enzymes. Exposure of pathways is expected to deepen our understanding of how diseases come about and to help expedite the discovery of drugs for treating them.

4 PRESCREENING PLANNING

The search space of the mining process can be scoped down if we are only interested in finding potentially interesting/useful composed services within certain functional areas and locales of mining interest that limit the applicability of these functions. We organize our prescreening planning into two phases: Scope Specification and Search Space Determination.

4.1 Scope Specification

The mining process starts with the scope specification phase, in which a composite Web service engineer optionally applies subjective interestingness measures to bootstrap the mining process. The engineer may scope the mining activity by defining a list of functional areas and the locales where these functions reside. For example, the engineer may express a general interest in service compositions that involve travel, healthcare, or insurance within the locale of the continental US. Since different functional areas are drawn from corresponding domains, which may in turn rely on different ontologies, scope specification essentially determines a set of ontologies to use for the mining process. When presented with these ontologies, the engineer may choose to assign interestingness weights to the ontology nodes that he/she is particularly interested in. In addition, the engineer may optionally assign interestingness weights to operation interfaces within these domains that are also of interest. The end product of scope specification is the mining context, containing a list of relevant domains and the locales limiting the applicability of the corresponding functions within these domains. We formally define the mining context C in (1):

$$C = \{\, d(L) \mid d \in D \,\}, \tag{1}$$

where
- D is a set of Web service domains;
- L is a set of locale attributes of mining interest;
- d(L) is a domain carved out by L.

Consequently, if we use → to denote the relationship "refers to," then the set of all ontologies referred to in C can be denoted as Ont(C) and calculated using

$$Ont(C) = \{\, ont \mid \exists d \in C : d \rightarrow ont \,\}, \tag{2}$$

and the set of all operation interfaces included in C can be denoted as OP_intf(C) and calculated using

$$OP_{intf}(C) = \{\, op_{intf} \mid \exists d \in C : op_{intf} \in d \,\}. \tag{3}$$

4.2 Search Space Determination

The mining scope determines the coverage of the search space when looking for composable components for the purpose of composition. Similar to the drug discovery process, the end product of our search space determination phase is a focused library consisting of the Web services from service registry R that are involved in mining context C. We formally define the focused library $\mathcal{L}$ in (4):

$$\mathcal{L} = \{\, s \mid s \in R \wedge \big( s.operations \cap OP_{intf}(C) \neq \emptyset \;\vee\; \exists op \in s.operations : op_{consume}(OP_{intf}) \cap OP_{intf}(C) \neq \emptyset \big) \,\}, \tag{4}$$

where s.operations denotes the set of operations implemented by s, and op_consume(OP_intf) denotes the set of operation interfaces consumed by op. Thus, (4) gives the focused library as the set of all Web services that either provide implementation(s) for some interface(s) in OP_intf(C) or whose operation(s) consume(s) some implementation(s) of interface(s) in OP_intf(C). The focused library thus covers the search space carved out by the identified mining context.
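The following Python sketch evaluates (1)-(4) over plain in-memory sets. It assumes that each domain carries its own operation interfaces, referenced ontologies, and locales, and it models d(L) as d restricted to the locales in L (dropping domains left with none), which is only one possible reading of "carved out by L." All class and function names and the toy data are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    interfaces: frozenset  # operation interfaces grouped in this domain
    ontologies: frozenset  # ontologies the domain refers to (d -> ont)
    locales: frozenset     # locales in which the domain's functions apply


@dataclass(frozen=True)
class Op:
    implements: frozenset = frozenset()  # interfaces this operation implements
    consumes: frozenset = frozenset()    # op_consume(OP_intf)


@dataclass(frozen=True)
class Service:
    name: str
    operations: tuple = ()


def mining_context(domains, locales):
    """Eq. (1): C = {d(L) | d in D}, with d(L) modeled as d restricted to L."""
    context = []
    for d in domains:
        kept = d.locales & locales
        if kept:  # assumption: a domain with no locale of interest drops out
            context.append(Domain(d.name, d.interfaces, d.ontologies, kept))
    return context


def ont(context):
    """Eq. (2): every ontology referred to by some domain in C."""
    return frozenset().union(*(d.ontologies for d in context))


def op_intf(context):
    """Eq. (3): every operation interface contained in some domain in C."""
    return frozenset().union(*(d.interfaces for d in context))


def focused_library(registry, context):
    """Eq. (4): services with an operation implementing or consuming OP_intf(C)."""
    interfaces = op_intf(context)
    return [s for s in registry
            if any(op.implements & interfaces or op.consumes & interfaces
                   for op in s.operations)]


fs = frozenset  # shorthand for the hypothetical toy data below
travel = Domain("travel", fs({"FlightBooking", "HotelBooking"}), fs({"TravelOnt"}), fs({"US"}))
health = Domain("healthcare", fs({"Diagnosis"}), fs({"MedOnt"}), fs({"US", "EU"}))
finance = Domain("finance", fs({"Payment"}), fs({"FinOnt"}), fs({"EU"}))
registry = [
    Service("airline", (Op(implements=fs({"FlightBooking"})),)),
    Service("planner", (Op(consumes=fs({"HotelBooking"})),)),
    Service("bank", (Op(implements=fs({"Payment"})),)),
]
C = mining_context([travel, health, finance], fs({"US"}))
```

With a US-only locale, the finance domain falls outside the context, so the `bank` service is excluded from the focused library, while `airline` (a publisher) and `planner` (a consumer) are both retained.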

5 SCREENING

The screening phase in our framework consists of three distinct subphases: filtering, static verification, and linking.

5.1 Filtering

To address the problem of combinatorial explosion, we rely on a publish/subscribe mechanism to convert the traditional combinatorial search problem into a service/operation recognition problem. As a result, top-down searches are transformed into bottom-up matches. We filter Web services at two levels: operation and parameter.

5.1.1 Operation Level Filtering

At the operation level, operation interfaces within the mining context serve as the medium through which Web service operations plug into one another via direct recognition. We show our operation level filtering mechanism in Algorithm 1. Algorithms 2 and 3 list our operation agent's functions for publication and subscription.

Fig. 4. Exhaustive search versus our filtering mechanisms. (a) Exhaustive search. (b) Operation level filtering. (c) Parameter level filtering.

Algorithm 1. Operation Level Filtering
Input: Context operation interfaces OP_intf(C), focused library F.
Output: Leads of composed Web services L.
Variables: Leads from publication and subscription L_ps; operation interfaces consumed by op, op_consume(OP_intf).
1: // Create an agent per operation interface for keeping track of publishers and subscribers
2: for all op_intf ∈ OP_intf(C) do
3:     create Agent(op_intf);
4: end for
5: for all s ∈ F do
6:     for all op ∈ s.operations do
7:         // Operation implementing an operation interface publishes through the interface
8:         if ∃ op_intf ∈ OP_intf(C): op implements op_intf then
9:             L_ps ← Agent(op_intf).publish(op);
10:            L.add(L_ps);
11:        end if
12:        // Operation consuming implementation of an operation interface subscribes to the interface
13:        for all op_intf ∈ op_consume(OP_intf) do
14:            if op_intf ∈ OP_intf(C) then
15:                L_ps ← Agent(op_intf).subscribe(op);
16:                L.add(L_ps);
17:            end if
18:        end for
19:    end for
20:    // Subscribe service to the service providing entity type
21:    k ← type(s.providerEntity);
22:    if k ∈ Ont(C) then
23:        if ¬∃ Agent(k) then
24:            create Agent(k);
25:        end if
26:        Agent(k).subscribe(s);
27:    end if
28: end for

When publishing an operation that implements an interface, function publish(op) of the corresponding agent checks whether there is any subscriber to the interface. If so, it tries to establish a service composition lead using direct recognition between the publisher and the subscriber. Similarly, when a service operation subscribes to an operation interface that it consumes, function subscribe(op) checks whether there is any publisher that implements the interface. If so, it tries to establish a service composition lead between the subscriber and the publisher. Fig. 4b depicts our operation level filtering mechanism.

Algorithm 2. Operation Agent Function for Publication publish(op)
Input: Web service operation op providing implementation for the operation interface that this agent represents.
Output: Leads of composed Web services L_ps.
Variable: A composed service cs.
1: publishers.add(op);
2: if subscribers ≠ ∅ then
3:     for all op' ∈ subscribers do
4:         cs ← generateLead(op', op);
5:         L_ps.add(cs);
6:     end for
7: end if
8: return L_ps;

Algorithm 3. Operation Agent Function for Subscription subscribe(op)
Input: Web service operation op interested in invoking the operation interface that this agent represents.
Output: Leads of composed Web services L_ps.
Variable: A composed service cs.
1: subscribers.add(op);
2: if publishers ≠ ∅ then
3:     for all op' ∈ publishers do
4:         cs ← generateLead(op, op');
5:         L_ps.add(cs);
6:     end for
7: end if
8: return L_ps;
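A minimal Python sketch of the publish/subscribe idea behind Algorithms 1-3 follows. It assumes operations are identified by name strings, omits the service-providing-entity subscription in lines 20-27 of Algorithm 1, and uses a hypothetical `generate_lead` that merely records the (consumer, provider) pair instead of running the direct-recognition checks. It also publishes through every implemented context interface, a slight generalization of line 8.

```python
class Agent:
    """Tracks publishers and subscribers of one operation interface."""

    def __init__(self, interface):
        self.interface = interface
        self.publishers = []
        self.subscribers = []

    def publish(self, op):
        """Algorithm 2: record op, then pair it with every known subscriber."""
        self.publishers.append(op)
        return [generate_lead(sub, op) for sub in self.subscribers]

    def subscribe(self, op):
        """Algorithm 3: record op, then pair it with every known publisher."""
        self.subscribers.append(op)
        return [generate_lead(op, pub) for pub in self.publishers]


def generate_lead(consumer, provider):
    # Hypothetical placeholder: a lead is just the (consumer, provider) pair;
    # a real system would run the direct-recognition checks here.
    return (consumer, provider)


def operation_level_filtering(context_interfaces, focused_library):
    """Algorithm 1, operation part only (lines 20-27 are omitted)."""
    agents = {intf: Agent(intf) for intf in context_interfaces}
    leads = []
    for service in focused_library:
        for op in service["operations"]:
            # Publish through every context interface the operation implements.
            for intf in sorted(op["implements"] & context_interfaces):
                leads.extend(agents[intf].publish(op["name"]))
            # Subscribe to every context interface whose implementation it consumes.
            for intf in sorted(op["consumes"] & context_interfaces):
                leads.extend(agents[intf].subscribe(op["name"]))
    return leads


# Hypothetical focused library: one consumer and two providers of FlightBooking.
library = [
    {"operations": [{"name": "planner.plan", "implements": set(),
                     "consumes": {"FlightBooking"}}]},
    {"operations": [{"name": "airline.book", "implements": {"FlightBooking"},
                     "consumes": set()}]},
    {"operations": [{"name": "budget.book", "implements": {"FlightBooking"},
                     "consumes": set()}]},
]
leads = operation_level_filtering({"FlightBooking"}, library)
```

Because each agent keeps both lists, every consumer/provider pair sharing an interface yields exactly one lead regardless of the order in which services arrive, and no pair of services that share no interface is ever compared.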

TABLE 1 Symbols and Parameters

5.1.2 Parameter Level Filtering

Parameter level filtering targets three types of recognition: promotion, inhibition, and indirect recognition, as described in Section 2. We consider three types of matching between parameters p_1 and p_2, whose data types refer to domain ontology index nodes (DOINs) n_a and n_b, respectively:
- Exact match or synonym. n_a = n_b. One index node is created for all synonymous ontology nodes.
- Is-a. n_a is a child of n_b.
- Has-a. n_a has a component n_b.

We assume that the above relationships among parameter types are already declared in domain ontologies and thus can be detected automatically. Fig. 4c illustrates our parameter level filtering mechanism. Since ontology index nodes are used to describe the types of operation parameters, a parameter is considered an instance of such a node. When a Web service operation is introduced into the mining process, each of its output parameters publishes to the ontology index node it is an instance of. Similarly, each of its input parameters subscribes to the ontology index node it is an instance of. The publication and subscription on a node can sometimes propagate to other nodes within the ontology index node network. This happens when the node is involved in an inheritance or compositional relationship with other nodes. In general, publication propagates down a composition tree and up an inheritance tree, while subscription propagates up a composition tree and down an inheritance tree. In addition to its parameters, a service also subscribes to the ontology index node that defines the type of its service providing entity. For better performance, we include this subscription in lines 21-27 of Algorithm 1. Due to page limits, we omit the listing of the parameter filtering algorithms.

As Web service operations are introduced into the mining process, subscriptions and publications at both the operation and parameter levels are triggered. Each operation interface and ontology index node keeps track of its own subscribers and publishers. This tracking enables Web services to recognize one another at both levels.
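Since the parameter filtering algorithms are not listed, the following Python sketch is only one possible realization of the propagation rules stated above, under our own assumptions: index nodes are strings, is-a and has-a edges are stored explicitly, propagation follows the transitive closure of the allowed edge directions, synonyms already share one index node, and a lead is an (input-side operation, output-side operation) pair.

```python
from collections import defaultdict


class OntologyIndex:
    """Hypothetical index nodes with is-a / has-a edges and pub/sub registries."""

    def __init__(self):
        self.parents = defaultdict(set)      # is-a: child -> parents
        self.children = defaultdict(set)     # is-a: parent -> children
        self.parts = defaultdict(set)        # has-a: whole -> components
        self.wholes = defaultdict(set)       # has-a: component -> wholes
        self.publishers = defaultdict(set)   # node -> producing operations
        self.subscribers = defaultdict(set)  # node -> consuming operations

    def add_is_a(self, child, parent):
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def add_has_a(self, whole, part):
        self.parts[whole].add(part)
        self.wholes[part].add(whole)

    @staticmethod
    def _closure(start, *edge_maps):
        """All nodes reachable from start along the given edge directions."""
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for edges in edge_maps:
                for nxt in edges[node] - seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def publish(self, op, node):
        """Output parameter: propagate up is-a and down has-a; return new leads."""
        leads = set()
        for n in self._closure(node, self.parents, self.parts):
            self.publishers[n].add(op)
            leads |= {(sub, op) for sub in self.subscribers[n]}
        return leads

    def subscribe(self, op, node):
        """Input parameter: propagate down is-a and up has-a; return new leads."""
        leads = set()
        for n in self._closure(node, self.children, self.wholes):
            self.subscribers[n].add(op)
            leads |= {(op, pub) for pub in self.publishers[n]}
        return leads


# Toy ontology (hypothetical): Enzyme is-a Protein, Cell has-a DNA.
idx = OntologyIndex()
idx.add_is_a("Enzyme", "Protein")
idx.add_has_a("Cell", "DNA")
first = idx.subscribe("fold", "Protein")            # no publishers yet
enzyme_leads = idx.publish("synthesize", "Enzyme")  # Enzyme output feeds a Protein input
idx.subscribe("replicate", "DNA")
cell_leads = idx.publish("divide", "Cell")          # a Cell output carries DNA
```

The directions matter: an operation producing a generic `Protein` does not match a consumer that specifically needs an `Enzyme`, whereas an `Enzyme` producer does match a `Protein` consumer.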

5.2 Complexity Analysis

We compare the computational complexity of our operation level filtering algorithms against a naive exhaustive search algorithm. Table 1 lists the relevant variables used in our complexity analysis. If we refer to the size of collection s.operations as |S|, then the time to carry out a hashtable-based check of the operation (lines 8 and 14 in Algorithm 1) is O[log(|S|)].

We first analyze the performance of the traditional exhaustive search mechanism (see Fig. 4a). An operation level composability check iterates through all services in the scope and checks each service against all other services. For a pair of services s_1 and s_2, it checks whether s_1's operations could consume any of s_2's operations, and vice versa. There are two ways to match up operations from s_1 and s_2. The first is to iterate through s_1's operations and, for each operation, iterate through N_oc to see if a consumable operation is in s_2's operation set. The second is to iterate through s_1's operations and then s_2's operations, and, for each operation found in s_2's operation set, check whether it is consumable by the one found in s_1's operation set. Thus, the time to perform operation level comparison using an exhaustive search is

$$T_{of} = O\left(N_{ws}^2 \cdot \min\left(N_{oi} \cdot N_{oc} \cdot \log N_{oi},\; N_{oi}^2 \cdot \log N_{oc}\right)\right).$$

We now analyze the performance of our operation level filtering algorithms. According to Algorithm 2, the time to perform publish(op) is O(N_ops). Likewise, from Algorithm 3, the time to perform subscribe(op) is O(N_opp). Thus, T_of can be calculated according to Algorithm 1:

$$\begin{aligned} T_{of} &= O\big[N_{op} + N_{ws} \cdot \big(N_{oi} \cdot (\log N_{op} + N_{ops} + N_{oc} \cdot (\log N_{op} + N_{opp})) + \log(|ont|)\big)\big] \\ &= O\big[N_{ws} \cdot \big(N_{oi} \cdot (N_{ops} + N_{oc} \cdot N_{opp} + (1 + N_{oc}) \cdot \log N_{op}) + N_{op} + \log(|ont|)\big)\big]. \end{aligned}$$

Comparing the performance of our filtering algorithms against that of an exhaustive search algorithm, we see that when N_op is relatively small and stable compared to N_ws, T_of for our filtering algorithms is linear in N_ws, while T_of for a traditional exhaustive search is quadratic in N_ws.

We conducted an experiment on a dual-core 2.8 GHz Windows XP machine to measure the performance of the filtering algorithms. Our experiment focuses on the relationship between the total processing time of the filtering algorithms and the number of Web services used as inputs to these algorithms. Table 2 lists the configuration variables used in our experiment. We use n_s to denote the number of services in the input and s_ratio the ratio of ontology index nodes to services. For each s_ratio, we iterate through n_s, which starts at 100 and doubles for each subsequent iteration, as indicated in Table 2. For each (n_s, s_ratio) pair, we run the filtering algorithms 10 times. We then average the total processing time over these runs and plot the averages in Fig. 5. The simulation results in Fig. 5 show that the total filtering time grows linearly with the number of services used as input.
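To see why the two bounds diverge, the following Python sketch evaluates the expressions inside the two O(·) terms above as operation counts while N_ws doubles and every other parameter stays fixed. The parameter values are invented for illustration, and we read N_ops and N_opp as the numbers of subscribers and publishers an agent holds (the loops in Algorithms 2 and 3); since Table 1 is not reproduced here, treat these roles as assumptions.

```python
import math


def exhaustive_cost(n_ws, n_oi, n_oc):
    """Operation-count model behind O(N_ws^2 * min(...)) for exhaustive search."""
    return n_ws ** 2 * min(n_oi * n_oc * math.log2(n_oi),
                           n_oi ** 2 * math.log2(n_oc))


def filtering_cost(n_ws, n_oi, n_oc, n_op, n_ops, n_opp, ont_size):
    """Operation-count model behind the simplified T_of of Algorithms 1-3."""
    per_service = (n_oi * (n_ops + n_oc * n_opp + (1 + n_oc) * math.log2(n_op))
                   + n_op + math.log2(ont_size))
    return n_ws * per_service


# Hypothetical parameter values, held fixed while N_ws doubles as in Table 2.
params = dict(n_oi=8, n_oc=4)
extra = dict(n_op=500, n_ops=3, n_opp=3, ont_size=2000)
for n_ws in (100, 200, 400, 800):
    ex = exhaustive_cost(n_ws, **params)
    fi = filtering_cost(n_ws, **params, **extra)
    print(f"N_ws={n_ws:4d}  exhaustive={ex:14.0f}  filtering={fi:12.0f}")
```

Each doubling of N_ws quadruples the exhaustive count but only doubles the filtering count, which is the quadratic-versus-linear contrast stated above; the absolute numbers carry no meaning.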


TABLE 2 Experiment Settings for Performance Simulation

Fig. 5. Filtering time versus number of services.

5.3 Static Verification and Linking

Various measures have been proposed in [6] to determine whether two operations are composable at both the syntactic and semantic levels. These measures can be used to determine whether a direct recognition-based composition is actually valid. Promotion-based and inhibition-based compositions are valid because the entities involved provide the corresponding services by declaration. For an indirect recognition-based composition, validity can be determined by checking whether the preconditions and postconditions of the source and target operations in the composition overlap.

Statically verified service compositions are then linked together in the linking subphase into more comprehensive service composition networks. Within such a network, nodes represent services, operations, operation interfaces, and parameters (see Fig. 2), and edges represent relationships among these nodes. These relationships include an operation implementing an operation interface, an operation consuming the implementation of an operation interface, an entity providing a service, a service providing an operation, and an operation consuming or producing a parameter.
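The following Python sketch illustrates the linking subphase: verified compositions, written as typed edges drawn from the relationships listed above, are merged into one network, and compositions that share a node become connected. The edge-kind names, the toy data, and `indirect_is_valid`, which encodes one reading of the overlap test (source postconditions intersecting target preconditions), are assumptions for illustration.

```python
from collections import defaultdict, deque

# Edge kinds taken from the relationships listed in the text (names are ours).
EDGE_KINDS = {
    "implements",          # operation -> operation interface
    "consumes_interface",  # operation -> operation interface
    "provides_service",    # entity -> service
    "provides_operation",  # service -> operation
    "consumes_param",      # operation -> parameter
    "produces_param",      # operation -> parameter
}


def indirect_is_valid(source_post, target_pre):
    """Assumed reading of the static check: source postconditions meet target preconditions."""
    return bool(set(source_post) & set(target_pre))


class CompositionNetwork:
    def __init__(self):
        self.edges = set()
        self.neighbors = defaultdict(set)  # undirected view for connectivity

    def add_edge(self, src, kind, dst):
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        self.edges.add((src, kind, dst))
        self.neighbors[src].add(dst)
        self.neighbors[dst].add(src)

    def link(self, compositions):
        """Merge verified compositions; shared nodes join them into one network."""
        for composition in compositions:
            for edge in composition:
                self.add_edge(*edge)

    def connected(self, a, b):
        """Breadth-first search over the undirected view."""
        seen, queue = {a}, deque([a])
        while queue:
            node = queue.popleft()
            if node == b:
                return True
            for nxt in self.neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        return False


# Two toy compositions (not from the paper) that share the parameter "mRNA".
comp1 = [("GeneService", "provides_operation", "transcribe"),
         ("transcribe", "produces_param", "mRNA")]
comp2 = [("translate", "consumes_param", "mRNA"),
         ("translate", "produces_param", "Enzyme"),
         ("Enzyme", "provides_service", "EnzymeService")]
network = CompositionNetwork()
network.link([comp1, comp2])
```

Neither toy composition alone links the gene service to the enzyme service; after linking, the shared `mRNA` node joins them into a single network, which is the kind of longer chain the evaluation phase then inspects.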

6 POSTSCREENING EVALUATION

Not all service compositions discovered during the screening phase are necessarily interesting and useful. The purpose of postscreening analysis and evaluation is to identify those that truly are. Postscreening evaluation consists of four distinct steps: objective evaluation, interactive hypothesis formulation, simulation, and subjective evaluation.

6.1 Objective Evaluation

Objective evaluation uses objective measures to evaluate the interestingness and usefulness of composed Web services. In this section, we first introduce the concepts of operation similarity, domain correlation, and domain unrelatedness. These concepts are used in our objective measures of interestingness and usefulness, which are described thereafter.

6.1.1 Operation Similarity

The concept of operation similarity is relevant when we study the interestingness of an indirect recognition-based composition. The similarity of two operations can be measured by comparing their input parameter sets, output parameter sets, preconditions, and postconditions. We use the following function to measure the similarity between $op_i$ and $op_j$:

$$\mathrm{Sim}(op_i, op_j) = c_p \left( \frac{|P_{in}(op_i) \cap P_{in}(op_j)|}{|P_{in}(op_i) \cup P_{in}(op_j)|} \cdot \frac{|P_{out}(op_i) \cap P_{out}(op_j)|}{|P_{out}(op_i) \cup P_{out}(op_j)|} \right) + c_c \left( \frac{|C_{pre}(op_i) \cap C_{pre}(op_j)|}{|C_{pre}(op_i) \cup C_{pre}(op_j)|} \cdot \frac{|C_{post}(op_i) \cap C_{post}(op_j)|}{|C_{post}(op_i) \cup C_{post}(op_j)|} \right), \tag{5}$$

where $c_p$ and $c_c$ are weights such that $0 \le c_p, c_c \le 1$ and $c_p + c_c = 1$. $|P|$ and $|C|$ give the size of parameter set $P$ and condition set $C$, respectively. When dealing with parameters, the operators $\cap$ and $\cup$ are based on the ontological overlap and union of two concepts. For conditions, these two operators find the overlap and union of concepts that are either the same or synonyms. According to (5), $\mathrm{Sim}(op_i, op_j)$ ranges from 0 to 1, with 1 indicating that the two operations have the same parameters and conditions. Note that (5) focuses only on the externally observable similarity between two operations.
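The following sketch computes (5) under two stated assumptions: parameters and conditions are matched exactly as strings, standing in for the ontological and synonym-based $\cap$ and $\cup$ described above, and two empty sets are treated as identical. The dictionary layout and the default weights $c_p = c_c = 0.5$ are illustrative choices.

```python
def jaccard(a: set, b: set) -> float:
    """Size of intersection over size of union; two empty sets count as identical (assumption)."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def sim(op_i: dict, op_j: dict, c_p: float = 0.5, c_c: float = 0.5) -> float:
    """Operation similarity in the form of Eq. (5), using exact set matching.

    Each operation is a dict with sets under "in", "out", "pre", and "post".
    """
    if not (0 <= c_p <= 1 and 0 <= c_c <= 1 and abs(c_p + c_c - 1) < 1e-9):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    params = jaccard(op_i["in"], op_j["in"]) * jaccard(op_i["out"], op_j["out"])
    conds = jaccard(op_i["pre"], op_j["pre"]) * jaccard(op_i["post"], op_j["post"])
    return c_p * params + c_c * conds


# Generic concept names; b lacks one of a's inputs.
a = {"in": {"A", "B"}, "out": {"C"}, "pre": {"p1"}, "post": {"q1"}}
b = {"in": {"A"}, "out": {"C"}, "pre": {"p1"}, "post": {"q1"}}
print(sim(a, b))  # 0.75
```

The input sets overlap by one of two concepts, halving the parameter term, while the condition term stays at 1.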

6.1.2 Domain Correlation and Unrelatedness

Domain correlation $\rho$ measures the relevance of two domains $d_i$ and $d_j$, or the cohesion of the same domain (when $i = j$). The relevance of $d_i$ and $d_j$ can be reflected by the composability among operations from the two domains; when $i = j$, this relevance becomes a measure of the cohesion of a single domain. Based on heuristics, domain correlation is defined as

$$\rho[d_i, d_j] = e^{-\frac{1}{\varepsilon_0^{(n+1)}}}, \tag{6}$$

where $n$ is the number of unique pairs of operations $\{(op_i, op_j) \mid op_i \in d_i, op_j \in d_j\}$ that are previously known to have been involved in a composition. When $n = 0$, the correlation between two domains in (6) is assigned an initial value of $\rho_0 = e^{-1/\varepsilon_0}$. Equation (6) shows that $\rho$ for two domains quickly approaches 1 as $n$ increases. We define the multiplicative inverse of the domain correlation as domain unrelatedness $\tau$. We bound the maximum value of $\tau$ to 1; thus,

$$\tau[d_i, d_j] = \frac{\rho_0}{\rho[d_i, d_j]}. \tag{7}$$

Both (6) and (7) can be used later to measure the diversity of components involved in a service composition, as discussed in the following section.
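Equations (6) and (7) can be tabulated with a few lines of Python. The constant `eps0` is our assumed value for $\varepsilon_0$; it must exceed 1 for $\rho$ to rise toward 1 as $n$ grows, and the value 2.0 is chosen purely for illustration.

```python
import math


def correlation(n: int, eps0: float = 2.0) -> float:
    """Domain correlation rho per Eq. (6): exp(-1 / eps0 ** (n + 1)).

    n counts the unique operation pairs across the two domains already known
    to take part in a composition; eps0 > 1 is an assumed tuning constant.
    """
    return math.exp(-1.0 / eps0 ** (n + 1))


def unrelatedness(n: int, eps0: float = 2.0) -> float:
    """Domain unrelatedness tau per Eq. (7): rho_0 / rho, hence at most 1."""
    return correlation(0, eps0) / correlation(n, eps0)


for n in range(4):
    print(n, round(correlation(n), 3), round(unrelatedness(n), 3))
```

Running the loop shows correlation increasing toward 1 and unrelatedness decreasing from 1 toward $\rho_0$ as more composed operation pairs become known.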


6.1.3 Objective Interestingness

Interestingness indicates how interesting a Web service composition discovered in the screening phase really is. We consider two types of application areas where this needs to be determined. The first is e-government and e-commerce, where Quality of Web Service (QoWS) plays an important role in the case of direct recognition-based composition. We say that such a composition is interesting if it exhibits better qualities than all previously discovered similar operations. The second application area is biological pathway discovery, where QoWS is less significant, as we do not expect any occurrences of direct recognition-based composition. In this section, we devise an objective interestingness measure that factors in various considerations for the other three recognition mechanisms, which are applicable to both application areas. We define interestingness $I$ as a tuple $[A, N, U_q, D_v, W]$, where $A$ is the actionability of the composition, $N$ is its novelty, $U_q$ is its uniqueness, $D_v$ is the diversity of its constituents, and $W$ is the product of the expert-assigned weights $w_i$ ($w_i > 1$) of the DOINs and operation interfaces involved in the composition: $W = \prod_{i=1}^{m} w_i$. We multiply all such weights to reflect their subjective interestingness-enhancing effect.

Actionability. We define actionability as a binary value (1 for actionable, 0 for nonactionable) representing whether the composability of a composition can be verified through simulation or live execution. A nonactionable composition is considered uninteresting; thus, actionability contributes multiplicatively toward the overall interestingness.

Novelty. We define novelty as a binary value (1 for novel, 0 for old or known). The source of this information may be a database or registry that keeps track of known service compositions. An old or known composition is considered uninteresting; thus, novelty also contributes multiplicatively toward the overall interestingness.

Uniqueness. Uniqueness $U_q$ measures how unique a composition is. We use the following function to calculate uniqueness:

$$U_q = \begin{cases} 1, & \text{promotion or inhibition,} \\ 1 - \max_{op \in D} \mathrm{Sim}(comp(OP_s, op_t), op), & \text{indirect recognition.} \end{cases} \tag{8}$$

For both promotion and inhibition, the uniqueness is set to 1 due to the validity of the composition. For indirect recognition, $D$ is a reference set of domains, $op$ ranges over the operations of those domains, and $comp(OP_s, op_t)$ is the operation composed from the source operations $OP_s$ and the target operation $op_t$. The more similar the composed operation is to an existing operation, the less unique it is regarded. $U_q$ thus varies between 0 and 1.

Diversity. Diversity $D_v$ indicates how diverse the components involved in a Web service composition are. We use the following objective function to measure diversity:

$$D_v = \begin{cases} \sum_{d_s \in D(s)} \tau[d(op), d_s], & \text{promotion or inhibition,} \\ \sum_{op \in OP_s} \tau[d(op), d(op_t)], & \text{indirect recognition.} \end{cases} \tag{9}$$

If we replace $\tau$ with $\rho$ using (7), (9) can be rewritten as

$$D_v = \begin{cases} \sum_{d_s \in D(s)} \frac{\rho_0}{\rho[d(op), d_s]}, & \text{promotion or inhibition,} \\ \sum_{op \in OP_s} \frac{\rho_0}{\rho[d(op), d(op_t)]}, & \text{indirect recognition.} \end{cases} \tag{10}$$

For promotion or inhibition involving $op$ and $s$, $D(s)$ is the set of domains $s$ is involved with. In all cases, $d(\cdot)$ gives the domain of a given service operation. According to (10), in the case of promotion or inhibition, the more domains $s$ is involved with, and the lower the correlation between these domains and that of $op$, the higher the diversity. In the case of indirect recognition, both the number and the domains of the source operations $op \in OP_s$ contribute to the overall diversity: the more source operations are involved in the composition, and the lower the correlation between their domains and that of the target operation, the higher the diversity.
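The sketch below evaluates (8) and (10), reusing the form of (6) with the same assumed $\varepsilon_0 = 2$. It takes precomputed similarity scores and per-pair composition counts as plain inputs; how those counts are gathered from a registry is outside the sketch, and the function names are hypothetical.

```python
import math


def rho(n: int, eps0: float = 2.0) -> float:
    """Domain correlation, Eq. (6); eps0 > 1 is an assumed constant."""
    return math.exp(-1.0 / eps0 ** (n + 1))


def diversity(pair_counts, eps0: float = 2.0) -> float:
    """Diversity D_v per Eq. (10): sum of rho_0 / rho over domain pairs.

    pair_counts holds one n per domain pair: (d(op), d_s) for every d_s in
    D(s) under promotion/inhibition, or (d(op), d(op_t)) for every op in OP_s
    under indirect recognition.
    """
    rho0 = rho(0, eps0)
    return sum(rho0 / rho(n, eps0) for n in pair_counts)


def uniqueness(kind: str, sims_to_reference=()) -> float:
    """Uniqueness U_q per Eq. (8).

    sims_to_reference: Sim(comp(OP_s, op_t), op) for each reference
    operation op, e.g., computed with Eq. (5).
    """
    if kind in ("promotion", "inhibition"):
        return 1.0
    # Assumption: with no reference operations, the composition is fully unique.
    return 1.0 - max(sims_to_reference, default=0.0)


print(diversity([0, 0]))                         # 2.0
print(diversity([0, 0, 3]) > diversity([0, 0]))  # True
print(uniqueness("indirect", [0.2, 0.75]))       # 0.25
```

Two source operations from never-composed domains each contribute exactly 1, and a third source adds more diversity even when its domain is already well correlated with the target's.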

6.1.4 Objective Usefulness

While it is difficult to objectively quantify the usefulness of new properties that emerge from a service composition using a simple usefulness measure, we take into consideration the elements of usefulness that can be objectively evaluated for direct and indirect recognition. Due to the prevalent use of QoWS attributes in e-commerce and e-government, usefulness in these settings can be calculated for the following two cases:

Case 1. Multiple operations compete to provide each of the input parameters of a completely bound operation within a group of relevant leads (GRL) centered around an operation interface. Each of these competing operations may, in turn, have implementations provided by multiple services. If the overall composition provides the same or a similar function not seen before, its usefulness can be at least partially calculated using the overall QoWS, so that all candidate compositions can be compared to find the one that exhibits the highest QoWS.

Case 2. If a composition provides a function that is the same as or similar to an existing function, usefulness can be expressed through the QoWS improvement it achieves over the existing function.

In either case, QoWS-based usefulness can be determined either statically, if QoWS values are registered over time and become widely known, or at runtime, if they are vague and must be determined dynamically.

6.1.5 Evaluation of Objective Interestingness and Usefulness

When multiple compositions are identified in the screening phase, the candidate pool for further consideration can be reduced by selecting the service compositions that exhibit the highest values of objective interestingness and/or usefulness. This can be done using two approaches, namely, the weighted function and skyline methods.


TABLE 3 Experiment Settings for Interestingness Simulation

Weighted function. A weighted function taking into account the interestingness measures can be devised as follows:

$$I = A N (c_u U_q + c_d D_v) \prod_{i=1}^{m} w_i, \tag{11}$$

where $c_u$ and $c_d$ are weights such that $0 \le c_u, c_d \le 1$ and $c_u + c_d = 1$, and $m$ is the number of expert-assigned weights $w_i$ ($w_i > 1$) of the operation interfaces and DOINs involved in a composition. Likewise, a weighted function on usefulness in the e-commerce environment can be calculated using

$$U = \begin{cases} \sum_{q \in \mathrm{pos}} w_q \frac{q - q_{min}}{q_{max} - q_{min}} + \sum_{q \in \mathrm{neg}} w_q \frac{q_{max} - q}{q_{max} - q_{min}}, & \text{case 1,} \\ \sum_{q \in \mathrm{pos}} w_q \frac{\Delta q}{q_{max} - q_{min}} - \sum_{q \in \mathrm{neg}} w_q \frac{\Delta q}{q_{max} - q_{min}}, & \text{case 2,} \end{cases} \tag{12}$$

where $\mathrm{pos}$ is the set of quality attributes (e.g., reliability) that contribute positively toward the weighted function, and $\mathrm{neg}$ is the set of quality attributes (e.g., response time) that contribute negatively. $w_q$ are user-assigned weights for each quality attribute such that $w_q \ge 0$ and $\sum w_q = 1$. $q$ represents the value of an aggregate lead QoWS attribute, and $\Delta q = q_{composition} - q_{existing}$. $q_{max}$ and $q_{min}$ are, respectively, the maximum and minimum values of the same attribute among all leads in the GRL. The first half of (12) is essentially a combined form of the scaling and weighting phases proposed in [8]. The second half of (12) extends the function to the QoWS improvement calculated between two compositions. A weighted function, once configured with all the weights, can be rather simple to use. Unfortunately, the configuration itself requires the user to first express his/her preferences over several interestingness-related measures or quality attributes as numeric weights. Often the user has to go through a time-consuming trial-and-error process, as the data are being presented, to arrive at a desired combination of such weights.

Skyline. Another popular approach to selecting service compositions that exhibit high values of desired properties is the skyline operator, which originates in the database community [9]. A skyline is defined as the set of objects that are not dominated by any other object.
In a multiobjective environment such as the one defined by the interestingness tuple, composition $comp_a$ dominates composition $comp_b$ if $comp_a$ exhibits a better value than $comp_b$ in at least one dimension and a value at least as good as that of $comp_b$ in every other dimension. The skyline operator addresses the problem faced by the weighted function well, because the user is not required to come up with an optimal combination of weights.
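Equations (11) and (12) translate directly into code. In this sketch the attribute records, their key names, and the choice to skip attributes whose $q_{max} = q_{min}$ (which would otherwise divide by zero) are our assumptions; the weights $w_q$ are expected to be nonnegative and sum to 1, as stated above.

```python
from math import prod


def interestingness(A, N, U_q, D_v, weights, c_u=0.5):
    """Weighted interestingness per Eq. (11), with c_d = 1 - c_u.

    A and N are the 0/1 actionability and novelty flags; weights are the
    expert-assigned w_i of the DOINs and operation interfaces involved.
    """
    c_d = 1.0 - c_u
    return A * N * (c_u * U_q + c_d * D_v) * prod(weights)


def usefulness(attrs, case):
    """Weighted QoWS usefulness per Eq. (12).

    attrs maps an attribute name to a dict with keys "value", "q_min",
    "q_max", "weight", and "positive". In case 1, "value" is the aggregate
    QoWS value q; in case 2, it is the improvement q_composition - q_existing.
    """
    total = 0.0
    for a in attrs.values():
        span = a["q_max"] - a["q_min"]
        if span == 0:
            continue  # assumption: a constant attribute cannot rank leads
        q, w = a["value"], a["weight"]
        if case == 1:
            scaled = (q - a["q_min"]) / span if a["positive"] else (a["q_max"] - q) / span
        else:
            scaled = q / span if a["positive"] else -q / span
        total += w * scaled
    return total


attrs = {
    "reliability": {"value": 0.9, "q_min": 0.8, "q_max": 1.0, "weight": 0.5, "positive": True},
    "response_time": {"value": 200.0, "q_min": 100.0, "q_max": 300.0, "weight": 0.5, "positive": False},
}
print(usefulness(attrs, case=1))
print(interestingness(1, 1, 0.8, 0.4, [2.0, 1.5]))
```

Because $A$ and $N$ multiply the whole expression, a nonactionable or already known composition scores 0 regardless of its other measures.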

6.1.6 Interestingness Simulation

In our simulation, we investigate the interestingness skyline of service compositions. In particular, we focus on compositions obtained through indirect recognition, since their interestingness requires more computation according to (8), (10), and (11). Table 3 lists the configuration variables used in our experiment.

Since both actionability and novelty are Boolean variables, we ignore them in our simulation. During each iteration of our mining algorithms, we pick a value for the number of operation interfaces per domain and use it to populate operations in 50 domains. For each domain operation, we generate its input/output parameters such that the number of these parameters is uniformly distributed in the range 0-5. Each of these parameters is associated with a DOIN, which is identified by a sequence number. For simplicity, we flatten all the DOINs (i.e., there are no inheritance or composition relationships among ontology nodes), so that only exact matches and synonyms are considered. We place these DOINs (50,000 of them in this experiment) in a circular buffer so that the last sequence number is adjacent to the first one. To study the contribution of user-assigned weights on DOINs toward interestingness, we randomly choose 100 nodes (50,000 × 0.002) using a uniform distribution and assign each a weight uniformly distributed between 1.0 and 5.0. To simulate the cohesive nature of the DOINs in a domain, we pick them for the domain using a Gaussian distribution around a mean sequence number chosen for the domain according to a uniform distribution. We assume that each parameter has an equal chance of being associated with a DOIN. To simulate pre- and postconditions, each parameter is symbolically given a range randomly chosen between 0 and 1.0 using a uniform distribution. We use the overlap of two such ranges (see (5)) to calculate the contribution of these conditions toward the similarity of two operations. During each mining iteration, based on the chosen number of operation interfaces per domain, $n_o$, we calculate uniqueness, diversity, and weight product for the compositions discovered in the iteration. These values are then normalized using the following equation:

$$\bar{v} = \frac{v - v_{min}}{v_{max} - v_{min}}. \tag{13}$$
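The synthetic setup described above can be approximated as follows. The 50 domains, 50,000 DOINs, 100 weighted DOINs with weights in [1.0, 5.0], and 0-5 parameters per operation come from the text; the Gaussian spread `SIGMA`, the random seed, and the data layout are assumptions, since they are not reported.

```python
import random

NUM_DOINS = 50_000   # DOINs in the circular buffer
NUM_WEIGHTED = 100   # 50,000 x 0.002 DOINs receive an expert weight
SIGMA = 500.0        # assumption: spread of a domain's DOINs (not reported)


def make_doin_weights(rng):
    """Choose 100 DOINs uniformly; give each a weight uniform in [1.0, 5.0]."""
    return {d: rng.uniform(1.0, 5.0) for d in rng.sample(range(NUM_DOINS), NUM_WEIGHTED)}


def pick_doin(rng, domain_mean):
    """Gaussian pick around the domain mean, wrapped around the circular buffer."""
    return round(rng.gauss(domain_mean, SIGMA)) % NUM_DOINS


def make_parameter(rng, domain_mean):
    """A parameter: its DOIN plus a symbolic condition range inside [0, 1]."""
    lo, hi = sorted((rng.random(), rng.random()))
    return {"doin": pick_doin(rng, domain_mean), "range": (lo, hi)}


def make_operation(rng, domain_mean):
    """An operation with 0-5 input and 0-5 output parameters (uniform)."""
    return {
        "in": [make_parameter(rng, domain_mean) for _ in range(rng.randint(0, 5))],
        "out": [make_parameter(rng, domain_mean) for _ in range(rng.randint(0, 5))],
    }


def make_domains(rng, num_domains=50, ops_per_domain=50):
    """Populate each domain around a uniformly chosen mean sequence number."""
    domains = []
    for _ in range(num_domains):
        mean = rng.randrange(NUM_DOINS)
        domains.append([make_operation(rng, mean) for _ in range(ops_per_domain)])
    return domains


def range_overlap(a, b):
    """Overlap length of two condition ranges, standing in for condition-set overlap."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


rng = random.Random(7)
weights = make_doin_weights(rng)
domains = make_domains(rng, ops_per_domain=50)  # n_o = 50, as in Fig. 6a
```

The modulo in `pick_doin` implements the circular buffer: a draw that falls below 0 or above the last sequence number wraps around to the other end.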

The interestingness skylines can be shown as a surface in a 3D space with the compositions' uniqueness $U_q$, diversity $D_v$, and weight product $\prod_{i=1}^{m} w_i$ as coordinates. Fig. 6 uses circles to highlight skyline points and shows the interestingness skylines for different numbers of operation interfaces per domain. As this number increases, the number of discovered compositions also increases dramatically. However, the interestingness skyline keeps a population of top candidates of relatively stable size.
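Normalization by (13) and skyline selection can be sketched as below. This is a naive nested-loop skyline, not the block-nested-loops or divide-and-conquer algorithms of [9], and it assumes that larger values are better in all three dimensions; the sample compositions are made up.

```python
def normalize(values):
    """Min-max normalization per Eq. (13)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # assumption: a constant column maps to 0
    return [(v - lo) / (hi - lo) for v in values]


def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere.

    Higher is better in every dimension (U_q, D_v, weight product).
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(points):
    """Naive O(n^2) skyline: the points no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical compositions as (U_q, D_v, weight product) triples.
comps = [(0.9, 1.2, 2.0), (0.5, 3.0, 1.0), (0.4, 1.0, 1.5), (0.9, 1.2, 1.0)]
columns = [normalize(list(col)) for col in zip(*comps)]
points = list(zip(*columns))
print(skyline(points))
```

Here the first two compositions form the skyline: the third and fourth are each dominated by the first.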


Fig. 6. Skylines versus number of operations. (a) 50 operations/domain. (b) 200 operations/domain. (c) 500 operations/domain.

6.2 Interactive Hypothesis Formulation

To help users formulate hypotheses that lead to the identification of ultimately interesting and useful service compositions, we have developed strategies that provide users with visual aids toward that goal. We describe these strategies in this section and focus on interestingness, as we later apply these strategies to the discovery of pathways.

6.2.1 Identification of Interesting Segments in a Composition Network

After a service composition network is discovered in the screening phase, identifying interesting segments within the network helps the user focus on them, as they are expected to become part of the final outcome of the service mining process. The evaluation strategies discussed in Section 6.1.5 can be used, in general, to identify compositions of high interestingness.

6.2.2 Establishment of Fully Connected Graph

Once interesting edges in a service composition or pathway network are highlighted, the user can use them as hints in selecting nodes of interest for further exploration. We have developed the following strategy to link both interesting edges and user-selected nodes into a connected graph to the extent possible:

1. Coalesce nodes (e.g., a, b, and c in Fig. 7) linked by interesting edges into a group.
2. Convert interesting nodes (e.g., t, picked by the user) and groups encompassing interesting nodes (e.g., c, f) into nuclei, i.e., graph expansion focus nodes.
3. Incrementally expand all the nuclei. We use the heuristic of connecting all the interesting nodes using as many interesting edges as possible. To achieve this, whenever a newly encountered node is part of a nonnucleus group (e.g., the one that contains h, i, and j), an additional expansion is triggered and the whole group is engulfed. The expansion stops when all nuclei are connected or when all nodes in the graph have been visited.

We omit the listing of the corresponding algorithms due to space limits. Connected graphs identified using this process are then presented to the user as the basis for hypothesis formulation.

Fig. 7. Expansion of interesting segments in graph.

6.3 Runtime Simulation

Potentially composable Web services are identified in the screening phase based on ontologies, which contain static information about the semantics of those services. Such composability may not survive the dynamic runtime configuration, which contains realistic external conditions and interdependencies among service operations. For example, a biological Web service may specify conditions necessary to enable or disable a biological process, such as temperature, parameter locale and quantity, and kinetic energy. Accounting for external conditions and operation interdependencies during the screening phase may not be feasible, since it would make that phase too selective and computationally more expensive; however, they need to be considered in the evaluation phase to ensure that the pathways identified in the screening phase really exist. The verification of pathway validity can be carried out in a simulation environment, where functions of biological Web services are invoked in the order identified in pathway leads. A pathway lead identified in the screening phase indicates the possibility of a pathway based on service and operation recognition. Verification aims at determining whether segments of an identified pathway lead can indeed be enabled by a chain of relevant conditions.

The second important aspect of runtime simulation is its ability to support predictive analysis. For example, based on pathway leads established in the screening phase and later highlighted in the interactive hypothesis formulation subphase, the user may attempt to predict certain outcomes from indirect relationships derived from the way the pathway network is laid out. Such predictions can be tested using the simulation strategy outlined in Algorithm 4.

Algorithm 4. Simulation Algorithm
Input: pathway network PN; function f(·) determining the initial number of instances for an entity type; total number of iterations I; upper bound S for random number generator random with uniform distribution
Output: statistics Stats
Variables: entity type et; entity instance container Container(et) of type et; operation op; input entity op_in, output entity op_out, and precondition op_pre of op
1: for all et ∈ PN do
2:   Container(et) ← create f(et) instances;
3: end for
4: Stats ← tally entity quantities in each container;
5: for i = 0 to I do
6:   for all op ∈ PN do
7:     s ← op.getProviderService();
8:     et_parameter ← op.getInputParameter().getEntityType();
9:     et_provider ← s.getProviderEntityType();
10:    if et_parameter = et_provider then
11:      n ← number of entities of type et_provider that match op_pre
12:    else
13:      n ← number of entities of type et_parameter
14:    end if
15:    // Calculate the number of times to invoke the operation
16:    n ← n/S + ((random.nextInt(S) < (n modulo S)) ? 1 : 0);
17:    for j = 0 to n do
18:      if ∃ op_in ∈ Container(et_parameter) : op_in matches op_pre then
19:        op_out ← invoke(op) with op_in;
20:        if et_parameter ≠ et_provider ∧ provider is consumable then
21:          Container(et_provider).remove(0);
22:        end if
23:        if et_parameter ≠ et_provider ∨ provider is consumable then
24:          Container(et_parameter).remove(op_in);
25:        end if
26:        et_parameter ← op_out.getEntityType();
27:        Container(et_parameter).add(op_out);
28:      end if
29:    end for
30:  end for
31:  Stats ← tally entity quantities in each container;
32: end for

When an operation is to be invoked, the algorithm checks two factors. First, it examines whether all the preconditions of the operation are met; an operation that does not have available input entities meeting its preconditions should simply not be invoked. Second, it determines how many instances are available for providing the corresponding service. This factor is needed in pathway discovery because biological entities of the same type each have a discrete service process that handles input and output of a finite proportion. This differs from traditional business service processes, which are often represented as collective singletons for a given organization (e.g., credit check, loan approval). The available instances of a particular biological entity that provides a service drive the amount of other entities they may consume and/or produce. For this reason, the algorithm treats each entity node in a pathway network as a container of entity instances of the noted ontology type. We determine the number of times an operation should be invoked based on the quantity of the corresponding service-providing entity (lines 7-16). To make sure that an operation from a service-providing entity of small quantity also gets a chance to be invoked, a random number generator is used (line 16). Fig. 8 shows the logic used in Algorithm 4 for removing an entity from the corresponding entity container.

Fig. 8. Truth Table for Removing Entity Instance.
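A compact Python rendering of Algorithm 4 follows, mainly to make the container bookkeeping and the probabilistic rounding of line 16 concrete. It is a sketch under assumptions: entity instances are bare dictionaries, `invoke(op)` is replaced by creating an instance of the output type, the `Op` record and its fields are hypothetical, the loop of line 17 runs exactly n times, and, unlike line 26, the output type is kept separate from et_parameter so that later invocations in the same loop still draw from the input container.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Op:
    """Hypothetical operation record; the field names are illustrative."""
    name: str
    provider_type: str   # entity type providing the service
    input_type: str      # entity type consumed (et_parameter)
    output_type: str     # entity type produced
    precondition: Callable[[dict], bool] = lambda e: True
    provider_consumable: bool = False


def simulate(ops, initial_counts, iterations, S=10, seed=0):
    """Container-based simulation loosely following Algorithm 4."""
    rng = random.Random(seed)
    # Lines 1-3: one container (a list of instances) per entity type.
    containers = {et: [{"type": et} for _ in range(n)] for et, n in initial_counts.items()}
    for op in ops:
        for et in (op.provider_type, op.input_type, op.output_type):
            containers.setdefault(et, [])

    def tally():
        return {et: len(c) for et, c in containers.items()}

    stats = [tally()]                                        # line 4
    for _ in range(iterations):                              # line 5
        for op in ops:                                       # line 6
            et_param, et_prov = op.input_type, op.provider_type
            if et_param == et_prov:                          # lines 10-14
                n = sum(1 for e in containers[et_prov] if op.precondition(e))
            else:
                n = len(containers[et_param])
            # Line 16: n // S calls, plus one more with probability (n % S) / S.
            n = n // S + (1 if rng.randrange(S) < n % S else 0)
            for _ in range(n):                               # line 17
                op_in = next((e for e in containers[et_param] if op.precondition(e)), None)
                if op_in is None:                            # line 18: no input meets op_pre
                    break
                op_out = {"type": op.output_type}            # line 19: stand-in for invoke(op)
                if et_param != et_prov and op.provider_consumable and containers[et_prov]:
                    containers[et_prov].pop(0)               # lines 20-22
                if et_param != et_prov or op.provider_consumable:
                    containers[et_param].remove(op_in)       # lines 23-25
                containers[op.output_type].append(op_out)    # lines 26-27
        stats.append(tally())                                # line 31
    return stats


# Hypothetical two-step pathway; "Substrate" is an illustrative input type.
ops = [
    Op("liberateAA", "PLA2", "Substrate", "ArachidonicAcid"),
    Op("producePGE2", "COX2", "ArachidonicAcid", "PGE2"),
]
stats = simulate(ops, {"PLA2": 20, "Substrate": 100, "COX2": 10}, iterations=5)
for snapshot in stats:
    print(snapshot)
```

With non-consumable providers, each invocation turns one input instance into one output instance, so the total across Substrate, ArachidonicAcid, and PGE2 stays at 100 while the provider counts remain fixed.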

6.4 Subjective Evaluation

In addition to objective measures of interestingness and usefulness, users evaluating these aspects of a service composition may choose to use subjective measures, whose reference base may be personal knowledge, belief, bias, and needs. Unfortunately, approaches based solely on subjective measures tend to prevent us from finding interesting and useful compositions that were not thought of in advance. An extreme case of relying on subjective measures to carry out Web service mining is the traditional composition approach, where the user starts the search process by issuing a query that specifies the desired composition. Our approach applies objective measures first to reduce the size of the candidate pool and pushes the more expensive subjective evaluation toward the end, where the pool is presumably much smaller. Results from runtime simulation are presented to the user, who can then correlate them with the hypotheses made earlier to see whether they are confirmed or rejected. Based on such analysis, the user makes the ultimate determination as to whether the pathway under investigation is really interesting and useful.


Fig. 9. Conceptual models of biological processes.

7 APPLICATION TO PATHWAY DISCOVERY

Limitations of existing approaches to representing biological processes motivated us to propose modeling these processes as Web services [10]. To demonstrate the effectiveness of our mining framework, we applied it to the discovery of pathways linking these service-oriented processes. To prepare for our experiment, we first compiled a list of conceptual models of biological processes based on [11], [12], [13], [14], and [15]. In addition to describing process models, these sources also reveal some simple relevant pathways that can be put together manually, as shown in Fig. 9, where each subfigure represents models constructed from information obtained from a single source. Ontology concepts (Fig. 9a) are used by these models to define the types of service-providing entities and operation input/output parameters. Multiple examples of promotion, inhibition, and indirect recognition can be found in these simple pathways. For example, Fig. 9c shows that upon injury, LTB4 recruits Neutrophil, promoting its service of producing COX2. Fig. 9d shows that Gastric Juice's service can inhibit the services of both Stomach Cell and Mucus. An example of indirect recognition can be found in Fig. 9e, where PLA2's service can liberate Arachidonic Acid, which can, in turn, be used as input to either the produce PGG2 operation of COX1's service or the produce PGE2 operation of COX2's service. In practice, we envision that research labs (i.e., model sources) can publish their discoveries of individual biological processes independently, using Web services as the vehicle. Based on these models, we constructed corresponding WSDL services, wrapped them using WSML [16], and deployed them into a WSMX [17] runtime environment [10].

We use the simple pathways constructed manually here as references when we later check the correctness of pathways automatically discovered by our mining algorithms. Fig. 10 gives a snapshot of an automatically discovered pathway network. To enable the identification of interesting service compositions (i.e., segments within a pathway network), we extended each WSML service modeling a biological process to declare its modeling source in its nonfunctional properties (nfp) section. Based on a comparison of this information across the edges in the pathway graph involved in the recognition patterns in Figs. 2b, 2c, and 2d, our interestingness evaluation algorithm highlights those edges determined to be novel. These highlighted edges give the user visual clues that aid the manual selection of interesting nodes to pursue further. Once nodes of interest are selected, our graph expansion algorithm (Section 6.2.2) links the interesting nodes and edges into a connected graph, which forms the basis for the user to formulate hypotheses. For example, a hypothesis may state that an increased dosage of Aspirin will relieve pain but may increase the risk of ulcer in the stomach. To test hypotheses such as this, an initial quantity representing units of service is assigned to all service-providing entities at the beginning of the simulation (lines 1-3 in Algorithm 4). These quantities are expected to change as the entities involved in the simulation interact with each other over time. From the two sample plots generated from the simulation results, we see that as the quantity of Aspirin increases from 10 in plot (a) to 40 in plot (b), the erosion of the stomach by the gastric juice increases, due to the increased suppression of the production of the mucus that covers the stomach wall. We also notice (in plot (a) and in others not shown in Fig. 10) that when the senseRelief operation is enabled, it tends to obliterate the trace of Aspirin's impact on pain sensation, due to the "leaky bucket" effect it has on pain and relief signals. Once we disable this operation in our simulation (see plot (b)), we see a dramatic association between the Aspirin dosage and the suppression of the amount of pain signal generated. Together with the observation of Aspirin's impact on stomach erosion noted earlier, this essentially confirms the user's initial hypothesis.

Fig. 10. Discovered pathway highlighted with interesting subgraph and sample simulation results.
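The three-step expansion strategy of Section 6.2.2, whose algorithms were omitted above for space, might be sketched as follows. Groups are formed with union-find over interesting edges, and all nuclei grow breadth-first at once; a region merge is recorded whenever two regions touch. The function names are hypothetical, the result includes every node explored before the nuclei meet rather than a minimal connecting subgraph, and the example graph only borrows the node letters of Fig. 7, not its actual topology.

```python
from collections import deque


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    parent[find(parent, b)] = find(parent, a)


def expand(adj, interesting_edges, interesting_nodes):
    """Grow nuclei until they touch; returns (owner map, all_connected)."""
    # Step 1: coalesce nodes joined by interesting edges into groups.
    group = {v: v for v in adj}
    for a, b in interesting_edges:
        union(group, a, b)
    members = {}
    for v in adj:
        members.setdefault(find(group, v), set()).add(v)

    # Step 2: each group holding a user-selected node becomes a nucleus.
    nuclei = {find(group, v) for v in interesting_nodes}
    region = {r: r for r in nuclei}       # union-find over nucleus labels
    owner, queue = {}, deque()
    for r in nuclei:
        for v in members[r]:
            owner[v] = r
            queue.append(v)

    def merged():
        return len({find(region, r) for r in nuclei}) <= 1

    # Step 3: expand all nuclei breadth-first until they are connected
    # or every reachable node has been visited.
    while queue and not merged():
        v = queue.popleft()
        for w in adj[v]:
            if w in owner:
                union(region, owner[v], owner[w])   # two regions meet
                continue
            # Entering a non-nucleus group engulfs the whole group.
            for u in members[find(group, w)]:
                if u not in owner:
                    owner[u] = owner[v]
                    queue.append(u)
    return owner, merged()


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "h"),
         ("h", "i"), ("i", "j"), ("j", "t"), ("a", "x")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
owner, connected = expand(adj, [("a", "b"), ("b", "c"), ("h", "i"), ("i", "j")], ["c", "t"])
print(connected, sorted(owner))
```

In this example, reaching j from t pulls in the whole nonnucleus group {h, i, j} at once, so the two nuclei meet through d after only a few steps.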

8 RELATED WORK

Web mining research focuses on applying data mining techniques to discover interesting patterns of data from the Web. In contrast, our research focuses on studying service behaviors that are intrinsically dynamic, hence the need for dynamic invocation of services after the discovery of interesting service compositions. A comprehensive QoS-based service composition selection strategy is proposed in [8], and our weighted function on usefulness (12) is based on this strategy. Ardagna and Pernici [18] take this a step further by considering the frequency of execution paths; in our framework, we do not assume that such frequency is readily available. Xiong et al. [19] investigate how to configure Web services in a dynamically changing environment. By comparison, our research aims at the quick identification of the best service compositions and thus focuses more on the initial selection of service compositions using usefulness measures. Lamparter et al. [20] rely heavily on user preferences in the selection of Web services. Such preferences lead to a typical top-down service composition approach and are thus not exploited in our approach. A number of feedback- and log-based approaches have been proposed to improve QoS and service composability measures. For example, Jurca et al. [21] propose a QoS monitoring scheme based on quality ratings from service clients, Dustdar et al. [22] analyze Web service execution log data to discover potential process workflow instances involving these services, and Liang et al. [23] rely on usage data at the user, template, and instance levels to mine for Web service composition patterns. While these approaches may work well for business processes over time, as user feedback and execution logs become available, the challenge of identifying interesting workflows in the absence of such feedback and logs, especially when component Web services have just been introduced, remains real. Our Web service mining framework allows interesting service compositions to be mined in the absence of user feedback and execution logs. When applied to pathway discovery, where the speed of discovery is key to success, our approach enables the proactive discovery of interesting pathways as soon as these services become available.

9 CONCLUSION

In this paper, we proposed a Web service mining framework that enables the proactive discovery of interesting and useful service compositions. To address the challenge of combinatorial explosion, we developed mining algorithms that scale well with a growing number of Web services. We also discussed how interestingness and usefulness can be objectively evaluated. Finally, we presented a novel application of our framework to the discovery of pathways linking biological processes. Future work includes improving the agility of our mining framework to accommodate the dynamic expansion and evolution of WSML services. This would not only allow the framework to be checked against an expanding pool of Web services but, more importantly, ensure that the results of the mining process are updated to reflect the current availability and semantic descriptions of service capabilities.

REFERENCES

[1] Web Services Architecture—W3C Working Group Note, http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/, Feb. 2004.
[2] G. Zheng and A. Bouguettaya, “A Web Service Mining Framework,” Proc. IEEE Int’l Conf. Web Services (ICWS ’07), July 2007.
[3] P. Ball, Designing the Molecular World—Chemistry at the Frontier. Princeton Univ. Press, 1994.
[4] OWL-S: Semantic Markup for Web Services—W3C Member Submission, http://www.w3.org/Submission/OWL-S/, Nov. 2004.
[5] Web Service Modeling Ontology, http://www.wsmo.org/, 2009.
[6] B. Medjahed, A. Bouguettaya, and A.K. Elmagarmid, “Composing Web Services on the Semantic Web,” VLDB J., Sept. 2003.
[7] J. Augen, “The Evolving Role of Information Technology in the Drug Discovery Process,” Drug Discovery Today, vol. 7, pp. 315-323, 2002.
[8] L. Zeng, B. Benatallah, A.H.H. Ngu, M. Dumas, J. Kalagnanam, and H. Chang, “QoS-Aware Middleware for Web Services Composition,” IEEE Trans. Software Eng., vol. 30, no. 5, pp. 311-327, May 2004.
[9] S. Borzsonyi, D. Kossmann, and K. Stocker, “The Skyline Operator,” Proc. 17th Int’l Conf. Data Eng., pp. 421-430, 2001.
[10] G. Zheng and A. Bouguettaya, “Discovering Pathways of Service Oriented Biological Processes,” Proc. Ninth Int’l Conf. Web Information Systems Eng. (WISE ’08), Sept. 2008.
[11] S.Y. Auyang, “From Experience to Design—The Science behind Aspirin,” http://www.creatingtechnology.org/biomed/aspirin.htm, 2009.
[12] C. Freudenrich, “How Pain Works,” http://health.howstuffworks.com/pain.htm, 2009.
[13] L. Hoffman, “How Aspirin Works,” http://health.howstuffworks.com/aspirin1.htm, 2009.
[14] M. Landau, “Inflammatory Villain Turns Do-Gooder,” http://focus.hms.harvard.edu/2001/Aug10_2001/immunology.html, 2009.
[15] M.-J. Yin, Y. Yamamoto, and R.B. Gaynor, “The Anti-Inflammatory Agents Aspirin and Salicylate Inhibit the Activity of IκB Kinase,” Nature, vol. 396, pp. 77-80, Nov. 1998.
[16] The Web Service Modeling Language WSML, http://www.wsmo.org/wsml/wsml-syntax, 2009.
[17] Web Services Execution Environment, http://sourceforge.net/projects/wsmx, 2009.
[18] D. Ardagna and B. Pernici, “Global and Local QoS Constraints Guarantee in Web Service Selection,” Proc. IEEE Int’l Conf. Web Services (ICWS ’05), July 2005.
[19] P. Xiong, Y. Fan, and M. Zhou, “QoS-Aware Web Service Configuration,” IEEE Trans. Systems, Man, and Cybernetics, Part A, vol. 38, no. 4, pp. 888-895, 2008.
[20] S. Lamparter, A. Ankolekar, R. Studer, and S. Grimm, “Preference-Based Selection of Highly Configurable Web Services,” Proc. 16th Int’l Conf. World Wide Web (WWW ’07), pp. 1013-1022, 2007.
[21] R. Jurca, B. Faltings, and W. Binder, “Reliable QoS Monitoring Based on Client Feedback,” Proc. 16th Int’l Conf. World Wide Web (WWW ’07), pp. 1003-1012, 2007.
[22] S. Dustdar, T. Hoffmann, and W. van der Aalst, “Mining of Ad-Hoc Business Processes with TeamLog,” Data and Knowledge Eng., http://citeseer.ist.psu.edu/dustdar04mining.html, 2005.
[23] Q.A. Liang, J.-Y. Chung, S. Miller, and Y. Ouyang, “Service Pattern Discovery of Web Service Mining in Web Service Registry-Repository,” Proc. IEEE Int’l Conf. e-Business Eng. (ICEBE ’06), pp. 286-293, 2006.

George Zheng received the BS degree in electronics engineering from Shanghai Jiao Tong University, China, in 1986, the MS degree in electrical engineering from the University of Virginia, Charlottesville, in 1991, and the MS degree in computer science from Johns Hopkins University, Baltimore, Maryland, in 1997. He received the PhD degree in computer science from the Virginia Polytechnic Institute and State University, Blacksburg, in 2009. He is currently a principal systems engineer with Science Applications International Corporation (SAIC). His research interests include Web services mining, bioinformatics, workflow, software simulation, and systems integration.

Athman Bouguettaya received the PhD degree in computer science from the University of Colorado at Boulder in 1992. He is a science leader at CSIRO ICT Center, Canberra. He was previously a tenured faculty member in the Computer Science Department at Virginia Polytechnic Institute and State University (commonly known as Virginia Tech). He is on the editorial boards of several journals, including the IEEE Transactions on Services Computing, the International Journal on Web Services Research, the VLDB Journal, the Distributed and Parallel Databases Journal, and the International Journal of Cooperative Information Systems. He was invited to be a guest editor of a special issue of Computer on trust management in Web service environments and a special issue of Internet Computing on database technology on the Web. He also guest edited a special issue of the ACM Transactions on Internet Technology on Semantic Web Services. He served as a program chair of the 2008 International Conference on Service Oriented Computing (ICSOC) and the IEEE RIDE Workshop on Web Services for E-Commerce and E-Government (RIDE-WS-ECEG 2004). He has served on numerous program committees of database and service-oriented computing conferences. His current research interests are in service-oriented computing. He is a senior member of the IEEE and the ACM.