XML-RPC Agents for Distributed Scientific Computing

Robert van Engelen, Kyle Gallivan, Gunjan Gupta
Department of Computer Science
School of Computational Science and Engineering
Florida State University, Tallahassee, FL 32306-4530

George Cybenko
Thayer School of Engineering, Dartmouth College, Hanover, NH 03755-8000

Abstract

This paper presents the use of XML-RPC to achieve data interoperability between scientific applications in a distributed environment. Remote procedure calling with XML-RPC is independent of the programming language and operates across different platforms. We have designed and implemented tools for the automatic generation of XML-RPC stub routines and XML serialization converters to support application data interoperability in component-based problem-solving environments for distributed scientific computing. Locating and indexing XML-RPC services is performed using mobile agents. The agents serve to set up problem-solving sessions and to connect remote applications.

1 Introduction

The integration of large-scale scientific models, such as coupled ocean-atmosphere models, is a tremendous programming task. A model implementation must be highly sophisticated in terms of data structure compatibility and algorithm selection to support an efficient exchange of simulation results. For this reason, uniform model development is a primary concern of scientific application development [4]. The reuse of "legacy" codes and the data interoperability problem between these types of codes is another important concern. The handling of data shared between disparate systems requires the definition of persistent representations for that data, and corresponding interfaces to access and use that data. A representation of persistent data must inevitably interoperate with humans or mechanisms that use other persistent representations [10]. We distinguish syntactic data interoperability from semantic data interoperability. The first is implemented by a translator: data is simply transformed from one persistent representation to another, typically in discrete (or batch) operations. Semantic data interoperability, in contrast, transforms data from one persistent representation into an implementation of an interface. This requires both a transformation of the data into a new representation and an interface to that representation that implements the semantic behavior corresponding to the new persistent form of the data. Achieving semantic data interoperability is often described as "legacy integration" or "legacy wrapping", and is a universal, recurring problem that has so far typically been solved by ad-hoc means. In true semantic data interoperability, an object's representation is translated into another representation depending on the intended use of the object. For example, a satellite image's representation is translated into a set of parameters describing the percentages of land use reflecting agricultural development.
Clearly, the data content is the same (although information may be lost in the translation process), but its intended use differs. For semantic data interoperability, the interface of the data representation, in the form of an application programming interface (API), determines the semantics of the communicated objects. Most numerical applications are developed in imperative programming languages, e.g. Fortran and C. Although the implementation of numerical algorithms in these languages seems to be a natural choice, the problem of inter-application data interchange is exacerbated by the lack of object serialization and of standards for remote method invocation (RMI) in these languages. Java object serialization is a technique that implicitly maps the data structures of an object to a uniform linearized representation that can be sent, stored, and retrieved by Java applications. Thus, the object can be shared outside the address space of an application by other Java applications. Java RMI enables applications to invoke methods of objects that reside in remote applications. Hence, Java RMI allows different Java applications to be integrated in a distributed system, using, e.g., Java Voyager [11], and enables the implementation of component-based distributed computing environments [14]. Typically, in the component-based approach, one has to add a Java API to an existing application and register the application and its Java API as a component. Well-known examples of component-based projects in the context of scientific computing are the component project [8], Globus [6], the Grid [7], and the Information Power Grid (IPG) [13].
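To make the distinction between syntactic and semantic data interoperability concrete, the following C sketch contrasts the two. The translator `col_to_row_major` performs a purely syntactic batch conversion of a matrix from Fortran's column-major layout to C's row-major layout. The `land_use` interface instead wraps a classified satellite image behind an operation that reports land-use percentages, in the spirit of the satellite example above. All names, the class code `LAND_AGRICULTURE`, and the one-code-per-pixel encoding are hypothetical illustrations, not part of the tools described in this paper.

```c
#include <assert.h>
#include <stddef.h>

/* Syntactic interoperability: a batch translator that copies an
   nrows x ncols matrix from column-major (Fortran) order into row-major
   (C) order. Only the persistent layout changes, not the meaning. */
void col_to_row_major(const double *src, double *dst,
                      size_t nrows, size_t ncols)
{
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            dst[i * ncols + j] = src[j * nrows + i];
}

/* Semantic interoperability: the same underlying data (a classified
   satellite image, one land-use class code per pixel) is exposed through
   an interface that answers the question the consumer cares about.
   The class code and the interface are hypothetical. */
enum { LAND_AGRICULTURE = 1 };

struct land_use {
    const unsigned char *pixels;  /* one land-use class code per pixel */
    size_t n_pixels;
    double (*percent)(const struct land_use *self, unsigned char cls);
};

/* Percentage of pixels that belong to land-use class cls. */
static double land_use_percent(const struct land_use *self, unsigned char cls)
{
    size_t count = 0;
    assert(self != NULL && self->n_pixels > 0);
    for (size_t i = 0; i < self->n_pixels; i++)
        count += (self->pixels[i] == cls);
    return 100.0 * (double)count / (double)self->n_pixels;
}

/* Wrap raw pixel data in the land_use interface. */
struct land_use land_use_wrap(const unsigned char *pixels, size_t n)
{
    struct land_use lu = { pixels, n, land_use_percent };
    return lu;
}
```

For example, wrapping the four pixels {1, 0, 1, 1} and asking for `LAND_AGRICULTURE` yields 75 percent; the consumer sees only the interface, never the pixel layout.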


2 XML-RPC

The use of Java RMI for the integration of applications in a distributed system requires the implementation of Java wrappers for non-Java applications. This is a tedious task, as implementors have to carefully program data structure conversions between the application and the Java wrapper. This approach incurs overhead because of the duplication of data, creates non-modifiable serialized objects (only syntactic data interoperability is obtained), and requires the sharing of Java (interface) class definitions that cannot be modified without recompilation of the whole system. Web developments have brought an important innovation in data representation for semantic data interoperability: the XML (Extensible Markup Language) standard. As an object representation, XML is a promising universal standard. XML has an important benefit over HTML: it is content-based and allows semantic data interoperability. The presentation of XML in Web browsers can be done automatically through XSL (Extensible Stylesheet Language). XSL is a specification of a translation of an XML object into HTML (or other XML formats). Other protocols and representation formats that can be used for semantic data interoperability are the Multi Protocol (MP) [9], OpenMath [2], and MathML [1]. These protocols can, however, easily be layered on top of XML, with XML serving to encapsulate the data. Another Web innovation is XML-RPC [3], a platform-independent remote procedure call protocol that works over the Internet. XML-RPC uses XML as the marshalling format. An XML-RPC call is an HTTP POST request to execute a procedure on the server, with the procedure name and parameters encoded in XML; the server's reply carries the return value, also encoded in XML. Compared to simple messages for implementing application interoperability, XML-RPC can also handle exceptional behavior. For example, an RPC may cause an exception to be raised on the remote machine. Instead of sending an unintelligible error message back to the caller, the server sends back an XML-encoded exception (a fault response).
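As a minimal sketch of the XML-RPC wire format described above, the following C functions build the body of a call with a single integer parameter and the fault response a server returns when the remote procedure raises an exception. The element names (`methodCall`, `methodName`, `params`, `param`, `value`, `int`, `fault`, `faultCode`, `faultString`) follow the XML-RPC specification [3]; the function names and the method name `examples.square` are hypothetical, and the HTTP POST transport is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build an XML-RPC request calling method `name` with one <int> parameter.
   Returns the snprintf result: the length of the full message, or a
   negative value on an encoding error. */
int xmlrpc_call_int(char *buf, size_t size, const char *name, int arg)
{
    assert(buf != NULL && name != NULL && strlen(name) > 0);
    return snprintf(buf, size,
        "<?xml version=\"1.0\"?>"
        "<methodCall>"
        "<methodName>%s</methodName>"
        "<params><param><value><int>%d</int></value></param></params>"
        "</methodCall>",
        name, arg);
}

/* Build the fault response that reports an exception raised by the remote
   procedure: a <fault> holding a struct with faultCode and faultString. */
int xmlrpc_fault(char *buf, size_t size, int code, const char *message)
{
    assert(buf != NULL && message != NULL);
    return snprintf(buf, size,
        "<?xml version=\"1.0\"?>"
        "<methodResponse><fault><value><struct>"
        "<member><name>faultCode</name><value><int>%d</int></value></member>"
        "<member><name>faultString</name>"
        "<value><string>%s</string></value></member>"
        "</struct></value></fault></methodResponse>",
        code, message);
}
```

A client would send the string produced by `xmlrpc_call_int(buf, sizeof buf, "examples.square", 7)` as the body of an HTTP POST request. The sketch assumes that method names and fault messages contain no characters that need XML escaping.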
XML-RPC is well-suited for integrating disparate systems implemented in different programming languages and running on different platforms. XML-RPC is available for a number of scripting languages and for Java. An XML-RPC API can be implemented in Fortran or C by mapping the symbolic procedure names of XML-RPC calls to actual procedure calls, and by translating XML parameters and return values into the internal data structures of the Fortran/C application.
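The mapping from symbolic XML-RPC method names to C procedures mentioned above can be sketched as a dispatch table. The sketch assumes that the incoming request has already been parsed into a method name and one integer argument; the procedures `square` and `negate`, the `examples.` method names, and `rpc_dispatch` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical server-side procedures exposed through XML-RPC. */
static int square(int x) { return x * x; }
static int negate(int x) { return -x; }

/* One dispatch-table entry: the symbolic XML-RPC method name and the C
   function implementing it (illustrative: all take and return an int). */
struct rpc_method {
    const char *name;
    int (*proc)(int);
};

static const struct rpc_method methods[] = {
    { "examples.square", square },
    { "examples.negate", negate },
};

/* Map the <methodName> of an incoming call to a C procedure and invoke it.
   Returns 0 on success and stores the result, or -1 if the name is
   unknown (a server would then answer with a fault response). */
int rpc_dispatch(const char *name, int arg, int *result)
{
    assert(name != NULL && result != NULL);
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++) {
        if (strcmp(methods[i].name, name) == 0) {
            *result = methods[i].proc(arg);
            return 0;
        }
    }
    return -1;
}
```

A table keeps the name-to-procedure mapping in one place, so a stub generator can emit one entry per exported C function.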

3 Data Representation in XML

XML parsers and generators are available for most programming languages. An application that adopts XML parsers and generators builds an XML document object model (DOM), which is a tree-like data structure that closely resembles the structure of a hierarchical XML object. Internal data structures of the application have to be translated into the DOM. For hierarchical data structures, such as lists and trees, this translation is almost one-to-one. Many Web applications that provide services operate on such hierarchical data structures; consider, for example, Web pages and database records. The translation of these types of objects is simple. However, numerical applications are not necessarily constrained to hierarchical data structures. For example, numerical simulation software operating on irregular grids adopts indexing schemes and multiple levels of indirection on the data structures used for finite difference or finite element approximations. Care has to be taken when representing graph-like data structures in XML. More specifically, pointers can introduce aliases, and an aliased part of the data structure must be translated and presented in XML only once. A producer of XML is responsible for generating an XML representation of a data structure that a consumer can translate into a true copy of the original data structure. A naive use of XML and the DOM could lead to trees with replicated data, because of the hierarchical layout of XML in the DOM. Pointers pose another problem when they are allowed to refer to elements within arrays and records. For example, assume that P is an array of pointers that point to elements of an array A. When P has to be produced in XML, the pointers must be traversed and the elements pointed to must be translated into XML only once, such that an application consuming the XML can make a true copy of P and of the elements to which P points.
Producing copies of both P and A in XML such that an XML consumer can reconstruct true copies of P and A requires that array A be translated into XML first, so that it can be reconstructed by the consuming application. Then P can be output, with each pointer replaced by a reference to the start of A and an offset to the element. The example illustrates the tedious task of implementing data conversions between an application's internal data structures and XML in the presence of indirection (pointers). The problem is exacerbated by the well-known fact that inspecting data structure declarations alone cannot reveal the exact usage of the data structure in question. Consider, for example, the C data structure shown in Fig. 1 (a). Although the declaration suggests that it is a tree (because of the left and right field names), nothing prevents this data structure from being used as a graph in which a node is referred to by more than one left or right pointer. Every pointer must therefore be treated as potentially aliasing.
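The base-plus-offset encoding of pointers into an array can be sketched in C as follows. The producer writes A first and then each P[i] as an element offset from the start of A; the consumer, after rebuilding its own copy of A, turns each offset back into a pointer into that copy. The function names and the tag and attribute names (`array`, `double`, `ref`, `base`, `offset`) are hypothetical, and parsing the XML back into offsets is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Producer side: emit array A first, then each pointer P[i] as a reference
   to the start of A plus an element offset. Every P[i] is assumed to point
   into A[0..n-1]. */
void emit_A_and_P(FILE *out, const double *A, size_t n,
                  double *const *P, size_t m)
{
    fprintf(out, "<array id=\"A\">");
    for (size_t i = 0; i < n; i++)
        fprintf(out, "<double>%g</double>", A[i]);
    fprintf(out, "</array>\n<array id=\"P\">");
    for (size_t i = 0; i < m; i++) {
        ptrdiff_t off = P[i] - A;          /* offset in elements, not bytes */
        assert(off >= 0 && (size_t)off < n);
        fprintf(out, "<ref base=\"A\" offset=\"%td\"/>", off);
    }
    fprintf(out, "</array>\n");
}

/* Consumer side: with its own copy of A (A_copy) already rebuilt from the
   XML, turn each parsed offset back into a pointer into that copy. */
void rebuild_P(double *A_copy, const ptrdiff_t *offsets, size_t m,
               double **P_copy)
{
    for (size_t i = 0; i < m; i++)
        P_copy[i] = A_copy + offsets[i];
}
```

Two entries of P that point to the same element of A receive the same offset, so the aliasing survives in the copy even though the consumer's pointer values differ from the producer's.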


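(a) C declaration: struct Node { int val; struct Node *left; struct Node *right; };
(b) Example tree: root 5 with left child 3 and right child 8; node 8 has left child 7 and right child 9.
(c) The tree in a nested XML format (node values in document order: 5 3 8 7 9).
(d) The tree in XML using an indexing scheme: nodes are defined in the order 3, 7, 9, 8, 5 (children before parents), and pointers are replaced by integer node indices (0 for null).
(e) Part of the tree in XML-RPC struct format: val 5, left (val 3, left null, right null), right (val 8, ...).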

Figure 1: XML Representations

The process of translating arbitrary data structures to XML can be loosely referred to as XML serialization. XML serialization converts graph-like data structures into XML by recursively traversing the graphs, converting nodes to XML, and marking the nodes that have been converted. A mapping from pointers to some indexing scheme in XML can be employed (thereby also enabling XPointers, the XML pointer standard). Fig. 1 (b) depicts an example tree data structure and Fig. 1 (c) depicts the example tree in a possible XML format. No pointer indexing is required for trees. However, static analysis of the source data structure declaration in Fig. 1 (a) cannot limit the use of the data structure to trees, so aliases have to be assumed. Fig. 1 (d) depicts the example tree in XML using an indexing scheme. Each node has a def_struct_Node tag that defines the contents of the node. The nodes are uniquely identified by an integer index; because the children of a node are defined before the node itself (a post-order traversal), the root node is defined last. A pointer to a node is replaced by a ptr_struct_Node tag containing the integer index. Fig. 1 (e) depicts part of the XML-RPC format of the tree structure. The data part of the XML-RPC protocol is restrictive, however, because it does not support graphs: the protocol defines records (structs), arrays, and some primitive data types. A severe restriction of the protocol is the lack of constructs for indirection with pointer types and of constructs for passing methods (procedures) as parameters.
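The marking scheme described above can be sketched as a small self-contained C routine for the struct Node of Fig. 1 (a). It assigns each node an index when the node is first reached, emits a node's children before the node itself, and replaces every pointer by an index (0 for null). It uses a linear-search table rather than the hash tables mentioned in Section 4, and the element names are hypothetical; it illustrates the technique, not the compiler's actual output.

```c
#include <assert.h>
#include <stdio.h>

struct Node { int val; struct Node *left; struct Node *right; };

/* Table of nodes already reached, mapping a node address to its index.
   A linear search keeps the sketch short; a hash table scales better. */
#define MAX_NODES 64
static const struct Node *seen[MAX_NODES];
static int n_seen = 0;

/* Return the index of p (starting at 1), or 0 if p has not been seen. */
static int lookup(const struct Node *p)
{
    for (int i = 0; i < n_seen; i++)
        if (seen[i] == p)
            return i + 1;
    return 0;
}

/* Emit the definition of p, after everything it points to, and return
   p's index; 0 stands for a null pointer. A node reachable through several
   pointers is defined once and afterwards referenced only by its index. */
int emit_node(FILE *out, const struct Node *p)
{
    if (p == NULL)
        return 0;
    int idx = lookup(p);
    if (idx > 0)
        return idx;                      /* alias: already marked */
    assert(n_seen < MAX_NODES);
    seen[n_seen++] = p;                  /* mark before recursing */
    idx = n_seen;
    int l = emit_node(out, p->left);
    int r = emit_node(out, p->right);
    fprintf(out, "<def_struct_Node id=\"%d\"><val>%d</val>"
                 "<left>%d</left><right>%d</right></def_struct_Node>\n",
            idx, p->val, l, r);
    return idx;
}
```

For a node whose left and right pointers refer to the same child, the child is defined once and referenced twice by its index. Because a node is marked before its children are visited, a cycle back to an ancestor terminates, but it yields a reference to an index whose definition appears later in the output, so a consumer must resolve references after reading all definitions.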

4 Problem-Solving Support for XML API Synthesis

We have developed a compiler for XML API synthesis as part of a problem-solving environment. The compiler translates C data structure declarations into XML input and output routines that serialize instances of the data structures. The output routines serialize internal data structures into XML, and the input routines make an internal copy of the data structure on the heap. The copy of data structures with pointers is exact, but the pointer values differ, as the copy is reproduced somewhere on the heap of the consuming application. Fig. 2 depicts part of the automatically generated output (a) and input (b) routines for the data structure in Fig. 1 (a). The compiler supports all C data types except unions. Pointers are constrained to point to single objects (except for character pointers, which are considered strings). We developed two versions of the compiler. One version assumes that pointers point to the start of a data structure and cannot point to separate elements within arrays and records. Most C programs that use pointers in the implementation of standard data structures such as trees, graphs, and matrices do not have pointers to separate elements within arrays and records. This version generates efficient output and input code, because the pointer aliasing checks in the output routines are implemented using hash tables. The XML format and the routines of this version are shown in Fig. 1 and Fig. 2, respectively.

```c
#include <stdio.h>

/* struct Node as declared in Fig. 1 (a). lookup() and enter() query and
   update the pointer table (a hash table); xml_out_int() emits an int.
   These runtime support routines are not shown. */
void xml_out_ptr_struct_Node(int idx);

int xml_def_struct_Node(struct Node *p)
{
  int idx, idx_1, idx_2;
  if (p == NULL)
    return 0;                               /* null pointer is index 0 */
  if ((idx = lookup(p)) >= 0)
    return idx;                             /* already emitted */
  idx = enter(p);                           /* put ptr in table */
  idx_1 = xml_def_struct_Node(p->left);     /* recursively define */
  idx_2 = xml_def_struct_Node(p->right);    /* ... left and right */
  printf("<def_struct_Node id=\"%d\">", idx);
  xml_out_int(p->val);                      /* emit value of Node */
  xml_out_ptr_struct_Node(idx_1);           /* emit pointer to left */
  xml_out_ptr_struct_Node(idx_2);           /* emit pointer to right */
  printf("</def_struct_Node>");
  return idx;
}

void xml_out_ptr_struct_Node(int idx)
{
  printf("<ptr_struct_Node>%d</ptr_struct_Node>", idx);
}
```

(a)

```c
#include <stdio.h>     /* EOF */
#include <strings.h>   /* strcasecmp, strncasecmp (POSIX) */

/* xml_unit() reads the next element into xml_tag, xml_id() returns the
   index of a def_ element, xml_alloc() allocates an object and puts it in
   the pointer table, xml_in_int() and xml_idx() read values, xml_ptr()
   maps an index to a pointer, and xml_end() consumes an end tag. These
   runtime support routines are not shown. */
void xml_in_ptr(void **p);

int xml_in_defs(void)                       /* parses all def tags */
{
  for (;;) {
    if (xml_unit() == EOF)
      return EOF;
    if (strncasecmp(xml_tag, "def_", 4) == 0) {
      if (strcasecmp(xml_tag + 4, "struct_Node") == 0) {
        /* alloc and put ptr in table */
        struct Node *p = (struct Node *)xml_alloc(xml_id(), sizeof(struct Node));
        xml_in_int(&p->val);                /* get val field */
        xml_in_ptr((void **)&p->left);      /* get left pointer */
        xml_in_ptr((void **)&p->right);     /* get right pointer */
      }
      /* else ... parse other objects */
      xml_end();
    }
    else
      return 0;
  }
}

void xml_in_ptr(void **p)                   /* convert index to pointer */
{
  int idx;
  if (strncasecmp(xml_tag, "ptr_", 4) == 0) {
    xml_idx(&idx);                          /* get index from ptr_ tag */
    *p = xml_ptr(idx);                      /* look up object in pointer table */
    xml_end();
  }
}
```

(b)

Figure 2: Automatically Generated XML Output and Input Routines

```c
// module `examples'
...
int draw(struct Node *p) { ... }
```

(a)

(b) XML-RPC call of examples.draw whose single parameter refers to the tree node with index 5 (the serialized data structure is omitted).

```c
int examples_draw(struct Node *p)
{
  int retval;
  int idx_1 = xml_def_struct_Node(p);       /* serialize the tree */
  printf("<methodCall><methodName>examples.draw</methodName>");
  printf("<params>");
  printf("<param><value>");
  xml_out_ptr_struct_Node(idx_1);           /* parameter: reference to root */
  printf("</value></param>");
  printf("</params></methodCall>");
  xml_in_int(&retval);                      /* read the integer reply */
  return retval;
}
```

(c)

Figure 3: Example XML-RPC Call

The second version of the compiler generates output routines that adopt a two-phase serialization scheme, using interval trees to detect overlap in memory between objects. The first phase of the output analyzes the data structures for possible overlap (e.g. an array overlaps an element of that array). This second version of the compiler guarantees that consumers can always reproduce true copies, but the generated output routines are less efficient than the routines of the first version. We are currently extending the compiler to generate XML-RPC routines for C functions. These routines implement RPC for arbitrary C programs by mapping XML-RPC calls to calls of C procedures. Fig. 3 (a) depicts an example C function for drawing a tree data structure with the type struct Node defined in Fig. 1 (a). The XML-RPC remote procedure call of the draw function is shown in Fig. 3 (b), where the data structure is omitted from the figure (an example XML data structure can be found in Fig. 1 (d)). Fig. 3 (c) shows the generated stub routine that implements XML-RPC. A client uses the stub for a remote call to the tree draw routine of the server. After the RPC completes on the server side (not shown), the internal copies of the data structures are removed and an integer reply in XML is sent back to the caller.¹
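The overlap test performed in the first phase can be illustrated with a simpler technique than an interval tree: sort the memory regions of the objects by start address and sweep once, which detects whether any two regions share a byte in O(n log n) time. This sort-and-sweep test is a swapped-in substitute for the interval tree, shown only to make the check concrete; `struct region`, `by_start`, and `regions_overlap` are hypothetical names, and comparing addresses through `uintptr_t` assumes a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A memory region occupied by an object that is about to be serialized. */
struct region {
    uintptr_t start;   /* address of the first byte */
    size_t size;       /* length in bytes */
};

/* qsort comparator: order regions by start address. */
static int by_start(const void *a, const void *b)
{
    uintptr_t x = ((const struct region *)a)->start;
    uintptr_t y = ((const struct region *)b)->start;
    return (x > y) - (x < y);
}

/* Return 1 if any two regions share a byte (e.g. an array and a pointer
   to one of its elements), else 0. The array is reordered in place. */
int regions_overlap(struct region *r, size_t n)
{
    if (n < 2)
        return 0;
    assert(r != NULL);
    qsort(r, n, sizeof *r, by_start);
    uintptr_t end = r[0].start + r[0].size;   /* furthest end so far */
    for (size_t i = 1; i < n; i++) {
        if (r[i].start < end)
            return 1;
        if (r[i].start + r[i].size > end)
            end = r[i].start + r[i].size;
    }
    return 0;
}
```

An interval tree additionally answers which objects overlap a given one, which a serializer needs in order to encode an interior pointer as a base plus an offset; the sweep above only reports whether some overlap exists.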

¹ The example stub does not show XML-RPC's exception handling capabilities.

5 Locating XML-RPC Services

In contrast to most Web searches for non-scientific information, a search for a scientific computing service is largely determined by matching the data structure types and the algorithmic functionality provided by the service. Internet technology for locating, indexing, and retrieving static text-based documents is well developed, although certainly not ideal when applied in this area. Object request brokers, as defined by CORBA for example, can in principle broker sessions between distributed components, but their matching mechanisms are based on ontologies and keyword constructs that appear to be fragile when scaled to large heterogeneous networks [15]. The functionality of a computational service can be described in terms of its mathematical semantics, which can be expressed in a lambda-calculus-like language (e.g. using OpenMath). However, some types of services cannot easily be described by mathematical constructs alone. Some types of services can only be distinguished by using some universally understood reference (e.g., a satellite image provider). This reference is provided by the XML-RPC APIs that identify the service. For example, the satellite image provider can be referred to simply by the object satellite, with associated XML-RPC methods for retrieving satellite images. The advantage of using a form of lambda calculus for semantic content is the derivation of normal forms that guarantee uniqueness of representation (up to renaming of bound variables). For example, suppose the XML-RPC method $y = \mathrm{solve}(M, x)$ implements a solver for the matrix equation $x = My$. The semantics of this XML-RPC method can be defined by $\lambda(M, x).\,(M^{-1} x)$. In this way, computational services can be indexed and matched based on semantic content. A search query for a solver can be provided as the lambda abstraction $\lambda(A, b).\,(A^{-1} b)$, which is alpha-equivalent to $\lambda(M, x).\,(M^{-1} x)$. In addition, the data types of the method parameters can be defined as document type definitions (DTDs). A DTD standardizes the format of an XML object: it enforces rules on tag naming, attributes, and the overall hierarchy of the XML object representation. A DTD for matrix M can thus be considered the data type of an XML-RPC parameter.
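Matching a query against an indexed service up to renaming of bound variables can be sketched for the flat abstractions of the solver example, $\lambda(M, x).\,(M^{-1} x)$ and $\lambda(A, b).\,(A^{-1} b)$. The sketch compares bodies token by token, treating a bound variable by its parameter position (in the spirit of de Bruijn indices) and every other token literally. The token encoding, including the symbol `inv` for matrix inversion, is a hypothetical simplification: it handles neither nested binders nor reduction to normal form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A flat lambda abstraction \(p0, ..., pk).body whose body is a token
   sequence, e.g. {"inv", "(", "M", ")", "*", "x"} for M^{-1} x.
   Hypothetical encoding without nested binders. */
struct lambda {
    const char **params;
    size_t n_params;
    const char **body;
    size_t n_body;
};

/* Position of token t in the parameter list, or -1 if t is not bound. */
static int param_index(const struct lambda *f, const char *t)
{
    for (size_t i = 0; i < f->n_params; i++)
        if (strcmp(f->params[i], t) == 0)
            return (int)i;
    return -1;
}

/* Alpha-equivalence for flat abstractions: same arity, same body length,
   bound variables compared by parameter position, all other tokens
   compared literally. Returns 1 if equivalent, else 0. */
int alpha_equivalent(const struct lambda *f, const struct lambda *g)
{
    assert(f != NULL && g != NULL);
    if (f->n_params != g->n_params || f->n_body != g->n_body)
        return 0;
    for (size_t i = 0; i < f->n_body; i++) {
        int a = param_index(f, f->body[i]);
        int b = param_index(g, g->body[i]);
        if (a != b)
            return 0;
        if (a < 0 && strcmp(f->body[i], g->body[i]) != 0)
            return 0;
    }
    return 1;
}
```

With parameters `M, x` and `A, b` and bodies `inv ( M ) * x` and `inv ( A ) * b`, the two abstractions compare equal; swapping the roles of the parameters, as in `inv ( b ) * A`, does not.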

6 Connecting XML-RPC Services

We are currently investigating and experimenting with the use of mobile agents for connecting services in a component system. Agents are software systems that can act autonomously and at high "semantic levels". These high semantic levels suggest human-like functionality in information processing tasks. Example mobile agent systems are D'Agents [15, 5] and Java-to-go [12]. A mobile agent is given the freedom to move to, and perform a variety of active computations at, one or more remote agent servers, where an agent server is a networked process that provides the resources an agent needs to complete its goal. Using mobile agents as object request brokers (ORBs) offers three important advantages over existing ORB systems:

• Mobile agents can move to a large data resource as an alternative to moving the large data sets to a client [15]. The mobile agent performs a remote computation on the data and sends back only the relevant data products.
• Mobile agents are dynamic and can make decisions based on the properties of the current environment. For example, among the servers that offer the same application, an agent can decide to connect to those with lower process loads. This enhances scalability.
• Mobile agents can set up secure sessions between remote applications. Once the applications are activated by an agent, (encrypted) XML-RPC can be used between the active applications until the session ends.

Mobile agents must be small programs in order to be effective for data transport. Since mobile agents for ORB have very specific tasks to perform, their code size is limited. In our approach, they do not have to carry global network information or information on the location of services. This type of information is maintained by proxy servers (part of the agent servers) that are assumed to reside on each machine an agent is able to move to. Proxy servers maintain up-to-date information on networks and available XML-RPC applications. A data proxy server takes care of forwarding open I/O connections when agents move. A functional proxy server keeps a local copy of a registry of available services on the network by registering XML-RPC methods and their semantic descriptions in a local registry.
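The load-based server choice that a functional proxy server's registry makes possible can be sketched as a lookup over registry entries. The entry layout (method name, host, load figure), the function `pick_server`, and the method and host names are hypothetical; the semantic descriptions stored in the registry are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry in a functional proxy server's local registry: an XML-RPC
   method, the host that offers it, and that host's current process load.
   Hypothetical simplification: no semantic description is stored. */
struct service {
    const char *method;
    const char *host;
    double load;
};

/* Among the registered hosts offering `method`, return the entry with the
   lowest load, or NULL if no host offers the method. */
const struct service *pick_server(const struct service *reg, size_t n,
                                  const char *method)
{
    assert(method != NULL);
    const struct service *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(reg[i].method, method) != 0)
            continue;
        if (best == NULL || reg[i].load < best->load)
            best = &reg[i];
    }
    return best;
}
```

An agent consulting the local registry in this way needs no global knowledge of the network: the proxy server holds the data, and the agent carries only the decision rule.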

7 Example XML-RPC Agent Scenario

The objective of this example scenario is to predict the impact of the pollution of a river on the environment. The scenario is as follows: first, the geometric data of the river location is obtained from a geographical information system (GIS). Then a simulator uses the geometric information and the sources of the pollution to create simulation results. Finally, the simulation results are visualized, and the results are updated when new information on the pollution levels becomes available. The movements of the agent and the data exchanged by the applications involved in the example are shown in Fig. 4. A component-based problem-solving environment (PSE) on machine A helps a user to define the scenario for the pollution modeling problem. The PSE on machine A translates the scenario into a sequence of XML-RPC calls. From machine A, the PSE sends out an agent to connect to services (Fig. 4 (a)). The agent engages the functional proxy server on machine A to find a server that can supply the geometry for the specified area adapted to river streams. In this case the agent moves to machine B and sets up a session between the applications, where the session consists of XML-RPC calls to retrieve the river geometry and current pollution levels from the geographic information system on machine B.

[Figure 4, panels (a)-(c): machines A (Component PSE, Visualization), B (GIS), and C (Simulator); each application exposes an API and each machine runs a proxy server. The agent is sent from machine A, moves between machines, and docks with the applications, which exchange data.]

Figure 4: Mobile Agent ORB in a Pollution Modeling Problem

The next stage (Fig. 4 (b)) is to establish a simulation of the pollution by a transport model. The functional proxy server of machine B directs the agent to machine C, where it can find the requested service. The data proxy servers maintain the communication channel with the GIS for XML-RPC. In this way, updated data from the GIS can be retrieved by the simulator. After docking with the simulator on machine C, the agent looks for a visualization tool on machine A (as specified in the problem description). Machine A is used because it is more efficient to visualize data on the user's machine, where the visualization software (and hardware) runs. Again, the functional proxy servers forward the agent to machine A and maintain the information streams between machines A, B, and C (Fig. 4 (c)).

8 Conclusions

Our XML serialization and XML-RPC stub generation work well to achieve effective data interoperability between disparate applications. Semantic data interoperability, however, may require the modification of data to meet specific constraints of an application that handles the data. We are currently investigating methods to automatically (re)map XML into modified forms given a specification of constraints.

References

[1] MathML. The World-Wide Web Consortium. Available from http://www.w3.org/Math.
[2] OpenMath. The OpenMath Consortium. Available from http://www.openmath.org.
[3] XML-RPC (XML remote procedure call). http://www.xmlrpc.com.
[4] "Research council says US climate models can't keep up" (news article). Science, 283(5403):766, 1999.
[5] G. Cybenko, R. S. Gray, D. Kotz, and D. Rus. D'Agents: Security in a multiple-language, mobile-agent system. Mobile Agent Security, Lecture Notes in Computer Science, 1998.
[6] I. Foster and C. Kesselman. Globus: A metacomputing infrastructure toolkit. International Jnl. of Supercomputer Applications, 11(2):115-128, 1997.
[7] I. Foster and C. Kesselman. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, San Francisco, 1998.
[8] D. Gannon, R. Bramley, T. Stuckey, J. Villacis, J. Balasubramanian, E. Akman, F. Breg, S. Diwan, and M. Govindaraju. Developing component architectures for distributed scientific problem solving. IEEE Computational Science & Engineering, 1998.
[9] S. Gray, N. Kajler, and P. S. Wang. Design and Implementation of MP, a Protocol for Efficient Exchange of Mathematical Expressions. Jnl. of Symbolic Computing, 1997.
[10] Mike Higgs and Bruce Cottman. Solving the data interoperability problem using a universal data access broker. I-Kinetics, Inc., http://www.componentware.com.
[11] ObjectSpace Inc. Java Voyager, 1998. Available from http://www.objectspace.com/products/vgrOverview.htm.
[12] Weiyi Li and David G. Messerschmitt. Java-to-go: Itinerative computing using Java. Department of Electrical Engineering and Computer Sciences, University of California at Berkeley. http://ptolemy.eecs.berkeley.edu/dgm/javatools/java-to-go/.
[13] NASA Project. Information Power Grid (IPG), 1998. Available from http://www.nas.nasa.gov/IPG/.
[14] Clemens Szyperski. Component Software: Beyond Object-Oriented Programming. Addison-Wesley, 1998.
[15] Linda F. Wilson, George Cybenko, and Daniel Burroughs. Mobile agents for distributed simulation. Technical report, Thayer School of Engineering, Dartmouth College, 1998.