Internet Computing, IEEE
Middleware Track

Editors: Doug Lea • dl@cs.oswego.edu • Steve Vinoski • vinoski@ieee.org

Dermi: A New Distributed Hash Table-Based Middleware Framework

Decentralized Event Remote Method Invocation (Dermi) is a peer-to-peer (P2P), decentralized event-based object middleware framework built on top of a structured overlay network. Using an event-notification service as the principal building block, Dermi makes three innovative contributions: P2P call abstractions, distributed interception, and a decentralized object-location service. The authors describe how to use these three pillars to build a wide variety of global-scale distributed applications and argue that Dermi is a solid foundational technology for future wide-area distributed component infrastructures.

Carles Pairot Gavaldà and Pedro García López, Universitat Rovira i Virgili, Spain
Antonio F. Gómez Skarmeta, Universidad de Murcia, Spain

MAY • JUNE 2004

In creating global-scale Internet-based distributed applications, developers repeatedly face the same implementation issues, including object location, replication, mobility, and caching. Middleware plays a key role in addressing these challenges by providing a common higher-level interface for application programmers and hiding the complexity of myriad underlying networks and platforms. Middleware systems have a long tradition in centralized client–server local-area and metropolitan-area network environments, but very few global-scale middleware solutions exist. In wide-area scenarios, P2P networks gradually emerged as an alternative to traditional client–server systems for some application domains. P2P networks are exemplified by Gnutella-like systems, in which all participants join in an anarchical way and messages flood the network (normally, messages travel only a few hops from their origin, making the system rather inefficient when searching for specific resources). Even using “random walkers” or expanding-ring techniques, which incrementally broaden the search space but generate more traffic if resources are far away, fails to elegantly solve the resource-location problem because it’s impossible to know how long it will take to find a given resource — if at all.

The past two or three years have seen a revolution in the P2P research community with the introduction of structured P2P overlay networks, which offer an efficient, scalable, fault-resilient, and self-organizing substrate for building distributed applications. Inspired by several applications that have emerged as a result of these structured P2P substrates — including wide-area storage systems such as Cooperative File System (CFS) and Past,1 event notification services such as Bayeux and Scribe,2 and even collaborative spam-filtering systems like SpamWatch — we developed the Decentralized Event Remote Method Invocation (Dermi) system. A completely decentralized event-based object middleware built on top of a structured P2P overlay, Dermi’s primary objective is to provide developers the necessary abstractions to develop wide-area-scale distributed applications. It uses a P2P publish–subscribe event system and offers several services to the application layer: P2P call abstractions, a decentralized way to locate objects, and a distributed interception service. In this article, we describe Dermi’s architecture and services, provide some empirical results derived from experiments and simulations, and discuss possible uses for the system.

Published by the IEEE Computer Society
1089-7801/04/$20.00 © 2004 IEEE

Dermi Architecture

Structured P2P overlays deliver interesting services, such as distributed hash tables (DHTs), decentralized object-location and routing facilities, and scalable group multicast–anycast, providing upper-level applications with an abstraction layer to access these services transparently. For example, the DHT abstraction provides the same functionality as a hash table, associating key–value mappings with physical network nodes rather than hash buckets as in traditional hash tables. The standard put(key, value) and get(key) interface is the entry point for any application using the DHT.

Dermi is built on top of a decentralized key-based-routing (KBR) P2P overlay network. It benefits from the underlying services provided by the P2P layer (see Figure 1), including group multicast and anycast (Cast), the DHT abstraction, and a decentralized object location and routing (DOLR) layer. Moreover, Dermi uses the Past object-replication and caching system.1 Our system models method calls as events and subscriptions using the API provided by the Cast abstraction (which models a wide-area event service). A prototype implementation of Dermi is currently available at our Web site (http://ants.etse.urv.es/DERMI).

After analyzing several existing P2P overlay substrates (Chord,3 Tapestry,4 and Pastry5), we selected Pastry, which provides efficient routing because of its keen awareness of underlying network topologies. We used Scribe,2 a large-scale decentralized application-level multicast infrastructure built on top of Pastry, as our publish–subscribe message-oriented middleware. Scribe provides a more efficient group-joining mechanism than other existing solutions, and it also includes multisource support. The availability of an open-source implementation of Pastry and Scribe — FreePastry, developed in Java at Rice University (http://freepastry.rice.edu) — also simplified the choice. However, we could have used any other P2P DHT-based overlay network, because they share the same basic functionalities. In fact, designers of the principal DHTs in operation (Chord, Tapestry, and Pastry) already considered a proposal for all of them to follow a common API.6

Figure 1. Dermi architecture. This system uses abstractions built on a key-based-routing (KBR) substrate as its main building block: Dermi (tier 2) sits atop object replication and caching (Past), the distributed hash table (DHT), group multicast and anycast (Cast, implemented by Scribe), and decentralized object location and routing (DOLR) at tier 1, which in turn rest on the KBR layer at tier 0.

Dermi was strongly inspired by the Java remote method invocation (RMI) object middleware, which lets developers create distributed Java-to-Java applications in which remote Java object methods can be invoked from other Java virtual machines. It also uses object serialization to marshal and unmarshal parameters and does not truncate types, thus supporting true object-oriented polymorphism. Following such a model, Dermi provides a dermi.Remote interface, a dermi.RemoteException class, and a dermi.Naming class to locate objects in our decentralized registry. Our system includes dermic, a tool that automatically generates stubs and skeletons for our remote objects. Together, these transparently manage object publications and subscriptions and their inherent notifications. Further, Dermi currently provides many other features found in Java RMI, such as remote exception handling, pass by value and by reference, and dynamic class loading.
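To make the put/get entry point concrete, here is a minimal, hypothetical Java sketch (not Dermi's or FreePastry's actual API): a human-readable name is hashed with SHA-1 into a 160-bit key, and the overlay's distributed key–value store is stood in for by a single local map.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a DHT-style interface: keys are SHA-1 hashes of
// application-level names; put/get is the whole entry point.
public class DhtSketch {
    private final Map<String, Object> store = new HashMap<>();

    // Derive a 160-bit DHT key (as 40 hex characters) from a name.
    public static String keyFor(String name) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(name.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // In a real overlay, put/get route O(log n) hops to the node whose
    // identifier is closest to the key; here one map stands in for the network.
    public void put(String key, Object value) { store.put(key, value); }
    public Object get(String key) { return store.get(key); }
}
```

The keyFor derivation also mirrors how Dermi's decentralized registry, described later, maps URI-style object names onto the DHT.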

www.computer.org/internet/


Figure 2. Dermi’s P2P call abstractions. (a) One-to-one calls involve only two entities, server and client, and can be direct or hopped, each either synchronous or asynchronous. (b) One-to-many calls involve many entities (multiple servers and one client, or vice versa) through the multicall (synchronous or asynchronous), anycall, and manycall (sequential or parallel) abstractions.

Dermi also includes a communication layer between the stubs and skeletons — an important difference from RMI. In conventional RMI, a TCP socket is established between the caller (stub) and the callee (skeleton). Dermi stubs and skeletons use the underlying event service by making subscriptions and sending notifications to communicate the method calls and their results.

Dermi Services

Using the decoupled nature of the underlying event infrastructure, we created several innovative services that Dermi provides to the application layer. Though Dermi provides object mobility, replication, caching, and discovery services, we will concentrate on the most novel ones: P2P call abstractions, decentralized object location, and distributed interception.

P2P Call Abstractions

Figure 2 shows all of Dermi’s call abstractions. We divided them into two groups: one-to-one and one-to-many.

One-to-one calls. One-to-one calls can be synchronous or asynchronous, depending on whether a client wishes to block its execution until a result returns. One-to-one calls do not use the event service, which fits more effectively into one-to-many calls. In one-to-one direct calls, an object client (stub) sends a message directly to an object server (skeleton). To accomplish this, we use the server’s NodeHandle, an object that represents the server’s address and port number. Thus, we achieve direct peer communication between both end objects.


The results are returned the same way, producing a very efficient call that involves only two hops: one for the call and one for the returned results. Dermi’s current implementation fully supports direct synchronous calls, and we are currently working on support for asynchronous calls. One-to-one direct calls present several challenges because they aren’t tolerant to failures: when the server on which we wish to invoke methods goes down, it ceases to serve our requests. We solve this problem using NodeIds instead of NodeHandles, but this approach incurs additional overhead because a message routed to any given object might have to move through O(log n) (where O is order and n is the total number of nodes in the system) nodes before reaching its destination. This philosophy is in opposition to that of direct calls, in which a message moves directly from source to destination. Using the overlay network’s key-based routing capabilities is the foundation for what we call one-to-one hopped calls. The advantage of using the NodeId to route messages to the server is that we can use any existing replication mechanism, thus providing some failure tolerance. When the server we are using goes down, the message would automatically route to another server from the replica group, in a process transparent to the client, which continues to use the same NodeId to route messages. Past achieves this functionality with ease. Hopped calls are not as efficient as direct calls, but they provide some fault tolerance. They are under development and are not available in the current Dermi version. One-to-many calls. We modeled one-to-many calls using the overlay’s event service (in this case, Scribe) by means of notifications. We use only the application-level multicast layer in these calls. The multicall abstraction is a remote invocation from one client to many servers or from one server to many clients (for example, to propagate state information). 
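The failover behavior of hopped calls can be sketched with a toy routing function (hypothetical, not Pastry's implementation): the client always routes to the same numeric key, and whichever live node is numerically closest receives the call, so a failed server is replaced transparently.

```java
import java.util.Set;

// Sketch of why hopped calls tolerate failures: the client routes to a
// NodeId (a key), and the overlay delivers the message to whichever live
// node is numerically closest, so a failed server is bypassed without the
// client noticing.
public class HoppedCallSketch {
    public static int routeTo(int nodeId, Set<Integer> liveNodes) {
        int best = -1;
        int bestDist = Integer.MAX_VALUE;
        for (int n : liveNodes) {
            int d = Math.abs(n - nodeId); // numeric distance in ID space
            if (d < bestDist) { bestDist = d; best = n; }
        }
        return best;
    }
}
```

In the real system a message makes O(log n) hops to reach that closest node, rather than one table scan, and Past keeps the replicas consistent.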
Multicalls can be synchronous or asynchronous and are modeled as one-to-many notifications. All clients subscribe to the same topic, whose identifier is hash(objectUID + methodID), and the object server publishes events matching that subscription. As client numbers increase, this approach scales better than maintaining point-to-point connections to every interested client. It also achieves transparency between clients and services: clients don’t need to know the locations of all servers that provide a service.
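A minimal sketch of this topic mapping (hypothetical names; Dermi's real implementation rides on Scribe rather than an in-memory map): the topic identifier is derived from the object and method identifiers, clients subscribe once, and a single publish reaches every subscriber.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Sketch of how a multicall maps onto publish-subscribe: the topic is
// derived from the object and method identifiers, interested clients
// subscribe, and the publisher fires a single notification.
public class MulticallSketch {
    private final Map<String, List<Consumer<Object>>> topics = new HashMap<>();

    // Stand-in for hash(objectUID + methodID); any stable digest would do.
    public static String topicFor(String objectUID, String methodID) {
        return Integer.toHexString((objectUID + methodID).hashCode());
    }

    public void subscribe(String topic, Consumer<Object> client) {
        topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(client);
    }

    // One publish reaches all subscribers: one-to-many with a single send.
    public void multicall(String topic, Object args) {
        for (Consumer<Object> c : topics.getOrDefault(topic, List.of())) {
            c.accept(args);
        }
    }
}
```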


When we designed our system, we wanted to stay close to the chosen programming language. Thus, our dermic tool generates stubs and skeletons using the same naming notations as Java. The generated stub code creates the appropriate subscription, decoupling the object server from clients.

Anycall is a new form of remote procedure call that benefits from network locality. We take advantage of Scribe’s efficient anycast primitive2 to create a call to the objects that belong to the same multicast group (object replicas that can provide us with a service, for example). The anycall client is insensitive to which group object provides the data; it only wants its request to be served. The idea is to iterate the multicast tree, starting from the closest member in the network. Once a member of the tree is found to satisfy the condition, it returns an affirmative result. If no group member is found to satisfy the anycall condition, a remote exception is returned to the caller. Dermi implements synchronous anycall, which blocks the client until the result returns.

To illustrate the behavior of the anycall abstraction, consider how we might implement a CPU-intensive application like SETI@Home (http://setiathome.ssl.berkeley.edu/) or the United Devices Cancer Research Project (www.grid.org/projects/cancer/) using Dermi. These applications retrieve data units from servers, analyze them on home or office PCs, and return the results to the servers. Our anycall abstraction could provide a simple alternative for the data-unit retrieval process. Imagine, for instance, that we have several servers with available data units (see Figure 3). We could create a multicast group under the topic AVAIL_DATA_UNITS, whose identifier would equal hash(“AVAIL_DATA_UNITS”).
When a client node wanted to get a data unit, it would execute DataUnit du = anycall(“AVAIL_DATA_UNITS”, getDataUnit) to trigger an anycast message to the group; in response, the nearest group member would check whether it had any data units available. If true, the group member would return the data unit to the client and the anycast message would route no further. If false, the anycast message would route to another group member, and so on, until a data unit was found or the message reached the root, which would mean that none of the group members had available data units. This result would throw a dermi.RemoteException back to the client to provide proper notification.

Figure 3. Anycall example. Client C anycalls to the AVAIL_DATA_UNITS group, reaching n2 first, which has no available data units to serve. The multicast tree is iterated (n4 → n3) until n3 finally returns a data unit.

Figure 4 shows the API used for representing anycalls. To mark a method as an anycall procedure, we must add the prefix any to its method name, along with its condition method. The method that returns the data unit (anyGetDataUnit) will be called if and only if the condition method (anyGetDataUnitCondition) returns true. Otherwise, the message is routed to another group member.

    public interface SimpleAnycall extends dermi.ERemote {
        public Object anyGetDataUnit() throws dermi.RemoteException;
        public boolean anyGetDataUnitCondition() throws dermi.RemoteException;
    }

Figure 4. Generated stub anycall functions. The dermic tool generates these methods for our anycall data unit example. The condition method (anyGetDataUnitCondition) checks whether the object has any data units available.

A manycall is a variation of the anycall abstraction. It sends a manycast message to several group members, continuing to route until it finds enough members to satisfy a global condition. Similar to anycall, when an object (in the multicast tree) receives a manycall message, it first checks whether the object satisfies a local condition and, subsequently, checks whether a global condition (passed along with the message) is met. The manycall is successful when the global condition is met.
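Under these semantics, anycall and manycall reduce to condition chaining over a proximity-ordered member list. The sketch below is illustrative only (the real system walks the Scribe multicast tree rather than a list): anycall returns the first member whose condition holds, and manycall accumulates members whose local condition holds until a global condition is satisfied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Sketch of anycall/manycall routing over an already proximity-ordered
// member list (hypothetical helper, not Dermi's generated-stub API).
public class CallSketch {
    // Anycall: the first member satisfying the condition serves the request.
    public static <T> Optional<T> anycall(List<T> membersByProximity,
                                          Predicate<T> condition) {
        for (T m : membersByProximity) {
            if (condition.test(m)) return Optional.of(m);
        }
        return Optional.empty(); // maps to dermi.RemoteException in Dermi
    }

    // Manycall: collect members whose local condition holds until a global
    // condition (e.g., "x votes reached") is satisfied.
    public static <T> List<T> manycall(List<T> membersByProximity,
                                       Predicate<T> localCondition,
                                       Predicate<List<T>> globalCondition) {
        List<T> accepted = new ArrayList<>();
        for (T m : membersByProximity) {
            if (localCondition.test(m)) accepted.add(m);
            if (globalCondition.test(accepted)) return accepted;
        }
        return List.of(); // global condition never met: unfavorable result
    }
}
```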


To better understand the manycall abstraction, imagine a massive online voting scenario in which we need a minimum of x votes to do a certain job. We could send a manycall to the group so that each member could vote yes or no, according to its local condition (to approve the execution of a certain simulation, for example). After checking this local condition (voting yes or no), the object would check the global condition (have x votes been reached?). If true, the voting process would conclude successfully, communicating the result to the manycall initiator. If the global condition weren’t reached (the minimum number of votes x was not attained after iterating through the whole multicast server tree), it would pass the unfavorable result to the client.

Decentralized Object Location

A scalable, stable, and fault-tolerant decentralized object-location service is needed to locate object references in wide-area environments such as Dermi. We can’t rely on a centralized naming service that could become a bottleneck for such a common task. We use our P2P overlay network substrate’s DHT facilities to build our object-location service. Unstructured P2P networks, such as those based on Gnutella-like protocols, use flooding techniques to locate resources, but such techniques don’t guarantee resource location in a deterministic manner. By using a DHT-based approach to build our object-location service, we guarantee that a resource stored on the network will be found in at most O(log n) network hops — a stark contrast with the indeterminism of unstructured P2P overlays. This technique is not as efficient as a naming-server hierarchy, like Globe’s,7 which typically solves a query in two hops or less. In our solution, hop numbers increase as network size increases. Nevertheless, our decentralized object-location service remains embedded in the system in a natural way, so that we don’t need to rely on external services when doing object lookups and insertions.
Our P2P location service stores object-location information that can be used to find objects via human-readable names. As in other wide-area location services,7 our object names don’t contain any embedded location information, decoupling an object’s current location from its name. That is, an object’s name is independent of its location. We adopted a uniform resource identifier (URI)-style naming convention for objects (for example, p2p://es/urv/etse/deim/Simple).7 Although we permit URI hierarchies that uniquely represent our objects, we use a secure hash algorithm (SHA-1) to hash this key and insert it into the DHT. Our decentralized location service handles duplicates as well, throwing an exception in case someone wishes to rebind an already bound object without unbinding it beforehand.

Distributed Interception

Distributed interception lets us apply connection-oriented programming concepts in a distributed setting. With this service, we can reconnect and locate type-compatible interceptors at runtime in a distributed application. We extended Scribe’s classes to natively support this feature. Thus, we do not need to change the interceptor skeleton and intercepted remote-object subscriptions each time an interceptor is added or removed. We believe distributed interception can be a very useful mechanism in dynamic aspect-oriented programming (AOP) environments.

Our interceptor implementation takes advantage of the fact that all events sent to a multicast group in Scribe first route to the group’s rendezvous point. Each group’s rendezvous point contains a list of pointers to the interceptor objects, which is updated every time an interceptor is added or removed. As a consequence, each time an event is sent to a multicast group, the notification arrives first at its rendezvous point, which checks whether it has interceptors. If there are no interceptors, the rendezvous node sends the event to the multicast group as usual. Otherwise, the event passes sequentially through all the interceptors, which might transform it into a different event, changing its information. Finally, the event is routed back to the rendezvous point, which, in turn, sends the intercepted event to the group members.

We need a fault-tolerance mechanism in case the rendezvous point changes because of the addition or removal of network nodes. Fortunately, Scribe provides callbacks that notify us about root-node modifications.
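The rendezvous-point flow described above can be sketched as a pipeline (hypothetical classes; Dermi actually extends Scribe's classes): each published event passes sequentially through the registered interceptors, which may transform it, before being multicast to the group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch of rendezvous-point interception: every published event first
// passes through the registered interceptors (which may transform it)
// before the rendezvous node multicasts it to the group.
public class RendezvousSketch {
    private final List<UnaryOperator<String>> interceptors = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();

    public void addInterceptor(UnaryOperator<String> i) { interceptors.add(i); }

    public void publish(String event) {
        // With no interceptors, the event is multicast unchanged.
        for (UnaryOperator<String> i : interceptors) event = i.apply(event);
        delivered.add(event); // stands in for multicast to all group members
    }

    public List<String> delivered() { return delivered; }
}
```

Adding or removing an interceptor touches only the rendezvous point's list; client subscriptions stay untouched, which is the point of the design.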
The simplest approach would be to move all interceptor data from the old root to the new one, but this won’t work if the root node fails. In this case, we must have all interceptor data replicated among k nodes nearest the rendezvous point. To accomplish this, we use Past. Distributed interception is difficult to implement in strongly coupled object systems, in which clients and servers must be notified of object changes. When a TCP connection is established among many clients and an object server, the insertion of a remote interceptor implies that all clients should reconnect to the new interceptor and


bind it to the remote server. Our solution does not affect client connections, which are represented as invariant subscriptions.

Churn and Failure Handling

DHTs are an emerging technology and still undergoing research. One research area is churn, the continuous process of node arrival and departure. Researchers have demonstrated that existing DHT implementations could break down at churn levels observed in deployed P2P systems,8 contrary to simulation-based results. Even though this is a hot research topic, we find the Bamboo approach very promising. Bamboo (www.bamboo-dht.org) is a new DHT that more easily accommodates large membership changes in the structure as well as continuous churn in membership. Bamboo’s authors say that it handles high levels of churn and achieves lookup performance comparable to Pastry’s in simulated networks without churn.

Dermi partially addresses churn by controlling rendezvous-point changes. The root (or rendezvous point) of a Scribe multicast group is chosen to be the node whose identifier is closest to the group’s topic identifier. When new nodes join or leave our DHT, another node’s identifier might become closer to the group’s topic identifier than the previous root’s. This means that every time a message is sent to the group, it will go to the new root rather than the old one, which can cause some events to be lost while a rendezvous-point change is in progress. Scribe notifies Dermi about these root changes via upcalls, and a buffering algorithm forwards lost events to the new root.

Dermi handles server failures in several ways. One is via the anycall abstraction. Consider, for example, an environment in which several servers offer the same service. When clients issue an anycall to this server group using Scribe’s anycast primitive, each client is directed toward its closest server replica. If any of these servers were to fail, however, the client would continue to be served, but by another server in the group.
Thus, the only visible effect at the client side would be a slightly longer response time because it would no longer be served by its closest server.

Another way to handle server failures is via replication mechanisms. With our decentralized location service, we must handle any possible node failures that can affect it. If a node that contains an object’s location information fails, that object’s lookups will fail as well. To solve this, we use data-replication mechanisms, such as those provided transparently by a persistent and fault-tolerant storage management system like Past. When an object handle is to be inserted, Dermi replicates its data among the k nodes nearest the target node. When a target node fails, the object’s handle can be recovered from any of its k nearest nodes.
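A toy model of this k-way replication (illustrative only; Dermi delegates the real work to Past): inserting a handle stores it on the k node identifiers nearest the key, and a lookup walks surviving nodes nearest-first, so the handle outlives the failure of any single replica holder.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch of k-way handle replication: a handle is stored on the k node IDs
// numerically nearest its key, so a lookup can recover it from a surviving
// replica when the closest node fails.
public class ReplicaSketch {
    private final Map<Integer, Map<Integer, String>> nodes = new HashMap<>();

    public ReplicaSketch(Collection<Integer> nodeIds) {
        for (int id : nodeIds) nodes.put(id, new HashMap<>());
    }

    private List<Integer> nearest(int key, int k) {
        List<Integer> ids = new ArrayList<>(nodes.keySet());
        ids.sort(Comparator.comparingInt(id -> Math.abs(id - key)));
        return ids.subList(0, Math.min(k, ids.size()));
    }

    // Insert: replicate the handle on the k nodes nearest the key.
    public void insert(int key, String handle, int k) {
        for (int id : nearest(key, k)) nodes.get(id).put(key, handle);
    }

    public void fail(int nodeId) { nodes.remove(nodeId); }

    // Lookup: ask surviving nodes, nearest first.
    public Optional<String> lookup(int key) {
        for (int id : nearest(key, nodes.size())) {
            String h = nodes.get(id).get(key);
            if (h != null) return Optional.of(h);
        }
        return Optional.empty();
    }
}
```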

Validation

We validated Dermi’s approach using experimental measurements and simulations.

Experimental Measurements

We conducted several experiments to measure Dermi’s viability using the PlanetLab testbed (www.planet-lab.org). PlanetLab is a globally distributed platform for developing, deploying, and accessing planetary-scale network services. Any application deployed on it can experience real Internet behavior, including latency and bandwidth unpredictability.

One of the things we measured was Dermi’s call latency (how long it takes to perform a call). We conducted the experiments using 20 nodes from the PlanetLab network, located in a wide variety of geographical locations, including China, Italy, France, Spain, Russia, Denmark, the UK, and the US. We repeatedly ran the tests at different times of day to minimize the effect of sporadic node congestion and failures. Before each test, we estimated the average latency between nodes to gauge how much overhead the middleware calls incurred. Table 1 shows the tests’ median values in milliseconds (ms).

The first test used one-to-one direct synchronous calls, which are achieved by establishing direct P2P communication between two objects. Each test used 300 random invocations (getter–setter methods) for each pair of object nodes. As expected, this kind of invocation is the most efficient. The normalized incurred overhead is 1.27 (average call time/average latency).

Next, we tested one-to-many synchronous multicalls using a group of 10 servers and a client invoking 300 setter methods on all of them. Because it’s a synchronous test, the client remains blocked until all servers return from the invocation. Results show an average 463-ms call invocation. To better compare the first two test results, we conducted the same test invoking each server synchronously and sequentially (the client calls each server in sequence).
As we expected, performance degraded (1,536 ms), thus demonstrating multicall’s ability to achieve one-to-many calls using the event service. On average (and in this case), multicalling is 3.32 times faster than sequential direct calls.

Table 1. Performance of one-to-one direct synchronous calls.

Object server                               Object client                               Average latency (ms)   Average call time (ms)
planetlab2.comet.columbia.edu               planetlab1.diku.dk                          116                    149
planetlab2.comet.columbia.edu               pl1.swri.org                                60                     90
pl2.6test.edu.cn                            planetlab2.di.unito.it                      522                    528
planet1.berkeley.intel-research.net         planetlab5.lcs.mit.edu                      84                     116
planetlab1.atla.internet2.planet-lab.org    planetlab2.sttl.internet2.planet-lab.org    62                     73

This test demonstrated the viability of the multicall abstraction. Apart from being inefficient, using direct calls to simulate one-to-many calls is also incorrect in terms of design: the test demonstrates only that it is faster to multicall than to make n one-to-one direct calls. It would also be conceptually wrong for a client to have to know the n servers (which, in theory, offer the same service). In fact, the client knows only the name of the service to invoke, not the set of servers that provide it (servers can be removed, new ones added, and so on); how many servers provide the service is transparent to the client.

To measure anycall performance, we used three nodes out of 20 providing the same service. A set of clients began invoking anycalls on these servers. Each server provided clients with a standard data-unit set. When a server exhausted its data units, another server from the same group took its place, and so on. For anycalls, the results showed an average of 166 ms for the first server, 302 ms for the second, and 538 ms for the third. These servers were chosen on the basis of proximity, such that the server closest to the client was first, followed by the second-closest and the last. The overall overhead for these calls was 1.46 (the normalized incurred overhead: average call time divided by average latency).

We tested the manycall sequential implementation under the same conditions as the anycall tests: clients sent manycalls to a group of three servers. In this case, we used the voting example described earlier. Each of the clients required an affirmative vote from each of the three servers. Once this task was accomplished, the manycall returned. On average, these invocations lasted 386 ms, producing an overhead of 1.68.

Using the PlanetLab testbed, we verified that
Dermi does not impose excessive overhead on distributed object invocations; one-to-many invocations elegantly fit with the applicationlevel multicast service, and anycalls and manycalls obtain good results because of the inherent network locality. Simulation Results For our simulations we focused primarily on Dermi’s distributed interception capability (we haven’t published other simulations that we conducted). Figure 5 shows a simulation of Dermi’s distributed interception mechanism. For clarity, the figure displays data only for the messages delivered to the event service’s application layer. The configuration used an overlay network of 40,000 nodes and a 20,000-node multicast group. We sent 20,000 notifications to the group and used FreePastry’s local node simulation. We measured the interceptors’ node stress, which shows the number of messages received for such nodes. The first scenario (Figure 5a) shows the group with an interceptor located at a node other than its root. Results show the rendezvous-point node overhead: each message is sent twice to the root (from the publisher to the root and from the interceptor to the root). We can improve this scenario by making the rendezvous node and the interceptor the same node (using Dermi’s object-mobility service). In that case, global node stress is the same as if there were no interceptor. What happens when the interceptor node becomes overwhelmed with event processing (rather than network load)? Imagine transmitting a video stream to several groups of users, one of which wants to receive it in a different video format. This would be a very demanding task if performed by the video publisher or by each affected group member. An alternative uses an interceptor to do the data-conversion and deliver it to the group that wants it. Even so, the multicast group’s root node could become overwhelmed with CPU processing if the interceptor and root coincided in

IEEE INTERNET COMPUTING

40,000 35,000 30,000 25,000 20,000 15,000 10,000 5,000 0,000 2,500

5,000

7,500 10,000 12,500 15,000 17,500 20,000 Number of notifications

One interceptor, different node and rendezvous points

One interceptor, node and rendezvous points coincide

(a)

Node stress

the same node. As Figure 5b illustrates, we might delegate such demanding processing to specialized interceptor nodes that produce more network node stress on the root; this would free the root from collapsing due to event processing. In this scenario, the root node selected four equivalent interceptors in a round-robin policy. In real life, this illustrative case could be extended to reduce the interceptor’s network and CPU stress. A root node’s stress increases with the number of remote interceptors, which has the countereffect of relieving the root from unnecessary CPU processing. (Although the rendezvous node could be a relatively CPU-weak node, we selected powerful CPU nodes.) As part of the test, we simulated random failures in these four interceptor nodes using an ad hoc recovery-mechanism policy that restarted new interceptors when the number of live ones fell to one. (We can modify this policy to respawn new interceptors when other conditions are met, but for clearer simulation results, we opted for our default condition.) As each interceptor fails, node stress for the remaining ones noticeably increases. When all of them are down except one, our recovery mechanism enters and respawns three new interceptors, thus softening the node stress and load balancing our system to where it was at the start of the simulation. Throughout our simulations, we found our distributed interception mechanism’s principal hot spot to be rendezvous-point overloading (when there was more than one interceptor per group). This problem is endemic to the majority of group-multicast algorithms, and approaches such as creating rendezvousnode hierarchies have been proposed in literature to alleviate it. Scribe currently does not support this feature, and although it presents the advantages of being a single entry point to the group (thus, being able to perform access control, distributed interception, and so on), it also might become a hot spot in terms of being overwhelmed by messages.


Figure 5. Distributed interception simulations. (a) Scenario 1 shows node stress for one interceptor node. Better node stress is achieved when the root and interceptor coincide in the same node. (b) Scenario 2 shows mean node stress for four interceptors. The ideal case (no node failures) is shown in red. The blue line shows a case in which up to three interceptors fail and then recover; mean interceptor node stress increases in such cases.

Requirements, Services, and Prospective Applications
Increasing computer capabilities and network bandwidth have popularized the edge-computing paradigm, in which systems take advantage of home computers' increasing power and connection speed. To benefit from edge computing, applications should support:

• scalability,
• robustness and fault tolerance,
• centralized-bottleneck avoidance,
• network locality, and
• deterministic resource location.

To fulfill these requirements, we believe that structured P2P DHTs represent a suitable substrate for such applications. They offer an efficient, scalable, fault-resilient, and self-organizing layer for building wide-area solutions. We have developed three services that benefit from the underlying substrate: decentralized object location, new P2P call abstractions, and distributed interception capabilities. We foresee many prospective applications that could benefit from these services in the next few years. Such application-domain examples include:

• Enterprise edge computing. Akamai's Edge-



Related Work in P2P Distributed Object Middleware

Although other middleware platforms aimed at wide-area environments exist (Legion1 and Globe,2 for example), we believe that Dermi is the first attempt at building P2P distributed object middleware based on distributed hash tables (DHTs).

Wide-Area Distributed Object Middleware
Globe provides one invocation type (synchronous calls), and it supports neither notifications nor callbacks. Dermi provides synchronous, asynchronous one-to-one, and one-to-many calls, and it supports notifications because it is built on top of an event service. Globe's hierarchical location service maps object identifiers to the locations of moving objects. It is a scalable and efficient service that uses pointer caches, among other mechanisms, to provide search shortcuts. Dermi is based on a DHT overlay network and achieves scalability, without any hierarchy, via a hash function that generates objects' keys. With Dermi, the number of network hops required to get the information varies with the number of network nodes, whereas the number of hops remains constant with Globe. Legion provides an object-based service model with arbitrary object replication and location. Dermi employs a similar approach because it uses messages (or notifications) as its core communication mechanism. Because Globe and Legion don't reside on top of structured P2P networks, however, neither provides the P2P call abstractions we propose.
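The hash-based location scheme contrasted with Globe above can be pictured with a minimal sketch: an object's identifier is hashed to a key in the overlay's identifier space, and the node responsible for that key (here, the key's successor on a ring) holds the object's location record. All names are hypothetical; a real deployment would route through the overlay in O(log N) hops rather than consult a local map.

```java
import java.util.TreeMap;

// Minimal sketch (not Dermi's actual code) of DHT-style object location:
// hash the object identifier to a key, then resolve the key to the node
// responsible for it. The same key always maps to the same live node, which
// is what makes resource location deterministic.
public class ObjectLocator {
    private static final int ID_SPACE = 1 << 16;
    private final TreeMap<Integer, String> ring = new TreeMap<>(); // nodeId -> node name

    // Stand-in for the overlay's cryptographic hash function.
    public static int keyFor(String objectId) {
        return Math.floorMod(objectId.hashCode(), ID_SPACE);
    }

    public void addNode(int nodeId, String name) {
        ring.put(nodeId, name);
    }

    // Successor-style resolution: first node clockwise from the key.
    public String locate(String objectId) {
        int key = keyFor(objectId);
        Integer node = ring.ceilingKey(key);
        if (node == null) node = ring.firstKey(); // wrap around the ring
        return ring.get(node);
    }
}
```

Repeating a lookup for the same object identifier always yields the same node, with no hierarchy involved.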

Wide-Area Remote Method Invocation Technologies
JxtaJeri (http://user-wstrange.jini.org/jxtajeri/JxtaJeriProgGuide.html) is an integration of JXTA and Jini that lets programmers use the Java RMI programming model to invoke services over a JXTA P2P network. This package uses JxtaSockets to implement a Jini Extensible Remote Invocation (Jeri) transport. JxtaJeri enables a service to expose its remote interfaces over the JXTA network, and a programmer can use the higher-level remote-procedure-call model to construct services. Unlike JxtaJeri, Dermi's location service is based on a structured P2P network and achieves deterministic and optimal resource location. JxtaJeri is based on unstructured JXTA networks, implying that resource location is not as deterministic as it is in Dermi. By using JXTA protocols, JxtaJeri benefits from advantages such as network address translation and firewall traversal. Technically, these are solved problems, but existing DHTs haven't focused on solving them. Nevertheless, we believe that as DHTs evolve into a more mature technology, researchers will address such technicalities.


Wide-Area Publish–Subscribe Systems
Dermi uses Scribe as its publish–subscribe P2P infrastructure. Related work in this field includes Rebeca,3 a content-based publish–subscribe system built on top of the Chord overlay network.4 It provides a more complex form of publish–subscribe than Scribe's topic-based approach. Hermes5 is another publish–subscribe system; it uses an approach similar to Scribe's, based on the Pastry overlay substrate. This system tries to get around topic-based publish–subscribe limitations by implementing a so-called "type- and attribute-based" publish–subscribe model, which extends the expressiveness of subscriptions and advocates multiple inheritance in event types.

Call Abstractions
A similar version of the multicall abstraction (called Multicast RPC) already exists in systems such as Groupkit (www.groupkit.org/) and in the Multicast Object Request Broker, which support multicast object invocations. However, we believe Dermi is the first system to include such abstractions on top of a structured P2P overlay substrate.

References
1. M. Lewis and A. Grimshaw, "The Core Legion Object Model," Proc. 5th IEEE Int'l Symp. High Performance Distributed Computing, IEEE CS Press, 1996, pp. 551–561.
2. M. van Steen, P. Homburg, and A.S. Tanenbaum, "Globe: A Wide-Area Distributed System," IEEE Concurrency, vol. 7, no. 1, 1999, pp. 70–78.
3. W.W. Terpstra et al., "A Peer-to-Peer Approach to Content-Based Publish/Subscribe," Proc. Int'l Workshop Distributed Event-Based Systems (DEBS'03), ACM Press, 2003 (electronic proceedings); http://portal.acm.org/citation.cfm?id=966618.966627&dl=GUIDE&dl=GUIDE&CFID=19878208&CFTOKEN=52881114.
4. I. Stoica et al., "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications," Proc. ACM SIGComm, ACM Press, 2001, pp. 149–160.
5. P. Pietzuch and J. Bacon, "Peer-to-Peer Overlay Broker Networks in an Event-Based Middleware," Proc. Int'l Workshop Distributed Event-Based Systems (DEBS'03), ACM Press, 2003 (electronic proceedings); http://portal.acm.org/citation.cfm?id=966618.966628&dl=GUIDE&dl=GUIDE&CFID=19878208&CFTOKEN=52881114.

Computing initiative (www.akamai.com/en/html/services/edgecomputing.html) proposes that enterprises deploy wide-area systems that are accessible from different countries to benefit from the locality of the Akamai network. This means that the code that reads, validates, and processes application requests executes on Akamai's network, thus reducing the strain on the enterprise's origin server. Our services can also provide resource location and network locality in a decentralized fashion without depending on proprietary networks. For example, a company could deploy a wide-area e-commerce system interconnected with our underlying P2P substrate and using Dermi's services.


• Grid computing. The Open Grid Services Architecture (OGSA; www.globus.org/ogsa/) has standardized several services for developing distributed computing applications. Grid components can be accessed using Web services and located using naming services, and state changes can be propagated using event services. Nevertheless, there is ongoing research toward creating other wide-area grids. Our services could help solve many of the problems this work will encounter. For example, our decentralized object-location service can be very useful for locating resources in a deterministic way. Further, we can exploit grid locality with anycall abstractions and propagate changes using multicall abstractions. Our distributed interception mechanism could also be used for load balancing.
• Multiagent systems. The Foundation for Intelligent Physical Agents (FIPA; www.fipa.org) specifications define a multiagent framework with services for agent location and a messaging service that supports ontologies. Nonetheless, the research community has not yet proposed a wide-area multiagent system. For example, AgentCities (www.agentcities.org) uses a central ring to interconnect several agent infrastructures. (The AgentCities network is open to anyone wishing to connect agents and services; the initiative already involves organizations from more than 20 countries working on a significant number of projects.) A scalable multiagent system could benefit from our proposed services to achieve decentralized agent location; it could also benefit from network locality to use agent services and from multicalls to propagate state changes simultaneously to many agents.

Finally, the computer-supported cooperative work (CSCW) domain represents an interesting arena for Dermi applications. Developers could use our infrastructure to build social networks, massive multiuser games, and online communities.
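The multicall abstraction invoked in the examples above (one invocation propagated to every member of an object group, with replies gathered at the caller) can be sketched as follows. In Dermi the fan-out travels down a Scribe multicast tree; a plain membership list stands in for the tree here, and all class and method names are illustrative, not Dermi's API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a multicall: the same invocation is delivered to
// every member of an object group, and each member's reply is collected.
public class MulticallSketch {
    // A group member that reacts to a propagated state change.
    public interface Agent {
        String onEvent(String event);
    }

    private final List<Agent> group = new ArrayList<>();

    public void join(Agent a) {
        group.add(a);
    }

    // Invoke the same method on all group members and gather their replies.
    // (In Dermi, this delivery follows the group's multicast tree.)
    public List<String> multicall(String event) {
        List<String> replies = new ArrayList<>();
        for (Agent a : group) replies.add(a.onEvent(event));
        return replies;
    }
}
```

A single `multicall("priceUpdate")` thus reaches every agent in the group, which is the behavior used above to propagate state changes to many agents simultaneously.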
In fact, we are actively developing applications that benefit from Dermi's new services:

• Our decentralized location service works for shared-session location. A shared session is essential to any CSCW toolkit because it defines the basis for shared interactions in a common remote context. We have implemented shared sessions as Dermi objects, and our location service provides a simple, scalable way to locate them.
• The anycall abstraction facilitates late users' ability to join shared sessions. When new users join a shared session, they need to obtain the session's state. By sending an anycall to the session group, they can get the required state information from the closest updated member. The multicall abstraction can also be used for shared session components' state propagation and group calls.
• Our distributed interception mechanism can be used to establish group coordination policies among groups of objects contained in a shared session. These policies can be dynamic. If each member of the group of objects relied on a coordination policy embedded locally, replacing such policies would become a painful process. With Dermi's distributed interception approach, it is straightforward to exchange interceptors simply by switching pointers at the root.

We have developed an example CSCW application called CoopWork, a plug-in for the Eclipse Java development environment (www.eclipse.org) for cooperative programming. The plug-in provides a decentralized development environment and offers features including method blocking and unblocking, file or method publication, and version control, among others. A downloadable version is available at http://ants.etse.urv.es/DERMI.
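The late-join behavior described above can be pictured with a small sketch: an anycall visits group members starting from the closest one and stops at the first member whose replica satisfies the caller's condition (here, a minimum state version). All names are hypothetical; Dermi implements this on top of Scribe's anycast rather than an in-memory list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of an anycall used for shared-session state transfer:
// members are tried in proximity order, and the call stops at the first
// member whose replica is sufficiently up to date.
public class AnycallSketch {
    public static class Member {
        final String state;
        final int version;

        public Member(String state, int version) {
            this.state = state;
            this.version = version;
        }
    }

    // Members ordered by network proximity to the caller (closest first);
    // LinkedHashMap preserves that insertion order.
    private final Map<String, Member> byProximity = new LinkedHashMap<>();

    public void add(String name, Member m) {
        byProximity.put(name, m);
    }

    // Anycall: ask members, closest first, until one satisfies the condition.
    public String anycallState(int requiredVersion) {
        for (Member m : byProximity.values())
            if (m.version >= requiredVersion) return m.state; // condition met: stop
        throw new IllegalStateException("no updated member found");
    }
}
```

If the closest member's replica is stale, the call simply moves on to the next-closest member, so the joiner still obtains current state while exploiting network locality whenever possible.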

Future Work
Both experimentally in the PlanetLab testbed and empirically via simulations, we have shown that our middleware approach is viable and that the system performs acceptably. We're also continuing to improve Dermi with several useful features. In contrast to sequential manycalls, for example, we are looking at implementing parallel manycalls for better performance. The behavior would be similar to a multicall, as we again multicast to the tree, though starting from the client's closest node in the group rather than at the root itself, thus taking locality into account. Results could be communicated back to the client by each of the group's nodes that satisfies the condition. Once the client received all necessary data, it would discard other incoming messages from remaining group members. For very large groups, a multicast tree search could be pruned by specifying a maximum depth to cover, thus preventing a client from becoming overwhelmed with messages. This approach would incur more node stress on the client, but it would be more efficient in terms of parallelism.

We are also looking to add authentication mechanisms to prevent malicious nodes from compromising our system's participants. Public-key cryptography is required to achieve this goal, which adds a performance expense. Finally, we're making our rendezvous-point change-buffering algorithms more consistent to account for all possible uses.

Acknowledgment
This work has been partially funded by the Spanish Ministry of Science and Technology through project TIC-2003-09288-C02-00.

References
1. A. Rowstron and P. Druschel, "Storage Management and Caching in PAST, a Large-Scale, Persistent Peer-to-Peer Storage Utility," Proc. ACM Symp. Operating Systems Principles, ACM Press, 2001, pp. 188–201.
2. M. Castro et al., "Scalable Application-Level Anycast for Highly Dynamic Groups," Proc. Networked Group Comm. (NGC '03), Springer-Verlag, 2003, pp. 47–57.
3. I. Stoica et al., "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications," Proc. ACM SIGComm, ACM Press, 2001, pp. 149–160.
4. B. Zhao, J. Kubiatowicz, and A.D. Joseph, Tapestry: An Infrastructure for Fault-Tolerant Wide-Area Location and Routing, tech. report UCB/CSD-01-1141, Computer Science Dept., Univ. Calif., Berkeley, 2001.
5. A. Rowstron and P. Druschel, "Pastry: Scalable, Decentralized Object Location and Routing for Large-Scale Peer-to-Peer Systems," Proc. IFIP/ACM Int'l Conf. Distributed Systems Platforms (Middleware), ACM Press, 2001, pp. 329–350.
6. F. Dabek et al., "Towards a Common API for Structured Peer-to-Peer Overlays," Proc. Int'l Workshop P2P Systems (IPTPS'03), Springer-Verlag, 2003, pp. 33–44.
7. M. van Steen, P. Homburg, and A.S. Tanenbaum, "Globe: A Wide-Area Distributed System," IEEE Concurrency, vol. 7, no. 1, 1999, pp. 70–78.
8. S. Rhea et al., Handling Churn in a DHT, tech. report UCB/CSD-03-1299, Computer Science Dept., Univ. Calif., Berkeley, 2003.

Carles Pairot Gavaldà is a PhD student in the Department of Computer Science and Mathematics at Universitat Rovira i Virgili, Spain. His research interests include distributed systems and middleware infrastructures for structured P2P networks. He has a BS and an MS in computer science from Universitat Rovira i Virgili. Contact him at [email protected].

Pedro García López is a professor in the Department of Computer Science and Mathematics at Universitat Rovira i Virgili, Spain. His research interests include computer-supported cooperative work and distributed systems. He has a PhD in computer engineering from Universidad de Murcia, Spain. He is a member of the ACM. Contact him at [email protected].

Antonio F. Gómez Skarmeta is a professor in the Department of Computer Engineering at Universidad de Murcia, Spain. His research interests include distributed artificial intelligence, tele-learning, and computer-supported cooperative work. He has a PhD in computer science from Universidad de Murcia. He is a member of the IEEE Computer Society. Contact him at [email protected].
