Accepted by IEEE Pervasive Computing Magazine (to appear).

Adaptive Offloading for Pervasive Computing∗

Xiaohui Gu†, Alan Messer‡, Ira Greenberg§, Dejan Milojicic¶, Klara Nahrstedt

Abstract

Pervasive computing allows a user to access an application on heterogeneous devices continuously and consistently. However, it is challenging to deliver complex applications on resource-constrained mobile devices, such as cell phones. Application-based or system-based adaptations have been proposed to address the problem, but they often require application fidelity to be significantly degraded. We believe that this problem can be overcome by dynamically partitioning the application and offloading part of the application's execution and data to a powerful nearby surrogate. This allows the application to be delivered in a pervasive computing environment without significant fidelity degradation or expensive application rewriting. Runtime offloading needs to adapt to different application execution patterns and to resource fluctuations in the pervasive computing environment. Hence, we have developed an offloading inference engine that adaptively solves two key decision-making problems in runtime offloading: (1) timely triggering of offloading, and (2) efficient partitioning of applications. Both trace-driven simulations and prototype experiments show the effectiveness of the adaptive offloading system.

1 Introduction

Computer systems have evolved from the era of the mainframe, which was shared by many people, through the era of the personal computer, which is used by one person, to the era of pervasive computing, in which a single user possesses multiple heterogeneous mobile devices ranging from a laptop to a personal digital assistant (PDA) to a cell phone. The user wants to use any of these devices as a portal, with the help of other personal devices and available nearby surrogate devices, to seamlessly and consistently access any application, anytime and anywhere. However, it is challenging to run complex applications on mobile devices because of the strict constraints on their resources, such as memory capacity, CPU speed, and battery power. A brute-force approach to accommodating device diversity is to rewrite applications according to the resource capacity of each mobile device. However, this approach is very expensive, and the application source code is often proprietary. In the past, various projects [5, 15, 16, 12] addressed this problem using application-based or system-based adaptations. However, these approaches often require significant degradation of an application's fidelity to make it possible to fit the application into a mobile device. Moreover, the efficiency of these adaptation systems is often limited by their coarse-grained approaches. To realize the goal of pervasive application delivery without modifying the application or degrading its fidelity, we propose an adaptive offloading system that can dynamically and transparently partition the application and offload part of the application's execution and data to a nearby surrogate. (Currently, we consider the use of only one surrogate device; we plan to extend our system to support the use of multiple surrogates as part of our future work.) In this article, we present the adaptive offloading system, which includes two cooperating parts: (1) a distributed offloading platform [14], and (2) an offloading inference engine [8].

∗ Most of the work was done while Xiaohui Gu was a summer intern at Hewlett-Packard Laboratories. The work was also partially supported by a NASA grant under contract number NASA NAG 2-1406 and NSF grants under contract numbers 9870736, 9970139, and EIA 99-72884EQ. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF, NASA, or the U.S. Government.
† Xiaohui Gu and Klara Nahrstedt are affiliated with the Department of Computer Science, University of Illinois at Urbana-Champaign. Email: {xgu, klara}@cs.uiuc.edu.
‡ Alan Messer was affiliated with Hewlett-Packard Laboratories when most of the work was done. He is now affiliated with Samsung R&D Labs, USA. Email: alan [email protected].
§ Ira Greenberg was affiliated with Hewlett-Packard Laboratories when most of the work was done. Email: [email protected].
¶ Dejan Milojicic is affiliated with Hewlett-Packard Laboratories, Palo Alto, CA. Email: dejan@hpl.hp.com.

1.1 Decision-Making Problems for Adaptive Offloading

In pervasive computing, resource availability and user mobility are highly dynamic. For example, wireless network bandwidth can fluctuate significantly while a user moves around. To ensure efficient program execution, runtime offloading needs an intelligent decision-making module to adapt to the dynamic changes in the pervasive computing environment. The offloading inference engine should trigger offloading at the right time and offload the right program objects to achieve low offloading overhead and efficient program execution. For example, when the wireless connection is excellent, the inference engine could decide to offload a large amount of application execution and data to avoid additional offloading in the near future. When the wireless connection is poor, the inference engine could decide to offload only the necessary amount of application execution and data to overcome resource constraints and avoid high offloading overhead. We identify two important decision-making problems for adaptive offloading: (1) adaptive offloading triggering, and (2) efficient application partitioning. For the first problem, we need to address adaptability to the pervasive computing environment, configurability to different application-specific offloading goals, and stability of the offloading system. For the second problem, we need to efficiently select the most effective application partitioning from many candidate partition plans. The most effective application partitioning should simultaneously meet multiple user requirements for offloading, such as minimizing the wireless bandwidth requirement, minimizing the average interaction delay, and minimizing the total execution time.

1.2 Solution Overview

The overall architecture of the offloading system is illustrated in Figure 1. The user wants to access a memory-intensive application on a resource-constrained mobile device, such as a PDA. The application can be either a distributed application, such as content retrieval from a remote server, or a local application, such as an image editor. When the application's memory requirement reaches or approaches the maximum memory capacity of the mobile device, an offloading action is triggered. The program objects on the mobile device are partitioned into two groups, and some of the program objects are offloaded to a powerful nearby surrogate to reduce the memory requirement on the mobile device. The offloading platform transparently transforms method invocations to offloaded objects into remote invocations. Our assumptions are that (1) the application is written in an object-oriented language such as Java or C#, and (2) the user's environment contains powerful surrogates and plentiful wireless bandwidth. A surrogate can be the user's personal laptop, an embedded server, or some other environmental host. With the proliferation of computing devices and wireless networks (e.g., IEEE 802.11), we believe that these assumptions are realistic. The offloading inference engine is illustrated in Figure 2. To make runtime offloading decisions, the offloading inference engine does not require any prior knowledge about an application's execution pattern or the runtime environment's resource status. Instead, it acquires execution and resource information from the offloading platform's execution and resource monitors, respectively.


Figure 1. Adaptive offloading system architecture. (Components: server, Internet, mobile device with client classes and local storage, surrogate with offloaded classes; arrows denote function invocation and data access.)

Figure 2. Offloading inference engine architecture. (Components: application execution monitor, wireless bandwidth monitor, event filter, application-specific offloading rules, offloading triggering inference, candidate partition plan generation module, and partition selection.)

To achieve both configurability and stability, the offloading inference engine employs the Fuzzy Control model [13] as the basis for the offloading triggering inference module. This module can be easily configured using application-specific offloading rules. This approach is analogous to Internet Protocol (IP) configuration for the network connection in a modern operating system. To avoid unnecessary inference overhead, an event filter drops insignificant resource and application execution change events. After the offloading inference engine decides to trigger a new offloading action and selects the new level of resource utilization, it selects an effective application partitioning from the many possible partition plans generated by the partition module of the offloading platform. The offloading platform generates the set of candidate partition plans using a simple heuristic algorithm derived from the MINCUT algorithm [17]. To simultaneously meet multiple user requirements for offloading, the offloading inference engine uses a composite partitioning cost metric to select the best partition plan. The selected application partition plan indicates which program objects should be offloaded to the surrogate and which program objects should be pulled back to the mobile device during the new offloading action. One of the critical resource constraints on the mobile device is its strict memory limitation. Although the memory capacity of mobile devices will continue to increase, the memory limitation will persist when the user runs multiple applications or multiple instances of the same application. In this article, we focus on describing how to minimize the performance penalty experienced by the application when adaptive offloading is used to relieve the mobile device's memory constraint. Overcoming the memory constraint allows a memory-intensive application to run on a mobile device with good performance, a result that otherwise could not be achieved without degrading the application's fidelity. Although our research [14] has shown that adaptive offloading can be applied to other resources, such as CPU speed, we consider only memory in this article. Using both extensive trace-driven simulations and prototype experiments, we show that our adaptive offloading system can efficiently support memory-intensive applications on a mobile device in the pervasive computing environment.

2 Distributed Offloading Platform

In this section, we introduce the distributed offloading platform, which includes (1) application execution monitoring, (2) resource monitoring, (3) candidate partition plan generation, (4) surrogate discovery, and (5) transparent remote procedure call (RPC) platform support.


2.1 Application Execution Monitoring

Without loss of generality, we use Java programs in the rest of the article to illustrate our approach. In the offloading platform, application execution is characterized by a weighted directed graph, called the application execution graph. Each graph node represents a Java class. We choose a class as the graph node because (1) classes represent a natural component unit for all object-oriented programs, (2) classes enable more precise offloading decisions than coarser-grained components, and (3) classes make it possible to avoid manipulating a large execution graph with too many fine-grained objects (e.g., a simple image-editing Java program created 16,994 distinct objects during 174 seconds of execution). The weight metrics associated with a graph node include (1) MemorySize, which describes the amount of memory occupied by the objects of the Java class, (2) AccessFreq, which represents how many times the methods or data fields of the class have been accessed, (3) Location, which describes whether the Java class's objects are currently located on the mobile device ("local") or on the surrogate ("surrogate"), and (4) IsNative, which indicates whether a class's objects can be migrated from the mobile device to the surrogate. Some classes must always execute on the mobile device, such as classes that invoke device-specific native methods. Each graph edge represents the interactions/dependencies between the objects of two classes. Each edge is annotated with two fields: (1) InteractionFreq, which represents the number of interactions between the objects of the two classes, and (2) BandwidthRequirement, which represents the total amount of information transferred between the objects of the two classes. In the current prototype, the entire application execution graph is maintained on the mobile device by the offloading platform's execution monitor. After the first application split, the execution information on the surrogate is periodically collected and merged into the execution graph on the mobile device.
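To make the graph structure concrete, the following is a minimal sketch, not the platform's actual code, of how per-class nodes and per-edge statistics could be represented; all class, field, and method names here are illustrative assumptions.

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the application execution graph.
class ClassNode {
    String className;
    long memorySize;      // bytes occupied by this class's objects (MemorySize)
    long accessFreq;      // method/field accesses observed so far (AccessFreq)
    boolean isNative;     // true if the class must stay on the mobile device (IsNative)
    String location;      // "local" or "surrogate" (Location)
}

class InteractionEdge {
    long interactionFreq;        // number of inter-class interactions (InteractionFreq)
    long bandwidthRequirement;   // total bytes transferred between the two classes
}

class ExecutionGraph {
    Map<String, ClassNode> nodes = new HashMap<>();
    // Edge key is the unordered pair "A|B" of the two class names.
    Map<String, InteractionEdge> edges = new HashMap<>();

    // Could be called by the instrumented JVM whenever an object of class
    // 'callee' is accessed from class 'caller', transferring 'bytes' of data.
    void recordInteraction(String caller, String callee, long bytes) {
        String key = caller.compareTo(callee) < 0 ? caller + "|" + callee
                                                  : callee + "|" + caller;
        InteractionEdge e = edges.computeIfAbsent(key, k -> new InteractionEdge());
        e.interactionFreq++;
        e.bandwidthRequirement += bytes;
    }
}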

2.2 Resource Monitoring

To adapt to resource changes, the offloading platform needs to monitor the resources of the mobile device, the surrogate, and the wireless network. The available memory on the mobile device and on the surrogate is monitored by tracking the amount of free space in the Java heap, which is obtained from the garbage collector of the Java Virtual Machine (JVM). The wireless bandwidth and delay can be estimated by passively observing ongoing traffic through the offloading platform, or by actively measuring them with measurement tools. Whenever a significant change happens (e.g., a certain amount of memory is consumed, or a sufficiently large wireless bandwidth fluctuation occurs), the offloading inference engine decides whether offloading should be triggered.
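Our prototype obtains free-heap information directly from the instrumented JVM's garbage collector; in standard Java the same signal can be approximated with the Runtime API, as in this illustrative sketch. The class name and the change-threshold logic are our assumptions, not the platform's code.

// Hypothetical memory monitor approximating the prototype's GC-based tracking.
public class MemoryMonitor {
    private final double significantChange; // fraction of max heap, e.g., 0.05
    private long lastFree;

    public MemoryMonitor(double significantChange) {
        this.significantChange = significantChange;
        this.lastFree = freeHeap();
    }

    static long freeHeap() {
        Runtime rt = Runtime.getRuntime();
        // Free space in the current heap plus room the heap can still grow into.
        return rt.freeMemory() + (rt.maxMemory() - rt.totalMemory());
    }

    // Returns true when memory consumption has changed enough that the
    // event filter should forward the event to the inference engine.
    public boolean significantChangeOccurred() {
        long free = freeHeap();
        boolean significant =
            Math.abs(free - lastFree) > significantChange * Runtime.getRuntime().maxMemory();
        if (significant) lastFree = free;
        return significant;
    }
}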

2.3 Candidate Partition Plan Generation

The problem of optimal application partitioning (i.e., finding the minimum cut into two bounded sets) is NP-complete [6]. However, the offloading system is designed for resource-constrained devices. Hence, the offloading platform uses an efficient partitioning heuristic derived from Stoer and Wagner's MINCUT algorithm [17] to generate a set of candidate partition plans. Let G = (V, E) describe the current application execution graph, where V is the set of nodes and E is the set of edges. Each edge is associated with a cost value reflecting the inter-class interactions/dependencies, which will be introduced in Section 3.2. Let PM represent the partition on the mobile device, and PS the partition on the surrogate. At the beginning, both PM and PS are initialized as empty. The candidate partition plan generation algorithm is as follows.

Step 1. Merge all the nodes that cannot be migrated to the surrogate (i.e., nodes with their IsNative field set to true) into one node, v1. By "merging", we mean that these nodes are coalesced into one node. If there are k edges from a node to these merged nodes, then we substitute one edge for these k edges; the new edge has a cost equal to the sum of the costs of the k old edges. Suppose there are n nodes after the merging. Let PM = {v1} and PS = {v2, ..., vn}.

Figure 3. Illustration of candidate partition plan generation. (a) The original execution graph with cost annotations: PM = {V1}, PS = {V2, V3, V4}, cost = 160. (b) After merging V2 into V1: PM = {V1, V2}, PS = {V3, V4}, cost = 120.

Step 2. Among the neighbors of v1 (the node representing the partition on the mobile device), select the node that has the largest edge cost to v1. Suppose the selected node is vi. Merge vi with v1 and move vi from PS to PM. We consider the cut Γ = <PM, PS> as one of the candidate partition plans. This partition plan is recorded together with its partitioning cost and the two partitions.

Step 3. Repeat from Step 2 until all nodes have been merged with v1.

For example, Figure 3 illustrates the candidate partition plan generation process. The original execution graph with cost annotations is shown in Figure 3(a). First, v2, the neighbor of v1 with the largest edge cost, is selected. We then merge v2 with v1 to get the new graph illustrated in Figure 3(b). From this new graph, we can derive another candidate partition plan. We continue the merging process until all four nodes are merged into one node. The offloading inference engine can then select the best partitioning from the above candidate partition plans according to its partitioning cost metric, which will be introduced in Section 3.2.
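The following sketch illustrates the spirit of Steps 1-3 on a simple adjacency-map graph. For readability it uses a single scalar edge cost, whereas the real engine compares composite cost metrics (Section 3.2); all names are illustrative, and the native nodes are modeled as the initial contents of PM rather than as a literal merged node.

import java.util.*;

// Illustrative sketch of the candidate partition plan generator (Steps 1-3).
public class PartitionGenerator {

    record Plan(Set<String> mobile, Set<String> surrogate, long cutCost) {}

    // graph.get(a).get(b) is the edge cost between classes a and b.
    static List<Plan> generate(Map<String, Map<String, Long>> graph,
                               Set<String> nativeClasses) {
        // Step 1: the merged native node v1 is modeled as the growing set pm.
        Set<String> pm = new HashSet<>(nativeClasses);
        Set<String> ps = new HashSet<>(graph.keySet());
        ps.removeAll(pm);

        List<Plan> candidates = new ArrayList<>();
        while (!ps.isEmpty()) {
            // Step 2: pick the node in PS most tightly connected to PM.
            String best = null;
            long bestCost = -1;
            for (String v : ps) {
                long c = costToSet(graph, v, pm);
                if (c > bestCost) { bestCost = c; best = v; }
            }
            pm.add(best);
            ps.remove(best);
            // Record the cut <PM, PS> as a candidate partition plan.
            long cut = 0;
            for (String v : ps) cut += costToSet(graph, v, pm);
            candidates.add(new Plan(new HashSet<>(pm), new HashSet<>(ps), cut));
        }   // Step 3: repeat until every node has been merged into PM.
        return candidates;
    }

    static long costToSet(Map<String, Map<String, Long>> graph,
                          String v, Set<String> set) {
        long sum = 0;
        for (Map.Entry<String, Long> e : graph.getOrDefault(v, Map.of()).entrySet())
            if (set.contains(e.getKey())) sum += e.getValue();
        return sum;
    }
}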

2.4 Surrogate Discovery

In our offloading platform, the mobile device runs as the master device and each surrogate runs as a slave device. The slave device runs a JVM and the offloading platform's monitoring modules. For simplicity, in our current prototype, only the mobile device runs the candidate partition plan generator and the offloading inference engine components. At runtime, if the offloading inference engine decides that offloading needs to be performed, it triggers a new offloading action. The mobile device then initiates a discovery protocol to find a nearby surrogate that will accept offloaded application execution. While the current prototype uses a well-known server, a surrogate could easily be discovered using a wireless broadcast of a "surrogate discovery" message or a more complex service discovery protocol such as UPnP [2] or Jini [1]. The mobile device then transfers the bytecode and data it wishes to execute remotely on the surrogate. The surrogate loads the related classes and awaits RPC requests from the mobile device to continue the application execution.

2.5 Transparent RPC Platform

To support transparent partitioning, a mechanism for transparent RPCs between virtual machines is needed. Java's existing support for remote execution (RMI) does not provide transparent mapping of calls and objects into RPCs between machines. Hence, we modify HP's ChaiVM so that objects can be transparently migrated between the mobile device and the surrogate. In a JVM, each object is uniquely identified by an object reference. To support remote execution, we modify the JVM to flag object references to remote objects and then intercept accesses to remote objects. Using these hooks, the offloading platform converts remote accesses into transparent RPCs between the two JVMs. A JVM that receives an RPC request uses a pool of threads to perform execution on behalf of the other JVM. With this approach, threads are not migrated. Instead, invocations and data accesses follow the placement of objects. However, several issues must be addressed to support the goal of providing a single, transparent distributed platform between two JVMs.

• Java native methods cannot be migrated because they are implemented using non-Java languages and may have different implementations on different platforms. To solve this problem, native invocations are directed back to the master JVM. This gives an application the appearance of executing on the mobile device even though part of its execution is on a surrogate.

• Some Java objects are statically shared between application objects, such as System.properties, which contains <key, value> pairs specifying information such as the name of the host operating system. Therefore, to ensure consistency, all accesses to static data are directed back to the master JVM.

• Each JVM has a private object reference name-space and does not understand an object reference from another JVM. To overcome this limitation, we modified the JVM so that it maps a reference from another JVM into its own name-space. Each JVM keeps local stub references for remote objects as placeholders. When a JVM invokes a method or accesses an object on the other JVM, it sends an operation referring to the object using its local object reference. The receiving JVM then maps the first JVM's local reference to its own real local reference for the object. Each JVM maintains its object reference mappings as objects and object references move between the two JVMs.
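The reference-mapping bookkeeping in the last bullet can be pictured with the following rough sketch. The real mechanism lives inside the modified ChaiVM, not in application-level Java, so every name here is hypothetical and the peer reference is simplified to an opaque numeric id.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the per-JVM reference mapping described above.
public class RemoteReferenceTable {
    // Maps a peer JVM's object reference (an opaque id) to a local stub.
    private final Map<Long, Object> remoteToLocalStub = new HashMap<>();
    // Maps a local stub back to the peer's reference id for outgoing RPCs.
    private final Map<Object, Long> localStubToRemote = new HashMap<>();

    // Called when an RPC message mentions a reference we have not seen before.
    public Object resolve(long peerRef) {
        Object stub = remoteToLocalStub.computeIfAbsent(peerRef, RemoteStub::new);
        localStubToRemote.putIfAbsent(stub, peerRef);
        return stub;
    }

    // Called when sending an operation that refers to a remote object.
    public long referenceFor(Object stub) {
        Long peerRef = localStubToRemote.get(stub);
        if (peerRef == null)
            throw new IllegalStateException("not a remote stub: " + stub);
        return peerRef;
    }

    // Keep both directions consistent when an object migrates between JVMs.
    public void register(long peerRef, Object stub) {
        remoteToLocalStub.put(peerRef, stub);
        localStubToRemote.put(stub, peerRef);
    }

    // Placeholder object standing in for a remote object.
    static final class RemoteStub {
        final long peerRef;
        RemoteStub(long peerRef) { this.peerRef = peerRef; }
    }
}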

3 Adaptive Offloading Inference Engine

In this section, we present the design details of the offloading inference engine. We describe its two decision-making modules, which address the problems of triggering offloading and selecting partitionings. Offloading can add overhead to the application's execution. This overhead includes the cost of (1) transferring objects between the mobile device and the surrogate, and (2) performing remote data accesses and function invocations over a wireless network. One of the inference engine's goals is to minimize the offloading overhead while relieving the memory constraint on the mobile device.

3.1 Offloading Triggering Inference

To perform offloading triggering inference, the offloading inference engine examines the current resource consumption of the application and the available resources in the pervasive computing environment. It then decides whether offloading should be triggered, given the user's offloading goals. If so, it decides what level of resource utilization should be targeted on the mobile device, that is, how much memory should be freed up by offloading program objects to the surrogate. At first glance, the problem could be solved using a simple threshold-based approach. For example, threshold-based rules could be hard-coded in the offloading inference engine, such as "if the current amount of free memory on the mobile device is less than 20% of its total memory, then trigger offloading and offload enough program objects to free up at least 40% of the mobile device's memory." However, such a simple approach cannot meet the challenges of adaptability, configurability, and stability that were described in Section 1.1. The offloading inference engine addresses this problem with the Fuzzy Control model [13], which has been shown to be effective for flexible, expressive, and stable coarse-grained application adaptations. The use of this approach in the offloading inference engine is novel because it applies the model to fine-grained application adaptation via runtime offloading. The Fuzzy Control model includes (1) linguistic decision-making rules provided by system or application developers, (2) membership functions, and (3) a generic fuzzy inference engine based on fuzzy logic theory.

lingvar AvailMem on [0,8000] with
    class low is 0 0 800 900
    class moderate is 850 1500 4000 4200
    class high is 4150 5500 7500 8000
end

Figure 4. Illustration of the membership function definition for the linguistic variable AvailMem: (a) the textual definition (above); (b) its graph, plotting confidence against available memory (KB) for the linguistic values low, moderate, and high.

Based on the Fuzzy Control model, the offloading inference engine's offloading rules can be specified as follows:

if (AvailMem is low) and (AvailBW is high) then NewMemSize := low;
if (AvailMem is low) and (AvailBW is moderate) then NewMemSize := average;
if (AvailMem is high) and (AvailBW is low) then NewMemSize := high;

The AvailMem and AvailBW variables are input linguistic variables that represent the currently available memory and available wireless bandwidth, respectively. The NewMemSize variable is the output linguistic variable representing the new memory utilization on the mobile device. If any of these rules is matched by the current system conditions, the offloading inference engine triggers offloading and derives the offloading memory size as (current memory consumption − new memory utilization). If the difference is negative, some program objects should be pulled back from the surrogate to adapt to low wireless bandwidth. The application developer or the user can easily configure the offloading inference engine using the linguistic offloading rules. However, to interpret the linguistic offloading rules, the offloading inference engine needs to establish mappings between numerical and linguistic values for each linguistic variable. Low, moderate, and high are called linguistic values. In fuzzy logic, the mapping between the numerical value of a linguistic variable and its linguistic values is defined by a membership function. For example, Figure 4(a) shows the membership function definition for the linguistic variable AvailMem, and Figure 4(b) gives the graph representation of the corresponding membership function. In this example, if the numerical value of AvailMem is within the range [0,800], the offloading inference engine's stochastic confidence that AvailMem belongs to the linguistic value low is 100%. If the numerical value of AvailMem is within [800,900], the confidence that AvailMem belongs to low decreases linearly from 100% to 0%. The overlap between different linguistic values represents uncertainty in the stochastic confidence: the value can belong to either linguistic value, "low" or "moderate", but with different confidence probabilities. Membership functions are provided by the application developer. The generic fuzzy inference engine implements the fuzzy-logic-based mapping and non-linear adaptation process. It takes the confidence values of fuzzy sets (e.g., low, average, and high) as inputs and generates outputs in the form of confidence values of fuzzy sets for the output variables (e.g., NewMemSize). Hence, to use the generic fuzzy inference engine, the offloading inference engine provides two functions: fuzzification, to prepare input fuzzy sets for the generic fuzzy inference engine, and defuzzification, to convert the output fuzzy sets into actual offloading decisions, such as the new memory utilization on the mobile device.
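Reading the four numbers in each class definition as a trapezoid (full confidence between the two middle points, linear ramps at the ends) is consistent with the example above, though that reading is our assumption. A minimal sketch of how fuzzification could evaluate such a membership function follows; all names are illustrative.

// Sketch of trapezoidal membership evaluation for fuzzification.
// The points (a, b, c, d) mirror the lingvar class definitions above,
// e.g., low = (0, 0, 800, 900).
public class MembershipFunction {
    private final double a, b, c, d;

    public MembershipFunction(double a, double b, double c, double d) {
        this.a = a; this.b = b; this.c = c; this.d = d;
    }

    // Confidence (0..1) that a numerical value belongs to this linguistic value.
    public double confidence(double x) {
        if (x < a || x > d) return 0.0;
        if (x < b) return (x - a) / (b - a);        // rising ramp
        if (x <= c) return 1.0;                     // full membership
        return (d - x) / (d - c);                   // falling ramp
    }

    public static void main(String[] args) {
        MembershipFunction low = new MembershipFunction(0, 0, 800, 900);
        MembershipFunction moderate = new MembershipFunction(850, 1500, 4000, 4200);
        // 870 KB of free memory is mostly "low" but already a little "moderate".
        System.out.printf("low: %.2f, moderate: %.2f%n",
                          low.confidence(870), moderate.confidence(870));
    }
}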

3.2 Application Partition Selection

The offloading inference engine selects the best application partitioning from the group of candidate partition plans generated by the offloading platform. First, the offloading inference engine considers the target memory utilization on the mobile device to rule out partition plans that do not meet this minimum requirement. Then the offloading inference engine selects the best partitioning from the remaining candidate partition plans by using a composite partition cost metric. In the case of memory offloading for overcoming the memory constraints of mobile devices, the user can have multiple offloading requirements, such as minimizing the wireless bandwidth overhead, minimizing the average response time stretch, and minimizing the total execution time. The wireless bandwidth cost comes from two factors: (1) migration of program objects during offloading, and (2) remote function calls and remote data accesses. The average response time stretch is determined by the total number of remote invocations. The total execution time stretch caused by offloading includes all migration delays and remote interaction delays. The offloading inference engine addresses the problem by comprehensively considering the different inter-class dependencies and interactions during application execution. For each neighbor node $v_k$ of $v_i$, we use $b_{i,k}$ to denote the total amount of data traffic transferred between $v_i$ and $v_k$, $f_{i,k}$ to denote the total number of interactions, and $MS_k$ to represent the memory size of $v_k$. Thus, the offloading inference engine uses a composite cost metric for the application execution graph edge between $v_i$ and $v_k$: $C_k = \langle b_{i,k}, f_{i,k}, MS_k \rangle$. In [8], we have shown that such a composite cost metric is most effective for meeting different offloading requirements. The partitioning cost of a candidate partition plan is the aggregated cost of all edges whose end-points belong to different partitions. The offloading inference engine then selects the partition plan that minimizes the partitioning cost. Suppose $b^{\max}$, $f^{\max}$, and $MS^{\max}$ represent the upper bounds of all possible $b_{i,k}$, $f_{i,k}$, and $MS_k$. We define the comparison of two cost metrics $C_k$ and $C_l$ as follows: $C_k \ge C_l$ if and only if

w_1 \cdot \frac{b_{i,k} - b_{i,l}}{b^{\max}} + w_2 \cdot \frac{f_{i,k} - f_{i,l}}{f^{\max}} + w_3 \cdot \frac{MS_l - MS_k}{MS^{\max}} > 0    (1)

\sum_{i=1}^{3} w_i = 1, \qquad 0 \le w_i \le 1, \quad 1 \le i \le 3    (2)

The above comparison implies that the offloading inference engine always keeps on the mobile device the Java classes that are more active (i.e., that have larger bandwidth requirements and interaction frequencies) and have smaller memory sizes. Conversely, the offloading inference engine tends to offload to the surrogate the Java classes that are more isolated (i.e., that have smaller bandwidth requirements and interaction frequencies) and have larger memory sizes. To allow customization, we use $w_i$ ($1 \le i \le 3$) to represent the importance of the $i$-th factor (i.e., the wireless bandwidth, the remote interaction delay, and the memory size of the Java class) in making the offloading decision. These weights can be adaptively configured according to application requirements and user preferences. For example, if the user cares more about the interaction delay and has plentiful wireless bandwidth, we can set $w_1$ to a lower value and $w_2$ to a higher value.
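As a sketch of how the comparison in Equation (1) could be coded, assuming the weights are normalized per Equation (2) and the upper bounds are precomputed from the execution graph (all names are illustrative):

// Sketch of the composite cost comparison from Equation (1).
public class CompositeCostComparator {
    private final double w1, w2, w3;        // importance of each factor
    private final double bMax, fMax, msMax; // upper bounds for normalization

    public CompositeCostComparator(double w1, double w2, double w3,
                                   double bMax, double fMax, double msMax) {
        if (Math.abs(w1 + w2 + w3 - 1.0) > 1e-9)
            throw new IllegalArgumentException("weights must sum to 1");
        this.w1 = w1; this.w2 = w2; this.w3 = w3;
        this.bMax = bMax; this.fMax = fMax; this.msMax = msMax;
    }

    // Returns true if Ck >= Cl, i.e., class k is the better candidate to keep
    // on the mobile device (more active, smaller memory footprint).
    public boolean atLeast(double bK, double fK, double msK,
                           double bL, double fL, double msL) {
        return w1 * (bK - bL) / bMax
             + w2 * (fK - fL) / fMax
             + w3 * (msL - msK) / msMax > 0;
    }
}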

3.3 Splitting Large Classes

As mentioned earlier, we selected classes as the execution graph nodes. In practice, however, we found that the memory size of some classes is too large to treat the class as a single node. For example, the String class in the JavaNote application occupied 5.9 MB during execution. Offloading such a large class as a whole would cause large migration and remote invocation overhead, but if we do not offload it, we cannot meet the memory constraint of the mobile device. Hence, if the memory size of a Java class exceeds a certain threshold, the objects belonging to that class are distributed into several parts, and a new node is created in the execution graph to represent each part. Thus, each large class node is split into several nodes with smaller memory sizes, which enables more precise control of memory offloading.

4 Trace-Driven Simulation Experiments

We conduct trace-driven simulation experiments on an IBM ThinkPad T22 running Red Hat Linux 7.1. We first collect execution traces using the three benchmark applications described in Table 1.

Program  | Description                | Operation                                      | Lifetime | Peak memory req.
---------|----------------------------|------------------------------------------------|----------|-----------------
DIA      | Java image editor          | Open a 180 KB picture image and drag it around | 174 s    | 8,949 KB
Biomer   | Graphical molecular editor | Create three complex molecules                 | 261 s    | 10,668 KB
JavaNote | Java text editor           | Open a 600 KB text file                        | 268 s    | 7,972 KB

Table 1. Descriptions of the application suite used in our experiments.

The execution trace files record method invocations, data field accesses, and object creations/deletions obtained by querying the instrumented JVM. We use ChaiVM, HP's JVM for embedded and real-time systems, for our experiments. The wireless network traces are collected on the same laptop with an IEEE 802.11 WaveLAN network card, using the Ping system utility and available-bandwidth measurement tools [10]. The surrogate is represented by a desktop that sits in an office room. The mobile roaming scenario we selected for evaluation was conducted in the computer science department building of the University of Illinois at Urbana-Champaign. The wireless network trace was obtained by having a person with the mobile device start in the office room, enter an elevator and ride it to the basement, and then exit the elevator and walk to a stairway. The measured wireless bandwidth stays around 4.8 Mbps until the person enters the elevator, where it drops to about 2.4 Mbps. It then rises to about 3.6 Mbps when the person walks through the basement. Because the size of the parameters used for function interactions and data accesses is quite small (i.e., < 64 bytes in all execution traces), we only measure the round-trip time (RTT) for small data packets, which is about 2.4 ms.

The simulator is driven by the execution and network traces described above. The simulator emulates a remote interaction by stretching the total execution time. The remote data access delay is the time between sending a request to the remote site and receiving the requested data, which is approximately equal to the RTT. The remote function call delay is the time needed to redirect a function request to the remote site, which is close to half of the RTT. The migration delay is simulated by increasing the execution time by (Σ memory of the classes to be migrated) / (current available bandwidth). We set the Java heap size to 8 MB for DIA and Biomer, and to 7 MB for JavaNote, according to their peak memory requirements shown in Table 1.

We consider three performance metrics: (1) total offloading delay, which consists of the migration delay, the remote data access delay, and the remote function call delay; these delays extend the total execution time of an application; (2) average interaction stretch, which represents the average interaction delay stretch caused by remote data accesses and remote function calls; and (3) total bandwidth requirement, which is measured as the sum of the total size of the migrated objects and the total size of the parameters passed during remote interactions. Finally, we compare the inference time of different approaches to offloading triggering (i.e., hard-coded simple rules versus fuzzy control).

For comparison, we implemented the common memory management heuristic least recently used (LRU). The LRU algorithm adopts a simple offloading rule, which triggers offloading when the available memory is lower than 5% of total memory, and sets the new target memory utilization to 80% of total memory. During application partitioning, the LRU algorithm offloads the classes that are least recently used according to the AccessFreq field of each class. We then enhance the LRU algorithm by splitting large class nodes into smaller ones with memory sizes below 500 KB, denoted by SplitClass. Next, we enhance the SplitClass algorithm by replacing its simple threshold-based rules with the fuzzy-control-based offloading triggering, denoted by Fuzzy Trigger.
Finally, we run the complete offloading inference engine algorithm with partition selection using the composite metric defined by Equation (1), where the weights $w_i$ ($1 \le i \le 3$) are set equal. We use Our Approach to denote the complete offloading inference engine algorithm. In [8], we reported performance comparisons between the composite metric and other simple metrics (e.g., access frequency or bandwidth requirement alone), which showed that the composite metric performs better than the simple metrics.
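To make the delay model concrete, here is a small sketch of how a trace-driven simulator could stretch execution time for remote interactions and migrations, following the formulas above. The class name, method names, and parameterization are our assumptions, not the actual simulator's code.

// Illustrative sketch of the simulator's delay model.
public class OffloadingDelayModel {
    private final double rttSeconds; // measured RTT for small packets, e.g., 0.0024

    public OffloadingDelayModel(double rttSeconds) {
        this.rttSeconds = rttSeconds;
    }

    // Remote data access: request goes out, data comes back -> about one RTT.
    public double remoteDataAccessDelay() {
        return rttSeconds;
    }

    // Remote function call: one-way redirection -> about half the RTT.
    public double remoteFunctionCallDelay() {
        return rttSeconds / 2.0;
    }

    // Migration delay: total bytes of the migrated classes divided by the
    // currently available wireless bandwidth (bytes per second).
    public double migrationDelay(long[] classMemorySizes, double bandwidthBps) {
        long total = 0;
        for (long size : classMemorySizes) total += size;
        return total / bandwidthBps;
    }
}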

Figure 5. Comparison of total offloading delay by four different decision-making approaches.

Figure 6. Comparison of average interaction stretch by four different decision-making approaches.

Figure 7. Comparison of total bandwidth requirement by four different decision-making approaches.

Figure 8. Execution time of two applications under the offloading prototype.
We first compare the total offloading delay among the above four decision-making approaches, illustrated in Figure 5. For comparison, we normalize the delay values of SplitClass, Fuzzy Trigger, and Our Approach to the value of the LRU algorithm. The results show that splitting large classes can reduce the total offloading delay by as much as 60% compared to the simple LRU algorithm. Compared to the SplitClass algorithm, which uses simple triggering rules, the Fuzzy Trigger algorithm can further reduce the total offloading delay by as much as 44%. Finally, the offloading inference engine consistently achieves the lowest offloading delay for all three applications. We conduct a similar comparative study for the other two performance metrics, average interaction stretch and total bandwidth requirement, illustrated in Figure 6 and Figure 7, respectively. The results show that splitting large classes can reduce the average interaction stretch by as much as 90% and decrease the bandwidth requirement by as much as 54%, compared to the LRU algorithm. Compared to SplitClass, which uses the simple triggering rule, the Fuzzy Trigger algorithm can reduce the average interaction stretch by as much as 100% and decrease the bandwidth requirement by as much as 47%. Compared to Fuzzy Trigger, Our Approach can further reduce the average interaction stretch by as much as 62% and decrease the bandwidth requirement by as much as 52%. With regard to the inference overhead, the fuzzy-control-based offloading inference requires twice the inference time used by the simple threshold-based triggering, which is about 0.06 ms.

5 Prototype Experiments

We have developed a prototype of the distributed runtime offloading system. The prototype provides a simple working environment that encapsulates the functionality of the trace-driven simulator. Like the simulator, it uses a modified version of HP's ChaiVM to monitor applications and resources, and it uses that information to partition applications. In the prototype, ChaiVM has been extended to support the transparent RPC introduced in Section 2.5. A PDA-style mobile device is emulated by an old 266 MHz HP laptop with an 11 Mbps IEEE 802.11b PCMCIA card operating on a shared network. A 733 MHz HP Kayak PC workstation with 128 MB of memory represents the surrogate server. The surrogate is attached at 10 Mbps to the corporate wireless network using a network switch that leads to the access point. Both machines run Red Hat Linux 6.2 with the Linux 2.4.16 kernel. We use JavaNote and Biomer as our case-study applications and perform the same operations as in the simulation study, which are described in Table 1.

To investigate the monitoring overhead, we evaluate the JavaNote application running on our prototype in a single-site mode that only performs monitoring. With an 8 MB Java heap, the application can execute without running out of memory. For comparison, we re-ran this same configuration with monitoring disabled. Our un-optimized implementation using hashed arrays shows that the monitoring overhead is around 7% of the total execution time. We believe that the monitoring overhead can be reduced by further improving the implementation of the system.

Next, we evaluate the additional memory footprint required by the offloading platform. Because the virtual machine already maintains much of the state we need, this memory footprint growth is mainly the result of the application execution graph. In several runs of the JavaNote application, we observe 138 class creation/deletion events and 1.1 million interaction events that are almost evenly divided between function calls and data accesses. Information is recorded only for interactions between two different classes, and it includes the total number of interactions and the total number of transferred bytes. In our un-optimized implementation, each class node consumes 13 bytes, while each execution graph edge consumes 12 bytes. This is equivalent to a memory footprint of 15 KB for the JavaNote application. We believe that this amount of storage is acceptable for our system and could be optimized further.

To evaluate the run-time performance of memory-offloaded applications, we compare the execution times of JavaNote and Biomer on the offloading prototype against normal execution on an unmodified JVM. First, we use the prototype to determine the baseline memory that allows each application to run to completion successfully. We then select a smaller memory size to emulate the memory constraint. For JavaNote, the baseline memory is 7.9 MB, and a constrained memory of 6.1 MB is selected. For Biomer, the baseline memory is 10.7 MB, and a constrained memory of 8 MB is selected. For simplicity, the prototype triggers offloading when the available memory is lower than 5% of total memory, and it tries to free 20% of total memory by offloading. Figure 8 shows the results for this experiment. Without offloading, the constrained memory causes the application to crash when the heap becomes full; the bar shows the time until the execution exited.
These bars are labelled with small explosion symbols (X) to indicate that the applications fail before completion. When run with enough heap, the applications take longer to run because they do more processing, so they provide a better baseline for comparison with an offloaded run. When run with offloading under a constrained heap, the system runs between 1.5% and 5.7% slower than with the baseline heap size and no offloading. This increased execution cost reflects the performance gained by borrowing memory from the surrogate minus the cost of monitoring, performing remote accesses, and partitioning. We believe that this low performance cost is well worth the benefit of allowing applications to execute normally rather than crash because of insufficient memory.


6 Related Work

Besides the related work mentioned in the Introduction, our work is also related to the Spectra project [4], which proposed a remote execution system for mobile devices used in pervasive computing. Spectra can generate a distributed execution plan that balances the competing goals of performance, energy conservation, and application quality. The Puppeteer project [12] supports adaptations without modifying applications. The MONET research group [7] proposed a dynamic service composition and distribution framework for delivering component-based applications in pervasive computing. To support application-specific adaptation, application developers are provided with meta-level programming tools for deploying their applications in pervasive computing [9, 18, 3]. However, the above work assumes that the application is already written in a component-based fashion and has exported component interfaces to the system. Other closely related work includes research on application partitioning. The Coign project [11] proposed a system to statically partition binary applications built from COM components. Unlike Coign, our approach performs dynamic runtime partitioning without any off-line profiling. Furthermore, we do not assume component-based applications written with COM components. However, our approach is not without limitations. First, because applications are not designed to be aware of their distributed execution, error modes (e.g., loss of communication with the surrogate) can affect their execution. Our assumption is that wireless networking technologies such as IEEE 802.11 or, in the future, Ultra Wide Band will provide fairly good communication coverage in local areas. Second, not all applications are readily partitionable, either because they are too tightly coupled (e.g., it is hard to split a video codec) or badly written (e.g., one large class file is used rather than an object-oriented design). However, we believe that a large category of applications does not fall into these corner cases and provides good candidates for partitioned offloading.

7 Conclusion and Future Directions

We have presented an adaptive offloading system for pervasive computing, which includes offloading platform support and an offloading inference engine. The adaptive offloading system enables pervasive application delivery without degrading application fidelity or incurring expensive application rewriting. The offloading inference engine makes offloading decisions without assuming any prior knowledge about the application's execution or the system/network conditions in pervasive computing. First, we addressed two decision-making problems in the runtime offloading system, namely adaptive offloading triggering and efficient partition selection. Second, we used the Fuzzy Control model to achieve adaptability, configurability, and stability when making the fine-grained offloading triggering decision. Third, we proposed a composite metric for selecting an efficient partition that can simultaneously satisfy multiple user requirements. Our extensive trace-driven evaluations show that with the offloading inference engine, runtime offloading can effectively relieve memory constraints on mobile devices with much lower overhead than other common approaches. Our prototype experiments show that the execution and memory overheads introduced by the adaptive offloading system are acceptable. Future research directions for the adaptive offloading system include (1) applying the offloading approach to relieving other resource constraints on the mobile device, such as constraints related to CPU speed and battery lifetime; and (2) supporting the use of multiple surrogates for offloading.

References

[1] The Jini Network Technology. http://wwws.sun.com/software/jini/.

[2] Universal Plug and Play Forum. http://www.upnp.org/.

[3] V. Adve, V. Vi Lam, and B. Ensink. Language and Compiler Support for Adaptive Distributed Applications. Proc. of ACM SIGPLAN Workshop on Optimization of Middleware and Distributed Systems (OM 2001), Snowbird, Utah, June 2001.


[4] J. Flinn, S. Park, and M. Satyanarayanan. Balancing Performance, Energy, and Quality in Pervasive Computing. Proc. of the 22nd IEEE International Conference on Distributed Computing Systems (ICDCS 2002), Vienna, Austria, July 2002.

[5] A. Fox, S. D. Gribble, Y. Chawathe, and E. A. Brewer. Adapting to Network and Client Variation Using Active Proxies: Lessons and Perspectives. IEEE Personal Communications, Special Issue on Adapting to Network and Client Variability, August 1998.

[6] M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman Company, 1979.

[7] X. Gu and K. Nahrstedt. Dynamic QoS-Aware Multimedia Service Configuration in Ubiquitous Computing Environments. Proc. of the 22nd IEEE International Conference on Distributed Computing Systems (ICDCS 2002), Vienna, Austria, July 2002.

[8] X. Gu, K. Nahrstedt, A. Messer, I. Greenberg, and D. Milojicic. Adaptive Offloading Inference for Delivering Applications in Pervasive Computing Environment. Proc. of the IEEE International Conference on Pervasive Computing and Communications (PerCom 2003), Dallas-Fort Worth, Texas, March 2003.

[9] X. Gu, K. Nahrstedt, W. Yuan, D. Wichadakul, and D. Xu. An XML-based QoS Enabling Language for the Web. Journal of Visual Language and Computing (JVLC), Special Issue on Multimedia Languages for the Web, 13(1), pp. 61-95, 2002.

[10] N. Hu and P. Steenkiste. Evaluation and Characterization of Available Bandwidth Probing Techniques. IEEE JSAC Special Issue on Internet and WWW Measurement, Mapping, and Modeling (to appear), 2003.

[11] G. C. Hunt and M. L. Scott. The Coign Automatic Distributed Partitioning System. Proc. of the 3rd USENIX Symposium on Operating System Design and Implementation (OSDI '99), February 1999.

[12] E. de Lara, D. S. Wallach, and W. Zwaenepoel. Puppeteer: Component-based Adaptation for Mobile Computing. Proc. of the 3rd USENIX Symposium on Internet Technologies and Systems, March 2001.

[13] B. Li and K. Nahrstedt. A Control-based Middleware Framework for Quality of Service Adaptations. IEEE Journal on Selected Areas in Communications, Special Issue on Service Enabling Platforms, 17(9), September 1999.

[14] A. Messer, I. Greenberg, P. Bernadat, D. Milojicic, D. Chen, T. J. Giuli, and X. Gu. Towards a Distributed Platform for Resource-Constrained Devices. Proc. of the 22nd IEEE International Conference on Distributed Computing Systems (ICDCS 2002), Vienna, Austria, July 2002.

[15] B. D. Noble. System Support for Mobile, Adaptive Applications. IEEE Personal Communications, 7(1), February 2000.

[16] B. D. Noble, M. Satyanarayanan, D. Narayanan, J. E. Tilton, J. Flinn, and K. R. Walker. Agile Application-Aware Adaptation for Mobility. Proc. of the 16th ACM Symposium on Operating Systems Principles (SOSP), Saint-Malo, France, October 1997.

[17] M. Stoer and F. Wagner. A Simple Min-Cut Algorithm. Journal of the ACM, 44(4), pp. 585-591, July 1997.

[18] D. Wichadakul, X. Gu, and K. Nahrstedt. A Programming Framework for Quality-Aware Ubiquitous Multimedia Applications. Proc. of ACM Multimedia 2002, Juan-les-Pins, France, December 2002.
