Runtime Performance Modeling and Measurement of Adaptive Distributed Object Applications

John Zinky, Joseph Loyall, Richard Shapiro
BBN Technologies, 10 Moulton Street, Cambridge, MA USA
{jzinky, jloyall, rshapiro}@bbn.com

Abstract. Distributed applications that can adapt at runtime to changing quality of service (QoS) require a model of the expected QoS and of the possible application adaptations. QoS models in turn require runtime measurements, both in-band and out-of-band, from across the application's components. As the comprehensiveness of the model increases, so does the quality of adaptation. But eventually the increasing comprehensiveness becomes too complex for the QoS Designer to deal with effectively. In addition, performance models of any complexity are expensive to create and maintain at runtime. The QoS Designer therefore needs a set of distributed-QoS tools to assist in the construction of models, the handling of quality vs. complexity tradeoffs, and the efficient maintenance of models at runtime. This paper describes the Quality Objects (QuO) middleware support for developing a performance model; collecting and organizing runtime measurements of a system, both in-band and out-of-band; and maintaining the model at runtime in an efficient way.

1 Introduction

Many distributed object application domains, such as military, financial, and health care, have stringent quality of service (QoS) requirements. Hosting these applications on wide-area networks, with their unpredictable and changing resource availability, and in embedded environments, with their constrained resources, requires applications to be aware of the resources in their environment and able to adapt in multiple ways to changes in resource availability. Based on these two requirements, an adaptation strategy must be developed that chooses an application behavior for any given environmental situation. The design of this strategy is the job of the QoS Designer, a role which is distinct from, and complementary to, that of the Application Designer who develops the functional algorithmic code of the application.

A performance model predicts how the application will behave given the usage patterns, underlying resources, QoS requirements, and adaptation strategy. The appropriate level of detail in the model depends on the available adaptations and resource parameters, and on the price the application is willing to pay for adaptivity. A more detailed model provides finer-grained control over the application's QoS behavior, but also requires more effort to construct at design time and to keep current at runtime. The examples will illustrate the tradeoffs between complex fine-grained modeling and simpler but coarser models.

Fig. 1. In-band and out-of-band measures (delegates with probes on the client and servant feed measured QoS to the contract; the Resource Status Service collects, correlates, translates, and integrates resource status probes to infer the expected QoS)

This paper describes the support that QuO middleware provides for collecting runtime measurements of the system, for developing a performance model of the application and its environment, and for maintaining the performance model at runtime. Maintaining this model and its data is an important part of the QoS Designer's work in creating an adaptive strategy. One of the major recent advances in the QuO middleware is the development of the Resource Status Service (RSS), a distributed service for measuring, aggregating, and disseminating resource information in a distributed system. In this paper, we describe the RSS and its use in creating and maintaining performance models of runtime systems.

Figure 1 shows how the QuO runtime supports both in-band measurements and out-of-band expectations of QoS parameters. In-band measurements are inserted directly into the function call tree; this provides actual end-to-end QoS measurements of remote function calls using specific resources. Out-of-band measurements monitor the system resources and try to infer the expected QoS. Integrating these two kinds of measurements is at the heart of any adaptation strategy.

The paper is organized as follows. Section 2 provides a brief overview of the QuO middleware; readers already familiar with QuO may skip this section. Section 3 introduces an example distributed object application, an image server system developed using QuO, which is part of our open-source software toolkit and will serve as a running example throughout the rest of the paper. Section 4 describes QuO's support for gathering in-band measurements. Section 5 describes QuO's support for out-of-band measurements, including the RSS. Section 6 describes QuO's support for creating efficient runtime models. Finally, Section 7 describes how to calibrate the performance models by combining in-band performance measurements and resource capacity measurements. Each section includes examples based on the image server application.

2 The QuO Framework for Adaptive Applications

The Quality Objects (QuO) framework is an extension to traditional distributed object middleware, such as CORBA and RMI, which manage the functional interactions between objects. In ideal circumstances, CORBA and RMI can give the illusion that remote objects are local. Where resources are limited and QoS requirements are stringent, this illusion is impossible to maintain. In the traditional approach, the algorithms for managing adaptation to constrained resources are entangled with the application's functional algorithms, resulting in overly complicated code that is difficult to maintain and extend. QuO provides support for programming QoS measurement, control, and adaptation in the middleware layer, separating the system-specific and adaptive code from the functional code of the client and object implementations. In this way, QuO supports reuse of adaptive code and eases the application programmer's burden of programming system issues. As illustrated in Figure 2, a QuO application extends the traditional distributed object computing (DOC) model with the following components:

• QuO contracts summarize an application's current operating mode, expected resource utilization, rules for transition among operating states, and means for notifying applications of changes in QoS or in system status. Contract specifications are written in a high-level specification language called CDL and preprocessed into Java or C++ by the QuO code generator.

• System condition objects (sysconds) provide interfaces to system resources, mechanisms, and managers. They provide high-level reusable interfaces to measure, manipulate, and control lower-level real-time control and measurement capabilities. They export values that describe facets of system status, such as the current memory utilization or the priority of a running thread, and provide interfaces to control system characteristics, such as modifying the processor clock rate or scheduling priorities.

• QoS-aware delegates are adaptive components that modify the system's runtime behavior along the paths of method calls and returns. QuO delegates are implemented as wrappers on method stubs or skeletons, thereby inserting behavior between the client and server. Delegates are written in a high-level aspect language called ASL and converted into Java or C++ by the QuO code generator.

• Qoskets pull together contracts, delegates, and system conditions into reusable components that are independent of any specific functional interfaces.

Fig. 2. The execution model of an adaptive application using QuO (in args flow from the client through its delegate, contract, and stubs to the ORB; out args and the return value flow back through the server-side object adapter, skeleton, contract, and delegate; sysconds on both sides connect the contracts to a mechanism/property manager, and the ORBs communicate via IIOP)

Fig. 3. The image server can produce images that are big or small, processed or unprocessed, with different resource usage characteristics

Combining a functional interface with a qosket makes a new object that implements the functional interface and manages some QoS aspect.

In summary, QuO includes high-level specification languages, a code generator, a runtime kernel, libraries of reusable qoskets and system condition objects, and QoS property managers. These components are described in detail in other papers [8, 14, 15, 17], as is the application of QuO to properties such as security [18] and dependability [1]. In this paper we concentrate on QuO's support for developing a performance model of an application and its environment, and for maintaining the performance model at runtime.
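To make the execution model concrete, the following minimal Java sketch shows the delegate pattern described above: a wrapper that implements the functional interface, signals the contract before and after each remote call, and otherwise forwards to the stub. The interface and method names are illustrative assumptions, not the generated QuO API; real delegates are produced by the code generator from ASL.

// Illustrative sketch of a client-side delegate; names are hypothetical.
interface ImageServer { byte[] read(String name); }

interface Contract { void evalPreMethod(); void evalPostMethod(); }

class ImageServerDelegate implements ImageServer {
    private final ImageServer stub;   // CORBA/RMI stub for the remote object
    private final Contract contract;  // evaluated before and after each call

    ImageServerDelegate(ImageServer stub, Contract contract) {
        this.stub = stub;
        this.contract = contract;
    }

    public byte[] read(String name) {
        contract.evalPreMethod();      // probes observe the call starting
        try {
            return stub.read(name);    // the actual remote invocation
        } finally {
            contract.evalPostMethod(); // probes observe the call completing
        }
    }
}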

3 An Example: Data Dissemination in a Wide-Area Network

As a running example in this paper, we will use an image server application that we have developed using the QuO adaptive middleware and which serves as the basis for many of our example and experimental applications. It consists of a remote data server maintaining a database of images and a client requesting images from the remote server. The data server can produce versions of the images of different sizes and different quality, as illustrated in Figure 3. The image server exposes interfaces enabling the client to request pictures that are "big" or "small", "processed" or "unprocessed." Big pictures use more CPU resources to display and more bandwidth to transmit than small pictures do. Processed pictures use more CPU resources on the server side to improve the image quality.

The challenge for the QoS designer is to program into the application an adaptive tradeoff between timeliness and quality. The user wants the best picture, but is not willing to wait very long. Better pictures (bigger and processed) take longer to deliver because they take more resources. The application needs to measure the timeliness of image delivery, and when round-trip image delivery and processing slows, the application needs to gather enough information to determine whether the source of the slowdown is network or CPU degradation and adapt accordingly.

As a basic example of adaptation, we use a qosket, called Bottleneck, which partitions the operating environment along the dimensions of bandwidth and CPU resources. Bottleneck's contract has four regions, with high and low server CPU along one dimension and high and low network bandwidth along the other. The Bottleneck qosket also includes system condition objects for determining the status of the runtime environment, used by the contract to determine the high and low regions. The qosket encapsulates all the behavior needed to measure the relevant system resources and to determine whether the constrained resource is the network, the CPU, or both. Note that the qosket is completely independent of the application and is therefore reusable.

When the QoS designer combines the Bottleneck qosket with the image server application, he specifies a binding of the contract regions to method calls on the remote object using QuO's Adaptation Specification Language, ASL [12]. The QuO code generator creates a delegate that calls the appropriate server methods based on the Bottleneck contract region. While the functional application continues to call the original remote read method, the delegate transparently substitutes calls to other methods, depending on the state of the resources. When there are no resource bottlenecks, readBigProcessed is used because it gives the best picture. When both CPU and bandwidth resources are scarce, readSmallUnprocessed is used because it reduces the time to process and transmit the picture. Likewise, readSmallProcessed is used when the network is the only bottleneck and readBigUnprocessed is used when the CPU is the only bottleneck. This is a simple strategy for trading off timeliness and quality with respect to the system constraints.

The image server application is typical of several of our experimental and transitioned applications, which differ in the images they provide, the system in which they are hosted, and the adaptation choices they offer to maintain QoS. For example, in the avionics example described in [9], the image server delivers virtual target folders, which contain map images with notations and other information. The client and server are embedded in separate aircraft and communicate through a wireless link. Because of the extremely constrained bandwidth of the wireless link and the large size of the data images, the server offers the choice to break each image into smaller tiles, which are delivered separately and reassembled by the client, and to choose the quality of each tile. Higher-quality tiles use more bandwidth and CPU. In the dependable server application described in [10], the image server provides two interfaces, one that authenticates requests and services them with a secure server and another that does not authenticate requests. In this application, the client has the option to trade off security for speed, since the authenticating server requires extra data (and thus uses more bandwidth) for the authentication and more time and CPU to validate the identity of the requester.

The QoS designer can devise many schemes to resolve the tradeoff between picture quality and call latency. The cleverness and appropriateness of the specific adaptation scheme is irrelevant to this discussion. What is important is the kinds of adaptation schemes that are possible and how well QuO mechanisms support them. In the following sections, we will show how additional system information made available by measurement and modeling can be used to help refine the adaptation.
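The region-to-method binding described above can be pictured as a simple dispatch. The sketch below is illustrative Java only; the actual delegate is generated from ASL, and the region names, interface, and currentRegion() hook are assumptions made for this example.

// Illustrative dispatch for the Bottleneck contract's four regions.
class BottleneckDelegate {
    enum Region { NORMAL, LOW_BANDWIDTH, LOW_CPU, LOW_BOTH }

    interface ImageServer {
        byte[] readBigProcessed(String name);
        byte[] readSmallProcessed(String name);
        byte[] readBigUnprocessed(String name);
        byte[] readSmallUnprocessed(String name);
    }

    private final ImageServer server;
    BottleneckDelegate(ImageServer server) { this.server = server; }

    Region currentRegion() { /* evaluated by the Bottleneck contract */ return Region.NORMAL; }

    // The client keeps calling read(); the delegate substitutes the method
    // that best trades quality for timeliness in the current region.
    byte[] read(String name) {
        switch (currentRegion()) {
            case NORMAL:        return server.readBigProcessed(name);     // no bottleneck
            case LOW_BANDWIDTH: return server.readSmallProcessed(name);   // network constrained
            case LOW_CPU:       return server.readBigUnprocessed(name);   // server CPU constrained
            default:            return server.readSmallUnprocessed(name); // both constrained
        }
    }
}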

4 In-band Instrumentation

The basic idea of in-band instrumentation is to insert measurement points along the call path from the client to the server and back. This instrumentation gathers measurements along the method call and return, as illustrated in Figure 4, measuring such things as the number of calls, the round-trip latency, the time spent in the network, or the effective capacity of some underlying resource.

The problem with adding instrumentation in traditional applications is that the code has to be placed at many places along the path and the results gathered together for processing and dissemination. Adding instrumentation code breaks the normal boundaries and interfaces defined by the functional decomposition of the distributed system. Special support is needed to add this code and use its results without creating a tangled mess.

CORBA provides some support for inserting instrumentation into the data path between the client and server. CORBA interceptors [11] allow requests and replies to be intercepted at several points during the transmission of a remote call. CORBA Pluggable Protocols [7] allow new transport protocols to be used instead of the default IIOP protocol over TCP. The new transport protocols can add instrumentation to the messages to measure QoS properties and can provide control over network resources using RSVP [19] or Diffserv [5]. Other distributed object middleware, such as Java RMI, does not have CORBA's open implementation, so instrumentation must be added above or below the equivalent of the ORB. For instrumentation below the ORB, QuO uses a Gateway Shell [14], which can intercept method calls and manage their QoS as request and reply messages are transmitted over the network resources.

QuO supports above-the-ORB instrumentation for both CORBA and RMI. QuO's ASL language and code generator use aspect-oriented programming (AOP) techniques to weave code inside methods for both the client-side and server-side delegates. Instrumentation code often needs to be added to all methods in an interface, e.g., adding a timer call before and after each remote method call. Native languages, such as Java and C++, do not readily support code that cross-cuts many methods, although Java's class reflection [16] could be used to query an object for a list of all its methods and construct an instrumentation delegate at runtime.

Fig. 4. QuO's in-band instrumentation gathers information along the remote method call and return

QuO uses an approach to supporting instrumentation in many methods similar to that of other aspect-oriented languages, such as AspectJ. However, unlike AspectJ, QuO supports both Java and C++, and supports weaving code across distributed systems.

4.1 Example: Client-Side Latency Measurement

Suppose that the QoS designer needs to keep the latency of a call in the image server application below a threshold, so that delivery of images is smooth. If the latency is too high, the read method could downshift to a remote method that uses fewer resources, such as from readBigProcessed to readBigUnprocessed to readSmallUnprocessed. The QoS designer can create a contract that implements the downshift as a state machine, but needs access to a measure of the latency to trigger the downshift.

QuO's contract evaluation mechanism includes support for measuring method latency. Usually, a QuO contract is evaluated before and after a remote method is called. QuO provides a Probe Syscond class, illustrated in Figure 2, which catches the contract evaluation signal and measures the method latency. Different types of Probe Sysconds can process the raw latency into statistical metrics, such as the average latency over the last ten calls.

When the contract downshifts to a new region, the latency is expected to go down, but the averaging mechanism still remembers old values. To avoid controlling based on old values, the new behavior must be locked in until the statistics converge to a meaningful value. QuO contracts support locking the contract into the current region until the statistics have stabilized.
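The following Java sketch illustrates the idea behind such a probe: a sliding-window average of call latencies plus a convergence check that a contract could use to stay locked in a region after a transition. It is a hedged illustration of the mechanism, not QuO's actual Probe Syscond API.

// Sketch of a latency probe: average over the last N calls (hypothetical API).
import java.util.ArrayDeque;
import java.util.Deque;

class LatencyProbe {
    private final Deque<Long> window = new ArrayDeque<>();
    private final int size;
    private long startNanos;
    private int samplesSinceTransition;  // used to lock the region until stats converge

    LatencyProbe(int size) { this.size = size; }

    void preMethod()  { startNanos = System.nanoTime(); }

    void postMethod() {
        window.addLast(System.nanoTime() - startNanos);
        if (window.size() > size) window.removeFirst();
        samplesSinceTransition++;
    }

    double averageMillis() {
        return window.stream().mapToLong(Long::longValue).average().orElse(0) / 1e6;
    }

    // After a region transition, old samples are misleading; the contract
    // stays locked in the new region until the window has refilled.
    void onRegionTransition() { samplesSinceTransition = 0; }
    boolean statisticsStable() { return samplesSinceTransition >= size; }
}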

4.2 Example: Correlated Server and Client Latency

Suppose the QoS designer needs to determine which resource is the bottleneck. In the last example, the downshift behavior arbitrarily chose to reduce the server load (adapting to readBigUnprocessed). But the cause of the latency problem could just as well have been the network. The client side can measure end-to-end QoS characteristics, such as the overall latency or the sizes of requests and replies, but cannot determine which sub-components are contributing to the latency. To determine the relative contribution of sub-components, timers must be set and read as the call enters and leaves each component (identified by the arrows in Figure 4). The measurements can then be compared to differentiate between components.

The approach is to pass a trace record from the client to the server and back, so that the client can determine the amount of time spent using different resources, such as network bandwidth and server-side CPU. The ASL for adding a trace record is more complicated than adding simple behavior calls, because it needs to add an additional parameter to the interface to carry the trace record between the client and server. And if the interface is being changed at the client, the server must also change its interface to support the new trace record parameter. The consequence is that the remote object now has two interfaces, the normal interface and one with the instrumentation parameter. QuO includes a reusable qosket that manages adding a trace record and processing the results to get the relative network and server latency.
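A minimal Java sketch of the idea follows; the record layout, the doubled interface, and the helper names are assumptions for illustration, not QuO's actual qosket. Note that each difference is taken between timestamps from the same clock, so the client and server clocks need not be synchronized.

// Hypothetical trace record carried as an extra parameter on the instrumented interface.
class TraceRecord implements java.io.Serializable {
    long clientSend, serverReceive, serverSend, clientReceive;  // timestamps in msec
}

interface InstrumentedImageServer {
    byte[] readBigProcessed(String name);                     // normal interface
    byte[] readBigProcessed(String name, TraceRecord trace);  // instrumented interface
}

// Client-side processing after the call returns:
//   serverTime  = time spent in the servant (above the ORB), server clock only
//   networkTime = round trip minus server time (marshaling + transmission), client clock only
class TraceMath {
    static long serverTime(TraceRecord t)  { return t.serverSend - t.serverReceive; }
    static long networkTime(TraceRecord t) {
        return (t.clientReceive - t.clientSend) - serverTime(t);
    }
}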

4.3 Example: Regression for Resource Capacity

Suppose the QoS designer needs to know the capacity of a resource, such as the network bandwidth. The measured bandwidth could be used as a threshold to determine whether the network capacity is high or low. The QoS designer can use ASL to invoke statistical processing of the trace record when it is returned to the client. QuO has a default linear regression library that can be used to fit a curve for resource capacity, based on the measured latency and load. Note that the ASL and regression library are not application-specific; this set of tools can therefore be configured as a reusable qosket.
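As a sketch of what such a fit involves (not QuO's actual regression library), the following Java computes a least-squares line through the origin for latency versus message size; the reciprocal of the slope estimates the effective capacity:

// Sketch: fit latency = size / capacity through the origin by least squares
// and report the effective capacity in bytes/sec. Names are illustrative.
class CapacityFit {
    // sizes[i] = message size in bytes, latencies[i] = measured latency in seconds
    static double effectiveCapacity(double[] sizes, double[] latencies) {
        double sxy = 0, sxx = 0;
        for (int i = 0; i < sizes.length; i++) {
            sxy += sizes[i] * latencies[i];
            sxx += sizes[i] * sizes[i];
        }
        double slope = sxy / sxx;   // least-squares slope: seconds per byte
        return 1.0 / slope;         // bytes per second
    }
}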

4.4 Example: Detecting a Drop in Effective Bandwidth

Suppose the QoS designer needs to detect a drop in the effective capacity of a resource, for example due to a sudden increase in cross traffic on the network link between the client and server. The contract could select small or big pictures based on a threshold for a resource's capacity. But the QoS designer does not want the selection to thrash when the effective capacity is close to the threshold. One solution is to use hysteresis, where the threshold value going down is lower than the threshold value going up. QuO contracts support hysteresis by locking the contract in a region until some condition is true.

The QoS designer can use the statistical properties of the fitted capacity to make the threshold even more robust. The upper error bound can be used for going down and the lower error bound can be used for going up. When the fit is poor, the error bounds spread out, making it harder for the system to transition between regions.

When the underlying system changes abruptly, detecting the absolute magnitude of the change is difficult. Since the fit for effective capacity is based on past measurements, during a transition there are samples from both the old capacity and the new capacity. Figure 5 shows a capacity prediction using a boxcar filter of the last 10 image transfers. The small graphs show the fit for the effective capacity before, during, and after the onset of the traffic spike. Notice the left and right scatter plots: the slopes of the points accurately estimate the effective capacity and there are no outliers. The center scatter plot shows measurements during a transition between regions. It has measurements from both regions and therefore a less clear slope. The fit can have any value, based on the usage pattern during the transition; in this case, the fit underestimates the effective capacity. During a transition it is best to ignore the fit and lock down the contract region until the fit consists only of measurements of the new capacity.

Fig. 5. Estimating bandwidth during a quick change. The main graph shows the capacity estimate (upper 95% bound, expected average, and lower 95% bound) over time in invocations, with an 800 kbps cross-traffic spike; the three scatter plots show the fit values against latency (msec) at times 25, 47, and 77.
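The hysteresis rule described above, using the fit's confidence bounds, might look like the following Java sketch (the names and API are illustrative):

// Sketch of hysteresis using the fit's confidence bounds. The region drops
// only when even the optimistic (upper) bound is below the threshold, and
// recovers only when the pessimistic (lower) bound is above it.
class HysteresisThreshold {
    private final double threshold;   // e.g., required bandwidth in bytes/sec
    private boolean highRegion = true;

    HysteresisThreshold(double threshold) { this.threshold = threshold; }

    boolean update(double lowerBound, double upperBound) {
        if (highRegion && upperBound < threshold)       highRegion = false; // confident drop
        else if (!highRegion && lowerBound > threshold) highRegion = true;  // confident recovery
        return highRegion;
    }
}

Using the upper bound for downward transitions and the lower bound for upward transitions means the contract moves only when the fit is confident, which automatically widens the dead band when the fit is poor.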

5 Out-of-band Instrumentation

Out-of-band instrumentation gathers performance information about the underlying system outside the context of remote method calls. Information is collected from the resources along the path between the client and the remote object. The information from the different resources is integrated and translated into a consistent representation. Given a model of how the application uses the resources, the expected QoS can then be inferred (Figure 1).

Out-of-band information is traditionally used by centralized network and system management applications. But here, we want this information disseminated to the applications themselves. The applications become aware of the underlying resources and can use this information to adapt, i.e., they become network-aware applications. CMU's Remos [3] is an example of a collection system specifically designed to collect and integrate data from networks and hosts and disseminate it to applications. The Remos modeler is capable of representing individual resources or aggregate capacity, such as the bandwidth along a path in the network. Other resource managers also fill this role, such as the Globus Resource Monitor [6] and the Desiderata Resource Manager [13].

Fig. 6. Architecture of the Resource Status Service (a model level of Data Scopes and Data Formulas computes expected QoS against an RDL ontology; an integration level combines data; and a resource level of Data Feeds translates, stores, and collects base-line configuration data via http, network data from Remos via a custom protocol, and host data via the CORBA StatusTEC)

5.1 QuO Resource Status Service

QuO's Resource Status Service (RSS) extends the idea of network-aware applications by supporting both a resource model and a high-level application model. QuO's RSS has an open implementation, with several well-defined integration points at which different types of out-of-band collectors can be added. Remos is an example of such a collector. Figure 6 shows the QuO RSS architecture and its integration points.

The RSS is a completely distributed service; each application process effectively gets its own RSS server, which gathers the information needed by the process. The RSS can be integrated into the application process (currently Java only) or can be located nearby as an independent CORBA server (currently necessary for C++). The local RSS server maintains a representation of the status of each resource used by the application process. The status is updated independently of the application, even when the resource is remote. These status values are used to make adaptive decisions in critical data paths, which cannot afford to query for remote status on demand. The local status representation is the current best guess for the remote status.

Any given RSS value can subscribe to other RSS values, in which case it will be updated whenever any of its dependent values changes. QuO system condition objects can also subscribe to RSS values. This data-driven forward chaining follows all the data dependencies and can ultimately trigger adaptation by means of the system condition objects. For example, a remote poller can detect a change in the status of a resource and disseminate the status change to all interested RSS servers. Inside an RSS server the status value can propagate through internal formulas, resulting in a change to the value of system condition objects. QuO contracts are evaluated when their observed system condition objects change value, which might result in a transition between contract regions. The transition code can trigger an application adaptation. Note that the forward chaining is asynchronous and independent of the operation of the application, which may or may not be using the resource at the time.

The asynchronous forward chaining in the RSS also improves cache consistency for ordinary system conditions. For example, a QuO delegate requests a contract evaluation in order to dispatch the appropriate adaptive behavior. Contract evaluations use the values of system condition objects, which by design are always available. So the evaluation returns quickly, because the system condition objects were being updated in anticipation of their being used. The RSS implementation has other mechanisms for ensuring that the status representation is updated, including support for backward chaining, in which data queries can trigger further queries for supporting data.
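A minimal Java sketch of the forward-chaining mechanism follows; the class and method names are illustrative, not the RSS API. The key property is that updates are pushed to cached local values, so readers never block on the network:

// Sketch of RSS forward chaining: a status value pushes updates to subscribers
// (other RSS values or QuO system condition objects). Names are illustrative.
import java.util.ArrayList;
import java.util.List;

class StatusValue {
    interface Listener { void changed(StatusValue source); }

    private final List<Listener> subscribers = new ArrayList<>();
    private double value;

    void subscribe(Listener l) { subscribers.add(l); }

    // Called by a remote poller or data feed; propagation is asynchronous
    // with respect to the application, which only reads the cached value.
    void update(double newValue) {
        if (newValue != value) {
            value = newValue;
            for (Listener l : subscribers) l.changed(this);  // forward chaining
        }
    }

    double get() { return value; }  // always available; never blocks on the network
}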

5.2 Data Values

Data in the RSS is represented as a multi-field record. As the value is processed, meta-data is added to help with integrating data from several sources. In the current implementation, the meta-data fields consist of a timestamp, a source stamp, a credibility rating, and a units tag.

One of the key features of the RSS is the integration of several raw data values into a single integrated value, or best guess. When new raw data arrives, the RSS must decide whether the new raw value changes the current integrated value (since changes to integrated values trigger forward chaining). This decision is derived from the following considerations, whose results are summarized in the integrated data's credibility rating:

• Aggregation: the time period over which the observations were made.
• Staleness: how old the data is.
• Source type: the method used to collect the data.
• Trust or collector authority: whether the collector of the information is trusted or the information is just hearsay.
• Sensitivity: whether a key component of the data has low credibility.
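A minimal sketch of such a record, assuming illustrative field names rather than the actual RSS representation:

// Sketch of an RSS data value with integration meta-data (fields per Section 5.2).
class DataValue {
    double value;
    long   timestampMillis;  // when the observation was made (staleness)
    String source;           // which collector produced it (source type, authority)
    double credibility;      // summarizes the considerations listed above
    String units;            // e.g., "bytes/sec"; enables unit conversion
}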

5.3 Data Feeds

No universally accepted standard exists for data collection, so the RSS takes the pragmatic approach of adding an integration point, called a Data Feed, for interfacing with different collection systems. A Data Feed is responsible for moving the data from a collection system into the RSS. Besides having to implement the collector’s data transfer protocol, the Data Feed must also manage which status values to collect and how often. Some collectors push data to the client, which matches well with the RSS’s internal publish and subscribe mechanisms. Other collectors must be polled. Also, the collector’s data format needs to be translated into the internal RSS format, which may include semantic as well as syntactic translation, such as adding credibility meta-data.

QuO 3.0 comes with three flavors of Data Feeds: static data via http (e.g., resource configurations); ad hoc dynamic data using special-purpose protocols (e.g., bandwidth information from Remos); and mixed static and dynamic data using a CORBA TypedEventChannel (TEC), which disseminates status information from remote collectors such as host probes [21].
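The following sketch, building on the StatusValue sketch of Section 5.1 and using invented names, illustrates the shape of a polling Data Feed:

// Hypothetical Data Feed integration point: adapts a collection system's
// transfer protocol and data format to the RSS's internal representation.
abstract class PollingDataFeed {
    private final StatusValue target;  // the RSS value this feed maintains

    PollingDataFeed(StatusValue target) { this.target = target; }

    // Collector-specific: fetch and translate one sample (an http page,
    // a Remos query, a TypedEventChannel event, ...) into RSS units.
    abstract double collect() throws Exception;

    // Poll once: push the translated sample into the RSS, which forward-chains it.
    void poll() {
        try {
            target.update(collect());
        } catch (Exception e) {
            // a failed poll leaves the old value in place; its staleness grows
        }
    }
}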

5.4 Example: Dynamic Calculation of Expected Resource Capacity

Suppose the QoS Designer wants to detect denial-of-service attacks. Since these kinds of attacks consume excessive resources, one indication would be a mismatch between the measured and expected resource capacity. In-band measurement with regression (described in Section 4) can determine the current effective resource capacity, but cannot offer any notion of what the capacity of the resource should be. One technique for computing the expectation is to use historic in-band measurements, i.e., to take "baseline" measurements over past runs of the application. But for highly dynamic environments, such as mobile networks, history is a bad predictor of the present. Other techniques are needed to calculate the expected resource capacity.

The RSS manages the gathering of out-of-band measurements from which the expected capacity for resources can be calculated. It integrates the data from all the Data Feeds that are available. Some of these feeds are static (e.g., configuration data on a web page), some are dynamic (bandwidth capacity calculations from Remos), and the integration formulas must accommodate both. For example, if only the static web data is available, the RSS will use those values, but assign them a low credibility. If a Remos collector becomes available, the RSS will use those dynamic values instead of the static data and will increase the credibility correspondingly. The integration formulas can also calculate specific values from more general ones, for example by narrowing configuration information about bandwidth between subnets into an expected bandwidth between hosts on those subnets.
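A sketch of the simplest such integration, picking the most credible available source (building on the DataValue sketch above; the actual RSS formulas are richer):

// Sketch of an integration formula: prefer the most credible available source.
// A static configuration value has low credibility; a live Remos measurement
// has higher credibility and displaces it when present.
class IntegratedValue {
    static DataValue integrate(java.util.List<DataValue> raw) {
        DataValue best = null;
        for (DataValue v : raw) {
            if (best == null || v.credibility > best.credibility) best = v;
        }
        return best;  // the RSS's current "best guess"
    }
}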

6 Runtime Performance Models

For the purposes of this paper, a runtime performance model is a shadow representation of the running application, including representations of the underlying resources, the static structure of the application, and a specific application instance. QuO RSS allows the explicit creation of such models.

The model of underlying resources represents both the resources and their interconnections (topology). The resource model is analogous to a network management system with an internal model-object for each host and network connection. Network and system management modeling is a fairly mature technology and has extensive ontologies for classifying resources and their relationships [2]. The current RSS modeling of resources is a medium-grain implementation based on these ontologies.

The representation of the application's structure is also an explicit part of the model. In this case the model-objects represent "interesting" classes from the application's class hierarchy. Currently, the application models are hand coded. We plan to automate this process using the QuO code generators or existing representations of application models such as UML.

A model of an application instance combines both the resource representation and the application structure representation. The application's dynamic call tree is represented and has HasA links to the resources and IsA links to the application structure. The resulting model is a skeleton to which specific QoS predictors can be attached. Figure 7 shows how a predictor for method latency was added to the skeleton representing the application structure (Class), resources (Host), and call tree (Method).

To support efficient runtime construction of model-objects, QuO RSS has two kinds of modeling components with rich meta-data (reflectivity). Data Scopes represent model-objects and their relationships. Data Formulas represent dynamic attributes of the model-objects. Model construction happens in two phases. The first phase is the creation of the model-objects (Data Scopes), their attributes (Data Formulas), and their relationships. Setting up the Data Scopes is a fairly expensive operation, because resolving relationships involves walking many data structures. Setting up the Data Scopes is analogous to the application binding a client to a remote object. This operation usually happens only once and changes infrequently during the application's life cycle; Data Scopes are rarely created but frequently evaluated. The second phase is evaluating the Data Formulas. This operation is very efficient because dependencies and caches are established when the Data Scopes are created.

6.1 Data Scopes

Data Scopes are a class of objects that carry additional meta-data to allow reflectivity at runtime. The main feature of a Data Scope is that the existence of its attributes is resolved at runtime and depends on the inter-relationships among Data Scopes. When a Data Scope is given an attribute name to resolve, it searches its local attributes and the attributes of other Data Scopes with which it has relationships. The search scheme is completely programmable by the QoS Designer, but some basic relationships are supported. The containment relationship (HasA) allows parents to share their attributes with their children; for example, if an object is contained in a host, the object has all the attributes of the host. Another relationship is prototype, which can set the default values for types of objects. Finally, the reference relationship describes how remote objects are hooked together. Figure 7 illustrates a graphical representation of an object served by a specific host.
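The resolution scheme can be sketched in a few lines of Java (illustrative names; the actual Data Scope implementation also handles the reference relationship and programmable search orders):

// Sketch of Data Scope attribute resolution: search local attributes, then
// follow relationships until one of them resolves the name.
import java.util.HashMap;
import java.util.Map;

class DataScope {
    private final Map<String, Object> attributes = new HashMap<>();
    private DataScope hasA;       // containment: parent shares attributes
    private DataScope prototype;  // defaults for this type of object

    Object resolve(String name) {
        Object v = attributes.get(name);
        if (v != null) return v;
        if (hasA != null && (v = hasA.resolve(name)) != null) return v;           // e.g., BogoMips on the host
        if (prototype != null && (v = prototype.resolve(name)) != null) return v; // e.g., per-method defaults
        return null;  // unresolved; a formula may substitute low-credibility data
    }

    void put(String name, Object value) { attributes.put(name, value); }
    void setHasA(DataScope parent) { hasA = parent; }
    void setPrototype(DataScope proto) { prototype = proto; }
}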

6.2 Data Formulas

Data Formulas are a class of objects allowing for forward and backward chaining of calculations. A Data Formula manages a mathematical formula and keeps track of the formula's dependencies. Data Formulas are part of a publish-and-subscribe mesh: when a formula's value changes, all the formulas that depend on it are reevaluated. Data Formulas can also handle other issues, such as data credibility, unit conversion, caching, and snapshotting. For example, Data Formulas allow missing or low-credibility data to be handled automatically.

Fig. 7. A model of method latency using Data Scopes and Data Formulas (Data Scopes for Host, Process, Object, Class, and Method; the attached Latency Formula computes Latency = Load / Capacity)

Figure 7 shows a Data Formula for latency being attached as an attribute of a Method Data Scope. The Latency Formula depends on other formulas that can be resolved in the context of the Method Scope. The resolution is done when the Latency Formula is created. The publish-and-subscribe mesh updates the Latency Formula when its dependent formulas change value.
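A minimal Java sketch of the mechanism, with invented names, follows; leaf formulas are set by Data Feeds, and derived formulas recompute and propagate when a dependency changes:

// Sketch of a Data Formula: re-evaluates when any dependency changes and
// notifies its own subscribers, giving data-driven forward chaining.
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleSupplier;

class DataFormula {
    private final List<DataFormula> subscribers = new ArrayList<>();
    private final DoubleSupplier expression;  // null for leaf values fed by the RSS
    private double cached;

    DataFormula(double initial) { this.expression = null; this.cached = initial; }

    DataFormula(DoubleSupplier expression, DataFormula... dependencies) {
        this.expression = expression;
        for (DataFormula dep : dependencies) dep.subscribers.add(this);
        this.cached = expression.getAsDouble();
    }

    double get() { return cached; }  // cheap: value is cached between changes

    void set(double v) {             // leaf update pushed by a Data Feed
        if (v != cached) {
            cached = v;
            for (DataFormula s : subscribers) s.recompute();  // chain forward
        }
    }

    private void recompute() { set(expression.getAsDouble()); }
}

// Usage: latency depends on load and capacity; updating either re-derives latency.
//   DataFormula load     = new DataFormula(1000.0);
//   DataFormula capacity = new DataFormula(500.0);
//   DataFormula latency  = new DataFormula(() -> load.get() / capacity.get(), load, capacity);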

6.3 Example: Predicting Object Latency

Suppose the QoS Designer wants to choose the fastest server from which to request a picture. The contract could pick the server based on a prediction of end-to-end latency, which includes object latency and network latency. The server-side method latency is the amount of time taken above the ORB to process the picture. Network latency is the time to transmit the request and reply messages between the client and the server, including marshaling and transmission time. For this example we will predict just the server-side method latency.

Server-side method latency depends solely on the server CPU resources. Method latency is longer if the picture is larger or if the CPU resources decrease. A model for the processing time of a picture is: latency is the number of instructions needed to process a picture divided by the effective capacity of the server CPU. The number of instructions is the size of the picture times the number of instructions it takes to process a byte. The effective capacity of a server depends on its raw CPU capacity and the competition from other processes. Linux exports an estimate of CPU capacity called BogoMips and an estimate of competing load called LoadAverage. The effect of the Linux LoadAverage is a hard metric to model [4]. A simple model would be that when the server process is not greedy and the read processing is small, the read request sneaks in and gets the whole CPU regardless of the LoadAverage. If the object is greedy, the CPU scheduler will multiplex the CPU among the waiting processes, including the server, so the LoadAverage reduces the capacity. The number of CPUs on the server also reduces the effect of the load average:

MethodLatency = (ReplySize * ReplyInstructionsPerByte + RequestSize * RequestInstructionsPerByte)
                / (CPUMips / max(1, LoadAverage / NumberCPUs))

Note that this formula assumes there are no queuing effects due to other method calls waiting to use the object. The formula also assumes that the per-invocation load is negligible, i.e., that the load is independent of picture size. The QoS designer can add more comprehensive formulas. Note that this formula is reusable: it could be used to model the latency of any method call on any object instance. But the parameters of the formula are relative to the object's context. For example, each object instance is hosted on a different server, and each method on the object has a different ReplyInstructionsPerByte. When the MethodLatency Formula is bound to a method's Scope, the relative references are resolved by the Scope. For example, to find the BogoMips, the method's Scope would follow its containment relationships until it found a BogoMips formula on its host scope. To find ReplyInstructionsPerByte, the Scope would follow the prototype relationships, because this is a characteristic of this type of method. Allowing Scopes to dynamically resolve the parameter bindings is a powerful tool for reusing formulas. Also, because the RSS integrates data from many sources and picks the most credible, the formula is robust even if some of the data sources are missing.
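Written against the Data Scope sketch of Section 6.1, the MethodLatency formula might be coded as follows; the attribute names are the formula's parameters, and where each one resolves from is noted in the comments (an illustration, not the RSS formula language):

// Sketch: the MethodLatency formula with parameters resolved from a method's scope.
class LatencyModel {
    static double methodLatency(DataScope method, double replySize, double requestSize) {
        double replyIpb   = (Double) method.resolve("ReplyInstructionsPerByte");   // via prototype
        double requestIpb = (Double) method.resolve("RequestInstructionsPerByte"); // via prototype
        double mips       = (Double) method.resolve("CPUMips");                    // via containment (host)
        double loadAvg    = (Double) method.resolve("LoadAverage");                // via containment (host)
        double cpus       = (Double) method.resolve("NumberCPUs");                 // via containment (host)

        double instructions  = replySize * replyIpb + requestSize * requestIpb;
        double effectiveMips = mips / Math.max(1.0, loadAvg / cpus);
        return instructions / effectiveMips;
    }
}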

7 Calibration of Models

The Method Latency Formula of the last example contains an application-specific parameter called ReplyInstructionsPerByte. This parameter is different for each method called on the server object. Separating out this parameter allows the Latency Formula to be decoupled from a specific type of hardware. But how does the QoS Designer determine this characteristic of a specific object implementation?

One way to calibrate models is to run off-line performance experiments over a wide variety of hardware and software configurations. The results of these experiments can be used to fit the parameters of the model. The QoS designer can use QuO instrumentation (described in Sections 4 and 5) to collect latency and load (picture size) measurements. QuO instrumentation can log this information to a file. After several runs of the application using different resources and load mixes, the raw data can be fed to a regression program. The QoS designer must use knowledge of how the application is structured to determine the right kind of model to fit and the parameters needed. In the above example, a linear model based on picture size was used and gave adequate results. Higher-order or non-linear models can also be used, as dictated by the application implementation.

Once the calibration parameters are determined, they need to be loaded into the performance model at runtime. The QuO RSS Data Feed service allows the parameters to be published. Because the calibration is specific to an application implementation, it should be stored with the application's static structure, which is published as part of the configuration database on a central web page.

Beyond publishing the calibration results through the RSS, applications could also publish the raw calibration data. We plan to investigate automatic calibration, in which different instances of the application publish their raw data and a central calibration service processes it. One run of the application usually does not cover enough of the range of resources and usage patterns to get a good fit, but data from multiple runs at different times and by different users may cover sufficient range. Having the applications publish raw measurements also allows a summary of expected usage patterns, which can be correlated by user and group.
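As an illustration of the off-line fitting step (our own sketch, not the QuO toolchain), the following Java inverts the latency model to recover an instruction count from each logged sample and then fits instructions-per-byte by least squares through the origin:

// Sketch of off-line calibration: recover ReplyInstructionsPerByte from logged
// (pictureSize, latency) pairs and the CPU capacity measured during each run.
class Calibration {
    // latency[i] in seconds, size[i] in bytes, effectiveMips[i] in instructions/sec
    static double instructionsPerByte(double[] size, double[] latency, double[] effectiveMips) {
        double sxy = 0, sxx = 0;
        for (int i = 0; i < size.length; i++) {
            double instructions = latency[i] * effectiveMips[i]; // invert the latency model
            sxy += size[i] * instructions;
            sxx += size[i] * size[i];
        }
        return sxy / sxx;  // least-squares slope through the origin
    }
}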

7.1 Example: Comparing Benchmarks: BogoMips vs. JIPS

One of the problems in calibrating applications is determining the capacity of the host resources. Linux publishes an estimate of host capacity called BogoMips, which is based on the processor clock frequency and a fudge factor for the architecture type. But BogoMips alone is not good enough to predict application performance. When running Java on our test bed machines, we have observed a factor of 100 difference between hosts, from old Sparc IPCs running JDK 1.1.7 to Xeon PCs running Java 1.3 with a JIT. Besides the version of Java, the machine architecture and OS also change a host's capacity to run Java applications. So we developed a simple Java benchmark that estimates Java Instructions per Second (JIPS), which includes the effects of the Java VM, OS, and host architecture.

Figure 8 shows the results of calibrating the image server application against BogoMips and JIPS. All measurements used the same version of Java, but different processor types. The JIPS benchmark had a good fit with the image server application, whereas BogoMips had outliers for non-Pentium II/III processors. The JIPS benchmark measures the processing time of a tight loop doing integer arithmetic, which matches image processing well. This type of loop does not take advantage of the large cache of the Xeon processor or the fast bus speed of the Sun server. These machines perform well at other tasks, such as compiling, but showed disappointing performance on the JIPS benchmark. We plan to investigate using several types of JIPS benchmarks that can be compared to the application. When the application is calibrated, it can specify which benchmark best predicts its performance.
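A JIPS-style micro-benchmark can be sketched in a few lines of Java; this is an illustration of the technique, not the benchmark we used (the loop body and iteration count are arbitrary choices):

// Sketch of a JIPS-style micro-benchmark: time a tight integer loop, which
// resembles the image-processing workload, and report iterations per second.
class Jips {
    static double measure() {
        long iterations = 50_000_000L;
        long checksum = 0;                         // prevent the JIT from removing the loop
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            checksum += (i ^ (i << 1)) + (i >> 2); // simple integer arithmetic
        }
        long elapsed = System.nanoTime() - start;
        if (checksum == 42) System.out.println();  // keep checksum live
        return iterations / (elapsed / 1e9);       // "Java instructions" per second
    }

    public static void main(String[] args) {
        System.out.printf("JIPS estimate: %.0f%n", measure());
    }
}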

8 Conclusions

The QoS designer needs help. Adding adaptive behavior to existing systems is difficult. While QuO middleware supports developing reusable adaptive behaviors and adding them to existing applications, useful adaptation requires awareness and measurement of the dynamic conditions of the system.

Fig. 8. The JIPS micro-benchmark predicts CPU capacity over a wide range of processor types (JIPS fit: y = 0.4778x; BogoMips fit: y = 0.018x; both plotted against measured capacity)

In this paper, we described several ways to gather runtime measurements useful in triggering adaptive behavior. These included both in-band measurements gathered in the path of object interactions and out-of-band measurements gathered by direct observation of the system, independent of application operation. We described the support that QuO provides for in-band and out-of-band measurements and for creating runtime performance models of the system, which help organize the awareness of resources. QuO's RSS is a powerful reusable service that can be used to integrate available information into a coherent view of the underlying resources in a system. Accurate runtime performance models and dynamic system measurement are important to creating efficient adaptive behavior for a performance-critical application. We illustrated our results in the context of a distributed image server application. The QuO software and the applications described in this paper are available open-source at http://quo.bbn.com.

References

1. Cukier M, Ren J, Sabnis C, Henke D, Pistole J, Sanders W, Bakken D, Berman M, Karr D, Schantz R. "AQuA: An Adaptive Architecture that Provides Dependable Distributed Objects," Proceedings of the 17th IEEE Symposium on Reliable Distributed Systems, October 1998.
2. DMTF. "Common Information Model (CIM) Standard." http://www.dmtf.org/standards/standard_cim.php
3. Dinda P, Gross T, Karrer R, Lowekamp B, Miller N, Steenkiste P, Sutherland S. "The Architecture of the Remos System," 10th IEEE Symposium on High-Performance Distributed Computing (HPDC'01), August 2001, San Francisco.
4. Dinda P, Lowekamp B, Kallivokas L, O'Hallaron D. "The Case For Prediction-based Best-effort Real-time Systems," 7th International Workshop on Parallel and Distributed Real-time Systems (WPDRTS 1999).
5. IETF RFC 2475. "An Architecture for Differentiated Services."
6. Czajkowski, Foster, Karonis, Kesselman, Martin, Smith, Tuecke. "A Resource Management Architecture for Metacomputing Systems," Proc. IPPS/SPDP '98 Workshop on Job Scheduling Strategies for Parallel Processing, pp. 62-82, 1998.
7. Kuhns F, O'Ryan C, Schmidt D, Othman O, Parsons J. "The Design and Performance of a Pluggable Protocols Framework for Object Request Broker Middleware," IFIP Workshop on Protocols for High-Speed Networks, August 1999.
8. Loyall J, Bakken D, Schantz R, Zinky J, Karr D, Vanegas R, Anderson K. "QoS Aspect Languages and Their Runtime Integration," Lecture Notes in Computer Science 1511, Springer-Verlag; Proceedings of the Fourth Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers (LCR98), Pittsburgh, Pennsylvania, May 28-30, 1998.
9. Loyall JL, Gossett JM, Gill CD, Schantz RE, Zinky JA, Pal P, Shapiro R, Rodrigues C, Atighetchi M, Karr D. "Comparing and Contrasting Adaptive Middleware Support in Wide-Area and Embedded Distributed Object Applications," Proceedings of the 21st IEEE International Conference on Distributed Computing Systems (ICDCS-21), April 16-19, 2001, Phoenix, Arizona.
10. Loyall JP, Pal PP, Schantz RE, Webber F. "Building Adaptive and Agile Applications Using Intrusion Detection and Response," Proceedings of NDSS 2000, the Network and Distributed System Security Symposium, February 2-4, 2000, San Diego, CA.
11. Object Management Group. "Portable Interceptors Specification" (orbos/99-12-01). http://www.omg.org
12. QuO Toolkit Users' and Reference Guides. http://www.dist-systems.bbn.com/tech/QuO/release/
13. Ravindran B, Welch L, Bruggeman C, Shirazi B, Cavanaugh C. "A Resource Management Model for Dynamic, Scalable, Dependable, Real-time Systems," IEEE Real-time Technology and Applications Symposium.
14. Schantz R, Zinky J, Karr D, Bakken D, Megquier J, Loyall J. "An Object-level Gateway Supporting Integrated-Property Quality of Service," Proceedings of the 2nd IEEE International Symposium on Object-oriented Real-time Distributed Computing (ISORC 99), May 1999.
15. Schantz RE, Loyall JP, Atighetchi M, Pal PP. "Packaging Quality of Service Control Behaviors for Reuse," Proceedings of the 5th IEEE International Symposium on Object-oriented Real-time Distributed Computing (ISORC 2002), April 29 - May 1, 2002, Washington, DC.
16. Sun Microsystems. Java Language Specification; "Dynamic Proxies," JDK 1.3.
17. Vanegas R, Zinky J, Loyall J, Karr D, Schantz R, Bakken D. "QuO's Runtime Support for Quality of Service in Distributed Objects," Proceedings of Middleware 98, the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing, September 1998.
18. Webber F, Pal P, Schantz R, Loyall J. "Defense-Enabled Applications," Proceedings of the Second DARPA Information Survivability Conference and Exposition (DISCEX II), Anaheim, CA, June 12-14, 2001.
19. Zhang L, et al. "RSVP: A New Resource ReSerVation Protocol," IEEE Network, 7(6), September 1993.
20. Zinky J, Bakken D, Schantz R. "Architectural Support for Quality of Service for CORBA Objects," Theory and Practice of Object Systems, 3(1), 1997.
21. Zinky J, Bakken D, Krishnaswamy V, Ahamad M. "PASS - A Service for Efficient Large Scale Dissemination of Time Varying Data Using CORBA," Proceedings of IEEE ICDCS 1999.