focus: COTS integration

Performance Techniques for COTS Systems

Erik Putrycz, National Research Council of Canada
Murray Woodside and Xiuping Wu, Carleton University

Learn about a new approach that uses component-based modeling and tracing techniques to manage performance in COTS-based systems throughout the entire life cycle.

COTS components can provide much of the functionality of distributed information systems. These components range from stand-alone elements, such as a Web server or database system, to platform software or an operating system, to embedded functional components, such as a calendar manager or an inventory-management JavaBean. System performance (delay and throughput) is determined by all these elements and their configuration parameters and by how they’re deployed on the hosts and networks. If an assembled product performs inadequately, options for improvement are limited because the source code for the components involved isn’t available. So, we must experiment with

■ tuning the performance parameters,
■ trying new deployment options for the components,
■ redesigning the component interactions, or
■ selecting different components.

Adapting measurement techniques for the COTS situation and extrapolating from measurements with predictive models can enhance this trial-and-error process. The COTS approach’s built-in advantage is that the components can be measured before the new system is designed and then the system can be quickly prototyped. Also, vendors can sometimes provide component properties, as proposed in PECT (Prediction-Enabled Component Technology).1 The question is how to exploit this knowledge. New approaches to modeling and tracing that deal with concurrency can be used together to maximize the performance of COTS systems throughout their life cycle. Models help us predict performance and plan the architecture, deployment, and measurements, while trace analysis helps us diagnose problems in detail.

Our performance analysis process

Figure 1 shows the elements of our performance analysis process for COTS components. The artifacts (shown at the top) range from concrete components and a system plan to the complete system. We develop measurements by coupling the system or components with workload generators and instrumentation (shown as the “harness”). Initially, we use the results to qualify components, create submodels, and create an initial predictive model (indicated on the figure’s left-hand side). We use the predictions to identify potential problems and develop tests for them. Later, we use the results for the entire system to verify the performance against the requirements, to identify problems, and to recalibrate the model (the figure’s right-hand side). Once we find a problem and collect traces, we can apply trace analysis techniques, which might suggest design changes centered on the execution path. Model analysis might suggest more radical changes, and then we can follow an improvement cycle.

Figure 1. Elements of our performance analysis process for COTS components. Artifacts include components, a planned system, and the complete system. The process includes creating a model, detecting and diagnosing problems, and recalibrating the model. (The figure’s graphs plot device utilization and delay against throughput, marking a specified delay limit and capacity estimates by device saturation, by a device queuing model, and by layered queuing; objectives noted include diagnosing issues, improving integration code, prediction, tuning performance parameters, planning deployment and scaling, and making test plans. An Eclipse plug-in screenshot shows severity levels of component interactions in shades of red.)

Our process emphasizes two new performance technologies for COTS: constructing predictive models from component properties (component-based modeling) and using platform-level tracing for insight into hidden interactions of concurrent components. We’re also particularly concerned with dealing with concurrency in the COTS components—a major feature of several component technologies.


Models and architecture for COTS systems

Using an architectural model helps us understand the roles of the different components in a planned system and their impact on performance. As figure 2 shows, we’ve adapted the UML 2 notation for software components to focus on performance concerns. The examples show components as boxes with ports (the squares on the boundary). The lollipop symbol indicates offered services, and the half circle connected to it represents required services. For performance concerns, we’ve redefined the port concept around distinct workloads rather than around groups of functions (see figure 2a). We first cluster the database interfaces and then divide them between two ports—one defined for large transactions and the other for small transactions—because this distinction has performance significance. Figures 2b and 2c introduce performance parameters attached to the two interfaces and the component body.

To understand performance, we might need to break down a COTS product into its subcomponents and their interactions. Figure 3 shows some internal detail in a generic Java 2 Enterprise Edition (J2EE) application. A Web server accepts clients’ HTTP requests and transfers them to an application server (IBM’s WebSphere, in this example). This consists of an execution engine that executes the transaction business logic in the Enterprise JavaBeans and a presentation engine (a JavaServer Pages engine, in this example). The WebSphere server relies on a Java Virtual Machine (JVM) and a database server. The EJBs and the JSP pages represent this system’s application-specific “glue code” or integration code; the rest of the components are COTS. The colors in figure 3 identify classes of software with different difficulties and opportunities related to measurement. Components might contain concurrent threads, and subcomponents might be concurrent processes. We can sometimes gather details about the product structure and subcomponents from documentation (the “data sheet” shown in figure 1) and through reverse engineering. With Java-based products, the available list of packages and classes used might reveal some functional subcomponents.

Figure 2. Component architectures and performance parameters: (a) performance components, (b) performance notation, and (c) demand calculations. (Offered-service ports are annotated with device demands, D for local and D* for total; required-service ports carry the request count for the number of operations required; the component body represents the use of logical resources. In figure 2c, component A, with DA = 1, calls B twice and C seven times per operation; B has DB = 3 and DB* = 3 ms, and C has DC = 6 and DC* = 6 ms, giving DA* = 49 ms.)

Figure 3. A generic Java 2 Enterprise Edition application. The Enterprise JavaBeans and JavaServer Pages represent the application-specific “glue code” or integration code in this system; the rest are COTS components.

Performance characterization

The description of performance characteristics can be shallow or deep. A test or benchmark giving the delay for the component in a particular configuration is shallow; a causal model is a deeper description that covers a wider range of configurations.

Benchmarks. We can use benchmarks to select the best COTS components to meet the performance requirements. The difficulty of using them in a COTS-based system is in establishing the components’ context.

Models. Models are based on the structural relationships shown in figure 2 and on performance data for resource usage. Performance data consists of three kinds of data. Device demands give the average CPU time for a component’s operation and are associated with the offered service ports. This data might be scenario dependent and might require separating the small and large operations on the basis of their device demands (see the large and small database operations with separate interfaces in figure 2a). Other device loadings (such as disk or network loadings) might also be important. Interaction attributes give the number of required service operations that are demanded per component operation and whether an interaction is blocking or asynchronous. Interactions are associated with the component’s required service ports. Logical resources, which include threads, buffers, and caches, are associated with a component and are modeled by internal detail.

We can obtain values that aren’t available from the vendor by setting up a test harness for the component and monitoring its execution (see figure 1). From these values, we can make preliminary predictions for system performance for a particular choice of components. We can then use performance data in some kind of predictive model. Table 1 lists several approaches to modeling that are applicable to COTS-based systems. Additional approaches, such as Petri nets and process algebras, are described elsewhere.2 These approaches are interesting from a research viewpoint, but we don’t include them here because the models don’t scale well enough to systems of practical complexity (this might change in the future, of course). Table 1 also summarizes the approaches’ strengths and weaknesses. We can apply some approaches to performance data at the component level, while other approaches are unaware of components. Models are approximate and must be used with care, taking into account the accuracy of the data used to calibrate them. However, many studies have applied and validated the modeling techniques listed in table 1.2–8
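As a minimal sketch of how the three kinds of performance data described above might be recorded for a single component, the following Java fragment uses illustrative field names and example values; it is not a vendor data-sheet format from the article.

```java
import java.util.List;
import java.util.Map;

/** Sketch: the three kinds of performance data for one component (illustrative values). */
public class ComponentDataSheet {

    enum Interaction { BLOCKING, ASYNCHRONOUS }

    /** A required-service call: how often it is made per operation and how it interacts. */
    record RequiredCall(String service, double callsPerOperation, Interaction kind) {}

    /** Device demands per offered port, interaction attributes, and logical resources. */
    record PerformanceData(
            Map<String, Double> cpuDemandMsPerOperation,   // device demands, per offered service
            List<RequiredCall> requiredCalls,              // interaction attributes
            int threadPoolSize,                            // logical resources ...
            int bufferPoolSize) {}

    public static void main(String[] args) {
        PerformanceData appServer = new PerformanceData(
                Map.of("smallDbOperation", 1.0, "largeDbOperation", 6.0),
                List.of(new RequiredCall("database.query", 2.0, Interaction.BLOCKING)),
                50, 200);
        System.out.println(appServer);
    }
}
```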

Performance prediction


Choosing an approach for performance prediction involves a trade-off between simplicity and capability. Component-based techniques hold the performance data for the components separately and combine them on demand for a given architecture and a given selection of components. This supports rapid comparison of alternative components and architectures and preserves useful data from one use of a component to the next. So we can better leverage the data-gathering effort and can compare different component versions to identify problems that newer versions could introduce. A system-based approach is more work, because it starts from scratch for every configuration, except for small parameter changes.

Table 1. Some approaches and tools for predicting performance for COTS-based systems

Device demand (component-based). Description: This is the simplest prediction model; it predicts device saturation. Evaluation: Simple but doesn’t predict delay. Tools: Can be computed by hand.

Device queuing (component-based). Description: This is widely used in software performance engineering.2 For COTS-based systems, it involves adding components’ device demands3,4 to derive a global queuing network model. The specifications can be UML specifications3 or execution graphs.5 Evaluation: Models hardware contention, bottlenecks, and device concurrency but not software concurrency limits or resources. Tools: Queuing Network Analysis Package.

Layered queuing (component-based). Description: This is based on software components, their provided and required services, and their performance parameters.6 It captures both hardware and software contention. A component-based approach to constructing models is described elsewhere.7 Evaluation: A simple form of extended queuing model for simultaneous resource demands and concurrent software components. It can analyze hardware and software contention and bottlenecks but requires more information about the software structure. Tools: Layered Queuing Network solver, Method of Layers.

Curve-fitting or regression (system-based). Description: Using measurements on a prototype in a particular configuration, we can use a fitted function to extrapolate to other, nearby configurations. Ian Gorton and his colleagues described experiments on a prototype Enterprise JavaBean application5 and predicted the performance impact of changes to the EJB platform, the threading level, and other resources. Evaluation: Good accuracy for small changes. Difficult to scale to analyze large configurations, because the data doesn’t cover the cases closely enough. Tools: S-Plus, many statistical packages.

Simulation (component-based). Description: Simulation mimics the system arbitrarily closely. Simonetta Balsamo and Moreno Marzolla derive a simulation model from the architecture design in UML.8 Evaluation: Can provide more accuracy but needs more details, takes time to develop, and requires heavy computing resources to simulate with accuracy. Tools: CSIM, Opnet, Hyperformix Workbench.

Device demand. These models are straightforward. They use the total device demands (in time per system operation) found by profiling for each component or by using a calculation based on a labeled component diagram (see figure 2c). Components are labeled by their local demand D in CPU time per interaction (DA for component A) and by derived values of total demand DA*, which includes the demands of A’s descendants. Outgoing component connectors are labeled by the interaction frequency, which is the number of invocations made during one execution of the source component. DA* is computed as component A’s local demand plus, for each component A calls, the interaction frequency times that component’s cumulative demand. We can perform the calculation for I/O and other operations as well as for CPU time demand. A demand model then computes each device utilization as throughput × total demand for a combination of system operations. Figure 1 includes a graph that shows device utilization versus throughput (see the “Model building” box). The straight line represents the device utilization, which can help define an ad hoc capacity limit based on a critical utilization—presumed to be the largest value for adequate performance. Over all devices, the one with the highest utilization value determines the capacity limit. However, demand models can’t predict QoS measures.
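To make the demand calculation concrete, here is a minimal Java sketch (not the authors’ tooling) that computes a component’s total demand D* from its local demand and outgoing call frequencies, then applies the utilization and ad hoc capacity calculations. The component values are those of figure 2c; the throughput of 10 operations per second and the critical utilization of 0.7 are assumed example numbers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal sketch of the device-demand calculation (component values from figure 2c). */
public class DemandModel {

    /** A component with a local CPU demand (ms) and calls to other components. */
    static final class Component {
        final String name;
        final double localDemandMs;                                   // D
        final Map<Component, Double> calls = new LinkedHashMap<>();   // callee -> calls per invocation

        Component(String name, double localDemandMs) {
            this.name = name;
            this.localDemandMs = localDemandMs;
        }

        void addCall(Component callee, double frequency) {
            calls.put(callee, frequency);
        }

        /** Total demand D* = local demand + sum(frequency x callee's D*); assumes an acyclic call graph. */
        double totalDemandMs() {
            double total = localDemandMs;
            for (Map.Entry<Component, Double> e : calls.entrySet()) {
                total += e.getValue() * e.getKey().totalDemandMs();
            }
            return total;
        }
    }

    public static void main(String[] args) {
        Component c = new Component("C", 6);  // DC = 6, DC* = 6 ms
        Component b = new Component("B", 3);  // DB = 3, DB* = 3 ms
        Component a = new Component("A", 1);  // DA = 1
        a.addCall(b, 2);                      // A calls B twice per operation
        a.addCall(c, 7);                      // A calls C seven times per operation

        double dStar = a.totalDemandMs();     // 1 + 2*3 + 7*6 = 49 ms, as in figure 2c
        System.out.printf("D*_A = %.0f ms%n", dStar);

        // Demand model: utilization = throughput x total demand (assumed 10 ops/s).
        double throughputPerSec = 10.0;
        double utilization = throughputPerSec * dStar / 1000.0;
        System.out.printf("Utilization at %.0f ops/s: %.2f%n", throughputPerSec, utilization);

        // Ad hoc capacity limit at an assumed critical utilization of 0.7.
        double criticalUtilization = 0.7;
        System.out.printf("Capacity limit: %.1f ops/s%n", criticalUtilization / (dStar / 1000.0));
    }
}
```

Running it prints D*_A = 49 ms, matching figure 2c, along with a utilization of 0.49 at 10 operations per second and a capacity of roughly 14 operations per second for that single device under the assumed threshold.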

Device queuing. QoS measures involving delays require a model that estimates resource contention and queuing—for instance, by using simulation or a queuing model. Figure 1 illustrates the results, showing capacity by both layered and device queuing (again, see the “Model building” box). A basic QM gives the second curve—labeled “device queuing prediction”—and estimates only device contention, which might give an overly optimistic estimate of capacity.
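As a rough illustration of why a queuing model can predict delay where a pure demand model cannot, a textbook open single-queue (M/M/1-style) approximation relates response time to utilization; this only sketches the idea and is not the device queuing or layered queuing model the article uses:

\[
U = X \cdot D^{*}, \qquad R \approx \frac{D^{*}}{1 - U}.
\]

With the figure 2c demand of \(D^{*} = 49\) ms, the predicted response time grows from about 96 ms at \(U = 0.49\) to about 490 ms at \(U = 0.9\), even though the demand model alone reports only the utilization.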

Layered queuing. Our work uses a layered queuing model in which servers include software processes and thread pools, other kinds of logical resources, processors, and other kinds of devices.6 All these servers have queues and offer classes of services. Offered services of a component in figure 2 are modeled as multiple classes of services. The results graph in figure 1 shows how the LQM is more conservative than the QM, because it includes additional resources and resource contention. The cost of this additional information is that the LQM requires more information about the software. Because an LQM has a structure that follows the software structure, model elements representing components can be combined directly in the model. The Component-Based Modeling Language7 defines component submodels (possibly with complex internal structure, as indicated in figure 3) and is used to assemble layered models of components. CBML parameters are associated with model entities, as figure 2b indicates.

Simulation. The agile programming movement promotes development techniques and processes to achieve high flexibility for projects. Part of the methodology promotes using prototypes and evolving them to track performance problems. Alistair Cockburn suggests investigating performance issues using partial prototypes and simulators.9

Detecting and diagnosing problems

Once a COTS-based system is deployed and running, locating a performance issue (such as a long delay or high resource usage) can be challenging. Depending on the COTS product and the technologies, the diagnosis capabilities range from customizable and documented tracing to nothing. Figure 4 summarizes the detection and diagnosis methods applicable in COTS-based systems.

Figure 4. Strategies for detecting and diagnosing performance issues. (The decision chart chooses among profiling the integration code when profiling tools are available, collecting application-level traces such as Web server logs and database server traces, operating-system-level monitoring such as Windows Perfmon and Solaris DTrace, and low-level platform traces such as JVM execution traces and network logging, followed by statistical analysis, time correlation, or trace analysis to locate the issue.)

Integration code analysis

We can analyze integration code (such as EJB and JSP) using standard methods for custom software, such as profiling. Profilers help locate hot spots in the code and identify which functions are called most frequently. If the problem is in the integration code, statistical analysis of the most frequently called functions can point out the problem.
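As a small illustration of this kind of statistical analysis (the sample data and method names are hypothetical, and real profilers produce much richer output), the following sketch aggregates per-call samples into call counts and cumulative time and ranks the integration-code hot spots:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: rank integration-code methods by call count and cumulative time. */
public class HotSpotReport {

    /** One profiler sample: a method name and the elapsed time of one call (ms). */
    record Sample(String method, double elapsedMs) {}

    record Stats(long calls, double totalMs) {
        Stats add(double elapsedMs) { return new Stats(calls + 1, totalMs + elapsedMs); }
    }

    public static void main(String[] args) {
        // Hypothetical samples standing in for real profiler output.
        List<Sample> samples = List.of(
                new Sample("OrderBean.findOrders", 12.0),
                new Sample("OrderBean.findOrders", 15.5),
                new Sample("CatalogBean.lookup", 1.2),
                new Sample("OrderBean.findOrders", 14.1),
                new Sample("checkout.jsp:render", 3.4));

        Map<String, Stats> byMethod = new LinkedHashMap<>();
        for (Sample s : samples) {
            byMethod.merge(s.method(), new Stats(1, s.elapsedMs()),
                    (old, unused) -> old.add(s.elapsedMs()));
        }

        // Rank by cumulative time; frequently called, slow methods surface first.
        List<Map.Entry<String, Stats>> ranked = new ArrayList<>(byMethod.entrySet());
        ranked.sort((x, y) -> Double.compare(y.getValue().totalMs(), x.getValue().totalMs()));
        for (Map.Entry<String, Stats> e : ranked) {
            System.out.printf("%-25s %4d calls %8.1f ms total%n",
                    e.getKey(), e.getValue().calls(), e.getValue().totalMs());
        }
    }
}
```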

Application-level data collection

Some COTS components provide capabilities for tracing the components’ internal functions to diagnose and analyze performance. Performance guides in the product’s documentation also help us understand the traces, optimize performance, and locate performance issues. For instance, WebSphere’s Performance Monitoring Infrastructure collects performance data from various subcomponents (such as the J2EE client, the Java Database Connectivity driver, or the session data) through counters attached to each subcomponent. To interpret these counters, the documentation contains a guide called Monitoring and Performance.

Operating-system-level data collection

Operating-system-level data collection is available for most operating systems. Instrumentation built into the kernel monitors the use of low-level resources. The granularity and detail depend on the operating system. Sun Microsystems’ Dynamic Tracing (DTrace) toolkit10 offers a sophisticated framework to trace low-level activity on Solaris. Providers are loadable kernel modules that communicate with the DTrace kernel module using a well-defined API. For every point of instrumentation, providers call back into the DTrace framework to create a probe. Providers offer access to low-level data such as process information, system calls, and I/O statistics. The D language lets users specify arbitrary predicates and actions.

Windows (NT4 and later) contains a framework for performance monitoring. Counters are kernel objects that provide statistical data concerning performance usage. The Windows kernel provides counters for monitoring networking, processes, I/O, memory usage, and so forth. In addition to this low-level data, applications and services can provide counters. For instance, the .NET framework adds several counters to monitor the virtual-machine internals (just-in-time compilation, garbage collection, and so on).

Time-based correlation

The most common analysis method is time-based trace correlation. The traces of all the components involved in system execution are collected and correlated using a single property (which is usually time). This method is very effective for application-level traces, which already contain all the significant events. In such a case, we can identify performance issues by understanding how all the components are involved.


Tools such as the Windows performance monitor or IBM’s Tivoli tools rely on this technique. The Windows performance monitor displays real-time graphs of different performance counters. By correlating all the performance counters, an administrator can determine whether low-level resource usage or the system configuration is the cause of a performance problem. IBM’s Tivoli Enterprise Console uses event correlation and filtering techniques with rulesets for each product to diagnose performance problems.
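A minimal sketch of time-based correlation, assuming each source (for example, a Web server log and a database trace) has already been parsed into timestamped events; the event values are hypothetical, and the sketch simply merges the streams onto one timeline for inspection:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: merge events from several trace sources onto a single timeline. */
public class TimeCorrelation {

    record Event(Instant timestamp, String source, String description) {}

    /** Merge already-parsed event lists and sort them by timestamp. */
    static List<Event> correlate(List<List<Event>> sources) {
        List<Event> timeline = new ArrayList<>();
        sources.forEach(timeline::addAll);
        timeline.sort(Comparator.comparing(Event::timestamp));
        return timeline;
    }

    public static void main(String[] args) {
        // Hypothetical events standing in for parsed Web server and database traces.
        List<Event> webLog = List.of(
                new Event(Instant.parse("2005-07-01T10:00:00.120Z"), "web", "GET /checkout"),
                new Event(Instant.parse("2005-07-01T10:00:02.900Z"), "web", "response sent (2.8 s)"));
        List<Event> dbTrace = List.of(
                new Event(Instant.parse("2005-07-01T10:00:00.400Z"), "db", "long SELECT (2.3 s)"));

        // Inspecting the merged timeline shows the slow response overlapping the long query.
        for (Event e : correlate(List.of(webLog, dbTrace))) {
            System.out.printf("%s  [%s]  %s%n", e.timestamp(), e.source(), e.description());
        }
    }
}
```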

Platform trace analysis for COTS-based systems

Using only platform-level traces, our work on trace analysis lets us diagnose problems in COTS-based systems that don’t provide data collection mechanisms.11 (By platform, we mean any runtime platform, such as a virtual machine on which the COTS products are running.) Most virtual machines (for example, .NET or Java) offer an interface for monitoring execution and can trace events such as method entry and exit. For example, the JVM specifications include a mechanism for plugging in profilers called the Java Virtual Machine Profiling Interface (JVMPI), which can trace the following set of events without special compilation or instrumentation:

■ method events (method enter and exit);
■ object allocation, move, and free;
■ garbage collection start and finish;
■ thread start and end;
■ class load and unload; and
■ JVM initialization and shutdown.

Also, the JVMPI can generate heap and object dumps. Our method first collects data in a low-level execution trace from the virtual machine. We could also collect this low-level trace using instrumentation at the boundaries between interacting components. In this trace, the data collection tools record each method’s start time, end time, thread context, class information, and caller identifier. The low-level traces usually contain enough information to “guess” which method belongs to which component (for instance, using the package information in Java). To reduce the trace size, filters on methods, calls, or classes capture only the relevant COTS components.

Figure 5. Thread and cross-thread execution trace: constructing the execution trace using the low-level data collected. (emn denotes one execution of the method mn.)

The low-level trace’s thread context and execution context information let us reconstruct an execution trace containing the callers and callees of each method in the selected components. The upper part of figure 5 shows how to construct the execution trace using the low-level data collected: emn denotes one execution of the method mn. On thread 1, during the execution of m3, methods m4 and m5 were executed. m3 is consequently considered the caller of m4 and m5. The execution trace for a thread is a list of method execution couples (caller, callee). In this case, on thread 1, it contains ((em3, em4), (em3, em5)).

In multithreaded environments, resource accesses are often delegated to another thread. If access to resources on one thread causes a performance issue in another thread, then an execution trace per thread wouldn’t link the events in one thread to the resource accesses. To solve that problem, a cross-thread execution trace (represented in the lowest part of figure 5) is built by merging the threads and correlating all events on the same timeline. On thread 1, methods em4 and em5 are executed during method m3. On thread 2, methods em1 and em2 are executed (for instance, the database calls). When all events are considered on the same thread, the execution of em3 is the cause of em2, em4, and em5. This attempt to extract the interactions between the threads aims to provide a better understanding of the situation.

The execution trace is constructed for any pair of components (Cn, Cm), where Cn is usually the integration code and Cm is another COTS component. The callers considered are from Cn and the callees from Cm. For instance, if the integration code (located in component C1) is using components C2 and C3, then execution traces are calculated for (C1, C2) and (C1, C3). Depending on the pair of components selected, we choose either the execution trace per thread or the cross-thread execution trace. Using this execution trace, we can calculate statistics on COTS component usage. This step lets us identify the system components responsible for performance issues. Then, we can analyze the statistics to offer a high-level way to interpret results with a simple numeric scale (called severity levels). The severity levels are attached to methods in the integration code to point out which methods cause the most accesses in the COTS components responsible for performance issues. Our prototype can display the results in the source code in Eclipse, a major open source integrated development environment. (See figure 1, which highlights severity levels in red in the source code.)
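The sketch below illustrates the per-thread reconstruction described above, assuming the low-level trace has already been reduced to method-enter/exit events carrying a thread identifier and a timestamp; the event layout is illustrative, not the JVMPI record format. A stack per thread recovers the (caller, callee) couples, and running it on the thread 1 scenario of figure 5 yields ((em3, em4), (em3, em5)).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: rebuild (caller, callee) couples per thread from method enter/exit events. */
public class ExecutionTrace {

    enum Kind { ENTER, EXIT }

    /** Illustrative low-level trace record; not the JVMPI event layout. */
    record Event(long timeMs, int threadId, String method, Kind kind) {}

    record Couple(String caller, String callee) {}

    /** A method is the caller of every method entered while it is on top of that thread's stack. */
    static Map<Integer, List<Couple>> perThreadCouples(List<Event> trace) {
        Map<Integer, Deque<String>> stacks = new HashMap<>();
        Map<Integer, List<Couple>> couples = new HashMap<>();
        for (Event e : trace) {                       // events are assumed to be in time order
            Deque<String> stack = stacks.computeIfAbsent(e.threadId(), t -> new ArrayDeque<>());
            if (e.kind() == Kind.ENTER) {
                if (!stack.isEmpty()) {
                    couples.computeIfAbsent(e.threadId(), t -> new ArrayList<>())
                           .add(new Couple(stack.peek(), e.method()));
                }
                stack.push(e.method());
            } else {
                stack.pop();
            }
        }
        return couples;
    }

    public static void main(String[] args) {
        // Thread 1 from figure 5: m3 runs and calls m4, then m5.
        List<Event> trace = List.of(
                new Event(0, 1, "m3", Kind.ENTER),
                new Event(2, 1, "m4", Kind.ENTER), new Event(4, 1, "m4", Kind.EXIT),
                new Event(5, 1, "m5", Kind.ENTER), new Event(7, 1, "m5", Kind.EXIT),
                new Event(9, 1, "m3", Kind.EXIT));

        // Prints {1=[Couple[caller=m3, callee=m4], Couple[caller=m3, callee=m5]]}
        System.out.println(perThreadCouples(trace));
    }
}
```

The cross-thread trace in the lower part of figure 5 would be obtained by merging all threads’ events onto one timeline and treating the method executing on the originating thread as the cause of the events it overlaps, as the article describes.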

Aggressively managing performance issues

Combining modeling and measurement techniques not only warns of potential problems but also leads to possible solutions. We can address performance issues at each development stage. In early planning, we assemble a model to represent the intended system, based on component submodels created from prior knowledge and testing.


About the Authors

Erik Putrycz is a research associate at the Software Engineering Group of the National Research Council of Canada. He is currently doing research on integration and interoperability in COTS-based systems with a focus on performance issues. He received his PhD from Institut National des Telecommunications (Evry, France), where he did research on middleware, load balancing, and resource management for large-scale networks. Contact him at [email protected].

Murray Woodside taught at Carleton University in Ottawa until his recent retirement and now continues research and teaching with an appointment as Distinguished Research Professor. His research interests include all aspects of software performance and dependability. Much of this work is based on active servers (also known as layered queuing), which he invented and has applied to various kinds of distributed systems. He received his PhD in control engineering from Cambridge University, England. He’s an associate editor of Performance Evaluation. Contact him at [email protected].

Xiuping Wu is a PhD student in the Department of Systems and Computer Engineering at Carleton University. Her main research interest is software performance prediction—in particular, she’s focusing on prediction techniques for component-based software systems and on software architectures and performance testing. Contact her at the Dept. of Systems and Computer Eng., Carleton Univ., 1125 Colonel By Dr., Ottawa, Canada K1S 5B6; [email protected].

We can integrate performance requirements into the model. If the model’s predictions show inadequate performance, we can reconstruct it (following the approaches introduced earlier for improving the final product). The difference is that, using models, we can do this earlier and faster and can explore numerous possible solutions.

During the middle design stages, we can interpret measurements on a single component (using the model) to estimate its effect on the entire system. We can diagnose and solve problems identified in a given component using trace analysis (as described earlier), before integration begins.

In late stages, once the system is deployed and running, we can extend the results of system testing (again using the model) to analyze scalability limits at a scale beyond the laboratory’s scope. Our research on trace analysis brings new solutions for diagnosing and locating the components responsible for performance issues. In addition to the diagnosis, our analysis links performance issues to the glue code and helps developers react.

Thus, we can identify, diagnose, and resolve performance issues at every stage and can refine the model and results in a spiral model. The essence of this approach is to combine better data with models that can leverage its usefulness. This exploits a key characteristic of COTS systems, which is that working components that can be measured and modeled are available before the project begins.

COTS-based software performance demands more powerful investigative methods than custom software does. This is particularly important when components include internal concurrency, as is the case in J2EE application servers. We need component-based performance modeling to drive system planning, using layered modeling when considering concurrency, and we need high-level traces (application-level, if possible, supported by platform traces) to capture measurements related to these structures and diagnose performance issues.

References

1. S.A. Hissam et al., “Packaging Predictable Assembly,” Proc. 1st Int’l IFIP/ACM Working Conf. Component Deployment, LNCS 2370, Springer-Verlag, 2002, pp. 108–124.
2. S. Balsamo et al., “Model-Based Performance Prediction in Software Development: A Survey,” IEEE Trans. Software Eng., vol. 30, no. 5, 2004, pp. 295–310.
3. A. Bertolino and R. Mirandola, “Towards Component-Based Software Performance Engineering,” Proc. 6th Int’l Conf. Software Eng. Workshop Component-Based Software Eng., Carnegie Mellon Univ., 2003; www.csse.monash.edu.au/~hws/cgi-bin/CBSE6/Proceedings/papersfinal/p8.pdf.
4. H. Gomaa and D. Menasce, “Design and Performance Modeling of Component Interconnection Patterns for Distributed Software Architectures,” Proc. 2nd Int’l Workshop on Software and Performance (WOSP 00), ACM Press, 2000, pp. 117–126.
5. S. Chen et al., “Performance Prediction of COTS Component-Based Enterprise Applications,” Proc. 5th Int’l Conf. Software Eng. Workshop Component-Based Software Eng.: Benchmarks for Predictable Assembly, Carnegie Mellon Univ., 2002; www.sei.cmu.edu/pacc/CBSE5/liu-cbse5-29.pdf.
6. R.G. Franks et al., “Performance Analysis of Distributed Server Systems,” Proc. 6th Int’l Conf. Software Quality, Am. Soc. for Quality Control Software Division, 1996, pp. 15–26.
7. X. Wu and M. Woodside, “Performance Modeling from Software Components,” Proc. 4th Int’l Workshop Software and Performance (WOSP 04), ACM Press, 2004, pp. 290–301.
8. S. Balsamo and M. Marzolla, “A Simulation-Based Approach to Software Performance Modeling,” ACM SIGSOFT Software Eng. Notes, vol. 28, no. 5, 2003, pp. 363–366.
9. A. Cockburn, “Learning from Agile Software Development—Part One,” CrossTalk: J. Defense Software Eng., vol. 15, no. 10, 2002, pp. 10–14.
10. B.M. Cantrill, M.W. Shapiro, and A.H. Leventhal, “Dynamic Instrumentation of Production Systems,” Proc. 2nd Swiss Unix Conf. (SUCON 04), USENIX, 2004, pp. 1–6.
11. E. Putrycz, “Using Trace Analysis for Improving Performance in COTS Systems,” Proc. 2004 Centre for Advanced Studies Conf. Collaborative Research, IBM Press, 2004, pp. 68–80.