
The Why and the How of Building Performance into Application Development

Executive Summary

The traditional approach to application performance management, whereby IT organizations focus on managing individual technology domains, is fundamentally flawed. The flaws show up in the fact that, in the majority of cases when the performance of an application begins to degrade:

• The degradation is noticed first by the end user and not by the IT organization.
• The IT organization is unaware of the cause of the degradation and so has to bring together a team of people to identify the cause and solve the problem.

In order to respond more effectively to the existing challenges of managing application performance, as well as the challenges brought about by the increasing adoption of virtualization, cloud computing and mobility, IT organizations must:

• Create linkages between their application development organization and their infrastructure operations organization.
• Perform effective testing of application performance both during application development and prior to implementing a significant change to the IT infrastructure.

This paper includes the story of a large financial institution whose IT organization used application performance management tools to build credibility with the business partners who fund IT. As will be described, those tools have also enabled those business partners to make informed decisions that affect the functionality and performance of an application during the development process.

Representatives of The Financial Institution’s IT organization stated that the most important application performance management functionality that they have is their ability to:

• Group key metrics and present them on an easy-to-understand dashboard
• See transactions across multiple tiers as well as across downstream processes
• Closely mirror the production environment

The Financial Institution conducted a study to quantify the ROI that is associated with using application performance management tools in a pre-production environment. That study showed that The Financial Institution’s use of application performance management tools in a pre-production environment enabled them to:

• Find defects that would otherwise never have been found
• Provide recommendations for quality improvement
• Significantly reduce the amount of time and resources that are allocated to firefighting

The ROI study also showed that the payback period for the investment they made was roughly 6 months.


Introduction

On an ever-increasing basis, the typical organization’s core business processes rely on a combination of applications and the supporting IT infrastructure. One implication is that if those applications, or the infrastructure that supports them, are not performing well, then neither are those core business processes.

In order to ensure acceptable application performance, and hence help to ensure acceptable business performance, IT organizations have traditionally implemented an approach to service management whereby they focus primarily on managing individual technology domains, with the assumption that if each domain is performing well, then the application is performing well. That approach is no longer acceptable.

The issue isn’t that managing technology domains is bad. It isn’t. Managing technology domains has been and will continue to be one of the key roles of the IT organization. However, as described in Application Performance Management: New Challenges Demand a New Approach [1], the approach of managing individual technology domains and assuming that if each domain is performing well then the application is performing well is fundamentally flawed. One manifestation of the flaws in that approach is that in the majority of cases when the performance of an application begins to degrade, the degradation is noticed first by the end user and not by the IT group.

One goal of this white paper is to describe an innovative process that drives efficiency in the way IT organizations develop and manage applications. This process enables IT organizations to reduce development time and deliver higher quality applications. Another goal of this white paper is to describe how a new approach to application performance management can enable IT organizations to better manage application performance.

To help achieve those goals, this white paper incorporates the results of a survey that was distributed in April 2012 to the subscribers of Webtorials. Throughout this white paper, the 190 IT professionals who completed that survey will be referred to as The Survey Respondents. In addition, this white paper incorporates the results of an interview that was conducted with a senior performance engineer and a manager of performance engineering at a major financial institution, and it includes the results of an ROI study that the financial institution conducted. The study focused on quantifying the ROI associated with their use of application performance management tools in a pre-production environment. Throughout this white paper, the company that was interviewed will be referred to as The Financial Institution and the interviewees will be referred to respectively as The Performance Engineer and The Performance Manager.

The Current Environment

The Survey Respondents were asked “When your IT organization is in the process of either developing or acquiring an application, how much attention does it pay to how well that application will perform over the WAN?” Their answers are shown in Figure 1.

[1] http://www.webtorials.com/content/2012/09/apm-new-challenges-demand-a-new-approach.html

The data in Figure 1 shows that almost three quarters of all IT organizations place at most moderate emphasis on performance while either developing or acquiring an application. The lack of emphasis on an application's performance over the WAN during application development often results in the development and implementation of applications that run poorly once they are placed into production. One of the reasons for that phenomenon is that, due to factors such as chatty protocols, an application can run well over a high-speed, low-latency LAN in a development environment but run poorly over a relatively low-speed, high-latency WAN in a production environment.

The Survey Respondents were given a set of outcomes and were asked to indicate the type of outcome that typically occurs if one or more of their company’s business critical applications are performing badly. They were allowed to indicate multiple outcomes. The outcomes that were mentioned most often are shown in Table 1.

Impact                                 Percentage
The Company Loses Revenue              62.0%
IT Teams Are Pulled Together           59.8%
Company Loses Customers                45.1%
CIO Gets Pressure from his/her Boss    45.1%
Harder for IT to get Funding           44.6%

Table 1: Impact of Poor Performance

One conclusion that can be drawn from the data in Table 1 is that if a business critical application is performing poorly, it has a very significant business impact, as the company loses both revenue and customers. The fact that IT teams are pulled together in the majority of instances in which a business critical application is performing badly is further evidence that the current approach to application performance management is not effective. That follows because if the current approach were effective, IT organizations would be able to rapidly identify the cause of degraded application performance and wouldn’t need to pull teams together to do so.
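The earlier point about chatty protocols can be made concrete with some rough arithmetic. The sketch below is only an illustration: the simple model (round trips multiplied by round-trip time, plus payload transfer time) and every number in it are assumptions chosen for the example, not measurements from the survey or from The Financial Institution.

```python
def estimated_response_time(round_trips, rtt_seconds, payload_bytes, bandwidth_bps):
    """Rough estimate: time waiting on round trips plus time to move the payload."""
    latency_cost = round_trips * rtt_seconds
    transfer_cost = (payload_bytes * 8) / bandwidth_bps
    return latency_cost + transfer_cost


# Hypothetical transaction: 200 application-level round trips, 1 MB of data.
CHATTY_ROUND_TRIPS = 200
PAYLOAD_BYTES = 1_000_000

# Assumed LAN: 1 ms round-trip time, 1 Gbit/s.
lan_seconds = estimated_response_time(CHATTY_ROUND_TRIPS, 0.001, PAYLOAD_BYTES, 1_000_000_000)

# Assumed WAN: 80 ms round-trip time, 10 Mbit/s.
wan_seconds = estimated_response_time(CHATTY_ROUND_TRIPS, 0.080, PAYLOAD_BYTES, 10_000_000)

print(f"LAN estimate: {lan_seconds:.3f} s")
print(f"WAN estimate: {wan_seconds:.3f} s")
```

Under these assumed values, almost all of the WAN time comes from the round trips rather than from the bandwidth, which is why an application that looks fine in a LAN-based development environment can behave very differently in production.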


The approach of regularly pulling IT teams together to troubleshoot problems has a number of negative side effects. One of these side effects is that it increases both the amount of time and the amount of resources that it takes to resolve problems. Another negative side effect is that this approach tends to create an adversarial environment. For example, in the past it was somewhat common for the performance testing group within The Financial Institution to lack insight into the root cause of degraded application performance and, as a result, blame was often placed on their test software, test scripts or even the load test itself. One of the group’s performance engineers described this approach to troubleshooting as “Lots of blaming, unless detailed data was given in regards to the problem area (e.g., DBA, network, application server). Most people were very defensive unless empirical data was given.”

Complicating Factors

It is extremely difficult both to manage the existing services and to provide innovative solutions in a traditional IT environment, such as the one shown in Figure 2. This environment is characterized by branch office employees using a laptop or desktop computer and a wired network to access an application that resides on physical servers in one of the company’s data centers.

Figure 2: Traditional IT Environment

However, as described in detail in Application Performance Management: New Challenges Demand a New Approach, there are several key trends, including the trends described below, which are making it dramatically more difficult for IT organizations to manage the existing IT services, let alone provide innovative solutions.

• Server Virtualization
The traditional approach to management must change so that it is performed on a virtual machine (VM)-by-VM basis. Failure to do so means that IT organizations can’t perform key application performance management tasks such as baselining the performance of an application that runs in a VM.

• Cloud Computing
One of the new challenges associated with the adoption of public and/or hybrid cloud computing solutions is that IT organizations are typically held responsible for the performance of these solutions even though in most cases they don’t have the same access to the enabling IT infrastructure that they would have if the application were entirely intracompany.

• Mobility
The new generation of mobile devices (e.g., smartphones and tablets) doesn’t have the same management functionality as PCs do. As a result, IT organizations typically lose the ability to manage the performance of applications that are accessed using these devices.
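As a rough illustration of what baselining on a VM-by-VM basis might look like, the sketch below keeps a separate response-time threshold for each VM. The threshold rule (mean plus three standard deviations), the VM names and the sample values are all assumptions made for this example; the paper does not prescribe a particular baselining method.

```python
from statistics import mean, stdev


def build_baselines(samples_by_vm, k=3.0):
    """Return a per-VM threshold of mean + k * standard deviation (an assumed rule)."""
    baselines = {}
    for vm, samples in samples_by_vm.items():
        if len(samples) < 2:
            continue  # not enough history to estimate variability
        baselines[vm] = mean(samples) + k * stdev(samples)
    return baselines


def exceeds_baseline(baselines, vm, observed_seconds):
    """True only if the VM has a baseline and the observation is above it."""
    threshold = baselines.get(vm)
    return threshold is not None and observed_seconds > threshold


# Hypothetical response-time history (seconds) for two hypothetical VMs.
history = {
    "vm-app-01": [0.40, 0.42, 0.38, 0.41, 0.39],
    "vm-db-01": [1.10, 1.05, 1.20, 1.15, 1.00],
}
baselines = build_baselines(history)

# The same 0.9 s observation is abnormal for one VM and normal for the other.
print(exceeds_baseline(baselines, "vm-app-01", 0.9))
print(exceeds_baseline(baselines, "vm-db-01", 0.9))
```

The design point is simply that a single threshold shared across all VMs would either miss the problem on the first VM or raise false alarms on the second.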

Performance Testing Within The Financial Institution

In order to avoid both the negative impacts of poor application performance that were highlighted in Table 1 and the previously discussed side effects of the current approach to troubleshooting, IT organizations must make two major changes. One of those major changes is that IT organizations need to create linkages between their application development organization and their infrastructure operations organization. These linkages are often referred to as DevOps, and The Survey Respondents were asked to indicate the degree to which their IT organization had already established these linkages. Their responses are shown in Table 2.

Status                                                          Percentage of Respondents
Little if any attempt to establish these linkages               13.9%
We don’t have formal DevOps teams, but we have established
goals and processes to link the two organizations               32.2%
We have formal DevOps teams                                     53.9%

Table 2: Status of DevOps Deployment

The data in Table 2 indicates that the majority of IT organizations recognize the importance of creating linkages between their application development organization and their infrastructure operations organization and have taken steps to create those linkages.

The second major change that IT organizations need to make is to test application performance effectively both during application development and before implementing a significant change to the IT infrastructure. One goal of this testing is to identify how changes in the infrastructure drive the need for more capacity and/or affect application performance. Another goal of this testing is to eliminate as many potential sources of degraded application performance as possible in the development and test phase and hence minimize the number of instances of degraded performance that occur once the application is in production.

The Financial Institution’s performance test team was founded in 1998 and at the time consisted of two people. Today there are fifty members of the team, and the focus of the team is primarily on identifying potential performance problems during application development. While the performance test team does not directly support ongoing operations, they do have a role to play. As part of that role, the performance test team assists with troubleshooting and the triage process of finding the root cause of degraded application performance.


The Performance Engineer stated that over time the use of an effective application performance management tool is one of the factors that has enabled the team to build credibility, both internally with the business partners who fund IT and externally with the IT vendors that the institution uses. Another factor that has helped them to build credibility is that they continually gather data on what is happening in the production network and the response times that are being experienced. This data helps the team to validate and/or modify their testing.

The Performance Engineer pointed out that there is a lot of focus within The Financial Institution on setting and meeting SLAs. He gave the example of an application recently under development whose SLA called for a five second response time, whereas the pre-production testing showed that there would be a nine second response time. This data was brought to their business partners because “It is the business partners that have to determine how they want to spend their funds. They could require code changes or they could require broader architectural changes.” The Performance Engineer went on to say that the pre-production test data that they bring to their business partners helps the business partners decide how much of their funds they will spend on the functionality of a new application and how much they will spend on the performance of that application.

The Performance Manager emphasized the importance of starting performance testing early in the application development process and continuing it throughout that process.
She added that “We are talking about millions of dollars to develop these applications and if there is a performance issue on a key transaction, our business partners want to know about that early on so that they can make the decision whether or not to shift additional resources to the project.” According to The Performance Manager, their business partners are technologically savvy and many of them “Want to see hard core test data and that they really like that data.”

In terms of their current test environment, the two interviewees agreed that the most important functionality that they have is their ability to:

• Group key metrics and present them on an easy-to-understand dashboard
• See transactions across multiple tiers as well as across downstream processes
• Perform transaction traces
• Closely mirror the production environment
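The SLA example described above (a five second target against roughly nine seconds observed in pre-production testing) amounts to comparing measured response times with a target. The sketch below shows one way such a check could be expressed. The use of a 95th percentile, the nearest-rank method and the sample values are assumptions for illustration; the paper does not say how The Financial Institution summarizes its test results.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def check_sla(samples, target_seconds, pct=95):
    """Compare the chosen percentile of measured response times with the SLA target."""
    observed = percentile(samples, pct)
    return {
        "observed": observed,
        "target": target_seconds,
        "meets_sla": observed <= target_seconds,
    }


# Hypothetical pre-production response times, in seconds.
measured = [8.2, 8.7, 9.0, 9.1, 8.9, 9.3, 8.8, 9.0, 9.2, 8.6]

result = check_sla(measured, target_seconds=5.0)
print(result)
```

A result of this kind is the sort of empirical data that can be taken to business partners when deciding whether to fund code changes or broader architectural changes.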

As previously noted, The Financial Institution conducted a study to quantify the ROI that is associated with using application performance management tools in a pre-production environment. One of the results of that study is that The Financial Institution’s use of application performance management tools in a pre-production environment enabled them to:

• Reduce the impact on planned projects
• Find defects that would otherwise never have been found
• Provide recommendations for quality improvement
• Significantly reduce the amount of time and resources that are allocated to firefighting

The study also quantified some of the key ROI metrics that The Financial Institution achieved across a three-year period. Those metrics are summarized in Table 3.

                           Best Case    Most Likely    Worst Case
Simple ROI                 393%         304%           197%
Payback (months)           6            6              8
Internal Rate of Return    280%         225%           156%

Table 3: Summary of the ROI Analysis

The data in Table 3 highlights the significant savings that are associated with using effective application performance management tools in a pre-production environment.

When asked about their future direction relative to testing application performance, The Performance Manager pointed out that their test environment will have to become even more sophisticated, in part because their production environment is already complex and is getting more so. She added that even though they have already demonstrated significant savings, “The push to do things faster and cheaper is enormous and will become even more intense over the next year or two. It is like being in the grip of a vice.”

While it is possible to test application performance in a traditional lab environment with live systems, this approach tends to be costly. A notably more effective and efficient approach that companies such as The Financial Institution can take going forward is to implement a virtualized test environment in which the quality assurance organization clones a target system’s dynamic behavior, performance and data so that the need for live systems is either eliminated or at least notably reduced.
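For readers less familiar with the three metrics in Table 3, the sketch below shows common textbook definitions of simple ROI, payback period and internal rate of return. It is not a reconstruction of The Financial Institution’s study: the paper does not state which formulas the study used, and the cost and benefit figures below are hypothetical values chosen only to exercise the calculations.

```python
def simple_roi(total_benefit, total_cost):
    """Net gain divided by cost, as a fraction (3.5 means 350%)."""
    return (total_benefit - total_cost) / total_cost


def payback_months(upfront_cost, monthly_net_benefit):
    """Months needed for cumulative benefit to cover the upfront cost."""
    return upfront_cost / monthly_net_benefit


def npv(rate, cash_flows):
    """Net present value of yearly cash flows; index 0 is the initial outlay."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=-0.99, high=10.0, tolerance=1e-7):
    """Rate at which NPV is zero, found by bisection.

    Assumes a single sign change: NPV is positive at `low` and negative at `high`.
    """
    for _ in range(200):
        mid = (low + high) / 2
        if npv(mid, cash_flows) > 0:
            low = mid
        else:
            high = mid
        if high - low < tolerance:
            break
    return (low + high) / 2


# Hypothetical figures: an outlay of 100 units, then 150 units of benefit per year for 3 years.
cash_flows = [-100.0, 150.0, 150.0, 150.0]

roi = simple_roi(total_benefit=450.0, total_cost=100.0)
months = payback_months(upfront_cost=100.0, monthly_net_benefit=150.0 / 12)
rate = irr(cash_flows)

print(f"Simple ROI: {roi:.0%}")
print(f"Payback: {months:.1f} months")
print(f"IRR: {rate:.0%}")
```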

Summary and Call to Action

The typical application performance management environment today has three key characteristics. In the majority of instances:

1. When the performance of an application has degraded, the degradation is noticed first by the end user and not by the IT organization.
2. When the performance of a business critical application has degraded, root cause analysis takes an extended period of time and requires the involvement of multiple groups within the IT organization.
3. When the performance of a business critical application has degraded, it has a very significant negative impact on both the business and the IT organization.

If IT organizations don’t overcome the issues listed above, they will become increasingly irrelevant. In order to overcome those issues, IT organizations need to adopt a new approach to application performance management. This new approach must focus not on technology domains, but on business transactions. For this new approach to enable IT organizations to rapidly identify the root cause of degraded application performance, it must include end-to-end monitoring of every end-user transaction and the ability to perform deep-dive problem diagnostics. Since cloud service providers will process a growing number of business transactions, the ability to perform synthetic transaction monitoring of those providers is a key component of the new approach to application performance management. Because of the trends described in this white paper, as part of this new approach, management data must be gathered across both physical and virtual infrastructure components, and from within the enterprise as well as from network and cloud service providers.

While implementing a more effective approach to application performance management will add considerably to the value that an IT organization provides to the company’s business unit managers, IT organizations increase their value even more by being able to test application performance both during application development and prior to implementing a significant change to the IT infrastructure, such as mobility, server virtualization or cloud computing. This follows because having this ability enables IT organizations to eliminate most of the potential sources of degraded application performance, and it also positions the IT organization to be regarded as the trusted advisor to the company’s business unit managers. In order for this testing to be affordable and to not impact the application development process, IT organizations need to implement a virtualized test environment in which the quality assurance organization clones a target system’s dynamic behavior, performance and data so that the need for live systems is either eliminated or at least notably reduced.

The greatest added value, however, comes if the IT organization uses the same tools to evaluate the performance of an application during development, or prior to implementing a significant change, as it does to manage the ongoing performance of that application. This follows because if the application development teams and the operations team have a common view of the application and of the factors that cause performance degradation, the amount of time and resources that it takes to identify the root cause of degraded application performance is dramatically reduced, and the perceived quality of the application is dramatically improved.
