Risk Analysis, Vol. 22, No. 2, 2002

Risk Filtering, Ranking, and Management Framework Using Hierarchical Holographic Modeling

Yacov Y. Haimes,¹ Stan Kaplan,¹ and James H. Lambert¹

This paper contributes a methodological framework to identify, prioritize, assess, and manage risk scenarios of a large-scale system. Qualitative screening of scenarios and classes of scenarios is appropriate initially, while quantitative assessments may be applied once the set of all scenarios (hundreds) has been prioritized in several phases. The eight-phase methodology is described in detail and is applied to operations other than war. The eight phases are as follows. Phase I, Scenario Identification—A hierarchical holographic model (HHM) is developed to describe the system's "as planned" or "success" scenario. Phase II, Scenario Filtering—The risk scenarios identified in Phase I are filtered according to the responsibilities and interests of the current system user. Phase III, Bi-Criteria Filtering and Ranking—Scenarios are filtered through an ordinal matrix of likelihood and consequence. Phase IV, Multi-Criteria Evaluation—The remaining scenarios are rated on their ability to defeat the resilience, robustness, and redundancy of the system. Phase V, Quantitative Ranking—We continue to filter and rank scenarios based on quantitative assessments of likelihood and consequence. Phase VI, Risk Management—Management options for dealing with the filtered scenarios are identified, and the cost, performance benefits, and risk reduction of each are estimated. Phase VII, Safeguarding Against Missing Critical Items—We examine the performance of the options selected in Phase VI against the scenarios previously filtered out during Phases II to V. Phase VIII, Operational Feedback—We use the experience and information gained during application to refine the scenario filtering and decision processes of earlier phases. These eight phases reflect a philosophical approach rather than a mechanical methodology. In this philosophy, the filtering and ranking of discrete scenarios is viewed as a precursor to, rather than a substitute for, consideration of the totality of all risk scenarios.
KEY WORDS: Risk filtering; risk assessment; risk management; hierarchical holographic modeling

1. INTRODUCTION

If we adopt the definition of risk as a "set of triplets" (Kaplan and Garrick 1981), then it is clear that the first and most important step in a quantitative risk analysis (QRA) is identifying the set of risk scenarios, Si. If the number of such scenarios is large, then the second step must be to filter and rank the scenarios according to their importance, as determined by their likelihood and consequence.¹

¹ Center for Risk Management of Engineering Systems, University of Virginia.

0272-4332/02/0400-0383$22.00/1 © 2002 Society for Risk Analysis

The need for such ranking arises in a variety of situations. For example: thousands of military and civilian sites have been identified as contaminated with toxic substances; myriad risk scenarios are commonly identified during the development of software-intensive engineering systems; and thousands of mechanical and electronic components of the Space Shuttle are placed on a critical item list (CIL) in an effort to reveal significant contributors to program risk. In all such risk identification procedures we must then prioritize a large number of risk scenarios according to their individual contributions to the overall system risk. A dependable and efficient ranking and filtering of identified risk elements can be an important aid toward systematic risk control and reduction.

Infrastructure operation and protection highlights the challenges of risk filtering, ranking, and management in large-scale systems. Our man-made engineered systems are becoming increasingly vulnerable to natural and willful hazards; these systems include telecommunications, electric power, gas and oil, transportation, water-treatment plants, water-distribution networks, dams, and levees. Fundamentally, such systems have a large number of components and subsystems. Most water-distribution systems, for example, must be addressed within a framework of large-scale systems, where a hierarchy of institutional and organizational decision-making structures (e.g., federal, state, county, and city) is often involved in their management (Haimes et al. 1997). Coupling exists among the subsystems (e.g., the overall budget constraint imposed on the entire system), and this further complicates their management. A better understanding of the interrelationships among natural, willful, and accidental hazards is a logical step toward improving the protection of critical national infrastructures. Such efforts should build on the experience gained over the years from the recovery and survival of infrastructures assailed by natural and human hazards. Furthermore, it is imperative to model critical infrastructures as dynamic systems in which current decisions have impacts on future consequences and options.

Within the activity known as total risk management of a system (Haimes 1991), the term risk assessment means identifying the "risk scenarios," i.e., determining what can go wrong in the system and all the associated consequences and likelihoods.
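The triplet view of risk, and the subsequent ranking by likelihood and consequence, can be sketched in a few lines of code. This is a minimal illustration only; the scenarios, probabilities, and consequence values below are invented, and expected consequence is just one simple importance measure (the RFRM phases refine such ranking with qualitative screening first).

```python
from dataclasses import dataclass

@dataclass
class RiskTriplet:
    scenario: str       # s_i: what can go wrong
    likelihood: float   # l_i: probability of the scenario
    consequence: float  # x_i: consequence measure (units assumed, e.g., $M)

# Invented examples echoing the situations named above.
triplets = [
    RiskTriplet("toxic contamination at a site", 0.30, 12.0),
    RiskTriplet("fault in software-intensive system", 0.05, 40.0),
    RiskTriplet("CIL component failure", 0.01, 500.0),
]

# Rank by expected consequence (likelihood times consequence).
ranked = sorted(triplets, key=lambda t: t.likelihood * t.consequence, reverse=True)
```

Note that the rare, high-consequence scenario ranks first here even though its probability is lowest, which is exactly why both coordinates of the triplet matter.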
The next steps are to generate mitigation options; evaluate each in terms of its cost, benefit, and risk tradeoffs; and then decide which options to implement and in what order. Filtering and ranking aid this decision process by focusing attention on those scenarios that contribute the most to the risk.

This article presents a methodological framework to identify, prioritize, assess, and manage scenarios of risk to a large-scale system from multiple overlapping perspectives. The organization of the article is as follows. After reviewing earlier efforts in risk filtering and ranking, we discuss hierarchical holographic modeling as a method for identification of risk scenarios. Next we describe the guiding principles and the eight phases of the developed methodological framework. This is followed by an example applying the framework to a mission in support of an operation other than war (OOTW). Finally, we offer conclusions and opportunities for future work.

2. PAST EFFORTS IN RISK FILTERING AND RANKING

Most real systems are exposed to numerous sources of risk. Over the last two decades, the problem of ranking and prioritizing these sources has challenged not only decisionmakers, but the risk analysis community as well.

Sokal (1974) discusses classification principles and procedures, distinguishing between two methods: monothetic and polythetic. The monothetic category establishes classes that differ by at least one property that is uniform among members of each class, whereas the polythetic classification groups individuals or objects that share a large number of traits but do not necessarily agree on any one trait.

Webler et al. (1995) outline a risk ranking methodology through an extensive survey example dealing with an application of sewage sludge on New Jersey farmland. Working with expert and lay community groups, two perceptions of risk are developed and categorized, and weights are used to balance the concerns of the two groups. They demonstrate how discussion-oriented approaches to risk ranking can supplement current methodological approaches, and present a taxonomy that addresses the substantive need for public discussion about risk.

Morgan et al. (1999, 2000) propose a ranking methodology designed for use by federal risk management agencies, calling for interagency taskforces to define and categorize the risks to be ranked. The taskforces would identify the criteria that all agencies should use in their evaluations.
The ranking would be done by four groups: federal risk managers drawn from inside and outside the concerned agency, laypeople selected somewhat randomly, a group of state risk managers, and a group of local risk managers. Each ranking group would follow two different procedures: (1) a reductionist and analytic approach, and (2) a holistic and impressionistic approach. The results would then be combined to produce a refined ranking, and the four groups would meet together to discuss their findings. In a recent contribution in this area, "Categorizing Risks for Risk Ranking," Morgan et al. (2000) discuss the problems inherent in grouping a large number of risk scenarios into easily managed categories, and argue that such risk categories must be evaluated with respect to a set of criteria. This is particularly important when hard choices must be made in comparing and ranking thousands of specific risks. The ultimate risk characterization should be logically consistent, administratively compatible, equitable, and compatible with cognitive constraints and biases.

Baron et al. (2000) conducted several extensive surveys of experts and nonexperts in risk analysis to ascertain their priorities as to personal and government action for risk reduction, taking into account the severity of the risk, the number of people affected, worry, and probabilities of hazards to self and others. A major finding of these surveys "is that concern for action, both personal and government, is strongly related to worry. Worry, in turn, is affected mainly by beliefs about probability."

A risk ranking and filtering (RRF) methodology was developed for the purpose of prioritizing the results of failure modes and effects analyses (FMEAs) (CRMES 1991; Haimes 1998). This risk prioritization methodology considers multiple quantitative factors, such as reliability estimates, as well as qualitative factors, such as expert rankings of component criticality.

3. HIERARCHICAL HOLOGRAPHIC MODELING (HHM)

It is important to improve our understanding of the intricate interdependencies of our critical infrastructures. Any methodology for doing so must therefore be comprehensive and holistic, addressing the hierarchical institutional, organizational, managerial, and functional decision-making structures, in conjunction with other determining factors. Since many organizational as well as technology-based systems are hierarchical in nature, the risk management of such systems is driven by this reality and must be responsive to it.
The risks associated with each subsystem within the hierarchical structure contribute to, and ultimately determine, the risks to the overall system. The distribution of risks among the subsystems often plays a dominant role in the allocation of resources. The aim is to achieve a level of risk that is deemed acceptable in a judgmental decision-making process, taking into consideration the tradeoffs among all the costs, benefits, and risks.

Hierarchical holographic modeling has been extensively and successfully used for identifying risk scenarios in numerous projects (Haimes 1981, 1998; Lambert et al. 2001). The HHM framework was developed because it is impractical to represent all the important and critical aspects of a complex system within a single model. HHM offers multiple visions and perspectives, which add strength to a risk analysis. It has been deployed to study risks for government agencies such as the President's Commission on Critical Infrastructure Protection (PCCIP), the FBI, NASA, the Virginia Department of Transportation (VDOT), and the National Ground Intelligence Center, among others.

The HHM methodology and philosophy are grounded on the premise that in the process of modeling large-scale and complex systems, more than one mathematical or conceptual model is likely to emerge. Each of these models may adopt a specific point of view, yet all may be regarded as acceptable representations of the infrastructure system. Through HHM, multiple models can be developed and coordinated to capture the essence of the many dimensions, visions, and perspectives of infrastructure systems.

Perhaps one of the most valuable aspects of hierarchical holographic modeling is its ability to facilitate the evaluation of subsystem risks and their corresponding contributions to the risks in the total system. In the planning, design, or operational mode, the ability to model and quantify the risks contributed by each subsystem markedly facilitates identifying, quantifying, and evaluating risk. In particular, HHM can model the intricate relationships among the various subsystems and account for all relevant and important elements of risk and uncertainty. This makes for a more tractable modeling process and results in a more representative and encompassing risk assessment.

As pointed out by Kaplan et al. (2001), HHM can be regarded as a general method for identifying the set of risk scenarios. It has turned out to be particularly useful in modeling large-scale, complex, and hierarchical systems such as defense and civilian infrastructure systems. To understand HHM in this way, we first remind ourselves of the principle that the process of identifying the risk scenarios for a system of any kind should begin by laying out a diagram that represents the "success," or "as planned," scenario of the system. In the HHM method this diagram takes the form of a master

chart showing different "perspectives" on the system requirements (for an example, see Fig. 1). Perspectives are portrayed by columns in the chart, each with a head topic. In Fig. 1, head topics include technological, organizational, legal, time-horizon, user-demands, and socioeconomic. Each perspective in the chart is then broken down into boxes, or subtopics. Each subtopic box can then be thought of as representing a set of "success criteria," i.e., actions or results that are supposed to occur as part of the definition of the system's "success."

Consider now the set of such criteria represented by the jth box in the ith perspective. For each such box we can generate a set of risk scenarios by asking: "What can go wrong with respect to this class of success criteria?" i.e., "How could it happen that we would fail to achieve this set of success criteria?" (More pointedly, if we wanted to identify or anticipate terrorism-type scenarios, we might ask: "If I wanted to make something go wrong with respect to this class of success criteria, how could I do it?" (Kaplan et al. 1999).) By answering these questions we generate a set of risk scenarios associated with the jth subtopic box of the ith perspective, and it is now natural to think of this box as a "source of risk."

The union of these sets of risk scenarios, over all the boxes, should yield a complete set of risk scenarios for the system or operation as a whole. Taking the union only over the boxes in one perspective would typically yield a subset—an approximation—of the complete set of risk scenarios. Similarly, the union of the sets of success criteria corresponding to one perspective yields a subset—an approximation—of the total set of success criteria of the system as a whole. No one perspective, typically, is adequate on its own to consider the welfare of all current and future stakeholders.
Multiple perspectives of success are useful for developing an inclusive set of answers to ‘‘What can go wrong?’’ The nature and capability of HHM is thus to identify a comprehensive, therefore large, set of risk scenarios. It does this by presenting multiple, complementary perspectives of the success scenario requirements. To deal with this large set we need a systematic process that filters and ranks these identified scenarios so that we can prioritize risk mitigation activities. The first purpose of this article is to assemble and discuss a number of published approaches toward such a systematic process.
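The master-chart structure and the union-over-boxes argument can be sketched as data plus a small generator. The perspectives and subtopics below are a tiny invented excerpt, not the full OOTW model of Fig. 1:

```python
# Head topics (perspectives) mapped to their subtopic boxes -- invented excerpt.
hhm = {
    "technological": ["communications", "power supply"],
    "organizational": ["command structure", "training"],
    "time-horizon": ["first 48 hours", "long-term"],
}

def risk_sources(chart, perspectives=None):
    """Union of subtopic boxes ("sources of risk") over the chosen perspectives.

    For each box one asks: "What can go wrong with respect to this
    class of success criteria?"
    """
    keys = perspectives if perspectives is not None else chart.keys()
    return {(p, box) for p in keys for box in chart[p]}

all_sources = risk_sources(hhm)                  # union over all boxes
one_view = risk_sources(hhm, ["technological"])  # one perspective: a subset only
```

As the text argues, a single perspective yields only an approximation of the complete set: `one_view` is a proper subset of `all_sources`.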


4. RISK FILTERING, RANKING, AND MANAGEMENT (RFRM): A METHODOLOGICAL FRAMEWORK

4.1. Guiding Principles

It is constructive to identify again the two basic structural components of HHM. First are the head topics, which constitute the major visions, concepts, and perspectives of success. Second are the subtopics, which provide a more detailed classification of requirements. Each such requirement class corresponds to a class of risk scenarios, namely, those that impact that requirement. In this sense, each class of requirements is also considered a "source of risk." Thus, by its nature and construction, the HHM methodology generates a comprehensive set of sources of risk, i.e., categories of risk scenarios, commonly on the order of hundreds of entries (Haimes 1998). Consequently, there is a need to discriminate among these sources as to the likelihood and severity of their consequences, and to do so systematically on the basis of principled criteria and sound premises. For this purpose, the proposed methodological framework for risk filtering and ranking is based on the following major considerations:

• It is often impractical (e.g., due to time and resource constraints) to apply quantitative risk analysis to hundreds of sources of risk. In such cases qualitative risk analysis may be adequate for decision purposes under certain conditions.

• All sources of evidence should be harnessed in the filtering and ranking process to assess the significance of the risk sources. Such evidence includes common sense, professional experience, expert knowledge, and statistical data.

• Six basic questions characterize the process of risk assessment and management and serve as the compass for the methodological approach. For the risk assessment process, there are three questions (Kaplan and Garrick 1981):

  • What can go wrong?
  • What is the likelihood of that happening?
  • What are the consequences?

There are also three questions for the risk management process (Haimes 1991, 1998):



Fig. 1. Excerpt from a hierarchical holographic model developed to identify sources of risk to operations other than war (Dombroski et al. 2002).


  • What are the available options?
  • What are the associated tradeoffs?
  • What are the impacts of current decisions on future options?

To deploy the RFRM methodology effectively, the variety of perspectives of "success" and sources of risk must be considered, including those representing hardware, software, organizational, and human failures. Risks that must also be addressed include programmatic risks, such as project cost overrun and delay in meeting completion schedules, and technical risks, such as not meeting performance criteria. An integration of empirical and conceptual, descriptive and normative, quantitative and qualitative methods and approaches is always superior to the "either-or" choice; relying, for example, on a mix of simulation and analytically based risk methodologies is superior to relying on either one alone.

The tradeoffs that are inherent in the risk management process manifest themselves in the RFRM methodology as well. The multiple noncommensurate and often conflicting objectives that characterize most real systems guide the entire process of risk filtering and ranking. The risk filtering and ranking process is aimed at establishing priorities in the scenario analysis. This does not imply that sources of risk filtered out in an early phase of the methodology are ignored, only that the more urgent sources of risk or scenarios are explored first.

4.2. RFRM Phases

Eight major phases constitute the risk filtering, ranking, and management (RFRM) method. A case study in Section 5 demonstrates the efficacy of the proposed method.

4.2.1. Phase I: Identification of Risk Scenarios Through Hierarchical Holographic Modeling (HHM)

Most, if not all, sources of risk are identified through the HHM methodology as discussed earlier. In their totality, these sources of risk describe "what can go wrong" in the "as planned" or success scenario. Included are acts of terrorism, accidents, and natural hazards. Each subtopic represents a category of risk scenarios, i.e., descriptions of what can go wrong. Thus, through the HHM we generate

a diagram that organizes and displays the complete set of system success criteria from multiple overlapping perspectives. Each box in the diagram represents a set of actions or results that are required for the successful operation of the system. At the same time, any failure will show up as a deficiency in one or more of the boxes. Fig. 1 is an excerpt from a hierarchical holographic model developed to characterize support by the military for operations other than war.

It is important to note the tradeoff inherent in the construction of the HHM. A more detailed HHM will yield a more accurate picture of the success scenario and consequently lead to a better assessment of the risk situation. In other words, an HHM that contains more levels in its hierarchy will facilitate identifying the various failure modes of the system, since the system structure is described in greater detail. A less detailed HHM, by contrast, encapsulates a larger number of possible failure scenarios within each subtopic, which leads to less specificity in identifying failure scenarios. Of course, the more detailed HHM will be more expensive to construct in terms of time and resources. There is therefore a tradeoff: detail and accuracy versus time and resources. Consequently, the appropriate level of detail for an HHM is a matter of judgment, dependent on the resources available for risk management and the nature of the situation to which it is applied.

4.2.2. Phase II: Scenario Filtering Based on Scope, Temporal Domain, and Level of Decision Making

In Phase II, filtering is done at the level of "subtopics," or "sources of risk." As mentioned earlier, the plethora of sources of risk identified in Phase I can be overwhelming; the number of subtopics in the HHM may easily be in the hundreds (Haimes 1998). Clearly, not all subtopics in the HHM can be of immediate and simultaneous concern to all levels of decision making and at all times.
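A Phase II filter over such subtopics can be sketched as follows. The attribute names, levels, domains, and scenarios here are assumptions for illustration, not part of the paper's formalism:

```python
# Sources of risk tagged with decision-making level and temporal domain
# (tags and scenarios invented for illustration).
sources = [
    {"subtopic": "communications outage", "level": "operational", "domain": "first 48 hours"},
    {"subtopic": "budget shortfall",      "level": "strategic",   "domain": "long-term"},
    {"subtopic": "supply-route closure",  "level": "planning",    "domain": "short-term"},
]

def phase2_filter(sources, level, domains):
    """Keep only the sources matching this decisionmaker's level and time periods."""
    return [s for s in sources if s["level"] == level and s["domain"] in domains]

kept = phase2_filter(sources, level="operational", domains={"first 48 hours", "short-term"})
```

Only the sources within this decisionmaker's scope survive; the rest are set aside, not discarded.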
For example, in operations other than war (OOTW), three decision-making levels are identified (strategic, planning, and operational), and several temporal domains are considered (the first 48 hours; the short, intermediate, and long terms; disengagement; and postdisengagement). At this phase of the risk filtering process, the sources of risk are filtered according to the interests and responsibilities of the individual risk manager/



decisionmaker. The filtering criteria at this phase include the decision-making level, the scope (i.e., which risk scenarios are of prime importance to this manager), and the temporal domain (which time periods are important to this manager). Thus, the filtering in Phase II is achieved on the basis of expert experience and knowledge of the nature, function, and operation of the system being studied, and of the role and responsibility of the individual decisionmaker. This phase often reduces the number of risk sources from several hundred to around 50.

4.2.3. Phase III: Bi-Criteria Filtering and Ranking Using the Ordinal Version of the U.S. Air Force Risk Matrix

Fig. 2. Example risk matrix for Phase III.
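An ordinal matrix like the one in Fig. 2 can be encoded as a simple lookup. The scale labels and the cell-by-cell severity assignments below are illustrative assumptions, not the actual MIL-STD 882 categories:

```python
# Ordinal scales (labels are illustrative).
LIKELIHOOD = ["improbable", "remote", "occasional", "probable", "frequent"]  # columns, low to high
CONSEQUENCE = ["negligible", "marginal", "critical", "catastrophic"]         # rows, low to high

# Relative severity of each cell (assignments are an assumption for illustration).
SEVERITY = [
    ["low",      "low",      "moderate", "moderate", "high"],
    ["low",      "moderate", "moderate", "high",     "high"],
    ["moderate", "moderate", "high",     "high",     "extreme"],
    ["moderate", "high",     "high",     "extreme",  "extreme"],
]

def severity(consequence, likelihood):
    """Look up the severity cell for a subtopic judged to sit at (consequence, likelihood)."""
    return SEVERITY[CONSEQUENCE.index(consequence)][LIKELIHOOD.index(likelihood)]

def phase3_filter(subtopics):
    """Keep subtopics outside the low-severity cells; the rest are set aside for later."""
    return [(name, c, l) for name, c, l in subtopics if severity(c, l) != "low"]
```

For example, `severity("catastrophic", "frequent")` lands in the upper-right, highest-severity region, while a negligible-consequence, remote-likelihood subtopic is filtered out.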

In this phase, filtering is again done at the level of subtopics. However, the process moves closer to a quantitative treatment: the joint contributions of two different types of information—the likelihood of what can go wrong and the associated consequences—are estimated on the basis of the available evidence. This phase is accomplished in the RFRM by using the ordinal version of the matrix procedure adapted from Military Standard (MIL-STD) 882 of the U.S. Department of Defense (DoD), cited in Roland and Moriarty (1990). With this matrix, the likelihoods and consequences are combined into a joint concept called "severity." The mapping is achieved by first dividing the likelihood of a risk source into five discrete ranges. Similarly, the consequence scale is divided into four or five ranges. The two scales are placed in matrix formation, and the cells of the matrix are assigned relative levels of risk severity; in the example matrix of Fig. 2, the group of cells in the upper right indicates the highest level of risk severity. The scenario categories (subtopics) identified by the HHM are distributed among the cells of the matrix. Those falling in the low-severity cells are filtered out and set aside for later consideration.

As a general principle, any "scenario" that we can describe with a finite number of words is actually a class of scenarios; the individual members of this class are subscenarios of the original scenario. Similarly, any subtopic from the HHM diagram to be placed into the matrix represents a class of failure scenarios, and each member of the class has its own combination of likelihood and consequence. There may be failure scenarios that are of low probability

and high consequence, and scenarios that are of high probability and low consequence. In placing the subtopic into the matrix, the analyst must make a judgment as to the likelihood and consequence ranges that characterize the subtopic as a whole. This judgment must avoid overlooking potentially critical failure scenarios while, at the same time, avoiding overstatement of their likelihood.

4.2.4. Phase IV: Multi-Criteria Evaluation

In Phase III we distributed the individual risk sources, by judgment, into the cells defined in Fig. 2 by the consequence and likelihood categories. The sources falling in the upper-right cells of the risk matrix were then judged to be the ones requiring priority attention. In Phase IV we take the process one step further by reflecting on the ability of each scenario to defeat three defensive properties of the underlying system, namely, resilience, robustness, and redundancy.²

As an aid to this reflection, we present a set of 11 "criteria," defined in Table I, that relate to the ability of the scenarios to defeat these defensive properties. (These criteria are intended to be generally applicable, but the user may of course modify them to suit the specific system under study.) As a further aid, it may be helpful to rate the scenario of interest as "high," "medium," or "low" against each criterion (using Table II for guidance), and then to use this combination of ratings to judge the ability of the scenario to defeat the system. These (example) criteria are intended to be used as a base for Phase V. After the completion of Phase IV, the ranking of the remaining scenarios is undertaken in Phase V with quantitative assessments of likelihood and consequence. Scenarios that are judged less urgent (based on Phase IV) can be returned to for later study.

² Classifying the defenses of the system as resilience, robustness, and redundancy (the "3 Rs") is based, in part, on an earlier and related categorization of water-resources systems by Matalas and Fiering (1977), updated by Haimes et al. (1997). Redundancy refers to the ability of extra components of a system to assume the functions of failed components. Robustness refers to the insensitivity of system performance to external stresses. Resilience is the ability of a system to recover following an emergency. Scenarios able to defeat these properties are of greater concern, and thus are scored as more severe.

Table I. Eleven Criteria of a Risk Scenario Relating to Its Ability to Defeat the Defenses of the System

• Undetectability refers to the absence of modes by which the initial events of a scenario can be discovered before harm occurs.
• Uncontrollability refers to the absence of control modes that make it possible to take action or make an adjustment to prevent harm.
• Multiple paths to failure indicates that there are multiple, and possibly unknown, ways for the events of a scenario to harm the system, for example by circumventing safety devices.
• Irreversibility indicates a scenario in which the adverse condition cannot be returned to the initial, operational (pre-event) condition.
• Duration of effects indicates a scenario that would have a long duration of adverse consequences.
• Cascading effects indicates a scenario in which the effects of an adverse condition readily propagate to other systems or subsystems, i.e., cannot be contained.
• Operating environment indicates a scenario that results from external stressors.
• Wear and tear indicates a scenario that results from use, leading to degraded performance.
• HW/SW/HU/OR (hardware, software, human, and organizational) interfaces indicates a scenario in which the adverse outcome is magnified by interfaces among diverse subsystems (e.g., human and hardware).
• Complexity and emergent behaviors indicates a scenario in which there is a potential for system-level behaviors that are not anticipated from a knowledge of the components and the laws of their interactions.
• Design immaturity indicates a scenario in which the adverse consequences are related to the newness of the system design or other lack of proof of concept.

4.2.5. Phase V: Quantitative Ranking Using the Cardinal Version of the MIL-STD 882 Risk Matrix

In Phase V, we quantify the likelihood of each scenario³ using Bayes Theorem and all the relevant evidence available (Kaplan 1990, 1992). The value of quantification is that it clarifies the results, disciplines the thought process, and replaces opinion with evidence; calculating the likelihood of scenarios also avoids possible miscommunication in the interpretation of verbal expressions such as "high," "low," and "very high." (More on the use of Bayes Theorem is discussed in Phase V of Section 5.) This approach yields a matrix with ranges of probability on the horizontal axis, as shown in Fig. 3. This is the "cardinal" version of the "ordinal" risk matrix first deployed in Phase III. Filtering and ranking the risk scenarios through this matrix typically reduces the number of scenarios from about 20 to about 10.

³ The quantification of likelihood should, of course, be based on the totality of relevant evidence available, and should be done by processing the evidence items through Bayes Theorem (Kaplan 1990, 1992).

4.2.6. Phase VI: Risk Management

Having quantified the likelihood of the scenarios in Phase V, and having filtered the scenarios by likelihood and consequence in the manner of Fig. 3, we have now identified a number of scenarios, presumably small, constituting most of the risk to our subject system. We now turn our attention to risk management and ask: "What can be done, and what is cost-effective to do, about these scenarios?"

The first of these questions puts us into a creative mode. Knowing the system and the major risk scenarios, we create options for action, asking: "What design modifications or operational changes could we make that would reduce the risk from these scenarios?" Having set forth these options, we then shift back to an analytical and quantitative thought mode: "How much would it cost to implement (one or more of) these options?" "How much would we reduce the risk from the identified scenarios?" "Would these options create new risk scenarios?"
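The quantitative machinery of Phases V and VI can be sketched together: a Bayes update of a scenario's likelihood from observed evidence, followed by a cost-per-risk-reduction comparison of management options. All numbers, option names, and the Poisson evidence model are invented for illustration; they are not the paper's data:

```python
from math import exp

# Candidate occurrence frequencies (per year) for one scenario, with a prior
# over them (all numbers invented).
freqs = [0.001, 0.01, 0.1]
prior = [0.5, 0.3, 0.2]

def bayes_update(prior, freqs, events, years):
    """Posterior over candidate frequencies given `events` observed in `years`,
    using a Poisson likelihood for the observed count."""
    def poisson(lam, k):
        p = exp(-lam)
        for i in range(1, k + 1):
            p *= lam / i
        return p
    joint = [p * poisson(f * years, events) for p, f in zip(prior, freqs)]
    total = sum(joint)
    return [j / total for j in joint]

# One event observed in ten years shifts weight toward the higher frequencies.
post = bayes_update(prior, freqs, events=1, years=10)

# Phase VI sketch: rank options by cost per unit of risk reduction
# (cost in $M, fraction of scenario risk removed -- invented data).
options = {"add monitoring": (0.5, 0.40), "redundant channel": (2.0, 0.70)}
ranked_options = sorted(options, key=lambda o: options[o][0] / options[o][1])
```

Replacing the verbal "high/low" labels with such posteriors is exactly the miscommunication-avoidance argument made above; the option ranking is only a starting point, since Phase VII still asks whether the options interact with previously filtered scenarios.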



Table II. Rating Risk Scenarios in Phase IV Against the 11 Criteria
(For every criterion, the fourth rating option is "Not applicable.")

Undetectability: High = unknown or undetectable; Medium = late detection; Low = early detection
Uncontrollability: High = unknown or uncontrollable; Medium = imperfect control; Low = easily controlled
Multiple paths to failure: High = unknown or many paths to failure; Medium = few paths to failure; Low = single path to failure
Irreversibility: High = unknown or no reversibility; Medium = partial reversibility; Low = reversible
Duration of effects: High = unknown or long duration; Medium = medium duration; Low = short duration
Cascading effects: High = unknown or many cascading effects; Medium = few cascading effects; Low = no cascading effects
Operating environment: High = unknown sensitivity or very sensitive to operating environment; Medium = sensitive to operating environment; Low = not sensitive to operating environment
Wear and tear: High = unknown or much wear and tear; Medium = some wear and tear; Low = no wear and tear
Hardware/Software/Human/Organizational: High = unknown sensitivity or very sensitive to interfaces; Medium = sensitive to interfaces; Low = no sensitivity to interfaces
Complexity and emergent behaviors: High = unknown or high degree of complexity; Medium = medium complexity; Low = low complexity
Design immaturity: High = unknown or highly immature design; Medium = immature design; Low = mature design
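Encoded as data, Table II's ordinal scale supports a simple screening pass. A minimal sketch in Python: the scenario ratings below are hypothetical, and the aggregation rule (counting "High" ratings) is an assumed heuristic for illustration; the article itself leaves the multi-criteria judgment to the analyst.

```python
# Hypothetical ordinal ratings in the style of Table II (values are
# High / Medium / Low / Not applicable). Only a subset of the 11 criteria
# is shown; all assignments here are illustrative assumptions.
ratings = {
    "1.2 Cellular": {"Undetectability": "Low", "Uncontrollability": "Medium",
                     "Irreversibility": "High", "Cascading effects": "Medium"},
    "4. Satellite": {"Undetectability": "Low", "Uncontrollability": "Medium",
                     "Irreversibility": "High", "Cascading effects": "High"},
}

def high_count(scores):
    """Count criteria rated High (the 'unknown or ...' anchors of Table II)."""
    return sum(1 for v in scores.values() if v == "High")

# Assumed heuristic: scenarios with more High ratings get attention first.
flagged = sorted(ratings, key=lambda s: high_count(ratings[s]), reverse=True)
```

With the hypothetical ratings above, `flagged` lists the Satellite scenario ahead of Cellular, since it carries more High ratings.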

Moving back and forth between these modes of thought, we arrive at a set of cost-effective options that we now would like to recommend for implementation. However, we must remember that we have evaluated these options against the filtered set of scenarios remaining at the end of Phase V. Thus, in Phase VII we take another look at the effect these options might have on the risk scenarios previously filtered out.

4.2.7. Phase VII: Safeguarding Against Missing Critical Items

Reducing the initial large number of risk scenarios to a much smaller one at the completion of Phase V may inadvertently filter out scenarios that originally seemed minor but could become important if the proposed options were actually implemented. Also, in a dynamic world, early indicators of newly emerging critical threats and other sources of risk should not be overlooked. Following the completion of Phase VI, which generates and selects risk management policy options and their associated tradeoffs, we ask: "How robust has the policy selection and risk filtering/ranking process been?" Phase VII, then, is aimed at providing added assurance that the proposed RFRM methodology creates flexible reaction plans if indicators signal the emergence of new or heretofore undetected critical items. In particular, in Phase VII of the analysis, we:

1. Ascertain the extent to which the risk management options developed in Phase VI affect, or are affected by, any of the risk scenarios discarded in Phases II to V. That is, in light of the interdependencies within the success scenario, we evaluate the proposed management policy options against the risk scenarios previously filtered out.
2. Revise as appropriate the risk management options developed in Phase VI in light of what was learned in Step 1 above.
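The two Phase VII steps can be sketched as a re-screening loop. Everything in the sketch below is an illustrative assumption rather than the article's prescription: the scenario records, the threshold, and the model in which a selected option multiplies the likelihood of a discarded scenario it interacts with.

```python
# Phase VII sketch: re-examine scenarios discarded in Phases II-V against the
# risk management options chosen in Phase VI. All data here are hypothetical.
discarded = {"1.3 Radio": 0.02, "3.2 MIS": 0.03}  # scenario -> prior likelihood

# Assumed interaction model: each option multiplies the likelihood of any
# discarded scenario it touches (a factor > 1 means the option aggravates it).
option_effects = {"harden cellular network": {"1.3 Radio": 4.0}}

def reexamine(discarded, option_effects, threshold=0.05):
    """Return (option, scenario) pairs whose post-option likelihood crosses
    the screening threshold, i.e., scenarios that should resurface."""
    resurfaced = []
    for option, effects in option_effects.items():
        for scenario, factor in effects.items():
            if discarded.get(scenario, 0.0) * factor >= threshold:
                resurfaced.append((option, scenario))
    return resurfaced
```

Under these assumed numbers, hardening the cellular network pushes the discarded Radio scenario back above the threshold, so it would be re-admitted for Step 2's revision of the options.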

Fig. 3. Risk matrix with numerical values for use in Phase V.
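A cardinal matrix of this kind can be emulated in code. The sketch below is an assumption-laden illustration: the probability cut points and the cell assignments are not the actual values of Fig. 3, but were chosen so that the example ratings reported in Section 5 are reproduced.

```python
# Sketch of a Phase V cardinal (likelihood x consequence) risk matrix.
# The bands and cell values are illustrative assumptions, not Fig. 3's.

def likelihood_band(p):
    """Map a probability to a coarse band (assumed cut points)."""
    if p >= 0.5:
        return "very high"
    if p >= 0.1:
        return "high"
    if p >= 0.01:
        return "moderate"
    return "low"

# Effect levels follow the article's scale: A = loss of life,
# B = loss of mission, C = loss of some capability. Cell values are assumed.
RISK_CELLS = {
    ("very high", "A"): "Extremely High", ("high", "A"): "Extremely High",
    ("moderate", "A"): "Extremely High",  ("low", "A"): "High",
    ("very high", "B"): "Extremely High", ("high", "B"): "High",
    ("moderate", "B"): "Moderate",        ("low", "B"): "Low",
    ("very high", "C"): "High",           ("high", "C"): "Moderate",
    ("moderate", "C"): "Moderate",        ("low", "C"): "Low",
}

def rate(p, effect):
    """Return the matrix cell for a scenario's likelihood and effect level."""
    return RISK_CELLS[(likelihood_band(p), effect)]
```

With these assumed cells, the Section 5 examples come out as in the text: a 0.05 likelihood with effect A rates Extremely High, and a 0.015 likelihood with effect C rates Moderate.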

Thus, a purpose of Phase VII is to enable the refinement of risk management options in light of previously screened-out scenarios.


Haimes, Kaplan, and Lambert

The detailed deployment of Phase VII is mostly driven by the specific characteristics of the system. The main guiding principle in this phase focuses on cascading effects due to the system's intra- and interdependencies that may have been overlooked during the filtering processes in Phases I to V. As well, the defensive properties addressed in Phase IV may be revisited to ensure that the system's redundancy, resilience, and robustness remain intact at the end of Phase VII.

4.2.8. Phase VIII: Operational Feedback

New methodology and tools can be improved on the basis of the feedback accumulated during their deployment, and the proposed RFRM is no exception. Following are guiding principles for the feedback data-collection process:

• The HHM is never considered finished; new sources of risk should be added as additional categories or new topics.
• Be cognizant of all benefits, costs, revenues, and risks to human health and the environment.

In particular, no single methodology or tool can fit all cases and circumstances. Therefore, a systematic data-collection process that is cognizant of the dynamic nature of the evolving sources of risk and their criticalities can maintain the viability and effectiveness of the proposed risk filtering and ranking method.

5. DEMONSTRATION FOR AN OPERATION OTHER THAN WAR (OOTW)

To demonstrate the proposed risk filtering, ranking, and management (RFRM) methodology, we use a case study conducted with the National Ground Intelligence Center, U.S. Department of Defense, and with the U.S. Military Academy at West Point. The case study of operations other than war (OOTW) focuses on the United States and allied operations in the Balkans (Dombroski et al. 2001). The overall aim of the case study is to ensure that the deployment of U.S. forces abroad for an OOTW would be effective and successful, with minimal casualties, losses, or surprises.

We take as our case study the following mission: U.S. and allied forces engaged in the Balkans are asked to establish and maintain security for 72 hours at a bridge crossing the Tirana River in Bosnia. The purpose is to support the exchange via the bridge of humanitarian medical and other supplies among several nongovernmental organizations and public agencies. These entities and the allied force must communicate in part over public telecommunications networks and the Internet regarding the security status of the bridge. As well, the public will need to be informed about the status of the bridge via radio, television, and the Internet. The RFRM will be used to identify, filter, and rank scenarios of risk to the mission.

5.1. Phase I: Developing the HHM

To identify risk scenarios that allied forces might encounter in this case study, the following four HHMs were developed (Haimes et al. 2001):

1. Country HHM;
2. U.S. HHM;
3. Alliance HHM; and
4. Coordination HHM.

For demonstration purposes and to limit the size of the example, the present article shows only the Telecommunications head topic of the Country HHM (see Fig. 4). Of the subtopics shown in Fig. 4, we choose the 11 subtopics (risk scenarios) listed in Table III for input to the Phase II filtering.

5.2. Phase II: Scenario Filtering by Domain of Interest

In Phase II, we filter out all scenarios except those in the decisionmaker's domain of interest and responsibilities. In operations other than war, one may consider three levels of decisionmakers: strategic (e.g., Chiefs of Staff), operational (e.g., Generals and Colonels), and tactical (e.g., Captains and Majors). The concern with and interest in a specific subset of the risk scenarios will depend on the decision-making level and on the temporal domain under consideration. At the strategic level, Generals may not be concerned with the specific location of a Company's base and the risks associated with it, while the Company's commander would be.

For this example, we assume that the risk scenarios 1.5 Technology and 6. Regulation in Table III were filtered out based on the decisionmaker's responsibilities. The surviving set of nine risk scenarios shown in Table IV becomes the input to Phase III.
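The Phase II domain filter can be expressed directly as code. A minimal sketch: the level assigned to each scenario below is a hypothetical illustration, not taken from the article, although the outcome mirrors the example, in which 1.5 Technology and 6. Regulation fall outside the decisionmaker's responsibilities.

```python
# Phase II sketch: retain only scenarios within the current decisionmaker's
# domain of interest. Level assignments here are hypothetical.
scenario_level = {
    "1.1 Telephone": "tactical",
    "1.2 Cellular": "tactical",
    "1.5 Technology": "strategic",
    "6. Regulation": "strategic",
}

def filter_by_domain(scenario_level, levels_of_interest):
    """Keep scenarios whose responsible decision level is of interest."""
    return [s for s, lvl in scenario_level.items() if lvl in levels_of_interest]

# An operational/tactical decisionmaker screens out the strategic-level items.
surviving = filter_by_domain(scenario_level, {"tactical", "operational"})
```

The surviving list then feeds the Phase III bi-criteria filtering, just as Table IV feeds it in the example.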


Table III. List of 11 Scenarios to Be Filtered in Phase II

1.1 Telephone
1.2 Cellular
1.3 Radio
1.4 Television
1.5 Technology
2. Cable
3.1 Computer Information Systems (CIS)
3.2 Management Information Systems (MIS)
4. Satellite
5. International
6. Regulation

5.3. Phase III: Bi-Criteria Filtering

To further reduce the number of risk scenarios, in Phase III we subject the remaining nine subtopics (risk scenarios) to the qualitative severity-scale matrix shown in Fig. 5. We have assumed that evidence for the evaluations shown in Fig. 5 came from reliable intelligence sources providing knowledge about the telecommunications infrastructure in Bosnia. Also, for the purpose of this example, we further assume that the decisionmaker's analysis of the subtopics (risk scenarios) results in removing from the subtopic set the risk scenarios that received a moderate or low risk valuation. In this example, the subtopics 1.3 Radio, 1.4 Television, and 3.2 Management Information Systems (MIS) attained a moderate valuation and are removed. The remaining set of six risk scenarios is shown in Table V.

Table IV. List of Nine Scenarios to Be Filtered in Phase III

1.1 Telephone
1.2 Cellular
1.3 Radio
1.4 Television
2. Cable
3.1 Computer Information Systems (CIS)
3.2 Management Information Systems (MIS)
4. Satellite
5. International

Fig. 4. Telecommunications head topic of OOTW HHM.

Fig. 5. Qualitative severity scale matrix.

5.4. Phase IV: Multi-Criteria Filtering

Now that the decisionmaker has narrowed the set of risk scenarios to a more manageable one, the user can perform a more thorough analysis of each subtopic. Table VI lists the remaining six subtopics (risk scenarios) and gives each a more specific definition. In Phase IV, the user assesses each of these remaining subtopics in terms of the 11 criteria identified in Table I. Table VII summarizes these assessments. As part of our example, we assume that these assessments result from analyzing each of the subtopics (risk scenarios) against the criteria, using intelligence data and expert analysis.

Table V. List of Six Scenarios to Be Evaluated in Phases IV and V

1.1 Telephone
1.2 Cellular
2. Cable
3.1 Computer Information Systems (CIS)
4. Satellite
5. International

Table VI. Risk Scenarios for the Six Remaining Subtopics

1.1 Telephone: Failure of any portion of the telephone network for more than 48 hours
1.2 Cellular: Failure of any portion of the cellular network for more than 24 hours
2. Cable: Failure of any portion of the coaxial and/or fiber optic cable networks for more than 12 hours
3.1 CIS: Loss of access to the Internet throughout the entire country for more than 48 hours
4. Satellite: Failure of the satellite network for more than 12 hours throughout the region
5. International: Failure of the international communications network for more than six hours

Table VII. Scoring of Subtopics for OOTW Using the Criteria Hierarchy
(Columns in order: 1.1 Telephone, 1.2 Cellular, 2. Cable, 3.1 CIS, 4. Satellite, 5. International)

Undetectability: Low, Low, Med, High, Low, High
Uncontrollability: Med, Med, High, High, Med, High
Multiple paths to failure: High, Med, High, High, Med, High
Irreversibility: Med, High, Med, High, High, Low
Duration of effects: High, High, High, High, High, High
Cascading effects: Med, Med, Low, Low, High, High
Operating environment: High, High, High, High, Med, High
Wear and tear: Med, High, Low, High, Med, High
Hardware/Software/Human/Organizational: High, High, Med, High, High, High
Complexity and emergent behaviors: Med, High, Low, High, High, High
Design immaturity: Med, High, Med, High, High, Med

5.5. Phase V: Quantitative Ranking

The user has thus far narrowed the important scenario list from 11 to six. Employing the quantitative severity-scale matrix and the criteria assessments in Phase IV, the user will now reduce the set further. In Phase V the same severity-scale index introduced in Phase III is used, except that the likelihood is now expressed quantitatively, as shown in Fig. 6.

Fig. 6. Quantitative severity scale matrix.

5.5.1. Telephone

Likelihood of Failure = 0.05; Effect = A (Loss of life); Risk = Extremely High.

This failure will cause loss of life and incapacitate the mission. Based on intelligence reports, however, enemy forces operating in Bosnia do not appear to be preparing for an attack against the telephone network. Therefore, we assign only a 5% probability to this scenario.[4] Should such an attack occur, a failure would be detectable.

[4] The Bayesian reasoning behind this assignment is as follows. Let A denote an enemy attack against the phone network. Let E denote the relevant evidence, namely that intelligence reports no preparations for an attack. By Bayes Theorem,

P(A|E) = P0(A) × P(E|A) / P0(E), where
P0(E) = P(E|A) × P0(A) + P(E|notA) × P0(notA).

Our prior state of knowledge about A, before receiving the evidence, is P0(A) = P0(notA) = 0.5. The probability of intelligence seeing evidence E (i.e., no preparations) if the enemy is going to attack is small; we take it as P(E|A) = 0.05. (This is our appraisal of the effectiveness of our intelligence.) The probability of intelligence not seeing preparations, given that the enemy is not going to attack, is high: P(E|notA) = 0.99. (This expresses our confidence that the enemy would not make preparations as a deceptive maneuver.) Therefore,

P0(E) = 0.05 × 0.5 + 0.99 × 0.5 = 0.025 + 0.495 = 0.52, and
P(A|E) = 0.5 × 0.05 / 0.52 ≈ 0.05.

5.5.2. Cellular

Likelihood of Failure = 0.45; Effect = A (Loss of life); Risk = Extremely High.

U.S. forces will be dependent on cellular communications; thus this failure could cause loss of mission and loss of life. Intelligence reports and expert analysis show that insurgent forces may be preparing for an attack on the cellular network, knowing that coalition forces are utilizing it. Therefore, we assign a 45% likelihood that the risk scenario will occur during the operation, as assessed by this intelligence. Analysis also shows that an attack's effects will be difficult to reverse.

5.5.3. Computer Information Systems (CIS)

Likelihood of Failure = 0.015; Effect = C (Loss of some capability with compromise of some mission objectives); Risk = Moderate.

U.S. forces would not be immediately dependent on the CIS network, so this failure may cause some loss of capability but should not cause the mission to fail. Detailed analysis of the CIS network shows that if an attack occurs against the existing Bosnian network, its effects may be severe, but with a low likelihood (about 0.015).

5.5.4. Cable

Likelihood of Failure = 0.3; Effect = B (Loss of mission); Risk = High.

U.S. forces utilize existing fiber optic and coaxial cable networks to communicate over the region. However, the network is not a primary communications platform. Intelligence of insurgent and enemy activity shows that forces are preparing for an attack on the cable network due to its vulnerability across the country. Therefore, we assign a likelihood of 0.3 to this risk scenario, given the current security over the network.

5.5.5. Satellite

Likelihood of Failure = 0.55; Effect = A (Loss of life); Risk = Extremely High.

Because U.S. forces are strongly dependent on satellite communications, any loss for 12 hours or more can result in a loss of life and mission. An intelligence analysis of the satellite network shows that the network is protected throughout Bosnia, but not enough to ensure that forces opposing the operation will not succeed when attacking it. Due to the criticality of the network, enemy forces will likely target it. Based on this assessment, the likelihood of the failure scenario occurring is high (0.55).

5.5.6. International

Likelihood of Failure = 0.15; Effect = A (Loss of life); Risk = Extremely High.

Here we assume that any loss of international communications for six hours or longer throughout the region would cut off U.S. forces from other countries and from U.S. strategic decisionmakers. Therefore, this is a very high-risk failure. According to expert analysis of forces opposing the operation, an attack against international communications would be difficult but fairly likely. Therefore, we assign a likelihood of 0.15 to this scenario. Even if it did occur, its effects may be somewhat reversible within six hours.

Assuming that we filter out all subtopics (risk scenarios) attaining a risk valuation of moderate or low, CIS is filtered out, and the remaining five critical risk scenarios are: Telephone, Cellular, Cable, Satellite, and International. Based on the assessments shown above and in Fig. 6, planners of the operation would surely want to concentrate resources and personnel on ensuring that the cellular, cable, satellite, telephone, and international communications networks are well protected and guarded.
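The Bayesian update described in footnote 4 can be checked numerically; a minimal sketch, using the footnote's own numbers:

```python
# Bayes update from footnote 4: A = enemy attacks the phone network,
# E = intelligence reports no preparations for an attack.
p_A = 0.5              # prior P0(A); maximal prior uncertainty
p_E_given_A = 0.05     # chance intelligence sees no preparations if attack coming
p_E_given_notA = 0.99  # chance intelligence sees no preparations if no attack

# Total probability of the evidence, then the posterior by Bayes Theorem.
p_E = p_E_given_A * p_A + p_E_given_notA * (1 - p_A)  # = 0.52
p_A_given_E = p_A * p_E_given_A / p_E                 # ~ 0.048, i.e., about 0.05
```

This reproduces the 5% likelihood assigned to the telephone scenario in Section 5.5.1.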

5.6. Phase VI: Risk Management

In Phase VI, a complete quantitative decision analysis is performed, involving estimates of cost, performance benefits, and risk reduction, and of management options for dealing with the most urgent remaining scenarios. Examples for Phases VI to VIII are beyond the scope of the risk filtering and ranking aspects of this article. Readers interested in the deployment of these phases may consult the following three sources: Dombroski (2001), Lamm (2001), and Mahoney (2001).

5.7. Phase VII: Safeguarding Against Missing Critical Items

In Phase VII, we examine the performance of the options selected in Phase VI against the scenarios that were filtered out during Phases II to V.

5.8. Phase VIII: Operational Feedback

Phase VIII represents the operational phase of the underlying system, during which the experience and information gained are used to continually update the scenario filtering and decision processes of Phases II to VII.

6. CONCLUSIONS AND FUTURE WORK

Needless to say, any military operation, even one other than war, is a matter of great seriousness. Once undertaken, it is important to ensure its success. Just as the enemy will probe for weak spots, so the planners of the operation must identify, impose priorities, and take appropriate actions to minimize the risks. The risk filtering, ranking, and management methodological framework presented here addresses this process. The eight phases of risk filtering, ranking, and management reflect a philosophical approach rather than a mechanical methodology. The philosophy can be specialized to particular contexts, e.g., operations other than war, an aerospace system, contamination of drinking water, or the physical security of an embassy. In this philosophy, filtering and ranking discrete classes of scenarios is viewed as a precursor to, rather than a substitute for, analysis of the totality of all risk scenarios.

ACKNOWLEDGMENTS

The research documented in this article was supported in part by the Virginia Transportation Research Council. We wish to thank our graduate students: Matthew Dombroski for his significant contributions to Section 5, "A Demonstration Problem"; Ruth Y. Dicdican, Mike Diehl, Gregory Lamm, Maria (Peach) Leung, Brian Mahoney, Mike Pennock, and Joost Santos for their helpful comments and suggestions; Grace Zisk for her editorial assistance; and Della Dirickson for her administrative assistance.

REFERENCES

Baron, J., J. C. Hershey, and H. Kunreuther, "Determinants of priority for risk reduction: The role of worry." Risk Analysis, 20(4), 413–427, 2000.
CRMES, "Ranking of space shuttle FMEA/CIL items: The risk ranking and filtering (RRF) method." Center for Risk Management of Engineering Systems, University of Virginia, Charlottesville, VA, 1991.
Dombroski, M., Y. Y. Haimes, J. H. Lambert, K. Schlussel, and M. Sulcoski, "Risk-based methodology for the characterization and support for operations other than war." To appear in Military Operations Research Journal, 2002.
Dombroski, M., "A risk-based decision support methodology for operations other than war." Master of Science thesis, Department of Systems and Information Engineering, University of Virginia, 2001.
Haimes, Y. Y., "Hierarchical holographic modeling." IEEE Transactions on Systems, Man, and Cybernetics, 11(9), 606–617, 1981.
Haimes, Y. Y., "Total risk management." Risk Analysis, 11(2), 169–171, 1991.
Haimes, Y. Y., Risk Modeling, Assessment, and Management. New York: John Wiley & Sons, 1998.
Haimes, Y. Y., N. C. Matalas, J. H. Lambert, B. A. Jackson, and J. F. R. Fellows, "Reducing the vulnerability of water supply systems to attack." Journal of Infrastructure Systems, American Society of Civil Engineers, 4(4), 164–177, 1997.
Kaplan, S., "On inclusion of precursor and near miss events in quantitative risk assessments: A Bayesian point of view and a space shuttle example." Journal of Reliability Engineering and System Safety, 27, 103–115, 1990.
Kaplan, S., "'Expert information' vs. 'expert opinion': Another approach to the problem of eliciting/combining/using expert knowledge in PRA." Journal of Reliability Engineering and System Safety, 35, 61–72, 1992.
Kaplan, S., and B. J. Garrick, "On the quantitative definition of risk." Risk Analysis, 1(1), 11–27, 1981.
Kaplan, S., Y. Y. Haimes, and B. J. Garrick, "Fitting hierarchical holographic modeling (HHM) into the theory of scenario structuring and a refinement to the quantitative definition of risk." Risk Analysis, 21(5), 807–819, 2001.
Kaplan, S., S. Vishnepolschi, B. Zlotin, and A. Zusman, "New tools for failure and risk analysis: Anticipatory failure determination (AFD) and the theory of scenario structuring." Monograph, Ideation International Inc., Southfield, MI, 1999.
Lambert, J. H., Y. Y. Haimes, D. Li, R. Schooff, and V. Tulsiani, "Identification, ranking, and management of risks in a major system acquisition." Reliability Engineering and System Safety, 72(3), 315–325, 2001.
Lamm, G., "Assessing and managing risks to information assurance: A methodological approach." Master of Science thesis, Department of Systems and Information Engineering, University of Virginia, 2001.
Mahoney, B., "Quantitative risk analysis of GPS as a critical infrastructure for civilian transportation applications." Master of Science thesis, Department of Systems and Information Engineering, University of Virginia, 2001.
Matalas, N. C., and M. B. Fiering, "Water-resource system planning." In Climate, Climate Change and Water Supply, Studies in Geophysics, pp. 99–109, National Research Council, National Academy of Sciences, Washington, DC, 1977.
Morgan, M. G., B. Fischhoff, L. Lave, and P. Fischbeck, "A proposal for risk ranking within federal agencies." In Comparing Environmental Risks: Tools for Setting Government Priorities, J. Clarence Davies (Ed.), Resources for the Future, Washington, DC, 1999.
Morgan, M. G., H. K. Florig, M. L. DeKay, and P. Fischbeck, "Categorizing risks for risk ranking." Risk Analysis, 20(1), 49, 2000.
Roland, H. E., and B. Moriarty, System Safety Engineering and Management, 2nd ed. New York: John Wiley & Sons, 1990.
Sokal, R. R., "Classification: Purposes, principles, progress, prospects." Science, September 27, 1974.
Webler, T., H. Rakel, O. Renn, and B. Johnson, "Eliciting and classifying concerns: A methodological critique." Risk Analysis, 15(3), 421, 1995.

© 2002 Society for Risk Analysis