Producing reliable software: an experiment

The Journal of Systems and Software 61 (2002) 213–224 www.elsevier.com/locate/jss

Carol Smidts a,*, Xin Huang a, James C. Widmaier b,1

a Reliability Engineering Program, University of Maryland, College Park, MD 20742, USA
b Department of Defense, USA

Received 1 March 2001; received in revised form 1 July 2001; accepted 1 September 2001

Abstract

A customer of high assurance software recently sponsored a software engineering experiment in which a small real-time software system was developed concurrently under two popular software development methodologies. One company specialized in the state-of-the-practice waterfall method and was rated at Capability Maturity Model Level 4. A second developer employed a mathematically based formal method with automatic code generation. As specified in separate contracts, C++ code plus development documentation and process and product metrics (errors) were to be delivered. Both companies were given identical functional specifications and agreed to generous and equal cost and schedule constraints and to explicit functional reliability objectives. At the conclusion of the experiment, an independent third party determined through extensive statistical testing that neither methodology was able to meet the user's reliability objectives within the cost and schedule constraints. The metrics collected revealed the strengths and weaknesses of each methodology and why each was unable to reach the customer's reliability objectives. This paper explores the specification for the system under development, the two development processes, the products and metrics captured during development, the analysis tools and testing techniques used by the third party, and the results of a reliability and process analysis. © 2002 Published by Elsevier Science Inc.

Keywords: Software reliability; Capability maturity model; Formal methods; Software engineering experiment; Software process and product metrics; Software development methodologies; Cleanroom certification

1. Objective of the study

A modest, corporate-sponsored software development activity was initiated to see if one could produce software of a given functional reliability using current software development processes and techniques under jointly agreed upon schedule and cost constraints. Indeed, software engineering as a science has progressed significantly over the last decades, and it has become of interest and importance to assess the status of the technology (i.e. tools and process) and its ability to satisfy customers' demands. Since reliability is a measure of user satisfaction, it seemed only natural to set up an experiment that evaluates the ability of modern, high-quality techniques to satisfy given levels of reliability.

* Corresponding author. Tel.: +1-301-405-7314; fax: +1-301-314-9601. E-mail addresses: [email protected] (C. Smidts), [email protected] (J.C. Widmaier).
1 Tel.: +1-301-688-1043.

In this activity, two software applications were built from the same requirements specifications using different software engineering methodologies. One development methodology combined a classical "waterfall method" (Royce, 1987) of software development with a high level of process maturity (Level 4) as defined by Carnegie Mellon's Software Engineering Institute Capability Maturity Model (CMM, 1995). The company and the specific organization implementing the software had been externally assessed at Level 4 and were on their way to Level 5. The other development organization employed a state-of-the-art technology based upon formal methods with theorem proving and automatic code generation. Specware (Juellig et al., 1996), developed by the Kestrel Institute, embodied this technology. Specware had been employed in the past to develop security kernels and other high assurance applications. Each software development effort was managed under a separate subcontract to a company specializing in the corresponding methodology.

There were a few simple ground rules to the experiment. Each company was given equivalent funding and schedule delivery requirements and well-defined reliability objectives for two defined classes of failures. The reliability requirements were explicitly stated in their contracts. The customer gave a copy of the requirements specifications to the two teams of software developers to help them size the application and determine whether the schedule of delivery and the funding were adequate. On request, the customer verbally explained the nature of the application and answered the development organizations' questions on the planned application before the contract was finalized. Both organizations agreed that both schedule and funding were "generous". In addition, it was made clear to the developers that they could have unlimited access to the customer to understand, refine, and correct the specification should faults be uncovered. It should be noted that the process of modeling the application for testing uncovered errors in the requirements, such as ambiguous and incomplete statements. The specification was composed of four pages of written description in IEEE 830 format (IEEE, 1998) (1347 words), containing in addition flowchart representations of the application. It was also made clear to the two development organizations that they would have the capability to negotiate a reduction in the scope of work if the refinements to the requirements led to unanticipated effort. Process metrics on skill level, hours, and errors uncovered as a function of development stage were to be gathered. The final delivery was to include the source and executable C++ code for the system, documentation on the employed engineering process, design and development documents, software test plans, procedures, and test results.

An independent third party, the University of Maryland's Reliability Engineering Program, reviewed the customer's requirements, refined the requirements with the help of the user, modified the test model after all faults were removed from the specification, and developed test cases for the specified system using Teradyne's TestMaster (Apfelbaum, 1997). They then executed the test scripts using Mercury's tool WinRunner (Mercury, 1997). SPRE Inc., a consulting company specializing in solutions to software reliability engineering problems, was hired to independently validate the test models developed by the University of Maryland. Each delivered software system was tested by the University of Maryland using the same test scenario. Failure data were gathered for each developed software application and then used to estimate the operational/functional reliability. Process metrics at discrete stages were gathered from the two software developers and were compared to help understand what led to differences in reliability (see Fig. 2 for a description of the experimental process).

2. Description of the software under test

The software system specified for contracted development was a fictitious version of a personnel badge reader typically found at the entrance to restricted buildings. The Personnel Access Control System (PACS) is an automated entry-access gate that reads a personal I.D. card containing an individual's name and social security number. The user swipes the card in the reader while the system searches for a match in the database, which may be periodically updated by the system administration. If a match occurs, the system allows the user to enter his/her personal identification number (PIN), a four-digit code, on a twelve-position keyboard. The system then validates or invalidates the PIN and permits or prevents entry through the gate. A single-line display screen provides instructional messages to the user. An attending security officer monitors a duplicate message on his console, which has a gate entry override capability. There are six simple hardware components to the PACS system: the card reader, keyboard, single-line digital display unit, guard display unit, guard reset unit, and gate. Fig. 1 summarizes the functional requirements as stated by the customer to the two software system developers. The complete specification document provided to the software development teams can be found in Huang (1998). For simplicity the system was to consist of only one reader, which operated 24 h a day, seven days a week. Usage (i.e. system loading) was to vary from light to heavy depending on time of day and day of week.

A Level 1 failure of the software was defined to be the condition or conditions in which the software hung, valid user cards and valid PINs were not processed, invalid users gained access, timing requirements were compromised, or the guard could not override the system. In summary, a Level 1 failure was one which brought the system to a critical state. The reliability target was specified to be 0.99 per transaction (i.e. on average 99 out of 100 typical gate transactions were to run successfully without Level 1 failures). Level 2 failures, on the other hand, were defined to be less severe but manifested themselves as the system not working properly. The guard could override these malfunctions and still keep the system running. Anomalies such as an entrant carrying a large package who needs extra time were to constitute conditions for Level 2 system failure. Thus, a Level 2 failure was to have an operational workaround. The target reliability was specified to be 0.9 per transaction (i.e. 9 out of 10 transactions were to run successfully). The specification also included messages for both the user and guard, auditing log requirements, database size constraints, keyboard timing and response requirements, and a few additional modest performance requirements.
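To make the transaction flow concrete, the following minimal C++ sketch models the nominal PACS sequence described above: match the swiped card in the database, then validate the four-digit PIN, with a guard override path. The names and structure are our own illustrative assumptions, not either contractor's design; timing, display messages, auditing, and the reset unit are omitted.

// Illustrative sketch of one PACS gate transaction (not a contractor's design).
#include <iostream>
#include <map>
#include <string>
#include <utility>

enum class Result { Admitted, Rejected, GuardOverride };

struct Record { std::string name; std::string pin; };  // database entry, keyed by SSN

class Pacs {
    std::map<std::string, Record> db;  // periodically updated by the administration
public:
    void upsert(const std::string& ssn, Record r) { db[ssn] = std::move(r); }

    // One gate transaction: match the swiped card, then validate the four-digit PIN.
    Result transaction(const std::string& ssn, const std::string& pin,
                       bool guardOverride = false) {
        if (guardOverride) return Result::GuardOverride;  // guard console action
        auto it = db.find(ssn);
        if (it == db.end()) return Result::Rejected;      // no card match
        if (pin.size() != 4 || pin != it->second.pin)
            return Result::Rejected;                      // invalid PIN
        return Result::Admitted;                          // open the gate
    }
};

int main() {
    Pacs pacs;
    pacs.upsert("123-45-6789", {"J. Doe", "1234"});
    bool ok = pacs.transaction("123-45-6789", "1234") == Result::Admitted;
    std::cout << (ok ? "gate opens" : "entry denied") << '\n';
}

In these terms, a Level 1 failure would be, for example, transaction returning Admitted for an invalid user, while a Level 2 failure would be a recoverable malfunction the guard can work around.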


Fig. 1. Personnel access control system (PACS).

Fig. 2. The experimental process.

3. Reliability definition

Reliability estimations were to be determined from testing the software (Musa and Widmaier, 1997) using typical user input function frequency profiles (i.e. operational profiles). Operational profiles are defined as particular functions/operations and their frequency of usage (Musa, 1999). In this experiment 88 operations comprehensively defined the complete functional profile for the specified system. However, only 35 were needed to cover 99% of all the user's expected operations.

The reliability value in this project was defined as the probability of no failure per gate transaction. Equivalently, it was the number of successful gate transactions divided by the total attempted gate transactions for the given operational profile. In actuality, there are numerous ways to estimate the operational reliability, taking into account runs which are partially successful or situations where faults are removed and system operation is continued. However, in this experiment there was no requirement for the developers to demonstrate improvement in reliability (reliability growth) as a function of software evolution (calendar time). Thus, computing the reliability for this experiment should have reduced to the simple ratio of successes to successes plus failures, also known as Nelson's model (Nelson, 1978). For example, if the program failed twice in 100 independent system test runs, the reliability would have been 0.98 per transaction. Reliability does of course vary with the user's operational profile (Musa and Widmaier, 1996). The test team elicited the operational profile from the user. The user's profile was not given in the specifications, as it is reasonable to expect that software developers familiar with the concept of reliability would know to obtain such information from the user.
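Written out, Nelson's per-transaction estimate is just the observed success fraction. In LaTeX notation (our formulation of the model as used here):

\hat{R} = \frac{n_s}{n_s + n_f} = 1 - \frac{n_f}{n},

where $n_s$ is the number of successful transactions, $n_f$ the number of failed transactions, and $n = n_s + n_f$ the total number of independent test runs drawn from the operational profile. For the example above, $n = 100$ and $n_f = 2$ give $\hat{R} = 1 - 2/100 = 0.98$.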

4. Development activities

4.1. Conventional SEI Level 4 approach

A development team that has reached Level 4 on the ascending scale of 1 to 5 is characterized as one which is focused on product and process quality. In particular, a Level 4 organization has to quantitatively manage its process using measures such as software reliability measures (CMM, 1995). The team in this approach consisted of a program manager, a requirements specification analyst, a quality assurance and process engineer, and one junior and two senior software engineers. Their activities could be categorized into two groups: engineering and management. Engineering activities included Requirements Management, Software Product Engineering, and Quantitative Process Management. Supporting these major activities were the management-related activities of Project Planning, Project Tracking and Oversight, Peer Reviews, Integrated Software Management, Intergroup Coordination, Training, Quality Assurance, Quality Management, and Configuration Management. Each activity had a corresponding documented process or procedure.

The team chose an object-oriented design methodology to generate the required C++ source code. The Unified Modeling Language (UML) (Harmon, 1998) and its computer-based support tool Rational Rose (Quatrani, 2000) were employed for the first time to specify the system requirements and lead into development. Testing was addressed with 16 documented test cases, which were run during system integration and system test. A statement was made at delivery that the C++ "code was 100% reliable"; however, no reliability demonstration tests were conducted. The same statement appeared in the team's requirements specification document; in other words, the team's specification document indicated a commitment to develop a 100% reliable code.

4.2. Formal methods approach

The formal methodology involved the use of Haskell and Specware. Haskell (Hudak, 2000) is a general purpose, purely functional programming language incorporating many recent innovations in programming language design. Haskell provides higher-order functions, non-strict semantics, static polymorphic typing, user-defined algebraic data types, pattern matching, list comprehensions, a module system, a monadic I/O system, and a rich set of primitive data types, including lists, arrays, arbitrary and fixed precision integers, and floating-point numbers. Specware (Juellig et al., 1996) is an advanced environment for the formal development of complex software systems. Its specification style belongs to the algebraic category. The tool is based on category theory (Barr, 1995), which addresses the description and characterization of certain recurring mathematical structures within a unified framework; its fundamental elements are classes of objects, the arrows between pairs of objects, and the axioms that the objects and arrows have to satisfy. SLANG is Specware's specification language. Specware development consists of the systematic transformation of formal specifications into executable programs, using high level, parallel and sequentially composable design steps formally represented within and applied by the system.

The formal methods team consisted of a contract manager and two senior computer scientists. Aside from scoping the cost and schedule, the two technical people were responsible for all development process activities.


Requirements analysis was a very intense and productive activity which produced multi-level state diagrams. The team opted for a prototyping stage which involved modeling, executing, and verifying Haskell specifications. Specware, as required in the contract, could have been used directly to express the hierarchical functionality of the PACS system in SLANG and run as an executable specification. According to the programmer, however, Haskell offered additional features with which to capture features of imperative programming languages, as well as being "more user-friendly". The Haskell prototype, once developed, was automatically converted to Specware specifications so that the automatic code generation feature of Specware could be exercised. The generated source code was in C++ and was compiled with the GNU C++ compiler. The development team's integration test set-up was handwritten and only 30 test cases were run. However, verification of the specifications had been done earlier on the executable Haskell model. As with the first contractor, this developer conducted no reliability demonstration test and assumed a "correct and complete implementation".

5. Testing

During the subcontracted development activities, the University of Maryland prepared for the reliability assessment of the two operational coded solutions and the analysis of developer process metrics. The assessment closely replicated a Cleanroom certification process (Prowell et al., 1999). The first step in the assessment process was the construction of a test model using the tool TestMaster. An outside independent consultant (SPRE Inc.) was hired by the customer to review the TestMaster models constructed by the University of Maryland (Huang, 1998).

TestMaster is a relatively new test design tool based on model reference test (MRT) technology. It uses the concept of the Extended Finite State Machine (EFSM) (Gill, 1962) to graphically and textually represent the specifications and describe the application's desired behavior. Software tests are created automatically using a script generator. The generator develops tests by finding a path through the specified finite state machine representation of the application from the starting state to the exit state. The path is a sequence of events and actions that traverse the diagram, defining an actual-use scenario. Each transition in the path may contain events, output actions, control information (predicates and constraints), variable assignments, and testing information. Once a path has been defined, the test generator creates a test script for that path by concatenating the test action statements (in the language of the test execution environment) and the data values required to move the system from its current state to the next state (Apfelbaum, 1997). When the test executor applies this script to the system under test, the system should follow the sequence defined by the path if the system's implementation is correct. Fig. 3 shows the high level TestMaster model. The model is hierarchical, with higher level states "calling" lower level states. Figs. 4–6 show progressively lower hierarchical levels in the state machine model. Fig. 7 gives an example test script generated by the test tool.

To fully automate the entire testing process, the test generation tool had to be followed by a test execution tool. In this study, Mercury's tool WinRunner was chosen (see Fig. 8). WinRunner's test executor recognizes its own Test Script Language (TSL), in which the TestMaster script snippets were written.

To calculate the reliability of PACS from the reliability of its functional and non-functional elements (such as performance), it is necessary to determine the possible operations of the software (see Table 1), the functional and non-functional elements involved in each operation and their probability of failure, as well as the way in which these different elements combine. Fault trees were used to show how the different functional elements, non-functional elements, and operations combine. Fig. 9 shows part of the fault tree obtained. The upper left side of the tree is concerned with failures of the computer clock function and failures of the dump function; these functions are carried out by hardware devices. The central part of the tree is concerned with the failure of PACS while the code executes in operation 1. The rightmost central position of the tree is a transfer gate to additional logic accounting for failures which have the potential to occur when PACS executes other operations.

A theoretical 88 operations with frequencies ranging from 2.7E-10 to 0.37 were identified. The top five operations, i.e. the five most likely operations, are displayed in Table 1. A total of 35 out of the 88 operations contributed a cumulative probability of 0.999, a value greater than the specified 0.99 requirement; modeling a 0.995 probability space would still have been sufficient. Testing proceeded along the different operations by constraining the TestMaster model to each operation, with an identical number of tests per operation. The testing approach could have considered a random selection of test cases; however, this would have precluded the in-depth investigation of the low probability operations and the study of the behavior of the code in this context. In all, 200 tests were run on each of the 35 operations, for a total of 7000 test cases. Information on the probability of failure of the functional and non-functional elements involved in each operation was collected during these tests and helped populate the fault tree built for the system. Note that a simpler reliability approach could have been used which would not have used fault trees to represent failure of PACS and would have modeled the software as a black box. However, we were also interested in an identification of the weak areas of the software.
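Although the full fault tree carries more structure than this, the way per-operation results roll up into a system figure can be sketched as follows. This is our notation and a simplified view of the fault-tree combination; it assumes the operations partition the profile and that element failures within an operation are independent:

R = \sum_{i=1}^{35} p_i \, (1 - q_i), \qquad q_i = 1 - \prod_j (1 - q_{ij}),

where $p_i$ is the operational-profile frequency of operation $i$, $q_i$ its probability of failure per transaction (the OR of its element failures in the tree), and $q_{ij}$ the probability of failure of the $j$-th functional or non-functional element exercised by operation $i$, each $q_{ij}$ being estimated from the 200 tests run per operation.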

Fig. 3. High level TestMaster model for PACS.


Fig. 4. PACS TestMaster model (Enter_Card).

Fig. 5. PACS TestMaster model (Read_Card).

Furthermore, the detailed modeling allows an easy reevaluation of the application's reliability for changes in the operational profile.
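As an illustration of the path-based generation idea described above, the following minimal C++ sketch finds one path through a toy EFSM fragment and concatenates the per-transition test actions into a script, in the spirit of the TestMaster/WinRunner pipeline. The model fragment, state names, and emitted statements are our own illustrative assumptions, not TestMaster's or WinRunner's actual formats; predicates, variable assignments, and profile-driven path selection are omitted, and the search assumes an acyclic model for brevity.

// Minimal sketch of path-based test generation over an EFSM-like model.
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Transition {
    std::string event;   // stimulus that fires the transition
    std::string action;  // test-script statement emitted for this step
    std::string target;  // next state
};

using Model = std::map<std::string, std::vector<Transition>>;

// Depth-first search for one path from `state` to the exit state,
// concatenating the test actions along the way into a script.
bool findPath(const Model& m, const std::string& state,
              const std::string& exitState, std::vector<std::string>& script) {
    if (state == exitState) return true;
    auto it = m.find(state);
    if (it == m.end()) return false;
    for (const Transition& t : it->second) {
        script.push_back(t.action);
        if (findPath(m, t.target, exitState, script)) return true;
        script.pop_back();  // backtrack: this branch does not reach the exit
    }
    return false;
}

int main() {
    // A toy fragment of the PACS model: swipe card, enter PIN, open gate.
    Model pacs = {
        {"Idle",     {{"swipe_card", "swipe(valid_card);",  "CardRead"}}},
        {"CardRead", {{"enter_pin",  "type_pin(\"1234\");", "PinOk"}}},
        {"PinOk",    {{"gate",       "expect(gate_open);",  "Exit"}}},
    };
    std::vector<std::string> script;
    if (findPath(pacs, "Idle", "Exit", script))
        for (const std::string& line : script) std::cout << line << '\n';
}

A real generator would also honor the transition predicates and constraints and would select paths according to the operational profile rather than taking the first path found.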

6. Results

C++ software was designed, developed, and delivered under the two methodology development subcontracts. However, what was delivered failed to meet the product and process requirements and programmatic expectations in many areas, from functionality, to the lack of explicit reliability estimates, to the required process metrics. Ironically, both developers claimed they had satisfied contractual requirements from both the process and software product standpoints. Process metrics concerned with manpower and skill levels were captured for all stages during development by both subcontractors. These process metrics are summarized in Table 2.


Fig. 6. PACS TestMaster model (Good_Card_Source).

Fig. 7. Sample test script generated by TestMaster.

Fig. 8. General testing approach.


Table 1. Five most likely PACS operations

Description of the operation                                                                 Probability
I.   One good card and one good PIN with no erase                                            0.3708688
II.  One good card and one good PIN with erase (once or twice)                               0.1252381
III. One good card, one bad PIN (timing is incorrect) and one good PIN with no erase         0.1420415
IV.  One good card, two bad PINs (timing is incorrect) and one good PIN with no erase        0.05440142
V.   One good card and walking away without entering a PIN; the system automatically resets  0.08818182

Fig. 9. PACS fault tree: detailed description of operation 1.

Table 2. Summary of analysis results

                            SEI CMM Level 4 + Waterfall    Formal methods

People                      Company X                      Company Y
                            1 entry-level engineer         1 Ph.D. computer scientist
                            2 senior S/W engineers         1 computer scientist
                            1 requirements analyst         1 contract manager
                            1 QA/process engineer
                            1 program manager

Process                     SEI/CMM Level 4                Undocumented
  Requirements (h)          83                             178
  System & S/W design (h)   211                            52
  Implementation (h)        47                             283
  Integration and test (h)  36                             36
  (Management) (h)          385                            8
  Total (h)                 762                            557

Technology                  Object oriented                Haskell
                            Rational Rose                  Specware
                            Unified Modeling Language

In the SEI CMM Level 4 + Waterfall methodology, much care was taken with initial project preparation and estimation. The cost and schedule estimation predicted both early delivery and a cost under-run. Project management was heavily involved at this and subsequent stages and, as seen in Table 2, amounted to almost 50% of all man-hours. Requirements analysis was minimized after the team initially judged the customer's Requirements Specification to be "one of the best seen". However, the lack of involvement of the design, development, and testing team may have been the reason the hidden faults in the Requirements Specification were not found. Ambiguous and incomplete requirements were settled internally without recourse to the user. However, this team did tabulate defects found at each of their major development stages (see Fig. 10).

Fig. 10. Defect distribution per phase over the development life cycle.

The CMM Level 4 (Quantitative Processes and Quality Management) contractor degraded for a short while to a "maturity level" between Level 1 and Level 2 (ad hoc/repeatable, respectively) when schedule and funding grew short. Technical heroes were conscripted to meet cost and schedule when the junior engineer failed to solve a problem with the new Object Oriented tools. When the University of Maryland ran 7000 user profile tests on this contractor's 296 lines of code, reliability was determined to be 0.56 for Level 1 failures (on average, 44 of 100 test cases caused Level 1 failures). Reliability for Level 2 failures was determined to be 0.97 (see Tables 3 and 4). Processes and methodology were well documented and traceable through the experimental development (see Table 5).

The CMM Level 4 + Waterfall approach fell short of the stated reliability objectives for a number of reasons, as the process metrics revealed. A primary weakness was that requirements analysis was not sufficient, nor did it involve all "next-step" parties to detect inconsistencies or incompleteness. Consequently, the developed product reflected only what the coder interpreted to be the requirements. Another weakness was that a junior person assigned to work on the project got into technical trouble with new OO tools and technology. Finally, inadequate scaling of the Level 4 process down to a small project may also have contributed to the degraded performance, as witnessed by the heavy loading of management in the resource consumption. Documentation of the process was very good, and the code should be easy to maintain.

Table 3. Failures identified in each version

Version X failures:
- If by any means the input for the first PIN digit is either "*" or "#", which is the erase sign, the system takes it as a legitimate input for PIN digit 1 and asks for the second digit.
- The "ERASE" function only clears the last PIN digit, not all digits.
- The system is unable to recognize a correct card past a certain location in the card database, and every card after that point.
- The attending guard is unable to override the gate when extra entry time is required.
- There is no limit on how many times the erase function can be used within one entry transaction.
- No function has been developed to control the time interval between two cards within one entry transaction.
- No timing requirements are implemented.
- If by any means two entries in the card database have the same social security number but different names and PINs, the condition goes undetected and those two PINs are good for both names.
- No audit dump function is implemented.

Version Y failures:
- The function that controls the maximum number of erase operations which can be used in one entry is not implemented correctly.
- There is no function that controls the time interval between two cards within one entry transaction.
- Part of the timing requirements for key stroking are implemented incorrectly.
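Several of the failures in Table 3 concern the ERASE rule for PIN entry. The following minimal C++ sketch shows the behavior the specification appears to require: "*" or "#" is an erase keystroke rather than a PIN digit, erase clears all digits entered so far, and erase may be used only a limited number of times. The erase-limit constant and all names are hypothetical illustrations (the actual limit and interface are defined in the requirements document), and keystroke timing is omitted.

// Hypothetical sketch of the PIN-entry ERASE rule implicated in Table 3.
#include <string>

constexpr int kMaxErases = 2;  // illustrative value only; see the specification

// Processes one keystroke. Returns false when the transaction must be
// aborted because the erase limit was exceeded.
bool onKey(char key, std::string& pin, int& erasesUsed) {
    if (key == '*' || key == '#') {          // erase keystroke, never PIN digit 1
        if (++erasesUsed > kMaxErases) return false;
        pin.clear();                         // clear all digits, not just the last
        return true;
    }
    if (pin.size() < 4) pin.push_back(key);  // accumulate the four-digit PIN
    return true;
}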

Table 4. Reliability estimates

Version                                         Level 1    Level 2
SEI CMM 4 + Waterfall conventional development  0.56       0.97
Formal methods                                  0.77       0.65

Table 5. Documentation delivered by the two processes

Documentation delivered by X:
- PACS Process Handbook (describing the software development process)
- PACS Process Management Handbook
- PACS Process Management Report (includes a description of process deviations)
- PACS Interface Instructions (describes the procedure (inputs, etc.) to be used to execute the software)
- PACS Test Plan and Test Results
- PACS Program Management Plan
- PACS Measurements (includes measurements of size and effort, process and quality measurements such as the number of defects found, inspection logs, and budget expenditures)
- PACS Requirements Specification
- PACS Design Specification
- PACS source code, executable, and database

Documentation delivered by Y:
- Haskell specification
- Specware specification
- C++ code

The formal methods process was surprisingly thorough in the requirements analysis stage, as evidenced by the amount of time expended in this area (see Table 2 and Fig. 10). It was in this stage that both the incomplete and the conflicting requirements in the customer-supplied specification were found. One ground rule of the experiment was that the two subcontractors could not share information, such as shortcomings in the requirements provided by the customer. Thus, the SEI CMM Level 4 subcontractor never had the benefit of resolving the requirements inconsistencies uncovered by the formalists. There was also continuity throughout the development from one stage to another, since the two scientists "did it all". The C++ code generation was a major undertaking, fraught with technical incompleteness found within the Specware tool; a simple assignment construct needed refinement by the Specware tool supplier. The code generation stage consumed time and resources beyond what was expected. Theorem proving was consequently not done on the specifications because of escalating cost and schedule constraints. Unfortunately, theorem proving would have been an opportunity to uncover any remaining specification faults or inconsistencies. Testing to enable reliability estimations was not conducted; such testing would also have led the team to discover the two simple temporal errors. However, the 7000 operational profile tests run by the University of Maryland on their 3986 lines of C++ code led to a reliability estimate of 0.77 (on average, 23 Level 1 failures within a 100 test case set). Reliability for Level 2 failures was estimated as 0.65. If the two errors found during the requirements review conducted with the user had been documented and removed, the reliability for Level 1 failures could have jumped to 98% instead of the measured 77%! The development process was poorly documented, including the errors found with the customer during the requirements review already mentioned. Their engineering process was more of a research and development activity, i.e. difficult to reproduce (a common shortcoming of most leading-edge technologies). Tables 2 and 4 summarize the reliability estimates and performance metrics.

7. Conclusions

This experiment was initiated to determine whether or not state-of-the-art software engineering development methods can currently guarantee the development of applications with specified levels of reliability under realistic cost and schedule conditions. The experiment is a first step in responding to this question: it shows that the companies investigated were unable to satisfy the reliability requirements set by the user. The results cannot, however, be generalized at the current time. Further studies will need to investigate larger samples of organizations and the variation of the results with the application domain, the size of the application, the development technique used, etc.

Developer shortcomings in this experiment could have been mitigated by performing a risk analysis on new technology when estimating cost and schedule, even for seemingly easy tasks. Requirements analysis (and modeling) is a necessary step in any software engineering method; time invested in this step is critical to success, as was partially demonstrated here. In addition, educating the development teams on reliability testing would have helped them understand the reliability requirements and address them adequately. On the other hand, to reinforce the reliability requirements, the customer might in the future consider requiring the delivery of reliability metrics (such as reliability growth estimates) and of the operational profile during the course of software system integration testing.


References

Apfelbaum, L., 1997. Spec-based tests make sure telecom works. IEEE Spectrum.
Barr, M., 1995. Category Theory for Computer Science. Prentice Hall, Englewood Cliffs, NJ.
Gill, A., 1962. Introduction to the Theory of Finite-State Machines. McGraw-Hill, New York.
Harmon, P., 1998. Understanding UML: The Developer's Guide: With a Web-based Application in Java. Morgan Kaufmann, Los Altos, CA.
Huang, X., 1998. A comparison between standard and formal mathematical development. Master's Thesis, Department of Nuclear Materials and Reliability Engineering, University of Maryland.
Hudak, P., 2000. The Haskell School of Expression: Learning Functional Programming Through Multimedia. Cambridge University Press, Cambridge.
IEEE, 1998. IEEE Recommended Practice for Software Requirements Specifications. IEEE Std 830-1998.
Juellig, R., et al., 1996. Specware: an advanced environment for the formal development of complex software systems. In: Proceedings of the 5th International Conference on Algebraic Methodology and Software Technology, Munich, Germany.
Mercury, 1997. WinRunner User's Guide, Version 4.0. Mercury Interactive Corporation.
Musa, J., 1999. Software Reliability Engineering: More Reliable Software, Faster Development and Testing. McGraw-Hill, New York.
Musa, J., Widmaier, J., 1996. Software reliability engineered testing. Crosstalk, J. Defense Software Eng.
Musa, J., Widmaier, J., 1997. Software reliability engineering testing. J. Software Quality Assurance Inst.
Nelson, E., 1978. Estimating software reliability from test data. Microelectron. Reliab. 17, 67.
Prowell, S.T., Trammell, C.J., Linger, R.C., Poore, J.H., 1999. Cleanroom Software Engineering: Technology and Process. The SEI Series in Software Engineering. Addison-Wesley, Reading, MA.
Quatrani, T., 2000. Visual Modeling with Rational Rose and UML. Addison-Wesley, Reading, MA.
Royce, W.W., 1987. Managing the development of large software systems: concepts and techniques. In: Proceedings of WESCON, August. IEEE Computer Society Press (also in Proceedings of ICSE 9).
Software Engineering Institute, 1995. The Capability Maturity Model: Guidelines for Improving the Software Process. SEI Series in Software Engineering. Addison-Wesley, Reading, MA.