
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 37, NO. 2, MARCH 2007

Generalized Discrete Software Reliability Modeling With Effect of Program Size

Shinji Inoue, Member, IEEE, and Shigeru Yamada, Member, IEEE

Abstract—Several generalized methods for software reliability growth modeling have been proposed, but most of them address continuous-time modeling. Meanwhile, many discrete software reliability growth models (SRGMs) have been proposed to describe a software reliability growth process that depends on discrete testing time, such as the number of days (or weeks) or the number of executed test cases. In this paper, we discuss generalized discrete software reliability growth modeling in which the software failure-occurrence times follow a discrete probability distribution. Our generalized discrete SRGMs enable us to assess software reliability in consideration of the effect of the program size, which is one of the influential factors in the software reliability growth process. Specifically, we develop discrete SRGMs in which the software failure-occurrence times follow geometric and discrete Rayleigh distributions, respectively. Moreover, we derive software reliability assessment measures based on a unified framework for discrete software reliability growth modeling. We also discuss optimal software release problems based on our generalized discrete software reliability growth modeling. Finally, we show numerical examples of software reliability assessment by using actual fault-counting data.

Index Terms—Assessment measures, binomial process, discrete Weibull distribution, generalized discrete model, optimal release problems, software reliability.

Manuscript received August 23, 2005; revised July 27, 2006. This work was supported in part by the Ministry of Education, Sports, Science, and Technology of Japan under a Grant-in-Aid for Scientific Research (C), Grant 18510124. This paper was recommended by Guest Editor H. Pham.
The authors are with the Department of Social Systems Engineering, Tottori University, Tottori 680-8552, Japan.
Digital Object Identifier 10.1109/TSMCA.2006.889475

I. INTRODUCTION

Assessing software reliability in the testing phase of a software development process is one of the important issues in developing a highly reliable software system. During the testing phase, an implemented software system is tested to detect and correct faults latent in the software system. We can then describe the software failure-occurrence or fault-detection phenomenon by analyzing the related actual data collected in the testing phase. A software reliability growth model (SRGM) [1]–[4] is known as a useful mathematical tool to describe these phenomena in the testing phase and to assess software reliability quantitatively.

As the role of software systems expands rapidly, the size, complexity, and diversity of software systems have been growing drastically in recent years. Accordingly, we need to develop more feasible SRGMs that enable us to assess software reliability more accurately. As one of the solutions, generalized approaches to software reliability growth modeling have been proposed based on order statistics [5], infinite-server queueing theory [6], Markov processes [2], [7], [8], and so on. In particular, Langberg and Singpurwalla [5] have proposed a unified framework for software reliability growth modeling under the assumption that the fault-detection times can be regarded as order statistics. Moreover, they have also shown that several nonhomogeneous Poisson process models (NHPP models) can be classified by the fault-detection time distribution. This unified framework (or generalized SRGM) has the useful characteristic that we can easily obtain a suitable SRGM by reflecting the software failure-occurrence or fault-detection phenomenon in the generalized assumptions.

Most of the generalized SRGMs, including those mentioned above, have been discussed in terms of continuous-time SRGMs, because the continuous-time SRGM is specifically applicable to reliability analysis [9]. However, considering that there are discrete SRGMs that describe software reliability growth processes depending on discrete testing time, such as the number of days (or weeks) and the number of executed test cases [10], we need to discuss a generalized discrete software reliability growth modeling approach. In recent research, Huang et al. [11] have discussed a unified scheme of discrete NHPP models by applying the concepts of weighted arithmetic, weighted geometric, or weighted harmonic means. Moreover, Okamura et al. [12] have discussed a unified parameter-estimation method based on the expectation–maximization (EM) principle and investigated the effectiveness of the EM-based estimation method by comparing it with Newton's method. These unified frameworks apply only to discrete software reliability models based on NHPPs. In this paper, we discuss a unified framework for discrete software reliability growth modeling in which the software failure-occurrence times follow a discrete-time probability distribution.
Based on this framework, we then develop a generalized discrete SRGM following a binomial process, which enables us to assess software reliability in consideration of the effect of the program size. Because the complexity of the internal program structure may increase as the program size becomes larger, we can say that the program size influences the software reliability growth process through the software complexity. In particular, based on our generalized discrete SRGM with the program size, we propose two specific discrete SRGMs in which the software failure-occurrence time distributions follow geometric and discrete Rayleigh distributions, respectively. After that, we derive several generalized software reliability assessment measures based on the concept of the generalization framework. Moreover, we discuss parameter estimation based on the method of maximum likelihood for our generalized discrete SRGM. We then compare the performance of our proposed discrete SRGMs with existing discrete SRGMs in terms of goodness of fit. Additionally, we discuss optimal software release problems with simultaneous cost and reliability objectives based on our generalized discrete SRGM. Finally, we present numerical illustrations of our generalized discrete model and its application to the derived optimal release policies by using actual fault-counting data.

1083-4427/$25.00 © 2007 IEEE

INOUE AND YAMADA: GENERALIZED DISCRETE SOFTWARE RELIABILITY MODELING

II. GENERALIZED MODELING

We discuss a unified framework for discrete software reliability growth modeling, in which the software failure-occurrence (or fault-detection) times follow a discrete-time probability distribution. Based on the framework, we develop a generalized discrete binomial-process model with the effect of the program size.

A. Unified Framework

In a testing phase, the software failure-occurrence times can be regarded as order statistics. Okamura et al. [12] have discussed a unified framework for discrete software reliability growth modeling based on order statistics. The unified framework is based on the following assumptions.

A1) Whenever a software failure is observed, the fault that caused it is detected immediately, and no new faults are introduced in the fault-detection procedure.
A2) Each software failure occurs at independently and identically distributed random times $I$ with the discrete probability distribution $P(i) \equiv \Pr\{I \le i\} = \sum_{k=0}^{i} p_I(k)$ $(i = 0, 1, 2, \ldots)$, where $p_I(k)$ and $\Pr\{A\}$ represent the probability mass function for $I$ and the probability of event $A$, respectively.
A3) The initial number of faults in the software system, $N_0$ $(> 0)$, is a random variable and is finite.

We can develop a generalized discrete SRGM based on the assumptions above. First, let $\{N(i), i = 0, 1, \ldots\}$ denote a discrete stochastic process representing the number of faults detected up to the $i$th testing period. Then, the conditional probability that $m$ faults are detected up to the $i$th testing period given that $N_0 = n$ is derived as

$$ \Pr\{N(i) = m \mid N_0 = n\} = \binom{n}{m} \{P(i)\}^{m} \{1 - P(i)\}^{n-m}. \tag{1} $$

Accordingly, we have the probability mass function that $m$ faults are detected up to the $i$th testing period as

$$ \Pr\{N(i) = m\} = \sum_{n} \binom{n}{m} \{P(i)\}^{m} \{1 - P(i)\}^{n-m} \times \Pr\{N_0 = n\} \quad (m = 0, 1, 2, \ldots). \tag{2} $$

The stochastic behavior of the software fault-detection or failure-occurrence phenomenon in the testing phase can be characterized by giving a suitable probability mass function of the initial fault content $N_0$. Okamura et al. [12] have discussed a generalized discrete Poisson process model for software reliability assessment by assuming that the initial fault content $N_0$ follows a Poisson distribution and proposed a parameter-estimation method based on the EM algorithm.

B. Generalized Discrete Binomial-Process Modeling

In this paper, we propose a generalized discrete binomial-process model for software reliability assessment by considering the case in which the initial fault content $N_0$ follows a binomial distribution with parameters $(K, \lambda)$, which is given as

$$ \Pr\{N_0 = n\} = \binom{K}{n} \lambda^{n} (1 - \lambda)^{K-n} \quad (0 < \lambda < 1;\ n = 0, 1, \ldots, K). \tag{3} $$

Equation (3) rests on the following physical assumptions.

1) The software system consists of $K$ lines of code (LOC) at the beginning of the testing phase.
2) Each line of code contains a fault with a constant probability $\lambda$.
3) Each software failure caused by a fault remaining in the software system occurs independently and randomly.

These assumptions allow us to use a binomial distribution as the probability mass function of the initial fault content and thereby to incorporate the effect of the program size into software reliability growth modeling. Substituting (3) into (2), we can derive the probability mass function of the number of faults detected up to the $i$th testing period as

$$ \Pr\{N_B(i) = m\} = \binom{K}{m} \{\lambda P(i)\}^{m} \{1 - \lambda P(i)\}^{K-m} \quad (m = 0, 1, 2, \ldots, K). \tag{4} $$

From (4), we can see that the number of faults detected up to the $i$th testing period follows a binomial process if the initial fault content follows the binomial distribution.

C. Discrete Failure-Occurrence Time Distribution

We need to specify a discrete failure-occurrence time distribution to develop an SRGM. In this paper, we assume that each software failure is observed according to a discrete Weibull distribution [13], [14]. The probability distribution function of the discrete Weibull distribution is given as

$$ P(i) = 1 - (1 - p)^{i^{\beta}} \quad (i = 0, 1, 2, \ldots;\ \beta > 0,\ 0 < p < 1). \tag{5} $$

In (5), p represents the probability that a software failure caused by a fault is observed per testing period, and β is the shape parameter. The discrete Weibull distribution can flexibly describe the stochastic behavior of the failure-occurrence times. That is, it has the following properties: a decreasing software-failure rate (DFR) for 0 < β < 1, a constant software-failure rate (CFR) for β = 1, and an increasing software-failure rate (IFR) for β > 1.
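The DFR/CFR/IFR behavior can be checked numerically. The following is a minimal sketch, assuming the failure rate at period i is taken as the conditional probability {P(i) − P(i − 1)}/{1 − P(i − 1)}; the function names and the value p = 0.1 are illustrative and do not come from the paper.

```python
def weibull_cdf(i, p, beta):
    """Discrete Weibull distribution function (5): P(i) = 1 - (1 - p)**(i**beta)."""
    return 1.0 - (1.0 - p) ** (i ** beta)


def failure_rate(i, p, beta):
    """Probability that a failure occurs in period i (i >= 1), given that it
    has not occurred by period i - 1 (assumed definition of the failure rate)."""
    survive_prev = 1.0 - weibull_cdf(i - 1, p, beta)
    return (weibull_cdf(i, p, beta) - weibull_cdf(i - 1, p, beta)) / survive_prev


p = 0.1  # illustrative value, not an estimate from the paper
for beta, label in [(0.5, "DFR"), (1.0, "CFR"), (2.0, "IFR")]:
    rates = [failure_rate(i, p, beta) for i in range(1, 6)]
    print(label, [round(r, 4) for r in rates])
```

For β = 1 the rate reduces to the constant p, which is the geometric case treated next; for β = 2 it grows with i, which is the discrete Rayleigh case.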


In this paper, we focus on the cases β = 1 and β = 2 as special cases of the discrete Weibull distribution in (5). When β = 1 in (5), the distribution becomes a geometric distribution

$$ P(i) = 1 - (1 - p)^{i} \quad (i = 0, 1, 2, \ldots;\ 0 < p < 1) \tag{6} $$

which has the CFR. The geometric distribution means that the probability of a software failure occurrence at any testing period decreases geometrically, which represents the case that the internal program structure is simple and the testing skill of test-case designers is high [15]. When β = 2, the distribution can be regarded as a discrete Rayleigh distribution

$$ P(i) = 1 - (1 - p)^{i^{2}} \quad (i = 0, 1, 2, \ldots;\ 0 < p < 1) \tag{7} $$

which has the IFR. Applying the discrete Rayleigh distribution to the software failure-occurrence time distribution means that the internal program structure is complex and the initial testing skill of test-case designers is low; however, their testing skill improves steadily as the testing period goes on [15].
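The step from (2) and (3) to the binomial-process form (4) can be verified numerically for small parameter values. The sketch below sums the mixture in (2) directly and compares it with the closed form; the values K = 30, λ = 0.2, and P(i) = 0.35 are arbitrary illustrations, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(m, n, q):
    """Binomial(n, q) probability mass at m."""
    return comb(n, m) * q ** m * (1.0 - q) ** (n - m)


def detected_pmf_mixture(m, K, lam, P):
    """Equation (2) with N0 ~ Binomial(K, lam): sum over the initial fault content n."""
    return sum(binom_pmf(m, n, P) * binom_pmf(n, K, lam) for n in range(m, K + 1))


def detected_pmf_closed_form(m, K, lam, P):
    """Equation (4): the number of detected faults is Binomial(K, lam * P)."""
    return binom_pmf(m, K, lam * P)


K, lam, P = 30, 0.2, 0.35  # small illustrative values, not from the paper
worst = max(
    abs(detected_pmf_mixture(m, K, lam, P) - detected_pmf_closed_form(m, K, lam, P))
    for m in range(K + 1)
)
print("largest absolute difference:", worst)
```

The two computations should agree up to floating-point rounding, which reflects the fact that thinning a binomial count with probability P(i) yields another binomial count with success probability λP(i).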

III. GENERALIZED RELIABILITY ASSESSMENT MEASURES

Software reliability assessment measures are useful metrics that enable us to assess software reliability quantitatively. In this section, we derive several generalized software reliability assessment measures based on the unified framework of discrete software reliability growth modeling discussed in Section II.

A. Expectation and Variance of the Number of Detected Faults

Information on the current number of detected faults is one of the important metrics for estimating the degree of testing progress. Because the number of faults detected up to the $i$th testing period, $N(i)$ in (2), is a random variable, its expectation and variance are useful measures. The expectation of the number of detected faults $E[N(i)]$ is derived as

$$ E[N(i)] = \sum_{n} \sum_{z=0}^{n} z \binom{n}{z} \{P(i)\}^{z} \{1 - P(i)\}^{n-z} \cdot \Pr\{N_0 = n\} = E[N_0] P(i). \tag{8} $$

Moreover, its variance $\mathrm{Var}[N(i)]$ is derived as

$$ \mathrm{Var}[N(i)] = E[N(i)^2] - (E[N(i)])^2 = \mathrm{Var}[N_0] \{P(i)\}^2 + E[N_0] P(i) \{1 - P(i)\}. \tag{9} $$

Therefore, if $N_0$ follows the binomial distribution in (3), they are given as

$$ E[N_B(i)] = K \lambda P(i) \tag{10} $$

$$ \mathrm{Var}[N_B(i)] = K \lambda P(i) \{1 - \lambda P(i)\} \tag{11} $$

respectively. We can see that $K\lambda$ in (10) represents the expected initial fault content when $N_0$ follows the binomial distribution.

B. Software Reliability Function

A software reliability function is one of the well-known software reliability assessment measures. Given that the testing or the operation has been going on up to the $i$th testing period, the discrete software reliability function is defined as the probability that a software failure does not occur in the time interval $(i, i + h]$ $(i, h = 0, 1, \ldots)$ [10]. Accordingly, we can formulate the discrete software reliability function $R(i, h)$ as

$$ R(i, h) = \sum_{k} \Pr\{N(i + h) = k \mid N(i) = k\} \cdot \Pr\{N(i) = k\} = \sum_{k} \{P(i)\}^{k} \{1 - P(i + h)\}^{-k} \times \sum_{n} \binom{n}{k} \{1 - P(i + h)\}^{n} \cdot \Pr\{N_0 = n\} \tag{12} $$

by using (2). Therefore, if $N_0$ follows the binomial distribution in (3), the discrete software reliability function can be derived as

$$ R_B(i, h) = [1 - \lambda \{P(i + h) - P(i)\}]^{K} \tag{13} $$

by using (12).

C. Instantaneous and Cumulative Mean Time Between Software Failures (MTBF)

Let $F(i, h)$ be the probability that a software failure occurs in the time interval $(i, i + h]$. An ordinary MTBF of the generalized SRGM discussed in Section II cannot be derived, because $F(i, h)$ of the generalized SRGM has the following properties:

$$ F(i, 0) = 1 - R(i, 0) = 0 \tag{14} $$

$$ F(i, \infty) = 1 - R(i, \infty) = 1 - \sum_{n} \{P(i)\}^{n} \cdot \Pr\{N_0 = n\}. \tag{15} $$

That is, the equations above imply that $F(i, h)$ does not satisfy the properties of an ordinary probability distribution function, since $F(i, \infty)$ is in general less than 1. Accordingly, we need to utilize discrete instantaneous and cumulative MTBFs as substitutes for the ordinary MTBF. Using (8), we can formulate the discrete instantaneous MTBF as

$$ \mathrm{MTBF}_I(i) = \frac{1}{E[N(i + 1)] - E[N(i)]}. \tag{16} $$

Moreover, the discrete cumulative MTBF is given as

$$ \mathrm{MTBF}_C(i) = \frac{i}{E[N(i)]}. \tag{17} $$
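The measures of this section are straightforward to evaluate once K, λ, p, and β are given. The sketch below implements (10), (11), (13), (16), and (17) for the binomial-process model with the discrete Weibull distribution (5); the parameter values are the estimates reported later in Section VII for DS1 (geometric case) and DS4 (discrete Rayleigh case), while the function and dictionary names are hypothetical.

```python
def weibull_cdf(i, p, beta):
    """Discrete Weibull distribution function (5)."""
    return 1.0 - (1.0 - p) ** (i ** beta)


def mean_detected(i, K, lam, p, beta):
    """Expected number of detected faults, equation (10)."""
    return K * lam * weibull_cdf(i, p, beta)


def var_detected(i, K, lam, p, beta):
    """Variance of the number of detected faults, equation (11)."""
    lp = lam * weibull_cdf(i, p, beta)
    return K * lp * (1.0 - lp)


def reliability(i, h, K, lam, p, beta):
    """Discrete software reliability function, equation (13)."""
    step = weibull_cdf(i + h, p, beta) - weibull_cdf(i, p, beta)
    return (1.0 - lam * step) ** K


def mtbf_instantaneous(i, K, lam, p, beta):
    """Discrete instantaneous MTBF, equation (16) with (10) substituted."""
    return 1.0 / (mean_detected(i + 1, K, lam, p, beta) - mean_detected(i, K, lam, p, beta))


def mtbf_cumulative(i, K, lam, p, beta):
    """Discrete cumulative MTBF, equation (17); requires i >= 1."""
    return i / mean_detected(i, K, lam, p, beta)


# Estimates quoted in Section VII of the paper.
DS1 = dict(K=1.630e5, lam=0.3122e-2, p=0.2420e-1, beta=1)  # geometric case
DS4 = dict(K=1.972e5, lam=0.1508e-2, p=0.9180e-2, beta=2)  # discrete Rayleigh case

print(reliability(125, 1, **DS1), mtbf_instantaneous(125, **DS1))
print(reliability(25, 1, **DS4), mtbf_instantaneous(25, **DS4))
```

With these inputs the printed values should land close to the figures quoted in Section VII (reliabilities of about 0.5620 and 0.7044, and instantaneous MTBFs of about 1.7351 and 2.8541 weeks); small differences are expected because the published estimates are rounded to four significant digits.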


By substituting (10) into (16) and (17), we can obtain the specific instantaneous and cumulative MTBFs, respectively.

IV. PARAMETER ESTIMATION

We discuss parameter estimation for the generalized discrete binomial-process model in (4) based on the method of maximum likelihood. Suppose that we have observed $N$ data pairs $(t_i, y_i)$ $(i = 0, 1, 2, \ldots, N)$ with respect to the cumulative number of faults $y_i$ detected during a constant time interval $(0, t_i]$ $(0 < t_1 < t_2 < \cdots < t_N)$. The likelihood function $l$ for the generalized discrete binomial-process model $N_B(i)$ can be derived as

$$ l \equiv \Pr\{N_B(t_1) = y_1, N_B(t_2) = y_2, \ldots, N_B(t_N) = y_N\} = \prod_{i=2}^{N} \Pr\{N_B(t_i) = y_i \mid N_B(t_{i-1}) = y_{i-1}\} \cdot \Pr\{N_B(t_1) = y_1\} \tag{18} $$

by using Bayes' formula and the Markov property [16]–[18]. The conditional probability in (18), $\Pr\{N_B(t_i) = y_i \mid N_B(t_{i-1}) = y_{i-1}\}$, can be shown to be

$$ \Pr\{N_B(t_i) = y_i \mid N_B(t_{i-1}) = y_{i-1}\} = \binom{K - y_{i-1}}{y_i - y_{i-1}} \{z(t_{i-1}, t_i)\}^{y_i - y_{i-1}} \{1 - z(t_{i-1}, t_i)\}^{K - y_i} \tag{19} $$

by considering that we can regard $t_{i-1}$ as the initial time and that the distribution range of $N_B(i)$ is $0 \le N_B(i) \le K - y_{i-1}$. In the above equation, setting

$$ z(t_{i-1}, t_i) = \frac{\lambda \{P(t_i) - P(t_{i-1})\}}{1 - \lambda P(t_{i-1})} \tag{20} $$

we can rewrite (18) as

$$ l = \prod_{i=1}^{N} \binom{K - y_{i-1}}{y_i - y_{i-1}} \{z(t_{i-1}, t_i)\}^{y_i - y_{i-1}} \{1 - z(t_{i-1}, t_i)\}^{K - y_i} \tag{21} $$

by using (19), where $t_0 = 0$, $y_0 = 0$, and $P(t_0) = 0$. Accordingly, the logarithmic likelihood function can be derived as

$$ \log l \equiv L = \log K! - \log\{(K - y_N)!\} - \sum_{i=1}^{N} \log\{(y_i - y_{i-1})!\} + y_N \log \lambda + \sum_{i=1}^{N} (y_i - y_{i-1}) \log\{P(t_i) - P(t_{i-1})\} + (K - y_N) \log\{1 - \lambda P(t_N)\} \tag{22} $$

by taking the natural logarithm of (21).

When we apply the discrete Weibull distribution in (5) to the software failure occurrence time distribution, the logarithmic likelihood function can be given as

$$ L = \log K! - \log\{(K - y_N)!\} - \sum_{i=1}^{N} \log\{(y_i - y_{i-1})!\} + y_N \log \lambda + \sum_{i=1}^{N} (y_i - y_{i-1}) \log\{(1 - p)^{t_{i-1}^{\beta}} - (1 - p)^{t_i^{\beta}}\} + (K - y_N) \log\left[1 - \lambda\{1 - (1 - p)^{t_N^{\beta}}\}\right] \tag{23} $$

by using (22). When the value of the parameter $\beta$ in (5) is fixed in advance, such as $\beta = 1$ or $\beta = 2$, and the program size $K$ is known, the parameters to be estimated are $\lambda$ and $p$. The simultaneous logarithmic likelihood equations with respect to the parameters $\lambda$ and $p$ can be derived as

$$ \frac{\partial L}{\partial \lambda} = \frac{y_N}{\lambda} + (K - y_N) \cdot \frac{(1 - p)^{t_N^{\beta}} - 1}{1 - \lambda\{1 - (1 - p)^{t_N^{\beta}}\}} = 0 \tag{24} $$

$$ \frac{\partial L}{\partial p} = \sum_{i=1}^{N} \frac{y_i - y_{i-1}}{(1 - p)^{t_{i-1}^{\beta}} - (1 - p)^{t_i^{\beta}}} \times \{t_i^{\beta} (1 - p)^{t_i^{\beta} - 1} - t_{i-1}^{\beta} (1 - p)^{t_{i-1}^{\beta} - 1}\} - \frac{(K - y_N)\{t_N^{\beta} \lambda (1 - p)^{t_N^{\beta} - 1}\}}{1 - \lambda\{1 - (1 - p)^{t_N^{\beta}}\}} = 0 \tag{25} $$

respectively. Solving (24) with respect to $\lambda$, we can obtain

$$ \lambda = \frac{y_N}{K\{1 - (1 - p)^{t_N^{\beta}}\}}. \tag{26} $$

Substituting (26) into (25), we can obtain the following equation:

$$ \frac{t_N^{\beta} y_N (1 - p)^{t_N^{\beta} - 1}}{1 - (1 - p)^{t_N^{\beta}}} = \sum_{i=1}^{N} (y_i - y_{i-1}) \frac{1}{(1 - p)^{t_{i-1}^{\beta}} - (1 - p)^{t_i^{\beta}}} \times \{t_i^{\beta} (1 - p)^{t_i^{\beta} - 1} - t_{i-1}^{\beta} (1 - p)^{t_{i-1}^{\beta} - 1}\}. \tag{27} $$

Accordingly, we can obtain the maximum-likelihood estimates $\hat{\lambda}$ and $\hat{p}$ of the unknown parameters $\lambda$ and $p$, respectively, by solving the simultaneous likelihood equations (26) and (27) numerically.

V. MODEL COMPARISONS

TABLE I
RESULTS OF MODEL COMPARISONS BASED ON THE mse

We compare the performance of our discrete SRGMs, in which the software failure occurrence time distributions follow the geometric and discrete Rayleigh distributions in (6) and (7), respectively, with existing discrete SRGMs, namely a discrete Gompertz curve model [19], [20], a discrete logistic curve model [20], [21], and the geometric error detection rate (EDR) model [10], which is one of the discrete NHPP models, in terms of the mean-square error (mse) [3] by using actual fault-counting data. The value of the mse is calculated by dividing the sum of squared vertical distances between the observed and estimated cumulative numbers of faults, $y_i$ and $\hat{y}(t_i)$, detected during the time interval $(0, t_i]$, by the number of observed data pairs. That is, supposing that $N$ data pairs $(t_i, y_i)$ $(i = 1, 2, \ldots, N;\ 0 < t_1 < t_2 < \cdots < t_N)$ are observed, we can formulate the mse as

$$ \mathrm{mse} = \frac{1}{N} \sum_{i=1}^{N} [y_i - \hat{y}(t_i)]^2. \tag{28} $$
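One entry of such a comparison can be sketched as follows: solve (27) for p by bisection, recover λ from (26), and evaluate the mse in (28). This is only an illustration under stated assumptions: the data are synthetic (generated from the geometric-type model with K = 10000, λ = 0.02, p = 0.1, none of which come from the paper), $\hat{y}(t_i)$ is taken to be the estimated mean $K\hat{\lambda}P(t_i)$ from (10), the bracketing interval is an arbitrary choice, and bisection is a stand-in for whatever numerical solver the authors used.

```python
def weibull_cdf(t, p, beta):
    """Discrete Weibull distribution function (5)."""
    return 1.0 - (1.0 - p) ** (t ** beta)


def score_p(p, ts, ys, beta):
    """Sum in (27) minus the left-hand side of (27).

    Each summand is written in an algebraically equivalent form, obtained by
    dividing its numerator and denominator by (1-p)**(t_{i-1}**beta), which
    avoids 0/0 when large powers of (1-p) underflow.
    """
    q = 1.0 - p
    total, t_prev, y_prev = 0.0, 0, 0
    for t, y in zip(ts, ys):
        a, b = t ** beta, t_prev ** beta
        d = a - b
        total += (y - y_prev) * (a * q ** d - b) / (q * (1.0 - q ** d))
        t_prev, y_prev = t, y
    tn = ts[-1] ** beta
    return total - tn * ys[-1] * q ** (tn - 1.0) / (1.0 - q ** tn)


def fit(ts, ys, K, beta, lo=1e-4, hi=1.0 - 1e-4, iters=200):
    """Return (lambda_hat, p_hat): bisection on (27), then (26)."""
    g_lo = score_p(lo, ts, ys, beta)
    if g_lo * score_p(hi, ts, ys, beta) > 0:
        raise ValueError("no sign change of (27) inside the assumed bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = score_p(mid, ts, ys, beta)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    p_hat = 0.5 * (lo + hi)
    lam_hat = ys[-1] / (K * weibull_cdf(ts[-1], p_hat, beta))  # equation (26)
    return lam_hat, p_hat


def mse(ts, ys, K, lam, p, beta):
    """Mean-square error (28), with y_hat(t) = K * lam * P(t) (assumption)."""
    return sum((y - K * lam * weibull_cdf(t, p, beta)) ** 2 for t, y in zip(ts, ys)) / len(ts)


# Synthetic weekly data from the geometric-type model (beta = 1), rounded to integers.
K_true, lam_true, p_true, beta = 10000, 0.02, 0.1, 1
ts = list(range(1, 21))
ys = [round(K_true * lam_true * weibull_cdf(t, p_true, beta)) for t in ts]

lam_hat, p_hat = fit(ts, ys, K_true, beta)
print("lambda_hat =", lam_hat, "p_hat =", p_hat)
print("mse =", mse(ts, ys, K_true, lam_hat, p_hat, beta))
```

Because the synthetic counts are rounded, the estimates should come out near, but not exactly at, the generating values. Repeating the fit with β = 2 on S-shaped data and comparing the two mse values mirrors the comparison reported in Table I.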

The model having the smallest value of the mse fits best to the observed data set. We use the following six data sets in the model comparisons.

1) DS1: $(t_i, y_i)$ $(i = 1, 2, \ldots, 22;\ t_{22} = 22,\ y_{22} = 212)$, where $t_i$ is measured in weeks and the program size is $K = 1.630 \times 10^{5}$ LOC [22].
2) DS2: $(t_i, y_i)$ $(i = 1, 2, \ldots, 19;\ t_{19} = 19,\ y_{19} = 328)$, where $t_i$ is measured in weeks and the program size is $K = 1.317 \times 10^{6}$ LOC [23].
3) DS3: $(t_i, y_i)$ $(i = 1, 2, \ldots, 25;\ t_{25} = 25,\ y_{25} = 136)$, where $t_i$ is measured in CPU hours and the program size is $K = 2.170 \times 10^{4}$ LOC [24].
4) DS4: $(t_i, y_i)$ $(i = 1, 2, \ldots, 24;\ t_{24} = 24,\ y_{24} = 296)$, where $t_i$ is measured in weeks and the program size is $K = 1.972 \times 10^{5}$ LOC [22].
5) DS5: $(t_i, y_i)$ $(i = 1, 2, \ldots, 21;\ t_{21} = 21,\ y_{21} = 46)$, where $t_i$ is measured in days and the program size is $K = 4.000 \times 10^{4}$ LOC [23].
6) DS6: $(t_i, y_i)$ $(i = 1, 2, \ldots, 35;\ t_{35} = 35,\ y_{35} = 1301)$, where $t_i$ is measured in months and the program size is $K = 1.240 \times 10^{5}$ LOC [25].

Among the above data sets, DS1, DS2, and DS3 indicate exponential reliability growth curves, and DS4, DS5, and DS6 indicate S-shaped reliability growth curves. Table I shows the results of model comparisons based on the mse. For the data sets DS1, DS2, and DS3, which indicate exponential reliability growth curves, the geometric EDR model and our discrete SRGM in which the geometric distribution in (6) is applied to the software failure occurrence time distribution (called a "geometric-type model" in Table I) have the best performance among the discrete SRGMs discussed in this section. These two discrete SRGMs have the same mse values because they have essentially the same model structure: the binomial distribution in (3) can be regarded as a Poisson distribution as $K \to \infty$ and $\lambda \to 0$.

On the other hand, our discrete SRGM in which the discrete Rayleigh distribution in (7) is applied to the software failure occurrence time distribution (called a "Rayleigh-type model" in Table I) fits the S-shaped reliability growth curve data better, except for DS5. In the model comparison using DS5, the discrete logistic curve model has the best performance in terms of the mse. However, we should consider that the discrete logistic and Gompertz curve models have unsuitable properties from the point of view of describing the number of detected faults in actual software testing. That is, these discrete models permit detected (or detectable) faults to exist at (or before) the beginning of the test [26].

From the above results, we cannot claim that our geometric-type model offers a significant advantage for software reliability assessment in these mse-based comparisons, because the geometric EDR model and our geometric-type model have essentially the same model structure. However, for the S-shaped reliability growth curve data used in these comparisons (DS4, DS5, and DS6), we can say that our Rayleigh-type model generally performs better than the other discrete SRGMs, with the DS5 exception noted above.

VI. OPTIMAL SOFTWARE RELEASE PROBLEMS

Software development managers have a great interest in how to develop a reliable software product economically and when to release the software to the customers [27]. In this section, we discuss discrete cost-optimal software release policies based on our generalized discrete binomial-process model in which the software failure occurrence time distribution follows the geometric distribution. Then, we also discuss discrete optimal software release policies with simultaneous cost and reliability requirements from the point of view of software quality control.

A. Cost-Optimal Software Release Policies

We discuss cost-optimal software release policies based on our generalized discrete binomial-process model. First, we define the following notation:

1) $c_1$: debugging cost per fault in the testing phase;
2) $c_2$: debugging cost per fault in the operational phase, where $c_1 < c_2$;
3) $c_3$: testing cost per constant period.


Let $Z$ denote the software release period. Then, the expected total software cost $C(Z)$, which indicates the expected total cost during the testing and operational phases, is formulated as

$$ C(Z) = c_1 E[N_B(Z)] + c_2 (K\lambda - E[N_B(Z)]) + c_3 Z. \tag{29} $$

The cost-optimal software release period is derived by minimizing the expected total software cost $C(Z)$ in (29). From (29), we can derive the following difference equation by taking the forward difference with respect to $Z$:

$$ C(Z + 1) - C(Z) = (c_2 - c_1) \left[ \frac{c_3}{c_2 - c_1} - W(Z) \right] \tag{30} $$

where $W(Z) \equiv E[N_B(Z + 1)] - E[N_B(Z)]$ represents the expected number of faults detected during the $Z$th testing period. Moreover, we need to define the following notation to discuss the discrete software release policies:

$$ \langle n \rangle = \begin{cases} [n], & \text{if } C([n]) \le C([n] + 1) \\ [n] + 1, & \text{otherwise} \end{cases} \tag{31} $$

where $[n]$ represents the Gauss symbol (the greatest integer not exceeding $n$) for any real number $n$.

We discuss cost-optimal software release policies in the case that the software failure occurrence time distribution follows the geometric distribution in (6). In this case, the expected number of faults detected during the $Z$th testing period, $W_G(Z)$, is

$$ W_G(Z) = K \lambda p (1 - p)^{Z} \tag{32} $$

and has the following properties:

$$ W_G(Z + 1) < W_G(Z), \qquad W_G(0) = K \lambda p, \qquad W_G(\infty) = 0 \tag{33} $$

for any nonnegative integer $Z$, since $0 < p < 1$. That is, $W_G(Z)$ is a monotonically decreasing function of the testing period $Z$ $(\ge 0)$. Therefore, we can obtain the cost-optimal software release policies as follows.

Cost-Optimal Release Policy: Suppose that $c_2 > c_1 > 0$ and $c_3 > 0$.

1) If $W_G(0) \le c_3/(c_2 - c_1)$, then the cost-optimal software release period is $Z^* = 0$.
2) If $W_G(0) > c_3/(c_2 - c_1)$, then there is a unique solution $Z = X_0$ minimizing (29), where $X_0$ is given as

$$ X_0 = \frac{\log\left[ \dfrac{c_3}{(c_2 - c_1) K \lambda p} \right]}{\log(1 - p)}. \tag{34} $$

Thus, the optimal software release period is $Z^* = \langle X_0 \rangle$.

The derivation of (34) is given in the Appendix.

B. Cost-Reliability-Optimal Software Release Policies

Further, we discuss the optimal software release problems that take both the total software cost and reliability criteria into consideration simultaneously. In actual software development, the software-project manager has to spend and control the testing resources so as to minimize the total software cost while satisfying the software reliability requirement, rather than only minimizing the cost. Now, let $R_0$ $(0 < R_0 \le 1)$ be the software reliability objective. Using the discrete software reliability function in (13), we can discuss optimal software release policies that minimize the total expected software cost in (29) while satisfying the software reliability objective $R_0$. That is, the cost-reliability-optimal software release problem can be formulated as follows:

$$ \text{minimize } C(Z) \quad \text{subject to } R_B(Z, h) \ge R_0,\ Z \ge 0. \tag{35} $$

Supposing that $h$ is a constant value, we can see that the discrete software reliability function $R_B(Z, h)$ is a monotonically increasing function of the testing period $Z$ when the software failure occurrence time follows the geometric distribution. Accordingly, if $R_B(0, h) < R_0$, then there is a unique finite solution $Z = Z_1$ satisfying $R_B(Z - 1, h) < R_0$ and $R_B(Z, h) \ge R_0$. Furthermore, if $R_B(0, h) \ge R_0$, then $R_B(Z, h) \ge R_0$ for any nonnegative integer $Z$. In that case, we only have to discuss optimal software release policies based on the cost criterion alone. From the above discussion, the cost-reliability-optimal software release policies in the case that the geometric distribution is applied to the software failure occurrence time distribution can be obtained as follows.

Cost-Reliability-Optimal Release Policy: Suppose that $c_2 > c_1 > 0$, $c_3 > 0$, $0 < R_0 \le 1$, and $h \ge 0$.

1) If $W_G(0) \le c_3/(c_2 - c_1)$ and $R_B(0, h) \ge R_0$, then the cost-reliability-optimal software release period is $Z^* = 0$.
2) If $W_G(0) \le c_3/(c_2 - c_1)$ and $R_B(0, h) < R_0$, then the cost-reliability-optimal software release period is $Z^* = Z_1$.
3) If $W_G(0) > c_3/(c_2 - c_1)$ and $R_B(0, h) \ge R_0$, then the cost-reliability-optimal software release period is $Z^* = \langle X_0 \rangle$.
4) If $W_G(0) > c_3/(c_2 - c_1)$ and $R_B(0, h) < R_0$, then the cost-reliability-optimal software release period is $Z^* = \max\{\langle X_0 \rangle, Z_1\}$.

Cost-optimal and cost-reliability-optimal software release policies in the case that the discrete Rayleigh distribution is applied to the software failure occurrence time distribution can be derived with essentially the same methodology as in the above discussion. However, especially for the S-shaped discrete-time SRGM, it is very difficult to derive these optimal software release policies analytically.

VII. NUMERICAL EXAMPLES

We show numerical examples for our generalized discrete binomial-process model in (4) by using the actual fault-counting data DS1 and DS4, which were used in Section V.
DS1 and DS4 show exponential and S-shaped reliability growth curves, respectively. Therefore, we use DS1 for our discrete SRGM in which the software failure occurrence time distribution follows

176

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 37, NO. 2, MARCH 2007

B (i)] and its 95% confidence limits. (a) Case I: The geometric failure occurrence time distribution Fig. 1. Estimated expected numbers of detected faults E[N (DS1). (b) Case II: The discrete Rayleigh failure occurrence time distribution (DS4).

B (i, 1). (a) Case I: The geometric failure occurrence time distribution (DS1). (b) Case II: The discrete Rayleigh Fig. 2. Estimated software reliability functions R failure occurrence time distribution (DS4). the geometric distribution and DS2 for the case that the distribution follows the discrete Rayleigh distribution, respectively. Fig. 1 depicts the estimated expected numbers of detected

B (i)], and its 95% confidence limits. The 100γ% faults, E[N

B (i)] are derived as confidence limits for E[N   [NB (i)]

[NB (i)] ± Kγ Var E (36) where Kγ indicates the 100(1 + γ)/2 percent point of the standard normal distribution [1]. Fig. 1(a) shows the estimated expected number of detected faults for the case that the geometric distribution in (6) has been applied to the software failure occurrence time distribution. As to Fig. 1(a), the estimates of

= the unknown parameters λ and p have been obtained that λ −2 −1 0.3122 × 10 and p = 0.2420 × 10 , respectively, by using the method of maximum likelihood discussed in Section IV. By using the estimates, the expected initial fault content can be

≈ 509. Moreover, Fig. 1(b) also shows the

·λ estimated as K estimated expected number of detected faults for the case that the software failure occurrence time distribution follows the

discrete Rayleigh distribution in (7). In Fig. 1(b), the estimates

= of the unknown parameters λ and p have been obtained that λ 0.1508 × 10−2 and p = 0.9180 × 10−2 , respectively. Based on these parameter estimates, we can estimate the expected initial fault content to be about 297. Fig. 2 shows the estimated software reliability functions  RB (i, 1) for the cases that the software failure occurrence time follow the geometric and discrete Rayleigh distributions, respectively, by using the parameter estimates. From the esti mated software reliability functions R B (i, 1), we can estimate the software reliability at the 125th testing period, in which the geometric distribution has been applied to the software failure occurrence time distribution, to be about 0.5620 from Fig. 2(a). Moreover, the software reliability at the 25th testing period, in which the discrete Rayleigh distribution has been applied to the software failure occurrence time distribution, to be about 0.7044 from Fig. 2(b). Fig. 3 depicts the estimated instantaneous MTBF B (i). In Fig. 3(a), we can estimate the instantaneous MTBF MTBF at the 125th testing period, in which the geometric


Fig. 3. Estimated instantaneous MTBF $\widehat{\mathrm{MTBF}}_B(i)$. (a) Case I: The geometric failure occurrence time distribution (DS1). (b) Case II: The discrete Rayleigh failure occurrence time distribution (DS4).

TABLE II NUMERICAL EXAMPLES OF COST-OPTIMAL SOFTWARE RELEASE POLICIES

distribution has been applied to the software failure-occurrence time distribution, to be about 1.7351 weeks, or about 291 h. Likewise, from Fig. 3(b), we estimate the instantaneous MTBF at the 25th testing period, in which the discrete Rayleigh distribution has been applied, to be about 2.8541 weeks, or about 479 h.

Next, we show numerical examples for the optimal software release problems discussed in Section VI. Table II shows numerical examples of the derived cost-optimal software release policies for the case in which the software failure-occurrence time follows the geometric distribution. From Table II, we can see that the cost-optimal software release period $Z^*$ becomes longer as the debugging cost per fault in the operational phase increases. Accordingly, more testing is needed as the maintenance cost is set at larger values. We discuss the cost-optimal software release policy for the case that $c_1 = 1$, $c_2 = 32$, and $c_3 = 10$, as illustrated in Fig. 4. In this case, we can calculate that $W_G(0) = K\hat{\lambda}\hat{p} = 1.2313 \times 10$ and $c_3/(c_2 - c_1) = 3.2258 \times 10^{-1}$, so that $W_G(0) > c_3/(c_2 - c_1)$. Therefore, we need to apply the Cost-Optimal Release Policy (2) to derive the cost-optimal software release period, and we calculate $[X_0] = 148$, because $X_0 = 1.4869 \times 10^{2}$ by using (34). Consequently, we can estimate the cost-optimal software release period as $Z^* = \lceil X_0 \rceil = 149$ (in weeks), since $C(148) = 2.4092 \times 10^{3} > C(149) = 2.4091 \times 10^{3}$.

Moreover, we show numerical examples for the derived cost-reliability-optimal software release policy. For the specific

Fig. 4. Optimum software release policy based on cost criterion for c1 = 1, c2 = 32, and c3 = 10 (DS1).

operational period $h = 1$, the reliability objective $R_0 = 0.8$, and $c_1 = 1$, $c_2 = 32$, and $c_3 = 10$, the cost-reliability-optimal software release problem can be examined as follows, with reference to Fig. 5. We can estimate $Z_1 = 164$,


Fig. 5. Optimum software release policy based on cost and reliability criteria for c1 = 1, c2 = 32, and c3 = 10.

because $R(163, 1) = 0.7967 < R_0$ and $R(164, 1) = 0.8011 > R_0$. Since $W_G(0) = 1.2313 \times 10 > c_3/(c_2 - c_1) = 3.2258 \times 10^{-1}$ and $R(0, 1) = 4.4892 \times 10^{-6} < R_0$, $Z^*$ is estimated as $Z^* = \max\{\lceil X_0 \rceil, Z_1\} = \max\{149, 164\} = 164$ by using the cost-reliability-optimal release policy (4). Fig. 5 illustrates why software project managers should estimate the optimal software release period by considering not only the minimization of the total expected software cost but also the satisfaction of the reliability objective.

VIII. CONCLUSION

We have discussed a unified framework for discrete software reliability growth modeling. Based on the framework, we have developed a generalized discrete binomial-process model by assuming that the initial fault content follows the binomial distribution, which enables us to assess software reliability in consideration of the effect of the program size. Specifically, we have developed two types of discrete SRGMs in this paper, namely the geometric-type and Rayleigh-type models. We have then derived generalized discrete software reliability assessment measures based on the concept of the unified framework, and we have discussed parameter estimation based on the method of maximum likelihood for our generalized discrete binomial-process model. Moreover, we have examined the performance of our discrete SRGMs by goodness-of-fit comparisons using actual fault-counting data. Additionally, as one application of our generalized discrete binomial-process model, we have discussed optimal software release problems with simultaneous cost and reliability objectives for the geometric-type SRGM proposed in this paper. Finally, we have shown numerical examples of software reliability assessment based on our generalized discrete binomial-process models, and of their application to the derived optimal software release policies, by using actual fault-counting data.

Our generalized discrete binomial-process model for software reliability assessment enables us to obtain a suitable SRGM easily by analyzing the software failure-occurrence time distribution in the actual testing phase and applying a suitable probability distribution function to our generalized discrete model. However, our generalized discrete SRGM following the binomial process is theoretically suited to software reliability assessment for small- or medium-size software products, since the binomial distribution (3) representing the initial number of faults in the software system can be regarded as a Poisson distribution as the parameters $K \to \infty$ and $\lambda \to 0$.

Additionally, although we have applied the geometric and Rayleigh distributions as the software failure-occurrence time distributions in this paper, we plan to develop a software failure-occurrence time distribution that can describe the failure-occurrence times more flexibly. We also need to investigate the effectiveness and validity of our model by using suitable actual data sets collected from small- or medium-size software development projects in future studies.
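The remark that the binomial distribution (3) can be regarded as a Poisson distribution as $K \to \infty$ and $\lambda \to 0$ can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it holds the product $K\lambda$ fixed at 509 (the expected initial fault content estimated for Case I), while the values of $K$, the truncation point `n_max`, and all function names are hypothetical choices introduced here rather than quantities taken from the paper.

```python
import math


def binomial_pmf(n, k_total, lam):
    """P{N = n} for a binomial initial fault content with parameters (K, lambda)."""
    if n > k_total:
        return 0.0
    # Work in log space so that large K does not overflow the binomial coefficient.
    log_pmf = (
        math.lgamma(k_total + 1)
        - math.lgamma(n + 1)
        - math.lgamma(k_total - n + 1)
        + n * math.log(lam)
        + (k_total - n) * math.log1p(-lam)
    )
    return math.exp(log_pmf)


def poisson_pmf(n, mean):
    """P{N = n} for a Poisson initial fault content with the given mean."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def total_variation(k_total, mean, n_max=1000):
    """Total variation distance between the two laws, truncated at n_max."""
    lam = mean / k_total  # keep K * lambda fixed while K grows
    return 0.5 * sum(
        abs(binomial_pmf(n, k_total, lam) - poisson_pmf(n, mean))
        for n in range(n_max + 1)
    )


MEAN = 509.0  # K * lambda, from the Case I estimate
K_VALUES = [1_000, 10_000, 100_000, 1_000_000]  # hypothetical program sizes
distances = [total_variation(k, MEAN) for k in K_VALUES]

for k, d in zip(K_VALUES, distances):
    print(f"K = {k:>9,d}  lambda = {MEAN / k:.6f}  distance = {d:.6f}")
```

The distance shrinks as $K$ grows, which is consistent with the statement that the binomial-process model differs most from a Poisson-based model when the program size is small or medium.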
APPENDIX
THE DERIVATION OF (27)

If $c_3/(c_2 - c_1) < W_G(0)$, then there is a unique solution $Z = [X_0]$ satisfying

$$W_G(Z) \ge \frac{c_3}{c_2 - c_1}, \qquad W_G(Z + 1) < \frac{c_3}{c_2 - c_1} \tag{37}$$

from (33). Then

$$\begin{cases} C(Z - 1) > C(Z) & (0 < Z \le [X_0]) \\ C(Z) < C(Z + 1) & ([X_0] < Z). \end{cases} \tag{38}$$

$X_0$ minimizing (29) can be derived as the solution of the following equation:

$$\frac{c_3}{c_2 - c_1} = W_G(X) = K\lambda p(1 - p)^{X} \tag{39}$$

from (30) and (32). Solving (39) with respect to $X$, we obtain $X_0$ as

$$X_0 = \frac{\log\left[\dfrac{c_3}{(c_2 - c_1)K\lambda p}\right]}{\log(1 - p)}. \tag{40}$$
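The release-period calculations of Section VII can be retraced from (39) and (40) with a short Python sketch. It uses the rounded Case I values quoted in the text ($W_G(0) = 1.2313 \times 10$, $\hat{p} = 0.2420 \times 10^{-1}$, $c_1 = 1$, $c_2 = 32$, $c_3 = 10$, $R_0 = 0.8$). Two points are assumptions of this sketch and are not restated in this section: the cost increment is taken to be $C(Z + 1) - C(Z) = c_3 - (c_2 - c_1)W_G(Z)$, which is what (37) and (38) suggest, and the reliability $R(Z, 1)$ is approximated by $\exp(-W_G(Z))$ instead of the paper's own reliability function. All function names are hypothetical.

```python
import math

# Rounded estimates quoted in Section VII (Case I, geometric distribution).
K_LAMBDA_P = 12.313          # W_G(0) = K * lambda * p
P = 0.2420e-1                # estimate of p
C1, C2, C3 = 1.0, 32.0, 10.0


def w_g(z, k_lambda_p=K_LAMBDA_P, p=P):
    """W_G(Z) = K * lambda * p * (1 - p)^Z, as in (39)."""
    return k_lambda_p * (1.0 - p) ** z


def x0(c1=C1, c2=C2, c3=C3, k_lambda_p=K_LAMBDA_P, p=P):
    """Closed-form solution (40) of W_G(X) = c3 / (c2 - c1)."""
    return math.log(c3 / ((c2 - c1) * k_lambda_p)) / math.log(1.0 - p)


def cost_optimal_period(c1=C1, c2=C2, c3=C3, k_lambda_p=K_LAMBDA_P, p=P):
    """Cost-optimal release period for the case W_G(0) > c3 / (c2 - c1)."""
    threshold = c3 / (c2 - c1)
    if w_g(0, k_lambda_p, p) <= threshold:
        raise ValueError("this sketch only covers W_G(0) > c3 / (c2 - c1)")
    z = math.floor(x0(c1, c2, c3, k_lambda_p, p))
    # Assumption: C(Z + 1) - C(Z) = c3 - (c2 - c1) * W_G(Z), so one more
    # testing period lowers the cost whenever W_G(Z) exceeds the threshold.
    return z + 1 if w_g(z, k_lambda_p, p) > threshold else z


def reliability_period(r0, k_lambda_p=K_LAMBDA_P, p=P, z_max=10_000):
    """Smallest Z whose approximate reliability reaches the objective r0."""
    # Assumption: R(Z, 1) is approximated by exp(-W_G(Z)).
    for z in range(z_max + 1):
        if math.exp(-w_g(z, k_lambda_p, p)) >= r0:
            return z
    raise ValueError("reliability objective not reached within z_max periods")


x_zero = x0()
z_cost = cost_optimal_period()
z_one = reliability_period(0.8)
z_star = max(z_cost, z_one)   # combined cost and reliability criterion
print(f"X0 = {x_zero:.2f}, cost-optimal Z = {z_cost}, Z1 = {z_one}, Z* = {z_star}")
```

With these rounded inputs the sketch gives $X_0 \approx 148.7$, a cost-optimal period of 149, $Z_1 = 164$, and $Z^* = 164$, in line with the values reported in Section VII. The small gap from the printed $X_0 = 1.4869 \times 10^{2}$ comes from rounding of the inputs.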


ACKNOWLEDGMENT

The authors would like to thank M. Kimura (Hosei University) for the valuable comments.

REFERENCES

[1] S. Yamada and S. Osaki, “Software reliability growth modeling: Models and applications,” IEEE Trans. Softw. Eng., vol. SE-11, no. 12, pp. 1431–1437, Dec. 1985.
[2] J. D. Musa, A. Iannino, and K. Okumoto, Software Reliability: Measurement, Prediction, Application. New York: McGraw-Hill, 1987.
[3] H. Pham, Software Reliability. Singapore: Springer-Verlag, 2000.
[4] S. Yamada, “Software reliability models,” in Stochastic Models in Reliability and Maintenance, S. Osaki, Ed. Berlin, Germany: Springer-Verlag, 2002, pp. 253–280.
[5] N. Langberg and N. D. Singpurwalla, “A unification of some software reliability models,” SIAM J. Sci. Stat. Comput., vol. 6, no. 3, pp. 781–790, 1985.
[6] T. Dohi, T. Matsuoka, and S. Osaki, “An infinite server queueing model for assessment of the software reliability,” Electron. Commun. Jpn., vol. 85, no. 3, pp. 536–544, 2000.
[7] J. G. Shanthikumar, “A general software reliability model for performance prediction,” Microelectron. Reliab., vol. 21, no. 5, pp. 671–682, 1981.
[8] M. Kimura, S. Yamada, H. Tanaka, and S. Osaki, “Software reliability measurement with prior-information on initial fault content,” Trans. Inf. Process. Soc. Jpn., vol. 34, no. 7, pp. 1601–1609, 1993.
[9] A. Fries and A. Sen, “A survey of discrete reliability-growth models,” IEEE Trans. Rel., vol. 45, no. 4, pp. 582–604, Dec. 1996.
[10] S. Yamada and S. Osaki, “Discrete software reliability growth models,” Appl. Stoch. Models Data Anal., vol. 1, no. 1, pp. 65–77, 1985.
[11] C. Y. Huang, M. R. Lyu, and S. Y. Kuo, “A unified scheme of some nonhomogeneous Poisson process models for software reliability estimation,” IEEE Trans. Softw. Eng., vol. 29, no. 3, pp. 261–269, Mar. 2003.
[12] H. Okamura, A. Murayama, and T. Dohi, “EM algorithm for discrete software reliability models: A unified parameter estimation method,” in Proc. 8th IEEE Int. Symp. HASE, 2004, pp. 219–228.
[13] T. Nakagawa and S. Osaki, “The discrete Weibull distribution,” IEEE Trans. Rel., vol. R-24, no. 5, pp. 300–301, Dec. 1975.
[14] J. D. Kalbfleisch and R. L. Prentice, The Statistical Analysis of Failure Time Data, 2nd ed. Hoboken, NJ: Wiley, 2002.
[15] S. Inoue and S. Yamada, “Testing-coverage dependent software reliability growth modeling,” Int. J. Reliab. Qual. Saf. Eng., vol. 11, no. 4, pp. 303–312, 2004.
[16] S. Osaki, Applied Stochastic System Modeling. Berlin, Germany: Springer-Verlag, 1992.
[17] S. M. Ross, Introduction to Probability Models, 6th ed. San Diego, CA: Academic, 1997.
[18] K. S. Trivedi, Probability and Statistics With Reliability, Queueing and Computer Science, 2nd ed. New York: Wiley, 2002.
[19] D. Satoh, “A discrete Gompertz equation and a software reliability growth model,” IEICE Trans. Inf. Syst., vol. E83-D, no. 7, pp. 1508–1513, 2000.
[20] D. Satoh and S. Yamada, “Discrete equations and software reliability growth models,” in Proc. 12th IEEE ISSRE, 2001, pp. 176–184.
[21] D. Satoh and S. Yamada, “Parameter estimation of discrete logistic curve models for software reliability assessment,” Jpn. J. Ind. Appl. Math., vol. 19, no. 1, pp. 39–54, 2002.
[22] T. Fujiwara and S. Yamada, “A new testing-path coverage measure: Testing-domain metrics based on a software reliability growth model,” in Proc. 13th IEEE ISSRE, 2002, pp. 71–75.
[23] M. Ohba, “Software reliability analysis models,” IBM J. Res. Develop., vol. 28, no. 4, pp. 428–443, 1984.
[24] A. L. Goel, “Software reliability models: Assumptions, limitations, and applicability,” IEEE Trans. Softw. Eng., vol. SE-11, no. 12, pp. 1411–1423, Dec. 1985.
[25] W. D. Brooks and R. W. Motley, “Analysis of discrete software reliability models,” Rome Air Develop. Center, New York, Tech. Rep. RADC-TR80-84, 1980.
[26] S. Inoue and S. Yamada, “NHPP modeling based on discrete statistical data analysis models for software reliability assessment,” in Proc. Int. Workshop Rel. Appl., 2003, pp. 138–143.
[27] S. Yamada and S. Osaki, “Cost-reliability optimal release policies for software systems,” IEEE Trans. Rel., vol. R-34, no. 5, pp. 422–424, May 1985.

Shinji Inoue (M’06) was born in Japan, on April 6, 1978. He received the B.S.E., M.S., and Ph.D. degrees from Tottori University, Tottori, Japan, in 2001, 2003, and 2006, respectively. He is currently an Assistant Professor with the Faculty of Engineering, Tottori University. His research interests include software reliability engineering, quality control, and project management. Dr. Inoue is a regular member of the Institute of Electronics, Information, and Communication Engineers, the Operations Research Society of Japan, the Japanese Society for Quality Control, the Information Processing Society of Japan, and the Society of Project Management.

Shigeru Yamada (M’87) was born in Japan, on July 6, 1952. He received the B.S.E., M.S., and Ph.D. degrees from Hiroshima University, Hiroshima, Japan, in 1975, 1977, and 1985, respectively. From 1977 to 1980, he worked at the Quality Assurance Department of Nippondenso Company, Japan. From 1983 to 1988, he was an Assistant Professor of the Okayama University of Science, Okayama, Japan. From 1988 to 1993, he was an Associate Professor at the Faculty of Engineering, Hiroshima University. Since 1993, he has been working as a Professor with the Faculty of Engineering, Tottori University, Tottori, Japan. He has published numerous technical papers in the areas of software reliability models, project management, reliability engineering, and quality control. He has authored several books entitled: Software Reliability: Theory and Practical Application (Soft Research Center, 1990), Introduction to Software Management Model (Kyoritsu Shuppan, 1993), Software Reliability Models: Fundamentals and Applications (JUSE, 1994), Statistical Quality Control for TQM (Corona Publishing, 1998), and Software Reliability: Model, Tool, Management (The Society of Project Management, 2004). Dr. Yamada is the recipient of the Best Author Award from the Information Processing Society of Japan in 1992, the TELECOM System Technology Award from the Telecommunications Advancement Foundation in 1993, the Best Paper Award from the Reliability Engineering Association of Japan in 1999, the International Leadership Award in Reliability Engineering Research from the ICQRIT/SRECOM in 2003, and the Best Paper Award from the Society of Project Management in 2006. He is a regular member of the Information Processing Society of Japan, the Operations Research Society of Japan, the Japan SIAM, the Reliability Engineering Association of Japan, the Japan Industrial Management Association, the Japanese Society for Quality Control, and the Society of Project Management.