
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 18, NO. 11, NOVEMBER 1992

Concise Papers

An Entropy-Based Measure of Software Complexity

Warren Harrison

Abstract-Many different methods have been suggested for measuring "Software Complexity." In this paper, it is proposed that the complexity of a program is inversely proportional to the average information content of its operators. An empirical probability distribution of the operators occurring in a program is constructed and the classical entropy calculation applied.

The performance of this metric is assessed in the analysis of two commercial applications totalling well over 130 000 lines of code. The results suggest the new metric does a good job of associating modules with their error spans (average number of tokens between error occurrences).

Index Terms-Information theory, software metrics, software project management, software quality.

I. INTRODUCTION

Many different methods have been suggested for measuring "Software Complexity" since the interest in metrics began in the mid-1970s [6]. In short, a complexity metric attempts to objectively associate a number with a program, based on the degree of presence (or absence) of certain characteristics of the software. It is hypothesized that overly complex code (i.e., code with a high degree of "bad characteristics") will be difficult to maintain and is likely to be unreliable due to a disproportionate number of programming errors. One of the ultimate goals of research in this field is to formulate a complexity metric which will allow programmers and their managers to accurately predict the number of errors in a program. If such a metric could be applied at the beginning of the software testing phase, much more effective testing resource allocation would be possible. Additionally, such information could help managers decide when testing is complete, and aid in deciding when a product is ready to ship. If we assume that minimizing errors is an important aspect of programming, complexity metrics could also serve as useful feedback tools to programmers as they write code, as well as serve as an important part of code inspections.

In this paper, we propose that the complexity of a program is inversely proportional to the average information content of its operators. A similar approach has previously been suggested by [1] and [2], but to date no empirical evaluation of such a metric has been presented. This paper describes an extensive empirical study in which the information content of programs is related to error frequency, with favorable results. From these results, we propose an ordinal measure of software complexity based upon a program's information content.

Manuscript received October 1, 1991; revised August 1, 1992. Recommended by R. Selby and K. Torii. The author is with the PSU Center for Software Quality Research, Portland State University, Portland, OR 97207-0751. IEEE Log Number 9203767.

II. INFORMATION THEORY

A string of symbols drawn from an alphabet of symbols s_1, ..., s_q can be considered a message. The field of information theory deals with the measure of the amount of information contained in a message [5]. Information, in this context, is finding out something you did not already know. In other words, information is the amount of "surprise" conveyed by each symbol in the message. That is, when a symbol occurs that we were not expecting, the surprise and, therefore, the information, is greater than if a symbol we were expecting occurred. Thus the amount of information conveyed by a single symbol s_i in a message is inversely related to its probability of occurring. The probability of symbol s_i occurring is p(s_i) = p_i. This has been formalized so that the amount of information I_i, in abstract units called bits, conveyed by a single symbol s_i with probability of occurrence p_i is

I_i = -log2(p_i).

Information is additive; that is, the amount of information conveyed by two symbols is the sum of their individual information content. It follows, then, that an entire alphabet of symbols s_1, ..., s_q would on the average provide

H = -Σ_{i=1}^{q} p_i log2(p_i)

bits of information per symbol. This quantity is called the entropy of the information source, or language entropy. In the remainder of this paper, the term bits will refer to a unit of information as opposed to the definition ordinarily found in the computing literature.

It can be shown that the maximum amount of information per symbol is provided by an alphabet whose symbols all occur with an equal probability. The average amount of information conveyed by each symbol in such an alphabet is log2 q, for an alphabet having q symbols, each with an equal probability of occurring. The minimum amount of information is conveyed by an alphabet in which one symbol occurs with a probability of one, and all others occur with a probability of zero. Such an alphabet has a language entropy of zero.

III. APPLYING INFORMATION THEORY TO SOFTWARE

If we consider the text of a program as a message, it would seem appropriate to apply the concepts of information theory to measure the amount of information contained in a program's source code. This idea (or variants) has been proposed a number of times previously [1]-[4]. Our version of such a metric is based on an empirical distribution of operators within a program. As defined in [9], a special symbol, a reserved word, or a function call is considered an operator. We limit our study to operators because they have been found to have certain natural probability distributions [9]. The probability p_i of the ith most frequently occurring operator is equal to the percentage of total operator occurrences it contributes, that is,

p_i = f_i / N_1

where N_1 is the total number of (nonunique) operators used in the program, and f_i is the number of occurrences of the ith most frequently occurring operator. Then, the average amount of information


contributed by each operator in a program is

H = -Σ_{i=1}^{n} p_i log2(p_i)

We will refer to this as the empirical program entropy. It would seem natural at this point to determine the total information delivered by the program (i.e., -Σ f_i log2(p_i)). However, a major point of disagreement among researchers regarding other metrics involves those metrics' relationship with overall software size. We felt that by omitting an explicit reference to the total number of operators in the expression, we could avoid an unnecessary relationship with size.
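As a concrete illustration, a minimal Python sketch of the empirical program entropy follows; the operator stream is hypothetical, not drawn from the paper's data:

```python
from collections import Counter
from math import log2

def program_entropy(operators):
    """Empirical program entropy: the average information content per
    operator occurrence, -sum(p_i * log2(p_i)) with p_i = f_i / N_1."""
    counts = Counter(operators)        # f_i for each unique operator
    n_total = sum(counts.values())     # N_1: total (nonunique) operator uses
    return -sum((f / n_total) * log2(f / n_total) for f in counts.values())

# Hypothetical operator stream: special symbols, reserved words, and
# function calls, following the paper's definition of an operator.
ops = ["if", "==", "=", "+", "=", "+", "while", "<", "=", "+"]
print(round(program_entropy(ops), 3))  # about 2.371 bits per operator
```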

IV. A SOFTWARE COMPLEXITY METRIC BASED ON EMPIRICAL PROGRAM ENTROPY

Our basic hypothesis is that a program with a higher average information content should, on the whole, be less complex than another program with a lower average information content. The Average Information Content Classification (AICC) measure is computed by establishing the following parameters: N_1 is the number of total operator uses in the program, and f_i, 1 ≤ i ≤ n, is the number of times the ith operator appears in the source text. This information is then used to perform the following computation:

-Σ_{i=1}^{n} (f_i / N_1) log2(f_i / N_1)

The result of this computation is then rounded to the nearest tenth to establish the module's AICC measure.

The AICC measure is an ordinal measure of software complexity. By ordinal, we mean that the metric is intended to order programs in relation to their complexity, but no conclusions can be drawn as to the "distance" between two measures. This implies, for instance, that the operations of addition and subtraction of AICC values are not meaningful. Likewise, we cannot use an arithmetic mean to represent the central tendency of a group of modules' AICC metric, though we can use a median.

Naturally, by considering the metric as an ordinal measure, we also restrict how we can use it. Because it is intended only to rank programs based on their complexity, we cannot infer how much more complex one program is than another, even though the metric tells us that one is more complex. Of course, this is still useful information. Such a measure gives us guidance if we wish to identify the most complex 10% of our modules (perhaps we do not have enough resources to apply detailed inspections to every module), or if we wish to compare two different implementations of the same function (perhaps we have the option of using one of two library implementations). At the same time, an ordinal measure is much easier to construct and validate than an interval- or ratio-scaled measure.
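Under these definitions, a sketch of the AICC computation and its ordinal use might look as follows; it reuses program_entropy from the sketch above, and the module names and operator streams are hypothetical:

```python
from statistics import median

def aicc(operators):
    """AICC measure: empirical program entropy rounded to the nearest tenth."""
    return round(program_entropy(operators), 1)

# Hypothetical per-module operator streams.
modules = {
    "parser":  ["if", "=", "=", "+", "call", "call"],
    "scanner": ["=", "=", "=", "=", "+"],
    "emitter": ["call", "call", "=", "+", "-", "*"],
}
scores = {name: aicc(ops) for name, ops in modules.items()}

# Ordinal use only: rank modules (ascending AICC = most complex first)
# and summarize with a median, never an arithmetic mean.
ranking = sorted(scores, key=scores.get)
central = median(scores.values())
```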

V. ASSESSING THE PERFORMANCE OF THE NEW METRIC

Assessing how closely a new complexity metric is related to software complexity is an important part of the proposal of any new metric. A common approach is to analyze empirical data drawn from a commercial software development project to see if the proposed metric is related to some measure of software complexity. To assess the AICC metric, we analyzed data from a large data communications application consisting of well over 100 000 lines of code. Because the data in our study do not necessarily adhere to the three classical assumptions made by most statistical tests (normality, homogeneity, and continuity; for instance, the assumption of continuity is violated since the modules are placed into discrete categories based on their AICC measure), and because of the AICC metric's ordinal nature, a nonparametric statistical test is used to assess the relationship


between the AICC categories and software complexity, called the Spearman rank-order correlation coefficient. Nonparametric statistics do not make any assumptions about the population from which the sample was drawn. Spearman's rank-order test assesses the relationship between the ranks of two variables, as opposed to their actual values. In other words, it assesses the relationship between the orderings the two variables impose on the data. Therefore, such a test does not assume the variables are continuous, and does not require uniform units of measure.

In attempting to relate a given metric with software complexity, we must establish a proxy metric for complexity. Such a metric attempts to indirectly measure a variable based on some other variable. The proxy metric we selected was Error Span. Error Span is simply the average number of tokens (Software Science N) per coding error. In this case, "bigger is better," since larger values for Error Span reflect less frequent error occurrences. It is computed as

ES = N / (Number of Errors).

We felt this was a reasonable measure of software complexity since most researchers agree that high levels of software complexity can result in an increased number of coding errors in a piece of software. We decided to use the Error Span as opposed to the absolute number of errors since research also tells us that bigger software tends to have more errors. Thus, we hoped to factor the impact of program size out of our analysis (the AICC metric has only a 0.38 correlation with noncommentary source lines).

We applied the AICC metric to the 48 components of a 113 000 line commercial data communications application. The system, which was written in C, was divided into 48 logical components, each of which comprised an average of 45 C functions. Data on programming (as opposed to requirements, design, etc.) errors was available for each logical component. We compared the ordering of the categories by the AICC metric with the ordering imposed by average error span for each category. Table I(a) summarizes this information. A Spearman rank correlation coefficient of 0.92 was observed between the AICC metric and average error span, which is significant at the 0.001 level (only a 0.001 probability exists that the two variables are not related at all).

Table I(b) shows the average error spans observed for modules grouped by noncommentary lines of code (LOC). Each module was assigned to one of eleven groups, depending on its LOC count. The groups were organized in 500 LOC increments, so that the first group included all modules up to 500 LOC, the second group consisted of all modules between 501 and 1000 LOC, the third group included modules of between 1001 and 1500 LOC, etc. A Spearman rank correlation coefficient of -0.62 was observed between the ranks of these groups and their average error spans.

Many studies have shown specific metrics to work well in a single situation; however, metrics must possess a certain robustness (i.e., work across environments) in order to be useful. To corroborate our results, we also applied the AICC metric to the data from a 28 145 LOC commercial compiler project with 20 logical components. The compiler was developed by a corporation totally unrelated (by industry, geographic location, etc.) to the corporation that developed the communications software. Nevertheless, the compiler project data exhibited results similar to the Communications System's data.
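As an illustration of the test, the sketch below computes Error Span and a Spearman coefficient with SciPy; it uses only the first few categories of Table I(a), so the coefficient it prints differs from the 0.92 obtained over all 17 categories:

```python
from scipy.stats import spearmanr

def error_span(total_tokens, error_count):
    """Error Span: Software Science N (token count) per coding error."""
    return total_tokens / error_count

# The first few AICC categories and average error spans from Table I(a).
aicc_class = [5.1, 5.0, 4.9, 4.8, 4.7, 4.6, 4.5]
avg_span   = [16587, 9646, 685, 1412, 2278, 1051, 1119]

rho, p = spearmanr(aicc_class, avg_span)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```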
The information on the average error span for the 11 AICC classes the project's components were assigned to is shown in Table II(a). A Spearman rank correlation coefficient of 0.73 was observed between the AICC metric and average error span. While weaker than the 0.92 correlation observed with the Communications System data, the correlation is still significant at the 0.01 level. A Spearman rank correlation coefficient of 0.43 was observed between the ranks of the LOC groups and their average error spans. Not only is this quite low, but the coefficient actually changed signs between the two data sets!

TABLE I
AVERAGE ERROR SPANS FOR THE COMMUNICATIONS SYSTEM

(a) FOR AICC CATEGORIES

AICC Class   Members   Avg. Error Span
5.1          1         16 587
5.0          1          9 646
4.9          2            685
4.8          3          1 412
4.7          8          2 278
4.6          7          1 051
4.5          7          1 119
4.4          3            622
4.3          7            815
4.2          2            532
4.0          1            709
3.8          1            416
3.1          1            435
2.9          1            208
2.0          1            230
1.9          1            179
1.7          1            247

(b) FOR LOC CATEGORIES

LOC Class    Members   Avg. Error Span
0-500        4           378
501-1000     8           495
1001-1500    8           814
1501-2000    4           525
2001-2500    4         3 556
2501-3000    8         2 883
3001-3500    5         1 129
3501-4000    3         3 999
7001-7500    2         1 597
7501-8000    1         1 076
8501-9000    1         2 168

TABLE II
AVERAGE ERROR SPANS FOR THE COMPILER SYSTEM

(a) FOR AICC CATEGORIES

AICC Class   Members   Avg. Error Span
5.0          1         847
4.8          5         645
4.7          1         246
4.6          2         728
4.5          1         413
4.4          3         670
4.2          2         580
4.1          1          94
4.0          2         488
3.9          1          74

(b) FOR LOC CATEGORIES

LOC Class    Members   Avg. Error Span
0-500        2         643
501-1000     8         492
1001-1500    2         852
1501-2000    3         389
2001-2500    1         289
2501-3000    4         531

To further assess the relationships among the AICC metric, Error Span, and other metrics, a regression equation was constructed that contained the five most significantly related metrics from the combined data sets. Because the relationship among the variables and error span was found to be nonlinear, the model was "linearized" by transforming error span by its natural log (because the transformation does not change the relative rankings of the error spans, no such transformation was necessary in the earlier rank-order correlations). This effort yielded the following equation, with R = 0.71:

ln(Error Span) = 4.51553 + 0.24236 * AICC + 0.00958 * η1 + 0.00034 * N2 + 0.00004 * N1 - 0.00087 * LOC

where AICC is the AICC metric, η1 is the number of unique operators, N2 is the number of total operands, N1 is the number of total operators, and LOC is the number of noncommentary lines of code. We can see that the AICC metric is the predominant term in the equation, with a coefficient many times that of any of the others. Fig. 1 illustrates the relationship between observed ln(Error Span) and that predicted by the regression equation.

Fig. 1. Observed ln(Error Span) versus predicted ln(Error Span) from regression equation.
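A sketch of this kind of linearized fit with NumPy follows; the predictor matrix and error spans below are hypothetical, so the fitted coefficients will not reproduce the equation above:

```python
import numpy as np

# Columns: AICC, eta1 (unique operators), N2 (total operands),
# N1 (total operators), LOC -- all values hypothetical.
X = np.array([
    [4.7, 38, 1200, 1500,  900],
    [4.2, 45, 2100, 2600, 1400],
    [3.1, 52, 3300, 4100, 2300],
    [4.9, 30,  800, 1000,  600],
    [4.4, 41, 1800, 2200, 1200],
    [3.8, 48, 2700, 3400, 1900],
    [5.0, 28,  700,  900,  500],
])
y = np.log([2278, 532, 435, 9646, 670, 416, 16587])  # ln(Error Span)

A = np.column_stack([np.ones(len(X)), X])      # intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit
pred = A @ coef                                # predicted ln(Error Span)
```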

VI. DISCUSSION

From the standpoint of prior research, the fact that the AICC metric is positively correlated with average error span is surprising. This implies that as the average information content increases, the average span between errors becomes larger (i.e., error density decreases). Previous work using metrics based on information content suggests the opposite. For instance, Davis [3] reports a correlation of 0.45


(significant at the 0.001 level) between maximum "chunk" entropy (log n, for n chunks) and errors over 334 COBOL programs. Berlinger [1] implies a higher level of information content should yield a more complex program. However, the data that we have examined in the two projects suggest that higher average information content per symbol is related to modules with increased error span, implying less complex software. The key to the distinction is the fact that we used average information content per symbol, and thus removed the impact of program size. Most other techniques address the total information content of the program, and thus are highly affected by program size.
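The size argument is easy to verify: scaling every operator frequency by the same factor leaves the empirical probabilities, and therefore the average information content, unchanged, while the total information grows with the program. A small sketch with hypothetical counts:

```python
from math import log2

def avg_info(freqs):
    """Average information content per occurrence (empirical entropy)."""
    n = sum(freqs)
    return -sum(f / n * log2(f / n) for f in freqs)

def total_info(freqs):
    """Total information delivered: N * H = -sum(f_i * log2(p_i))."""
    return sum(freqs) * avg_info(freqs)

counts = [6, 3, 2, 1]              # hypothetical operator frequencies
doubled = [2 * f for f in counts]  # the "same" program at twice the size

assert abs(avg_info(doubled) - avg_info(counts)) < 1e-9          # unchanged
assert abs(total_info(doubled) - 2 * total_info(counts)) < 1e-9  # doubles
```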

VII. PROPERTIES OF THE AICC MEASURE

In previous sections we discussed the behavior of the AICC metric when applied to two industrial software projects. However, in the evaluation of a proposed measure we should also address the generalized behavior exhibited by the metric. Important properties for complexity metrics have been suggested by Weyuker [7]. The question of whether or not a metric must adhere to these properties in order to be valid has not yet been determined by the research community. For instance, see [8]. However, the properties do provide a useful basis by which to evaluate the behavior of a metric.

1028

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 18, NO. 11, NOVEMBER 1992

These properties and their relationship to the AICC metric are paraphrased below. Note that complexity is inversely related to the AICC metric: as the metric increases, the complexity goes down. This is the opposite of most other metrics, where higher values mean greater complexity.

Property 1: A complexity measure should not rate all programs as equally complex. The data from Tables I and II illustrate that the AICC metric adheres to this property. In the two tables, 20 different AICC values (5.1-3.5, 3.1, 2.9, 2.0, 1.9, and 1.7) are observed. Obviously, the AICC metric does not rate all programs as being equally complex. Thus the AICC metric does adhere to Property 1.

Property 2: There are only finitely many programs of a given complexity, c. For a given number of unique operators, n, the entropy can range from 0 (one operator occurs with probability 1, and the rest with a probability of 0) to log2 n (each of the operators occurs with equal probability). Further, since the AICC metric rounds the program entropy to the nearest tenth, there is a finite number of AICC values for a program with a given number of unique operators. However, based on Weyuker's assumptions, we can assume that an infinite number of programs can be constructed with n unique operators. It follows then that there cannot be a finite number of programs of a given AICC complexity. Thus, as would be expected of a metric which is not sensitive to program size, the AICC metric does not adhere to Property 2.

Property 3: There exist two different programs of the same complexity. As for Property 1, we need only examine the data in Tables I and II to see that many modules have the same complexity. For instance, eight modules exhibit an AICC measure of 4.7 in Table I, and five programs exhibit an AICC measure of 4.8 in Table II. Because these modules actually come from an industrial system, it is unlikely that they are the same program. Thus the AICC metric does adhere to Property 3.

Property 4: Two different programs which compute the same function need not have the same complexity. A new version P' of an existing program P that performs the same function as P can be created by adding additional statements to P that have no net effect (e.g., x = x + 0). If the operators in P occur with equal probability, then the AICC measure for P is log2 n, for n unique operators. If the operators added to create P' are already in P, and P has other operators besides these, then the operator probabilities are no longer equal, resulting in a smaller AICC measure for P' than for P. Thus the AICC metric does adhere to Property 4.

Property 5: The complexity of program segments X and Y should each be less than or equal to the complexity of the composition of the two program segments, X;Y. The implication of this property is that as the size of a segment is increased, its complexity should also increase. Assume program segments X and Y both contain the operators o1, o2, o3, and o4. However, o1 and o2 occur k times in X but only once in Y, and o3 and o4 occur once in X but k times in Y. Clearly, in the composition of X and Y, the resulting program will have k + 1 occurrences of each of the four operators. Recall that the maximum average information content (and, therefore, the lowest complexity) is found when the symbols occur with equal probability. Thus the complexity of the composition of X and Y will be less than the complexity of either X or Y by themselves (note that the AICC metric for X and Y will be less than the maximum, since their operators do not occur with equal probability). Thus, since the AICC metric does not reflect program size, it does not adhere to Property 5.
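The Property 5 argument can be checked directly; in the sketch below, k is arbitrary (9 here), and the composition's balanced frequencies push its entropy to the maximum:

```python
from math import log2

def entropy(freqs):
    n = sum(freqs)
    return -sum(f / n * log2(f / n) for f in freqs)

# o1, o2 occur k times in X and once in Y; o3, o4 once in X, k times in Y.
k = 9
X = [k, k, 1, 1]
Y = [1, 1, k, k]
XY = [x + y for x, y in zip(X, Y)]   # composition: k + 1 of each operator

# The composition has *higher* entropy (lower complexity) than either
# part, so the AICC metric violates Property 5.
assert entropy(XY) > entropy(X) and entropy(XY) > entropy(Y)
assert abs(entropy(XY) - log2(4)) < 1e-12  # equal frequencies: the maximum
```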


Property 6: The resulting complexity of the composition of two program segments, X;Z, is not necessarily the same as the complexity that results from the composition of two program segments Y;Z, even though the complexity of X is equal to the complexity of Y. Assume that program segment X contains the operators o1, o2, o3, o4, and program segment Y contains the operators o5, o6, o7, o8. Further assume that in each segment the operators occur with equal probability, so that both have the maximum AICC value for a four-operator program, log2 4 (i.e., 2.0), and that the numbers of occurrences of operators (Software Science N1) are also the same. If program segment Z also contains operators o1, o2, o3, o4, they occur with the same frequency as they do in program segment X, and N1 for segment Z is equal to N1 for segment X, then the composition X;Z will yield the identical AICC value as X by itself. On the other hand, the composition Y;Z will now have eight unique operators which occur with equal probabilities, so that its AICC value will now be log2 8 (i.e., 3.0). Thus the AICC metric does adhere to Property 6.

Property 7: If the statements within a program are permuted, the complexity of the resulting program is not necessarily equal to the complexity of the original program. Because the AICC metric is not sensitive to the ordering of the operators, it will never be capable of distinguishing between a program and its permutation. Thus the AICC metric does not adhere to Property 7.

Property 8: If we uniformly rename the variables in a program, its complexity remains the same. Because the AICC metric considers only operators, the renaming of the variables has no effect on its measure. Thus the AICC metric does adhere to Property 8.

Property 9: The complexity of the composition of two program segments, X;Y, may be greater than the sum of the complexities of X and Y taken separately. Because the AICC metric is an ordinal measure, it does not make sense to consider the addition of two measurements. Thus Property 9 is not applicable to the AICC metric.

The behavior of the AICC metric is summarized in Table III, and can be compared with the behavior of other metrics (statement count, cyclomatic complexity, the effort measure, and data flow complexity) as determined in [7]. As can be seen, the behavior of the AICC metric compares favorably to the behavior of most other metrics for the properties which are unrelated to program size and/or which are applicable to ordinal measures.

TABLE III
SUMMARY AND COMPARISON OF THE BEHAVIOR OF THE AICC METRIC

Property   AICC Metric
1          YES
2          NO
3          YES
4          YES
5          NO
6          YES
7          NO
8          YES
9          N/A

VIII. CONCLUSIONS

In this paper, we have described a new software complexity metric which is based on the average information content of each operator in a program's source code. We have observed that this metric exhibits a good correlation with the error spans of a large collection of industrial software. While additional empirical field work must be done before this metric is adequately validated, the data to date suggest that information content based software measurement techniques show promise.

REFERENCES

[1] E. Berlinger, "An information theory based complexity measure," in Proc. 1980 Nat. Computer Conf., pp. 773-779.
[2] C. Cook, "Information theory metric for assembly language," in Proc. Third Annual Oregon Workshop on Software Metrics, Mar. 1991.


[3] J. Davis and R. LeBlanc, "A study of the applicability of complexity measures," IEEE Trans. Software Eng., vol. 14, pp. 1366-1372, Sept. 1988.
[4] M. Halstead, Elements of Software Science. Amsterdam, The Netherlands: Elsevier Science, 1976.
[5] R. Hamming, Coding and Information Theory. Englewood Cliffs, NJ: Prentice-Hall, 1980.
[6] W. Harrison, K. Magel, R. Kluczney, and A. DeKock, "Software complexity metrics and their application to maintenance," IEEE Computer, pp. 65-79, Sept. 1982.
[7] E. Weyuker, "Evaluating software complexity measures," IEEE Trans. Software Eng., vol. 14, pp. 1357-1365, Sept. 1988.
[8] H. Zuse, Software Complexity: Measures and Methods. Berlin, Germany: Walter de Gruyter, 1991.
[9] S. Zweben and M. Halstead, "The frequency distribution of operators in PL/I programs," IEEE Trans. Software Eng., vol. SE-5, pp. 91-95, Mar. 1979.

Correspondence

Visualization Techniques for Analyzing and Evaluating Software Measures

Christof Ebert

Abstract-One-dimensional statistical methods of scaling have been employed to present a distinct subjective criterion that is related to a measurable aspect of a software component. However, different aspects being measured and different software components being analyzed usually have some characteristics in common. Selected techniques for graphical representation permit a brief but nevertheless thorough view of complex relations among complicated sets of data. Several methods of visualizing and analyzing multidimensional data sets are presented and discussed. The underlying goals of such techniques are to find unknown structures and dependencies among measures, to represent different data sets in order to improve communication and comparability of distinct analyses, and to decrease visual complexity. For improved understandability of the statistical and related graphical concepts, a small set of design aspects from a real-world example is introduced. All the techniques illustrated in this article will be applied to the same set of data in order to allow them to be compared.

Index Terms-Graphical display, multivariate statistical analysis, software measures, visualization.


I. INTRODUCTION

The perception of measures and harmony is surrounded by a peculiar magic. - Carl Friedrich Gauss

With the advent of computers, the attention paid by statisticians to graphical representation of data has increased enormously. Computers provide the capacity of producing immense sets of data that have to be interpreted. While in the past most information gained from data processing had been displayed as tables, current interest focuses on graphical methods that are well suited to human communication. The advantages are numerous [1], [2]; however, a few should be mentioned:

* Well-designed charts are more effective in creating interest and in appealing to the viewer's attention than huge textual tables.
* Visual relationships are more easily grasped and remembered.
* Graphical displays present many numbers in a small space, thus providing a comprehensive picture of a problem.
* Graphical displays encourage the eye to compare different sets of data simultaneously.

Despite all these aspects, an obvious disadvantage should be kept in mind: the application of normalization and information reduction might be used to distort unbiased viewers. Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency [2]. Tools for visualization, therefore, should do the following:

* show the data (not more and not less);
* avoid distorting what the data have to say, hence being as "unbiased" as possible;
* make the viewer think about the data and their underlying relations rather than about the methodology of a specific technique for visualization;
* be self-explanatory, such that different types of diagrams still can be compared;
* reduce the time necessary to incorporate the information inherent in the data sets being visualized.

This paper gives an overview of the application of multidimensional statistical and graphical techniques to reduce the visual complexity of huge data sets. It needs to be mentioned that this paper is not a description of distinct mathematical methods. It is rather a presentation and evaluation of different techniques for visualizing multidimensional data. The bibliography gives hints for a more detailed look at the mathematical concepts. Most computations are easy to understand. More sophisticated methods are described briefly and can be computed with common statistical packages, such as SPSS™ or SAS™. Section II gives some background information about measurement and the appropriate terminology. Section III describes the data situation, which consists of seven hybrid measures. Sections IV and V apply several multidimensional techniques to this set of measures and compare the graphical outcome. Finally, Section VI evaluates the results and provides suggestions about which techniques to apply in which situations.

Manuscript received October 1, 1991; revised August 1, 1992. This work was supported by a grant of the German Science Foundation (DFG). Recommended by R. Selby and K. Torii. The author is with the Institute of Control Engineering and Industrial Automation, University of Stuttgart, W-7000 Stuttgart 80, Germany. IEEE Log Number 9203766.

II. BACKGROUND

In the course of our discussions the term object will be used to refer to attributes or entities of software components on which measurements are recorded. Measurements attach numbers to these attributes of software objects in a way that is exactly defined, thus being repeatable and automatable. A metric is a criterion to determine the difference or distance between two entities. Although the terms metric and measure are often mixed and used imprecisely, we will adhere to the correct definitions. Complexity measures try to show relationships between the characteristics of software components or products (including documentation) and the difficulty of performing the development process. Complexity measures thus offer great potential for controlling projects and their costs, if applied and evaluated seriously. However, the uncertainty of how these measures relate to one another and the process under investigation has led to a style of experimentation that is sometimes far from being scientific, as pointed out in [3]. Large numbers of experimental conditions, changed variables, and different projects support the likelihood of
