An Abstract Type for Statistics Collection in SIMULA

CARL E. LANDWEHR
Naval Research Laboratory

Although the use of abstract types has been widely advocated as a specification and implementation technique, their use has often been associated with programming languages that are not widely available, and examples published to date are rarely taken from actual applications. SIMULA is a widely available language that supports the use of abstract types. The purposes of this paper are (1) to demonstrate the application of the concepts of data abstraction to a common problem; (2) to demonstrate the use of data abstraction in a widely available language; and (3) to provide a portable facility for statistics collection that may make the use of SIMULA more attractive. A discussion of the background of and requirements for an abstract type for statistics collection is presented, followed by a specification for the type using traces. A SIMULA implementation, with examples of its use, is given. Finally, implementation of the abstract type in other languages is discussed.

Key Words and Phrases: abstract types, data abstraction, simulation, SIMULA, software, programming, statistics collection, software design, software engineering

CR Categories: 4.0, 4.12, 4.20, 4.22, 8.1

Author's address: Code 7522, Naval Research Laboratory, Washington, DC 20375. © 1980 ACM 0164-0925/80/1000-0544 $00.00. ACM Transactions on Programming Languages and Systems, Vol. 2, No. 4, October 1980, Pages 544-563.

1. INTRODUCTION

The use of abstract data types in the specification and implementation of programs has received much attention in the computing literature over the last several years [9-11, 18, 21]. Programming languages recently developed or proposed often make a point of including facilities for type abstraction [1, 7, 8, 12, 17, 22]. One of the benefits of facilities of this nature should be that libraries of useful abstract types could be constructed and used in a way corresponding to the way subroutine libraries are used to compute commonly used functions. CLU [17], for example, includes a library function for just this purpose. Although many of the articles describing language features or specification methods using abstract types include specifications for a few abstract types, these are generally in the nature of simple examples, not fully elaborated definitions of types that would be appropriate for inclusion in an application program. In addition, few of these languages are widely available.

SIMULA [2, 4-6] includes many of the features for type specification that are advocated in more recent language designs; several authors [7, 11, 17, 22] cite SIMULA as a source of ideas. There are compilers available for SIMULA on a number of widely available machines, including DEC, IBM, Univac, and Control Data mainframes. Despite its availability and the presence of some desirable features in the language, SIMULA has not achieved as widespread use in the United States as have several other languages for programming and simulation
that have considerably less flexibility. There are many reasons for this state of affairs, not the least of which is the fact that SIMULA, compared with GPSS, GASP, and SIMSCRIPT, provides fewer built-in functions for statistics collection and reporting. Other contributing factors include the lack of any commercial organization promoting the language, the lack of suitable documentation (at least until the publication of [6]), and the (false) perception that the language is useful only for simulation. This paper presents a set of related types defined in SIMULA suitable for the collection and reporting of statistics in simulation programs written in SIMULA. These types represent a practical application of the concepts of data abstraction to a common problem, using a widely available programming language that includes features to facilitate the use of data abstraction. These types have been used in several simulations of queuing models and are equally applicable to statistics collection in simulations generally. Our purposes are (1) to demonstrate the application of concepts of data abstraction to a common problem; (2) to demonstrate the use of data abstraction in a widely available programming language; and (3) to provide a portable facility that may make use of that language more attractive. The types presented share a common abstract specification. Each individual type differs in certain ways from the others, but all may be viewed as family members of a single abstract type. Statements applying to the family as a whole will refer to "the abstract type," while statements applying to the various individual types will refer to "the types." Before the implementations of the types are presented, the considerations that led to the development of these types are discussed, and a specification for the abstract type is provided. Following the description of the implementations, examples of their use are given. 
A final section discusses problems that occurred in the development and use of these types and notes how additional language features (some of which are included in CLU [17], ALPHARD [22], and ADA [1]) might improve the implementation. Information on using these types in other SIMULA programs appears primarily in Sections 4-6. A general familiarity with SIMULA syntax is assumed.

2. BACKGROUND

The project that led to the design of the abstract type described below was the construction of a simulator for modeling traffic flow over a broadcast channel on a communications satellite under a variety of communication protocols [13-16, 19]. An important design goal was to produce software that would be easy to modify to model alternative protocols and traffic distributions. To this end, it was decided to employ state-of-the-art software engineering techniques, including the use of abstract types. This decision led to the choice of SIMULA as the programming language, since it provides better facilities for user type specification than other available simulation languages (e.g., SIMSCRIPT, GPSS, GASP, and Fortran). Although SIMULA provides a rich structure for the definition of new types, it lacks some of the built-in facilities for statistics collection and reporting that are provided by GPSS and GASP. Palme [20] has suggested an approach that
involves adding auxiliary variables and procedures for recording purposes. As an example, an integer variable is added to the declarations for queues in the program to record the current length of the queue. Each time a new instance of a queue is created, the associated statistics collection variables are created along with it. If the programmer desires to have some queues without statistics collection, two different declarations must be provided in the program--one for queues with statistics and one for queues without statistics. These types must have different names even though they represent essentially similar objects. Finally, Palme's solution does not simplify reuse of the statistics gathering facility. Each new object type for which information is to be collected represents a new case.

Our approach is to consider the fundamental properties required of an item of statistical information--the information storage required and the operations desired--and to create a set of types suited to these requirements. The general goals are these:

(1) Ease of use: Recording statistics for a new item or deleting statistics collection when no longer needed should be simple.
(2) Encapsulation of statistical calculations: Code to calculate statistics for different variables should be located in a single place to reduce the possibility of errors and to simplify debugging.
(3) Ease of reporting: Useful reports should be easy to obtain. No elaborate format specifications should be required.
(4) Reusability: The facilities constructed should be usable in other SIMULA programs with few or no changes.

3. REQUIREMENTS

For many queuing simulations, two kinds of reports are desired: statistical summaries of the values of some variable and history traces or histograms of the values assigned to a variable throughout the run. The statistical summaries are usually limited to observations of the first two moments of the variable, since obtaining the required number of replications to generate statistically significant measures of higher moments is often difficult. Traces are most often used in debugging or in checking that the behavior of an indicator variable over simulated time is as expected; histograms may be applied to similar ends or may be used to characterize an observed probability distribution.

In addition to supporting the collection of data for such reports, a mechanism is required to support the generation (printing) of the reports. Thus a name and textual description for each variable is needed, and there must be an access program that initiates the generation of a report. The required access functions, then, are the following:

(1) Initialization: The name and description of the variable to be recorded must be supplied and the type of statistics desired (e.g., simple statistical summary, summary and histograms, trace) must be defined before collection can begin.
(2) Measurement: Each time a significant change occurs in the variable to be recorded, the values of the corresponding statistics must be updated. The author of the simulation program is responsible for deciding at what points in the program these updates should occur.
(3) Reporting: Whenever reports are desired, the records must be computed and printed.

Our next step in refining these general requirements is to specify in detail the statistics desired in the reports. We chose

(1) number of observations of the variable;
(2) sum of all observations of the variable;
(3) times of first and last observation;
(4) minimum and maximum values observed for the variable;
(5) mean and variance of the variable based on equal weight per observation;
(6) mean and variance of the variable based on time-weighted observations.
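The six statistics listed above can be illustrated with a short sketch. It is written in Python rather than SIMULA purely for illustration, it is not the paper's implementation, and the names `Summary` and `summarize` are hypothetical. The variance formulas follow those of the specification in Figure 1; the time-weighted statistics assume that each value is held until the next observation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Summary:
    nobs: int
    total: float
    tfirst: float
    tlast: float
    minimum: float
    maximum: float
    emean: float
    evar: Optional[float]   # None when fewer than two observations
    tmean: Optional[float]  # None when no simulated time has elapsed
    tvar: Optional[float]


def summarize(obs: Sequence[Tuple[float, float]]) -> Summary:
    """obs: non-empty list of (value, time) pairs in nondecreasing time order."""
    xs = [x for x, _ in obs]
    ts = [t for _, t in obs]
    n = len(xs)
    total = sum(xs)
    sumsq = sum(x * x for x in xs)
    emean = total / n
    evar = (sumsq - total * total / n) / (n - 1) if n > 1 else None
    span = ts[-1] - ts[0]
    # x[i-1] stays in force from t[i-1] to t[i], so it is weighted by that interval.
    integral = sum(xs[i - 1] * (ts[i] - ts[i - 1]) for i in range(1, n))
    integral_sq = sum(xs[i - 1] ** 2 * (ts[i] - ts[i - 1]) for i in range(1, n))
    if span > 0:
        tmean = integral / span
        tvar = (integral_sq - integral * integral / span) / span
    else:
        tmean = tvar = None
    return Summary(n, total, ts[0], ts[-1], min(xs), max(xs),
                   emean, evar, tmean, tvar)
```

For the observations (2, 0.0), (4, 1.0), (3, 3.0), the event-weighted mean is 3 while the time-weighted mean is 10/3, because the value 4 was in force for twice as long as the value 2 and the final value carries no time weight.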

For each instance of a statistics variable, all of these statistics are collected and reported. In many cases only a subset of them will be of interest, but by collecting and reporting the entire set the program spares the user from the burden of defining exactly which ones are desired. Also, in practice, some of the redundancy provided by the statistics has helped in debugging and in the recognition of unexpected phenomena. Each report requires only two lines of output, and the maintenance of these statistics consumes relatively little time or space.

Generation of histograms can impose sizable additional storage and computing requirements on the simulator, so the user must request them explicitly. The request is a simple one: the type of histogram, the number of bins, and the bin boundaries must be specified. As in the fifth and sixth items above, the histogram may be generated weighting all observations equally or weighting each observation by the time until the next observation occurs. Traces can be handled with the histogram mechanism as well by using the time of occurrence as the observation and the value of the variable to be recorded as the weighting factor.

The final requirement is that the implementation of this abstract type consume a minimum of computing resources. Simulations generally are heavy consumers both of storage and of processing time, and if statistics collection and reporting consume too many resources, the simulation may have to be restricted in other areas (e.g., fewer runs will be made or less detailed models will be necessary).

4. SPECIFICATION AND VERIFICATION

Initial informal specifications for the statistics module are documented in [16]. This section provides a more formal specification using the terminology of traces developed by Bartussek and Parnas [3]. The review of traces terminology below is abstracted from [3].

A software module is viewed as a collection of access programs (O-functions and V-functions), where the O-functions change the internal state of the module and the V-functions return values (representing part or all of the module's internal state). For each access program there is an applicability condition. If this condition holds, the program may be called. If the program is called when this condition does not hold, the module may trap, refusing to return through the normal exit. A trace of a module is a description of a sequence of calls to the access programs of the module. Only the externally visible behavior of a module
is recorded by a trace. A given trace is legal if the execution of the corresponding sequence of calls would not cause a trap. If a legal trace terminates with a call to a V-function, the value returned by that V-function is called the value of the trace. A specification is complete if from it the value of any legal trace can be derived. It is consistent if only one such value can be derived.

Traces specifications have two parts: syntax and semantics. The syntax part specifies the names of the O- and V-functions, the types of the arguments to each function, and (for V-functions) the type of the value returned or (for O-functions) the type of the object changed. The semantics part contains three categories of assertions about traces: legality assertions, equivalence assertions, and value assertions. Any trace that cannot be shown to satisfy the legality assertions is considered illegal. Two traces are considered equivalent if they have the same legality and if they have the same externally discernible effect on the future behavior of the module. A value assertion concerns the value returned by a V-function occurring at the end of a legal trace.

Figure 1 displays a specification for the abstract type for statistics collection using traces. As will be evident in the discussion of the implementation, some details have been suppressed in the specification. The two O-functions, NEW and UPDATE, correspond to the first two access functions listed in the requirements section. The third required access function is represented by two V-functions, REPORT and HISTOGRAM, because the generation of a report or histogram need not change the internal state of statistics collection variables. The remaining V-functions provide the statistics listed in the requirements. The type designator (number) is used to indicate a value that may be either real or integer. Type (time) indicates a nonnegative real number.
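The notion of a legal trace can be made concrete with a small checker. The sketch below is in Python and is only an illustration under stated assumptions: a trace is modeled as a list of tuples such as `("NEW",)` or `("UPDATE", value, time)`, the function name `is_legal` is hypothetical, and the rules encoded are those of the legality and equivalence assertions of Figure 1 (a trace starts with NEW, contains no later NEW, has UPDATE times in nondecreasing order, and may contain V-function calls anywhere after NEW).

```python
V_FUNCTIONS = {"VAL", "NOBS", "SUM", "MIN", "MAX", "TMEAN", "TVAR",
               "EMEAN", "EVAR", "TFIRSTOBS", "TLASTOBS", "REPORT", "HISTOGRAM"}


def is_legal(trace):
    """Return True if the sequence of calls would not cause a trap."""
    if not trace or trace[0][0] != "NEW":
        return False              # a legal trace must start with NEW
    last_time = None
    for call in trace[1:]:
        name = call[0]
        if name == "UPDATE":
            _, _value, time = call
            if time < 0:
                return False      # (time) is a nonnegative real number
            if last_time is not None and time < last_time:
                return False      # UPDATE times must be nondecreasing
            last_time = time
        elif name in V_FUNCTIONS:
            continue              # V-functions leave the state unchanged
        else:
            return False          # a second NEW, or an unknown call
    return True
```

The checker says nothing about the value of a trace. In particular it accepts a call such as VAL immediately after NEW, whose value the specification deliberately leaves undefined.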
The initialization parameters for a (stat) variable, of type (statinitpars), will include character strings to name and describe the variable and may include the bin boundaries and type of histogram to be collected. Finally, (reportimage) and (histogramimage) represent the text generated by the REPORT and HISTOGRAM operations, respectively. These include text representations for the values generated by the other V-functions.

The notation $L(T)$ is used to indicate that trace $T$ is legal. $\mathrm{UPDATE}^N(x_i, t_i)$ indicates a sequence of $N$ syntactically correct invocations of the UPDATE operation. Parameters to the $i$th invocation are $(x_i, t_i)$, and parameters to the $N$th invocation are $(x_N, t_N)$.

The legality assertions define the trace NEW.UPDATE (and, therefore, NEW by itself) as legal and provide a rule for deriving additional legal traces. A legal trace must start with the initialization operation (NEW) and may be followed by an arbitrary number of UPDATE operations, so long as these are ordered by nondecreasing value of the time parameter. No subsequent initialization operations are allowed. The equivalences section expands the set of legal traces to include arbitrary sequences of V-operations after any NEW or UPDATE operation.

The values section defines the value returned by a V-function occurring at the end of some legal traces. The values of the REPORT and HISTOGRAM operations are defined informally. The reports generated are two lines of text that record the values of the statistics V-functions converted to character strings. Histograms are generated based on the bin sizes and type of weighting specified in (statinitpars). Note that for N = 0 the values returned by the statistical operations are


Syntax

O-functions:
    NEW:        (stat) x (statinitpars)     -> (stat)
    UPDATE:     (stat) x (number) x (time)  -> (stat)

V-functions:
    VAL:        (stat) -> (number)
    SUM:        (stat) -> (number)
    NOBS:       (stat) -> (integer)
    MIN:        (stat) -> (number)
    MAX:        (stat) -> (number)
    TMEAN:      (stat) -> (real)
    TVAR:       (stat) -> (real)
    EMEAN:      (stat) -> (real)
    EVAR:       (stat) -> (real)
    TFIRSTOBS:  (stat) -> (time)
    TLASTOBS:   (stat) -> (time)
    REPORT:     (stat) -> (reportimage)
    HISTOGRAM:  (stat) -> (histogramimage)

Semantics

Legality:
(1) For all $x_1, t_1$: $L(\mathrm{NEW.UPDATE}(x_1, t_1))$
(2) For all traces $T$ and $t_2 \ge t_1$: $L(\mathrm{NEW}.T.\mathrm{UPDATE}(x_1, t_1)) \Rightarrow L(\mathrm{NEW}.T.\mathrm{UPDATE}(x_1, t_1).\mathrm{UPDATE}(x_2, t_2))$

Equivalences:
(3) For any V-function $a$: $\mathrm{NEW} \equiv \mathrm{NEW}.a$
(4) For any trace $T$ and V-function $a$: $T.\mathrm{UPDATE}(x_1, t_1) \equiv T.\mathrm{UPDATE}(x_1, t_1).a$

Values:
Let $T = \mathrm{NEW.UPDATE}^N(x_i, t_i)$. Then $L(T) \Rightarrow$ (all of the following)

(5)  $V(T.\mathrm{VAL}) = x_N$,  $N > 0$
(6)  $V(T.\mathrm{NOBS}) = N$,  $N \ge 0$
(7)  $V(T.\mathrm{SUM}) = \sum_{i=1}^{N} x_i$,  $N > 0$
(8)  $V(T.\mathrm{MIN}) = \min_i (x_i)$,  $N > 0$
(9)  $V(T.\mathrm{MAX}) = \max_i (x_i)$,  $N > 0$
(10) $V(T.\mathrm{TFIRSTOBS}) = t_1$,  $N > 0$
(11) $V(T.\mathrm{TLASTOBS}) = t_N$,  $N > 0$
(12) $V(T.\mathrm{EMEAN}) = \frac{1}{N} \sum_{i=1}^{N} x_i$,  $N > 0$
(13) $V(T.\mathrm{EVAR}) = \dfrac{\sum_{i=1}^{N} x_i^2 - (1/N)\left(\sum_{i=1}^{N} x_i\right)^2}{N - 1}$,  $N > 1$
(14) $V(T.\mathrm{TMEAN}) = \dfrac{\sum_{i=2}^{N} x_{i-1}(t_i - t_{i-1})}{t_N - t_1}$,  $t_N > t_1$
(15) $V(T.\mathrm{TVAR}) = \dfrac{\sum_{i=2}^{N} x_{i-1}^2 (t_i - t_{i-1}) - \left[\sum_{i=2}^{N} x_{i-1}(t_i - t_{i-1})\right]^2 / (t_N - t_1)}{t_N - t_1}$,  $t_N > t_1$
(16) $V(T.\mathrm{REPORT})$ =
     N > 0: character string representing the values obtained from V-functions NOBS, SUM, MIN, MAX, TFIRSTOBS, TLASTOBS, EMEAN, and (if $N > 1$) EVAR and (if $t_N > t_1$) TMEAN and TVAR.
     N = 0: character string "No observations recorded"
(17) $V(T.\mathrm{HISTOGRAM})$ = If any observations have fallen within the range specified by the (statinitpars) used in the NEW operation, return histogram image according to the type (e.g., weight by event, weight by time) requested in (statinitpars). Otherwise, return string "No observations recorded".

Fig. 1. Specification of abstract type for statistics collection using traces.


undefined; the value of EVAR is undefined for $N = 1$ as well, and for $t_N = t_1$ both TMEAN and TVAR are undefined. There are two ways these cases might be handled: traces that include calls to these functions in the undefined cases could be made illegal (i.e., the access programs would trap) or legitimate values could be defined for them. We prefer to leave the specification incomplete in this respect; this point will be addressed in the discussion of the implementation.

The specification just given could be verified in two ways. First, it might be demonstrated that the module specified in Figure 1 meets the requirements stated informally in the previous section. Second, the implementation presented below might be verified to correspond to the specification. A key part of the first demonstration would be to show that the mathematical formulas in the V-functions are the correct ones for computing the statistics defined in the requirements. The corresponding concern in verification of the implementation would be that the code correctly computes the formulas given in the traces specification. These verifications are beyond the scope of this paper.

5. SIMULA IMPLEMENTATION

This section describes the abstract type for statistics collection as it has been implemented in SIMULA. Before describing the details of the implementation, we give a brief example of the use of the facility. To initialize a variable for recording statistics on the number of idle servers in a queuing system, for example, one writes

    nidleserver :- new STATINT("Nidleserver", "Number of idle servers");

This statement both allocates and initializes a variable for collection of integer statistics. The name and meaning of the variable to be recorded are given as arguments to STATINT so that they may be used later in the generation of reports. To record a new observation of the number of idle servers, one writes

    nidleserver.update(nidle, time);

Here we assume that the integer variable "nidle" has as its value the current number of idle servers and that the real variable "time" records the time of the observation. At the end of a simulation, a statistical summary for "nidleserver" is generated by the following statement:

    nidleserver.report;

The report generated in this case is a two-line summary of the behavior of the variable during the simulation run, labeled with the name and description provided when nidleserver was initialized. Using any currently defined statistics variable, reports and histograms for all of the statistics variables in the simulation can be generated with a single request such as

    nidleserver.fullreport;

A more detailed description of the usage of the facility is given below, but these four types of statements are the primary ones required of a user.
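The four kinds of statements just described (create, update, report, full report) can be mimicked in a few lines of Python. This is a loose analogue for readers unfamiliar with SIMULA, not a translation of the paper's code: the class name `IntStat`, its attribute names, and the one-line report format are all assumptions made for the sketch, and only a subset of the statistics is kept.

```python
class IntStat:
    """Hypothetical Python analogue of STATINT (illustration only)."""

    pool = []  # plays the role of the shared list of statistics variables

    def __init__(self, name, desc):
        self.name, self.desc = name, desc
        self.nobs = 0
        self.total = 0
        self.val = 0
        self.min = self.max = None
        self.tfirst = self.tlast = 0.0
        self.valtint = 0.0            # time integral of the observed value
        IntStat.pool.append(self)     # every instance registers itself

    def update(self, value, time):
        if self.nobs == 0:
            self.tfirst = time
            self.min = self.max = value
        else:
            # The previous value was in force from tlast until now.
            self.valtint += self.val * (time - self.tlast)
            self.min = min(self.min, value)
            self.max = max(self.max, value)
        self.nobs += 1
        self.total += value
        self.val = value
        self.tlast = time

    def report(self):
        if self.nobs == 0:
            return f"{self.name}: no observations recorded"
        span = self.tlast - self.tfirst
        tmean = f"{self.valtint / span:.4f}" if span > 0 else "undefined"
        return (f"{self.name}: nobs={self.nobs} min={self.min} max={self.max} "
                f"sum={self.total} time mean={tmean}")

    def fullreport(self):
        # Any instance can produce the reports of all registered instances.
        return [s.report() for s in IntStat.pool]
```

A caller would write `nidleserver = IntStat("Nidleserver", "Number of idle servers")`, call `nidleserver.update(nidle, time)` at each change, and finish with `nidleserver.report()` or `nidleserver.fullreport()`.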

[Figure 2: diagram of the type hierarchy. LINK -> STAT; STAT -> SINTOPS and SREALOPS; SINTOPS -> STATINT and STATINTHIST; SREALOPS -> STATREAL and STATREALHIST.]

Fig. 2. Relations among types: arrows point to the enlarged type from the type on which it is based.

In the example above, nidleserver is an instance of the SIMULA class STATINT; STATINT is a type. The abstract type for statistics collection in fact includes four separate types: STATINT, STATINTHIST, STATREAL, and STATREALHIST. Each of these types is defined in terms of lower level types: STATINT and STATINTHIST are both enlarged types (they include additional access functions) based on another type named SINTOPS. Similarly, STATREAL and STATREALHIST are enlarged types based on SREALOPS. Both SREALOPS and SINTOPS are enlarged types based on a type named STAT. STAT is itself based on a built-in SIMULA type for linked lists called LINK. Figure 2 displays these relationships graphically.

Table I lists the type names and the access functions implemented for each type. The access functions available to users of a given type include those implemented by that type and also those implemented by the type on which that type is based. Thus the access functions available for objects of type STATINT include REPORT and TMEAN as well as UPDATE. As the table shows, certain access functions are not intended to be called directly by users of the types. This intention could be enforced by using the HIDDEN and PROTECTED features of SIMULA, but in the case at hand (a one-person project) this added protection was not necessary.

Table I. Types and access functions

Type (= CLASS name)   Access functions implemented by this CLASS   Purpose
-------------------   ------------------------------------------   -------
STATINT               UPDATE                                       simple integer statistics
STATINTHIST           UPDATE, OUTHIST                              integer statistics and histograms
STATREAL              UPDATE                                       simple real statistics
STATREALHIST          UPDATE, OUTHIST                              real statistics and histograms
SINTOPS               REPORT, EMEAN*, EVAR*                        groups access functions common to integer statistics
SREALOPS              REPORT, EMEAN*, EVAR*                        groups access functions common to real statistics
STAT                  TMEAN*, TVAR*                                groups access functions common to all statistics
LINK                  INTO*, OUT*, PRECEDE*, FOLLOW*               access functions for linked list elements

Note. Asterisk (*) denotes access function not intended to be invoked directly by programs using the types for statistics collection.
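The layering in Table I can be sketched with ordinary class inheritance. The Python below is an illustration only: class bodies are reduced to the minimum needed to show which level supplies which access function, the histogram uses assumed unit-width bins, and an abstract method stands in for a REPORT operation that is declared at the STAT level but supplied by the classes built on it.

```python
from abc import ABC, abstractmethod


class Stat(ABC):
    """Storage and operations common to all statistics (cf. STAT)."""

    def __init__(self, name, desc):
        self.name, self.desc = name, desc
        self.nobs = 0

    @abstractmethod
    def report(self):
        """Declared here, supplied by the classes built on Stat."""


class SIntOps(Stat):
    """Operations common to integer statistics (cf. SINTOPS)."""

    def __init__(self, name, desc):
        super().__init__(name, desc)
        self.total = 0

    def report(self):
        return f"{self.name}: nobs={self.nobs} sum={self.total}"


class StatInt(SIntOps):
    """Simple integer statistics (cf. STATINT): adds only update."""

    def update(self, value, time):
        self.nobs += 1
        self.total += value


class StatIntHist(SIntOps):
    """Integer statistics plus a histogram (cf. STATINTHIST)."""

    def __init__(self, name, desc, nbins):
        super().__init__(name, desc)
        self.bins = [0] * nbins

    def update(self, value, time):
        self.nobs += 1
        self.total += value
        # Assumed unit-width bins; out-of-range values go to the end bins.
        self.bins[min(max(value, 0), len(self.bins) - 1)] += 1

    def outhist(self):
        return list(self.bins)
```

A `StatInt` object answers `report` (inherited from `SIntOps`) as well as `update`, while only `StatIntHist` carries the extra histogram storage, which mirrors the split between summary-only and histogram types.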

!******************;
!envelope class for all types of statistics;
!code below defines storage and operations common to all members;
!of the abstract type for statistics collection;
!******************;
link class stat(vname, vdesc);
   value vname, vdesc;
   text vname;                  !name of variable being observed;
   text vdesc;                  !description of variable observed;
   virtual: procedure report;   !procedure to print results;
begin
   integer nobs;                !number of observations;
   real tfirstobs,              !time of first observation;
        tlastobs,               !time of last observation;
        valtint,                !time integral of observed value;
        valsqtint,              !time integral of square of observed value;
        timemean,               !mean over time;
        timevar,                !variance over time;
        eventmean,              !mean over number of observations;
        eventvar,               !variance over number of observations;
        tint;                   !temporary;

   boolean procedure tmean;
      !computes mean of time-weighted observations;
      !leaves value in timemean;
      !if undefined, returns false;
      if tlastobs > tfirstobs
      then begin
         timemean := valtint/(tlastobs - tfirstobs);
         tmean := true;
      end
      else tmean := false;

   boolean procedure tvar;
      !computes variance of time-weighted observations;
      !leaves value in timevar;
      !if undefined, returns false;
      if tlastobs > tfirstobs
      then begin
         timevar := (valsqtint - (valtint * (valtint/(tlastobs - tfirstobs))))
                    /(tlastobs - tfirstobs);
         tvar := true;
      end
      else tvar := false;

   procedure fullreport;
      !routine to print reports for all statistics;
      !variables linked into statpool. First, prints;
      !headings, then all summaries, then all histograms;
   begin
      ref(stat) svar;           !temporary to point to current statistics variable;
      outp.outtext("Simulation Statistics");
      outp.outimage; outp.outimage;       !skip two lines;
      outp.outtext(" Variable # obs minimum time mean");
      outp.outtext(" event mean first obs");
      outp.outtext(" sum maximum variance");
      outp.outtext(" variance last obs");
      outp.outimage; outp.outimage;       !force out header and skip;
      svar :- statpool.first;   !get the first statistics variable in pool;
      while svar =/= none do    !this loop prints summaries only;
                                !for all statistics variables;
      begin
         svar.report;           !generates two-line report;
         outp.outimage;         !skip a line;
         svar :- svar.suc;      !get next statistics variable;
      end;
      outp.linesperpage(-1);    !suppress page skips for histograms;
      svar :- statpool.first;   !reinitialize to generate histograms;
      while svar =/= none do
      begin
         inspect svar
            when statinthist do svar qua statinthist.outhist
            when statrealhist do svar qua statrealhist.outhist;
         !the above statement checks that the current svar is of a histogram;
         !type and (if so) prints it, using the operation for that type;
         svar :- svar.suc;      !get the next one;
      end;
   end of fullreport;
end of stat;

Fig. 3. Class STAT.

!**************************************************;
!envelope class for statistics collection--integer variables;
!code below defines operations and storage common to variables;
!for collecting statistics on integer values, with or without histograms;
!**************************************************;
stat class sintops;
begin
   integer val,                 !initial value of variable to be logged;
           sum,                 !running sum of values;
           ssq,                 !sum of squares of values;
           max,                 !maximum value observed;
           min;                 !minimum value observed;

   boolean procedure emean;
      !computes mean of event-weighted observations;
      !leaves result in eventmean;
      !returns false if value undefined;
      if nobs > 0
      then begin
         eventmean := sum/nobs;
         emean := true;
      end
      else emean := false;

   boolean procedure evar;
      !computes variance of event-weighted observations;
      !leaves result in eventvar;
      !returns false if value undefined;
      if nobs > 1
      then begin
         eventvar := (ssq - (sum * (sum/nobs)))/(nobs - 1);
         evar := true;
      end
      else evar := false;

   procedure report;            !proc to print results of simulation;
   begin
      integer nspaces, i, j;

      nspaces := 20 - vname.length;
      !generation of line 1 of summary starts here;
      if nspaces <= 0
      then outp.outtext(vname.sub(1, 20))
      else begin
         outp.outtext(vname);
         outp.outtext(blanks(nspaces));
      end;
      if nobs = 0
      then outp.outtext(" no observations recorded")
      else begin
         outp.outint(nobs, 12); outp.outint(min, 12);   !# of observations and minimum;
         if tmean then outp.outfix(timemean, 4, 12)     !time average;
         else outp.outtext(" undefined ");
         if emean then outp.outfix(eventmean, 4, 12)    !event average;
         else outp.outtext(" undefined ");
         outp.outfix(tfirstobs, 3, 11);
      end;
      outp.outimage;            !force out line 1;

      nspaces := 20 - vdesc.length;
      !generation of line 2 of summary starts here;
      if nspaces <= 0
      then outp.outtext(vdesc.sub(1, 20))
      else begin
         outp.outtext(vdesc);
         outp.outtext(blanks(nspaces));
      end;
      if nobs ne 0
      then begin
         outp.outint(sum, 12); outp.outint(max, 12);    !sum and maximum;
         if tvar then outp.outfix(timevar, 4, 12)       !time variance;
         else outp.outtext(" undefined ");
         if evar then outp.outfix(eventvar, 4, 12)      !event variance;
         else outp.outtext(" undefined ");
         outp.outfix(tlastobs, 3, 11);
      end;
      outp.outimage;            !force out line 2;
      if vdesc.length > 20      !is there more to print?;
      then begin                !yes;
         j := 20;               !print rest of description;
         for i := 21 step 20 until vdesc.length do
         begin
            if i + j > vdesc.length then j := vdesc.length - i + 1;
            outp.outtext(vdesc.sub(i, j));
            outp.outimage;
         end;
      end;
   end;

   !initialization;
   nobs := sum := ssq := val := 0;
   [...]

            outp.outfix(b(nb - 1), 4, 11);
         end;
         outp.outfix(a(i), 4, 12);
         outp.outchar('I');
         nx := entier(a(i)/nscale);
         for j := 1 step 1 until nx do outp.outchar('X');
         outp.outimage;
      end;
   end;
   outp.outimage; outp.outimage; outp.outimage;
end;

Fig. 7. Procedure PHIST.


Table II. Substitutions to generate SREALOPS from SINTOPS, STATREAL from STATINT, and STATREALHIST from STATINTHIST

Replace            With
-------            ----
integer            real
outint(..., 12)    outfix(..., 4, 12)
sintops            srealops
statint            statreal
1                  1.0
0                  0.0
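The identifier substitutions of Table II are mechanical enough to script. The following Python sketch is an assumption-laden illustration, not a tool from the paper: the function name `to_real_version` is hypothetical, the substitutions are applied to the whole text rather than only to the italicized portions the paper refers to (so local declarations such as loop counters would also be changed and need review), and the constant substitutions (1 to 1.0, 0 to 0.0) are left out because they cannot be applied safely without knowing which constants were italicized.

```python
import re


def to_real_version(source: str) -> str:
    """Apply the identifier substitutions of Table II to SIMULA source text."""
    # outint(expr, 12) -> outfix(expr, 4, 12)
    out = re.sub(r"outint\((.*?),\s*12\)", r"outfix(\1, 4, 12)", source)
    # integer -> real (whole words only)
    out = re.sub(r"\binteger\b", "real", out)
    # sintops -> srealops
    out = out.replace("sintops", "srealops")
    # statint -> statreal (this also maps statinthist to statrealhist)
    out = out.replace("statint", "statreal")
    return out
```

Applied to `stat class sintops; begin integer val, sum; outp.outint(min, 12); end;` the function yields `stat class srealops; begin real val, sum; outp.outfix(min, 4, 12); end;`.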

by STAT. The declaration of procedure REPORT as "virtual" indicates that this operation can be applied to objects of type STAT but the procedure will be specified at the higher levels of classes with the STAT prefix. The reason for this construction is explained below. Procedures TMEAN and TVAR compute the mean and variance for the time-based statistics collection and are only called from within the type. These procedures return a Boolean value indicating whether a valid mean or variance could be computed. The actual mean or variance computed is left in a variable accessible to all of the procedures defined within STAT.

There are two types defined with prefix STAT: SINTOPS (Figure 4) and SREALOPS. As the names imply, SINTOPS provides the storage and operations for integer variables for which statistics are desired, and SREALOPS provides the corresponding services for real variables. (SIMULA defines types for real and integer variables much as they are in Algol or Fortran.) These operations cannot be provided directly at the STAT level because, for example, the value of the minimum observation of a real (integer) variable must be stored in a real (integer) variable. Thus, essentially identical sets of operations are provided, with differences only in the types of the variables. The REPORT procedures for reporting summary statistics are provided at this level for the same reason: the output format requirements are different for integer and for real variables. Because of these similarities, the code for SREALOPS is not shown separately. It can be generated by making the substitutions indicated in Table II for the italicized portions of SINTOPS code.

There are two types defined with prefix SREALOPS and two with prefix SINTOPS. The classes STATINTHIST (Figure 5) and STATINT (Figure 6) are prefixed by SINTOPS, and STATREALHIST and STATREAL are prefixed by SREALOPS.
(Again, the code for STATREAL and STATREALHIST can be obtained by making the substitutions indicated in Table II for the italicized code in STATINT and STATINTHIST, respectively.) The division in this case is between variables for which histograms are to be collected and reported and those for which only statistical summaries are required. This division is motivated by the final requirement discussed in Section 3: that the implementation of the type require as few resources as possible. Since the storage needed for the data collected to generate a histogram or trace considerably exceeds that required for generating a simple statistical summary, different types are defined for the two

    ref(head) statpool;               !list head used for connecting all statistics variables;
    ref(printfile) outp;              !output file for reports;
    ref(outfile) err;                 !output file for error messages;
    integer hevent, htime, htrace;    !global constants for histogram types;

    statpool :- new head;             !generate list head object;
    outp :- sysout;                   !use system output file for standard output;
    err :- sysout;                    !same file used for error messages;
    hevent := 1;                      !arbitrary constant for event-weighted histogram;
    htime := 2;                       !constant for time-weighted histogram;
    htrace := 3;                      !constant for time-trace;

Fig. 8. Declarations and initializations required in a scope enclosing CLASS STAT (required once per simulation).

cases. The update operations are provided at this level since the definition of which statistics to save depends both on the type of the variable being recorded and on whether or not a histogram is to be generated. The built-in SIMULA function HISTO is used to record data for histograms. STATINTHIST and STATREALHIST contain special operations for generating histograms; in both cases, the operation OUTHIST simply calls a global histogram printing routine, PHIST (Figure 7), to do the actual output. This routine was made global to avoid duplicating the code for it within the STATINTHIST and STATREALHIST declarations.

The code implementing the abstract type assumes the existence of two output files and one list head at a level of nesting that encloses the class STAT. The files are named OUTP and ERR; the list head is named STATPOOL. STATPOOL is used internally to link all of the instances of statistics variables together. OUTP is the standard device for normal output (such as the reports and histograms), while error messages are written on the device ERR. These may be assigned to the same system device (as in the example, Figure 8) or they may be assigned separately. A new instance of the built-in SIMULA object "head" must be assigned to STATPOOL. Finally, integer constants "hevent," "htime," and "htrace" are required; these are used to distinguish the possible histogram types. Figure 8 displays all of the code the abstract type requires beyond its own routines.

With respect to the traces specification, the UPDATE and REPORT operations in STAT correspond to the like-named O- and V-functions. The OUTHIST operation in the implementation corresponds to the HISTOGRAM V-function, and the SIMULA object generation operator, NEW, corresponds to the O-function NEW.
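
As a rough illustration of the enclosing-scope requirements described above (a list head linking every statistics variable, plus separate handles for normal and error output), the following Python sketch may help. It is an analogue under stated assumptions, not the SIMULA implementation: `StatVar`, `full_report`, and `full_report_text` are hypothetical names, a plain Python list stands in for the SIMULA "head" object, and the report format is invented for the example.

```python
import io
import sys

# Stand-ins for the declarations of Figure 8 (names mirror the paper's).
statpool = []                     # plays the role of the list head STATPOOL
outp = sys.stdout                 # normal output: reports and histograms
err = sys.stdout                  # error messages; same device, as in Figure 8
HEVENT, HTIME, HTRACE = 1, 2, 3   # constants distinguishing histogram types


class StatVar:
    """Hypothetical statistics variable that links itself into the pool."""

    def __init__(self, vname, vdesc):
        self.vname = vname
        self.vdesc = vdesc
        self.val = None
        statpool.append(self)     # every new instance joins the shared pool

    def report(self, out=None):
        # Write to the shared output handle unless another one is supplied.
        out = outp if out is None else out
        out.write(f"{self.vname} ({self.vdesc}): last value = {self.val}\n")


def full_report(out=None):
    """Report on every variable in the pool, whichever one the caller holds."""
    for var in statpool:
        var.report(out)


def full_report_text():
    """Convenience wrapper: collect the full report in a string."""
    buffer = io.StringIO()
    full_report(buffer)
    return buffer.getvalue()
```

Because each variable registers itself on creation, a whole-simulation report needs no list maintained by the user, which is one plausible reason for keeping the pool in an enclosing scope.
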
The specified V-functions that return statistical values are implemented indirectly: the REPORT function computes these values when they are needed for output, based on information saved by the UPDATE operation. If a statistical function is undefined (e.g., N = 0 in the specification), REPORT simply prints a message that the value is undefined or that there were no observations. This sidesteps the problem of what the statistical V-functions should return when the requested value is undefined. The V-function VAL is available at the user interface, since the last value recorded by an UPDATE can be obtained from


outside of the class STAT by simply referencing it using the standard SIMULA dot notation (see the example below). Restrictions on legal traces concerning the time order of updates are implemented in each of the provided UPDATE operations: Instead of refusing to return, the operation causes an error message to be printed and the update is ignored. The restriction on multiple NEW operations within a trace is enforced in the sense that if the SIMULA "new" operator is invoked, a new trace begins and the previous trace is effectively terminated.
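
The error-handling behavior just described (an out-of-order update is reported and ignored, and a report on an undefined statistic prints a message instead of a value) might look roughly like the following in Python. This is a sketch under assumptions: the class name `CheckedStat`, the message wording, and the rule that an update is rejected only when its time is strictly earlier than the previous one are all hypothetical, and an event-based mean stands in for the full set of statistics.

```python
import io

# In-memory stand-ins for the output files OUTP and ERR.
outp = io.StringIO()
err = io.StringIO()


class CheckedStat:
    """Hypothetical variable that ignores updates arriving out of time order."""

    def __init__(self, vname):
        self.vname = vname
        self.val = None        # last accepted value, readable as s.val
        self.n = 0             # number of accepted observations
        self.total = 0         # sum of accepted values
        self._last_time = None

    def update(self, value, time):
        # Assumed rule: time may not move backwards between updates.
        if self._last_time is not None and time < self._last_time:
            err.write(
                f"{self.vname}: update at time {time} precedes "
                f"{self._last_time}; ignored\n"
            )
            return             # the offending observation is dropped
        self.val = value
        self.n += 1
        self.total += value
        self._last_time = time

    def report(self):
        # With no observations the mean is undefined, so say so.
        if self.n == 0:
            outp.write(f"{self.vname}: no observations\n")
        else:
            outp.write(f"{self.vname}: n = {self.n}, mean = {self.total / self.n}\n")
```

Dropping the bad observation keeps a long simulation running while still leaving a record on the error device; raising an exception would be the more common Python idiom but would not match the behavior the paper describes.
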

6. USAGE

To employ these types in a simulation is straightforward. For each variable about which statistics are to be collected, a statistics collection variable must be declared and allocated (see Figure 9a). At each place the value of the variable is to be recorded, the update operation must be invoked (Figure 9b). At the end of the

    ref(statint) nmsgbacklog;
        !declaration for statistics collection variable;
    nmsgbacklog :- new statint("nmsgbacklog", "Number of messages not yet transmitted");
        !allocation and initialization of same variable;
(a)

    (code to generate a new message and queue it)
    nmsgbacklog.update(nmsgbacklog.val + 1, time);
        !record the incremented value as observed at the current time;
(b)

Fig. 9. Declaration, initialization, and use of type for collection of simple summary statistics. (a) Declaration, allocation, and initialization of a STATINT variable. (b) Example of recording an observation. (Note that in this case the statistics variable itself is used to record the value of the backlog.)

    ref(statinthist) nmsgbacklog;
        !declaration for integer statistics variable with histogram generation;
    nmsgbacklog :- new statinthist("nmsgbacklog", "Number of messages not yet transmitted", 20, htime, 0, 1);
        !allocation and initialization for a histogram with 20 bins, using time-weighted observations (htime is a global integer variable with value 2), lower bound of the first bin is 0, and the increment for each bin is 1.;
(a)

    nmsgbacklog.update(nmsgbacklog.val + 1, time);
        !record observation;
(b)

Fig. 10. Example of declaration, allocation, initialization, and use of a variable to collect both a statistical summary and a time-weighted histogram. (a) Declaration, allocation, and initialization of a variable to collect both statistical summaries and a histogram. (b) Example observation of variable that collects both summary information and a histogram. (See Figure 9b.)
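
Figures 9 and 10 differ only in the declaration and allocation; the update call is identical. A Python analogue of that property is sketched below. It rests on several assumptions that are not in the paper: the class names are hypothetical renderings of SINTOPS, STATINT, and STATINTHIST, the recorded value is taken to start at 0, out-of-range values are clamped into the end bins, and the histogram counts events only, so the time weighting selected by htime in Figure 10 is not modeled.

```python
HEVENT, HTIME, HTRACE = 1, 2, 3   # histogram-type constants, as in Figure 8


class SIntOps:
    """Hypothetical common base holding what both variable kinds share."""

    def __init__(self, vname, vdesc):
        self.vname = vname
        self.vdesc = vdesc
        self.val = 0          # assumed initial value of the recorded variable
        self.n = 0            # number of observations

    def _record(self, value):
        self.val = value
        self.n += 1


class StatInt(SIntOps):
    """Summary statistics only."""

    def update(self, value, time):
        self._record(value)


class StatIntHist(SIntOps):
    """Summary statistics plus a histogram (event counts only in this sketch)."""

    def __init__(self, vname, vdesc, nbins, htype, lowerbd, inc):
        super().__init__(vname, vdesc)
        self.nbins, self.htype = nbins, htype
        self.lowerbd, self.inc = lowerbd, inc
        self.counts = [0] * nbins

    def update(self, value, time):
        self._record(value)
        index = int((value - self.lowerbd) // self.inc)
        index = min(max(index, 0), self.nbins - 1)   # clamp (assumption)
        self.counts[index] += 1


def enqueue_message(backlog, now):
    """The recording site: identical for either kind of variable."""
    backlog.update(backlog.val + 1, now)
```

Switching a variable between the two kinds changes only the line that constructs it, for example `StatInt("nmsgbacklog", "...")` versus `StatIntHist("nmsgbacklog", "...", 20, HTIME, 0, 1)`; `enqueue_message` is untouched.
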


run, the report operation can be used to generate the statistical summary, and, if a STATINTHIST or STATREALHIST variable was created, the OUTHIST operation will print a histogram when it is called. The FULLREPORT operation produces summaries and histograms for all of the statistics variables in the simulation. It generates the same set of reports regardless of the particular statistics variable that is used to invoke it.

Occasionally, a user may wish to get a histogram for a variable that was not previously recorded in this fashion. The only changes required to accomplish this are to alter the declaration slightly and to expand the initialization statement to include the histogram-dependent information (bin size, number of bins, lower bound, etc.; see Figure 10). No changes in the update instructions or printing routines are required. Conversely, to save the storage occupied by a histogram no longer of interest, the user merely replaces the histogram declaration and initialization statements with their simpler counterparts.

7. DISCUSSION

The abstract type just presented has been used essentially without change in three versions of one simulator and in three separately developed simulators. It provides a useful facility in its present form. Experience indicates that the separation of histogram generation and simple statistical summaries is valid: if storage becomes tight in the simulation, a good deal can be saved by altering variables for which histograms had been generated to collect statistical summaries only.

There are, however, some deficiencies in the SIMULA facilities for type specification that are emphasized by this example. The most noticeable of these is the inability to allow the type of a parameter itself to be a parameter in a class definition (i.e., the inability to define generics). If this were possible, there would be no need to separate the SREALOPS and SINTOPS classes; the same code could be used for both and would only need to appear once. The operations presently implemented in SREALOPS and SINTOPS would be placed in STAT and a parameter would be added to STAT to specify the type (integer or real) of the variable about which statistics would be collected. In fact, the STATREAL and STATINT classes could also be merged in such an environment, since, again, the only difference in the code between the two classes is in the types of the variables referenced. The difference between statistics collection variables that allow reporting of statistical summaries and those that allow histogram generation appears to be the only substantive one.

A tree that outlines a revised type structure based on preserving only this distinction is shown in Figure 11. Notice that this revision combines STATINTHIST and STATREALHIST. This combination implies that PHIST no longer need be a globally defined subroutine; it can be included in its natural place as an operation on variables of type STATHIST without requiring two copies of the same code. STAT now includes an argument (vtype) to distinguish whether statistics are to be recorded for a real or an integer variable. This parameterization could allow all of the operations listed to be coded only once and would eliminate the level introduced by SREALOPS and SINTOPS. STATSUM would define a single version of the update operation for variables requiring only statistical summaries,

    STATSUM                  STATHIST(nbins, htype, lowerbd, inc)
      UPDATE                   UPDATE
                               OUTHIST
            \                 /
             \               /
          STAT(vname, vdesc, vtype)
            REPORT
            TMEAN*
            TVAR*
            EMEAN*
            EVAR*

Fig. 11. Revised type structure: Asterisk (*) denotes access functions not intended to be called directly by user programs; arrows point to the enlarged type from the type on which it is based.
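
The revised structure of Figure 11 can be illustrated in a language where the value type is a parameter. The Python sketch below is only an analogue under assumptions: Python is dynamically typed, so `typing.Generic` serves here as documentation rather than as an enforced generic mechanism; the names `Stat`, `StatSum`, `StatHist`, and `_record` are hypothetical; only minimum, maximum, and count are kept; and the histogram counts events and clamps out-of-range values, neither of which is stated in the paper.

```python
from typing import Generic, TypeVar

T = TypeVar("T", int, float)   # the two value types the paper distinguishes


class Stat(Generic[T]):
    """Shared storage and reporting, written once for both value types."""

    def __init__(self, vname, vdesc, vtype):
        self.vname = vname
        self.vdesc = vdesc
        self.vtype = vtype            # int or float; plays the role of vtype
        self.n = 0
        self.val = vtype(0)
        self.minimum = vtype(0)
        self.maximum = vtype(0)

    def _record(self, value):
        value = self.vtype(value)     # store in the declared type
        if self.n == 0:
            self.minimum = self.maximum = value
        else:
            self.minimum = min(self.minimum, value)
            self.maximum = max(self.maximum, value)
        self.val = value
        self.n += 1

    def report(self):
        if self.n == 0:
            return f"{self.vname}: no observations"
        return f"{self.vname}: n={self.n} min={self.minimum} max={self.maximum}"


class StatSum(Stat[T]):
    """Statistical summaries only."""

    def update(self, value, time):
        self._record(value)


class StatHist(Stat[T]):
    """Summaries plus a histogram; the printing routine lives in the class."""

    def __init__(self, vname, vdesc, vtype, nbins, htype, lowerbd, inc):
        super().__init__(vname, vdesc, vtype)
        self.nbins, self.htype = nbins, htype
        self.lowerbd, self.inc = lowerbd, inc
        self.counts = [0] * nbins

    def update(self, value, time):
        self._record(value)
        index = int((value - self.lowerbd) // self.inc)
        index = min(max(index, 0), self.nbins - 1)   # clamp (assumption)
        self.counts[index] += 1

    def outhist(self):
        # One line per bin: lower bound of the bin, then its count.
        return "\n".join(
            f"{self.lowerbd + i * self.inc}: {c}" for i, c in enumerate(self.counts)
        )
```

With the value type passed as an argument, the SINTOPS/SREALOPS level disappears and the histogram printer becomes an ordinary operation of `StatHist`, which is the simplification the revised tree is meant to show.
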

and STATHIST would include storage and operations to record (via update) and print (via outhist) histograms as well. In the revised structure, STAT specifies a set of types, with one member for each possible value of vtype. STATSUM and STATHIST are enlarged types based on STAT (instantiated with a given parameter value for vtype). The principal advantage of the revised structure is the elimination of several nearly duplicate sections of source code required by the constraints of SIMULA.

This revision would, it appears, be feasible in ADA [1] through use of the overloading and generic capabilities. Both CLU [17] and ALPHARD [22] include mechanisms intended for implementing parameterized types. The actual feasibility of an implementation in any language will depend on such details as whether or not implicit conversions are allowed, whether type-independent I/O for integers and reals is available, and so forth.

Despite the advantages that the newer languages have in the implementation of such a facility, the user interface presented by the statistics collection type in the newer languages would probably not be substantially different from that of the present facility. Nor would the revised version be likely to require less computing time per call (although storage requirements might be slightly decreased). Viewed in this way, and considering the relatively wide availability of SIMULA, the abstract type presented above should be helpful in constructing simulation programs in its present form.

8. SUMMARY

In the preceding sections we have displayed an abstract type for statistics collection in SIMULA. The requirements for the type, a traces specification for it, the programs implementing it, and the use of the type in a SIMULA program have been presented and discussed. The implementation has been described in a consistent nomenclature, and limitations in SIMULA that prevent use of a more desirable implementation strategy have been noted. Despite these limitations, the use of abstract types for software design and implementation in conjunction


with the mechanisms provided by SIMULA has proved to be a useful technique in program construction.

ACKNOWLEDGMENTS

It is a pleasure to acknowledge the assistance several of my colleagues at the Naval Research Laboratory provided in the preparation of this paper. The terminology used above for abstract types is based on an unpublished report by D. Parnas and J. Shore. L. Chmura, D. Weiss, and R. Johnson assisted with the traces specification. J. Shore and D. Parnas provided a thorough review of earlier drafts of the paper, as did D. Weiss, D. Baker, and J. Gannon. The comments of the referees led to numerous improvements in the presentation, including the specification for the abstract type using traces.

REFERENCES

1. ADA REFERENCE MANUAL, PRELIMINARY. SIGPLAN Notices (ACM) 14, 6 (June 1979), part A.
2. ARNBORG, S., BJORNER, O., ENDERIN, L., ENGSTROM, E., KARLSSON, R., OHLIN, M., PALME, J., WENNERSTROM, I., AND WIHLBORG, C. Decsystem 10 SIMULA language handbook, Part II, Dec. 1974. Available as NTIS PB-243 065, National Technical Information Service, Springfield, Va.
3. BARTUSSEK, W., AND PARNAS, D.L. Using traces to write abstract specifications for software modules. TR 77-012, Dep. Computer Science, Univ. North Carolina, Chapel Hill, 1977.
4. BIRTWISTLE, G., DAHL, O.-J., MYRHAUG, B., AND NYGAARD, K. SIMULA begin. Auerbach Publishers, Philadelphia, 1973.
5. BIRTWISTLE, G., AND PALME, J. Decsystem 10 SIMULA language handbook, Part I. Available as NTIS PB-243 064, National Technical Information Service, Springfield, Va., Sept. 1974.
6. FRANTA, W.R. A process view of simulation. Elsevier-North Holland, New York, 1977.
7. GESCHKE, C.M., MORRIS, J.H. JR., AND SATTERTHWAITE, E.H. Early experience with Mesa. Commun. ACM 20, 8 (Aug. 1977), 540-553.
8. GOOD, D.I., COHEN, R.M., HOCH, C.G., HUNTER, L.W., AND HARE, D.F. Report on the language Gypsy. Certifiable Minicomputer Project, Inst. for Computing Science and Computer Applications, Univ. Texas, Austin, Sept. 1978.
9. GRIES, D., AND GEHANI, N. Some ideas on data types in high-level languages. Commun. ACM 20, 6 (June 1977), 414-420.
10. GUTTAG, J. Abstract data types and the development of data structures. Commun. ACM 20, 6 (June 1977), 396-404.
11. GUTTAG, J.V., HOROWITZ, E., AND MUSSER, D.R. Abstract data types and software validation. Commun. ACM 21, 12 (Dec. 1978), 1048-1064.
12. LAMPSON, B.W., HORNING, J.J., LONDON, R.L., MITCHELL, J.G., AND POPEK, G.J. Report on the programming language Euclid. SIGPLAN Notices (ACM) 12, 2 (Feb. 1977), 1-79.
13. LANDWEHR, C.E. Performance studies of the distributed CPODA protocol in the Mobile Access Terminal network. NRL Memo. Rep. 4084, Naval Research Lab., Washington, D.C., Sept. 1979.
14. LANDWEHR, C.E. SIMULA and events. NRL Tech. Memo. 7503-113, Naval Research Lab., Washington, D.C., April 1978.
15. LANDWEHR, C.E. Construction and validation of the Satellite Communications Simulator. NRL Tech. Memo. 5403-259, Naval Research Lab., Washington, D.C., June 1977.
16. LANDWEHR, C.E. On the design of a simulator for satellite communications. NRL Tech. Memo. 5403-85, Naval Research Lab., Washington, D.C., March 1977.
17. LISKOV, B., SNYDER, A., ATKINSON, R., AND SCHAFFERT, C. Abstraction mechanisms in CLU. Commun. ACM 20, 8 (Aug. 1977), 564-576.
18. LISKOV, B., AND ZILLES, S. Programming with abstract data types. SIGPLAN Notices (ACM) 9, 4 (April 1974), 50-59.
19. MELICH, M., LANDWEHR, C.E., AND CREPEAU, P. Alternative satellite channel management strategies. NRL Rep., Naval Research Lab., Washington, D.C., to appear fall 1980.


20. PALME, J. Putting statistics into a SIMULA program. Available as NTIS PB-243 785, National Technical Information Service, Springfield, Va., July 1975.
21. PARNAS, D.L., SHORE, J.E., AND WEISS, D.M. Abstract types defined as classes of variables. NRL Rep. 7998, Naval Research Lab., Washington, D.C., April 1976.
22. WULF, W.A., LONDON, R.L., AND SHAW, M. An introduction to the construction and verification of Alphard programs. IEEE Trans. Softw. Eng. SE-2, 4 (Dec. 1976), 253-265.

Received July 1979; revised March and July 1980; accepted July 1980

ACM Transactions on Programming Languages and Systems, Vol. 2, No. 4, October 1980.