Describing Data Control in Programming Languages

Michael Oudshoorn

Chris Marlin

Department of Computer Science The University of Adelaide G.P.O. Box 498, Adelaide South Australia 5001

Abstract

One aspect of the semantics of a programming language concerns access to the data objects of the program (such as variables); language features relating to this aspect are known as data control features. Examples of such features include scope rules and parameter transmission. This paper describes a multi-pass, multi-layered model of the semantics of the data control aspect of programming languages, illustrating the model by using it to define the data control aspect of the language Pascal. The model is an information structure model in which the information structures are defined in a relatively precise manner using algebraic specification techniques for abstract data types. The use of abstract data types is also the key to the layering of the description: the outermost layer describes the semantics of the language feature, the middle layer contains definitions for the manipulation of the information structures used within the model, and the innermost layer contains precise descriptions of these information structures. The fact that the model is layered allows various potential users of the semantic description (programmers, compiler writers, language designers, and so on) to choose a level of detail appropriate to their needs.

1 Introduction

Programming language definitions come in many different forms, with varying degrees of precision. The language syntax is often given in BNF[1], or a variant, and this technique is widely accepted and understood. No technique

for the definition of the programming language semantics enjoys the same degree of acceptance. On the one hand, programming language semantics has been defined by informal methods, as is the case with Pascal's early definition[2, 3], the definition of Modula-2[4] and that for Oberon[5]; this style of definition is acceptable to programmers wishing to learn a new language, but not so useful to compiler writers, as was illustrated by the number of early Pascal compilers that were inconsistent with regard to the semantics of a number of language constructs[6]. Informal techniques were improved by describing the language in a more detailed fashion; this makes the document more useful for compiler writers, but more difficult for programmers to follow. This is seen in the current language definitions for Pascal[7] and Ada[8]. On the other hand, formal semantic definitions are also employed. The language standard for Modula-2[9, 10, 11] currently being developed by the British Standards Institute will utilize the Vienna Development Method (VDM)[12]. This will result in a document that precisely describes the semantics of Modula-2, and with its accompanying natural language description will provide a definition that is usable by the general community. Such an approach may lead to inconsistencies between the formal definition and the natural language description. Another difficulty with this approach is that VDM has not itself been standardized, and was chosen in preference to an axiomatic approach because there was doubt that an axiomatic approach could handle a description of this size. The aim of this paper is to illustrate that an axiomatic approach is not only a viable technique, but, when used in the manner described in this paper, produces a language definition which is clear, relatively easy to read and understand, and useful to diverse kinds of readers.
Such a language definition provides a detailed, unambiguous statement of the language semantics, required by compiler writers and the like, whilst also providing a readable definition for language users who have no need to know all the finer details. One aspect of the semantics of a programming language concerns access to the data objects of the program (such as variables); following Pratt[13], we classify features relating to this aspect of programming languages as data control features. Examples of such features include scope rules and parameter transmission mechanisms. This paper describes a multi-layer model of the semantics of the data control aspect of programming languages, as an example of the axiomatic approach mentioned above. The inadequacies of informal definitions of the data control aspect of

[Footnote: Ada is a registered trademark of the United States Government (Ada Joint Program Office).]

    program test1;
    type
      T = boolean;
      X = boolean;
    procedure P1(A: T; T: X);
    begin { P1 }
    end; { P1 }
    begin { test1 }
    end. { test1 }

    (a)

    program test2;
    type
      T = boolean;
      X = boolean;
    procedure P2(T: X; A: T);
    begin { P2 }
    end; { P2 }
    begin { test2 }
    end. { test2 }

    (b)

Figure 1. Two erroneous Pascal programs.

programming languages can be demonstrated by the following illustration. As part of an investigation of the correctness of commercial Pascal compilers with respect to data control matters, a collection of twenty small (approximately ten-line) programs was written and tested on several different compilers (many of which were validated). None of the tested compilers correctly found all erroneous programs in the collection and successfully compiled those programs which were correct. As an example, only one compiler in ten detected the error in both of the programs shown in Figure 1: in both programs, the identifier "T" has conflicting meanings at the inner level (and the outer meaning should not be used because of the definition of "T" at the inner level). The model described in this paper is an information structure model[14], in which the information structures are defined in a relatively precise manner using algebraic specification techniques for abstract data types. The use of abstract data types is also the key to the layering of the description: the outermost layer describes the semantics of the language feature, the middle layer contains definitions relating to the manipulation of the information structures used within the model, and the innermost layer contains precise descriptions of these information structures. The fact that the model is layered allows various potential users of the semantic description (programmers, compiler writers, language designers, and so on) to choose a level of detail appropriate to their needs. This paper presents the model of data control and illustrates its application by using it to describe the data control features of Pascal[7]. Because the resulting description of the data control aspect of this programming language is unified, but still caters for various needs, a number of the pitfalls of current language description techniques are avoided.

Typical of the techniques in current use is the situation with regard to Ada: the language standard[8] is a technical, jargon-ridden document written in natural language, and those attempts to capture the semantics in more formal models (such as [15]) can only be regarded as explications of the standard. Although natural language specifications of programming language semantics have improved somewhat in recent years, ambiguities still arise in such specifications. On the other hand, formal specifications of programming language semantics are considerably more precise, but less useful to the largest group of potential users of the specification (i.e., programmers). There are various possible solutions to this dilemma between wanting precision in a specification of a language's semantics and wanting the specification to be accessible to a wide variety of users. The attitude of some language designers appears to be that all classes of user should be thoroughly versed in formal descriptive techniques such as denotational semantics and attribute grammars; this approach appears to be somewhat unrealistic, to say the least. Another alternative is to improve the precision of natural language specifications. The Pascal standard[7] perhaps represents the current extreme of this approach and is quite difficult to read, principally because it is necessary to give precise, technical meanings to a large number of terms which are then used elsewhere in the specification. Reading the Pascal standard is a matter of looking up such a meaning every few words; although it is true that the answers to definitional questions (such as those relating to the programs in Figure 1) are usually to be found somewhere in the standard, it is also true that finding them is not easy.
Another solution adopted by some standards committees (and which the Modula-2 standardization process[16, 17] appears to be heading towards) is to endorse several separate descriptions, say one informal, one formal and one implementation-related, and then designate one (or more) as "the standard". The difficulty with this approach is, of course, the question of consistency between the various descriptions, which is impossible to guarantee. By developing a model that consists of several layers, it is possible to have a single description that caters for the vastly different needs of various groups. Language designers and compiler writers can find the precise definitions that they require, whilst programmers and language buffs can simply read to the level most convenient to them. The model can also be used as the basis of a tool to automatically generate compiler components from a language definition. Because the language definition is a simple, integrated specification, these compiler components are generated from the same language definition as that used by the programmer; thus, the usual problems of inconsistency between a language's documentation and its implementation are avoided.

2 Embedding semantics within a syntactic description

As already discussed, the syntax of a language is readily and accurately described in some form of BNF. As the latter is widely understood, the model described in this paper mixes semantic definitions with a syntactic definition expressed using BNF to provide a comprehensive language definition. Using the BNF syntax presented in [7], it is possible to embed a description of the effect of scope rules into a syntactic definition of Pascal; a fragment of such a mixed syntactic/semantic description is shown in Figure 2. In this figure, the invocation of a semantic routine is shown in bold face, and delimited by the symbols "%%". In the model, a semantic routine performs some manipulations on the relevant information structures. In the case of the semantic routine involved in Figure 2, it is intended to inherit all the known entities from the nearest textually enclosing block that have not been redefined within the current block. One difficulty with the approach used in Figure 2 is that it may not be possible to specify the semantics of language features using this approach if one-pass analysis of the source text is assumed. An illustration of this difficulty in the context of Pascal is given in Figure 3. The reference to "P2" at line 6 (in the statement marked "{#}") should be regarded as invalid: the meaning of "P2" at the level of procedure "P"'s block is supplied by the declaration at line 8, and that declaration occurs after the first use. The correct semantics of Pascal's scope rules can only be captured by changing the description so that two passes are identified. The modified syntactic/semantic description is given in Figure 4, which shows the changes which must be made to the description given earlier in Figure 2. The two semantic routines in Figure 4 have different purposes. The semantic routine "Scope-Rules-Pass-1" is used to inherit all the known names at that point from the nearest textually enclosing block.
It simply takes those names that were declared local to the parent block and inherits them as nonlocal entities, provided that the name has not been redefined within the current block. Hence, in the first pass, names are only inherited from the nearest textually enclosing block. In the example in Figure 3, this would mean that at line 5, the name "P2" would not be inherited as it is not yet

[Figure 2. Enhancing syntax with semantics. A BNF fragment of the Pascal block syntax in which the final production ends with the semantic routine invocation %% scope-rules %%.]

     1  program EXAMPLE;
     2  var P2: integer;
     3  procedure P;
     4    procedure P1;
     5    begin {P1}
     6      P2 := 2 {#}
     7    end; {P1}
     8    procedure P2;
     9    begin {P2}
    10    end; {P2}
    11  begin {P}
    12  end; {P}
    13  begin {EXAMPLE}
    14  end. {EXAMPLE}

Figure 3. A Pascal example.

[Figure 4. Improved semantic description. The production of Figure 2 now carries two semantic routine invocations: %% Pass 1: Scope-Rules-Pass-1 %% and %% Pass 2: Scope-Rules-Pass-2 %%.]

declared in the nearest textually enclosing block ("P") and names are not inherited from other blocks in the first pass. However, by the time that line 11 is reached in the first pass, procedure "P2" will be known to be local to procedure "P". The second pass makes use of information structures built during the first pass and propagates nonlocal declarations into inner blocks. Consequently, it is the responsibility of the semantic routine "Scope-Rules-Pass-2" to inherit into each block all accessible names that were not declared local to the nearest textually enclosing block. There are many ways in which semantic issues such as scope rules can be handled correctly; using a multi-pass technique, such as that described above, is just one. Its use here highlights the advantages of a multi-pass model, as it makes descriptions easier to write and illustrates the semantics in a clear fashion, so that all users and implementors of the language are aware of the complexities and the nature of that aspect of the language semantics.
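The two passes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model's own notation: the `Block` class, the `"local"`/`"nonlocal"` tags and the functions `pass1` and `pass2` are hypothetical names, predefined identifiers are ignored, and a block is reduced to an ordered list of declared names and nested procedures.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for a block: ordered declarations plus a symbol table."""
    name: str
    items: list                                   # str = declared name, Block = nested procedure
    table: dict = field(default_factory=dict)     # identifier -> "local" | "nonlocal"


def pass1(block, parent=None):
    """First pass: declarations in textual order, then inherit the parent's locals."""
    for item in block.items:
        if isinstance(item, Block):
            block.table[item.name] = "local"      # a procedure name is local to its enclosing block
            pass1(item, block)                    # the nested block sees only what is declared so far
        else:
            block.table[item] = "local"
    if parent is not None:
        for ident, status in list(parent.table.items()):
            if status == "local" and ident not in block.table:
                block.table[ident] = "nonlocal"


def pass2(block, parent=None):
    """Second pass: propagate the parent's nonlocal names into inner blocks."""
    if parent is not None:
        for ident, status in list(parent.table.items()):
            if status == "nonlocal" and ident not in block.table:
                block.table[ident] = "nonlocal"
    for item in block.items:
        if isinstance(item, Block):
            pass2(item, block)


# The program of Figure 3: variable P2 in EXAMPLE, procedures P1 and P2 inside P.
p1 = Block("P1", [])
p2 = Block("P2", [])
p = Block("P", [p1, p2])
example = Block("EXAMPLE", ["P2", p])

pass1(example)
pass2(example)
```

Under these assumptions, "P2" never enters the table of "P1": it was not yet local to "P" when "P1" was analysed in the first pass, and in the second pass it is not a nonlocal name of "P", so the reference at line 6 has no meaning.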

3 ADT definitions: the formal foundation

The first (innermost) layer in the model consists of abstract data type (ADT) specifications. This layer provides the necessary precise foundations for the model. The ADT's are specified using an algebraic technique similar to that adopted by Guttag et al.[18, 19], but making use of the initial algebra approach advocated by the ADJ group[20, 21]. As our use of ADT's is not new, the reader is referred to the appendix for an example. Proofs of consistency and sufficient-completeness[20, 22, 18, 23] have been constructed for the ADT specifications used, and so these specifications can be regarded as a solid foundation on which to build the remainder of the model. To avoid unnecessary complexity in the ADT specifications, error conditions have been handled in a simple manner in this paper. A more precise and correct method of handling errors can be easily incorporated into the ADT specifications; the interested reader is referred to [24, 25] for examples of how this can be done. This paper will only consider the case of sequential languages, although the model can be expanded to cater for concurrent languages by employing shared data abstractions (SDA's) as well as ADT's. SDA's[26, 27] have been used to describe the communication aspects of Ada in [28].
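To give a flavour of the algebraic style, the following Python sketch treats a table as a term built from two constructors and defines its observers by one equation per constructor. It is an assumption-laden illustration rather than the specification in the appendix: the classes `NewTable` and `Insert` and the observer `lookup` are hypothetical, and only the operation name `member_of_table` is taken from the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewTable:
    """Constructor term for the empty table."""


@dataclass(frozen=True)
class Insert:
    """Constructor term insert_into_table(table, key, value)."""
    table: object
    key: str
    value: object


def member_of_table(t, key):
    # Axiom 1: member_of_table(new_table, k) = false
    if isinstance(t, NewTable):
        return False
    # Axiom 2: member_of_table(insert(t, k1, v), k) =
    #            if k1 = k then true else member_of_table(t, k)
    return t.key == key or member_of_table(t.table, key)


def lookup(t, key):
    # Error conditions are handled in a deliberately simple manner.
    if isinstance(t, NewTable):
        raise KeyError(key)
    # The most recent insertion for a key wins.
    return t.value if t.key == key else lookup(t.table, key)


symbols = Insert(Insert(NewTable(), "i", "integer"), "j", "boolean")
```

Because every observer has exactly one equation for each constructor, each application reduces to a value or to the error case, which is the intuition behind sufficient-completeness.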

4 The information structures and operations on them

As mentioned earlier, the model presented in this paper is an information structure model. In such models, the semantics of language features are specified in terms of transformations on information structures representing (aspects of) the state of a program in the language. The transformations describing the semantics of data control features occur within the third and final layer of our model. The role of the second layer is to specify the nature of the information structure and to provide some high-level primitives in terms of which transformations can be formulated; both the information structure and the high-level primitives are defined using the ADT's specified in the first layer. Before discussing the information structure required for the description of data control in Pascal, there is another important data structure which must be introduced. This data structure, known as the static environment, represents the static aspects of a Pascal program, recording the names of identifiers and associated attributes for each block; this data structure corresponds to what is normally called the symbol table in a compiler. Even though Pascal does not allow the user to overload identifiers, the language itself does precisely this in the case of functions. For each function defined in a Pascal program, there is a corresponding function pseudo-variable with the same name. As a result of this inconsistency, it is necessary to represent the symbol table as a table, indexed by identifier name, in which each element is a secondary table. This secondary table is indexed by the kind of the object (e.g., variable, function, function pseudo-variable, etc.) and stores the attributes associated with the object that has that name and that kind. This complicates the discussion of Pascal considerably, but highlights a conceptual difficulty encountered by someone learning and using the language Pascal.
The table representing the symbol table is itself a member of a structure called symbol_table_info; there is one of these data structures for each block in the program and, apart from the symbol table, it also records the name of the block and other information. As explained in the previous section, it is necessary for the model presented here to operate in a multi-pass fashion in order to be able to correctly capture the semantics of data control in Pascal. Thus, it is necessary to build a symbol_table_info object as each block is encountered and store the results somehow, so that the information is available in subsequent passes. As illustrated in Figure 5, the symbol_table_info objects are linked together into a tree structure representing the textual

    program A;
    procedure B;
      procedure C;
      begin {C}
      end; {C}
      procedure D;
      begin {D}
      end; {D}
    begin {B}
    end; {B}
    procedure E;
    begin {E}
    end; {E}
    begin {A}
    end. {A}

    A
    |
    B -- E
    |
    C -- D

Figure 5. A Pascal program and its static environment.

structure of the Pascal program. Because of the nature of scoping in Pascal (which is oriented towards one-pass compilation), the tree representation used emphasises the ordering of siblings at a particular block level. It is this tree structure which is known as the static environment, and each node is said to be of type static_information. The information structure used in the description of the semantics of data control in Pascal represents the data control aspects of an executing Pascal program and is called the dynamic environment. It is much simpler than the data structure used in the static environment, as it consists simply of a list of instances. Each instance records information necessary to identify the block to which it corresponds and a symbol table object. When an instance for any block is created as a result of the semantics of a language feature, the contents of the symbol table are initially an exact copy of the contents of the symbol table information stored in the static environment for that block. As other language features are encountered, the contents of the symbol table in the instance may change. In Pascal, some of the names accessible to a block instance do not correspond to objects introduced by that instance, but rather they stand for objects belonging to other instances. An example of this is the way in which formal variable parameters stand for the actual parameter variables with

which they are associated; another example occurs when access to an object is inherited via the scope rules. Such names are described by non-defining objects in the model. On the other hand, names standing for objects introduced by the particular instance concerned are described in the model by defining objects. All non-defining objects are linked to some defining object throughout the former's lifetime. Some of these links are established in the static environment, as it is known then which objects they stand for; this is the case for objects that are inherited into a block via scope rules. For other non-defining objects, such as those for variable parameters, the links must be established in the dynamic environment. Various types of links are used, depending on the kind of access implied by the relationship between the non-defining object and the defining object. Two access rights to objects in a Pascal program can be distinguished: read (R) and write (W). The four subsets of these rights, namely RW = {R, W}, RO = {R}, WO = {W} and NA = {}, are all useful in describing the kind of access which applies to a particular object in a particular block, and in describing the kind of access permitted by a link. Figure 6(b) shows the dynamic environment during the execution of the program in Figure 6(a), at the point corresponding to the line marked "{#}". Each of the instances contains a symbol table listing all entities known to each block (predefined names being omitted from the figure for clarity). Marked symbol table entries represent defining objects. Note that links emanate from all non-defining entries and lead to a defining entry in each case. The links depicted in Figure 6(b) show that, in this case, RW access is propagated for variables and RO access is propagated for procedures. As mentioned earlier, the second layer of the data control model includes some high-level operations (HLO's), which are used in the specification of the transformations in the third layer.
The HLO's hide much of the detailed manipulation of the ADT's introduced in the first layer from the view of a reader of the third layer; the result is much shorter semantic descriptions of the data control aspect of Pascal than would otherwise have been attained, but without any loss of precision overall. By providing some natural language narrative with each HLO, the majority of users of the specification will not need to examine the ADT specifications in detail. The formal model should constitute the definition of the programming language under consideration, and the natural language commentary should be regarded as an aid, nothing more.

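    program A;
    var
      i, j: integer;
    procedure B;
    var
      i: integer;
      procedure C;
      var
        k: integer;
      begin {C}
        {#}
      end; {C}
    begin {B}
      C;
    end; {B}
    begin {A}
    end. {A}

    (a)

    [Diagram: one instance for each of A, B and C. Symbol table entries: A holds i, j, B; B holds i, C, j, B; C holds k, C, i, j, B. Non-defining entries are joined to defining entries by RW links (variables) or RO links (procedures).]

    (b)

Figure 6. The dynamic environment for a Pascal program.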
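The relationship between non-defining entries, links and access rights can be illustrated with a short Python sketch. The `Entry` class and the `resolve` function are hypothetical, and the rule that the effective rights are the intersection of the rights met along the chain of links is an assumption made for this illustration; the model itself defines links through its ADT's.

```python
from dataclasses import dataclass
from typing import Optional

# The four subsets of the access rights read (R) and write (W).
RW = frozenset({"R", "W"})
RO = frozenset({"R"})
WO = frozenset({"W"})
NA = frozenset()


@dataclass
class Entry:
    """Hypothetical symbol table entry; an entry without a link is a defining entry."""
    name: str
    access: frozenset
    link: Optional["Entry"] = None
    link_access: frozenset = RW       # kind of access permitted by the link

    @property
    def defining(self):
        return self.link is None


def resolve(entry):
    """Follow links to the defining entry, intersecting the rights on the way (assumed rule)."""
    rights = entry.access
    while entry.link is not None:
        rights = rights & entry.link_access
        entry = entry.link
        rights = rights & entry.access
    return entry, rights


# Part of Figure 6(b): variable i of instance B and procedure B of instance A.
b_i = Entry("i", RW)                                  # defining entry in B
c_i = Entry("i", RW, link=b_i, link_access=RW)        # inherited into C
a_B = Entry("B", RO)                                  # defining entry in A
b_B = Entry("B", RO, link=a_B, link_access=RO)        # inherited into B
c_B = Entry("B", RO, link=b_B, link_access=RO)        # inherited into C
```

In this sketch the name "i" in the instance of C resolves to the variable declared in B with RW access, while the name "B" resolves to the procedure declared in A with RO access only.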

    member_of_symbol_table(a: static_information; s: string) =
      member_of_table(current_block(return_info(a)), s);

Figure 7. The high-level operation member_of_symbol_table.

The routines presented in the second and third layers of the model make use of several control constructs, such as selection and repetitive constructs. Each of these constructs should be defined formally in order to retain the degree of formality so far attained. The if-then-else construct can be defined axiomatically[29]; this then allows a formal definition of the repeat-until, while-do and for constructs in terms of recursion and the if-then-else construct. In order to use a for loop construct effectively over arbitrary ADT's, it is necessary to introduce an operation to return the size of the data structure, and a mechanism whereby each element within the ADT can be accessed by its position within the data structure. This can be clearly seen in the examples presented in Section 5. A data control model for a realistic language, such as Pascal, uses a great many higher level operations. As a result, only a brief indication of their usefulness can be presented in this paper. Section 5 uses the following HLO's in several places and consequently they have been selected as illustrations. The first of the HLO's we will consider is member_of_symbol_table, defined in Figure 7. This HLO takes two parameters: the first ("a") is an object of type static_information, introduced in Section 4, and the second ("s") is a string representing the identifier about which the enquiry is made. The HLO returns a boolean result, reflecting whether the string corresponds to one of the entries in the symbol table. From Figure 7, it can be seen that several ADT operations are used in the definition of the HLO. The operation "return_info" is applied to the parameter "a" to yield an object of type symbol_table_info. The operation "current_block" is then applied to this object, giving a table object, namely the symbol table.
Application of the operation "member_of_table", which takes a table and a string (the key in this case) and returns a boolean value, completes the HLO. Even from this simple example, it can be seen that the introduction of HLO's into the description can produce a more readable document than would otherwise be obtained. Another HLO used in the model is add_new_info, given in Figure 8. This HLO is used to add new information into the symbol table. It takes four

    add_new_info(t: static_information; s: string; k: kind; a: attribute_type) =
      if not(member_of_symbol_table(t, s)) then
        define_info(t,
          define_current_block(return_info(t),
            insert_into_table(current_block(return_info(t)), s,
              insert_into_table(new_block, k, a))))
      else
        define_info(t,
          define_current_block(return_info(t),
            alter_table(current_block(return_info(t)), s,
              insert_into_table(
                associated_attributes(current_block(return_info(t)), s), k, a))))
      end if

Figure 8. The high-level operation add_new_info.

parameters: "t" of type static_information, "s" of type string (representing the identifier to be inserted), "k" representing the kind of object to be inserted, and "a" for the attributes associated with an object of this name and kind. Since Pascal overloads function names in the manner described earlier, our model is more complex than it would be for a language such as Modula-2 which uses a return statement. Consequently, the HLO must first ascertain if the identifier "s" is already present in the symbol table. If it is not, then it is inserted with relative ease; however, if it is already known to the symbol table, then the secondary table associated with the identifier in the symbol table must be altered to take the additional information; this is the case for overloaded identifiers.
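The behaviour of these two HLO's can be approximated in Python by a dictionary of dictionaries, indexed first by identifier and then by kind. This is a simplified sketch: a plain `dict` stands in for the static_information and symbol_table_info wrappers, the attribute values are placeholders, and the functions return a new table rather than updating one in place, in keeping with the applicative style of the ADT operations.

```python
def member_of_symbol_table(symtab, s):
    """True if identifier s has an entry (of any kind) in the symbol table."""
    return s in symtab


def add_new_info(symtab, s, k, a):
    """Return a new symbol table in which attributes a are recorded for name s of kind k."""
    if not member_of_symbol_table(symtab, s):
        # Unknown identifier: start a fresh secondary table for it.
        secondary = {k: a}
    else:
        # Known identifier (e.g. a function and its pseudo-variable):
        # extend the secondary table already associated with it.
        secondary = {**symtab[s], k: a}
    return {**symtab, s: secondary}


# A function f and its function pseudo-variable share one name.
empty = {}
with_function = add_new_info(empty, "f", "function", {"type": "integer"})
with_both = add_new_info(with_function, "f", "function pseudo-variable",
                         {"type": "integer"})
```

After the second call the single entry for "f" holds two kinds, which is the overloading case that forces the secondary table.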

5 The semantic descriptions

The third and final layer of the data control model describes the semantics of the relevant features of the language concerned (Pascal, in this case). The data control aspect of Pascal covers local declarations, scope rules, value parameters, variable parameters, and procedure and function names passed as parameters.

5.1 Local declarations

During the first pass of the description of Pascal, the analysis of declarations causes identifiers to be stored in a list until sufficient information is known about them to insert them into the symbol table associated with the current block. For example, by the time the semicolon is reached in "var i, j, k: integer;", it is known that the identifiers being declared are "i", "j" and "k", that they are all variables (their kind) and that they are of type integer. It simply remains to insert the information into the symbol table. Inserting information into the symbol table means that we must know all of the attributes associated with that identifier, or at least as much as is possible at that point. In the case of forward declarations of procedures and functions, the information is recorded in the symbol table and the attribute "forward" is set. Later, when the declaration is completed, the attributes associated with the identifier can be updated. This approach also allows error handling facilities to examine each symbol table at the end of the declarations within the current block and ensure there are no outstanding forward declarations. As the object is declared, its name, kind and type are easily obtained. All names declared as a local declaration are of course "local" to the block in which they are declared. However, depending on the kind of object, it may be either a defining or a non-defining entry. In fact, all objects except variable parameters, and functional and procedural parameters, are declared as defining entries. These forms of parameters are exceptions because the name is not associated with a storage location or the point of definition of the relevant object; they are simply another name for an object declared elsewhere. Objects also have different access rights defined for them, again related to their kind. Functions, procedures, and functional and procedural parameters are defined as having RO access only, whilst variables are defined as having RW access, as their values may be read as well as altered within the block in which they are declared. The semantic routine shown in Figure 9 handles local declarations in Pascal.
It is parameterized with respect to the name of the identifier being declared, its type, kind, definition, access rights and a flag to indicate if the object is being declared forward or not. The routine begins by checking if the identifier, "s", is known to the symbol table for the current block. If not, then an attribute record is defined that records all the necessary information; namely, the identifier's name, type, kind, access rights, whether it was declared local or nonlocal, whether it is a defining or non-defining entry, a link field indicating what other entity it may be linked to (if any), and finally a flag indicating if it is a forward declaration. The current block, "this_block", can then be updated. If the identifier was already present in the symbol table, then it may be a forward declaration that is being completed. This is checked by retrieving the attributes associated with the identifier. Since

    Local-Declaration(s: string; t: type; k: kind; d: definition;
                      a: access_rights; forward: boolean) =
      -- Check to see if this identifier is already known in this block.
      if not(member_of_symbol_table(this_block, s))
      then
        -- As it has not already been declared, it may be added
        -- to the symbol table for this block.
        attributes ← define_attributes(s, t, k, a, local, d, new_link, forward);
        -- update the static data structure.
        add_new_info(this_block, s, k, attributes);
      else
        tab_info ← get_info_via_ident_from_sym_tab(this_block, s);
        if member_of_info_tab(tab_info, k)
        then
          if and(declared_forward(get_attributes(tab_info, k)), not(forward))
          then
            attributes ← define_attributes(s, t, k, a, local, d, new_link, forward);
            if match_attributes(attributes, get_attributes(tab_info, k))
            then
              -- update the static data structure
              alter_info(this_block, s, k, attributes);
            else
              Error("Attributes do not match.");
            end if;
          else
            Error("Error in forward declaration.");
          end if;
        else
          Error("Identifier prev. declared of a different kind.");
        end if;
      end if;

Figure 9. Local declarations.

this situation is only feasible for procedures and functions, and overloading is generally not allowed in Pascal, the table of information associated with the identifier must contain only a single entry. If it was an identifier declared forward earlier, then the associated attributes are updated; otherwise an error message may be issued.
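The decision structure of Figure 9 can be mirrored in a short Python sketch. Several simplifications are assumed here: the symbol table is a mutable dictionary indexed by name and then kind, the definition and link attributes are left out, and `match_attributes` is approximated by comparing the type and the access rights, since its precise definition belongs to the ADT layer. The exception class `DeclarationError` and the function name `local_declaration` are hypothetical.

```python
class DeclarationError(Exception):
    """Hypothetical error type standing in for the routine Error(...)."""


def local_declaration(symtab, s, t, k, access, forward):
    """Record identifier s of type t and kind k in symtab (name -> kind -> attributes)."""
    attributes = {"name": s, "type": t, "kind": k, "access": access,
                  "scope": "local", "forward": forward}
    if s not in symtab:
        # Not yet declared in this block: simply add it.
        symtab[s] = {k: attributes}
        return
    info = symtab[s]
    if k not in info:
        raise DeclarationError("Identifier previously declared of a different kind.")
    previous = info[k]
    # Only the completion of an earlier forward declaration is acceptable.
    if not (previous["forward"] and not forward):
        raise DeclarationError("Error in forward declaration.")
    # Assumed stand-in for match_attributes: type and access rights must agree.
    if previous["type"] != t or previous["access"] != access:
        raise DeclarationError("Attributes do not match.")
    info[k] = attributes


block = {}
local_declaration(block, "i", "integer", "variable", "RW", False)
local_declaration(block, "P", None, "procedure", "RO", True)    # forward declaration
local_declaration(block, "P", None, "procedure", "RO", False)   # completion
```

A second completion of "P", or a declaration of "i" as a procedure, raises `DeclarationError` in this sketch, corresponding to the error branches of the figure.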

5.2 Scope rules

Perhaps the most obvious data control aspect of a language such as Pascal is its scope rules. As explained earlier, a two-pass model is required to describe adequately the semantics of Pascal's particular form of implicit scope rules. Consequently, the definition of the scope rules of Pascal is divided into the two stages depicted in Figures 10 and 11. The first pass inherits all entities that were declared local to the parent block, and the second pass inherits all the nonlocal entities that have been inherited into the parent block.

The transformation Scope-Rules-Pass-1 (Figure 10) begins by locating the parent block in the static environment. If it has a non-empty symbol table, then each identifier in it is considered in turn. If the identifier currently being considered is not a member of the symbol table for the current block, then it may be inherited into the current block's symbol table after it has been modified to represent a nonlocal, non-defining entity as far as the current block is concerned. The link field is also modified so that it points to the object that is being inherited from the parent block. If the identifier under consideration is already known to the current block ("this block"), then it may only be inherited if it represents a function pseudo-variable in the current block, whilst representing a defining entry for a function in the parent block. This is only possible if the tables associated with the identifier in the parent block and in this block each contain only a single entry. If this is so, then the attributes are modified and the link established. Otherwise, no action is taken; the identifier is simply not inherited.

In Scope-Rules-Pass-2, shown in Figure 11, the parent block is located and each identifier in its symbol table is considered in turn. If the identifier is unknown to the current block and is a nonlocal entity in the parent block, then it is inherited. In this case, the attributes do not need to be altered. If the identifier is known to this block as a function pseudo-variable inherited as a result of Scope-Rules-Pass-1, and the identifier represents a nonlocal function in the parent block, then it can also be inherited.
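The two passes described above can be illustrated by a small executable sketch. The following Python fragment is not part of the model; the names Entry, Block, declare, scope_rules_pass_1 and scope_rules_pass_2 are hypothetical, the symbol table is simplified to a dictionary, and the special case for function pseudo-variables is omitted. Following the prose description, the first pass is assumed to select only the entries declared locally in the parent block.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    kind: str                                # e.g. "variable" or "function"
    local: bool = True                       # declared in the block holding this entry?
    defining: bool = True                    # defining entry, or an inherited view?
    link: Optional[Tuple[str, str]] = None   # (block name, identifier) inherited from


@dataclass
class Block:
    name: str
    parent: Optional["Block"] = None
    table: Dict[str, List[Entry]] = field(default_factory=dict)

    def declare(self, ident: str, kind: str) -> None:
        self.table.setdefault(ident, []).append(Entry(kind))


def scope_rules_pass_1(block: Block) -> None:
    """Inherit the entities declared locally in the parent block."""
    if block.parent is None:
        return
    for ident, entries in block.parent.table.items():
        if ident in block.table:
            continue  # already known here (pseudo-variable case omitted)
        for entry in entries:
            if entry.local:
                inherited = Entry(entry.kind, local=False, defining=False,
                                  link=(block.parent.name, ident))
                block.table.setdefault(ident, []).append(inherited)


def scope_rules_pass_2(block: Block) -> None:
    """Inherit the nonlocal entities already inherited into the parent."""
    if block.parent is None:
        return
    for ident, entries in block.parent.table.items():
        if ident in block.table:
            continue
        for entry in entries:
            if not entry.local:
                # Attributes are copied unchanged, so the link still
                # names the block that holds the defining entry.
                block.table.setdefault(ident, []).append(replace(entry))


# Usage: three nested blocks; the middle block redeclares y.
outer = Block("outer")
middle = Block("middle", parent=outer)
inner = Block("inner", parent=middle)
outer.declare("x", "variable")
outer.declare("y", "variable")
middle.declare("y", "variable")

blocks = [outer, middle, inner]          # outermost block first
for b in blocks:
    scope_rules_pass_1(b)
for b in blocks:
    scope_rules_pass_2(b)

print(inner.table["x"][0].link)   # ('outer', 'x')
print(inner.table["y"][0].link)   # ('middle', 'y')
```

In this sketch the second pass must visit the blocks from the outermost inwards, so that a parent block has received its own nonlocal entities before its children inherit them.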

Scope-Rules-Pass-1 =
    -- Find the nearest textually enclosing block; this is called
    -- the parent block.
    locate_parent;
    if not(empty_symbol_table?(parent))
    then
        for i in 1 .. size_of_symbol_table(parent)
        loop
            -- Iterate through the symbol table of the parent block.
            identifier ← get_ident_from_sym_tab(parent, i);
            if not(member_of_symbol_table(this_block, identifier))
            then
                -- If the identifier is unknown to this block, then it
                -- may be inherited.
                tab_info ← get_info_from_sym_tab(parent, i);
                for j in 1 .. size_of_info_table(tab_info)
                loop
                    -- Set up the attributes, etc., properly.
                    attributes ← get_attributes_from_info_table(tab_info, j);
                    kind ← get_kind_from_info_table(tab_info, j);
                    link ← link_to(parent, identifier, kind, RW);
                    attributes ← alter_attributes(attributes, link, nonlocal, non-defining);
                    -- Modify this block to incl. newly inherited object.
                    add_new_info(this_block, identifier, kind, attributes);
                end loop;
            else
                -- The identifier is known to this block; this is
                -- acceptable only under certain conditions.
                tab_info ← get_info_via_ident_from_sym_tab(parent, identifier);
                tab_info_2 ← get_info_via_ident_from_sym_tab(this_block, identifier);
                if and(equal(size_of_info_table(tab_info), 1),
                       equal(size_of_info_table(tab_info_2), 1))
                then
                    if and(and(equal(get_kind_from_info_table(tab_info, 1), function),
                               defining_entry?(get_attributes_from_info_table(tab_info, 1))),
                           and(equal(get_kind_from_info_table(tab_info_2, 1), function_pseudo_variable),
                               defining_entry?(get_attributes_from_info_table(tab_info_2, 1))))
                    then
                        -- Prepare the information to be inherited.
                        attributes ← get_attributes_from_info_table(tab_info, 1);
                        kind ← get_kind_from_info_table(tab_info, 1);
                        link ← link_to(parent, identifier, kind, RO);
                        attributes ← alter_attributes(attributes, link, nonlocal, non-defining);
                        -- Modify this block to incl. newly inherited object.
                        add_new_info(this_block, identifier, kind, attributes);
                    end if;
                end if;
            end if;
        end loop;
    end if;

Figure 10. Scope rules: the first pass.

5.3 Parameters

Pascal supports three distinct kinds of parameter transmission: variable parameters, value parameters, and functional and procedural parameters. The handling of each of these is sufficiently similar that they can be discussed together. The reader is referred to Figures 12, 13 and 14 for the definitions of the relevant semantic routines; these definitions highlight the essential differences between the transmission modes. Each routine takes three parameters: an activation record in which the formal parameters reside, the attributes of the formal parameter being considered at present, and the matching actual parameter. The principal difference between the handling of parameters and the semantics of scope rules and local declarations is that parameters deal with the dynamic or runtime environment, as the formal parameter (which was handled initially by the routine for local declarations) can be linked to several possibly different actual parameters over the lifetime of the program.

Parameters are handled by first identifying the transmission mode of the actual parameter. This determines the course of action. If it is appropriate, the calling activation record is then located, as this contains the actual parameter. The actual parameter is then retrieved and checked to ensure it is of the appropriate kind. If it represents a defining entry and the actual parameter is an RW object in the parent activation record, then the formal parameter is linked to the actual parameter. If the actual parameter in the parent activation record is a non-defining entry, then the link leaving this is followed and the formal parameter is linked to the resulting defining entry.
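The linking step for a variable parameter can be sketched as follows. This Python fragment is an illustration under simplifying assumptions rather than the semantic routine itself: the names Entry, ParameterError, ACCEPTABLE_KINDS and bind_variable_parameter are hypothetical, the calling activation record is reduced to a dictionary from names to entries, and the error messages are paraphrased.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    kind: str                        # "variable", "variable parameter", ...
    access: str = "RW"               # "RW" (read-write) or "RO" (read-only)
    defining: bool = True
    link: Optional["Entry"] = None   # defining entry that a non-defining entry refers to


class ParameterError(Exception):
    """Raised where the semantic routine would issue an error message."""


ACCEPTABLE_KINDS = {"variable", "variable parameter", "value parameter"}


def bind_variable_parameter(formal: Entry, caller: Dict[str, Entry],
                            actual_name: str) -> None:
    """Link a variable formal parameter to the defining entry of its actual."""
    actual = caller.get(actual_name)
    if actual is None:
        raise ParameterError("actual parameter is expected to be an identifier")
    if actual.kind not in ACCEPTABLE_KINDS:
        raise ParameterError("actual parameter is of an inappropriate kind")
    # A non-defining entry is itself a link; follow it to the defining entry.
    target = actual if actual.defining else actual.link
    if target is None or actual.access != "RW" or target.access != "RW":
        raise ParameterError("violates the principle of non-increasing privilege")
    formal.defining = False
    formal.link = target


# Usage: p is bound to total, then p is passed on as the actual for q.
main = {"total": Entry("variable")}
p = Entry("variable parameter")
bind_variable_parameter(p, main, "total")

q = Entry("variable parameter")
bind_variable_parameter(q, {"p": p}, "p")
print(q.link is main["total"])   # True
```

Because the link of a non-defining entry is followed before the formal parameter is bound, q refers directly to the defining entry for total rather than to the intermediate parameter p.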

Scope-Rules-Pass-2 =
    -- Find the nearest textually enclosing block; this is called
    -- the parent block.
    locate_parent;
    if not(empty_symbol_table?(parent))
    then
        for i in 1 .. size_of_symbol_table(parent)
        loop
            -- Iterate through the symbol table of the parent block.
            identifier ← get_ident_from_sym_tab(parent, i);
            tab_info ← get_info_from_sym_tab(parent, i);
            if not(member_of_symbol_table(this_block, identifier))
            then
                -- If the identifier is unknown to this block then it
                -- may be inherited.
                for j in 1 .. size_of_info_table(tab_info)
                loop
                    attributes ← get_attributes_from_info_table(tab_info, j);
                    if not(locally_declared?(attributes))
                    then
                        -- Modify this block to incl. newly inherited object.
                        add_new_info(this_block, identifier,
                                     get_kind_from_info_table(tab_info, j), attributes);
                    end if;
                end loop;
            else
                -- The identifier was known to this block; this is
                -- acceptable only under certain conditions.
                tab_info_2 ← get_info_via_ident_from_sym_tab(this_block, identifier);
                if equal(size_of_info_table(tab_info_2), 1)
                then
                    if and(equal(get_kind_from_info_table(tab_info_2, 1), function_pseudo_variable),
                           not(locally_declared?(get_attributes_from_info_table(tab_info_2, 1))))
                    then
                        if and(member_of_info_table(tab_info, function),
                               not(locally_declared?(get_attributes_from_info_table(tab_info, 1))))
                        then
                            -- Modify this block to incl. newly inherited object.
                            add_new_info(this_block, identifier,
                                         get_kind_from_info_table(tab_info, 1),
                                         get_attributes_from_info_table(tab_info, 1));
                        end if;
                    end if;
                end if;
            end if;
        end loop;
    end if;

Figure 11. Scope rules: the second pass.

6 Conclusions

As indicated in Section 1, many approaches to the definition of programming languages have been tried, with varying degrees of precision. The syntax of a language is often given in extended BNF[30], or a variant, and this technique is widely accepted and understood. No one technique for the definition of programming language semantics enjoys the same degree of acceptance. The result is that there are many alternative techniques available and no one is familiar with all of them.

The most common way of defining the semantics of a programming language is a natural language description. As is well known, this approach suffers from many problems: ambiguities, omitted details, poorly defined areas and the like. In order to reduce the risk and severity of these difficulties, natural language descriptions have become more precise (at the cost of readability), but even this has not solved the problem completely. It took many man-years of effort to produce relatively precise natural language descriptions for the programming languages Ada and Pascal, but neither is a formal definition.

The operational semantic model produced by the multi-layer technique described above is simpler to understand and read than, say, the semantic description of Pascal in terms of attribute grammars given by Kastens et al.[31]. An operational model is also capable of describing both the static and dynamic semantics of a language, whilst attribute grammars tend not to perform so well in a description of the dynamic semantics. At present, BSI are attempting to define the semantics of Modula-2 with VDM[12]; however, VDM is itself currently being standardized and will suffer the problems typical of any programming language standard. Axioms used in an axiomatic approach do not suffer any such problem, as they have a firm mathematical foundation.

Very few semantic models have concentrated on, or even provided adequate descriptions of, data control in programming languages. Exceptions include Smith's Accessing Graph Model[32], Johnston's Contour Model[33] and the earlier, less formal, versions of the model presented in this paper[34, 35, 25]. The application of ADTs to the description of the data structures used in the model led to the development of a layered model that caters for programmers, compiler writers and language designers. Even though each group requires a different depth of understanding of the language, it is now possible to produce one document to satisfy each group, rather than write several documents, each aimed at a different group. It has also been demonstrated that the use of algebraic techniques in the definition of a programming language is a viable proposition with promising results. The algebraic techniques gave the degree of formalism necessary to establish a suitable base from which to build a model.

Variable-Parameters(act_rec: activation_record; attrib: attribute_type;
                    act_param: actual_param_info) =
    if represent_identifier?(act_param)
    then
        parent_activation ← get_parent(act_rec);
        tab_info ← get_info_tab_from_activation(parent_activation, return_name(act_param));
        if equal(size_of_info_tab(tab_info), 1)
        then
            attributes ← get_attributes_from_info_table(tab_info, 1);
            kind ← get_kind_i(tab_info, 1);
            if or(or(equal(kind, variable_parameter), equal(kind, value_parameter)),
                  equal(kind, variable))
            then
                if defining_entry?(attributes)
                then
                    if read_write_accessible(attributes)
                    then
                        link ← link_dynamic(parent_activation, return_name(act_param), kind, RW);
                        attrib ← alter_attributes(attrib, link, local, non-defining);
                        update_activation(act_rec, attrib);
                    else
                        Error("Violates principle of non-incr. priv.");
                    end if;
                else
                    -- Make a link to the defining entry that the actual
                    -- parameter is linked to.
                    if and(read_write_accessible(attributes),
                           read_write_accessible(return_link(attributes)))
                    then
                        link ← return_link(attributes);
                        attrib ← alter_attributes(attrib, link, local, non-defining);
                        update_activation(act_rec, attrib);
                    else
                        Error("Cannot violate principle of non-incr. priv.");
                    end if;
                end if;
            else
                Error("Actual parameter is of an inappropriate kind.");
            end if;
        else
            Error("Variable formal parameter is incompatible with a function or pseudo variable.");
        end if;
    else
        Error("Actual parameter is expected to be an identifier.");
    end if;

Figure 12. Variable parameters

References

[1] Volume 3. [2] [3] Volume 1, number 1. [4]

Value-Parameters(act_rec: activation_record; attrib: attribute_type;
                 act_param: actual_param_info) =
    if represent_identifier?(act_param)
    then
        parent_activation ← get_parent(act_rec);
        tab_info ← get_info_tab_from_activation(parent_activation, return_name(act_param));
        if equal(size_of_info_tab(tab_info), 1)
        then
            attributes ← get_attributes_from_info_table(tab_info, 1);
            kind ← get_kind_i(tab_info, 1);
            if or(or(equal(kind, variable_parameter), equal(kind, value_parameter)),
                  equal(kind, variable))
            then
                if defining_entry?(attributes)
                then
                    if or(read_write_accessible(attributes), read_only_accessible(attributes))
                    then
                        link ← link_dynamic(parent_activation, return_name(act_param), kind, RO);
                        attrib ← alter_attributes(attrib, link, local, defining);
                        update_activation(act_rec, attrib);
                    else
                        Error("Violates principle of non-incr. priv.");
                    end if;
                else
                    -- Make a link to the defining entry that the actual
                    -- parameter is linked to.
                    if and(or(read_write_accessible(attributes),
                              read_only_accessible(attributes)),
                           or(read_write_accessible(return_link(attributes)),
                              read_only_accessible(return_link(attributes))))
                    then
                        link ← return_link(attributes);
                        attrib ← alter_attributes(attrib, link, local, defining);
                        update_activation(act_rec, attrib);
                    else
                        Error("Cannot violate principle of non-incr. priv.");
                    end if;
                end if;
            else
                Error("Actual parameter is of an inappropriate kind.");
            end if;
        else
            Error("Actual parameter may not be a function or a function pseudo-variable.");
        end if;
    else
        -- Actual parameter was a value
        attrib ← store(attrib, value_of(act_param));
        update_activation(act_rec, attrib);
    end if;

Figure 13. Value parameters

[5] [6] Volume 7. [7] [8] [9] Volume 7, number 1. [10] Volume 7, number 1. [11] Volume 7, number 1. [12] [13] [14] [15] [16] [17] Volume 7. [18] Volume SE-6. [19] [20] [21] Volume 24, number 1.

Proc-And-Func-Parameters(act_rec: activation_record; attrib: attribute_type;
                         act_param: actual_param_info) =
    if represent_identifier?(act_param)
    then
        parent_activation ← get_parent(act_rec);
        tab_info ← get_info_tab_from_activation(parent_activation, return_name(act_param));
        if member_of_info_table(tab_info, function)
        then
            kind ← function;
        else
            if member_of_info_table(tab_info, procedure)
            then
                kind ← procedure;
            end if;
        end if;
        attributes ← get_attributes_via_kind_from_info_table(tab_info, kind);
        -- Check that the actual and formal procedural or functional
        -- parameters are compatible.
        if match_param_info(attrib, attributes)
        then
            if defining_entry?(attributes)
            then
                if read_only_accessible(attributes)
                then
                    link ← link_dynamic(parent_activation, return_name(act_param), kind, RO);
                    attrib ← alter_attributes(attrib, link, local, non-defining);
                    update_activation(act_rec, attrib);
                else
                    Error("Violates principle of non-incr. priv.");
                end if;
            else
                -- Make a link to the defining entry that the actual
                -- parameter is linked to.
                if and(read_only_accessible(attributes),
                       read_only_accessible(return_link(attributes)))
                then
                    link ← return_link(attributes);
                    attrib ← alter_attributes(attrib, link, local, non-defining);
                    update_activation(act_rec, attrib);
                else
                    Error("Cannot violate principle of non-incr. priv.");
                end if;
            end if;
        else
            Error("Actual and formal params. are not compatible.");
        end if;
    else
        Error("Actual parameter is expected to be an identifier.");
    end if;

Figure 14. Procedural and functional parameters

[22] [23] [24] [25] Volume 7, number 1. [26] [27] Volume 4, number 4. [28] [29] [30] Volume 20, number 11. [31] [32] [33] [34] [35]

Appendix

An example of an algebraic specification of an abstract data type (ADT) used in the model of data control is given in Figure 15. The specification given in the figure is for a list data type.

ADT LIST [object]
    sorts list / object, boolean
    syntax
        new_list:                        → list
        add_to_list:    list × object    → list
        head_of_list:   list             → object ∪ {ERROR}
        tail_of_list:   list             → list ∪ {ERROR}
        empty_list?:    list             → boolean
    semantics
        declare
            list_1: list
            obj_1, obj_2: object
        axioms
            (1) empty_list?(new_list) = true
            (2) empty_list?(add_to_list(list_1, obj_1)) = false
            (3) head_of_list(new_list) = ERROR
            (4) head_of_list(add_to_list(new_list, obj_1)) = obj_1
            (5) head_of_list(add_to_list(add_to_list(list_1, obj_1), obj_2)) =
                    head_of_list(add_to_list(list_1, obj_1))
            (6) tail_of_list(new_list) = ERROR
            (7) tail_of_list(add_to_list(new_list, obj_1)) = new_list
            (8) tail_of_list(add_to_list(add_to_list(list_1, obj_1), obj_2)) =
                    add_to_list(tail_of_list(add_to_list(list_1, obj_1)), obj_2)

Figure 15. An abstract data type describing a FILO list
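One way to gain confidence in such a specification is to check its axioms against a small executable model. The Python sketch below is one possible model, not an implementation prescribed by the specification: lists are represented as immutable tuples, ERROR is a sentinel object introduced here, and empty_list stands in for the operation empty_list? because Python identifiers cannot contain a question mark.

```python
ERROR = object()   # stand-in for the distinguished ERROR value


def new_list():
    return ()


def add_to_list(lst, obj):
    # Append at the end, so the earliest-added object stays at the front.
    return lst + (obj,)


def head_of_list(lst):
    return lst[0] if lst else ERROR


def tail_of_list(lst):
    return lst[1:] if lst else ERROR


def empty_list(lst):
    return len(lst) == 0


# Spot-check the recursive axioms (5) and (8) on sample values.
list_1 = add_to_list(new_list(), "a")
two = add_to_list(add_to_list(list_1, "b"), "c")
assert head_of_list(two) == head_of_list(add_to_list(list_1, "b"))
assert tail_of_list(two) == add_to_list(tail_of_list(add_to_list(list_1, "b")), "c")
```

A check of this kind exercises the axioms only on the sample values chosen; it does not establish that the model satisfies them for every list.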