Specification and Verification of Compiler Frontend Tasks: Semantic Analysis

Andreas Heberle and Wolf Zimmermann and Gerhard Goos
Universität Karlsruhe, Institut für Programmstrukturen und Datenorganisation
Postfach 6980, Vincenz-Prießnitz-Str. 3, D-76128 Karlsruhe
E-mail: [email protected]

Abstract

We describe a methodology for the specification and verification of the tasks of compiler frontends. The semantics of the language to be compiled is specified with Evolving Algebras. With respect to the semantics of the language we define the static semantics. The semantic analysis checks whether a program satisfies the static semantics. We introduce a notion of correctness of semantic analysis and show how it can be proven. The technique is demonstrated for a subset of real-life imperative programming languages.

Keywords: Compiler Specification, Compiler Verification, Evolving Algebra, Semantic Analysis

1 Introduction

Compiler correctness is essential for correct software development. If the compiler that transforms high-level language programs (source programs) to machine code (target programs) is erroneous, it cannot be assured that the application works correctly. In [GDG+ 96] we described our notion of compiler correctness. We distinguish between the correctness of the specification of the mapping from source to target programs (the compiling verification) and the correctness of the implementation (on a real machine) of the compiling specification (the compiler implementation verification). In this paper we focus on one part of the frontend of a compiler, the semantic analysis, and show how its specification and its compiling verification can be performed. The compiler implementation verification can be performed with standard proof techniques and is not discussed further in this paper. We start from programs represented as abstract syntax trees. Thus we do not consider lexical and syntax analysis.

The semantics of a program is determined by the description of the programming language. For formal reasoning about programs, it is necessary that this description is clean and unambiguous. Operational semantics is adequate for this purpose, because the view of states and state transitions supports the intuitive understanding of compiler designers. Evolving Algebras [Gur91, Gur95] are one formal approach to define semantics operationally. Their applicability for specifying the semantics of existing programming languages is shown by several semantics specifications, e.g. [GH93], [Wal95] or [BDR94]. We use evolving algebra specifications as the starting point for our technique. In contrast to the specifications above, however, we define abstract machines that can handle semantically incorrect programs as well. Though the semantics of programming languages is usually defined mostly dynamically, it is possible to check parts of it statically.
We examine which semantic properties of programming languages can be checked statically and how they can be determined by considering the evolving algebra specification. Then we define a notion of correctness with respect to the abstract machine representing the language semantics and the semantic analysis specification, and show how the correctness proof can be structured. Finally we demonstrate the technique for some typical language constructs of imperative programming languages.

The organization of this paper is as follows. Section 2 introduces Evolving Algebras. Section 3 describes the methodology in more detail. In section 4, we introduce a formal correctness definition for semantic analysis. Section 5 describes an example for the application of the presented technique. In section 6, we draw conclusions and give an overview of future work. Appendix A describes the evolving algebra specification of a simple imperative programming language.

2 Introduction to Evolving Algebras

Evolving Algebras are a method to describe semantics operationally. The concept was first introduced in [Gur91]. An Evolving Algebra (EA) A is an abstract machine. The basic idea is to describe the behavior of a machine by algebras that represent states, and by a set of transition rules. The signature $\Sigma$ of A is a finite collection of function names, each of a fixed arity. A state of A is a static algebra of signature $\Sigma$, with a super-universe $S$ representing the sorts and with interpretations of the function names in $\Sigma$ on $S$. These interpretations are called basic functions. The super-universe does not change when the state of A changes; the basic functions may. The super-universe contains distinct elements true, false and undef ($\bot$) that allow us to deal with binary relations and partial functions. These three elements are logical constants. Their names do not appear in the signature. A universe $U$ is a special type of basic function: a unary relation identified with the set $\{x : U(x)\}$. The universe $\mathit{BOOL} = \{\mathit{true}, \mathit{false}\}$ is another logical constant. When we speak about a function $f$ from a universe $U$ to a universe $V$, we mean that formally $f$ is a unary operation on the super-universe such that $f(a) \in V$ for all $a \in U$ and $f(a) = \bot$ otherwise. A program of A (see footnote 1) is a finite collection of transition rules of the form:

    if Condition then
        Updates
    endif

For example

    if t_0 then
        f(t_1, ..., t_n) := t_{n+1}
    endif

where $t_0, t_1, \dots, t_{n+1}$ are closed terms in the signature of A. The meaning of the above rule is: evaluate all the terms $t_i$ in the current state; if $t_0$ evaluates to true, then change the value of the basic function $f$ at the tuple $(t_1, \dots, t_n)$ to the value of $t_{n+1}$; otherwise do nothing. Evolving from one state to another means that in a given state an interpreter evaluates all relevant terms (interprets these terms) and then makes the necessary updates, i.e. changes the interpretations of function names. If several updates contradict each other (trying to assign different values to the same basic function at the same place), then one update is chosen nondeterministically.

Footnote 1: In the following we call programs of A EA-programs to avoid confusion with programs of a programming language.

In the terminology of Evolving Algebras we distinguish the following classes of functions:

• dynamic functions: the interpretation of a dynamic function is changed by transition rules, i.e. $f$ is called a dynamic function if an assignment of the form $f(t_1, \dots, t_n) := t_{n+1}$ appears anywhere in a transition rule.
• static functions: the interpretation of a static function is never changed by a transition rule.

For our purposes it is appropriate to introduce the notion of static and dynamic terms. Static terms are defined inductively:

1. Each static constant (static in the terminology of evolving algebras) is a static term.
2. Each term $f(t_1, \dots, t_n)$, where $f$ is a static function and $t_1, \dots, t_n$ are static terms, is a static term.

Each term that is not static is called dynamic. We call a static function (static in the sense of EAs) that is applied to dynamic terms semi-static. A semi-static function is never updated while the abstract machine evolves, but its result can change because it depends on dynamic arguments.

Evolving algebras provide several structuring mechanisms. The most important one is external functions, which allow for interaction with the outside world. External functions need not be specified completely; it suffices to state some requirements, i.e. any interpretation of such a function satisfies at least these requirements. External functions are defined outside the evolving algebra. They are syntactically static (they are never changed by a transition rule), but their values are determined by an oracle. Thus an external function may have different values for the same arguments as the algebra evolves. Finally, an evolving algebra specification has an initial state.
In our semantics specification we use the data types list and stack without further definition. For example, $l : \mathit{INT}^*$ defines $l$ as a list of integer values, and $x.l$ inserts the value $x$ at the front of the list $l$. The stack operations are defined as usual, except that we use a function bottom that accesses the element at the bottom of the stack.
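The evaluation scheme described in this section (evaluate all guards and terms in the current state, then apply the collected updates) can be sketched in a few lines of Python. This is only an illustrative sketch, not the formal definition from [Gur91]: the names `State`, `Update`, `Rule` and `step` are assumptions introduced here, and contradicting updates are resolved by letting the last one win rather than by a nondeterministic choice.

```python
from typing import Callable, Dict, List, Tuple

# A state maps locations (function name, argument tuple) to values.
State = Dict[Tuple[str, tuple], object]
# An update is (function name, argument tuple, new value).
Update = Tuple[str, tuple, object]
# A rule is a guard plus a function producing updates (hypothetical encoding).
Rule = Tuple[Callable[[State], bool], Callable[[State], List[Update]]]


def step(state: State, rules: List[Rule]) -> State:
    """Perform one transition: all terms are evaluated in the old state."""
    updates: List[Update] = []
    for guard, make_updates in rules:
        if guard(state):
            updates.extend(make_updates(state))
    new_state = dict(state)
    for name, args, value in updates:
        # Simplification: on contradicting updates the last one wins here,
        # whereas the text above prescribes a nondeterministic choice.
        new_state[(name, args)] = value
    return new_state


# Hypothetical example rule: swap the nullary functions x and y in one step.
swap: List[Rule] = [
    (lambda s: True,
     lambda s: [("x", (), s[("y", ())]), ("y", (), s[("x", ())])]),
]
s1 = step({("x", ()): 1, ("y", ()): 2}, swap)
```

Because both right-hand sides are evaluated before any update is applied, the swap needs no temporary variable; this is the simultaneous-update behavior the rules in this paper rely on.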

3 Static Semantics and the Specification of Semantic Analysis

Semantic analysis determines the static semantic properties of programs and verifies the consistency of these properties. In general, the first task of semantic analysis is name analysis: finding the definition valid at each use of an identifier. Based upon this information, operator identification and type checking determine the operand types and verify that they are allowable for the given operator.

3.1 The Basic Idea

As mentioned in the introduction, our approach starts from an EA-specification of the semantics of an imperative language, and the specified abstract machine can handle erroneous programs as well. That is, if a semantic error occurs during the interpretation of a program, the machine indicates this error by a special error state. Possible error states are, e.g., DeclarationError, TypeError or DivByZeroError. Such a specification reflects both the static and the dynamic semantics of a language. The problem is to determine exactly which properties are static and which are dynamic. Static properties can be checked by the semantic analysis; dynamic properties can only be checked while executing the program.

In an EA-specification the semantic properties are represented by functions. These functions may be dynamic or static in terms of EAs. The basic idea of our approach is that the semantic analysis can precompute some of these functions to detect static semantic errors in programs. To decide which functions can be precomputed, we need to examine the rules of the corresponding EA-specification. We distinguish three cases:

1. Semantic properties that are specified by a dynamic function: they cannot be precomputed. E.g. the content function in A.4 is dynamic, and therefore it is not possible to reason about the value of a variable in general.
2. Semantic properties that are specified by static functions: the interpretation of the function never changes when the abstract machine evolves. Therefore it is possible to reason about such properties statically.
3. Semantic properties that are modeled by semi-static functions: for these functions it is not possible to decide ad hoc whether they can be precomputed or not. But we will see that some semi-static functions can be transformed without changing the semantic properties. The new functions behave statically with respect to the program context and can therefore be precomputed. Remark: here the program context is a static term.

The first two points are clear, but the third needs to be explained in more detail. We want to replace a semi-static function $f(t_1, \dots, t_j)$ by a function $f'(t'_1, \dots, t'_k)$, where $f'$ and $t'_1, \dots, t'_k$ are static and for all program contexts $f(t_1, \dots, t_j) = f'(t'_1, \dots, t'_k)$.
The function $f'$ can then be computed instead of $f$, depending on the program context. In general this means that we specialize the EA-specification for concrete programs by making the program context explicit. We must verify that this transformation does not affect the specified semantics of the language. This is a proof obligation.
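The three cases above can be made concrete with a small classifier over terms. The following Python sketch is an illustration under assumptions: the `Term` encoding, the function names `is_static` and `classify`, and the set `dynamic_fns` (the functions that some rule updates) are hypothetical and not part of the EA-specification.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Term:
    fn: str                          # name of the applied function
    args: Tuple["Term", ...] = ()    # argument terms (empty for constants)


def is_static(term: Term, dynamic_fns: Set[str]) -> bool:
    """Static term: a static function applied to static terms only."""
    return term.fn not in dynamic_fns and all(
        is_static(arg, dynamic_fns) for arg in term.args
    )


def classify(term: Term, dynamic_fns: Set[str]) -> str:
    """Return which of the three cases the outermost function falls into."""
    if term.fn in dynamic_fns:
        return "dynamic"        # case 1: cannot be precomputed
    if is_static(term, dynamic_fns):
        return "static"         # case 2: can be reasoned about statically
    return "semi-static"        # case 3: static function, dynamic arguments


# Assumed set of functions that are updated by some transition rule.
dynamic_fns = {"env", "content", "CurTask"}

type_term = Term("type", (Term("v"), Term("env")))
coercible_term = Term("coercible", (Term("int_type"), Term("float_type")))
content_term = Term("content", (Term("loc"),))
```

Here `classify(type_term, dynamic_fns)` yields `"semi-static"`, because type itself is never updated but depends on the dynamic env; this is exactly the kind of function the specialization targets.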

Example: In a semantic specification of a program the information about identifiers is usually represented by the environment. If we model the semantics of a language by an EA, the environment exists as well. In rule 5 of appendix A the type function uses the environment env to check whether the types of the variable and the expression are equal in the current state. env is a dynamic structure and can change if variables are declared or procedure calls are executed. The environment is used by a type function which determines the type of identifiers. The type function itself is never updated, though its results may change because of changes of the environment. The type function is therefore semi-static. In fact, for type checking the definition table, which holds the static information about declarations and definitions, is an adequate abstraction of the environment. The definition table can be precomputed by processing declarations and procedure definitions. Then it is possible to replace the function type by a new function type' that refers to the definition table depending on the concrete program context. Rule 1 shows a specialization of rule 5 which depends on the context 'name'. In general: $\text{'name'} \in \{\text{'main'}\} \cup \{\text{procedure names}\}$. DefTab is a precomputed function.

Rule 1 (Specialized assignment)

    if Cmd(CurTask) = ass(v, e) and CurContext = 'name' then
        if coercible(type'(e, 'name', DefTab), type'(v, 'name', DefTab)) then
            if type'(e, 'name', DefTab) ≠ type'(v, 'name', DefTab) then
                content(bind(v, env)) := inttoflt(eval(e, env))
            else
                content(bind(v, env)) := eval(e, env)
            endif
            CurTask := NextTask(CurTask)
        else
            CurTask := TypeErrorTask
        endif
    endif

Remark: We must save the current context if a procedure call is executed because it must be restored when the procedure execution is nished.

3.2 Specifying the Semantic Analysis

The starting point of the semantic analysis is a syntactic representation of a program: we assume that programs are represented as abstract syntax trees. The semantic analysis consists of two parts. As mentioned in the previous section, for semantic analysis purposes we abstract from the environment that exists while executing a program. Instead we use a definition table that is generated while processing declarations and procedure definitions. The information in the definition table is then used to analyze the statements of a program.

Example: Consider rule 5 in appendix A. The abstract machine behaves as follows: if the current task is an assignment statement and the types of the variable and the expression are coercible, then the content of the variable is replaced by the value of the expression. A type coercion is required if the types are not equal. If the type of the expression on the right-hand side is not coercible to the type of the variable on the left-hand side, the machine stops in an error state indicated by TypeErrorTask. The function coercible is defined according to the language definition.

We propose to specify the semantic analysis by predicates representing the possible error states. We apply these predicates to the abstract representation of programs and use the precomputed information of the definition table to detect errors. Clearly the predicates depend heavily on the semantics of the language. The behavior of the definition table is determined by the language rules for the declaration of variables and the definition of types and procedures.

4 Correctness Requirements

In this section we define a correctness notion for semantic analysis. For this purpose we first analyze the correctness requirements for the context-dependent specialization of the type function used in an EA-specification.

4.1 Correctness of Specializing the Type Function

In the following let $A'(\pi)$ be the specialization, for a concrete program $\pi$, of the evolving algebra A that specifies the semantics of a programming language in general. For reasoning about the specialized EA-specification, it is necessary to compare states of A and A'. The state of an evolving algebra is determined by the current interpretation of the functions (see section 2). The difference between A and A' is that we introduced the additional functions CurContext, context, type' and DefTab in A'. Besides this, the state remains the same. We call a state $q'$ of A' equivalent to a state $q$ of A ($q \equiv q'$) if the basic functions of A have the same interpretation in A'. The context-dependent abstraction of the environment by a definition table, or in other words the specialization of the type function, is allowed if the following condition is fulfilled.

For all programs $\pi$ and all states $q \in A$, $q' \in A'(\pi)$ with $q \equiv q'$: if the function type is used in a transition rule of A, then type(id, env) yields the same result as type'(id, DefTab('context')) in the corresponding rule of A'.    (1)

4.2 Correctness of Semantic Analysis

The semantic analysis checks whether a program is correctly typed according to the semantics of the programming language. In general this includes checking for declaration errors, undeclared identifiers and type compatibility of expressions, as well as of parameters in procedure calls. In the previous section we proposed to specify the semantic analysis by predicates reflecting the properties above. These predicates represent the error states of the abstract machine that determines the semantics of the programming language. We distinguish between errors in declarations and type errors. Declaration errors indicate errors in the definition table, e.g. multiply declared identifiers or undefined types. Type errors can occur while analyzing the statements of a program. With respect to the abstract machine that specifies the semantics of the programming language, correctness of the semantic analysis means:

$$(\mathit{Init} \xrightarrow{\pi} q \;\wedge\; (\mathit{Interpretation}(q, \mathit{CurTask}) = \mathit{DeclErrorTask} \;\vee\; \mathit{Interpretation}(q, \mathit{CurTask}) = \mathit{TypeErrorTask})) \;\Rightarrow\; \mathit{DeclarationError}(\mathit{ComputeDefTab}[\![\pi]\!]) \;\vee\; \neg\mathit{CorrectlyTyped}[\![\pi]\!](\varepsilon, \varepsilon) \tag{2}$$

and

$$\mathit{DeclarationError}(\mathit{ComputeDefTab}[\![\pi]\!]) \;\vee\; \neg\mathit{CorrectlyTyped}[\![\pi]\!](\varepsilon, \varepsilon) \;\Rightarrow\; (A \text{ executes all program paths of } \pi \;\Rightarrow\; (\mathit{Init} \xrightarrow{\pi} q \;\wedge\; (\mathit{Interpretation}(q, \mathit{CurTask}) = \mathit{DeclErrorTask} \;\vee\; \mathit{Interpretation}(q, \mathit{CurTask}) = \mathit{TypeErrorTask}))) \tag{3}$$

Init is the initial state of the abstract machine before executing the program $\pi$, and Interpretation(q, CurTask) stands for the current basic function in the state $q$ that interprets CurTask. The state transitions depend on the program $\pi$. The predicates CorrectlyTyped and DeclarationError specify the errors detected statically by the semantic analysis.

Remark: When the abstract machine interprets a program, it is possible that parts of the program are never executed; e.g. if the condition of a loop is always false, the loop body is never executed. Because it is not decidable for arbitrary programs which program paths will be executed, we require that the semantic analysis detects all the static semantic errors that could occur if all program paths were executed. This is expressed by condition (3).


4.3 Evolving Algebra Abstraction

For proving the correctness of the semantic analysis we need to reason about EA-rules. However, it is not necessary to reason about all semantic properties a rule reflects; we can abstract from semantic properties that are not essential for the proof. This implicitly leads to a simpler EA-specification. We call this technique EA-abstraction and explain it in section 5 by an example.

5 Correctness Proofs for SL

In appendix A, we specify the semantics of a simple programming language SL by an evolving algebra. In this section, we show the correctness proofs for its semantic analysis. First we verify that condition (1) is fulfilled for the specification of SL. Then we use this result for the verification of conditions (2) and (3). Because of space limitations, we only sketch the proof ideas and do not describe the proofs in full detail.

5.1 The Computation of the Definition Table

The computation of the definition table is a well-understood problem in compiler construction, and standard solutions exist. We therefore do not specify the creation process in detail and, for simplicity, assume that the definition table for a program is computed correctly. Furthermore, we assume that the definition table is represented as a list of frames, and we require that there exist some auxiliary functions which are correct as well. Figure 1 shows the specification.

$\mathit{DEFTAB} \mathrel{\hat=} \mathit{FRAME}^*$
$\mathit{FRAME} \mathrel{\hat=} (\mathit{ID} \times (\mathit{ID} \times \mathit{TYPE})^*)$
$\mathit{ComputeDefTab} : \mathit{PROG} \to \mathit{DEFTAB}$
$\mathit{IsIn} : \mathit{ID} \times (\mathit{ID} \times \mathit{TYPE})^* \to \mathit{TYPE} \cup \{\varepsilon\}$
$\mathit{Globals} : \mathit{DEFTAB} \to (\mathit{ID} \times \mathit{TYPE})^*$
$\mathit{Locals} : \mathit{ID} \times \mathit{DEFTAB} \to (\mathit{ID} \times \mathit{TYPE})^*$

$\mathit{type}'(\mathit{id}, \mathit{context}, D) \mathrel{\hat=}$
$\quad \mathbf{if}\ \mathit{context} = \text{'main'} \to \mathit{IsIn}(\mathit{id}, \mathit{Globals}(D))$
$\quad [\,]\ \mathit{context} \neq \text{'main'} \wedge \mathit{IsIn}(\mathit{id}, \mathit{Locals}(\mathit{context}, D)) = \varepsilon \to \mathit{IsIn}(\mathit{id}, \mathit{Globals}(D))$
$\quad [\,]\ \mathit{context} \neq \text{'main'} \wedge \mathit{IsIn}(\mathit{id}, \mathit{Locals}(\mathit{context}, D)) \neq \varepsilon \to \mathit{IsIn}(\mathit{id}, \mathit{Locals}(\mathit{context}, D))$

Figure 1: Specification of the definition table

IsIn checks whether the identifier appears in a list of declarations. Globals and Locals yield the list of global and local declarations, respectively. Remark: The global declaration list contains the global information about procedures; in our example this is simply the procedure name. type' is defined recursively over expressions. Its behavior is similar to that of type and is not described here; the only difference is how the type of an identifier is determined. Remark: It is a proof obligation that the definition table fulfills the informally described requirements.
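The identifier case of type' in Figure 1 can be sketched as a plain lookup. The following Python code is a minimal sketch under assumptions: the definition table is encoded as a list of (name, declarations) frames, the global frame is assumed to be the one named 'main', `None` plays the role of ε, and `frame_decls` is a hypothetical helper standing in for both Globals and Locals.

```python
from typing import List, Optional, Tuple

Decls = List[Tuple[str, str]]        # (identifier, type) pairs
DefTab = List[Tuple[str, Decls]]     # frames: (context name, declarations)


def is_in(ident: str, decls: Decls) -> Optional[str]:
    """IsIn: type of ident in a declaration list, or None (for epsilon)."""
    for name, typ in decls:
        if name == ident:
            return typ
    return None


def frame_decls(context: str, deftab: DefTab) -> Decls:
    """Hypothetical helper: Globals is frame_decls('main', D),
    Locals(context, D) is frame_decls(context, D)."""
    for frame_name, decls in deftab:
        if frame_name == context:
            return decls
    return []


def type_prime(ident: str, context: str, deftab: DefTab) -> Optional[str]:
    """type' for identifiers: local declarations hide global ones."""
    if context == "main":
        return is_in(ident, frame_decls("main", deftab))
    local = is_in(ident, frame_decls(context, deftab))
    if local is None:
        return is_in(ident, frame_decls("main", deftab))
    return local


# Hypothetical definition table for a program with one procedure p.
deftab: DefTab = [
    ("main", [("x", "int_type"), ("p", "proc_type")]),
    ("p", [("x", "float_type")]),
]
```

With this table, `x` has type `int_type` in context `'main'` and `float_type` in context `'p'`, while the procedure name `p` is found in the global list from either context.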


5.2 Specialization of the Environment for Name and Type Analysis

In the following, we call the general evolving algebra specification of SL A and its specialization for a program $\pi$ $A'(\pi)$. We prove condition (1) by induction on the number of state transitions. We have to consider the rules that change the environment. The declaration rules (rules 4 and 7) add a new declaration to the frame on top of the environment. These rules are the same in both machines: while the machines interpret declaration commands, they make equivalent state transitions and do not use the function type or type', respectively. Rules 8 and 9 change the environment stack by pushing or popping a procedure frame.

Proof (Sketch)

Basis: n = 0. In the initial state there is no call of type, therefore condition (1) holds.

Induction: n → n + 1. We can ignore commands other than call(id) and proc_end because only procedure calls crucially affect the environment. Although declarations add new variables to the top of the environment, this is not essential because the type function is not used while processing declarations (rule 4).

1. call(id): Rule 2 shows one specialization of rule 8. For each procedure in $\pi$ there exists a specialization of rule 8.

Rule 2 (Specialized Procedure Call)

    if Cmd(CurTask) = call(id) and CurContext = 'name' then
        if type'(id, 'name', DefTab) = proc_type then
            env := Push(env, AddReturnTask(NewFrame(id)))
            CurTask := FirstTask(id)
            CurContext := id
            context := Push(context, 'name')
        else
            CurTask := TypeErrorTask
        endif
    endif

A new frame is pushed on the environment when the command call(id) is executed. Additionally, in $A'(\pi)$ the current context is saved on a context stack and CurContext is updated to id. From that point on, type' inspects Globals(DefTab) and Locals(id, DefTab) (see section 5.1). Once the local declarations have been processed, the two machines have made equivalent transitions and the top of the environment holds the same information as Locals(id, DefTab). The bottom of the environment corresponds to Globals(DefTab). This is true because of the correctness of the definition table. Therefore condition (1) holds for procedure calls.

2. proc_end: Rule 3 shows the specialization of rule 9.

Rule 3 (Specialized Procedure End)

    if Cmd(CurTask) = proc_end then
        env := Pop(env)
        CurTask := NextTask(ReturnTask(Top(env)))
        CurContext := Top(context)
        context := Pop(context)
    endif

The top of the environment is deleted when the command proc_end is executed (rule 9). In $A'(\pi)$, the new context is restored from the top of the context stack and the stack is updated. Remember that each call of a procedure saves the old context on the context stack. Therefore the new current context corresponds to the top of the environment stack. This implies that the valid declarations for type and type' are equal when a procedure ends.

□
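The induction argument can be illustrated by running both views side by side on a trace of call and proc_end commands and comparing the two type functions after every step. The Python sketch below is an assumption-laden illustration, not the proof: `type_env`, `type_prime` and `agree` are hypothetical names, frames are dictionaries, and a call pushes the complete local frame at once instead of processing declaration commands one by one.

```python
from typing import Dict, List, Optional, Sequence, Tuple

DefTab = List[Tuple[str, List[Tuple[str, str]]]]


def type_env(ident: str, env: List[Dict[str, str]]) -> Optional[str]:
    """type via the dynamic environment: top frame first, then the bottom."""
    for frame in (env[-1], env[0]):
        if ident in frame:
            return frame[ident]
    return None


def type_prime(ident: str, context: str, deftab: DefTab) -> Optional[str]:
    """type' via the definition table and the current context (Figure 1)."""
    frames = {name: dict(decls) for name, decls in deftab}
    if context != "main" and ident in frames[context]:
        return frames[context][ident]
    return frames["main"].get(ident)


def agree(trace: Sequence[tuple], deftab: DefTab, idents: Sequence[str]) -> bool:
    """Check condition (1) after every call / proc_end step of the trace."""
    frames = {name: dict(decls) for name, decls in deftab}
    env = [frames["main"]]               # environment stack of A
    cur_context: str = "main"            # CurContext of A'
    context: List[str] = []              # context stack of A'
    for cmd in trace:
        if cmd[0] == "call":
            env.append(frames[cmd[1]])   # assumption: locals pushed at once
            context.append(cur_context)
            cur_context = cmd[1]
        elif cmd[0] == "proc_end":
            env.pop()
            cur_context = context.pop()
        for ident in idents:
            if type_env(ident, env) != type_prime(ident, cur_context, deftab):
                return False
    return True


# Hypothetical program: main calls p, p calls q, then both return.
deftab: DefTab = [
    ("main", [("x", "int_type"), ("p", "proc_type"), ("q", "proc_type")]),
    ("p", [("x", "float_type")]),
    ("q", [("y", "bool_type")]),
]
trace = [("call", "p"), ("call", "q"), ("proc_end",), ("proc_end",)]
```

The check passes for this trace because the context stack is pushed and popped in lockstep with the environment stack, which is the invariant the proof sketch relies on.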

Remark: If the language allows variable initialization, e.g. a : INT := b where b is a local variable, then the context handling of $A'(\pi)$ needs to be refined.

5.3 Semantic Analysis

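5.3.1 Declaration Errors

In SL, the only possible error in declarations is that identifiers are multiply declared. Therefore we define a predicate DeclarationError that checks the global and local declaration lists of the definition table to detect multiply declared identifiers. Figure 2 shows the specification.

$\mathit{DeclarationError} : \mathit{DEFTAB} \to \mathit{BOOL}$
$\mathit{UniqueIds} : (\mathit{ID} \times \mathit{TYPE})^* \to \mathit{BOOL}$
$\mathit{ExistsId} : \mathit{ID} \times (\mathit{ID} \times \mathit{TYPE})^* \to \mathit{BOOL}$

$\mathit{DeclarationError}(\varepsilon) \mathrel{\hat=} \mathit{false}$
$\mathit{DeclarationError}(\mathit{frame}.D) \mathrel{\hat=} \neg\mathit{UniqueIds}(\mathit{snd}(\mathit{frame})) \vee \mathit{DeclarationError}(D)$
$\mathit{UniqueIds}(\varepsilon) \mathrel{\hat=} \mathit{true}$
$\mathit{UniqueIds}((\mathit{id}, t).l) \mathrel{\hat=} \neg\mathit{ExistsId}(\mathit{id}, l) \wedge \mathit{UniqueIds}(l)$
$\mathit{ExistsId}(\mathit{id}, \varepsilon) \mathrel{\hat=} \mathit{false}$
$\mathit{ExistsId}(\mathit{id}_1, (\mathit{id}_2, t).l) \mathrel{\hat=} (\mathit{id}_1 = \mathit{id}_2) \vee \mathit{ExistsId}(\mathit{id}_1, l)$

Figure 2: Specification of DeclarationError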
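The recursive equations of Figure 2 translate almost literally into code. The following Python sketch mirrors them case by case; the list-of-frames encoding of the definition table and the snake_case function names are assumptions made for this illustration.

```python
from typing import List, Tuple

Decls = List[Tuple[str, str]]        # (identifier, type) pairs
DefTab = List[Tuple[str, Decls]]     # frames: (context name, declarations)


def exists_id(ident: str, decls: Decls) -> bool:
    """ExistsId: does ident occur in the declaration list?"""
    if not decls:
        return False
    (other, _typ), rest = decls[0], decls[1:]
    return ident == other or exists_id(ident, rest)


def unique_ids(decls: Decls) -> bool:
    """UniqueIds: no identifier is declared twice in the list."""
    if not decls:
        return True
    (ident, _typ), rest = decls[0], decls[1:]
    return not exists_id(ident, rest) and unique_ids(rest)


def declaration_error(deftab: DefTab) -> bool:
    """DeclarationError: some frame contains a multiply declared identifier."""
    if not deftab:
        return False
    frame, rest = deftab[0], deftab[1:]
    return (not unique_ids(frame[1])) or declaration_error(rest)


# Hypothetical tables: x may be redeclared in p, but not twice within p.
good: DefTab = [("main", [("x", "int_type"), ("p", "proc_type")]),
                ("p", [("x", "float_type")])]
bad: DefTab = [("main", [("x", "int_type")]),
               ("p", [("y", "int_type"), ("y", "bool_type")])]
```

Each frame is checked separately, so a local `x` that hides a global `x` is accepted, while two declarations of `y` in the same frame are reported.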

Theorem 1 The correctness conditions (2) and (3) of section 4.2 hold for declaration errors.

Proof (Sketch)

The proof can be done by induction on the structure of programs. The abstract machine A can only change to a state that represents a declaration error if the current command is either a variable declaration (rule 4) or a procedure declaration (rule 7). The computation of the definition table is assumed to be correct, so it contains all declarations of a program. The predicate DeclarationError analyzes the definition table completely and checks each procedure frame separately for multiply declared identifiers. The separate check reflects the fact that in SL local declarations hide global declarations. Condition (3) holds as well, provided that all program paths are executed. □

5.3.2 Type Errors

The predicate CorrectlyTyped specifies the type check of the semantic analysis of SL. Figure 3 shows the specification. CorrectlyTyped uses an auxiliary function coercible which is semantically equal to the one used in A.

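$\mathit{CorrectlyTyped} : \mathit{PROG} \times \mathit{ID} \times \mathit{DEFTAB} \to \mathit{BOOL}$
$\mathit{type}' : \mathit{EXPR} \times \mathit{ID} \times \mathit{DEFTAB} \to \mathit{TYPE} \cup \{\varepsilon\}$
$\mathit{coercible} : \mathit{TYPE} \times \mathit{TYPE} \to \mathit{BOOL}$

$\mathit{CorrectlyTyped}[\![\mathbf{prog}\ \mathit{VarDecls}; \mathit{ProcDecls}; \mathit{Stats}]\!](\varepsilon, \varepsilon) \mathrel{\hat=}$
$\quad \mathit{CorrectlyTyped}[\![\mathit{ProcDecls}]\!](\text{'main'}, \mathit{ComputeDefTab}[\![\mathbf{prog}\ \mathit{VarDecls}; \mathit{ProcDecls}; \mathit{Stats}]\!])$
$\quad \wedge\ \mathit{CorrectlyTyped}[\![\mathit{Stats}]\!](\text{'main'}, \mathit{ComputeDefTab}[\![\mathbf{prog}\ \mathit{VarDecls}; \mathit{ProcDecls}; \mathit{Stats}]\!])$

$\mathit{CorrectlyTyped}[\![\mathbf{proc}\ \mathit{id}\ \mathit{VarDecls}; \mathit{Stats}; \mathit{ProcDecls}]\!](\text{'main'}, D) \mathrel{\hat=}$
$\quad \mathit{CorrectlyTyped}[\![\mathit{Stats}]\!](\mathit{id}, D) \wedge \mathit{CorrectlyTyped}[\![\mathit{ProcDecls}]\!](\text{'main'}, D)$

$\mathit{CorrectlyTyped}[\![\mathit{Stat}; \mathit{Stats}]\!](\mathit{context}, D) \mathrel{\hat=}$
$\quad \mathit{CorrectlyTyped}[\![\mathit{Stat}]\!](\mathit{context}, D) \wedge \mathit{CorrectlyTyped}[\![\mathit{Stats}]\!](\mathit{context}, D)$

$\mathit{CorrectlyTyped}[\![\mathit{Id} := \mathit{Expr}]\!](\mathit{context}, D) \mathrel{\hat=}$
$\quad \mathit{coercible}(\mathit{type}'(\mathit{Expr}, \mathit{context}, D), \mathit{type}'(\mathit{Id}, \mathit{context}, D))$

$\mathit{CorrectlyTyped}[\![\mathbf{if}\ C\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2]\!](\mathit{context}, D) \mathrel{\hat=}$
$\quad \mathit{type}'(C, \mathit{context}, D) = \mathit{bool\_type} \wedge \mathit{CorrectlyTyped}[\![S_1]\!](\mathit{context}, D) \wedge \mathit{CorrectlyTyped}[\![S_2]\!](\mathit{context}, D)$

$\mathit{CorrectlyTyped}[\![\mathbf{while}\ C\ \mathbf{do}\ S]\!](\mathit{context}, D) \mathrel{\hat=}$
$\quad \mathit{type}'(C, \mathit{context}, D) = \mathit{bool\_type} \wedge \mathit{CorrectlyTyped}[\![S]\!](\mathit{context}, D)$

$\mathit{CorrectlyTyped}[\![\mathbf{call}\ \mathit{id}]\!](\mathit{context}, D) \mathrel{\hat=} \mathit{type}'(\mathit{id}, \text{'main'}, D) = \mathit{proc\_type}$

$\mathit{CorrectlyTyped}[\![\varepsilon]\!](\mathit{context}, D) \mathrel{\hat=} \mathit{true}$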

Figure 3: Specification of CorrectlyTyped
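The statement cases of Figure 3 can be sketched as a recursive checker over a small abstract syntax. The Python code below is a sketch under several assumptions: statements are tuples tagged `"ass"`, `"if"`, `"while"` and `"call"`; expressions are restricted to identifiers and typed literals `("lit", type)` because the paper omits the expression cases of type'; and coercible is assumed to allow equal types plus integer to float, as suggested by the conversion in Rule 1.

```python
from typing import List, Optional, Tuple, Union

DefTab = List[Tuple[str, List[Tuple[str, str]]]]
Expr = Union[str, Tuple[str, str]]   # identifier or ("lit", type) (assumed)


def type_prime(expr: Expr, context: str, deftab: DefTab) -> Optional[str]:
    """type' for the assumed expression forms; None stands for epsilon."""
    if isinstance(expr, tuple) and expr[0] == "lit":
        return expr[1]
    frames = dict(deftab)
    local = None
    if context != "main":
        local = dict(frames.get(context, [])).get(expr)
    return local if local is not None else dict(frames.get("main", [])).get(expr)


def coercible(src: Optional[str], dst: Optional[str]) -> bool:
    """Assumed coercion relation: equal types, or integer to float."""
    if src is None or dst is None:
        return False
    return src == dst or (src == "int_type" and dst == "float_type")


def correctly_typed(stats: list, context: str, d: DefTab) -> bool:
    """Statement list case: every statement must be correctly typed."""
    return all(stat_ok(stat, context, d) for stat in stats)


def stat_ok(stat: tuple, context: str, d: DefTab) -> bool:
    kind = stat[0]
    if kind == "ass":
        _, ident, expr = stat
        return coercible(type_prime(expr, context, d),
                         type_prime(ident, context, d))
    if kind == "if":      # both branches are checked, executed or not
        _, cond, s1, s2 = stat
        return (type_prime(cond, context, d) == "bool_type"
                and correctly_typed(s1, context, d)
                and correctly_typed(s2, context, d))
    if kind == "while":
        _, cond, body = stat
        return (type_prime(cond, context, d) == "bool_type"
                and correctly_typed(body, context, d))
    if kind == "call":    # procedure names live in the global frame
        return type_prime(stat[1], "main", d) == "proc_type"
    raise ValueError(f"unknown statement kind: {kind}")


# Hypothetical definition table and statement lists.
deftab: DefTab = [
    ("main", [("x", "float_type"), ("b", "bool_type"), ("p", "proc_type")]),
    ("p", [("x", "int_type")]),
]
ok = [("ass", "x", ("lit", "int_type")), ("if", "b", [("call", "p")], [])]
bad = [("if", "b", [], [("ass", "x", ("lit", "float_type"))])]
```

In context `'p'` the list `bad` is rejected even though the faulty assignment sits in an else-branch that a particular run might never reach, which is the situation condition (3) addresses.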

Theorem 2 The correctness conditions (2) and (3) of section 4.2 hold for type errors.

Proof (Sketch)

We proved in 5.2 that for a concrete program $\pi$ the specialization $A'(\pi)$ reflects the same semantics as A applied to $\pi$. Therefore we are allowed to verify the correctness of CorrectlyTyped with respect to $A'(\pi)$. We can use the result that the function type' of A' and the one used by CorrectlyTyped behave equally.

The proof can be done by structural induction on the abstract syntax tree combined with an induction on the length of statement lists. Remember that the task structure of A' is based on the abstract syntax tree of a program. Though parts of the program may be executed several times, a type error will be detected at the first execution because the type properties of SL are static. Because of space limitations, we only show the local correctness for the assignment and the conditional statement.

Basis: Assignment statement

We need to compare the definition of CorrectlyTyped with the specialization of rule 5.

Proof

If the abstract machine A stops in state TypeErrorTask while processing an assignment statement, this means that the types of the variable v and the expression e are not coercible. Considering the definition of the type of expressions, there are the following reasons for this:

a. v is not declared,
b. an identifier in e is not declared, or
c. the types of v and e are well-defined but not coercible.

The type of an expression is undefined if there are subexpressions that cannot be coerced. The predicate CorrectlyTyped checks Id and Expr as well (v and e in terms of A). If the types are undefined or not coercible, this will be detected. For the proof of condition (3) it is essential that each program path is executed. If the semantic analysis detects a type error in an assignment statement of a procedure that is never executed, this would not lead to an error state in A'. □

Remark: The preceding proof is an example of EA-abstraction. Rule 5 changes the content function, but we can ignore this property because it is not essential for the proof.

Induction: We need to examine the following cases.

1. Statement list:

Proof: The proof for arbitrary statement lists is done by induction on the length of statement lists.

Basis: StatList := ε. Theorem 2 is true for the empty statement list because no semantic error can occur.

Induction: StatList := Stat; Stats. Remember that the function NextTask of the abstract machine is defined over the abstract syntax tree. The statements are processed linearly. In some cases a statement list is processed several times (loop statement). The function CorrectlyTyped checks each statement of a program once and detects all static semantic errors. Therefore theorem 2 is true iff it holds for Stat and Stats. □

2. Conditional statement: We need to compare the definition in figure 3 with rule 6.

Proof

Condition (2): Several errors can cause the machine to stop in state TypeErrorTask while processing a conditional statement: (a) the expression is not of type bool_type, or (b) a type error occurs while processing TrueTask or FalseTask, respectively. Figure 4 illustrates the state transitions ending in an error state. $q \xrightarrow{c} q'$ describes a state transition of the abstract machine from state $q$ to state $q'$ processing the expression $c$.

Case a: CorrectlyTyped yields false if the expression is not of type bool_type, because c contains undeclared identifiers in a subexpression, two subexpressions are not coercible, or c is well defined but not of type bool_type.

Cases b1 and b2: If the type of the expression is bool_type, then the abstract machine processes TrueTask or FalseTask, depending on the value of the expression. CorrectlyTyped investigates both statement lists of the conditional statement and detects semantic errors even if a statement list is never processed. Therefore, if the program ends with CurTask = TypeErrorTask, then $\mathit{CorrectlyTyped}[\![S_1]\!](\mathit{context}, D)$ or $\mathit{CorrectlyTyped}[\![S_2]\!](\mathit{context}, D)$ yields false.

Condition (3): With the assumption that each program path is executed, the abstract machine will detect type errors in the then-part and the else-part of a conditional statement. □

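a) Processing the condition: $q \xrightarrow{C} \mathit{TypeErrorTask}$
b1) Processing statement list $S_1$: $q \xrightarrow{\mathit{TrueTask}} q' \xrightarrow{\,*\ S_1\,} \mathit{TypeErrorTask}$
b2) Processing statement list $S_2$: $q \xrightarrow{\mathit{FalseTask}} q' \xrightarrow{\,*\ S_2\,} \mathit{TypeErrorTask}$

Figure 4: State transitions that lead to an error state

□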

6 Conclusions

The specification of programming language semantics describes dynamic and static properties of a language. For the specification of semantic analysis, the static semantic properties must be determined exactly. For verification purposes, it must be possible to show that the semantic analysis detects all static semantic errors of a program. In this article we presented an approach to bridge the gap between the formal language specification and the specification of semantic analysis. The starting point of the technique is an evolving algebra specification of the programming language semantics. The static semantic properties of a language can be determined by considering the basic functions of the EA-specification. The problem arose that some static semantic properties are not explicitly static in the specification. One result of this paper is that such properties can be made explicit for concrete programs by transforming the general language specification. This leads to specialized, context-dependent specifications which are very close to the specification of the semantic analysis. This additional step produces a new proof obligation but simplifies the correctness proof in general. First, the compiling verification needs to assure that the specialization is valid; second, it must be proven that all static semantic errors of a program are detected. We demonstrated our technique on a simple example language which is a subset of real-life imperative programming languages.

The applicability of the technique is not restricted to semantic analysis. It can be used in other parts of the compiler where static semantic properties are used. For example, the technique could be used to prove the correctness of optimizations and to specify the correctness of data flow analysis. Furthermore, the correctness of the definition table is used to generate intermediate code. This is part of our general methodology for the construction of correct compilers. For each construction step it is assumed that the preceding steps of the compiling process were done correctly.

A general advantage of evolving algebras is that the formalism is easy to use because of its simple concepts. Since the informal specification of programming languages is usually operational, the formal EA-specification is easy to create from the informal language description. This minimizes the probability of specification errors when the formal language specification is created.

A future goal is to support the verification techniques with tools. In that context there have already been efforts to formalize evolving algebras in PVS [ORS92] and to carry out the proofs, or parts of them, with machine support.

Acknowledgements: This work is performed in the context of the Verifix project. Verifix is a joint project on compiler verification of the University of Karlsruhe together with the Universities of Kiel and Ulm.

A The Simple Language SL

This section describes an evolving algebra specification (EA-SL) for a simple language SL that has some typical constructs of imperative programming languages. The language allows basic types, assignment, conditional and loop statements, and procedures without parameters. It is a subset of the language IS described in [GHZ95]. In the following, we sometimes omit the function signature if it is clear from the context.

A.1 The Task Structure

A program is divided into commands. Each command corresponds to an action (here: task). This representation is adopted from [GH93]. The commands correspond to the syntactic units, the tasks to the actions to be performed by these units. Figure 5 shows the static functions representing commands and introduces the sorts used.

    COMMAND : instructions according to the language definition
    TASK    : set of tasks

    ass       : ID × EXPR → COMMAND
    while     : EXPR → COMMAND
    if        : EXPR → COMMAND
    decl      : ID × ID → COMMAND
    proc_decl : ID → COMMAND
    proc_end  : → COMMAND
    call      : ID → COMMAND

Figure 5: Sorts and commands for SL-programs

The sort ID denotes identifiers, the sort EXPR denotes expressions. The tasks correspond to some nodes in the syntax tree. FirstTask is the first task of the program execution. Cmd computes the command w.r.t. a task. The function NextTask specifies the next action according to the language description. For loops and conditional statements the functions TrueTask and FalseTask give the next task to be executed if the outcome of the condition is true and false, respectively. The task structure, i.e. these functions, can be completely computed from the syntax tree. For more detailed information see [HZ95]. TypeErrorTask and DeclErrorTask are special tasks reflecting errors.
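To make the task structure more tangible, the following Python sketch encodes it for a small hypothetical program. The tuple encoding of commands, the integer task numbers, and the class name TaskStructure are assumptions of this sketch and not part of EA-SL.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Commands of Figure 5 as tagged tuples: (name, *arguments).
# This encoding is an assumption of the sketch.
Command = Tuple

@dataclass
class TaskStructure:
    """Static functions over tasks; tasks are plain integers here."""
    first_task: int
    cmd: Dict[int, Command]
    next_task: Dict[int, int]
    true_task: Dict[int, int]
    false_task: Dict[int, int]

# Hypothetical program:
#   decl b : bool;  b := true;  if b then b := false else b := true
ts = TaskStructure(
    first_task=0,
    cmd={
        0: ("decl", "b", "bool"),
        1: ("ass", "b", "true"),
        2: ("if", "b"),
        3: ("ass", "b", "false"),
        4: ("ass", "b", "true"),
    },
    next_task={0: 1, 1: 2, 3: 5, 4: 5},  # task 5 marks the end of the program
    true_task={2: 3},
    false_task={2: 4},
)
```

All five tables can be filled in by one traversal of the syntax tree, which is why they are static functions of the algebra.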

A.2 Basic Types

The language definition defines the data types INT, FLT and BOOL with the standard operations and relations. Similar to C, SL defines that integers and floating point numbers are implemented as on the machine. Therefore the functions representing the operations are static external functions of the evolving algebra for the processor. For example, intplus has the signature:

    intplus : INT × INT → INT

The data type BOOL is equal to the predefined sort BOOL which is part of the evolving algebra system.

A.3 States

The interpretation of a program successively executes commands as specified by the functions FirstTask, NextTask, TrueTask and FalseTask. CurTask represents the current task to be executed.
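The role of CurTask can be illustrated by a small Python sketch that follows NextTask from FirstTask for a straight-line program. Branching via TrueTask and FalseTask and the effect of the commands on the environment are deliberately left out; the function name trace and the dictionary encoding are assumptions of this sketch.

```python
def trace(first_task, next_task, cmd):
    """Follow NextTask from FirstTask and collect the executed commands."""
    cur_task = first_task
    executed = []
    while cur_task in cmd:               # a task without a command ends the run
        executed.append(cmd[cur_task])
        cur_task = next_task.get(cur_task)
    return executed

# Hypothetical three-command program.
commands = {0: "decl x", 1: "x := 1", 2: "x := x + 1"}
successors = {0: 1, 1: 2}
print(trace(0, successors, commands))
```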

A.4 The Transition Rules

For each command a transition rule is defined. The environment env is used to maintain information about variables. We need a sort OBJ for the objects associated with a variable. The sort TYPE represents the basic types of SL. There are two external functions, one for creating an object of a given type and one for returning the type of an object. Figure 6 shows the signature and the properties of the two functions.

    Signature:
      create_obj : TYPE → OBJ
      type_obj   : OBJ → TYPE

    Properties:
      For any type t: type_obj(create_obj(t)) = t
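The property of Figure 6 can be checked on a toy model. The following Python sketch is only one possible model of the two external functions; the class Obj and the string names for types are assumptions made for illustration.

```python
import itertools

class Obj:
    """An object with a unique identity that remembers its creation type."""
    _ids = itertools.count()

    def __init__(self, typ):
        self.ident = next(Obj._ids)
        self.typ = typ

def create_obj(typ):
    """create_obj : TYPE -> OBJ"""
    return Obj(typ)

def type_obj(obj):
    """type_obj : OBJ -> TYPE"""
    return obj.typ

# The property of Figure 6 holds for every basic type of this model.
for t in ("int_type", "flt_type", "bool_type"):
    assert type_obj(create_obj(t)) == t
```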

Figure 6: Functions to create objects

SL allows procedure calls, therefore we need to handle global and local definitions. We specify the environment as a stack in which we can also access the bottom element, which represents the global definitions. The elements of the stack are named lists. The environment can be defined by:

    ENV ≙ STACK(ID, (ID, OBJ, TYPE)*)

In SL global variables are hidden by local variables. We model this property by the way variables are accessed in the environment env : ENV. First, the local information, the top of the stack, is accessed. If this fails, the global definitions, the bottom of the stack, are considered. Figure 7 shows the sort VALUE and the functions that are used to handle variable information. The function content returns the value of an object; bind returns the object associated with an identifier using the global and local information of the environment; type returns the type of the object associated with the variable.

    VALUE ≙ INT ⊎ FLT ⊎ BOOL

    content : OBJ → VALUE
    bind    : ID × ENV → OBJ
    type    : ID × ENV → TYPE

    Properties:
      content(o) ∈ T ⇔ type_obj(o) = T
      type_obj(bind(x, env)) = type(x, env)

Figure 7: The sort VALUE and corresponding functions
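The lookup order described above (local frame first, then the global frame) can be sketched in Python as follows. The list-based stack, the helper lookup, and the use of None for an undefined result are assumptions of this sketch; objects are represented by plain strings to keep the example short.

```python
from typing import List, Optional, Tuple

Definition = Tuple[str, object, str]      # (id, object, type)
Frame = Tuple[str, List[Definition]]      # a named list of definitions

def lookup(frame: Frame, ident: str) -> Optional[object]:
    """Return the object bound to ident in one frame, or None."""
    _name, defs = frame
    for def_id, obj, _typ in defs:
        if def_id == ident:
            return obj
    return None

def bind(ident: str, env: List[Frame]) -> Optional[object]:
    """env[0] is the bottom (global) frame, env[-1] the top (local) frame."""
    obj = lookup(env[-1], ident)
    return obj if obj is not None else lookup(env[0], ident)

env = [
    ("main", [("x", "global-x", "int_type"), ("y", "global-y", "flt_type")]),
    ("p", [("x", "local-x", "bool_type")]),
]
```

Here bind("x", env) yields the local object, which hides the global x, while bind("y", env) falls back to the global frame.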

For declarations, SL defines that declarations need to be locally unique. Therefore the rule for declarations is:

Rule 4 (Declaration)

    if Cmd(CurTask) = decl(x, T) then
      if DoubleDeclaration(x, Top(env)) then
        CurTask := DeclErrorTask
      else
        env := Push(Pop(env), AddVar(Top(env), x, create_obj(T), T))
        CurTask := NextTask(CurTask)
      endif
    endif

The function AddVar adds a new element (id, object, type) to an existing list of definitions. The predicate DoubleDeclaration checks whether the identifier is already declared in the current procedure. It is defined by:

    DoubleDeclaration(id, ε) = false
    DoubleDeclaration(id1, (id2, o, t).list) = (id1 = id2) ∨ DoubleDeclaration(id1, list)
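As an illustration of Rule 4, the following Python sketch models the state as a dictionary with the environment stack and the current task. The state layout, the function name declare, and the use of object() in place of create_obj are assumptions of the sketch.

```python
DECL_ERROR_TASK = "DeclErrorTask"

def double_declaration(ident, defs):
    """True if ident already occurs in the list of (id, object, type)."""
    return any(def_id == ident for def_id, _obj, _typ in defs)

def declare(state, ident, typ, next_task):
    """One step of the declaration rule on state = {'env': ..., 'cur_task': ...}."""
    top = state["env"][-1]
    if double_declaration(ident, top):
        state["cur_task"] = DECL_ERROR_TASK
    else:
        # Push(Pop(env), AddVar(Top(env), ...)) replaces the top frame.
        state["env"][-1] = top + [(ident, object(), typ)]
        state["cur_task"] = next_task

state = {"env": [[]], "cur_task": 0}
declare(state, "x", "int_type", next_task=1)   # accepted
declare(state, "x", "flt_type", next_task=2)   # rejected: x is declared twice
```

Only the top frame is inspected, so a local declaration may reuse the name of a global one.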

The execution of an assignment statement requires that the types of the left-hand and the right-hand side are compatible. Type compatibility is defined as usual. The transition rule for the assignment uses the functions eval and type, which are defined recursively over expressions; coercible implements the type compatibility.

Rule 5 (Assignment)

    if Cmd(CurTask) = ass(v, e) then
      if coercible(type(e, env), type(v, env)) then
        if type(e, env) ≠ type(v, env) then
          content(bind(v, env)) := inttoflt(eval(e, env))
        else
          content(bind(v, env)) := eval(e, env)
        endif
        CurTask := NextTask(CurTask)
      else
        CurTask := TypeErrorTask
      endif
    endif

The execution of a conditional statement evaluates the conditional expression. If the result is true then the first statement list is executed. If the result is false then the second statement list is executed. If the result is not of type bool_type then a type error occurs and the program terminates erroneously.

Rule 6 (Conditional)

    if Cmd(CurTask) = if(c) then
      if type(c, env) = bool_type then
        if eval(c, env) then
          CurTask := TrueTask(CurTask)
        else
          CurTask := FalseTask(CurTask)
        endif
      else
        CurTask := TypeErrorTask
      endif
    endif
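The type check of the assignment rule can be sketched in Python. Since the rule applies the int-to-float conversion whenever the two types differ, the sketch assumes that int to float is the only coercion; this reading, the function names, and the dictionaries store and types are assumptions of the sketch.

```python
INT, FLT, BOOL = "int_type", "flt_type", "bool_type"

def coercible(src, dst):
    """Assumed compatibility: equal types, or an int value for a float variable."""
    return src == dst or (src == INT and dst == FLT)

def assign(store, types, var, value, value_type):
    """Return True if the assignment succeeds, False for the TypeErrorTask branch."""
    if not coercible(value_type, types[var]):
        return False
    # Types differ only in the int-to-float case, so convert there.
    store[var] = float(value) if value_type != types[var] else value
    return True

types = {"x": FLT, "n": INT}
store = {}
ok_coerced = assign(store, types, "x", 3, INT)    # stores 3.0
rejected = assign(store, types, "n", 1.5, FLT)    # float into int is refused
```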

The rule for the while command is similar and is omitted because of space limitations. In SL, procedures without parameters are allowed. In the phase of processing declarations, only the procedure header is executed. The procedure body is executed when a procedure is called. NextTask assures that, if a procedure declaration is processed, the next task is the following procedure declaration or the first statement of the main program. The execution of a procedure declaration adds an identifier of type proc_type to the global definitions of the environment. The local declarations are added to the environment when the procedure body is executed. The transition rules are:

Rule 7 (Procedure Declaration)

    if Cmd(CurTask) = proc_decl(id) then
      if DoubleDeclaration(id, Top(env)) then
        CurTask := DeclErrorTask
      else
        env := Push(Pop(env), AddVar(Top(env), id, ε, proc_type))
        CurTask := NextTask(CurTask)
      endif
    endif

Rule 8 (Procedure Call)

    if Cmd(CurTask) = call(id) then
      if type(id, env) = proc_type then
        env := Push(env, AddReturnTask(NewFrame(id)))
        CurTask := FirstTask(id)
      else
        CurTask := TypeErrorTask
      endif
    endif

NewFrame(id) creates an empty list of definitions, named by id. AddReturnTask saves the current task as return point in the specified frame. If the current command is proc_end (that is, the procedure ends) then the local definitions are deleted from the environment and the next task is the command after the current procedure call.

Rule 9 (Procedure End)

    if Cmd(CurTask) = proc_end then
      env := Pop(env)
      CurTask := NextTask(ReturnTask(Top(env)))
    endif
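The frame handling of Rules 8 and 9 can be sketched in Python as follows. Updates of an evolving algebra step are evaluated in the current state and fired together, so Top(env) in Rule 9 denotes the frame that is being popped; the sketch reads that frame before updating the task. The dictionary layout of frames and the function names are assumptions of this sketch, and the type check of Rule 8 is omitted.

```python
def call(state, proc, first_task):
    """Push a new named frame that remembers the calling task (Rule 8)."""
    frame = {"name": proc, "return_task": state["cur_task"], "defs": []}
    state["env"].append(frame)
    state["cur_task"] = first_task[proc]

def proc_end(state, next_task):
    """Drop the local frame and continue after the call (Rule 9)."""
    frame = state["env"].pop()           # Top(env) of the state before the step
    state["cur_task"] = next_task[frame["return_task"]]

# Hypothetical run: task 7 calls p, whose body starts at task 20.
state = {"env": [{"name": "main", "return_task": None, "defs": []}],
         "cur_task": 7}
call(state, "p", first_task={"p": 20})
depth_in_call, task_in_call = len(state["env"]), state["cur_task"]
proc_end(state, next_task={7: 8})
```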

A.5 Expressions

Expressions are evaluated by evaluating their constituent primaries in any order and then applying the operators according to their priorities. Observe that the priorities are already reflected in the syntax tree; it is therefore unnecessary to define them in the expression evaluation. We use a syntax similar to Dijkstra's guarded commands [Dij76] for describing the functions eval and type. An alternative command has the form:

    if B1 → S1
    [] B2 → S2
       ...
    [] Bn → Sn

The Bi are boolean expressions (guards) and the Si are commands. Bi → Si is called a guarded command.

The execution of the alternative command is as follows: find one true guard and execute the corresponding command. If several guards are true, execute the command of one of them. We extend the alternative command by the otherwise construct. In the alternative command above, otherwise corresponds to ¬(B1 ∨ B2 ∨ ... ∨ Bn). Therefore otherwise → S is executed if none of the other guards yields true. We only give some examples for the evaluation of expressions. The evaluation of the other expressions is similar.

    eval(e1 + e2, env) =
      if type(e1, env) = int_type ∧ type(e2, env) = int_type → intplus(eval(e1, env), eval(e2, env))
      [] type(e1, env) = int_type ∧ type(e2, env) = flt_type → fltplus(inttoflt(eval(e1, env)), eval(e2, env))
      [] type(e1, env) = flt_type ∧ type(e2, env) = int_type → fltplus(eval(e1, env), inttoflt(eval(e2, env)))
      [] type(e1, env) = flt_type ∧ type(e2, env) = flt_type → fltplus(eval(e1, env), eval(e2, env))
      [] otherwise → ⊥

The evaluation of an identifier is defined by:

    eval(x, env) =
      if bind(x, env) = ⊥ → ⊥
      [] otherwise → content(bind(x, env))

The function type is defined similarly to eval; for identifiers, however, it has the property described in Figure 7.
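The two eval definitions can be mirrored in a short Python sketch. It makes several assumptions that are not part of the specification: expressions are Python literals, identifier strings, or tuples ("+", e1, e2); the environment is a flat dictionary from identifiers to (value, type); None plays the role of ⊥; and the result type of + is the obvious one, since the paper only states that type is defined similarly to eval.

```python
INT, FLT, BOOL = "int_type", "flt_type", "bool_type"

def type_of(e, env):
    """Type of an expression, or None if it is undefined."""
    if isinstance(e, str):                       # identifier
        return env[e][1] if e in env else None
    if isinstance(e, bool):                      # test bool before int
        return BOOL
    if isinstance(e, int):
        return INT
    if isinstance(e, float):
        return FLT
    _op, e1, e2 = e                              # ("+", e1, e2)
    t1, t2 = type_of(e1, env), type_of(e2, env)
    if t1 == INT and t2 == INT:
        return INT
    if {t1, t2} <= {INT, FLT}:
        return FLT
    return None

def eval_expr(e, env):
    """Value of an expression, or None for the 'otherwise' guard."""
    if isinstance(e, str):
        return env[e][0] if e in env else None
    if not isinstance(e, tuple):
        return e                                 # literal
    _op, e1, e2 = e
    t1, t2 = type_of(e1, env), type_of(e2, env)
    if t1 == INT and t2 == INT:                  # intplus
        return eval_expr(e1, env) + eval_expr(e2, env)
    if t1 in (INT, FLT) and t2 in (INT, FLT):    # fltplus, with inttoflt
        return float(eval_expr(e1, env)) + float(eval_expr(e2, env))
    return None

env = {"n": (2, INT), "x": (0.5, FLT), "b": (True, BOOL)}
```

The guards of the + case are mutually exclusive, so the nondeterministic choice of the alternative command never matters here and an ordinary if-chain suffices.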

A.6 Initialization

To complete the evolving algebra, we have to describe the initialization of some functions. The initialization of the task structure depends on the program to be executed. It is precomputed according to the interpretation of the abstract syntax tree. The environment only holds an empty frame for the global definitions.

    env = Push(CreateStack, NewFrame(main))

The current task is the first task of the program, according to the task structure.

    CurTask = FirstTask
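In a Python model the initial state takes only a few lines. The dictionary layout and the representation of a frame as a pair of name and definition list are assumptions of this sketch.

```python
def new_frame(name):
    """NewFrame: an empty list of definitions, named by name."""
    return (name, [])

def initial_state(first_task):
    """Environment with one empty global frame; CurTask is FirstTask."""
    return {"env": [new_frame("main")], "cur_task": first_task}

state0 = initial_state(first_task=0)
```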

References

[BDR94] Egon Boerger, Igor Durdanovic, and Dean Rosenzweig. Occam: Specification and Compiler Correctness. Part I: The Primary Model. In U. Montanari and E.-R. Olderog, editors, Proc. Procomet'94 (IFIP TC2 Working Conference on Programming Concepts, Methods and Calculi). North-Holland, 1994.

[Dij76] Edsger W. Dijkstra. A Discipline of Programming. Prentice-Hall, 1976.

[GDG+96] W. Goerigk, A. Dold, T. Gaul, G. Goos, A. Heberle, F. von Henke, U. Hoffmann, H. Langmaack, H. Pfeiffer, H. Ruess, and W. Zimmermann. Compiler Correctness and Implementation Verification: The Verifix Approach. In Compiler Construction, volume 1060 of LNCS. Springer, 1996. Poster Session, International Conference on Compiler Construction 1996.

[GH93] Y. Gurevich and J. Huggins. The Semantics of the C Programming Language. In CSL '92, volume 702 of LNCS, pages 274-308. Springer-Verlag, 1993.

[GHZ95] T.S. Gaul, A. Heberle, and W. Zimmermann. Definition of the language IS. Working paper, University of Karlsruhe, July 1995.

[Gur91] Y. Gurevich. Evolving Algebras; A Tutorial Introduction. Bulletin EATCS, 43:264-284, 1991.

[Gur95] Y. Gurevich. Evolving Algebras: Lipari Guide. In E. Börger, editor, Specification and Validation Methods. Oxford University Press, 1995.

[HZ95] A. Heberle and W. Zimmermann. An evolving algebra specification of the operational semantics of the While Language IS0. Working paper, University of Karlsruhe, July 1995.

[ORS92] S. Owre, J. M. Rushby, and N. Shankar. PVS: A Prototype Verification System. In Deepak Kapur, editor, Proceedings 11th International Conference on Automated Deduction CADE, volume 607 of Lecture Notes in Artificial Intelligence, pages 748-752, Saratoga, NY, October 1992. Springer-Verlag.

[Wal95] C. Wallace. The Semantics of the C++ Programming Language. In E. Börger, editor, Specification and Validation Methods. Oxford University Press, 1995.

[WG84] William M. Waite and Gerhard Goos. Compiler Construction. Springer Verlag, 1984.