Coherence and Transitivity of Subtyping as Entailment

GIUSEPPE LONGO, LIENS (CNRS) and DMI, Ecole Normale Supérieure, 45 rue d’Ulm, 75005 Paris, France.
KATHLEEN MILSTED, NET, France Télécom, 28 chemin du Vieux Chêne, BP 98, 38243 Meylan Cedex, France.
SERGEI SOLOVIEV, IRIT, Université de Toulouse-III, 118 route de Narbonne, 31062 Toulouse, France.

Abstract
The relation of inclusion between types has been suggested by the practice of programming as it enriches the polymorphism of functional languages. We propose a simple (and linear) sequent calculus for subtyping as logical entailment. This allows us to derive a complete and coherent approach to subtyping from a few, logically meaningful sequents. In particular, transitivity and anti-symmetry will be derived from elementary logical principles.

Keywords: Second-order logic, Gentzen sequent calculus, polymorphism, coercive subtyping, cut-elimination.

1 Introduction

The word ‘coercion’ occurs in varying contexts. For example, in imperative and object-oriented programming languages such as C and C++, coercions are known as ‘casting functions’, whereby variables of one datatype (e.g. Boolean) are ‘converted’ or ‘cast’ to another datatype (e.g. integer). In some languages, such conversions may actually change the underlying representation of the variable’s contents, in which case the conversion is done at run-time when the contents are known.

More semantic interpretations characterize coercions with respect to identity functions, which do not imply a change to underlying representations. Coercions are used in this sense in [24] for example, where they are known as ‘retyping functions’. In practice, the conversions implied by such coercions are performed statically at compile-time, for type-checking, etc. This paradigm is very relevant in functional programming.

In recent years, several extensions of core functional languages have been proposed to deal with the notion of subtyping; see, for example, [9, 24, 4, 2, 8, 6, 25, 30, 31, 10, 11, 22, 16]. These extensions were suggested by the practice of various programming styles. In particular, they were inspired by the notion of inheritance as used in object-oriented programming languages, or by other concrete implementations of the following form of polymorphism: data living in a type $\sigma$, which is a subtype of $\tau$, may also be seen as living in type $\tau$, in some suitable sense. So, an integer is also a real, modulo an obvious ‘almost identical’ coercion

J. Logic Computat., Vol. 10 No. 4, pp. 493–526 2000

© Oxford University Press

from integers to reals. However, subtyping in the presence of the functional arrow $\to$ (and of second-order universal quantification $\forall$) presents some semantic and logical problems. In all functional approaches to subtyping, the arrow is formalized as being contravariant, or antimonotone, in its first argument. More formally:

\[
(\to)\qquad \frac{\sigma' \le \sigma \qquad \tau \le \tau'}{\sigma \to \tau \;\le\; \sigma' \to \tau'}
\]

where $\sigma \le \tau$ is read ‘$\sigma$ is a subtype of $\tau$’.

The contravariant behaviour of $\to$ (on the left) intuitively fits with the categorical notion of (contravariant) Hom functor, as well as with the intuitive understanding of programs as transformations acting on inputs: if a program $M$ acts on inputs $N$ in $\tau$, then it can take as input any element in a subtype $\sigma$ of $\tau$. This is where the challenge arises: is it possible to give a precise mathematical meaning to the ($\to$) rule and universal quantification, in the sense, say, of denotational semantics or of logical calculi?

In this paper, we propose a sequent calculus of subtyping, as a fragment of intuitionistic (linear) second-order propositional calculus. In particular, we focus on the introduction and elimination rules for $\forall$, which are at the core of second-order systems, and on a ‘cut-elimination’ theorem. This is a relevant property for the partial order of subtyping, as the ‘cut’ rule corresponds to transitivity.

The idea is that one can give an obvious logical (constructive) understanding of ‘$\sigma$ is a subtype of $\tau$’ as ‘$\sigma$ implies $\tau$’, or more precisely, as ‘$\sigma$ entails $\tau$’ ($\sigma \vdash \tau$). Note first that, with this interpretation, the ($\to$) rule makes perfect sense: if $\sigma'$ entails $\sigma$ and $\tau$ entails $\tau'$, then $\sigma \to \tau$ entails $\sigma' \to \tau'$. Moreover, if terms in $\sigma$ may also be in $\tau$, then this embedding should be described by some sort of effective transformation: either the identity or a ‘suitable’ coercion, as an effective map from $\sigma$ to $\tau$.

Thus, by the Curry–Howard isomorphism, subtyping is a special case of intuitionistic implication: a computation from $\sigma$ to $\tau$ is a proof of $\sigma \vdash \tau$. But which special case? And how to characterize it? As we shall see, a characterization in terms of erasure (that is, after erasing all type information, coercions become identity maps in the untyped calculus) also has some semantic flavor.

The main results of this paper (except the part about base types) appeared without proofs in a previously published paper [20] on a calculus for subtyping as logical entailment. 
The complete proofs related to that presentation may be found in a technical report [21], where normalization for the simple, but impredicative, logical system presented here is dealt with by a direct argument — see the discussion in Section 6.4. Those proofs have been deeply revised and simplified in this paper. Since its publication, the calculus of subtyping in [20] has been studied and developed by several authors (see, for example, [30, 31, 10, 11]). Another approach to coercive subtyping, where ‘almost arbitrary’ functions may be taken as coercions, was suggested by Zhaohui Luo and further investigated with the participation of one of the authors of the present paper [22, 16]; that approach is more general but, because of its generality, it lacks the semantic motivation and the connection between coercions and logical deductions at the heart of the work presented here.

1.1 Subtyping as restricted linear implication

The logical frame we use here is intuitionistic second-order propositional logic. The intended meaning of $\sigma \vdash \tau$ is that $\sigma$ is a subtype of $\tau$. An obvious axiom and the contra(co)-variance rule for $\to$ are the first requests for a logic of subtyping:

\[
(\text{ax})\quad \sigma \vdash \sigma
\qquad\qquad
(\to)\quad \frac{\sigma' \vdash \sigma \qquad \tau \vdash \tau'}{\sigma \to \tau \vdash \sigma' \to \tau'}
\]

Consider now a logical interpretation of second-order $\forall$. Assume that $\sigma$ contains $X$ free and that from a specific instance of $\sigma$ (with $\rho$ substituted for $X$, say), one can deduce $\tau$. Then, from $\forall X.\sigma$ one can, a fortiori, deduce $\tau$. This is the ($\forall$ left) rule of Gentzen’s sequent calculus. A semantic understanding of this second-order deduction can be given in the PER model of subtyping (see Section 6.2 for a discussion and references): informally, if a specific instance of a family of types is a subtype of $\tau$, then, in the model, the intersection of the entire family is a subtype of $\tau$.

\[
(\forall\ \text{left})\quad \frac{[\rho/X]\sigma \vdash \tau}{\forall X.\sigma \vdash \tau}
\]

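where $[\rho/X]\sigma$ stands for the type resulting from the (capture-avoiding) substitution of type $\rho$ for $X$ in type $\sigma$. Moreover, if $\sigma$ entails $\tau$, and $\sigma$ does not contain $X$ free, then $\sigma$ also entails $\forall X.\tau$. This is Gentzen’s ($\forall$ right) rule. Semantically, if $\sigma$ is a subtype of $\tau$ and $\sigma$ does not depend on $X$, then $\sigma$ is also a subtype of the intersection of all $\tau$s over $X$:

\[
(\forall\ \text{right})\quad \frac{\sigma \vdash \tau}{\sigma \vdash \forall X.\tau}
\qquad \text{for } X \text{ not free in } \sigma
\]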
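The capture-avoiding type substitution used in the ($\forall$ left) rule can be sketched as follows. This is only an illustration under assumptions that are not in the paper: types are encoded as nested Python tuples, and the helper names `free_tvars`, `fresh` and `subst` are hypothetical.

```python
# Illustrative encoding of types (an assumption of this sketch, not the paper's):
#   ("tvar", "X")           type variable X
#   ("arrow", s, t)         s -> t
#   ("forall", "X", body)   forall X. body

def free_tvars(t):
    """Set of type variables occurring free in the type t."""
    tag = t[0]
    if tag == "tvar":
        return {t[1]}
    if tag == "arrow":
        return free_tvars(t[1]) | free_tvars(t[2])
    if tag == "forall":
        return free_tvars(t[2]) - {t[1]}
    raise ValueError(f"unknown type constructor: {tag}")

def fresh(name, avoid):
    """Return a variant of `name` (primes appended) that is not in `avoid`."""
    while name in avoid:
        name += "'"
    return name

def subst(t, X, rho):
    """Capture-avoiding substitution of the type rho for the variable X in t."""
    tag = t[0]
    if tag == "tvar":
        return rho if t[1] == X else t
    if tag == "arrow":
        return ("arrow", subst(t[1], X, rho), subst(t[2], X, rho))
    if tag == "forall":
        Y, body = t[1], t[2]
        if Y == X:
            return t  # X is bound here, so there is nothing to substitute
        if Y in free_tvars(rho) and X in free_tvars(body):
            # Rename the bound variable first, so that rho is not captured.
            Z = fresh(Y, free_tvars(rho) | free_tvars(body) | {X})
            body = subst(body, Y, ("tvar", Z))
            Y = Z
        return ("forall", Y, subst(body, X, rho))
    raise ValueError(f"unknown type constructor: {tag}")

# Substituting Y for X in (forall Y. X -> Y) must rename the bound Y.
sigma = ("forall", "Y", ("arrow", ("tvar", "X"), ("tvar", "Y")))
renamed = subst(sigma, "X", ("tvar", "Y"))
```

Here `renamed` is the encoding of $\forall Y'.\,Y \to Y'$: the bound variable is renamed before the substitution, so the substituted $Y$ stays free.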

Recall now that the principal idea here is that the embedding of a type into another should be very simple: indeed, as close to the identity as possible in a typed language. Identities are linear maps, to say the least, as our system will be a fragment of the linear sequent calculus. Even more so: we allow only one premiss in a sequent $\sigma \vdash \tau$, as even the swapping of inputs is forbidden. Thus, in order to deal with nested implications, we generalize ($\forall$ right) to:

\[
(\forall_n\ \text{right})\quad
\frac{\sigma \vdash \tau_1 \to (\ldots (\tau_n \to \tau) \ldots)}{\sigma \vdash \tau_1 \to (\ldots (\tau_n \to \forall X.\tau) \ldots)}
\qquad \text{for } X \text{ not free in } \sigma \text{ nor in } \tau_1, \ldots, \tau_n
\]

($\forall_n$ right) is a family of rules indexed by $n \ge 0$. Note that, if more than one premiss were allowed, ($\forall_n$ right) would be the curried variant of ($\forall$ right) with $n$ premisses.

These four rules are all we need. Observe that $\forall$ is introduced to the left and to the right of entailment by two separate rules while $\to$ is symmetrically introduced both to the left and to the right of entailment by a single rule, the familiar ($\to$) rule defined previously.

The reader may wonder what happened to a fundamental property of subtyping, that is, to transitivity. Indeed, we will prove that $\vdash$ is a partial order, thus, in particular, that it is transitive and anti-symmetric. But transitivity is just a (cut) rule:

\[
(\text{cut})\quad \frac{\sigma \vdash \tau \qquad \tau \vdash \rho}{\sigma \vdash \rho}
\]

Proving transitivity will thus be the proof of admissibility for the rule (cut), that is, each time the premisses are derivable, the consequence is derivable too. Or, equivalently, that the system extended with (cut) has the cut-elimination property. Notice that the proof that one can ‘eliminate cuts’ is non-trivial for weak systems (as we will discuss in Section 6.4) as, in general, these results and proofs are not inherited by subsystems.

Cut-elimination is a fundamental property in constructive logical systems. It guarantees consistency as it shows that each derivation can be given a ‘minimal’ structure. In the various lambda-calculi, it relates deductions to computations as cut-elimination corresponds to $\beta$-reduction. In those systems, (cut) is a non-primitive rule, as it corresponds to the sequential application of ($\to$-introduction) and ($\to$-elimination) rules; moreover, it is usually considered as a (side) consequence of normalization. In contrast to this, for the purposes of subtyping, (cut) is as basic as transitivity.

The seemingly simple logical system above will be shown to be complete and coherent as a logic for deriving subtyping relations. By completeness, we mean that $\sigma \vdash \tau$ is derivable iff there is a term of type $\sigma \to \tau$ that erases to the identity (cf. Section 3.1). Such a term is a coercion from $\sigma$ to $\tau$. By a result in [24], this characterization also guarantees completeness with respect to subtyping in all PER models, in the sense of [4]. Coherence will mean that derivability of $\sigma \vdash \tau$ implies the existence of a unique coercion from $\sigma$ to $\tau$. One of its consequences will be anti-symmetry. Coherence will be easily shown in the simple four-rule system, while it requires cut-elimination if proved in the system extended with (cut).

Once the formal system is fully written down, with proof-terms displayed (Section 2.2), the next thing to be described is term equality. The notion of equality we use here may be viewed as the ‘generalized dual’ of an early result of Girard [12]: in system F, there is no definable term that discriminates between types. 
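Namely, there is no definable term $J$ such that $J$ applied to type $\sigma$ is 1 if $\sigma = \tau$, and is 0 if $\sigma \ne \tau$. This idea was taken up in [19] by extending system F with the following axiom:

\[
(\text{Axiom C})\quad \frac{M : \forall X.\sigma}{M\rho_1 = M\rho_2}
\qquad \text{for } X \text{ not free in } \sigma
\]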

Intuitively, as there are no type discriminators, (Axiom C) forces terms of universally quantified type, whose outputs live in the same type, to be constant. System F extended with (Axiom C) satisfies the Genericity Theorem [19], which states that if two second-order functions (of the same type) coincide on an input type, then they are, in fact, the same function. Equivalently, types are generic inputs to second-order functions. (Axiom C) was proposed following [12] and independently of [8]. The latter defines the system $F_\le$, which includes the following inference rule, a generalization of (Axiom C) to subtyping:

\[
(\text{eq appl2})\quad
\frac{M : \forall X.\sigma \qquad [\rho_1/X]\sigma \le \tau \qquad [\rho_2/X]\sigma \le \tau}{M\rho_1 = M\rho_2 : \tau}
\]

This rule is sound within a context with Cardelli’s rule of subsumption [9]: if $N : \sigma$ and $\sigma \le \tau$ then also $N : \tau$. Observe that $M$ gives outputs in (possibly) different types $[\rho_1/X]\sigma$ and $[\rho_2/X]\sigma$. Then, intuitively, (eq appl2) says that if these two types (of outputs) have a common supertype $\tau$, then the outputs are equal when seen as elements of $\tau$. Thus, in particular, if $\sigma$ does not contain $X$ free, one obtains (Axiom C).

Note that (Axiom C) is valid in all proper models of system F, in particular in all ‘parametric models’ in the sense of Reynolds [23, 1]. Moreover, (eq appl2) holds in the only semantic

models of system F with subtyping, namely in PER models with the intended coercions (see Section 6.2). The relevance of (eq appl2) is that it allows one to prove the (categorical) universality of key definable constructs in System F (binary products, coproducts, existentials, etc.).

However, (eq appl2) relies implicitly on the subsumption rule, i.e., if $M : \sigma$ and $\sigma \le \tau$, then $M : \tau$. And, as we will discuss in Section 6.2, subsumption has neither type-theoretic nor categorical meaning, even though it may have solid practical motivations and intuitive meaning. Subsumption may be avoided in (eq appl2) if coercions are used explicitly along the lines of the following informal rule, where the terms $N_1, N_2$ represent coercions:

\[
\frac{M : \forall X.\sigma \qquad y_1 : [\rho_1/X]\sigma \vdash N_1 : \tau \qquad y_2 : [\rho_2/X]\sigma \vdash N_2 : \tau}{[M\rho_1/y_1]N_1 = [M\rho_2/y_2]N_2}
\]

This will be formalized in our system as the rule (eq appl2 co), and that ‘coercion version’ of (eq appl2) will be used in our type-theory as an interplay between subtyping and equality. In conclusion then, our logic of subtyping will be based on the simple four-rule sequent calculus presented previously, and the proof terms will satisfy the usual equational rules plus a coercion version of (eq appl2).

The paper is organized as follows. In Section 2, we recall System F and describe our logic of subtyping, referred to as System $\mathrm{Co}^{\vdash}$ (where ‘co’ stands for coercions). We also prove in this section some lemmas about derivations in F and $\mathrm{Co}^{\vdash}$ to help establish the relationship between these two systems, and notably that $\mathrm{Co}^{\vdash}$ can be regarded as a subsystem of System F. Some aspects of the definition of cut (transitivity) in System $\mathrm{Co}^{\vdash}$ are discussed and some syntactic conventions used in the paper are established. Section 3 then presents the main results of the paper: completeness of System $\mathrm{Co}^{\vdash}$ with respect to the semantics provided by interpretation in the type-free calculus, admissibility of transitivity, and coherence. We postpone to this section a discussion about equality of coercions since it will be more natural after the completeness result is established. At the end of this section, the consequences for the relation of bicoercibility are also considered. In Section 4, $\mathrm{Co}^{\vdash}$ is proved equivalent to a system of subtyping formulated by Mitchell [24] in the sense that a subtyping judgement is derivable in Mitchell’s system iff the corresponding entailment is derivable in $\mathrm{Co}^{\vdash}$. Section 5 describes system $\mathrm{Co}^{\vdash}$ extended with base types, and proves relevant properties of the extended system including conservativity, completeness, admissibility of transitivity and coherence. 
Finally, Section 6 describes related work concerning other proof-theoretic analyses and models of subtyping, along with a discussion of equality and a discussion of second-order cut-elimination proofs. Section 7 ends with some conclusions.

2 System F and coercions

2.1 System F

We first recall System F [12]. Then we define a sequent calculus of subtyping $\mathrm{Co}^{\vdash}$ that may be regarded as a subsystem of System F. The language of System F has two kinds of expressions, types and terms, defined by the following syntax:

\[
\begin{array}{lrcl}
(\text{Types}) & \sigma & ::= & X \mid \sigma \to \tau \mid \forall X.\sigma \\
(\text{Terms}) & M & ::= & x \mid \lambda x{:}\sigma.M \mid MN \mid \Lambda X.M \mid M\sigma
\end{array}
\]

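We will use:
$\sigma, \tau, \rho, \mu$ for types
$M, N, P, Q, R$ for terms
$X, Y, Z$ for type variables
$x, y, z$ for term variables

An environment $\Gamma$ is a set of term variables with their types. We write $\Gamma, x:\sigma$ to extend $\Gamma$ with a new term variable $x$ of type $\sigma$, where $x$ must not already occur in $\Gamma$. We use the notation $\Gamma \vdash_F M : \sigma$ for type assignment in system F. The following rules define valid type assignments.

System F

\[
(\text{ax})\quad \Gamma, x:\sigma \vdash_F x : \sigma
\]

\[
(\to \text{intro})\quad \frac{\Gamma, x:\sigma \vdash_F M : \tau}{\Gamma \vdash_F \lambda x{:}\sigma.M : \sigma \to \tau}
\qquad\qquad
(\to \text{elim})\quad \frac{\Gamma \vdash_F M : \sigma \to \tau \qquad \Gamma \vdash_F N : \sigma}{\Gamma \vdash_F MN : \tau}
\]

\[
(\forall\ \text{intro})\quad \frac{\Gamma \vdash_F M : \sigma}{\Gamma \vdash_F \Lambda X.M : \forall X.\sigma}
\qquad \text{for } X \text{ not free in the type of any free term variable in } M
\]

\[
(\forall\ \text{elim})\quad \frac{\Gamma \vdash_F M : \forall X.\sigma}{\Gamma \vdash_F M\tau : [\tau/X]\sigma}
\]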

We write $FV(M)$ for the set of free type and term variables in $M$, and $[N/x]$ and $[\sigma/X]$ in prefix position for term and type substitution, respectively, with usual renaming of bound variables to avoid capture. Reduction of terms is defined as usual by the closure of the following rules:

\[
\begin{array}{lll}
(\beta_1) & (\lambda x{:}\sigma.M)N \to_{\beta_1} [N/x]M & \\
(\eta_1) & \lambda x{:}\sigma.Mx \to_{\eta_1} M & \text{for } x \notin FV(M) \\
(\beta_2) & (\Lambda X.M)\sigma \to_{\beta_2} [\sigma/X]M & \\
(\eta_2) & \Lambda X.MX \to_{\eta_2} M & \text{for } X \notin FV(M)
\end{array}
\]

We will write $\to_{\beta\eta}$ for the transitive closure of all four reductions, $\to_\beta$ for the closure of just $\beta_1$ and $\beta_2$, and $\to_\eta$ for the closure of $\eta_1$ and $\eta_2$. We will also write ‘$\beta\eta$-nf’ for normal form with respect to all four reduction relations, and ‘$\beta$-nf’ or ‘$\mathrm{nf}_\beta$’ for normal form with respect to $\beta_1$ and $\beta_2$. Different equality relations on terms are defined by the symmetric, reflexive and transitive closure of the different reduction relations. In particular, we will write $M =_{\beta\eta} N$ for equality with respect to $\to_{\beta\eta}$, and $M =_\beta N$ for equality with respect to $\to_\beta$. We reserve the notation $M \equiv N$ for syntactic identity of terms up to $\alpha$-conversion, and for types, equality, written $\sigma = \tau$, is just syntactic identity up to $\alpha$-conversion.
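The term syntax above, together with the type-erasure map mentioned in the introduction, can be sketched as follows. This is a minimal illustration, not the paper's formalization: the class names, the `erase` and `eta_reduce` helpers, and the choice to keep types as opaque strings are all assumptions of the sketch.

```python
from dataclasses import dataclass

# System F terms. Types are kept as opaque strings because erasure ignores them.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:            # lambda x : ty . body
    var: str
    ty: str
    body: object

@dataclass(frozen=True)
class App:            # fun arg
    fun: object
    arg: object

@dataclass(frozen=True)
class TLam:           # Lambda X . body
    tvar: str
    body: object

@dataclass(frozen=True)
class TApp:           # fun ty
    fun: object
    ty: str

@dataclass(frozen=True)
class ULam:           # untyped lambda x . body (erased terms reuse Var and App)
    var: str
    body: object

def erase(t):
    """Drop type annotations, type abstractions and type applications."""
    if isinstance(t, Var):
        return t
    if isinstance(t, Lam):
        return ULam(t.var, erase(t.body))
    if isinstance(t, App):
        return App(erase(t.fun), erase(t.arg))
    if isinstance(t, TLam):
        return erase(t.body)
    if isinstance(t, TApp):
        return erase(t.fun)
    raise TypeError(f"not a System F term: {t!r}")

def free_vars(t):
    """Free term variables of an untyped term."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, ULam):
        return free_vars(t.body) - {t.var}
    if isinstance(t, App):
        return free_vars(t.fun) | free_vars(t.arg)
    raise TypeError(f"not an untyped term: {t!r}")

def eta_reduce(t):
    """Bottom-up eta-reduction of an untyped term."""
    if isinstance(t, ULam):
        body = eta_reduce(t.body)
        if (isinstance(body, App) and body.arg == Var(t.var)
                and t.var not in free_vars(body.fun)):
            return body.fun
        return ULam(t.var, body)
    if isinstance(t, App):
        return App(eta_reduce(t.fun), eta_reduce(t.arg))
    return t

# x applied to a type erases directly to the variable x.
inst = erase(TApp(Var("x"), "tau"))
# The eta-expansion (lambda y : s1 . x y) erases to (lambda y . x y),
# which eta-reduces to x.
expanded = erase(Lam("y", "s1", App(Var("x"), Var("y"))))
```

In both examples the erased term is the variable `x`, possibly only after $\eta$-reduction. This is the sense in which such terms behave as identity maps once type information is removed.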

2.2 System $\mathrm{Co}^{\vdash}$

We now define our sequent calculus of subtyping, referred to as System $\mathrm{Co}^{\vdash}$ or just $\mathrm{Co}^{\vdash}$. We will use $\vdash_{co}$ for entailment in this system. We give two presentations of $\mathrm{Co}^{\vdash}$. The first presentation gives only the types involved in each judgment, which are of the form $\sigma \vdash_{co} \tau$. This presentation emphasizes the intended subtyping relation between types ($\sigma \vdash_{co} \tau$ can be read ‘$\sigma$ is a subtype of $\tau$’) but, of course, the calculus may be considered independently of this interpretation by referring just to its logical structure.

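System $\mathrm{Co}^{\vdash}$ (unlabelled)

\[
(\text{ax})\quad \sigma \vdash_{co} \sigma
\qquad\qquad
(\to)\quad \frac{\sigma' \vdash_{co} \sigma \qquad \tau \vdash_{co} \tau'}{\sigma \to \tau \vdash_{co} \sigma' \to \tau'}
\]

\[
(\forall\ \text{left})\quad \frac{[\rho/X]\sigma \vdash_{co} \tau}{\forall X.\sigma \vdash_{co} \tau}
\]

\[
(\forall_n\ \text{right})\quad
\frac{\sigma \vdash_{co} \tau_1 \to (\ldots (\tau_n \to \tau) \ldots)}{\sigma \vdash_{co} \tau_1 \to (\ldots (\tau_n \to \forall X.\tau) \ldots)}
\qquad \text{for } 0 \le n,\ X \text{ not free in } \sigma \text{ nor in } \tau_1, \ldots, \tau_n
\]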

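In the second presentation of the system, we label each type with a term, yielding judgments of the form $x : \sigma \vdash_{co} M : \tau$. We will refer to the first presentation as the ‘unlabelled’ system, to the second as the ‘labelled’ system.

System $\mathrm{Co}^{\vdash}$ (labelled)

\[
(\text{ax})\quad x : \sigma \vdash_{co} x : \sigma
\]

\[
(\to)\quad
\frac{x' : \sigma' \vdash_{co} M : \sigma \qquad y : \tau \vdash_{co} N : \tau'}{x : \sigma \to \tau \vdash_{co} \lambda x'{:}\sigma'.[xM/y]N : \sigma' \to \tau'}
\]

\[
(\forall\ \text{left})\quad
\frac{y : [\rho/X]\sigma \vdash_{co} M : \tau}{x : \forall X.\sigma \vdash_{co} [x\rho/y]M : \tau}
\]

\[
(\forall_n\ \text{right},\ 0 \le k \le n)\quad
\frac{x : \sigma \vdash_{co} \lambda x_1{:}\tau_1 \ldots \lambda x_k{:}\tau_k.M \;:\; \tau_1 \to (\ldots (\tau_n \to \tau) \ldots)}
{x : \sigma \vdash_{co} \lambda x_1{:}\tau_1 \ldots \lambda x_k{:}\tau_k \ldots \lambda x_n{:}\tau_n.\Lambda X.M x_{k+1} \ldots x_n \;:\; \tau_1 \to (\ldots (\tau_n \to \forall X.\tau) \ldots)}
\]

for $X$ not free in $\sigma$ nor in $\tau_1, \ldots, \tau_n$, $M$ not of the form $\lambda y.M'$, and $x_{k+1}, \ldots, x_n$ fresh.

\[
(\forall_n\ \text{right},\ 0 \le n < k)\quad
\frac{x : \sigma \vdash_{co} \lambda x_1{:}\tau_1 \ldots \lambda x_k{:}\tau_k.M \;:\; \tau_1 \to (\ldots (\tau_n \to \tau) \ldots)}
{x : \sigma \vdash_{co} \lambda x_1{:}\tau_1 \ldots \lambda x_n{:}\tau_n.\Lambda X.\lambda x_{n+1}{:}\tau_{n+1} \ldots \lambda x_k{:}\tau_k.M \;:\; \tau_1 \to (\ldots (\tau_n \to \forall X.\tau) \ldots)}
\]

for $X$ not free in $\sigma$ nor in $\tau_1, \ldots, \tau_n$, $M$ not of the form $\lambda y.M'$, and $\tau \equiv \tau_{n+1} \to (\ldots (\tau_k \to \tau') \ldots)$.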

The rule ($\forall_n$ right) has two cases in the labelled system.$^1$ In both cases, the premiss has exactly the same form but we will shortly see that the two cases are disjoint. In the rest of the paper, we will refer to judgments of either presentation (labelled and unlabelled) of $\mathrm{Co}^{\vdash}$ as ‘sequents’. We will use $S$ for sequents and $\Pi, \Sigma$ for derivations of sequents. Obviously a sequent $\sigma \vdash \tau$ is derivable in the unlabelled system $\mathrm{Co}^{\vdash}$ iff $x : \sigma \vdash M : \tau$ is derivable in the labelled system for some term $M$. If only the derivability of subtyping judgements is studied, then it is enough to consider the unlabelled system.

$^1$ Tiuryn, in [30], presents ($\forall_n$ right) in another, slightly simpler form.

Many of the proofs in this paper are by induction on the size of a derivation. This notion of size is defined as the total number of applications of rules in a derivation. In this work, a coercion is defined as follows:

DEFINITION 2.1 (Coercion) A sequent $x : \sigma \vdash_{co} M : \tau$ is a coercion from $\sigma$ to $\tau$ iff it is derivable in $\mathrm{Co}^{\vdash}$. As a linguistic shorthand, we will also refer to the term $M$ as a coercion when the sequent $x : \sigma \vdash_{co} M : \tau$ is derivable.

To illustrate $\mathrm{Co}^{\vdash}$, observe that $\bot = \forall X.X$ (the empty type) is provably a subtype of all types. The corresponding coercion is obtained from the following derivation:

\[
\frac{y : [\tau/X]X \vdash_{co} y : \tau}{x : \forall X.X \vdash_{co} x\tau : \tau}\ (\forall\ \text{left})
\]

2.3 Some structural lemmas

LEMMA 2.2 (Structure of coercions) Assume that $x : \sigma \vdash_{co} M : \tau$. Then, $M$ has exactly one free term variable which is $x$, and $x$ occurs exactly once in $M$ and always at head position. Furthermore, each bound term variable occurs exactly once in the body of $M$.

PROOF. By induction on the size of the derivation of $x : \sigma \vdash_{co} M : \tau$.

This lemma shows that coercions are linear terms (in the usual sense).

LEMMA 2.3 (Coercions are functions of System F) If $x : \sigma \vdash_{co} M : \tau$ then $x : \sigma \vdash_F M : \tau$.

PROOF. By induction on the size of the derivation of $x : \sigma \vdash_{co} M : \tau$.

LEMMA 2.4 (Coercions are in $\beta$-nf) If $x : \sigma \vdash_{co} M : \tau$ then $M$ is in $\beta$-nf.

PROOF. By induction on the size of the derivation of $x : \sigma \vdash_{co} M : \tau$, as no inference rule creates a $\beta$-redex.
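The shape conditions of Lemma 2.2 are easy to test mechanically. The sketch below is illustrative only: the tuple encoding of terms and the helper names are assumptions, and bound variable names are assumed distinct from free ones. It checks a necessary condition for being a coercion, not a sufficient one.

```python
# Illustrative term encoding (an assumption of this sketch):
#   ("var", x)   ("lam", x, ty, body)   ("app", f, a)
#   ("tlam", X, body)   ("tapp", f, ty)

def occurrences(t, name):
    """Number of free occurrences of the term variable `name` in t."""
    tag = t[0]
    if tag == "var":
        return 1 if t[1] == name else 0
    if tag == "lam":
        return 0 if t[1] == name else occurrences(t[3], name)
    if tag == "app":
        return occurrences(t[1], name) + occurrences(t[2], name)
    if tag == "tlam":
        return occurrences(t[2], name)
    if tag == "tapp":
        return occurrences(t[1], name)
    raise ValueError(f"unknown term constructor: {tag}")

def free_vars(t):
    """Set of free term variables of t."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "lam":
        return free_vars(t[3]) - {t[1]}
    if tag == "app":
        return free_vars(t[1]) | free_vars(t[2])
    if tag == "tlam":
        return free_vars(t[2])
    if tag == "tapp":
        return free_vars(t[1])
    raise ValueError(f"unknown term constructor: {tag}")

def head(t):
    """Head variable: strip abstractions, then follow the application spine."""
    while t[0] in ("lam", "tlam"):
        t = t[3] if t[0] == "lam" else t[2]
    while t[0] in ("app", "tapp"):
        t = t[1]
    return t[1] if t[0] == "var" else None

def binders_linear(t):
    """Every lambda-bound variable occurs exactly once in its body."""
    tag = t[0]
    if tag == "var":
        return True
    if tag == "lam":
        return occurrences(t[3], t[1]) == 1 and binders_linear(t[3])
    if tag == "app":
        return binders_linear(t[1]) and binders_linear(t[2])
    if tag == "tlam":
        return binders_linear(t[2])
    if tag == "tapp":
        return binders_linear(t[1])
    raise ValueError(f"unknown term constructor: {tag}")

def has_coercion_shape(t, x):
    """Conditions of Lemma 2.2, with x as the only free variable."""
    return (free_vars(t) == {x} and occurrences(t, x) == 1
            and head(t) == x and binders_linear(t))

# lambda y : s1 . x y  (an eta-expansion of x)
eta_x = ("lam", "y", "s1", ("app", ("var", "x"), ("var", "y")))
# x T x  uses x twice, so it violates linearity
twice = ("app", ("tapp", ("var", "x"), "T"), ("var", "x"))
# lambda y . lambda z . x z y  has the right shape but swaps its arguments
swap = ("lam", "y", "t", ("lam", "z", "s",
        ("app", ("app", ("var", "x"), ("var", "z")), ("var", "y"))))
```

`has_coercion_shape` accepts `eta_x` and rejects `twice`. It also accepts `swap`, which satisfies every condition of Lemma 2.2 even though the argument-swapping term is not a coercion.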

Lemma 2.4 shows that the terms labelling the two cases of ($\forall_n$ right) are unambiguously defined since $M$, in the assumption, must be in $\beta$-nf. We need the following lemma about subterms in System F to prove two invertibility lemmas below.

LEMMA 2.5 Let $\Gamma \vdash P : \tau$. Consider an occurrence of a subterm $Q$ of $P$. Let $\Gamma_0$ be the environment that consists of all term variables free in $Q$. (Observe that $\Gamma_0$ includes free variables of $P$ that occur in $Q$, as well as bound variables of $P$ bound by a $\lambda$ with scope including this occurrence of $Q$; the types of these free and bound variables of $P$ are defined by $\Gamma$ and by the corresponding $\lambda$ binders.) Then:

1. $\Gamma_0 \vdash_F Q : \rho$ with $\rho$ uniquely determined by $\Gamma_0$ and $Q$.
2. Let $P'$ be some term containing at least one free occurrence of $y$. If $P \equiv [Q/y]P'$, then, since substitution excludes capture of free variables, $\Gamma_0 \vdash_F Q : \rho$ and $\Gamma_1, y : \rho \vdash_F P' : \tau$, with $\Gamma_0$ consisting of all free variables of $Q$, $\Gamma_1$ consisting of all free variables of $P'$ excluding $y$, and $\rho$ uniquely determined by $\Gamma_0$ and $Q$.

PROOF. By structural induction on some F-derivation of $\Gamma \vdash P : \tau$. In some places, we also use the well-known facts of admissibility of weakening and strengthening of F-contexts. Uniqueness of $\rho$ follows from the uniqueness of typing in F.

Base case for 1 and 2: axiom $\Gamma, x : \sigma \vdash x : \sigma$. Then, $\rho \equiv \sigma$, $Q \equiv P \equiv x$, $\Gamma_0 \equiv x : \sigma$.

Inductive step for 1: if $P \equiv Q$ then $\Gamma_0 \subseteq \Gamma$, $\rho \equiv \tau$, and we use strengthening of F-contexts. If not, consider the last rule of the derivation.

(i) ($\to$ elim):

\[
\frac{\Gamma \vdash_F M : \sigma \to \tau \qquad \Gamma \vdash_F N : \sigma}{\Gamma \vdash_F MN : \tau}
\]

Since we consider a single occurrence of $Q$, it belongs to $M$ or $N$. $\Gamma_0$ is not changed. Apply the inductive hypothesis.

(ii) ($\to$ intro):

\[
\frac{\Gamma, x : \sigma \vdash_F M : \tau}{\Gamma \vdash_F \lambda x{:}\sigma.M : \sigma \to \tau}
\]

Observe that $\Gamma_0$ is not changed. If $x \in FV(Q)$ the reason for its inclusion in $\Gamma_0$ is changed, else it doesn’t belong. Apply the inductive hypothesis.

(iii) ($\forall$ intro):

\[
\frac{\Gamma \vdash_F M : \sigma}{\Gamma \vdash_F \Lambda X.M : \forall X.\sigma}
\qquad \text{for } X \text{ not free in the type of any free term variable in } M
\]

$Q$ is a subterm of $M$, $\Gamma_0$ is not changed. Apply the inductive hypothesis.

(iv) ($\forall$ elim):

\[
\frac{\Gamma \vdash_F M : \forall X.\sigma}{\Gamma \vdash_F M\mu : [\mu/X]\sigma}
\]

Since $\mu$ is not a subterm, $Q$ is a subterm of $M$. $\Gamma_0$ is not changed. Apply the inductive hypothesis.

Inductive step for 2: by 1) applied to each occurrence of $Q$, $\rho$ is determined by $\Gamma_0$. If $P \equiv Q$ then $\rho \equiv \tau$, $P' \equiv y$, and the derivation $\Gamma_1, y : \rho \vdash y : \rho$ is an axiom. If not, we consider the last rule of the derivation as above.

(i) ($\to$ elim):

\[
\frac{\Gamma \vdash_F M : \sigma \to \tau \qquad \Gamma \vdash_F N : \sigma}{\Gamma \vdash_F MN : \tau}
\]

$M$ and $N$ have the structure $M \equiv [Q/y]M'$, $N \equiv [Q/y]N'$ and in at least one of the two cases, the substitution is not dummy. $\Gamma_0$ is not changed. By inductive hypothesis, we obtain $\Gamma_0 \vdash_F Q : \rho$. Let $\Gamma' \subseteq \Gamma_1$ and $\Gamma'' \subseteq \Gamma_1$ be the lists of free variables of $M'$, $N'$ respectively. Note that if one of $M'$, $N'$ does not contain $y$, it is identical to $M$, $N$, respectively. If $y \in FV(M'), FV(N')$, then by inductive hypothesis, $\Gamma', y : \rho \vdash_F M' : \sigma \to \tau$, $\Gamma'', y : \rho \vdash_F N' : \sigma$. Using weakening and ($\to$ elim), we have $\Gamma_1, y : \rho \vdash_F M'N' : \tau$.

If one of the substitutions $M \equiv [Q/y]M'$, $N \equiv [Q/y]N'$ is dummy, we don’t need the inductive hypothesis for this term but for the other, proceed as above.

(ii) ($\to$ intro):

\[
\frac{\Gamma, x : \sigma \vdash_F M : \tau}{\Gamma \vdash_F \lambda x{:}\sigma.M : \sigma \to \tau}
\]

Observe that $\Gamma_0$ is not changed in this case, because $x$ cannot belong to $FV(P)$. Apply the inductive hypothesis and proceed as above.

(iii) and (iv) ($\forall$ intro) and ($\forall$ elim): proceed as in the previous cases.

The following lemmas show more clearly the relationship between derivations in system F and in $\mathrm{Co}^{\vdash}$.

LEMMA 2.6 (F-invertibility with respect to the rules of $\mathrm{Co}^{\vdash}$) Let $x : \sigma \vdash M : \tau$ be a sequent and assume that it is obtained from some other sequent(s) $S_1 \equiv y_1 : \sigma_1 \vdash M_1 : \tau_1, \ldots, S_n \equiv y_n : \sigma_n \vdash M_n : \tau_n$ by application of one of the rules of $\mathrm{Co}^{\vdash}$, where $y_i$ does occur and is the only free variable of $M_i$. (Note that the $S_i$ are not necessarily derivable in either $\mathrm{Co}^{\vdash}$ or F.) Then, if $x : \sigma \vdash M : \tau$ is derivable in F, then $S_1, \ldots, S_n$ are also derivable in F.

PROOF. First, note that if $x : \sigma \vdash M : \tau$ is derivable in F, then $M$ cannot contain any free variables other than $x$. Proceed by case analysis of the three System $\mathrm{Co}^{\vdash}$ rules.

(i) ($\to$):

\[
\frac{x' : \sigma' \vdash M : \sigma \qquad y : \tau \vdash N : \tau'}{x : \sigma \to \tau \vdash \lambda x'{:}\sigma'.[xM/y]N : \sigma' \to \tau'}
\]

By assumption, the conclusion of this rule is derivable in System F, and thus also the following sequent: $x : \sigma \to \tau, x' : \sigma' \vdash_F [xM/y]N : \tau'$. Apply Lemma 2.5 to this sequent taking $P \equiv [xM/y]N$, $Q \equiv xM$ and $P' \equiv N$. $\Gamma_0 = \{x : \sigma \to \tau, x' : \sigma'\}$, $\Gamma_1 = \emptyset$ and $x : \sigma \to \tau, x' : \sigma' \vdash_F xM : \tau$, $y : \tau \vdash_F N : \tau'$. Lemma 2.5 may be applied again to $x : \sigma \to \tau, x' : \sigma' \vdash_F xM : \tau$ taking $M$ as $Q$. Note that $x' : \sigma'$ is the only free variable of $M$, and we have $x' : \sigma' \vdash_F M : \sigma$.

(ii) ($\forall$ left):

\[
\frac{y : [\rho/X]\sigma \vdash M : \tau}{x : \forall X.\sigma \vdash [x\rho/y]M : \tau}
\]

By assumption, the conclusion of this rule is derivable in System F. Apply Lemma 2.5 taking $P \equiv [x\rho/y]M$, $Q \equiv x\rho$ and $P' \equiv M$. $\Gamma_0 = \{x : \forall X.\sigma\}$, $\Gamma_1 = \emptyset$ and $x : \forall X.\sigma \vdash_F x\rho : [\rho/X]\sigma$, $y : [\rho/X]\sigma \vdash_F M : \tau$.

(iii) ($\forall$ right): it suffices to use properties of $\Lambda$-introduction in F; Lemma 2.5 is not needed.

LEMMA 2.7 (One case of invertibility in $\mathrm{Co}^{\vdash}$) If $x : \sigma \vdash_{co} \lambda x_1{:}\tau_1 \ldots \lambda x_k{:}\tau_k.S\rho : \tau_1 \to (\ldots (\tau_n \to [\rho/Y]\tau) \ldots)$ then $x : \sigma \vdash_{co} \lambda x_1{:}\tau_1 \ldots \lambda x_k{:}\tau_k.S : \tau_1 \to (\ldots (\tau_n \to \forall Y.\tau) \ldots)$.

PROOF. By induction on the $\mathrm{Co}^{\vdash}$ derivation of the judgement containing $S\rho$.

Base case: since the axiom does not have the form described in the assumption of the lemma, the lemma is vacuously true.

Inductive step: analysis of the rules of $\mathrm{Co}^{\vdash}$ shows that $x : \sigma \vdash_{co} \lambda x_1{:}\tau_1 \ldots \lambda x_k{:}\tau_k.S\rho : \tau_1 \to (\ldots (\tau_n \to [\rho/Y]\tau) \ldots)$ cannot be the conclusion of ($\forall_n$ right).

If it is the conclusion of ($\to$), then its right premiss must be of the form $y : \tau \vdash_{co} \lambda x_2{:}\tau_2 \ldots \lambda x_k{:}\tau_k.S'\rho : \tau_2 \to (\ldots (\tau_n \to [\rho/Y]\tau) \ldots)$. Apply the inductive hypothesis and the same rule.

If it is the conclusion of ($\forall$ left), then if $S$ is different from the variable $x$, apply the inductive hypothesis and the same rule. Suppose $S \equiv x$. The term $\lambda x_1{:}\tau_1 \ldots \lambda x_k{:}\tau_k.y$ cannot be obtained by one of the rules of $\mathrm{Co}^{\vdash}$ if the prefix is non-empty. Thus, the prefix is empty and the premiss of ($\forall$ left) is the axiom $y : [\rho/X]\sigma \vdash_{co} y : [\rho/Y]\tau$. The judgement $x : \forall X.\sigma \vdash_{co} x : \forall X.\sigma$ is also an axiom. This proves the lemma.

Clearly, $\mathrm{Co}^{\vdash}$ is strictly weaker than System F. Consider, for example:

\[
x : \forall X.X \vdash_F x(\forall X.X \to \forall X.X)x : \forall X.X
\]

but

\[
x : \forall X.X \nvdash_{co} x(\forall X.X \to \forall X.X)x : \forall X.X
\]

Even restricting terms to linear ones, and environments to those containing exactly one variable, is not enough to produce a coercion. For example,

\[
x : \sigma \to (\tau \to \rho) \vdash_F \lambda y{:}\tau.\lambda z{:}\sigma.(x\,z\,y) : \tau \to (\sigma \to \rho)
\]

is not a coercion. Theorem 3.3 (completeness) will characterize those System F functions that are coercions.

By the Curry–Howard isomorphism, $\mathrm{Co}^{\vdash}$ is thus a proper subsystem of intuitionistic (linear) second-order propositional logic. This is only natural since coercions are intended to represent ‘inclusions’ of types. Clearly, there is no reason why arbitrary F-entailment (or even isomorphisms of types) should be interpreted as inclusion.

2.4 Special forms of derivations

DEFINITION 2.8 (Pure variable derivation) A pure variable derivation is a derivation in which no type variable occurs both free and bound in the same sequent.

This notion of a pure variable derivation is comparable with that used by Kleene [17].

LEMMA 2.9 Every derivation in $\mathrm{Co}^{\vdash}$ is equal, up to $=_\alpha$, to any derivation obtained by safely renaming bound type variables in types and terms, without capturing free type variables.

PROOF. By induction on the size of the derivation. In the case of ($\forall$ left), use the identity $[\rho/X]\sigma = [\rho/X']([X'/X]\sigma)$ where $X'$ does not occur free in $\sigma$ (otherwise $X'$ would be captured), and in the case of ($\forall_n$ right), uniformly substitute $[X'/X]$ in the derivation of the premiss and then use ($\forall_n$ right) with $X'$ instead of $X$ (the side-condition is used). For ($\to$), it is enough to use the induction hypothesis.

LEMMA 2.10 Every derivation in $\mathrm{Co}^{\vdash}$ is equal, up to $=_\alpha$, to a pure variable derivation.

PROOF. By induction and using the previous lemma (always choosing fresh bound variables).

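To illustrate the utility of pure variable derivations, consider the following derivation:

\[
\dfrac{
  \begin{array}{c} \Pi_l \\ x' : \sigma' \vdash_{co} M : \sigma \end{array}
  \qquad
  \dfrac{\begin{array}{c} \Pi_r \\ y : \tau \vdash_{co} N : \tau' \end{array}}{y : \tau \vdash_{co} \Lambda X.N : \forall X.\tau'}\ (\forall_0\ \text{right})
}{x : \sigma \to \tau \vdash_{co} \lambda x'{:}\sigma'.\Lambda X.[xM/y]N : \sigma' \to \forall X.\tau'}\ (\to)
\]

By the side-condition on ($\forall_0$ right), we know that $X$ is not free in $\tau$. Our aim now is to permute the applications of ($\forall_0$ right) and ($\to$) as follows:

\[
\dfrac{
  \dfrac{
    \begin{array}{c} \Pi_l \\ x' : \sigma' \vdash_{co} M : \sigma \end{array}
    \qquad
    \begin{array}{c} \Pi_r \\ y : \tau \vdash_{co} N : \tau' \end{array}
  }{x : \sigma \to \tau \vdash_{co} \lambda x'{:}\sigma'.[xM/y]N : \sigma' \to \tau'}\ (\to)
}{x : \sigma \to \tau \vdash_{co} \lambda x'{:}\sigma'.\Lambda X.[xM/y]N : \sigma' \to \forall X.\tau'}\ (\forall_1\ \text{right})
\]

However, this second derivation is only possible if the variable $X$ is not free in $\sigma$ nor in $\sigma'$, as required by ($\forall_1$ right). This is the case if the first derivation is a pure variable one, for then the $X$ bound in $\forall X.\tau'$ would be fresh.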

From now on, we will work exclusively with pure variable derivations, referring to them as just ‘derivations’.

LEMMA 2.11 The following pairs of derivations are equal in $\mathrm{Co}^{\vdash}$ up to $=_\alpha$:

1.
\[
\dfrac{\dfrac{\Pi}{S_1}\ (\forall_n\ \text{right})}{S_2}\ (\forall\ \text{left})
\;=\;
\dfrac{\dfrac{\Pi}{S_1'}\ (\forall\ \text{left})}{S_2}\ (\forall_n\ \text{right})
\]

2.
\[
\dfrac{\Pi_l \qquad \dfrac{\Pi_r}{S_1}\ (\forall_n\ \text{right})}{S_2}\ (\to)
\;=\;
\dfrac{\dfrac{\Pi_l \qquad \Pi_r}{S_1'}\ (\to)}{S_2}\ (\forall_{n+1}\ \text{right})
\]

PROOF. By simple consideration of the types and terms involved in each sequent. Case 1 applies to any kind of derivation whereas case 2 requires a pure variable derivation. Indeed, the proof of case 2, permuting ($\forall_n$ right) for $n = 0$ and ($\to$), is given by the examples of pure variable derivations above. It is checked directly that the terms labelling the end sequents of both the left and right derivations above are identical (thus, the derivations are equal up to $=_\alpha$).

Note that ($\forall_n$ right) becomes ($\forall_{n+1}$ right) when it is permuted downwards with ($\to$). Note also that case 1 applies only when ($\forall_n$ right) is permuted downwards with ($\forall$ left), not conversely.

DEFINITION 2.12 (Atomic derivation) An atomic derivation is one in which all axioms are of the form $x : X \vdash_{co} x : X$.

L EMMA 2.13 Every derivation in Co ` is equal, up to 2 It

= , to some atomic derivation 0 .2

will be shown in the next section that coercions are closed under  -conversion.

PROOF. Let $\pi$ be a derivation of $x:\sigma \vdash_{co} M:\tau$. If $\pi$ is not atomic then an axiom of the form $x:\rho \vdash_{co} x:\rho$, where $\rho$ is not a type variable, is used at a leaf. It suffices to prove the thesis for such axioms by induction on the length of types. The base case, when $\rho$ is a type variable, is obvious.

Inductive case: $\rho = \forall X.\rho'$. Construct the following derivation:

$$
\dfrac{\dfrac{y:[X/X]\rho' \vdash_{co} y:\rho'}{x:\forall X.\rho' \vdash_{co} xX : \rho'}\ (\forall\ \text{left})}{x:\forall X.\rho' \vdash_{co} \Lambda X.xX : \forall X.\rho'}\ (\forall_0\ \text{right})
$$

where $\Lambda X.xX =_\eta x$, and apply induction to $y:\rho' \vdash_{co} y:\rho'$.

Inductive case: $\rho = \rho_1 \to \rho_2$. Construct the following derivation:

$$
\dfrac{y:\rho_1 \vdash_{co} y:\rho_1 \qquad y':\rho_2 \vdash_{co} y':\rho_2}{x:\rho_1\to\rho_2 \vdash_{co} \lambda y:\rho_1.xy : \rho_1\to\rho_2}\ (\to)
$$

where $\lambda y:\rho_1.xy =_\eta x$, and apply induction to $y:\rho_1 \vdash_{co} y:\rho_1$ and $y':\rho_2 \vdash_{co} y':\rho_2$.
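The inductive construction in this proof can be read as a recursive procedure that computes the label of the atomic derivation replacing an axiom. The following is a minimal Python sketch under stated assumptions: the tuple encoding of types, the string rendering of terms, the function names `show` and `eta_expand`, and the fresh names `y1, y2, ...` are all illustrative and do not come from the paper.

```python
from itertools import count

# Hypothetical encoding of types (not from the paper):
#   ("var", "X") | ("arrow", dom, cod) | ("forall", "X", body)

def show(ty):
    """Render a type as a string."""
    if ty[0] == "var":
        return ty[1]
    if ty[0] == "arrow":
        return f"({show(ty[1])} → {show(ty[2])})"
    return f"(∀{ty[1]}. {show(ty[2])})"

def eta_expand(term, ty, fresh):
    """Label of an atomic derivation for the axiom at type `ty`,
    with `term` substituted for the axiom's variable."""
    if ty[0] == "var":
        # Axiom on a type variable: already atomic.
        return term
    if ty[0] == "forall":
        # (forall left) followed by (forall_0 right): ΛX. [term X / y] N
        _, tvar, body = ty
        applied = f"{term} {tvar}"
        return f"(Λ{tvar}. {eta_expand(applied, body, fresh)})"
    # Arrow case, rule (->): λy:dom. [term M / y'] N
    _, dom, cod = ty
    y = f"y{next(fresh)}"
    arg = eta_expand(y, dom, fresh)
    applied = f"{term} {arg}"
    return f"(λ{y}:{show(dom)}. {eta_expand(applied, cod, fresh)})"

poly_id = ("forall", "X", ("arrow", ("var", "X"), ("var", "X")))
print(eta_expand("x", poly_id, count(1)))
```

Each recursive call mirrors one inductive case of the proof, so the result is an expanded form of `x` whose only remaining axioms are on type variables.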

In the following lemma, we 'transform' atomic derivations to a useful form (for later purposes) by permuting rules as appropriate.

LEMMA 2.14 Let $\pi$ be an atomic derivation of $x:\sigma \vdash_{co} M : \tau_1 \to (\ldots(\tau_n \to \forall X.\tau)\ldots)$. Then, there exists an atomic derivation $\pi' = \pi$, with $\pi$ and $\pi'$ of equal size, and where $(\forall_n\ \text{right})$ is the last rule used in $\pi'$.

PROOF. Proceed by induction on the derivation $\pi$. Clearly, (ax) cannot have been used, and if $(\forall_n\ \text{right})$ was used last in $\pi$, then we are done. In the other two cases, use Lemma 2.11 to permute rules as follows:

Case: $(\forall\ \text{left})$ used last. Apply the induction hypothesis to the premiss, then permute $(\forall\ \text{left})$ with $(\forall_n\ \text{right})$ (Lemma 2.11, case 1).

Case: $(\to)$ used last. Apply the induction hypothesis to the left premiss, then permute $(\to)$ with $(\forall_{n-1}\ \text{right})$, the latter becoming $(\forall_n\ \text{right})$ in the final derivation (Lemma 2.11, case 2).

LEMMA 2.15 Let $\pi_1$ and $\pi_2$ be two atomic derivations of $x:\sigma \vdash_{co} M:\tau$ and $x:\sigma \vdash_{co} N:\tau$ respectively. Then, there exist atomic derivations $\pi_1' = \pi_1$ and $\pi_2' = \pi_2$ such that $\pi_1'$ and $\pi_2'$ end with the same rule.


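2.5 The transitivity problem

In Section 3.3, we will show that $\vdash_{co}$ is a transitive relation. That is, in the unlabelled system, the rule

$$
\dfrac{\sigma \vdash_{co} \tau \qquad \tau \vdash_{co} \rho}{\sigma \vdash_{co} \rho}\ (\text{cut})
$$

is admissible. Its labelled version needs some discussion. A 'naive' variant would be:

$$
\dfrac{x:\sigma \vdash_{co} M:\tau \qquad y:\tau \vdash_{co} N:\rho}{x:\sigma \vdash_{co} [M/y]N : \rho}
$$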

But note that the term $[M/y]N$ may not be a coercion; in particular, it may not be in $\beta$-nf (cf. Lemma 2.4). However, without making use of the strong normalization property of System F, we can prove that $[M/y]N$ normalizes:

LEMMA 2.16 Assume that $x:\sigma \vdash_{co} M:\tau$ and $y:\tau \vdash_{co} N:\rho$. Then, there exists a unique, finite path of $\beta$-reductions from $[M/y]N$. Consequently, there exists a unique $\beta$-nf of $[M/y]N$.

PROOF. By the linearity of coercions (Lemma 2.2), $M$ and $N$ are linear, and so the term $[M/y]N$ is also linear (remember that the only free term variable of $M$ is $x$ and of $N$ is $y$). Observe then that, because $M$ and $N$ are in $\beta$-nf (Lemma 2.4), only one $\beta$-redex may exist in $[M/y]N$. Each subsequent $\beta$-reduction from $[M/y]N$ preserves linearity and may introduce only one new $\beta$-redex. Furthermore, each such $\beta$-reduction decreases the number of $\lambda$s, say $n$, originally in $[M/y]N$. Thus, there is a unique path, with at most $n$ steps, of $\beta$-reductions from $[M/y]N$.

The existence and unicity of the $\beta$-nf of $[M/y]N$ now motivates the following alternative version of the labelled (cut)-rule, where $nf([M/y]N)$ means the $\beta$-nf of $[M/y]N$:

$$
\dfrac{x:\sigma \vdash_{co} M:\tau \qquad y:\tau \vdash_{co} N:\rho}{x:\sigma \vdash_{co} nf([M/y]N) : \rho}\ (\text{cut})
$$

Our aim will be to show that $x:\sigma \vdash_{co} nf([M/y]N) : \rho$ is actually a coercion (i.e. derivable in $Co^{\vdash}$). In other words, that the rule above is admissible. Note that (cut) is a 'cut' rule in the usual sense of sequent calculi. Thus, proving the admissibility of (cut) is equivalent to proving a cut-elimination theorem for the extended system $Co^{\vdash}$ plus (cut). Observe too that subject reduction is vacuously true in the extended system: all terms in $Co^{\vdash}$ plus (cut) are in $\beta$-nf.

3 Completeness, transitivity and coherence

3.1 Completeness of $Co^{\vdash}$

The standard notion of erasure, defined as follows, will serve to characterize coercions.

DEFINITION 3.1 (Erasure) The erasure of a polymorphic term to a type-free term is defined by:

$$
\begin{array}{lcl}
erase(x) &\equiv& x\\
erase(\lambda x:\sigma.M) &\equiv& \lambda x.erase(M)\\
erase(\Lambda X.M) &\equiv& erase(M)\\
erase(MN) &\equiv& erase(M)\,erase(N)\\
erase(M\sigma) &\equiv& erase(M)
\end{array}
$$

By straightforward induction on derivations, one shows that the erasure of a coercion $\eta$-reduces to the identity (in the type-free $\lambda$-calculus).

LEMMA 3.2 If $x:\sigma \vdash_{co} M:\tau$ then $erase(M) \to_\eta x$.
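Erasure and the reduction of an erased coercion to its variable can be illustrated concretely. The sketch below is a minimal Python illustration, not the paper's formalization: the tuple encoding of terms and the helper names `free_vars` and `eta_reduce` are assumptions, and the type annotations `"sigma"` and `"rho"` are opaque placeholders. The example term is the label $\lambda x_1{:}\sigma.\,x\,x_1\,\rho$ obtained by $(\to)$ from an axiom and an instance of $(\forall\ \text{left})$.

```python
# Hypothetical tuple encoding (names are illustrative, not from the paper):
#   typed:     ("var", x) | ("abs", x, ty, body) | ("app", f, a)
#              | ("tabs", X, body) | ("tapp", f, ty)
#   type-free: ("var", x) | ("lam", x, body) | ("app", f, a)

def erase(t):
    """Erasure of a polymorphic term, clause by clause as in Definition 3.1."""
    tag = t[0]
    if tag == "var":
        return t
    if tag == "abs":
        return ("lam", t[1], erase(t[3]))
    if tag == "app":
        return ("app", erase(t[1]), erase(t[2]))
    if tag == "tabs":
        return erase(t[2])
    if tag == "tapp":
        return erase(t[1])
    raise ValueError(f"unknown term: {t!r}")

def free_vars(t):
    """Free variables of a type-free term."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def eta_reduce(t):
    """Eta-normalize a type-free term, bottom-up."""
    if t[0] == "var":
        return t
    if t[0] == "app":
        return ("app", eta_reduce(t[1]), eta_reduce(t[2]))
    y, body = t[1], eta_reduce(t[2])
    # λy. f y  reduces to  f  when y is not free in f.
    if body[0] == "app" and body[2] == ("var", y) and y not in free_vars(body[1]):
        return body[1]
    return ("lam", y, body)

# λx1:sigma. (x x1) rho
coercion = ("abs", "x1", "sigma",
            ("tapp", ("app", ("var", "x"), ("var", "x1")), "rho"))
print(eta_reduce(erase(coercion)))
```

Here the erasure is $\lambda x_1.\,x\,x_1$, a single $\eta$-redex, which contracts to the variable `x`.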

Conversely, we will now show that any System F term in $\beta$-nf whose erasure $\eta$-reduces to a term variable is a coercion. This will give a complete characterization of coercions. Moreover, it will be a key step in the proof of transitivity of $\vdash_{co}$ and in relating our system to that of Mitchell [24] (Section 4). The proof proceeds by syntactic analysis of the normal form.

THEOREM 3.3 (Completeness) Let $M$ be a term in $\beta$-nf such that $x:\sigma \vdash_F M:\tau$ and $erase(M) \to_\eta x$. Then, $x:\sigma \vdash_{co} M:\tau$.

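PROOF. By the assumption that $M$ is in $\beta$-nf, $M$ must have the following structure:

$$
M \equiv \Lambda\vec{Y}_1.\lambda y_1{:}\tau_1.\ \ldots\ \Lambda\vec{Y}_k.\lambda y_k{:}\tau_k.\ \Lambda\vec{Y}_{k+1}.\ x'\,\vec{\rho}_1 M_1 \ldots \vec{\rho}_n M_n\,\vec{\rho}_{n+1}
$$

with each subterm $M_i$, for $1 \le i \le n$, in $\beta$-nf. Furthermore, the assumption $erase(M) \to_\eta x$ implies that $x' \equiv x$, $k = n$ and $erase(M_i) \to_\eta y_i$ for $1 \le i \le n$. The fact that $erase(M_i) \to_\eta y_i$ means that $y_i$ is the only free term variable in $M_i$. So, by Lemma 2.5: $y_i : \tau_i \vdash_F M_i : \mu_i$. (The concrete form of $\mu_i$ may be found but we don't need it.) We thus have that the subterms $M_i$, $1 \le i \le n$, are in $\beta$-nf with $erase(M_i) \to_\eta y_i$ and $y_i : \tau_i \vdash_F M_i : [\vec{\rho}_1/\vec{X}_1; \ldots; \vec{\rho}_i/\vec{X}_i]\sigma_i$. The proof now proceeds by induction on the size of $M$.

Base case: $M \equiv x$ is trivial.

Inductive step: three cases as follows.

(a) Case: at least one of the lists $\vec{Y}_k$ is non-empty. $M$ then has the form

$$
M \equiv \lambda y_1{:}\tau_1.\ \ldots\ \lambda y_{i-1}{:}\tau_{i-1}.\ \Lambda\vec{Y}_i.\lambda y_i{:}\tau_i.\ \ldots\ \Lambda\vec{Y}_n.\lambda y_n{:}\tau_n.\ \Lambda\vec{Y}_{n+1}.\ x\,\vec{\rho}_1 M_1 \ldots \vec{\rho}_n M_n\,\vec{\rho}_{n+1}
$$

where $\vec{Y}_i$ is the leftmost of such lists. The type $\tau$ of $M$ is

$$
\tau = \tau_1 \to (\ldots \to \forall\vec{Y}_i.(\tau_i \to \ldots \forall\vec{Y}_n.(\tau_n \to \forall\vec{Y}_{n+1}.\tau_0)\ldots)).
$$

Consider the term $M'$ obtained by erasing only $\Lambda\vec{Y}_i$, and the type $\tau'$ of $M'$:

$$
\tau' = \tau_1 \to (\ldots \to (\tau_i \to \ldots \forall\vec{Y}_n.(\tau_n \to \forall\vec{Y}_{n+1}.\tau_0)\ldots)).
$$

The judgement $x:\sigma \vdash M:\tau$ is derivable from the judgement $x:\sigma \vdash M':\tau'$ by repeated $(\forall_i\ \text{right})$. By Lemma 2.6, $x:\sigma \vdash_F M:\tau$ implies $x:\sigma \vdash_F M':\tau'$. Obviously $erase(M') \equiv erase(M) \to_\eta x$. Thus, by inductive hypothesis, $x:\sigma \vdash_{co} M':\tau'$. Now, by $(\forall_i\ \text{right})$, $x:\sigma \vdash_{co} M:\tau$.

(b) Case: all $\vec{Y}_i$ are empty but $\vec{\rho}_1$ is non-empty. The term $M$ has the structure

$$
M \equiv \lambda y_1{:}\tau_1 \ldots \lambda y_n{:}\tau_n.\ x\,\vec{\rho}_1 M_1 \ldots \vec{\rho}_n M_n\,\vec{\rho}_{n+1}.
$$

Notice that $x:\sigma \vdash_F x\vec{\rho}_1 : \sigma_0$ for a certain $\sigma_0$, since $x\vec{\rho}_1$ is a subterm of an F-term (Lemma 2.5). Moreover, $x:\sigma \vdash_{co} x\vec{\rho}_1 : \sigma_0$ (it may be derived by $(\forall\ \text{left})$ in $Co^{\vdash}$). Let $M'$ be obtained by replacement of $x\vec{\rho}_1$ by a fresh variable $z$. The judgement $x:\sigma \vdash_F M:\tau$ is derived by $(\forall\ \text{left})$ from $z:\sigma_0 \vdash M':\tau$. By Lemma 2.6, $z:\sigma_0 \vdash_F M':\tau$. Obviously, $erase(M') \to_\eta z$ because $erase(M) \to_\eta x$. By inductive hypothesis, $z:\sigma_0 \vdash_{co} M':\tau$ and by $(\forall\ \text{left})$, $x:\sigma \vdash_{co} M:\tau$.

(c) Case: all $\vec{Y}_i$ are empty and $\vec{\rho}_1$ is empty but $n \ne 0$ (else we have the base case). The type $\tau$ of $M$ is

$$
\tau = \tau_1 \to (\ldots \to (\tau_n \to \tau_0)\ldots).
$$

We need also the type

$$
\tau' = \tau_2 \to (\ldots \to (\tau_n \to \tau_0)\ldots).
$$

By inductive hypothesis, we may assume that the subterm $M_1$ is a coercion: $y_1 : \tau_1 \vdash_{co} M_1 : \sigma_1$.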

Consider the term $M'$ obtained by erasure of $\lambda y_1$ and replacement of $xM_1$ by a fresh variable $z$; the type of $z$ is easily derived since $xM_1$ is an F-term. The type of $x$ must be some $\sigma \equiv \sigma_1 \to \sigma_1'$ and $x:\sigma \vdash_F xM_1 : \sigma_1'$. The judgement $x:\sigma \vdash_F M:\tau$ is derived by the rule $(\to)$ from $z:\sigma_1' \vdash M':\tau'$. Thus, by Lemma 2.6, $z:\sigma_1' \vdash_F M':\tau'$. Obviously, $erase(M') \to_\eta z$. By inductive hypothesis, $z:\sigma_1' \vdash_{co} M':\tau'$ and by the rule $(\to)$, $x:\sigma \vdash_{co} M:\tau$.

Since $\eta$-reductions and expansions do not affect the property $erase(M) \to_\eta x$, we have the following corollary.

COROLLARY 3.4 For $M$ in $\beta$-nf, assume that $x:\sigma \vdash_F M:\tau$ and $M \to_\eta M'$. Then, $x:\sigma \vdash_{co} M:\tau$ iff $x:\sigma \vdash_{co} M':\tau$.

Since coercions are in $\beta$-normal form, this amounts to the subject reduction property for coercions.

3.2 Equality $=_{co}$

Equality of coercions is defined, essentially, by $\eta$-equality plus a coercion version of the $F_{\le}$ [8] rule (eq appl2). Since coercions are in $\beta$-normal form, $\beta$-equality need not be considered, and we showed previously that coercions are closed with respect to $\eta$-equality. The coercion version of (eq appl2) is motivated by Mitchell's system and semantical considerations based for example on PER models (see Section 6.3). Let us write $x:\sigma \vdash_{co} M =_{co} N : \tau$ to mean that $M$ and $N$ are equal coercions from $\sigma$ to $\tau$. This relation is defined as follows.

DEFINITION 3.5 (Equality of coercions) $=_{co}$ is the least equivalence relation generated by $\alpha$-convertibility and the two rules

$$
\dfrac{x:\sigma \vdash_{co} M:\tau \qquad x:\sigma \vdash_{co} M':\tau \qquad M =_\eta M'}{x:\sigma \vdash_{co} M =_{co} M' : \tau}\ (\text{eq}\ \eta)
$$

$$
\dfrac{y_1:[\rho_1/X]\sigma \vdash_{co} N_1:\tau \qquad y_2:[\rho_2/X]\sigma \vdash_{co} N_2:\tau}{x:\forall X.\sigma \vdash_{co} [x\rho_1/y_1]N_1 =_{co} [x\rho_2/y_2]N_2 : \tau}\ (\text{eq appl2 co})
$$

plus two other rules, (eq $\to$) and (eq $\forall_n$ right), which state, respectively, that $(\to)$ and $(\forall_n\ \text{right})$ preserve equality of coercions. (eq $\to$) and (eq $\forall_n$ right) are given in full in the appendix. Note that (eq appl2 co) implies that $(\forall\ \text{left})$ preserves equality of coercions: given $y:[\rho/X]\sigma \vdash_{co} M =_{co} N : \tau$, apply (eq appl2 co) with $\rho_1 \equiv \rho_2 \equiv \rho$, $N_1 \equiv M$ and $N_2 \equiv N$ to obtain $x:\forall X.\sigma \vdash_{co} [x\rho/y]M =_{co} [x\rho/y]N : \tau$.

The rules (eq $\to$) and (eq $\forall_n$ right) are derivable in System F using $\beta\eta$-convertibility. Rule (eq appl2 co), on the other hand, is not derivable in System F as it equates terms that are not $\beta\eta$-convertible, as shown by the following instance of the rule.

$$
\dfrac{y_1:[\rho_1/X]Y \vdash_{co} y_1:Y \qquad y_2:[\rho_2/X]Y \vdash_{co} y_2:Y}{x:\forall X.Y \vdash_{co} [x\rho_1/y_1]y_1 =_{co} [x\rho_2/y_2]y_2 : Y}\ (\text{eq appl2 co})
$$

Thus, $x\rho_1 =_{co} x\rho_2$ whereas $x\rho_1 \ne_{\beta\eta} x\rho_2$ in general, i.e. they are not equal in System F.

Clearly, though, $x\rho_1$ and $x\rho_2$ are equal in $F_{\le}$ [8] using the corresponding 'non-coercion' rule (eq appl2); see Section 1.1. Here are two examples to illustrate $=_{co}$ (recall that

F


3.3 Transitivity of $\vdash_{co}$

We now prove that $\vdash_{co}$ is a transitive relation. More formally, we will prove that

$$
\dfrac{x:\sigma \vdash_{co} M:\tau \qquad y:\tau \vdash_{co} N:\rho}{x:\sigma \vdash_{co} nf([M/y]N) : \rho}\ (\text{cut})
$$

is an admissible rule in $Co^{\vdash}$.

LEMMA 3.7 Assume that $x:\sigma \vdash_{co} M:\tau$ and $y:\tau \vdash_{co} N:\rho$. Then $erase(nf([M/y]N)) = nf(erase([M/y]N))$.

PROOF. Note first that, by an argument similar to the proof of Lemma 2.16, there exists a unique $\beta$-nf of $erase([M/y]N)$, i.e. $nf(erase([M/y]N))$ does indeed exist. The only difference is that there will be no $\beta^2$-reductions in the path from $erase([M/y]N)$. Next, by Lemma 2.16 directly, $nf([M/y]N)$ exists. Finally, the erasure of $nf([M/y]N)$ is clearly $\alpha$-equivalent to the unique $\beta$-nf of $erase([M/y]N)$.

THEOREM 3.8 (Transitivity) (cut) is an admissible rule in $Co^{\vdash}$.

PROOF. Assume that $x:\sigma \vdash_{co} M:\tau$ and $y:\tau \vdash_{co} N:\rho$. By Lemma 2.16, $nf([M/y]N)$ exists. Since $x:\sigma \vdash_F [M/y]N : \rho$, then, by subject reduction in System F, $x:\sigma \vdash_F nf([M/y]N) : \rho$. Furthermore, since coercions erase to the identity (Lemma 3.2), $erase(M) \to_\eta x$ and $erase(N) \to_\eta y$. Then:

$$
\begin{array}{lcll}
erase(nf([M/y]N)) &=& nf(erase([M/y]N)) & \text{by Lemma 3.7}\\
&\equiv& nf([erase(M)/y]\,erase(N))\\
&=_\eta& x.
\end{array}
$$

Finally, by completeness of $Co^{\vdash}$ (Theorem 3.3), $x:\sigma \vdash_{co} nf([M/y]N) : \rho$.
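The type-free side of this argument can be run on a small instance. The following Python sketch is an illustration under assumptions, not the paper's construction: terms are type-free and encoded as tuples, the names `subst`, `beta_step` and `normalize` are hypothetical, and substitution is naive, which is only safe because bound names are assumed distinct from the free names being substituted. The two terms stand for the erasures $\lambda a.\,x\,a$ and $\lambda b.\,y\,b$ of two $\eta$-expanded identity coercions.

```python
# Type-free terms: ("var", x) | ("lam", x, body) | ("app", f, a)

def subst(t, y, s):
    """[s/y]t. Assumes bound names of t differ from the free names of s."""
    if t[0] == "var":
        return s if t[1] == y else t
    if t[0] == "lam":
        return t if t[1] == y else ("lam", t[1], subst(t[2], y, s))
    return ("app", subst(t[1], y, s), subst(t[2], y, s))

def beta_step(t):
    """Contract the leftmost-outermost beta-redex, or return None."""
    if t[0] == "var":
        return None
    if t[0] == "lam":
        body = beta_step(t[2])
        return None if body is None else ("lam", t[1], body)
    f, a = t[1], t[2]
    if f[0] == "lam":
        return subst(f[2], f[1], a)
    r = beta_step(f)
    if r is not None:
        return ("app", r, a)
    r = beta_step(a)
    return None if r is None else ("app", f, r)

def normalize(t):
    """Return (beta-nf, number of steps). Terminates on the linear
    terms considered here; not guaranteed for arbitrary terms."""
    steps = 0
    while True:
        nxt = beta_step(t)
        if nxt is None:
            return t, steps
        t, steps = nxt, steps + 1

erased_m = ("lam", "a", ("app", ("var", "x"), ("var", "a")))  # λa. x a
erased_n = ("lam", "b", ("app", ("var", "y"), ("var", "b")))  # λb. y b

composite = subst(erased_n, "y", erased_m)                    # λb. (λa. x a) b
nf, steps = normalize(composite)
print(nf, steps)
```

In this instance the substitution creates exactly one $\beta$-redex, and the normal form $\lambda b.\,x\,b$ is one $\eta$-step away from `x`, as the completeness theorem requires.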

Notice that proving the admissibility of (cut) for the system $Co^{\vdash}$ is equivalent to proving a cut-elimination theorem for the extended system $Co^{\vdash}$ + (cut). See Section 6.4 for a discussion of second-order cut-elimination proofs. As pointed out in the introduction, if we had taken (cut) as a primitive of the system, then we would have had to eliminate it in any case in order to prove coherence. Moreover, (cut) as primitive would imply that coercions could 'compute on themselves' (even without inputs), whereas the coercions provided by $Co^{\vdash}$ are guaranteed to be in $\beta$-nf. Thus, they only transform an element of a type to a supertype without any other kind of computation.

3.4 Coherence of $Co^{\vdash}$

By coherence, we mean that a coercion from type $\sigma$ to type $\tau$ is independent of its derivation in $Co^{\vdash}$ in the sense that, if a coercion from $\sigma$ to $\tau$ exists, then it is unique, up to $=_{co}$. We are now in a position to prove the coherence of $Co^{\vdash}$ derivations. Note that this is where $=_{co}$ is used, and thus the strength of the rule (eq appl2 co); we display it first in a lemma.

LEMMA 3.9 Let $\pi_1$ and $\pi_2$ be two derivations of $x:\sigma \vdash_{co} M:\tau$ and $x:\sigma \vdash_{co} N:\tau$ ending with $(\forall\ \text{left})$. Then, $\pi_1 =_{co} \pi_2$.

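PROOF. Let $\sigma \equiv \forall X.\sigma'$, $M \equiv [x\rho_1/y_1]N_1$ and $N \equiv [x\rho_2/y_2]N_2$. The final steps in $\pi_1$ and $\pi_2$ are respectively:

$$
\dfrac{\begin{array}{c}\pi_l\\ y_1:[\rho_1/X]\sigma' \vdash_{co} N_1:\tau\end{array}}{x:\forall X.\sigma' \vdash_{co} [x\rho_1/y_1]N_1 : \tau}\ (\forall\ \text{left})
\qquad
\dfrac{\begin{array}{c}\pi_r\\ y_2:[\rho_2/X]\sigma' \vdash_{co} N_2:\tau\end{array}}{x:\forall X.\sigma' \vdash_{co} [x\rho_2/y_2]N_2 : \tau}\ (\forall\ \text{left})
$$

Now, use (eq appl2 co) on both premisses:

$$
\dfrac{\begin{array}{c}\pi_l\\ y_1:[\rho_1/X]\sigma' \vdash_{co} N_1:\tau\end{array} \qquad \begin{array}{c}\pi_r\\ y_2:[\rho_2/X]\sigma' \vdash_{co} N_2:\tau\end{array}}{x:\forall X.\sigma' \vdash_{co} [x\rho_1/y_1]N_1 =_{co} [x\rho_2/y_2]N_2 : \tau}\ (\text{eq appl2 co})
$$

Hence, by definition, $\pi_1 =_{co} \pi_2$.

THEOREM 3.10 (Coherence of $Co^{\vdash}$ derivations) Let $\pi_1$ and $\pi_2$ be two derivations of $x:\sigma \vdash_{co} M:\tau$ and $x:\sigma \vdash_{co} N:\tau$ respectively. Then, $\pi_1 =_{co} \pi_2$.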

PROOF. By Lemma 2.13, we may assume $\pi_1$ and $\pi_2$ to be atomic derivations, up to $=_\eta$. The proof then proceeds by induction on $n = \max\{size(\pi_1), size(\pi_2)\}$. By Lemma 2.15, there exist atomic derivations $\pi_1' = \pi_1$, $\pi_2' = \pi_2$ such that $\pi_1'$ and $\pi_2'$ end with the same rule and have the same size as $\pi_1$ and $\pi_2$ (respectively). Notice that, if the final rule applied is $(\to)$ or $(\forall_n\ \text{right})$, then the premisses in both derivations are the same.

Base case, $n = 0$: Both derivations are atomic axioms (trivial).

Inductive step, $n \ge 1$: There are three subcases:

Case: $(\forall\ \text{left})$ is used last. Then, Lemma 3.9 suffices. Note that, in this case, we do not need to apply the induction hypothesis.

Cases: $(\to)$ or $(\forall_n\ \text{right})$ is used last, the same in both derivations. Then, the respective premisses of the last rule in both derivations coincide. Use then the induction hypothesis on these premisses, followed by the equality rules (eq $\to$) or (eq $\forall_n$ right). Thus $\pi_1 =_{co} \pi_2$.

It will be an easy corollary of coherence and the transitivity of $\vdash_{co}$ (Section 3.3) that $x:\sigma \vdash_{co} M:\tau$ and $x:\tau \vdash_{co} N:\sigma$ implies that $\sigma$ is isomorphic to $\tau$. In other words, $\vdash_{co}$ is anti-symmetric up to isomorphism.



3.5 Bicoercibility

Consider now the term model $|Co^{\vdash}|$ of $Co^{\vdash}$, i.e., the structure whose objects are types and arrows are coercions. $|Co^{\vdash}|$ is a category. Indeed, by (ax), it contains all identities. By the transitivity of $\vdash_{co}$, coercions (arrows) compose: just observe that if $x:\sigma \vdash_{co} M:\tau$ and $y:\tau \vdash_{co} N:\rho$ then there exists $P$ such that $x:\sigma \vdash_{co} P:\rho$, and $P$ is unique by coherence. As for associativity, this is again given by coherence. $|Co^{\vdash}|$ is even a partial order: by the corollary below, anti-symmetry of $\vdash_{co}$ is a consequence of coherence and transitivity of entailment. Define first the following relation of bicoercibility between types:

DEFINITION 3.11 (Bicoercibility) Two types $\sigma$ and $\tau$ are bicoercible, written $\sigma \cong_b \tau$, iff $\sigma \vdash_{co} \tau$ and $\tau \vdash_{co} \sigma$.

For example, $\sigma \cong_b \forall X.\sigma$ for $X$ not free in $\sigma$. Now, recall that, in a category, two objects $A$ and $B$ are isomorphic, $A \cong B$, if there are maps $f : A \to B$ and $g : B \to A$ such that $g \circ f = id$ and $f \circ g = id$. Thus, one can prove the following:

COROLLARY 3.12 (Anti-symmetry) If $\sigma \cong_b \tau$ then $\sigma \cong \tau$ in $|Co^{\vdash}|$.

PROOF. By assumption, $x:\sigma \vdash_{co} M:\tau$ and $y:\tau \vdash_{co} N:\sigma$. By (cut), we obtain $x:\sigma \vdash_{co} nf([M/y]N) : \sigma$ and $y:\tau \vdash_{co} nf([N/x]M) : \tau$; then, by coherence, $nf([M/y]N) =_{co} x$ and $nf([N/x]M) =_{co} y$.

Note that bicoercibility is strictly stronger than isomorphism: the type $\sigma \to (\tau \to \rho)$ is isomorphic to $\tau \to (\sigma \to \rho)$ (see [28, 3] for a characterization) but it is clearly not a subtype, and so not bicoercible. Tiuryn in [29] has shown that bicoercibility is decidable, while Tiuryn and Urzyczyn [31] have shown that coercibility $\vdash_{co}$, that is, subtyping, is undecidable.

As pointed out, $|Co^{\vdash}|$ is a category and a partial order. This allows a preliminary observation on adding base types (int, real, etc.) with axioms introducing $\vdash_{co}$ between these types (e.g. int $\vdash_{co}$ real). In short, one obtains the freely generated partial order, from these base types, by our axioms and rules. A proof-theoretic analysis of this fact will be given in Section 5.

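4 Mitchell's subtyping system

In [24], a 'retyping function' is defined as a typed term in System F whose erasure $\beta\eta$-reduces to the identity. In [24, Lemma 9], it is then shown that $\sigma$ is a subtype of $\tau$ in all 'simple inference models' (as defined in [24, Section 4.2]) if and only if there is a retyping function from $\sigma$ to $\tau$. Thus, our Theorem 3.3 also yields semantic completeness, in the sense of Mitchell, for $Co^{\vdash}$. In this section, we give a direct comparison of $Co^{\vdash}$ to Mitchell's axiomatic approach to subtyping, presented here in a revised (but clearly equivalent) way.

Mitchell's Subtyping System

$$
\begin{array}{ll}
(\text{ax}) & \sigma \subseteq \sigma\\[1ex]
(\text{trans}) & \dfrac{\sigma \subseteq \tau \qquad \tau \subseteq \rho}{\sigma \subseteq \rho}\\[2ex]
(\to) & \dfrac{\sigma' \subseteq \sigma \qquad \tau \subseteq \tau'}{(\sigma \to \tau) \subseteq (\sigma' \to \tau')}\\[2ex]
(\forall\ \text{subst}) & \dfrac{\sigma \subseteq \tau}{\forall X.\sigma \subseteq \forall X.\tau}\\[2ex]
(\forall\ \text{intro}) & \sigma \subseteq \forall X.\sigma \quad \text{for } X \text{ not free in } \sigma\\[1ex]
(\forall\ \text{elim}) & \forall X.\sigma \subseteq [\rho/X]\sigma\\[1ex]
(\forall{\to}\ \text{distr}) & \forall X.(\sigma \to \tau) \subseteq (\forall X.\sigma) \to (\forall X.\tau)
\end{array}
$$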

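It is not hard to show that the unlabelled system $Co^{\vdash}$ and Mitchell's system are equivalent.

THEOREM 4.1 $\sigma \subseteq \tau$ iff $\sigma \vdash_{co} \tau$.

PROOF. Clearly, rules (ax) and $(\to)$ are identical in the two systems (with $\vdash_{co}$ for $\subseteq$). For the implication from left to right, the rule (trans) in Mitchell's system above corresponds to the $Co^{\vdash}$ rule (cut) which, by Theorem 3.8, is admissible for $Co^{\vdash}$; the other cases in this direction are proven as follows:

Case: $(\forall\ \text{intro})$ is derivable in $Co^{\vdash}$ since $X$ is not free in $\sigma$:

$$
\dfrac{\sigma \vdash_{co} \sigma}{\sigma \vdash_{co} \forall X.\sigma}\ (\forall_0\ \text{right})
$$

Case: $(\forall\ \text{elim})$ is derivable in $Co^{\vdash}$:

$$
\dfrac{[\rho/X]\sigma \vdash_{co} [\rho/X]\sigma}{\forall X.\sigma \vdash_{co} [\rho/X]\sigma}\ (\forall\ \text{left})
$$

Case: $(\forall\ \text{subst})$ is derivable in $Co^{\vdash}$:

$$
\dfrac{\dfrac{[X/X]\sigma \vdash_{co} \tau}{\forall X.\sigma \vdash_{co} \tau}\ (\forall\ \text{left})}{\forall X.\sigma \vdash_{co} \forall X.\tau}\ (\forall_0\ \text{right})
$$

Case: $(\forall{\to}\ \text{distr})$ is derivable in $Co^{\vdash}$ + (cut):

$$
\dfrac{\dfrac{\forall X.(\sigma\to\tau) \vdash_{co} \sigma\to\tau \qquad \dfrac{\forall X.\sigma \vdash_{co} \sigma \qquad \tau \vdash_{co} \tau}{\sigma\to\tau \vdash_{co} (\forall X.\sigma)\to\tau}\ (\to)}{\forall X.(\sigma\to\tau) \vdash_{co+cut} (\forall X.\sigma)\to\tau}\ (\text{cut})}{\forall X.(\sigma\to\tau) \vdash_{co+cut} (\forall X.\sigma)\to(\forall X.\tau)}\ (\forall_1\ \text{right})
$$

Conversely, the remaining cases of the implication from right to left are proven as follows:

Case: $(\forall\ \text{left})$ is derivable in Mitchell's system by

$$
\forall X.\sigma \subseteq [\rho/X]\sigma \subseteq \tau
$$

using (trans) on $(\forall\ \text{elim})$ and the premiss of $(\forall\ \text{left})$, i.e., $[\rho/X]\sigma \subseteq \tau$.

Case: $(\forall_n\ \text{right})$ is derivable in Mitchell's system as follows:

$$
\begin{array}{rll}
\sigma &\subseteq \forall X.\sigma & \text{by } (\forall\ \text{intro}) \text{ and the side-condition } X \text{ not free in } \sigma\\
&\subseteq \forall X.(\tau_1 \to \ldots(\tau_n \to \tau)\ldots) & \text{by } (\forall\ \text{subst}) \text{ on the } (\forall_n\ \text{right}) \text{ premiss}\\
&\subseteq (\forall X.\tau_1) \to \forall X.(\tau_2 \to \ldots(\tau_n \to \tau)\ldots) & \text{by } (\forall{\to}\ \text{distr})\\
&\subseteq (\forall X.\tau_1) \to ((\forall X.\tau_2) \to \forall X.(\tau_3 \to \ldots(\tau_n \to \tau)\ldots)) & \text{by } (\forall{\to}\ \text{distr}) \text{ and } (\to)\\
&\ \ \vdots\\
&\subseteq (\forall X.\tau_1) \to ((\forall X.\tau_2) \to \ldots((\forall X.\tau_n) \to (\forall X.\tau))\ldots) & \text{by } (\forall{\to}\ \text{distr}) \text{ and } (\to)\\
&\subseteq \tau_1 \to ((\forall X.\tau_2) \to \ldots((\forall X.\tau_n) \to (\forall X.\tau))\ldots) & \text{by } (\to),\ (\forall\ \text{intro}) \text{ and } X \text{ not free in } \tau_1\\
&\ \ \vdots\\
&\subseteq \tau_1 \to (\tau_2 \to \ldots(\tau_n \to \forall X.\tau)\ldots) & \text{by } (\to),\ (\forall\ \text{intro}) \text{ and } X \text{ not free in } \tau_n.
\end{array}
$$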


5 Coercions and base types

This section extends the results of the previous sections to subtyping systems containing coercions between base types.

5.1 Systems $F + B$ and $Co^{\vdash} + B$

Consider extending the language of $Co^{\vdash}$ with type constants $\kappa_1, \kappa_2, \kappa_3, \ldots$ For example, these could be base types such as bool, int, real. As in the case of $Co^{\vdash}$, the extended system will be considered together with the corresponding extension of System F. System $F + B$ is obtained by adding to the language of System F type constants $\kappa_1, \kappa_2, \kappa_3, \ldots$ and term constants $c_{i,j}$ for each $i, j$ such that a subtyping relation holds between $\kappa_i$, $\kappa_j$, and adding an axiom (with an arbitrary context)

$$
\Gamma \vdash_{F+B} c_{i,j} : \kappa_i \to \kappa_j.
$$

System $Co^{\vdash} + B$ is defined in its labelled and unlabelled versions, with the language extended in the same way as for System F. In the labelled $Co^{\vdash} + B$ system, the following Gentzen-style rule is added:

$$
\dfrac{x:\sigma \vdash M : \kappa_i}{x:\sigma \vdash c_{i,j}M : \kappa_j}\ (\kappa_i \le \kappa_j)
$$

It is a derivable rule with respect to system $F + B$. In its unlabelled version, this rule takes the form:

$$
\dfrac{\sigma \vdash \kappa_i}{\sigma \vdash \kappa_j}\ (\kappa_i \le \kappa_j)
$$

Let $\vdash_{co+B}$ denote entailment in this extended system. What happens then to the subtyping partial order and the coherence and transitivity properties of the 'pure' calculus $Co^{\vdash}$? First, observe that the expected subtyping judgment $x:\kappa_i \vdash_{co} c_{i,j}x : \kappa_j$ is easy to derive: just take $\sigma = \kappa_i$ and $M \equiv x$ in the above rule. Indeed, as we now show, the new constants and rules introduce no new subtyping judgments beyond those expected. In a sense, base coercions act like variables; they do not compute.

LEMMA 5.1 (Conservativity of $Co^{\vdash} + B$ with respect to $Co^{\vdash}$) Assume that types $\sigma$ and $\tau$ do not contain occurrences of base types. Consider $x:\sigma \vdash_{co+B} M:\tau$ (in the labelled system). If $M$ contains neither base types nor $c_{i,j}$, then $x:\sigma \vdash_{co} M:\tau$. In the unlabelled system, if $\sigma \vdash_{co+B} \tau$ then $\sigma \vdash_{co} \tau$.

PROOF. First, let $\pi$ be a $Co^{\vdash} + B$ derivation (labelled or unlabelled) of a sequent $S$, where $S$ may contain base types. Observe that by uniformly substituting in $\pi$ a fresh variable $X$ for all base types $\kappa_1, \ldots, \kappa_n$ and omitting the $(\kappa_i \le \kappa_j)$ rules, we obtain a $Co^{\vdash}$ derivation of $[X/\kappa_1, \ldots, X/\kappa_n]S$. This is easily shown by induction on $Co^{\vdash} + B$ derivations. The lemma then holds as a corollary of this observation.

5.2 Structural lemmas concerning base types

We will require notions of equality on $Co^{\vdash} + B$ derivations, defined similarly to the equalities on $Co^{\vdash}$ derivations (see Section 3.2), and for which we will use the same symbols: $=$, $=_\eta$, $=_{co}$. At the moment, we need only $=$. (Recall that two derivations are $=$ when the proof terms labelling their final sequents are syntactically identical.)

LEMMA 5.2 A $(\kappa_i \le \kappa_j)$ rule cannot be applied at any point after an application of either $(\forall_n\ \text{right})$ or $(\to)$.

LEMMA 5.3 A $(\kappa_i \le \kappa_j)$ rule can be permuted with $(\forall\ \text{left})$ in both directions (the labels are not changed and the size of the new derivation is the same as the original one):

$$
\dfrac{\dfrac{S_1}{S_2}\ (\kappa_i \le \kappa_j)}{S_3}\ (\forall\ \text{left})
\quad=\quad
\dfrac{\dfrac{S_1}{S_2'}\ (\forall\ \text{left})}{S_3}\ (\kappa_i \le \kappa_j)
$$

LEMMA 5.4 Assume $\sigma \vdash_{co+B} \tau$ with derivation $\pi$.
(1) If $\tau$ contains neither $\to$ nor $\forall$, then $\pi$ may contain applications of only (ax), $(\kappa_i \le \kappa_j)$, and $(\forall\ \text{left})$.
(2) If $\sigma$ contains neither $\to$ nor $\forall$, then $\pi$ may contain applications of only (ax), $(\kappa_i \le \kappa_j)$, and $(\forall_0\ \text{right})$.

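PROOF. By induction on the size of $\pi$ in both cases. Note that, in both cases, the derivations have just one branch.

LEMMA 5.5 (Subtypes and supertypes of a base type)
(1) Let $\sigma \vdash_{co+B} \kappa$. Then, $\sigma = \forall X_l \ldots \forall X_1.\sigma'$, for some $l \ge 0$, with $\sigma'$ either a base type $\kappa_0$ or a variable $X_i$ for $i \in 1 \ldots l$.
(2) Let $\kappa \vdash_{co+B} \tau$. Then, $\tau = \forall X_m \ldots \forall X_1.\kappa_0$ for some base type $\kappa_0$ and $m \ge 0$.

PROOF. Consider statement (1). By Lemma 5.4 case 1, the derivation of $\sigma \vdash_{co+B} \kappa$ consists of applications of only (ax), $(\kappa_i \le \kappa_j)$, and $(\forall\ \text{left})$. The derivation is linear, not a tree. The applications of $(\kappa_i \le \kappa_j)$ can be permuted by Lemma 5.3 with the applications of $(\forall\ \text{left})$, if any. Furthermore, (ax) must be of the form $\kappa_0 \vdash_{co+B} \kappa_0$. Thus, an equal derivation of $\sigma \vdash_{co+B} \kappa$ may be constructed with the following structure (read from top to bottom, each line obtained by the rule named on its right):

$$
\begin{array}{ll}
\kappa_0 \vdash_{co+B} \kappa_0 & (\text{ax})\\
\quad\vdots & (\forall\ \text{left})\\
\forall X_l \ldots \forall X_1.\sigma' \vdash_{co+B} \kappa_0 & (\forall\ \text{left})\\
\forall X_l \ldots \forall X_1.\sigma' \vdash_{co+B} \kappa_p & (\kappa_0 \le \kappa_p)\\
\quad\vdots & \\
\forall X_l \ldots \forall X_1.\sigma' \vdash_{co+B} \kappa_q & \\
\forall X_l \ldots \forall X_1.\sigma' \vdash_{co+B} \kappa & (\kappa_q \le \kappa)
\end{array}
$$

where $k \ge 0$ is the number of applications of $(\kappa_i \le \kappa_j)$ rules, i.e., $(\kappa_0 \le \kappa_p), \ldots, (\kappa_q \le \kappa)$, and $l' \ge 0$ is the number of applications of $(\forall\ \text{left})$. Note that $l \le l'$.

Consider statement (2). By Lemma 5.4 case 2, the derivation of $\kappa \vdash_{co+B} \tau$ can contain applications of only (ax), $(\kappa_i \le \kappa_j)$, and $(\forall_0\ \text{right})$. By Lemma 5.2, all the applications of the $(\kappa_i \le \kappa_j)$ rules must appear before those of $(\forall_0\ \text{right})$. Furthermore, (ax) must be of the form $\kappa \vdash_{co+B} \kappa$. Hence, the derivation has the following structure:

$$
\begin{array}{ll}
\kappa \vdash_{co+B} \kappa & (\text{ax})\\
\kappa \vdash_{co+B} \kappa_p & (\kappa \le \kappa_p)\\
\quad\vdots & \\
\kappa \vdash_{co+B} \kappa_q & \\
\kappa \vdash_{co+B} \kappa_0 & (\kappa_q \le \kappa_0)\\
\kappa \vdash_{co+B} \forall X_1.\kappa_0 & (\forall_0\ \text{right})\\
\quad\vdots & \\
\kappa \vdash_{co+B} \forall X_m \ldots \forall X_1.\kappa_0 & (\forall_0\ \text{right})
\end{array}
$$

where $k \ge 0$ is the number of applications of $(\kappa_i \le \kappa_j)$ rules, i.e., $(\kappa \le \kappa_p), \ldots, (\kappa_q \le \kappa_0)$, and $m \ge 0$ is the number of applications of $(\forall_0\ \text{right})$.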

COROLLARY 5.6
(1) Let $x:\sigma \vdash_{co+B} M:\kappa$. Then, $M \equiv c_k(\ldots c_1(x\rho_{l'} \ldots \rho_1)\ldots)$ where $c_1, \ldots, c_k$ denote $c_{i,j}$ for corresponding $i, j$ and $[\rho_{l'}/X_l](\ldots([\rho_1/X]\sigma)\ldots) = \kappa_0$.³
(2) Let $x:\kappa \vdash_{co+B} M:\tau$. Then, $M \equiv \Lambda X_m \ldots \Lambda X_1.\,c_k(\ldots c_1(x)\ldots)$.

Note that the permutations of rules described in the lemma change neither labels nor the size of the derivations. Lemma 5.5 shows that adding extra base types and base coercions changes the subtyping partial order in a reasonable way: each base type has only itself, the empty type, or other base types, given by the extra subtyping rules, as its subtypes (up to bicoercibility, of course). Indeed, for case 1 of Lemma 5.5, $\sigma \cong_b \kappa_0$ or $\sigma \cong_b \forall X.X$, while for case 2, $\tau \cong_b \kappa_0$.

Modifications to structural lemmas of Section 2.3

Adding base types changes little of most of the formulations and proofs of Section 2.2. Systems $Co^{\vdash}$ and F should be replaced everywhere by $Co^{\vdash} + B$ and $F + B$ respectively. Sometimes an extra case should be considered in the proof: for example, Lemma 2.5 for the case when the rule $r$ is a base type rule. The definition of an atomic derivation (Definition 2.12) is modified as follows.

DEFINITION 5.7 (Atomic derivation with base types) An atomic derivation is one in which, for any base type $\kappa$, all axioms are of the form $x : X \vdash_{co+B} x : X$ or $x : \kappa \vdash_{co+B} x : \kappa$ in the labelled system, and of the form $X \vdash_{co+B} X$ or $\kappa \vdash_{co+B} \kappa$ in the unlabelled system, respectively.

Only Lemma 2.15 is modified more seriously as follows.

³ The number $l'$ need not be equal to $l$ since the types $\rho$ may themselves contain $\forall$.

LEMMA 5.8 Let $\pi_1$ and $\pi_2$ be two atomic derivations of $x:\sigma \vdash_{co+B} M:\tau$ and $x:\sigma \vdash_{co+B} N:\tau$ respectively. Then, there exist atomic derivations $\pi_1' = \pi_1$ and $\pi_2' = \pi_2$ such that $\pi_1'$ and $\pi_2'$ have the same size as $\pi_1$ and $\pi_2$, respectively, and

- either both end with the same rule,
- or at least one ends with a base type rule.

PROOF. In the case when both derivations end with some rule that is not a base type rule, the modified versions of Lemma 2.11 and Lemma 5.2 should be used instead of Lemma 2.11. If at least one ends with a base type rule, then Lemma 5.5 is applied.

To the modified version of Lemma 2.16 (which asserts that if $M, N$ are $Co^{\vdash} + B$ terms, then the $\beta$-nf of $[M/x]N$ exists and the reduction is linear), one may add the following obvious observation: if $M \equiv c_{i,j}M'$ then $[M/x]N$ is $\beta$-normal.

5.3 Completeness of $Co^{\vdash} + B$

The definition of erasure (see Section 3.1) must also be extended to account for base coercions.

DEFINITION 5.9 (Erasure of base coercions) $erase(c_{i,j}M) \equiv erase(M)$.

LEMMA 5.10 If $x:\sigma \vdash_{co+B} M:\tau$ then $erase(M) \to_\eta x$.
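The extended erasure can be sketched by adding one clause for base coercions to an erasure function. The Python fragment below is illustrative only: the tuple encoding, the tag `"coerce"`, and the base types `bool`, `int`, `real` with their two coercion constants are assumptions made for the example. The sample term is $\Lambda Y.\,c_{int,real}(c_{bool,int}(x\ bool))$, a label of the shape described in Corollary 5.6, for $x : \forall X.X$.

```python
# Hypothetical tuple encoding (illustrative, not from the paper):
#   ("var", x) | ("abs", x, ty, body) | ("app", f, a)
#   | ("tabs", X, body) | ("tapp", f, ty) | ("coerce", (i, j), M)
# Type-free results use ("var", x) | ("lam", x, body) | ("app", f, a).

def erase(t):
    """Erasure extended with base coercions (Definition 5.9)."""
    tag = t[0]
    if tag == "var":
        return t
    if tag == "abs":
        return ("lam", t[1], erase(t[3]))
    if tag == "app":
        return ("app", erase(t[1]), erase(t[2]))
    if tag == "tabs":
        return erase(t[2])
    if tag == "tapp":
        return erase(t[1])
    if tag == "coerce":
        # erase(c_{i,j} M) = erase(M): base coercions leave no trace.
        return erase(t[2])
    raise ValueError(f"unknown term: {t!r}")

term = ("tabs", "Y",
        ("coerce", ("int", "real"),
         ("coerce", ("bool", "int"),
          ("tapp", ("var", "x"), "bool"))))
print(erase(term))
```

Since neither type abstractions, type applications nor base coercions survive erasure, this term erases directly to the variable `x`.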

THEOREM 5.11 (Completeness of $Co^{\vdash} + B$) Let $M$ be a term in $\beta$-nf such that $x:\sigma \vdash_{F+B} M:\tau$ and $erase(M) \to_\eta x$. Then, $x:\sigma \vdash_{co+B} M:\tau$.

PROOF. Almost identical to the proof of completeness for the pure system $Co^{\vdash}$ (Theorem 3.3). Since $c_{i,j}$ has the type $\kappa_i \to \kappa_j$, $M$, in $\beta$-nf, has the following general structure:

$$
M \equiv \Lambda\vec{Y}_1.\lambda y_1{:}\tau_1 \ldots \Lambda\vec{Y}_k.\lambda y_k{:}\tau_k.\ \Lambda\vec{Y}_{k+1}.\ c_1(\ldots(c_l(x\,\vec{\rho}_1 M_1 \ldots \vec{\rho}_n M_n\,\vec{\rho}_{n+1}))\ldots).
$$

First the same cases as in Theorem 3.3 are considered (using modified lemmas). Then, if none of them apply, Lemma 5.5 and its Corollary 5.6 may be used.

Corollary 3.4 holds for $Co^{\vdash} + B$ (i.e. one has the subject reduction property for $Co^{\vdash} + B$).

5.4 Transitivity

The fact that modified Lemma 2.16 holds for $Co^{\vdash} + B$ terms justifies the definition of (transitivity) for the labelled system as the following rule:

$$
\dfrac{x:\sigma \vdash_{co} M:\tau \qquad y:\tau \vdash_{co} N:\rho}{x:\sigma \vdash_{co} nf([M/y]N) : \rho}\ (\text{cut})
$$

Recall that reduction to normal form is linear. The proofs of the following lemma and theorem are similar to the proofs of Lemma 3.7 and Theorem 3.8.

LEMMA 5.12 Assume that $x:\sigma \vdash_{co+B} M:\tau$ and $y:\tau \vdash_{co+B} N:\rho$. Then, $erase(nf([M/y]N)) = nf(erase([M/y]N))$.

THEOREM 5.13 (Transitivity of $\vdash_{co+B}$) (cut) is an admissible rule in $Co^{\vdash} + B$.

We have thus shown that adding extra base types and coercions preserves the transitivity of entailment, i.e., of subtyping. Note that, if we had used axioms of the form $\kappa_i \vdash_{co+B} \kappa_j$ to assert subtyping relations between base types instead of Gentzen-style rules $(\kappa_i \le \kappa_j)$, it would have been impossible to eliminate cuts. Notice also that, usually, in Gentzen-style systems such as ours, 'right' rules are balanced by symmetric 'left' rules. In the system $Co^{\vdash} + B$, though, only the rules

$$
\dfrac{x:\sigma \vdash_{co+B} M : \kappa_i}{x:\sigma \vdash_{co+B} c_{i,j}M : \kappa_j}\ (\kappa_i \le \kappa_j)
$$

are added to assert subtyping between base types. However, from the admissibility of (cut) (Theorem 5.13), we can deduce the admissibility of the 'left' analogue of the $(\kappa_i \le \kappa_j)$ rules.

$$
\dfrac{x:\kappa_j \vdash_{co+B} M:\tau}{y:\kappa_i \vdash_{co+B} [c_{i,j}y/x]M : \tau}\ (\kappa_i \le \kappa_j\ \text{left})
$$

By the observation following Lemma 5.2, $[c_{i,j}y/x]M$ is in $\beta$-nf. To see this, assume that $x:\kappa_j \vdash_{co+B} M:\tau$ has been proved. Construct the following derivation with (cut):

$$
\dfrac{\dfrac{y:\kappa_i \vdash_{co+B} y:\kappa_i}{y:\kappa_i \vdash_{co+B} c_{i,j}y : \kappa_j}\ (\kappa_i \le \kappa_j) \qquad x:\kappa_j \vdash_{co+B} M:\tau}{y:\kappa_i \vdash_{co+B} [c_{i,j}y/x]M : \tau}\ (\text{cut})
$$

where $[c_{i,j}y/x]M$ is already in $\beta$-nf. Since (cut) is admissible, we are done.

5.5 Coherence

Suppose that base types $\kappa_1, \kappa_2, \kappa_3, \ldots$ are given together with some $(\kappa_i \le \kappa_j)$ rules, as described previously, but that no other conditions are added; in particular, neither unicity of the constants $c_{i,j}$ for any given $i, j$, nor compositionality, nor associativity of coercions between base types are asserted. As regards compositionality, transitivity of $\vdash_{co+B}$ (Theorem 5.13) guarantees that, once embedded in our Gentzen-style system, base coercions compose. The questions of unicity (coherence) and associativity may be settled only with respect to an equality stronger than $=_\eta$ (in $Co^{\vdash}$, already $=_{co}$ was needed). Consider first what happens if we extend to $Co^{\vdash} + B$ the equality $=_{co}$

THEOREM 5.14 (Weak coherence of $Co^{\vdash} + B$) Assume that both $c_{i,j}$ and $d_{i,j}$ are coercion constants between base types $\kappa_i$ and $\kappa_j$. If $y:\sigma \vdash_{co+B} M : \kappa_i$, then $x:\forall X.\sigma \vdash_{co+B} c_{i,j}([xX/y]M) =_{co} d_{i,j}([xX/y]M) : \kappa_j$.

PROOF. The assumption means that the following two rules have been added:

$$
\dfrac{x:\sigma \vdash_{co+B} M : \kappa_i}{x:\sigma \vdash_{co+B} c_{i,j}M : \kappa_j}
\qquad
\dfrac{x:\sigma \vdash_{co+B} M : \kappa_i}{x:\sigma \vdash_{co+B} d_{i,j}M : \kappa_j}
$$

Construct then the following derivation:

$$
\dfrac{\dfrac{y:\sigma \vdash_{co+B} M : \kappa_i}{y:\sigma \vdash_{co+B} c_{i,j}M : \kappa_j} \qquad \dfrac{y:\sigma \vdash_{co+B} M : \kappa_i}{y:\sigma \vdash_{co+B} d_{i,j}M : \kappa_j}}{x:\forall X.\sigma \vdash_{co+B} c_{i,j}([xX/y]M) =_{co} d_{i,j}([xX/y]M) : \kappa_j}\ (\text{eq appl2 co})
$$

In particular, since $x:\forall X.\kappa_i \vdash_{co+B} xX : \kappa_i$, this weak coherence theorem implies equality of the composition of $c_{i,j}$ and $d_{i,j}$ with the coercion $xX$: just take $\sigma = \kappa_i$ and $M \equiv y$. Clearly though, the theorem does not imply the equality of $c_{i,j}$ and $d_{i,j}$. In order to prove full coherence for $Co^{\vdash} + B$, we need to force unicity of coercions (coherence) on base types. This may be done by adding a rule for equality on base types as follows.

DEFINITION 5.15 $=_{coB}$ is the least equivalence relation generated by $=_{co}$ plus the following rule:

$$
\dfrac{x:\kappa_i \vdash_{co+B} M : \kappa_j \qquad x:\kappa_i \vdash_{co+B} N : \kappa_j}{x:\kappa_i \vdash_{co+B} M =_{coB} N : \kappa_j}\ (\text{eq base co})
$$

Note that, by the corollary to Lemma 5.5, the terms $M$ and $N$ must have the structure $c_i(\ldots(c_1 x)\ldots)$ and $c'_j(\ldots(c'_1 x)\ldots)$, respectively.

The condition on coercions between base types may be compared with the condition of coherence on basic coercions in [16].

THEOREM 5.16 (Coherence of $Co^{\vdash} + B$) Let $\pi_1$ and $\pi_2$ be two derivations of $x:\sigma \vdash_{co+B} M:\tau$ and $x:\sigma \vdash_{co+B} N:\tau$ respectively. Then, $M =_{coB} N$.

PROOF. The proof is similar to the proof of coherence for the pure system $Co^{\vdash}$ (Theorem 3.10) but instead of Lemma 2.15, we have to use its modified version (Lemma 5.8). An extra case has to be considered: at least one of $\pi_1$, $\pi_2$ ends with a (base) rule. In this case, $\tau = \kappa$ for some base type $\kappa$. By Lemma 5.4 case 1, both $\pi_1$ and $\pi_2$ may contain applications of only (ax), $(\kappa_i \le \kappa_j)$, and $(\forall\ \text{left})$ rules. Apply Lemma 5.3 to both derivations, permuting all applications of $(\kappa_i \le \kappa_j)$ rules before those of $(\forall\ \text{left})$, thus obtaining the following two derivations:

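$$
\begin{array}{ll}
\kappa_0 \vdash_{co+B} \kappa_0 & (\text{ax})\\
\kappa_0 \vdash_{co+B} \kappa_p & (\kappa_0 \le \kappa_p)\\
\quad\vdots & \\
\kappa_0 \vdash_{co+B} \kappa_q & \\
x:\kappa_0 \vdash_{co+B} M : \kappa & (\kappa_q \le \kappa)\\
\quad\vdots & (\forall\ \text{left})\\
\forall X_l \ldots \forall X_1.\sigma' \vdash_{co+B} \kappa & (\forall\ \text{left})
\end{array}
\qquad
\begin{array}{ll}
\kappa_0 \vdash_{co+B} \kappa_0 & (\text{ax})\\
\kappa_0 \vdash_{co+B} \kappa_{p'} & (\kappa_0 \le \kappa_{p'})\\
\quad\vdots & \\
\kappa_0 \vdash_{co+B} \kappa_{q'} & \\
x:\kappa_0 \vdash_{co+B} N : \kappa & (\kappa_{q'} \le \kappa)\\
\quad\vdots & (\forall\ \text{left})\\
\forall X_l \ldots \forall X_1.\sigma' \vdash_{co+B} \kappa & (\forall\ \text{left})
\end{array}
$$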

Note that the applications of (8 left) are the same in both derivations but the applications of (i  j ) rules, and their number, may differ. Ignore now the applications of (8 left) and consider the (sub)derivations of x : 0 `co+B M :  and x : 0 `co+B N :  in each derivation. By simple application of the equality rule (eq base co) above, obtain x : 0 `co+B M =coB N : . Then, since (8 left) preserves equality of coercions (implied by (eq appl2 co); see remarks Section 3.2), we are done. C OROLLARY 5.17 Each (i  j ) rule preserves equality of coercions.

PROOF. Assume that a rule $(\kappa \le \kappa')$ has been added to the system, and assume that the equality $x : \sigma \vdash_{co+B} M =_{coB} N : \kappa$ has been derived. The equality implies that $x : \sigma \vdash_{co+B} M : \kappa$ and $x : \sigma \vdash_{co+B} N : \kappa$. By application of the rule $(\kappa \le \kappa')$, we obtain both $x : \sigma \vdash_{co+B} c_{i,j} M : \kappa'$ and $x : \sigma \vdash_{co+B} c_{i,j} N : \kappa'$, where $c_{i,j}$ is the base coercion given by the rule. Use now the same proof technique as in Theorem 5.16 to derive $x : \sigma \vdash_{co+B} c_{i,j} M =_{coB} c_{i,j} N : \kappa'$.

Coherence guarantees unicity of coercions on all types. As in the pure system $Co^{\vdash}$, it implies that bicoercible types are isomorphic. And, as in $Co^{\vdash}$, coherence implies the associativity of coercion composition, as given by transitivity of entailment.

6 Related work on subtyping

6.1 Proof-theoretic analyses

In light of the proof-theoretic investigations that the problem of subtyping has stimulated, it is fair to say that ‘the notion of subtyping is one of the most important concepts introduced recently into the theory of functional languages’ [30]. We mention some relevant papers. [6] solves the difficult problems of coherence and minimum typing in a different context. Clearly, if a term belongs to a type and to any larger type, then it has no unique type nor does it code a unique proof (or type derivation). Of course, the contravariant behaviour of $\to$ (on the left) and second-order quantification complicate the problem. Yet, in [6], it is shown that each proof reduces to a unique ‘normal’ one, which also yields the minimum type of its coding term (see also [25]). Other extensions of various lambda-calculi further clarified the issue of subtyping at a syntactic level. The approach in [2] significantly departs from the ‘intended’ meaning in the previous papers: the subtyping relation is interpreted by the existence of a certain definable term (coercion) between type expressions.

Yet another approach may be found in [8]. This is directly related to Cardelli’s ideas for the programming language Quest and contains its main features as a basis (the Top type, bounded quantification, etc.). In short, a type-inference system is proposed which fully formalizes Quest’s rules and investigates conservativity of typing judgements and some categorical properties in a proof-theoretic frame (e.g. the syntactic isomorphisms between closed terms). Moreover, [8] suggests the rule (eq appl2) for equality of terms, the coercion version of which, (eq appl2 co), is used in our approach. Both [2] and [8] are ‘orthogonal’ to this paper, as they contain features (a Top type, records, variants, bounded quantification, etc.) that were motivated mostly by the practice of programming and which are not present in our approach.

Our perspective stresses the logical (indeed, the ‘implicative’) nature of subtyping and, for now, it presents only the ‘pure logic’. It takes care of the introduction and elimination of universal quantification, which are not present in the other approaches except in those of Mitchell [24] and Tiuryn [30]. In a sense, the present paper (in particular, Section 4) may be seen as a proof-theoretic analysis of Mitchell’s Hilbert-style approach. Tiuryn gives two complete logical systems for subtyping, one in natural-deduction style and the other as a full Gentzen-style system; the latter system was inspired by the earlier version of our work here, i.e. by [20]. In [22, 16], subtyping [...] has been considered, with [...] conservativity of subtyping [...]

In PER models, coercions are not arbitrary maps; nor are they identities. Indeed, this makes no sense in typed frameworks: if $\sigma \le \tau$ but $\sigma$ and $\tau$ are different, there is no identity from $\sigma$ to $\tau$. This is so both in models and in theories. Note that PER models are constructed over an underlying model of the type-free lambda calculus. Any model of partial combinatory logic may suffice (see [15], say), and in particular Kleene’s $(\omega, \cdot)$: in this case, $n \cdot m$ stands for the $n$th index or Turing Machine applied to $m$. Thus, in the PER model, terms are interpreted as equivalence classes of elements of $\omega$. More precisely, define the erasure of a typed term to be the type-free term obtained by erasing all type information. Then, a term is interpreted by the equivalence class of (the interpretation of) its erasure. This allows second-order quantification to be interpreted as intersection, since the intersection is isomorphic to a categorical product [18].

In short, in the model in [4] (more precisely, in the variant of it in [7, Section 4]), a coercion $c$ from $\sigma$ to $\tau$ transforms each $\{n\}_{\sigma}$ into $\{n\}_{\tau}$, where $\{n\}_{\sigma}$ denotes the equivalence class of $n$ in $\sigma$ and $\{n\}_{\sigma}$ is generally smaller than $\{n\}_{\tau}$. Thus, $c$ is computed, in particular, by any index $i$ of the identity function; equivalently, $c$ is represented by $\{i\}_{\sigma \to \tau}$, since $\{i\}_{\sigma \to \tau} \cdot \{n\}_{\sigma} = \{i \cdot n\}_{\tau} = \{n\}_{\tau}$. Clearly, $\{i\}_{\sigma \to \tau}$ contains many more elements than the indices of the identity function on $\omega$, when $\sigma$ or $\tau$ are different from $\omega$. Since erasures are used to interpret typed terms, this suggests that syntactic coercions are typed terms that are different, in general, from the identity, but whose erasure is equal to the identity $\lambda x.x$. This idea is nicely used in [24]. It gives the syntactic completeness theorem for our ‘logic of subtyping’, system $Co^{\vdash}$.
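The idea that a syntactic coercion is a typed term whose erasure is the identity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (`Lam`, `TApp`, `ULam`, ...), the function names `erase` and `is_identity`, and the type constant `Int` are hypothetical and do not come from the paper, and the identity check is purely syntactic (no $\beta\eta$-normalization of the erasure is attempted).

```python
from dataclasses import dataclass

# --- Types of a second-order calculus (illustrative encoding) ---
@dataclass(frozen=True)
class TVar:
    name: str

@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object

@dataclass(frozen=True)
class Forall:
    var: str
    body: object

# --- Typed terms ---
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:          # lambda x : ty. body
    var: str
    ty: object
    body: object

@dataclass(frozen=True)
class App:          # fun arg
    fun: object
    arg: object

@dataclass(frozen=True)
class TLam:         # type abstraction over tvar
    tvar: str
    body: object

@dataclass(frozen=True)
class TApp:         # type application: fun applied to the type ty
    fun: object
    ty: object

# --- Type-free terms ---
@dataclass(frozen=True)
class UVar:
    name: str

@dataclass(frozen=True)
class ULam:
    var: str
    body: object

@dataclass(frozen=True)
class UApp:
    fun: object
    arg: object

def erase(t):
    """Erase all type information from a typed term."""
    if isinstance(t, Var):
        return UVar(t.name)
    if isinstance(t, Lam):
        return ULam(t.var, erase(t.body))
    if isinstance(t, App):
        return UApp(erase(t.fun), erase(t.arg))
    if isinstance(t, TLam):      # type abstraction disappears
        return erase(t.body)
    if isinstance(t, TApp):      # type application disappears
        return erase(t.fun)
    raise TypeError(f"not a typed term: {t!r}")

def is_identity(u):
    """True if u is syntactically of the form 'lambda x. x'."""
    return isinstance(u, ULam) and u.body == UVar(u.var)

# Candidate coercion from (forall X. X -> X) to (Int -> Int):
# lambda x : (forall X. X -> X). x applied to the type Int
poly_id = Forall("X", Arrow(TVar("X"), TVar("X")))
coercion = Lam("x", poly_id, TApp(Var("x"), TVar("Int")))

# A typed term that is not a coercion: lambda x : Int. lambda y : Int. x
not_coercion = Lam("x", TVar("Int"), Lam("y", TVar("Int"), Var("x")))
```

Here `is_identity(erase(coercion))` holds although `coercion` itself is not the typed identity on any single type, while `is_identity(erase(not_coercion))` fails. Coercions built with the arrow rule would erase to terms that reach the identity only after normalization, which this sketch deliberately does not implement.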

6.3 Semantics of equality, $=_{co}$

The semantics, indeed the consistency, of (eq appl2 co) deserves close attention. The following remarks are likewise intended for the reader familiar with PER models. Rule (eq appl2 co) is valid in the PER model of subtyping [4]. More precisely, it is valid in the interpretation of subtyping with explicit coercions, given in [7] for the system QuestC. This can be seen as follows. Recall that, under an environment $\xi$, the interpretation of $M : \sigma$ is given by the equivalence class $\{[\![\mathrm{erase}(M)]\!]_{\xi}\}_{\sigma}$, where, by an abuse of language, we use the same name for both a type $\sigma$ and its meaning as a p.e.r. (See the introduction and [7] on how type-free and typed environments are related; in short, a type-free environment $\xi$ ‘picks out’ an element of the equivalence class of the typed $\xi'$.)

Assume now that $y_i : \sigma[\rho_i/X]$ for $i = 1, 2$, and $x : \forall X.\sigma$. If both $\sigma[\rho_i/X]$ for $i = 1, 2$ are subtypes of $\tau$, then the equivalence classes $\{\xi(y_i)\}_{\sigma[\rho_i/X]}$ are coerced to the (larger) equivalence classes $\{\xi(y_i)\}_{\tau}$ for $i = 1, 2$. Note now that, for $x : \forall X.\sigma$, one has $x\rho_i : \sigma[\rho_i/X]$ and $\mathrm{erase}(x\rho_i) = \mathrm{erase}(x) = x$. Then, by the interpretation of polymorphic application (see [18]), $\{\xi(x)\}_{\forall X.\sigma} \cdot \rho_i = \{\xi(x)\}_{\sigma[\rho_i/X]}$, which is also equal to the interpretation of $x\rho_i : \sigma[\rho_i/X]$. The validity of the premisses of (eq appl2 co) means that, in the model, $N_i$ coerces the meaning of any term in $\sigma[\rho_i/X]$ into an equivalence class of $\tau$. In particular, both interpretations $\{\xi(x)\}_{\sigma[\rho_i/X]}$ of $x\rho_i : \sigma[\rho_i/X]$, $i = 1, 2$, are coerced by $N_i$ to the same equivalence class $\{\xi(x)\}_{\tau}$, which does not depend on $i$. This is exactly the validity of the consequence of (eq appl2 co). As recalled in the introduction, the coercions $N_1, N_2$ are interpreted by functions computed (also) by indexes of the identity function. In general, though, they are not themselves identities and $\{\xi(y_i)\}_{\sigma[\rho_i/X]}$ may be strictly smaller than $\{\xi(y_i)\}_{\tau}$ (and each of the $\{\xi(x)\}_{\sigma[\rho_i/X]}$ may be strictly smaller than $\{\xi(x)\}_{\tau}$).
In [8], it is said that (eq appl2), which contains no coercions, is valid in PER models. This is correct but only modulo ‘forgetting’ the coercions in the model (as was suggested in [4]). However, this does not correspond exactly to the structure of PER models, where coercions are non-identical maps (see [7] for a more detailed discussion). Thus, (eq appl2 co) is a more precise formalization of ‘truth’, as given in PER models, than (eq appl2).

6.4 Second-order cut-elimination proofs

The reader may wonder why the cut-elimination proof (Theorem 3.8) is so simple for a system which contains higher-order types and/or why we couldn’t just derive it from the proofs of more expressive systems. It is known that impredicative second-order logic requires powerful tools to yield cut-elimination or normalization proofs [12, 13]. Yet, in [21], we give another, direct proof of cut-elimination for $Co^{\vdash}$ plus (cut), one that does not rely on completeness: a non-obvious exercise. For the reader interested in the issue, we analyse here the key difficulties of a direct proof and some analogies and differences with respect to other calculi. We use [13] as a reference.

For a first-order system, the proof of cut-elimination considered in Chapter 13 of [13] is divided into two main parts: Sections 13.1 and 13.2. The first part is the basis for an induction on the size of derivations: ‘cuts’ are permuted with other rules and, when possible, they are moved up. Among the key cases, there are two crucial ones: cases 6 and 7 (or 8). Case 6 has an arrow as the cut formula, and the point there is that if one ‘swaps’ the cut rule with the right and left-arrow rules, the cut rule is, unlike in the preceding cases, not moved up, and a straightforward induction would then fail. This is the reason for introducing (in Section 13.2) the notion of ‘degree’ $\delta$ of a formula and using a combined induction on derivations and degrees: in case 6, Section 13.1, it is the degree of the cut formula that decreases (not the size of the derivation). Fortunately, in the first-order case, the notion of degree of a formula does not present any problems (Section 13.2): it is preserved under instantiation of a term variable by a term (e.g. $\delta(A[t/x]) = \delta(A)$). Hence, the degree of the cut formulae in case 7 (and 8) of Section 13.1 does not change when moving up the cut rule, and the combined induction goes through.
However, this form of combined induction cannot be used in the presence of higher-order (impredicative) formulae: no degree or measure on formulae is preserved by instantiation in general. Girard and Tait’s proof by ‘candidates of reducibility’ employs a powerful technique to overcome this crucial difficulty of impredicative systems. The heavy inductive loading used (conditions CR 1, 2, 3 of Chapters 6 and 14) requires the intended set of terms (candidates of reducibility) to be closed under reductions and expansions.

In our case of a direct proof [21], the difficulties of case 6 (Section 13.1 of [13]) are easily handled: that case corresponds to ($\to$) occurring simultaneously on the left and on the right, where an arrow formula is eliminated by cut. This gives a symmetric situation and allows one to move up the last cut rule, in contrast to case 6 in [13]. Thus, we do not need to introduce an induction on degrees of formulae, which would, in turn, cause problems in an impredicative system like ours. Moreover, we can neither refer to nor use the candidates of reducibility, even though $Co^{\vdash}$ is a subsystem of System F: the terms of our system with the cut rule are not closed under expansions (CR3), as pointed out before Lemma 2.16. Note that, in general, cut-elimination cannot automatically be applied to subsystems: there are easy counter-examples.

In the present approach, we used a simpler technique. The linearity of terms gives an immediate normalization theorem; then the completeness result (Theorem 3.3: coercions are exactly those F terms that erase to the identity) allows us to avoid a step-by-step cut-elimination and refer to the ‘evaluation’ (nf) of a term’s erasure (cf. recent results on ‘normalization-by-evaluation’ [5]).
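The contrast between first-order and impredicative instantiation discussed in this section can be made concrete with a small sketch. It assumes the usual definition of degree (atoms have degree 1, an implication has the maximum of the degrees of its parts plus 1, a quantifier adds 1); the class and function names are hypothetical, first-order terms are kept as opaque strings, and variable capture is ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:          # P(t1, ..., tn); terms are opaque strings here
    pred: str
    args: tuple

@dataclass(frozen=True)
class PVar:          # second-order (propositional) variable X
    name: str

@dataclass(frozen=True)
class Imp:           # left -> right
    left: object
    right: object

@dataclass(frozen=True)
class All1:          # first-order quantifier over a term variable
    var: str
    body: object

@dataclass(frozen=True)
class All2:          # second-order quantifier over a propositional variable
    var: str
    body: object

def degree(a):
    """Degree of a formula (assumed definition, see the text above)."""
    if isinstance(a, (Atom, PVar)):
        return 1
    if isinstance(a, Imp):
        return max(degree(a.left), degree(a.right)) + 1
    if isinstance(a, (All1, All2)):
        return degree(a.body) + 1
    raise TypeError(f"not a formula: {a!r}")

def subst_term(a, x, t):
    """A[t/x]: replace the term variable x by the term t (capture ignored)."""
    if isinstance(a, Atom):
        return Atom(a.pred, tuple(t if s == x else s for s in a.args))
    if isinstance(a, PVar):
        return a
    if isinstance(a, Imp):
        return Imp(subst_term(a.left, x, t), subst_term(a.right, x, t))
    if isinstance(a, All1):
        return a if a.var == x else All1(a.var, subst_term(a.body, x, t))
    if isinstance(a, All2):
        return All2(a.var, subst_term(a.body, x, t))
    raise TypeError(f"not a formula: {a!r}")

def subst_formula(a, X, b):
    """A[B/X]: replace the propositional variable X by the formula B."""
    if isinstance(a, PVar):
        return b if a.name == X else a
    if isinstance(a, Atom):
        return a
    if isinstance(a, Imp):
        return Imp(subst_formula(a.left, X, b), subst_formula(a.right, X, b))
    if isinstance(a, All1):
        return All1(a.var, subst_formula(a.body, X, b))
    if isinstance(a, All2):
        return a if a.var == X else All2(a.var, subst_formula(a.body, X, b))
    raise TypeError(f"not a formula: {a!r}")

# First-order instantiation leaves the degree unchanged.
A = Imp(Atom("P", ("x",)), Atom("Q", ("x", "y")))
A_inst = subst_term(A, "x", "f(c)")

# Impredicative instantiation: instantiate X in (forall X. X -> X)
# with the formula (forall Y. Y -> Y) itself.
body = Imp(PVar("X"), PVar("X"))
poly = All2("X", body)
B = All2("Y", Imp(PVar("Y"), PVar("Y")))
instance = subst_formula(body, "X", B)
```

With these definitions, `degree(A_inst)` equals `degree(A)`, whereas `degree(instance)` is strictly larger than `degree(poly)`: an instance of a quantified formula may be more complex than the formula itself, so an induction on degrees does not go through in the impredicative case.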

7 Conclusions

The purpose of the calculus $Co^{\vdash}$ presented in this paper is to give a coherent logical meaning to the notion of subtyping: $\sigma$ is a subtype of $\tau$ if $\sigma$ implies (entails) $\tau$. This meaning is the most obvious relation between logical implication and naive set-theoretic inclusion. The main advantage of our approach is that $Co^{\vdash}$ has a sound logical ‘status’, independently of its intended meaning for subtyping. This is obtained by presenting entailment in the frame of a second-order sequent calculus, where quantification is introduced and eliminated by right and left rules. We could then state and prove relevant properties such as coherence and the admissibility of (cut), which is equivalent to a cut-elimination theorem.

It should be clear why we do not take the (cut) rule as part of the definition of our subtyping system. In order to obtain coherence, we would need to eliminate it anyway. And coherence is used to prove anti-symmetry. Moreover, without (cut), all our proof-terms (definable coercions) are in normal form, as only (cut) may introduce redexes, exactly as in the $\lambda$-calculus.

In view of its formal relation to (cut), it may be fair to say that our system of subtyping is the least but still meaningful system for implication that also handles second-order universal quantification. Indeed, what weaker but still meaningful computation is there than ‘take an input and transform it into an element of a larger type’? And, intuitionistically, logical implications are computations. By our system, we characterized the logical implications which are coercions and explicitly used this characterization in the main results.

In Section 5, we extended $Co^{\vdash}$ with base types. Transitivity and coherence hold in this extended calculus, which is conservative over $Co^{\vdash}$. In view of the work in [2, 8, 6], further extensions can be studied, in particular, with a Top type, records, variants and bounded quantification. Moreover, a joint system F+$Co^{\vdash}$ could also be relevant for investigating general polymorphic subtyping.

Acknowledgements

We would like to thank Jerzy Tiuryn for his close analysis and critical insight into our work. We also acknowledge the comments of an anonymous referee on a previous version of this paper, as well as the referees of the present version. Sergei Soloviev was supported by a British EPSRC grant GR/K79130 while working at Durham University (1996–98); his work was also funded by HCM Project CHRX-CT93-0046 ‘Typed Lambda Calculus’ (while visiting ENS, Paris) and ‘CLICS II’. Giuseppe Longo’s work was partially supported by HCM Project CHRX-CT93-0046 ‘Typed Lambda Calculus’.

References

[1] M. Abadi, L. Cardelli and P.-L. Curien. Formal parametric polymorphism, Theoretical Computer Science, 121, 9–58, 1993.
[2] V. Breazu-Tannen, T. Coquand, C. Gunter and A. Scedrov. Inheritance as implicit coercion, Information and Computation, 93, 172–221, 1991.
[3] K. Bruce, R. Di Cosmo and G. Longo. Provable isomorphisms of types, Mathematical Structures in Computer Science, 2, 231–247, 1992.
[4] K. Bruce and G. Longo. A modest model of records, inheritance, and bounded quantification, Information and Computation, 87, 196–240, 1990.
[5] U. Berger and H. Schwichtenberg. An inverse of the evaluation functional for typed λ-calculus. In Proceedings of the 6th IEEE Symposium on Logic in Computer Science (LICS), pp. 203–213, IEEE Computer Society Press, 1991.
[6] P.-L. Curien and G. Ghelli. Coherence of subsumption, minimum typing and type-checking in F≤, Mathematical Structures in Computer Science, 2, 55–92, 1992.
[7] L. Cardelli and G. Longo. A semantic basis for Quest, Journal of Functional Programming, 1, 417–458, 1991.
[8] L. Cardelli, S. Martini, J. Mitchell and A. Scedrov. An extension of system F with subtyping, Information and Computation, 94, 4–56, 1994. First appeared in Proceedings of the Conference on Theoretical Aspects of Computer Software (Sendai, Japan), T. Ito and R. Meyer, eds., pp. 750–770, Lecture Notes in Computer Science 526, Springer, 1991.
[9] L. Cardelli and P. Wegner. On understanding types, data abstraction, and polymorphism, ACM Computing Surveys, 17, 471–522, 1985.
[10] G. Chen. Dependent type system with subtyping. Technical Report LIENS-96-27, Ecole Normale Supérieure, Paris. Revised version to appear in Journal of Computer Science and Technology.
[11] G. Chen. Sous-typage, conversion de types et élimination de la transitivité. Doctoral thesis, Laboratoire d’Informatique de l’Ecole Normale Supérieure, Paris, December 1998.
[12] J.-Y. Girard. Une extension de l’interprétation de Gödel à l’analyse, et son application à l’élimination des coupures dans l’analyse et la théorie des types. In Proceedings of the 2nd Scandinavian Logic Symposium, J.E. Fenstad, ed., pp. 63–92, North-Holland, 1971.
[13] J.-Y. Girard, Y. Lafont and P. Taylor. Proofs and Types. Cambridge Tracts in Theoretical Computer Science 7, Cambridge University Press, 1989.
[14] J. Hyland. The effective topos. In Proceedings of The L. E. J. Brouwer Centenary Symposium, A.S. Troelstra and D.S. van Dalen, eds., pp. 165–216, North-Holland, 1982.
[15] J. Hyland. A small complete category, Annals of Pure and Applied Logic, 40, 135–165, 1988.
[16] A. Jones, Z. Luo and S. Soloviev. Some algorithmic and proof-theoretical aspects of coercive subtyping. In Selected papers of the International TYPES’96 Workshop (Aussois, France, December 1996), E. Giménez and C. Paulin-Mohring, eds., Lecture Notes in Computer Science 1512, pp. 173–196, Springer, 1998.
[17] S. Kleene. Mathematical Logic, Wiley, 1967.
[18] G. Longo and E. Moggi. Constructive natural deduction and its ω-set interpretation, Mathematical Structures in Computer Science, 1(2), 215–253, 1991.
[19] G. Longo, K. Milsted and S. Soloviev. The genericity theorem and the notion of parametricity in the polymorphic λ-calculus, Theoretical Computer Science, 121, 323–349, 1993.
[20] G. Longo, K. Milsted and S. Soloviev. A logic of subtyping (extended abstract). In Proceedings of the 10th IEEE Symposium on Logic in Computer Science (LICS), IEEE Computer Society Press, 1995.
[21] G. Longo, K. Milsted and S. Soloviev. A logic of subtyping. Complete version of the previous paper; available by anonymous ftp from ftp.ens.fr as /pub/dmi/users/longo/logicSubtyping.ps.Z or ... /logicSubtyping.dvi.Z
[22] Z. Luo. Coercive subtyping in type theory. In CSL’96, the 1996 Annual Conference of the European Association for Computer Science Logic, Utrecht, Lecture Notes in Computer Science 1258, 1996.
[23] Q. Ma and J. Reynolds. Types, abstraction, and parametric polymorphism, part 2. In Proceedings of the Conference on Mathematical Foundations of Programming Semantics, S. Brookes, M. Main, A. Melton, M. Mislove and D. Schmidt, eds., pp. 1–40, Lecture Notes in Computer Science 598, Springer, 1992.
[24] J. Mitchell. Polymorphic type inference and containment, Information and Computation, 76, 211–249, 1988. Also appeared in Logical Foundations of Functional Programming, G. Huet, ed., pp. 153–193, Addison-Wesley, 1990.
[25] B. Pierce and M. Steffen. Higher-order subtyping, Theoretical Computer Science, 176, 235–282, 1997. A preliminary version appeared in Proceedings of the IFIP Working Conference on Programming Concepts, Methods and Calculi (PROCOMET), June 1994, and as University of Edinburgh technical report ECS-LFCS-94-280 and Universität Erlangen-Nürnberg Interner Bericht IMMD7-01/94, January 1994.
[26] A. Pitts. Non-trivial power types can’t be subtypes of polymorphic types. In Proceedings of the 4th IEEE Symposium on Logic in Computer Science (LICS), pp. 6–13, IEEE Computer Society Press, 1989.
[27] J. Reynolds. Polymorphism is not set-theoretic. In Proceedings of Semantics of Data Types, G. Kahn et al., eds., pp. 145–156, Lecture Notes in Computer Science 173, Springer, 1984.
[28] S. Soloviev. The category of finite sets and Cartesian closed categories, Journal of Soviet Mathematics, 22, 1387–1400, 1983.
[29] J. Tiuryn. Equational axiomatization of bicoercibility for polymorphic types. In Proceedings of the 15th Conference on the Foundations of Software Technology and Theoretical Computer Science, P. S. Thiagarajan, ed., pp. 166–179, Lecture Notes in Computer Science 1026, Springer, 1995.
[30] J. Tiuryn. A sequent calculus for subtyping polymorphic types. In Proceedings of the Conference on Mathematical Foundations of Computer Science, Springer, 1996.
[31] J. Tiuryn and P. Urzyczyn. The subtyping problem for second-order types is undecidable. In Proceedings of the 11th IEEE Symposium on Logic in Computer Science (LICS), IEEE Computer Society Press, 1996.

Appendix

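The following rules complete the definition of $=_{co}$ (Section 3.2); they just say that rules ($\to$) and ($\forall_n$ right) preserve equality of coercions:

$$(\text{eq} \to)\qquad \frac{x' : \sigma' \vdash_{co} M =_{co} M' : \sigma \qquad y : \tau \vdash_{co} N =_{co} N' : \tau'}{x : \sigma \to \tau \vdash_{co} \lambda x' : \sigma'.[xM/y]N =_{co} \lambda x' : \sigma'.[xM'/y]N' : \sigma' \to \tau'}$$

$$(\text{eq } \forall_n \text{ right}),\ 0 \le k \le n\qquad \frac{x : \sigma \vdash_{co} \lambda x_1 : \sigma_1 \ldots \lambda x_k : \sigma_k.M =_{co} \lambda x_1 : \sigma_1 \ldots \lambda x_k : \sigma_k.N : \sigma_1 \to (\ldots(\sigma_n \to \tau)\ldots)}{x : \sigma \vdash_{co} \lambda x_1 : \sigma_1 \ldots \lambda x_k : \sigma_k \ldots \lambda x_n : \sigma_n.\lambda X.M x_{k+1} \ldots x_n =_{co} \lambda x_1 : \sigma_1 \ldots \lambda x_k : \sigma_k \ldots \lambda x_n : \sigma_n.\lambda X.N x_{k+1} \ldots x_n : \sigma_1 \to (\ldots(\sigma_n \to \forall X.\tau)\ldots)}$$

for $X$ not free in $\sigma$ nor in $\sigma_1, \ldots, \sigma_n$, for $M$ not of the form $\lambda y.M'$, and for $x_{k+1}, \ldots, x_n$ fresh.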