Partial Blocking and Associative Learning

Anton Benz∗
Institut for Fagsprog, Kommunikation og Informationsvidenskab
Syddansk Universitet, Kolding
[email protected]
Abstract

We explain partial blocking as the result of diachronic processes based on what we will call associative learning. In particular, we argue that the task posed by partial blocking phenomena is to explain their emergence from unambiguous and fully expressive languages. This contrasts with approaches that presuppose underspecified semantic meanings or ineffability, like Bi–OT and some game theoretic explanations. We introduce a formal framework based on learning, speaker's preferences, and pure semantics for describing diachronic strengthenings of meaning. Moreover, we show how the diachronic development of systems of semantically co–extensive forms can be described in terms of a complete system of diachronic laws.
1 Introduction
One of the selling points of bidirectional Optimality Theory (Bi–OT) is its success in explaining partial blocking phenomena. In (1) it has to be explained why kill tends to denote a direct killing whereas cause to die tends to denote an indirect killing1:

(1) a) Black Bart killed the sheriff.
    b) Black Bart caused the sheriff to die.
The Bi–OT explanation builds on a rule known as Horn's division of pragmatic labour2: Marked forms typically get a marked interpretation, and unmarked forms an unmarked interpretation. Kill is the less marked form, and if we assume that speakers prefer less marked forms over marked forms, then kill is the optimal way for them to denote a killing event. As direct killing is the normal and expected way of killing, the hearer should have a preference for interpreting the speaker's utterance as referring to a direct killing. We see that kill and direct killing form an optimal form–meaning pair from both perspectives. In addition, the marked form tends to denote the less expected meaning, i.e. cause to die tends to denote an indirect killing. In general, if F1 and F2 are forms and M1 and M2 are meanings where F1 is preferred over F2 and M1 over M2, then F1 tends to denote M1 and F2 tends to denote M2:

∗ This paper is the result of my half year of work in the BiOT project at the ZAS in Berlin. I found there a stimulating atmosphere and excellent colleagues. This paper profited from discussions at the fourth Szklarska Poreba Conference and the Stockholm workshop on 'Variations within OT.'
1 See e.g. (Blutner, 2000).
2 See (Horn, 1984), p. 22.
(2)         M1        M2
     F1     •   ←−    •
            ↑         ↑
     F2     •   ←−    •
Horn explains his principle by recourse to two pragmatic principles, called the Q– and R–principle. Blutner (1998) gave them a formally precise formulation; in particular, he made explicit the role of switching between the speaker's and hearer's perspectives. This laid the foundation for an optimality theoretic reformulation (Blutner, 2000), and thereby set radical pragmatics into the broader linguistic context provided by OT. Underspecification of semantic meaning is a central assumption in Bi–OT. In contrast, we will argue that the task posed by partial blocking phenomena is to explain how they can emerge from languages where meanings are fully specified and every state of affairs can be expressed. We are going to explain partial blocking as the result of diachronic processes based on what we will call associative learning. In bidirectional OT, Horn's principle can be derived as a special case of the principle of weak optimality. This principle poses two immediate problems: (1) it over–generates partial blocking, and (2) it has only a weak foundation, i.e. there is no good explanation for the principle of weak optimality which (a) does not make (an implicit) appeal to Horn's principle of pragmatic labour, and (b) provides more than just an algorithm for how to calculate weakly optimal form–meaning pairs. Game theory was soon proposed as a remedy for the latter problem3. We will discuss Bi–OT at greater length in Section 2, and in Section 3 we consider game theoretic approaches, especially van Rooy's4, for explaining Horn's division of pragmatic labour. We will argue that they relate to a different type of problem: partial blocking can be observed in examples where expressions are unambiguous and where there would be an alternative form for denoting the more marked meaning. We will see that these observations make the discussed game theoretic models inapplicable, as they have to assume ambiguity of interpretations or ineffability of some states of affairs.
We claim that partial blocking can be explained as an effect of associative learning plus the speaker's preferences on forms. It emerges as the result of a diachronic process. We explain Example (1) by postulating the following five stages:

(1) In the initial stage all killing events are direct killing events. The speaker will always use kill to denote these events.
(2) Interpreters will learn that kill is always connected with direct killing. They associate kill with direct killing.
(3) The speaker will learn that hearers associate kill with direct killing.
(4) If an exceptional event then occurs where the killing is an indirect killing, the speaker has to avoid misleading associations and use a different form, in this case the more complex form cause to die.
(5) The hearer will then learn that cause to die is always connected to an untypical killing.

By associative learning we mean the learning processes in (2), (3), and (5). If the hearer can observe that in every actual instance where the form F is used for classifying events or objects it turns out that the classified event or object is at least of type t, then the hearer learns to associate F with t, i.e. he learns to interpret F as t. A similar process is assumed for the speaker to explain step (3). In Section 4 we work out a formal model which describes diachronic processes based on associative learning. We call it a diachronic model. It consists of (1) a simple synchronic system which is defined by

3 (Dekker & v. Rooy, 2000).
4 (v. Rooy, 2002).
the underlying semantics and fixed pragmatic constraints; and (2) of a sequence of synchronic stages which contain all actual dialogue situations and the speaker's and hearer's selection and interpretation strategies. In Section 5 we characterise processes based on associative learning in terms of laws of diachronic change. In (1) we find two synonymous expressions that differentiate their meanings. This leads to the question: given any set of semantically synonymous expressions, how and when can associative learning and the speaker's preferences lead to a change in interpretation? Is there a general characterisation of diachronic processes leading to differentiations in their meanings? These are the guiding questions for Section 5. Formulating general laws allows us to answer these questions completely, in the sense that all possible strengthenings due to associative learning can be accounted for in terms of diachronic laws.
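The five-stage story and the associative-learning step can be illustrated with a small runnable sketch. All names, property sets, and the cost function below are invented for this illustration; they are not part of the formal model developed in Section 4:

```python
def hearer_association(observations):
    """Associative learning: the hearer comes to associate a form with
    exactly the properties shared by every event it classified."""
    assoc = {}
    for form, properties in observations:
        if form not in assoc:
            assoc[form] = set(properties)
        else:
            assoc[form] &= set(properties)  # keep only invariant properties
    return assoc

def speaker_choice(event, forms, assoc, cost):
    """Stage (4): avoid forms whose learned association would mislead
    the hearer about this event; pick the cheapest remaining form."""
    usable = [f for f in forms if assoc.get(f, set()) <= event]
    return min(usable, key=lambda f: cost[f])

# Stages (1)-(2): every actual killing is direct and 'kill' is always used,
# so the hearer associates 'kill' with direct killing.
history = [("kill", {"killing", "direct"})] * 20
assoc = hearer_association(history)
print(sorted(assoc["kill"]))  # ['direct', 'killing']

# Stages (3)-(4): the speaker knows the association; for an indirect
# killing, 'kill' would now mislead, so the costlier form is chosen.
cost = {"kill": 1, "cause to die": 2}
print(speaker_choice({"killing", "indirect"}, list(cost), assoc, cost))
# 'cause to die' -- and stage (5) would, in the same way, come to
# associate this form with indirect killing.
```

The design point of the sketch is that the hearer's association is simply the intersection of all observed property sets, which matches the idea that the classified object is "at least of type t" in every actual use.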
2 Bi–OT and Weak Optimality

2.1 An Intuitive Introduction
According to OT, producer and interpreter of language use a number of constraints which govern their choice of forms and meanings. These constraints may come into conflict. OT proposes a mechanism for how these conflicts are resolved. It assumes that the constraints are ranked in a linear order. If they come into conflict, then the higher-ranked constraints win over the lower-ranked. This defines preferences on forms and meanings. Optimality theory has divided into many sub–theories and variations. Beaver and Lee (2003) provide a useful overview of versions of optimality theoretic semantics. They discuss seven different approaches; in particular, they compare them according to whether they can explain partial blocking. It turns out that the only approach which can fully justify Horn's division of pragmatic labour is Blutner's Bi–OT5.

What are the structures which underlie Bi–OT? In bidirectional OT it is common to assume that there is a set F of forms and a set M of meanings (Blutner, 2000). A set Gen, the so–called generator, tells us which form–meaning pairs are grammatical. The grammar may leave the form–meaning relation highly underspecified. In a graphical representation like (2) a grammatical form–meaning pair ⟨F, M⟩ is represented by a bullet at the point where the row for F and the column for M meet. Underspecification means that a row corresponding to a form F may contain several bullets. The speaker has to choose for his utterance a form which subsequently must be interpreted by the hearer. It is further assumed that the speaker has some ranking on his set of forms, and the hearer on the set of meanings. Blutner (2000) introduced the idea that speaker and interpreter coordinate on form–meaning pairs which are most preferred from both perspectives. In (Jäger, 2000) the mechanism which leads to optimal form–meaning pairs is discussed in greater detail.
The speaker has to choose for a given meaning M0 a form F0 which is optimal according to his ranking of forms. Then the interpreter has to choose for F0 a meaning M1 which is optimal according to his ranking of meanings. Then again the speaker looks for the most preferred form F1 for M1. A form–meaning pair is optimal if ultimately speaker and hearer choose the same forms and meanings. If ⟨F, M⟩ is optimal in this technical sense, then the choice of F is the optimal way to express M such that both the speaker's and the interpreter's preferences are matched. It is easy to see that the procedure for finding an optimal form–meaning pair stops at a pair ⟨F, M⟩ exactly if there are no pairs ⟨F′, M⟩ and ⟨F, M′⟩ such that the speaker prefers F′ over F given M and the hearer prefers M′ over M given F. In the graph (2)

5 See (Beaver & Lee, 2003), Sections 5 and 7.
⟨F1, M1⟩ is optimal because there are no arrows leading from ⟨F1, M1⟩ to other form–meaning pairs. In general we find: a form–meaning pair ⟨F, M⟩ is optimal exactly if in the graph representing the situation there are no arrows leading away from ⟨F, M⟩.

Weak optimality is a weakening of the notion of optimality. In (2) we find that F2 should go together with M2, but ⟨F2, M2⟩ is not optimal — there are arrows leading away from ⟨F2, M2⟩. What makes the difference between ⟨F1, M2⟩ and ⟨F2, M1⟩ on the one side, and ⟨F2, M2⟩ on the other side? For ⟨F1, M2⟩ and ⟨F2, M1⟩ there is either a row or a column which contains it together with the optimal form–meaning pair ⟨F1, M1⟩. For ⟨F2, M2⟩ neither its row nor its column contains the optimal ⟨F1, M1⟩. If we remove the row and the column which contain ⟨F1, M1⟩, then ⟨F2, M2⟩ is optimal in the remaining graph. This can be generalised: if we remove from a given graph all rows and columns which contain an optimal form–meaning pair, then the optimal form–meaning pairs in the remaining graph are called weakly optimal. We can iterate this process until no more form–meaning pairs, and hence no graph, remain6.

The principle of Weak Optimality: If ⟨F, M⟩ is weakly optimal, then F means, or at least has a tendency to mean, M.

As a special case we find:

Horn's Division of Pragmatic Labour: If F must actually be interpreted as M but could also mean M′, and F′ must actually be interpreted as M′ but could also mean M, then F is more marked than F′ if, and only if, M is more marked than M′7.
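The reduction procedure just described can be stated as a short algorithm. The sketch below is only an illustration of the graph computation, with invented rankings (a lower rank means more preferred); it is not Blutner's or Jäger's own formulation:

```python
def optimal(pairs, frank, mrank):
    """<F, M> is optimal iff no arrow leads away from it: no grammatical
    <F', M> with a better form and no <F, M'> with a better meaning."""
    return {(f, m) for (f, m) in pairs
            if not any(frank[f2] < frank[f] for (f2, m2) in pairs if m2 == m)
            and not any(mrank[m2] < mrank[m] for (f2, m2) in pairs if f2 == f)}

def weakly_optimal(pairs, frank, mrank):
    """Iteratively remove the rows and columns of optimal pairs; everything
    collected along the way is weakly optimal."""
    found, remaining = set(), set(pairs)
    while remaining:
        opt = optimal(remaining, frank, mrank)
        if not opt:
            break
        found |= opt
        used_f = {f for (f, _) in opt}
        used_m = {m for (_, m) in opt}
        remaining = {(f, m) for (f, m) in remaining
                     if f not in used_f and m not in used_m}
    return found

# The situation of graph (2): both forms can express both meanings.
pairs = {("F1", "M1"), ("F1", "M2"), ("F2", "M1"), ("F2", "M2")}
frank = {"F1": 0, "F2": 1}   # F1 preferred over F2
mrank = {"M1": 0, "M2": 1}   # M1 preferred over M2
print(optimal(pairs, frank, mrank))         # {('F1', 'M1')}
print(weakly_optimal(pairs, frank, mrank))  # adds ('F2', 'M2')
```

Run on graph (2), the first round finds only ⟨F1, M1⟩; removing its row and column leaves ⟨F2, M2⟩ as optimal in the remainder, reproducing Horn's pattern.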
2.2 Over–Generation and the Problem of Foundation
There are two serious problems with Bi–OT which I want to address in this section: (1) the problem of over–generation, and (2) the foundation problem.

The Problem of Over–Generation

Bi–OT can successfully explain examples like (1), but if we apply it straightforwardly, then there are many examples where it over–predicts partial blocking. We first look at two examples where the crucial interpretation task is to resolve some anaphora. For the first example, (3), we assume that the speaker has two children, a daughter named Marion and a son called Jo8:

(3) a) My daughter was frustrated with my son. She was pulling his hair out.
    b) My daughter was frustrated with my son. Marion was pulling Jo's hair out.
    c) My daughter was frustrated with my son. Marion was pulling the boy's hair out.
If we assume that a pronoun is more economic than a proper name, and a proper name more economic than a definite description, then the speaker should continue his first

6 The principle of weak optimality is due to Blutner, see (Blutner, 1998, 2000). He calls superoptimality what was later called weak optimality. The process for finding weakly optimal form–meaning pairs is due to G. Jäger, see (Blutner & Jäger, 2000; Jäger, 2000). (Dekker & v. Rooy, 2000) was a first attempt to bring weak optimality together with the notion of Nash equilibria, which is common in game theory.
7 This formulation has to be understood as being in the spirit of Horn (1984), p. 22. We will discuss his original version in the next subsection (2.2).
8 Examples of this type were first discussed by J. Mattausch (2000).
sentence with She was pulling his hair out. The use of Marion, Jo, or the boy is less preferred; hence these forms should go together with a marked interpretation. Now, there is a marginal possibility that Marion is the name of the boy and Jo the name of the girl. Hence, if we apply Bi–OT straightforwardly, then it predicts a tendency of e.g. Jo, or the boy, to indicate that Jo is not the name of the boy. But for all three examples we get the same references. We may suppose that there is not enough underspecification in the first sentence, so that there is not much for the interpreter to choose. But even if it is not semantically clear who is male and who is female, we do not get a marked interpretation for a marked expression:

(4) a) The doctor kissed the nurse. She is really beautiful.
    b) The doctor kissed the nurse. Marion is really beautiful.
    c) The doctor kissed the nurse. The woman is really beautiful.
    d) (?)The doctor kissed the nurse. SHE is really beautiful.
The anaphoric relations are the same for all examples. If we stress the pronoun, then the sentence becomes ungrammatical rather than receiving a marked reading. Examples (3) and (4) are cases where underspecification is crucially involved. The pairs of sentences can have different meanings according to how we resolve the anaphoric expressions. The next two examples represent cases without underspecification. The critical expressions differ only with respect to their extension but cannot have two different readings.

(5) a) Hans hat sich ein Rad gekauft.
    b) Hans hat sich ein Fahrrad gekauft.
    c) Hans hat sich ein Zweirad gekauft.
    d) Hans has himself a bicycle bought. ('Hans bought himself a bicycle.')
The first two sentences are equivalent, but the third is marked. The critical expressions are Rad, Fahrrad and Zweirad. In this context they all have the same meaning, namely bicycle. The principle of weak optimality would predict that Rad (wheel) is optimal; hence Fahrrad (driving–wheel) should tend to have a marked meaning. But both expressions are equivalent. Fahrrad and Zweirad (two–wheel) are of the same complexity, hence there should be no difference in meaning; but Zweirad is marked. In contrast, the following example is clearly in line with weak optimality and Horn's principle of the division of pragmatic labour:

(6) a) Hans wischt den Boden mit Wasser.
    b) Hans wischt den Boden mit einer Flüssigkeit.
    c) Hans mops the floor with water / a liquid.
Flüssigkeit (liquid/fluid) is clearly understood as meaning that it is not water that Hans uses for mopping the floor. We group the examples into two classes which will be important for our discussion of game theoretic approaches. We distinguish a class (A) of situations where the hearer has to resolve an ambiguity to interpret the speaker's utterance, and a class (B) where the critical expressions differ only with respect to their extension. Examples (3) and (4) belong to class (A), and examples (1), (5) and (6) belong to class (B). To the latter group also belongs the following observation:
(7) Although pink is the stereotypical form of being pale red, the preexisting form 'pink' seems to have the effect that 'the dress is pale red' means that the dress is not prototypically pink, but somewhere between pink and pale red9.

This example differs from (1) because pink and pale red are not co–extensive. Pink denotes a special pale red colour. Graphically this means that the right upper dot in (2) is missing. We see that we do not get the effects predicted by Bi–OT for class (A): marked expressions show no tendency to go together with the unexpected reading. Our examples which show partial blocking all belong to class (B)10. We cannot discuss the examples of class (A) any further here. There is now some work on anaphora and OT, starting with (Beaver, 2000) and (Hendriks & de Hoop, 2001)11. I think that examples of class (A) simply constitute a different type of problem. Hence we restrict our considerations to cases without ambiguities, i.e. class (B). Conceptually, this is an important point, as the assumption that meaning is highly underspecified is central to Bi–OT.

The Foundation Problem

How can the principle of weak optimality be justified? In Section 2.1 we provided an informal account, but to a large extent we just explained how to calculate weakly optimal form–meaning pairs given a graphical representation of the problem. A crucial step in this calculation was the reduction of a graph once we had found an optimal form–meaning pair ⟨F, M⟩: we had to remove all nodes in the row and the column which contain ⟨F, M⟩. The question is: what justifies this step? In (2) we find that ⟨F1, M1⟩ is optimal. Why does F2 tend to mean M2? We may answer that F2 is more complex than F1, and if the speaker wants to express M1, then he will use F1 because it is optimal and already has the tendency to be interpreted as M1; hence there remains for the complex form F2 only the meaning M2. But this begs the question.
It presupposes Horn's division of pragmatic labour as formulated above. Horn (1984, p. 22) presents a derivation of his principle. His original wording of the division of pragmatic labour is:

(8) The use of a marked (relatively complex and/or prolix) expression when a corresponding unmarked (simpler, less 'effortful') alternative expression is available tends to be interpreted as conveying a marked message (one which the unmarked alternative would not or could not have conveyed)12.

There is a problem if we want to apply the principle in this formulation to (1): it is essential that both expressions, kill and cause to die, can convey the unmarked message. In the form of (8) it fits only examples like (7), where the situation looks as follows:

(9)         M1        M2
     F1     •
            ↑
     F2     •   ←−    •
9 Cited after (v. Rooy, 2002, Sec. 3.2, p. 14), who attributes it to (McCawley, 1978); see also (Horn, 1984, p. 27). We come back to this example later, when discussing van Rooy's (2002) game theoretic approach.
10 See e.g. (Horn, 1984), Section 10: The division of pragmatic labour and the lexicon. None of his examples involves underspecification of meaning.
11 I discussed Mattausch's examples in two previous papers: (Benz, 2001) addresses them from within Bi–OT; in (Benz, 2003) I approached them from a dialogical perspective within a theory of rational interaction.
12 (Horn, 1984), p. 22.
But that is not important here, since Horn's derivation does assume that both expressions are coextensive. It proceeds in six steps13:

(17a) The speaker used marked expression E′ containing 'extra' material (. . . ) when a corresponding unmarked expression E, essentially coextensive with it, was available.
(17b) Either (i) the 'extra' material was irrelevant and unnecessary, or (ii) it was necessary (i.e. E could not have been appropriately used).
(17c) [(17b(i)) is excluded14]
(17d) Therefore, (17b(ii)), from (17b), (17c) by modus tollendo ponens.
(17e) The unmarked alternative E tends to become associated (by use or — through conventionalization — by meaning) with unmarked situation s, (. . . ).
(17f) The marked alternative E′ tends to become associated with the complement of s with respect to the original extension of E/E′. (. . . )

In (17d) Horn shows that the use of E′ was necessary (17b(ii)), and in (17e) he concludes that therefore the unmarked expression E becomes associated with the normal situation s (through a diachronic learning process). But why can the use of E′ be necessary if both expressions are coextensive? If my theory presented in Section 4 is correct, then the use of E′ was necessary because the unmarked expression E had already become associated with the normal situation s. Hence, the diachronic learning process in (17e) comes first, and it is this process that makes the use of E′ necessary. There is a gap in Horn's derivation. I suspect that his notion of necessity presupposes the principle of the division of pragmatic labour, hence begging the question. Blutner developed Bi–OT by fusing Horn's pragmatic approach with optimality theory. In this paper we are going to re–separate things. We derive Horn's principle by recourse to learning: if the world is such that in every actual instance where an unmarked form E is used it turns out that the classified event or object is of type t, then the hearer learns to associate E with t.
This will explain why the speaker cannot use E for classifying an unusual event or object. Before we introduce our theory, we discuss alternative foundational approaches based on (Evolutionary) Game Theory.
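The ordering claim (learning first, necessity second) can be made concrete in a toy check. The property sets and form names below are invented for the illustration and are not part of Horn's or our formal apparatus:

```python
def misleads(form, referent, interpretation):
    """Using `form` misleads iff the hearer infers some property
    that the actual referent lacks."""
    return not interpretation.get(form, set()) <= referent

before = {}           # no association learned yet
after = {"E": {"t"}}  # E has become associated with type t

atypical = {"not-t"}  # an object that is not of type t
assert not misleads("E", atypical, before)  # before learning, E is usable
assert misleads("E", atypical, after)       # after learning, E misleads,
                                            # so E' becomes necessary
print("use of E' becomes necessary only after learning")
```

The point of the toy check is exactly the gap noted above: before any association has been learned, nothing forces the speaker away from E, so the necessity in (17d) cannot precede the learning in (17e).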
3 Game Theory and Partial Blocking
We saw in Section 2.2 that Bi–OT over–predicts partial blocking if applied too naively. Blutner originally intended his theory not as a synchronic theory, i.e. not as a theory which models the actual reasoning of interlocutors in an utterance situation. Weak optimality was intended to select diachronically stable form–meaning pairs. Soon after the emergence of Bi–OT, Game Theory was proposed as a foundational framework15. It makes it possible to embed OT within a well understood theory of rational decision. In addition, there had already been important work by Prashant Parikh on resolving ambiguities within game theoretic frameworks16. This problem may seem similar to partial blocking. For the following discussion we concentrate on van Rooy's paper (2002) because he explicitly proposes his theory as a game theoretic explanation of Horn's division of pragmatic labour.

13 Cited after (Horn, 1984), p. 22, with minor abbreviations.
14 Due to Horn's R Principle, related to the speaker's preferences for economic forms.
15 It was (Dekker & v. Rooy, 2000).
16 See (Parikh, 1990, 1991, 2000, 2001) and the discussion of his work in (v. Rooy, 2002).
Our aim in this section is not so much to show weaknesses of these approaches as to show that they apply to different problems. In the previous section we introduced a distinction between two classes of examples:

(A) A class where at least one critical expression has more than one reading, i.e. is ambiguous.
(B) A class where all critical expressions have only one reading but can differ with respect to their extensions.

We may add a third class:

(C) A class where at least one critical expression has no reading at all, i.e. has no semantic meaning.

Our observations indicate that partial blocking belongs only to class (B). In this section we will argue that game theoretic approaches like those of Parikh and van Rooy are more suitable for classes (A) and (C).
3.1 A General Outline
A game theoretic model for communication involves a range of parameters: properties of the environment, conditions on the epistemic states of participants, their choices of actions, their payoffs, and their expectations about states of the world. If we use the apparatus of Game Theory, then there should be at least a possibility to specify payoffs; otherwise, the fragment of the theory which is really used becomes very weak. Van Rooy (2002) provides an excellent overview of game theoretic approaches related to partial blocking and Horn's division of pragmatic labour. We concentrate here on what van Rooy calls Parikhian signalling games (v. Rooy, 2002, Sec. 4.4). Other models may vary with respect to some parameters, but the general picture fits all of them.

General Setting:

• The speaker knows the situation t ∈ M and chooses a signal F ∈ F; the hearer has to interpret the signal F by choosing a state of affairs t as its meaning17.
• The game is sequential; what counts are the participants' choices of strategies. They are represented by functions:
  – S : M −→ F representing the speaker's choice of a form for each situation t ∈ M.
  – H : F −→ M representing the hearer's choice of an interpretation for each form F ∈ F.
• Each state t ∈ M has a certain probability P(t).
• The complexity (Parikh) of a form F is measured by a positive real–valued function c, where c(F) < c(F′) iff F is less complex than F′. The utility of the speaker

17 This is a game with private information. Different models may differ with respect to their assumptions about the epistemic states of interlocutors.
choosing S(t) and the hearer choosing H(S(t)) in situation t is then measured by a function U with:

(3.1)   U(t, S(t), H(S(t))) = c(S(t))⁻¹ if H(S(t)) = t, and 0 otherwise.

A strategy tells us how each participant behaves given certain contextual information. For the speaker it must tell us which form F he chooses for a given meaning t. For the hearer it must tell us which meaning t he chooses for which form F. The following tables show a speaker and a hearer strategy for a situation with two possible states of affairs t, t′, two possible forms F, F′, S(t) = F′, S(t′) = F, H(F) = t′ and H(F′) = t18:

    S    t    t′            H    F    F′
         F′   F                  t′   t
The probability P tells us how probable an event or state of affairs is, not with what probability a certain signal should be related to which state; so P does not depend on the signal in this setting. The probabilities are the counterpart of the hearer's preferences on interpretations in Bi–OT. The function U measures the utility of the strategies S and H for each situation t ∈ M. We can think of it as the product h · k of two functions h and k, where h tells us whether communication is successful, and k measures the costs for the complexity of the produced signal. Communication is successful iff H(S(t)) = t, i.e. h(t) = 0 iff H(S(t)) ≠ t, and h(t) = 1 iff H(S(t)) = t. The complexity is measured by k(t) = c(S(t))⁻¹. c is the counterpart of the speaker's preferences on forms in Bi–OT. If c(F) is large, i.e. if F is a very complex form, then c(F)⁻¹ becomes small, hence the utility is diminished. The speaker will always prefer less complex forms. We may notice that c does not depend on the meaning t. In Bi–OT we calculate the weakly optimal form–meaning pairs and assume that speakers and interpreters coordinate on these pairs. In game theory it is assumed that it is rational for agents to coordinate on certain equilibria. The counterpart of a (strongly) optimal form–meaning pair in OT is a Nash equilibrium. A pair ⟨F, t⟩ is a Nash equilibrium if for any other choice F′ of the speaker and t′ of the hearer the utilities of ⟨F′, t⟩ and ⟨F, t′⟩ are not higher than the utility of ⟨F, t⟩.

General Strategy: Game theoretic and evolutionary approaches follow the following scheme:
1. The speaker's and the hearer's choices of signalling and interpretation strategies are formulated as a decision problem.
2. A certain class of Nash equilibria is introduced — call them X–equilibria19.
3. The decision problem is solved if there is a unique X–equilibrium.
4. It is assumed that interlocutors coordinate in the long run on X–equilibria.

How is this related to the principle of weak optimality in Bi–OT? The principle of Weak Optimality is modelled as a property of signalling and interpretation strategies. The principle is explained if all X–optimal pairs of signalling and interpretation strategies have

18 This and all other game examples in Sections 3.1 and 3.2 are taken from (v. Rooy, 2002).
19 Different approaches may differ according to their choices: Lewis (1969) considered coordination equilibria, Parikh (2001) Pareto equilibria, and van Rooy (2002) evolutionary equilibria.
this property. In general, this approach justifies some pragmatic principle of coordination if we can show that any X–optimal pair of signalling and interpretation strategies adheres to it. The main difficulty in applying game theoretic methods is not to calculate probabilities and utilities, but to find the correct model which fits the empirical problem under consideration. In the next sub–section we give some evidence that these approaches are not suitable for explaining examples of class (B).
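The Parikhian setting above is easy to make executable. The sketch below instantiates it with the values van Rooy uses in his example in Section 3.2 (P(t) = 0.8, P(t′) = 0.2, c(F) = 1, c(F′) = 2); the dictionary encoding of strategies is our own device, not part of the original formulation:

```python
P = {"t": 0.8, "t'": 0.2}   # probability of each state of affairs
c = {"F": 1.0, "F'": 2.0}   # complexity of each form

def U(t, S, H):
    """Utility (3.1): 1/c(S(t)) on successful communication, 0 otherwise."""
    return 1.0 / c[S[t]] if H[S[t]] == t else 0.0

def EU(S, H):
    """Expected utility of a strategy pair, averaging over states."""
    return sum(P[t] * U(t, S, H) for t in P)

S1 = {"t": "F", "t'": "F"}    # pooling: always the lighter form
S2 = {"t": "F'", "t'": "F"}   # separating
H1 = {"F": "t", "F'": "t"}
H2 = {"F": "t'", "F'": "t"}

for S, sname in [(S1, "S1"), (S2, "S2")]:
    for H, hname in [(H1, "H1"), (H2, "H2")]:
        print(sname, hname, EU(S, H))
# EU(S1,H1)=0.8, EU(S1,H2)=0.2, EU(S2,H1)=0.4, EU(S2,H2)=0.6:
# <S1,H1> maximises expected utility, but only <S2,H2> always succeeds.
```

Computing the expected utilities this way reproduces the tables discussed in Section 3.2 and makes the tension visible that is analysed there: the pooling pair wins on expected utility while only the separating pair guarantees success.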
3.2 Van Rooy's Principle
Van Rooy observes that if communication is to be successful, i.e. if H(S(t)) = t, then speakers and hearers must coordinate on separating strategy pairs ⟨S, H⟩, i.e. there must be a subset of forms F′ such that H ◦ S maps F′ 1–1 onto M. This implies that it is desirable that speaker's strategies are totally separating, i.e. that t ≠ t′ implies S(t) ≠ S(t′). Only then can it be guaranteed that every state of affairs can be expressed by the language. If the speaker's strategy is not separating, then communication must fail for at least one situation, i.e. there exists t ∈ M such that H(S(t)) ≠ t. If it is rational for interlocutors to coordinate on strategies where communication is always successful, then the following principle must hold:

(10) Suppose that F is a lighter expression than F′, F > F′, and that F′ can only mean t, but F can mean both. Suppose, moreover, that t is more salient, or more stereotypical, than t′, t > t′20. Then speaker and hearer coordinate on strategy pairs ⟨S, H⟩ such that S(t) = F′, S(t′) = F, H(F) = t′ and H(F′) = t.

Van Rooy introduces his principle as a counterexample to Bi–OT and Parikh's approach. We can represent the situation by the following graph:
(11)        t         t′
     F      •   ←−    •
            ↑
     F′     •
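The success condition behind (10) can be checked mechanically. This is a hedged sketch; the dictionary encoding of strategies is ours, not van Rooy's:

```python
def always_successful(S, H, states):
    """Communication never fails iff H(S(t)) = t for every state t."""
    return all(H[S[t]] == t for t in states)

# Situation (11): F' can only mean t; F can mean both and is lighter.
H = {"F": "t'", "F'": "t"}       # the hearer strategy of principle (10)
S_pool = {"t": "F", "t'": "F"}   # non-separating: always the light form
S_sep = {"t": "F'", "t'": "F"}   # separating, as required by (10)

print(always_successful(S_pool, H, ["t", "t'"]))  # False
print(always_successful(S_sep, H, ["t", "t'"]))   # True
```

Only the separating speaker strategy guarantees H(S(t)) = t in every state, which is why (10) pairs the heavier form F′ with the more salient state t.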
It is not difficult to see that van Rooy's principle (10) contradicts Bi–OT and Horn's division of pragmatic labour. Clearly ⟨F, t⟩ is optimal. If we then reduce the graph and eliminate all nodes in the row and column containing ⟨F, t⟩, then no combination remains. Hence, Bi–OT predicts that F denotes t — and t′ cannot be expressed. Similar consequences arise in Parikh's approach. Let us consider a game theoretic model as outlined in the previous sub–section. Suppose that higher salience is modelled by the following probabilities: P(t) = 0.8 and P(t′) = 0.2, i.e. t is much more expected than t′. This gives rise to the following tables (v. Rooy, 2002, Sec. 3.2):

          t    t′               F    F′
    S1    F    F          H1    t    t
    S2    F′   F          H2    t′   t

    U(t)   H1    H2           U(t′)   H1   H2
    S1     1     0            S1      0    1
    S2     0.5   0.5          S2      0    1

    EU     H1    H2
    S1     0.8   0.2
    S2     0.4   0.6
The first two tables represent two possible strategies for the speaker and for the hearer. The third table shows the utilities of strategy pairs ⟨Si, Hj⟩ in situation t, and the fourth table shows them for t′. For calculating the utilities we need to know the complexities of F and F′. We get the results shown in the third and fourth tables with c(F) = 1 and c(F′) = 2, i.e. F′ is twice as complex as F. The fifth table shows the expected utility of the strategy pairs21. Parikh uses a notion of X–equilibrium for which we need to compare the expected utilities22. It selects the strategy pair ⟨S1, H1⟩ because it has the highest expected utility; hence interlocutors should coordinate on it. This means t′ cannot be expressed. Van Rooy points out that it would be desirable to coordinate on the separating strategy pair ⟨S2, H2⟩. This is the only strategy pair where communication is always successful. He therefore applies a different X–equilibrium condition: the strategy pair should be a Nash equilibrium for all situations, i.e. for both t and t′. In van Rooy's example we find only one such pair, namely ⟨S2, H2⟩. This correctly predicts that the separating strategy should be selected.

20 The first part is cited from (v. Rooy, 2002, Sec. 3.2, p. 13). The notation is slightly adapted.

Evaluating van Rooy's Principle for Class (B)

We now want to show that van Rooy's principle is not applicable in class (B). The claim that interlocutors always coordinate on the separating strategy seems to be incorrect:
a) Zwei Amerikaner wurden bei dem Anschlag getötet.
b) Mehrere Afrikaner wurden in der S-Bahn angepöbelt.
c) Eine Gruppe von Asiaten besucht den Reichstag.

a) Two Americans were in the plot killed.
b) Some Africans were in the city train verbally abused.
c) A group of Asians visited the Reichstag.

Without special context these sentences must be understood as:

a) Zwei US–Amerikaner wurden bei dem Anschlag getötet.
b) Mehrere Schwarzafrikaner wurden in der S-Bahn angepöbelt.
c) Eine Gruppe von Ostasiaten besucht den Reichstag.

a) Two US–Americans were killed in the plot.
b) Some Black–Africans²³ were verbally abused in the city train.
c) A group of East–Asians visited the Reichstag.
The critical expressions are Amerikaner, Afrikaner and Asiaten. They have a wider extension than US–Amerikaner, Schwarzafrikaner, and Ostasiat. Moreover, they are lighter than the special expressions, and the special expressions can only have the special meaning. We assume here that (a) in most cases where Germans talk about inhabitants of the American continent, they talk about US–Americans, (b) most Africans whom Germans get in contact with and recognise as Africans are from Schwarzafrika, (c) most groups of Asian visitors come from East Asia; and we assume that the difference between US–Americans and Non–US–Americans, North–Africans and Non–North–Africans, and East–Asians and Non–East–Asians matters.

²¹ To get e.g. the expected utility of the pair ⟨S1, H1⟩ shown in the upper left corner of the fifth table, we have (a) to multiply its utility for t shown in the third table with the probability of t, and (b) to add the product of its utility for t′ shown in the fourth table and the probability P(t′). We get 1 · 0.8 + 0 · 0.2 = 0.8.
²² Parikh looks for Pareto–Nash equilibria (Parikh, 2001, Sec. 4.5).
²³ This has to be read as a direct translation from German. In German Schwarzafrikaner has no racist connotations.

If we naively apply van Rooy's principle, then we should expect a
tendency for Amerikaner to denote Non–US–Americans, for Afrikaner to denote North–Africans, etc. But we observe the opposite effect. This effect is not confined to examples where we classify people according to their nationality: (13)
a) Hans macht Urlaub in Amerika.
b) Hans fährt seinen Wagen in die Garage.
c) Hans hat sich Bleistifte gekauft.

a) Hans makes holidays in America.
b) Hans drives his car into the garage.
c) Hans has himself pencils bought.

Without additional contextual information these sentences cannot be used if:

a) Hans macht Urlaub in Chile.
b) Hans fährt seinen Handwagen in die Garage.
c) Hans hat sich Druckminen–Bleistifte gekauft.
The first example must be understood as meaning that Hans makes holidays in the USA. Wagen can have a very wide meaning, including both a car and a hand cart (Handwagen). Bleistift can mean both normal pencils, which you sharpen with a sharpener, and mechanical pencils, where you push at the top to advance the lead. The lighter, more general expression always has the tendency to denote the normal case. If van Rooy's principle could be applied to these examples, it would predict the contrary effect. This shows that van Rooy's principle is violated in class (B), if applied too naively, of course. By applying naively I mean: applying without checking the preconditions. There are two points which seem to me especially important for why van Rooy's game theoretic model cannot be used for class (B):

1. It presupposes that the meaning of at least some relevant forms is underspecified and can be estimated only with some probability.
2. He explains the division of pragmatic labour by starting with non-separating signalling systems, and then tries to show that they develop into separating ones.

This implies that the models cannot be applied if:

1. Forms have unique meanings.
2. Languages are separating.

This is the situation we find in examples of class (B). We can always assume that natural language is fine-grained enough to express every state of affairs, i.e. we can assume that natural language is separating. Hence, the central problem with partial blocking phenomena is to explain how there can be shifts in meaning for signalling systems that are (a) separating and (b) unambiguous. If this is true, then partial blocking poses a type of problem which is sharply differentiated from the problems approached by van Rooy. Parikh addresses questions of disambiguation; hence, although this might at first sight seem to be a similar issue, it constitutes a different type of problem.
4 Associative Learning and Partial Blocking
Let us reconsider our introductory example (1), repeated as (14): (14)
a) Black Bart killed the sheriff. b) Black Bart caused the sheriff to die.
It has to be explained why kill tends to denote a normal killing event whereas cause to die tends to denote an untypical killing event. I want to show that partial blocking can be explained as an effect of associative learning and speaker's preferences. It emerges as the result of a process which divides into the following stages: (1) In the initial stage all killing events are direct killing events. The speaker will always use kill to denote these events. (2) Interpreters will learn that kill is always connected with direct killing. They associate kill with direct killing. (3) The speaker will learn that hearers associate kill with direct killing. (4) If an exceptional event then occurs where the killing is an indirect killing, the speaker has to avoid misleading associations and use a different form, in this case the more complex form cause to die. (5) The hearer will then learn that cause to die is always connected to an untypical killing. By associative learning we mean the learning processes in (2), (3) and (5). For the hearer I assume that the following principle holds:

(H) If in every actual instance where the form F is used for classifying events or objects it turns out that the classified event or object is at least of type t, then the hearer learns to associate F with t, i.e. he learns to interpret F as t.

A similar principle is assumed for the speaker to explain step (3):

(S) If in every actual instance where the form F is used for classifying events or objects it turns out that the hearer interprets F as t, then the speaker learns that he can use F for expressing t.

Before presenting a formal model we want to add two clarifying remarks. (1) If we compare only terms of the same category, then most probably they don't form a separating system. Hence, if we compare forms according to their meanings, then it is not only the meaning of words that is involved:

1. The dress is pink.
2. The dress is pale red.
3. The dress is pale red but not pink.
All three phrases, pink, pale red, and pale red but not pink, are forms which the speaker can choose. The forms F may even be lengthy descriptions of a situation. (2) If we say that a form F strengthened its meaning due to a diachronic process, then we don't mean to say that it changed its semantics. All predicted strengthenings remain defeasible. I will describe the learning process by diachronic models, where a diachronic model is a sequence of synchronic stages. The processes described in (1), (2), and (3) will be assigned to one stage in the diachronic model, and the processes (4) and (5) to a second stage. The proposed models have to be understood as hypothetical models: the data could be explained by them. In this respect they suffer from the same shortcomings as evolutionary, game theoretic, and OT models, which are all hypothetical models as well. The formal model must contain the following elements:
1. A representation for the meaning of words and phrases. We use simple attribute–value functions.
2. A representation for the semantics of a given language NL. For this end we introduce structures which we call synchronic semantic models.
3. A representation for the speaker's preferences on forms. We do this by adding a pre-order on NL to synchronic semantic models.

Less obvious from the previous discussion, we will also need:

4. A representation for the speaker's knowledge about the object or event he wants to classify. We again use an attribute–value function.
5. A representation for the speaker's intentions for how to classify an object or event. Again we use an attribute–value function.

These elements form the static part of our model. What changes diachronically?

6. The types of objects and events which actually occur. We represent the actual occurrences of objects and events during a period i by a set E^i.
7. The hearer's interpretation of forms. We represent it by a function H.
8. The speaker's choice of forms. We represent it by a function S.

The functions S and H are the counterparts of the speaker's and hearer's strategies in game theoretic approaches. In our model, the speaker's strategy S will depend on his knowledge about an object or event e and his intentions for how to classify it. In a first sub-section we introduce the static model. Then we consider a special case: examples where objects can be classified by attribute–value functions with one feature and two values. With this background we introduce the general dynamic model. In a fourth sub-section we reconsider our examples for partial blocking.
4.1 The Static Model
We consider settings of the following form: there is an object or event e, and the speaker wants to classify it as being of a certain type f. Maybe he knows more about the object; maybe he knows that it is in fact of a more special type t. But all he wants to communicate is that it is of type f. He has to choose a form F such that the hearer can conclude that the object or event e is of type f. This explains why we need a representation for the speaker's knowledge and intentions.

Definition 4.1 (Attribute–Value Function)
1. Feat: a set of features;
2. Val: a set of values {0, 1, −1};
3. Type: the set of attribute–value functions {f | f : Feat → Val};
4. Type*: the set of all attribute–value functions in Type where all values are elements of {1, −1}.
Feat is a set of the usual semantic features. The values are to be interpreted as follows: let m be some feature representing some property of objects, f an attribute–value function, and e an object of type f. Then f(m) = 1 means that e does have the property m; f(m) = −1 means that e does not have the property m; and f(m) = 0 means that e may or may not have property m. If f ∈ Type*, then for all properties it is specified whether the object has this property or not. Attribute–value functions are a very primitive example of typed feature structures²⁴. The following definition introduces some useful relations and operations for attribute–value functions.

Definition 4.2 Let f, f′ ∈ Type be two attribute–value functions. f is more informative than f′, f ≤ f′, iff:

  ∀m ∈ dom f : f′(m) ≠ 0 → f′(m) = f(m).   (4.2)

They are compatible iff:

  ∀m ∈ dom f : f(m) ≠ 0 ∧ f′(m) ≠ 0 → f(m) = f′(m).   (4.3)

If they are compatible, then their meet f ∧ f′ is defined by:

  (f ∧ f′)(m) := f(m) if f(m) ≠ 0, and (f ∧ f′)(m) := f′(m) otherwise.   (4.4)

The join f ∨ f′ of two attribute–value functions is defined by:

  (f ∨ f′)(m) := f(m) if f(m) = f′(m), and (f ∨ f′)(m) := 0 otherwise.   (4.5)
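As a minimal sketch, Definitions 4.1 and 4.2 can be implemented directly, representing an attribute–value function as a dict from features to {1, 0, −1}; the feature names below are illustrative assumptions, not taken from the paper:

```python
# A sketch of attribute-value functions (Def. 4.1) and the relations and
# operations of Def. 4.2. The features 'direct' and 'animate' are made up.

FEATS = ["direct", "animate"]   # illustrative feature set

def leq(f, g):
    """f <= g: f is more informative than g (eq. 4.2)."""
    return all(g[m] == 0 or g[m] == f[m] for m in FEATS)

def compatible(f, g):
    """Specified values never clash (eq. 4.3)."""
    return all(f[m] == 0 or g[m] == 0 or f[m] == g[m] for m in FEATS)

def meet(f, g):
    """f ∧ g: combine the information of both (eq. 4.4)."""
    assert compatible(f, g)
    return {m: (f[m] if f[m] != 0 else g[m]) for m in FEATS}

def join(f, g):
    """f ∨ g: keep only the shared information (eq. 4.5)."""
    return {m: (f[m] if f[m] == g[m] else 0) for m in FEATS}

f = {"direct": 1, "animate": 0}
g = {"direct": 0, "animate": -1}
print(meet(f, g))          # {'direct': 1, 'animate': -1}
print(join(f, g))          # {'direct': 0, 'animate': 0}
print(leq(meet(f, g), f))  # True: the meet is more informative
```

The meet combines the information of two compatible functions, while the join keeps only the information they share; both operations reappear in the diachronic update (4.11).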
We noted in the last section that the central problem with partial blocking phenomena is to explain how there can be shifts in meaning for signalling systems that are (a) separating and (b) unambiguous. Hence, we assume that in the initial situation the choice and interpretation of language is governed by its (unambiguous) semantics. We capture the semantics in a synchronic semantic model.

Definition 4.3 (Synchronic Semantic Model) A synchronic semantic model is a tuple M = ⟨E, Type, NL, [·], :⟩ with:
1. E is a set of entities;
2. NL is a set of phrases of a given language;
3. [·] : NL → Type is a map from NL onto Type;
4. : : E → Type* is a map from E to Type*.

That the initial language is unambiguous and separating is captured by the conditions on [·]: that it is a function implies that it is unambiguous; that it is onto implies that it is separating. Onto means that there exists for every type f ∈ Type a form F ∈ NL such that F means f, i.e. [F] = f. E is intended to represent all possible occurrences of objects or events, not just the actual ones. Different occurrences of the same object may be represented by different

²⁴ See (Carpenter, 1992), Def. 3.1, p. 36, or Def. 4.1, p. 52.
entities in E. Furthermore, it should represent not only occurrences of actual objects but of all possible objects. Structurally, E is just a set. : tells us for all possible entities e ∈ E what properties they possess. t ∈ Type* means that all features must be specified. We define the meaning of a type f ∈ Type for a given synchronic semantic model M by:

  [[f]] := {e ∈ E | ∃t ∈ Type* : t ≤ f ∧ e : t}.   (4.6)

We can identify the meaning of an expression F ∈ NL with:

  [[F]] := {e ∈ E | ∃t ∈ Type* : t ≤ [F] ∧ e : t} = [[[F]]].   (4.7)

We extend the definition of : to all f ∈ Type by: e : f iff e ∈ [[f]]. Synchronic semantic models represent the purely semantic part. For the pragmatic part we need to represent the speaker's preferences on forms. We do this by adding a preference relation to synchronic semantic models. We call the resulting structure a simple synchronic system.
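Continuing the sketch, the extension (4.6) of a type can be computed from the typing function : and the denotation map [·]. The entities, the single feature, and the forms below are illustrative assumptions:

```python
# Sketch of a tiny synchronic semantic model (Def. 4.3) and the
# extension [[f]] of a type (eq. 4.6). All concrete names are made up.

FEATS = ["direct"]

def leq(f, g):
    # f <= g: g's specified values agree with f (eq. 4.2).
    return all(g[m] == 0 or g[m] == f[m] for m in FEATS)

# The function ':' assigns each entity a total type in Type*.
typing = {
    "e1": {"direct": 1},    # e.g. a direct killing
    "e2": {"direct": -1},   # e.g. an indirect killing
}

# [.] assigns each form a type; it is a function (unambiguous)
# and onto Type (separating).
denotation = {
    "kill directly":   {"direct": 1},
    "kill indirectly": {"direct": -1},
    "kill":            {"direct": 0},
}

def extension(f, E):
    # [[f]] = {e in E | the total type t of e satisfies t <= f}.
    return {e for e in E if leq(typing[e], f)}

print(sorted(extension(denotation["kill"], typing)))           # ['e1', 'e2']
print(sorted(extension(denotation["kill directly"], typing)))  # ['e1']
```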
Definition 4.4 (Simple Synchronic System) A structure S = ⟨M, ≺⟩ is a simple synchronic system if:
1. M = ⟨E, Type, NL, [·], :⟩ is a synchronic semantic model;
2. ≺ is a linear well-founded order on NL.

We note that the preferences are thereby independent of the meanings of forms. We found the same assumption in game theoretic models, but it is far less general than the situations considered by Bi–OT. This concludes the definition of the static model. Before we consider a special type of examples, we introduce a model for the initial situation.

The Initial Situation

Our task is to explain shifts of meaning when we start with an unambiguous and separating language. The initial selection and interpretation strategies must be defined by pure semantics and the speaker's preferences. We have mentioned before that we consider situations where the speaker wants to classify some entity e as being of a certain type f′. He knows that it is in fact of type f. He has to choose a form F such that the hearer can conclude that e is of type f′. We assume that the speaker always selects the most preferred successful form. How do we represent selection and interpretation strategies? By functions S for the speaker's selection strategy and H for the hearer's interpretation strategy. Our conditions are captured by the following strategy pair:

1. S : Type × Type → NL is a partial function with (f, f′) ∈ dom S ⇒ f ≤ f′ and S(f, f′) := min{F ∈ NL | f ≤ [F] ≤ f′}.
2. H : NL → Type with H(F) := [F].

The condition (f, f′) ∈ dom S ⇒ f ≤ f′ means that the speaker will only say what he thinks is true. The hearer's initial interpretation simply follows the rules of pure semantics. The definitions imply that

  f ≤ H(S(f, f′)) ≤ f′,
(4.8)
i.e. the speaker will always have success. The definition does not capture that the speaker classifies entities correctly. We can extend S to E such that it is guaranteed that he never makes mistakes:

• ⟨e, f, f′⟩ ∈ dom S ⇒ e : f and S(e, f, f′) := S(f, f′) as defined before.
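The initial strategies S^0 and H^0 can be sketched as follows. The forms, the feature, and the preference order are illustrative assumptions, borrowed from the Wasser–Flüssigkeit–alkoholische Essenz example of the next sub-section:

```python
# Sketch of the initial strategy pair: S^0 picks the most preferred true
# form, H^0 is pure semantics. The preference Wasser ≺ Flüssigkeit ≺
# alkoholische Essenz is an assumption for this illustration.

FEATS = ["water"]

def leq(f, g):
    return all(g[m] == 0 or g[m] == f[m] for m in FEATS)

t0  = {"water": 1}    # water
t1  = {"water": -1}   # alcoholic essence
t01 = {"water": 0}    # liquid, i.e. t0 ∨ t1

# [.]: unambiguous semantics; the list order encodes the preference ≺.
NL = [("Wasser", t0), ("Fluessigkeit", t01), ("alkoholische Essenz", t1)]

def S0(f, f_prime):
    """Most preferred form F with f <= [F] <= f'."""
    assert leq(f, f_prime)     # the speaker only says what he thinks is true
    for F, den in NL:          # NL is ordered by preference
        if leq(f, den) and leq(den, f_prime):
            return F

def H0(F):
    """The initial interpretation is just the semantics: H^0(F) = [F]."""
    return dict(NL)[F]

print(S0(t0, t01))   # 'Wasser': knows it is water, doesn't care how to classify
print(S0(t1, t01))   # 'Fluessigkeit': knows it is essence, doesn't care
print(S0(t1, t1))    # 'alkoholische Essenz': wants to convey that it is essence
```

One can check on these outputs that the success condition f ≤ H^0(S^0(f, f′)) ≤ f′ from (4.8) holds in each case.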
4.2 The Situation with two Basic Types
In this section we look at a special case: the situation for one feature with two values. All examples considered so far are of this type, at least after some simplification of the scenarios. E.g. in (1) the question was whether the killing is direct or not; hence we can assume one feature direct with possible values −1 and 1 for not direct and direct. In (6) the question was whether or not it is water that Hans uses for mopping the floor. Let S be a given simple synchronic system. According to our assumption, languages are separating. This implies that there is a subset F′ of NL such that [·] maps F′ one-to-one onto Type. Hence, for each f ∈ Type the set [·]⁻¹(f) = {F ∈ NL | [F] = f} of synonymous forms expressing f is not empty. The classified events or objects can differ only with respect to one feature. As attribute–value functions can take only three values, namely {−1, 0, 1}, there are only three classes of forms to be considered. Let m be the feature and [i] := {F ∈ F′ | [F](m) = i}. As ≺ is a linear well-founded order on NL, it follows that each class [i] contains a unique minimal form. As the speaker will choose the most preferred form, he has to consider only three forms: the minimal elements of [−1], [0] and [1]. In general, if we consider a situation with two basic types t0 and t1, then there are only three forms F0, F1, F2 the speaker has to consider when making his choice. Without loss of generality we can assume that [F0] = t0, [F1] = t1 and [F2] = t0 ∨ t1; hence F2 is always the form with the wider meaning. We can further assume that in general F0 is preferred over F1. Hence, we arrive at the following complete classification of all choice situations with two basic types:

  Case I:   F0 ≺ F1 ≺ F2
  Case II:  F0 ≺ F2 ≺ F1
  Case III: F2 ≺ F0 ≺ F1

The leftmost form is the most preferred one, the rightmost the least preferred; in each case F2 is the form whose extension comprises the meanings of both types t0 and t1. We list one (German) example for each case:

Case I
1. Vater — Mutter — Elternteil
2. father — mother — one of the parents

Case II
1. Wasser — Flüssigkeit — alkoholische Essenz
2. water — liquid — alcoholic essence

Case III
1. Amerikaner — Nordamerikaner — Lateinamerikaner
2. American — North American — Latin American

The speaker's choice depends on the epistemic context and his communicative intentions. So an utterance situation is given by an entity e which has to be classified, the speaker's knowledge f about e, and his intention to classify it as f′. A situation ⟨e, f, f′⟩ totally determines his choice. This was captured in the definition of his initial selection strategy S²⁵. Only three types are involved if we confine considerations to the situation with two basic types. Hence, it is feasible to provide representations for all possible choice situations. The following summaries give, for each of the speaker's possible intentions (t0, t1, or t0 ∨ t1), his optimal choice of form. □S is to be read as the speaker knows that…; hence □S t0 means that the speaker knows that the entity he classifies is of type t0. Which choice is optimal depends on the speaker's knowledge²⁶. This leads to the following summary for Case I:
Case I (optimal choices):
  intention t0: F0 (given □S t0);  intention t1: F1 (given □S t1);
  intention t0 ∨ t1: F0 given □S t0, F1 given □S t1.
We restrict attention to the cases where the speaker knows whether the object to be classified is of type t0 or of type t1. If he knows only that it is of type t0 ∨ t1, then he can only choose F2. What can we read off from this? We see that in Case I there is never a reason for the speaker to use the general, heaviest form F2. Diachronically, this may explain why such forms disappear²⁷.

²⁵ See p. 16.
²⁶ So ideally we should have a three-dimensional representation with t0, t1, t0 ∨ t1 as arguments of the two dimensions 'speaker's knowledge' and 'speaker's intentions,' and one dimension F0, F1, F2 for the possible values of S.
²⁷ This is beyond our theory, because we will not try to explain why forms die out.

The following summary represents the situation of Case II:
Case II (optimal choices):
  intention t0: F0 (given □S t0);  intention t1: F1 (given □S t1);
  intention t0 ∨ t1: F0 given □S t0, F2 given □S t1.

We can see that the speaker will use the general form F2 only if he knows that the entity e to be classified is of type t1. Hence, as a matter of fact, if the hearer knows that the speaker knows the type of e, he can safely infer from an utterance of F2 that the entity is of type t1. The following summary represents the situation of Case III:
Case III (optimal choices):
  intention t0: F0 (given □S t0);  intention t1: F1 (given □S t1);
  intention t0 ∨ t1: F2, given either □S t0 or □S t1.

If F2 is the lightest expression, then the hearer can extract the least information from its utterance. The speaker may classify the entity as t0 or as t1; in both cases he will use F2. Hence, the hearer cannot obtain more information than is conveyed by the semantics of F2. So even a first survey of the initial situation shows how associative learning can lead to stronger interpretations. If both basic types are realised, together with all possible speaker's intentions, then strengthening can only occur in a specific type of examples, namely Case II examples. Moreover, the survey provides us with a complete classification of utterance situations. We will make use of this classification when discussing our examples.

The Coordination Problem

In the game theoretic approaches discussed in Section 3 the coordination problem imposed by interpretation plays a central role. The previous representations for Cases I–III are not representations of this decision problem. If we write the speaker's possible choices of forms on the left side and the hearer's choices of interpretations in the top row, and indicate grammatical form–meaning pairs by bullets, then we get the following graph:
        t0        t1        t0 ∨ t1
  F0    •
  F1              •
  F2                        •

Adding the payoffs ai for the speaker and bi for the hearer to the grammatical pairs:

        t0        t1        t0 ∨ t1
  F0    (a0, b0)
  F1              (a1, b1)
  F2                        (a2, b2)
However we choose the payoffs ai for the speaker and bi for the hearer, the interlocutors will always coordinate on a Nash equilibrium. As the game is sequential, there is in fact nothing left for the interpreter to choose. Hence, the coordination problem in the initial situation is trivial: it is solved by semantics. And, we add, it will remain trivial for all subsequent diachronic coordination problems.
4.3 The Diachronic Model
What changes over time? We consider three parameters: (1) the types of events classified by the speaker, (2) the hearer's interpretation strategy, and (3) the speaker's selection strategy for forms. We assume that the causal order of these changes follows the order (1)–(3). At least this is the order indicated by our informal explanation of the Kill–and–Cause–to–Die example (1). The dynamics in the first period follows this schema:

• Initially, selection and interpretation strategies are given by pure semantics.
• Then the hearer learns that the factual information of an utterance is stronger than its semantics.
• The speaker learns to exploit this situation.

What are selection and interpretation strategies? When is communication successful?

Definition 4.5 (Selection and Interpretation Strategy) Let ⟨M, ≺⟩ be a given simple synchronic system with synchronic semantic model M = ⟨E, Type, NL, [·], :⟩.
1. We call a partial function S : Type × Type → NL a selection strategy for forms if (f, f′) ∈ dom S ⇔ f ≤ f′.
2. We call a function H an interpretation strategy if H : NL → Type.

Definition 4.6 (Condition of Success) Let ⟨M, ≺⟩ be a given simple synchronic system with M = ⟨E, Type, NL, [·], :⟩. Then a pair of selection and interpretation strategies ⟨S, H⟩ is successful if for all (f, f′) ∈ dom S:

  f ≤ H(S(f, f′)) ≤ f′
(4.9)
We repeat the definitions of the initial selection and interpretation strategies from p. 16:

Definition 4.7 (Initial Situation) Let ⟨M, ≺⟩ be a given simple synchronic system with M = ⟨E, Type, NL, [·], :⟩. Then let
1. S^0 : Type × Type → NL be the selection strategy defined by S^0(f, f′) := min{F ∈ NL | f ≤ [F] ≤ f′};
2. H^0 : NL → Type be the function defined by H^0(F) := [F].

The Diachronic Model Defined

We have to describe how selection and interpretation strategies change from stage to stage. We repeat the informal description of the two basic principles governing changes:

(H) If in every actual instance where the form F is used for classifying events or objects it turns out that the classified event or object is at least of type f, then the hearer learns to associate F with f, i.e. he learns to interpret F as f.

(S) If in every actual instance where the form F is used for classifying events or objects it turns out that the hearer interprets F as f, then the speaker learns that he can use F for expressing f.

We assume from now on that there is a given fixed simple synchronic system ⟨M, ≺⟩ with M = ⟨E, Type, NL, [·], :⟩. We describe the diachronic development of selection and interpretation strategies by a sequence (Syn^i)_{i=0,…,n} of synchronic stages.
Definition 4.8 (Synchronic Stage) A synchronic stage is a triple Syn^i = ⟨E^i, S^i, H^i⟩ with E^i ⊆ E × Type × Type and:

  ⟨e, f, f′⟩ ∈ E^i ⇒ e : f & f ≤ f′.
(4.10)
This means that every synchronic stage is characterised by (1) the set of utterance situations, where each situation comprises a classified entity e, the speaker's knowledge f about e, and his intention to classify e as f′; (2) the speaker's selection strategy; and (3) the hearer's interpretation strategy. Now and then we will identify E^i with {e ∈ E | ∃f, f′ : ⟨e, f, f′⟩ ∈ E^i}. Let's assume that the hearer recognises that whenever an entity is classified by use of a form F, it is at least of type f. There are many types f′ such that the hearer could recognise that the entity is always of type f′. But there can be only one minimal such type, and there is always a minimal such type: this follows from our definition of attribute–value functions in 4.1. The following definition contains the idea of the paper in a nutshell. Assume we are in stage Syn^n = ⟨E^n, S^n, H^n⟩. What do the new selection and interpretation strategies in the next stage Syn^{n+1} look like?

  H^{n+1}(F) := min{f ∈ Type | f ≤ H^n(F) ∧ F^n ⊆ [[f]]^n}
             = H^n(F) ∧ ⋁{t ∈ Type* | ∃e ∈ F^n : e : t}.   (4.11)

  S^{n+1}(f, f′) := min{F ∈ NL | f ≤ H^{n+1}(F) ≤ f′}.   (4.12)
Here [[f]]^n denotes the extension of f in E^n, i.e. [[f]]^n := {e ∈ E^n | e : f}, and F^n is the set of all entities which the speaker has in fact classified using F, i.e.

  F^n := {e ∈ E | ∃f, f′ : ⟨e, f, f′⟩ ∈ E^n ∧ S^n(f, f′) = F}.
(4.13)
H^{n+1} and S^{n+1} describe both the hearer's and the speaker's learning. The hearer's learning precedes the speaker's, but we put both processes together in one stage.
Problem: this learning should take place only with respect to actually used forms. If a form is never used, then the hearer can associate no restricted information with it. Hence, we have to check which forms are used in a stage. We collect them in a set NL^{n+1}:

  NL^{n+1} := {F ∈ NL | ∃(e, f, f′) ∈ E^n : S^n(f, f′) = F}
(4.14)
If learning takes place with respect to NL^{n+1} only, then we have to restrict the definition of H^{n+1} in (4.11) to this set. The actual interpretation function H^{n+1} is defined by:

  H^{n+1}(F) := H^{n+1}_*(F) if F ∈ NL^{n+1}, and H^{n+1}(F) := H^n(F) otherwise,   (4.15)

where H^{n+1}_* is the function defined in (4.11). At this point it is not clear whether S^{n+1} is always well-defined. We will see later that it is if there are at least n + 1 synonyms for every type f. We notice that a diachronic model (Syn^i)_{i=0,…,n} is totally determined by the underlying synchronic system ⟨M, ≺⟩ and the sequence (E^i)_{i=0,…,n} of actual utterance situations.
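The update (4.11)–(4.15) can be sketched in code. This is a hypothetical mini-implementation, not the paper's: entities are identified with the speaker's total knowledge about them, the meet with H^n(F) in (4.11) is omitted (it is redundant when all uses of F were licensed by H^n), and the forms and feature are illustrative. The test scenario mirrors the kill/cause-to-die stages described informally at the beginning of Section 4.

```python
# Sketch of one diachronic learning step: the hearer strengthens H(F) to
# the join of the types of all entities actually classified with F, for
# used forms only (cf. 4.11, 4.14, 4.15).

FEATS = ["direct"]

def leq(f, g):
    return all(g[m] == 0 or g[m] == f[m] for m in FEATS)

def join(f, g):   # f ∨ g, used for the big join in (4.11)
    return {m: (f[m] if f[m] == g[m] else 0) for m in FEATS}

t0, t1, t01 = {"direct": 1}, {"direct": -1}, {"direct": 0}

# Preference order: kill ≺ cause to die ≺ kill directly ≺ kill indirectly.
NL = [("kill", t01), ("cause to die", t01),
      ("kill directly", t0), ("kill indirectly", t1)]
H = {F: den for F, den in NL}       # H^0 = pure semantics

def S(H, f, f_prime):
    """Selection against the current interpretation H (cf. 4.12)."""
    for F, _ in NL:                 # NL is ordered by preference
        if leq(f, H[F]) and leq(H[F], f_prime):
            return F

def step(H, situations):
    """One stage; situations are (knowledge f, intention f') pairs."""
    used = {}                       # F^n, grouped by form
    for f, fp in situations:
        used.setdefault(S(H, f, fp), []).append(f)
    newH = dict(H)                  # unused forms keep their old meaning
    for F, fs in used.items():
        t = fs[0]
        for g in fs[1:]:
            t = join(t, g)          # ⋁ of the observed (total) types
        newH[F] = t
    return newH

H = step(H, [(t0, t01), (t0, t01)])  # period 0: only direct killings occur
print(H["kill"])                     # {'direct': 1}: 'kill' now means direct
print(S(H, t1, t01))                 # 'cause to die': forced for an indirect killing
H = step(H, [(t1, t01)])             # period 1: an indirect killing occurs
print(H["cause to die"])             # {'direct': -1}: strengthened to indirect
```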
4.4 Examples
In this section we show how to treat the examples presented so far. We first address the examples of Section 4.2. They are especially simple to treat and provide the basis for the other examples. Then we look at examples where the situation can be characterised by a Blutner square (2). Next we address examples (12) and (13), which proved to be problematic for van Rooy's principle. This leads us to a discussion of some problematic examples, which closes the section.

The Situation with two Basic Types

There are three forms F0, F1, F2 and two types t0, t1. Semantically Fi has to be interpreted as ti for i = 0, 1, and F2 as t0 ∨ t1. The possible intentions and epistemic states of the speaker are simultaneously given by t0, t1, t0 ∨ t1. The diachronic model will be a sequence (Syn^i)_{i=0,1} with Syn^i = ⟨E^i, S^i, H^i⟩. We consider only the case where the speaker always knows whether the entity he wants to classify is of type t0 or t1, i.e.

  ∀i : ⟨e, f, f′⟩ ∈ E^i ⇒ (f = t0 ∨ f = t1).
(4.16)
Let us further assume that in (E^i)_{i=0,1} every type is represented at least once and that there is for each type at least one situation where the speaker does not care how he classifies the entity. Hence we have a situation where:

  ∀i, j : (∃e ∈ E^i : e : tj) & (∃⟨e, tj, f′⟩ ∈ E^i : f′ = t0 ∨ t1).
(4.17)
Case I: Our example here was F0 = father, F1 = mother, F2 = one of the parents. We saw that in this case the meanings of F0 and F1 remain unchanged and F2 should never be used. It is easy to check that indeed H^1(Fi) = ti for i = 0, 1, and that F2 ∉ NL^1.
Case II: Here a strengthening of interpretation will occur. We consider the example Wasser–Flüssigkeit–alkoholische Essenz. The only possibility where the speaker selects F2 := Flüssigkeit is the case where he knows that a liquid e is a t1 := alcoholic essence and where it is not important to him whether or not the hearer is informed about this fact. Hence F2^0 ⊆ {e ∈ E^0 | e : t1} and H^1(F2) = t1 = alcoholic essence. This analysis rests upon the assumption that the speaker sometimes does not care how he classifies the liquid. There is another scenario where Flüssigkeit can develop to mean not-water. Let us give up assumption (4.16) and replace it by:

  ∀⟨e, f, f′⟩ ∈ E^0 : (e : t0 ⇒ f = t0) ∧ (e : t1 ⇒ f = t0 ∨ t1).
(4.18)
This means that the speaker can only identify water; if it is not water, then he has no idea what it is. Of course, this inability to identify a liquid is then an indicator that it is not water. It is easy to see that then again H^1(F2) = t1, hence S^1(t1, f′) = F2. If then in period E^1 he can identify a liquid as not being water, he can classify it most economically as t1 by using F2.

Case III: In this case nothing changes. It is easy to see that H^1 = H^0, S^1 = S^0 and NL^1 = NL^0. If we give up (4.17), then strengthening of interpretation may occur. Let's reconsider the example with F2 = American, F0 = North American = t0, and F1 = Latin American = t1. If we assume that the speaker cares about how he classifies a person if that person is Latin American but not if he knows that he is North American, i.e. if for all (e, f, f′) ∈ E^0:

  (f = t0 ⇒ f′ = t0 ∨ t1) ∧ (f = t1 ⇒ f′ = t1),   (4.19)

then it follows that F2 = American will develop to mean North American: H^1(F2) = t0 because F2^0 ⊆ [[t0]]^0.

Blutner–Squares

We consider first the Kill–and–Cause–to–Die example (1). We can consider it to be an example with one feature and two values. The relevant types are t0 = direct killing and t1 = indirect killing. The example belongs to Case III with F2 = kill, F0 = directly killing and F1 = indirectly killing. F3 := cause to die is dominated by F2. We must assume that F2 ≺ F3 ≺ F0 ≺ F1, i.e. kill is preferred over cause to die, which is preferred over directly killing, which is in turn preferred over indirectly killing. At the beginning of Section 4 we outlined informally how the difference in meaning can emerge diachronically. Here we show how this process can be simulated using diachronic models. The model will be a sequence (Syn^i)_{i∈{0,1,2}}.

1. In a first period Syn^0 there are only events e which represent direct killings. The speaker will always use the most preferred form F2 = kill. F0, F1 and F3 will never be used.
2. Hence, H^1(F2) = ⋁{t ∈ Type* | ∃e ∈ F2^0 : e : t} = t0.
3. Now, in the second period Syn^1 the speaker encounters an instance e′ of an indirect killing. We find that S^1(t1, t0 ∨ t1) = min{F ∈ NL | t1 ≤ H^1(F) ≤ t0 ∨ t1} = F3. He cannot select F2 because t1 ≰ H^1(F2). As F3 ∉ NL^1, we find H^1(F3) = t0 ∨ t1. This shows that the speaker has to use the less preferred form F3 = cause to die.
4. There are only two types of utterance situations in which the speaker has a reason to choose F_3: situations of the form ⟨e, t_1, t_0 ∨ t_1⟩ and ⟨e, t_0 ∨ t_1, t_0 ∨ t_1⟩. If we assume that he always knows whether a killing was direct or indirect, i.e. if we assume that (4.16) holds, then H^2(F_3) = t_1.

We elaborate on this example later. In general we can ask: given a set of forms which have the same semantic interpretation, how can they change their meaning through diachronic processes based on associative learning? The Kill–and–Cause–to–Die Example represents only one special case. In Section 5 we provide general answers to these questions in terms of diachronic laws.

The Examples violating Van Rooy's Principle All the examples in (12) and (13) are of the same form: the lighter and more general form F_2 has a tendency to take on a special meaning t_0 although there is a form F_0 which denotes t_0. We consider the following example:

(15)
a) Hans fährt seinen Wagen in die Garage. b) Hans drives his car into the garage.
Without a specific context it cannot mean that: (16)
a) Hans fährt seinen Handwagen in die Garage. b) Hans drives his hand cart into the garage.
This example belongs to Case III. Let t_0 = car, t_1 = hand cart; F_0 = Kraftfahrzeug, F_1 = Handwagen, and F_2 = Wagen. Hardly anyone owns a hand cart today, so we can assume that F_2, used as in (15), normally refers in fact to a car. The hearer will learn to interpret F_2 as classifying a car.

Limitations of the Theory It was for good reason that we chose (15) rather than other examples from (12). Let us consider e.g.:

(17)
a) Mehrere Afrikaner wurden in der S–Bahn angepöbelt. b) Some Africans were in the city train verbally abused. c) Some Black–Africans were verbally abused in the city train.
Afrikaner must be interpreted as Black–African. The example is of the same type as (15). In (15) we explained the narrowing of the meaning of F_2 by the fact that normally nobody has a hand cart, and therefore there are no utterance situations where Hans fährt seinen Wagen in die Garage can mean that he put his hand cart into the garage. But this argumentation can hardly be applied to (17). It is not because North–Africans are rare topics of conversation, but because Black–Africans are more prototypical Africans, that Afrikaner has to be interpreted as Black–African. In our model it is the frequency of entities in conversation that is responsible for changes of meaning. Hence we would have to stretch our model somewhat to cover examples like (17). We have nothing substantial to say about the relation between frequency and prototypicality. We note that there is a limitation and leave it as an open question. Another limitation is shown by the Fahrrad–Examples (5):
(18)
a) Hans hat sich ein Rad gekauft. b) Hans hat sich ein Fahrrad gekauft. c) Hans hat sich ein Zweirad gekauft. d) Hans has himself a bicycle bought.
Our theory would predict that Fahrrad and Zweirad are never used. As long as we assume that there is no type of rare bicycles, Fahrrad cannot develop a special meaning. If it is used, this means only that the speaker missed the most economical form. The case of Fahrrad and Zweirad is problematic: the use of Zweirad in (18) is marked and shows a tendency to mean that the bicycle Hans bought is not a prototypical bicycle. This is not an outright counterexample to our theory: as Fahrrad and Zweirad are equally complex, our theory cannot say which form the speaker should select. But the example shows a clear limitation.

Our model assumes that forms get stronger interpretations by associative learning. But do we interpret forms identically in all contexts? The only argument of the hearer's interpretation function is the form F itself, yet the interpretation may well depend on the wider linguistic context. The following examples show that arguments have an influence on interpretation28:

(19)
a) The monk killed the girl. b) The doctor killed the patient. c) The psychologist killed the patient. d) The pharmacist killed the patient. e) The sadist killed his victim.
What do we expect about how the subject killed the object? Is it always a direct killing that we expect? For the first example this seems correct, but for the second, third and fourth we would expect the killing to be indirect and in some way connected to the profession of the subject and the role of the victim29. In e) the expected killing may be more direct, but extended and torturous, in contrast to a). Finally, we emphasise again that all predicted strengthenings remain defeasible, i.e. they do not change semantics:

(20)
a) Wasser ist eine Flüssigkeit / Water is a liquid. b) John caused Peter to die by stabbing him right into the heart.
These sentences are not contradictory.
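The diachronic strengthening of kill and cause to die walked through at the beginning of this section can be sketched as a small simulation. The Python encoding below is a hypothetical illustration, not the paper's formal apparatus: meanings are coded as sets of basic types, the join t_0 ∨ t_1 as set union, and `select` and `learn` follow the selection strategy S^i and the learned interpretation H^{i+1} only informally.

```python
# A minimal sketch of the three-stage kill / cause-to-die simulation.
# All names and the set encoding are illustrative assumptions.

T0, T1 = "direct", "indirect"                    # basic types t0, t1
WIDE = frozenset({T0, T1})                       # the join t0 ∨ t1

# Forms in order of increasing complexity: kill ≺ cause to die ≺ ...
FORMS = ["kill", "cause to die", "directly kill", "indirectly kill"]
SEMANTICS = {"kill": WIDE, "cause to die": WIDE,
             "directly kill": frozenset({T0}),
             "indirectly kill": frozenset({T1})}

def select(H, t, upper):
    """Speaker: lightest form F with t ≤ H(F) ≤ upper (cf. S^i)."""
    for F in FORMS:                              # FORMS is ordered by preference
        if t <= H[F] <= upper:
            return F
    return None

def learn(H, uses):
    """Hearer: re-associate each used form with the join of the types it
    actually classified; unused forms keep their interpretation."""
    newH = dict(H)
    for F, types in uses.items():
        newH[F] = frozenset().union(*types)
    return newH

H = dict(SEMANTICS)                              # stage 0: purely semantic interpretation

# Stage 0: only direct killings occur; the speaker always picks 'kill'.
H = learn(H, {"kill": [frozenset({T0})]})
assert H["kill"] == frozenset({T0})              # kill has strengthened to t0

# Stage 1: an indirect killing occurs; 'kill' is blocked (t1 ≰ H(kill)),
# so the lightest usable form is 'cause to die'.
F = select(H, frozenset({T1}), WIDE)
assert F == "cause to die"
H = learn(H, {"cause to die": [frozenset({T1})]})
assert H["cause to die"] == frozenset({T1})      # a Horn situation is reached
```

The final state pairs the light form with the frequent type and the heavy form with the rare one, mirroring the H^2(F_3) = t_1 step above.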
5 Principles of Diachronic Change
We are going to explain partial blocking by diachronic laws derived from the underlying learning model. As long as we consider only isolated examples, the models of the last section may suffice to explain the observed data. But if we ask for overall regularities, then it is a great advantage to start with a complete classification of (1) choice situations and (2) laws that describe how these situations can develop diachronically. This will, for example, help us evaluate the principle of weak optimality as a diachronic principle. We will reconsider the situation with two basic types and show how the predictions of Horn's principle can be explained as the result of a diachronic process. The guiding questions of this section are: Is there a complete characterisation of all possible diachronic processes in terms of laws of diachronic change? Given a set of semantically synonymous expressions, how and when can associative learning and speaker's preferences lead to a change in interpretation? We first work out an answer for the situation with two basic types, and subsequently generalise it to arbitrary cases.

28 Thanks to Henk Zeevat and Darrin Hindsill who made me aware of this problem.
29 In 'The doctor/the psychologist/the pharmacist killed the butterfly' I would expect a much more direct killing.
5.1 Laws of Diachronic Change in the Situation with two Basic Types
We confine our considerations to the case with two basic types, i.e. we assume throughout that two basic types t_0 and t_1 are given in Type*, and that forms can only have the meanings t_0, t_1, and t_0 ∨ t_1. Hence there are only three forms F_0, F_1, F_2 from which the speaker can choose. For reasons of symmetry we can divide all possible situations into three groups:

[Diagram: Cases I–III. Rows list the forms in order of increasing complexity; bullets under the columns t_0 and t_1 mark the possible meanings, and a line between bullets indicates that the form is interpreted as t_0 ∨ t_1.]
The types in the first row denote possible meanings of forms; a line between bullets indicates that the respective form has to be interpreted as t_0 ∨ t_1. We assume again that F_2 has the widest meaning, that F_i has to be interpreted as t_i for i = 0, 1, and that the speaker prefers F_0 over F_1. Hence the three cases differ according to the relative complexity of F_2: (I) F_0 ≺ F_1 ≺ F_2, (II) F_0 ≺ F_2 ≺ F_1, (III) F_2 ≺ F_0 ≺ F_1. Classifications of concrete examples are always meant up to renaming of types and forms. If F_0 and F_1 are adjacent, then the relation between their complexities is irrelevant. We restrict our considerations further to situations where (4.16) and the second part of (4.17) hold, i.e. where (1) the speaker always knows whether the entity he wants to classify is of type t_0 or t_1, and where (2) for each type t_i there is a situation where the speaker wants to classify the object only as t_0 ∨ t_1:

⟨e, f, f′⟩ ∈ E^i ⇒ (f = t_0 ∨ f = t_1).  (5.20)

(e ∈ E^i & e : t_j) ⇒ ⟨e, t_j, t_0 ∨ t_1⟩ ∈ E^i.  (5.21)
Our discussion of examples in Section 4.4 yielded the following result: Case I and Case III situations remain stable, and Case II situations turn into Case I situations after optimisation. Besides the selection and interpretation strategies, there is only one parameter which can change: the set E^i of entities which occur in stage i. With conditions (5.20) and (5.21) there are only two ways in which E^i can influence subsequent interpretations: either type t_0 or type t_1 is not realised in E^i. If only t_0 is realised, we say that we get the new situation by {t_0}–reduction, and if only t_1 is realised, we say that we get the new situation by {t_1}–reduction. It is possible that a {t_1}–reduction follows a {t_0}–reduction: we always understand reduction relative to the full situation given by Cases I to III. {t_i}–reduction has the effect that the hearer associates t_i with the lightest form F_j. Let us consider the situation for Case III examples. Let F_3 be another form with wide meaning but more complex than F_2. Our discussion of examples where the initial situation is given by a Blutner–square (p. 23) showed that the division of pragmatic labour does emerge out of Case III examples. We will see that this is the only possibility. What effects does {t_0}–reduction have? There are only three possible types of situations: either (a) F_2 ≺ F_3 ≺ F_0 ≺ F_1, (b) F_2 ≺ F_0 ≺ F_3 ≺ F_1, or (c) F_2 ≺ F_0 ≺ F_1 ≺ F_3. For (a) and (b) the situation looks as follows (left side):
[Diagrams: situations III a (order (a)) and III b (order (b)), each shown before (left) and after (right) {t_0}–reduction; hollow bullets mark types for which the speaker never has a reason to choose the respective form.]
The hollow bullets mean that the speaker never has a reason to choose the respective form. {t_0}–reduction means that the hearer learns to associate t_0 with the least complex form F_2. The situation resulting from learning is depicted on the right side. We see that a Case II situation has emerged. For (c), {t_0}–reduction would lead to a Case I situation. For Case II there can only be two further sub–cases: either (a) F_0 ≺ F_2 ≺ F_3 ≺ F_1, or (b) F_0 ≺ F_2 ≺ F_1 ≺ F_3. For Case I there is only one: F_0 ≺ F_1 ≺ F_2 ≺ F_3. Reduction and subsequent associative learning yield the following list of laws:

Reduction Laws:
(R1) II situations turn by {t_1}–reduction into I situations where F_2 is associated with t_1 and F_3 is the lightest expression with meaning t_0 ∨ t_1.
(R2) III a) situations turn by {t_i}–reduction into II situations where F_2 is associated with t_i and F_3 is the lightest expression with meaning t_0 ∨ t_1.
(R3) III b) situations turn by {t_0}–reduction into II situations where F_2 is associated with t_0 and F_3 is the lightest expression with meaning t_0 ∨ t_1.
(R4) III b) situations turn by {t_1}–reduction into I situations where F_2 is associated with t_1 and F_3 is the lightest expression with meaning t_0 ∨ t_1.
(R5) III c) situations turn by {t_i}–reduction into I situations where F_2 is associated with t_i and F_3 is the lightest expression with meaning t_0 ∨ t_1.

The classification of the resultant state is again meant to be correct up to suitable renaming. The effect of {t_1}–reduction in Case II situations is the same as the effect of simple associative learning without reduction. Hence it is covered by the following law:
Law of Associative Learning: (A) Case II situations turn into Case I situations where F_2 is associated with t_1 and F_3 is the lightest expression with meaning t_0 ∨ t_1. In all other cases the resulting situation is the same as the original one.

For subsequent purposes it is sufficient to note which types of situations can turn into which other types. Hence we reformulate the laws as follows: (R1) II situations turn into I situations by {t_1}–reduction. (R2) III a) situations turn into II situations by {t_0}– and {t_1}–reduction. (R3) III b) situations turn into II situations by {t_0}–reduction. (R4) III b) situations turn into I situations by {t_1}–reduction. (R5) III c) situations turn into I situations by {t_0}– and {t_1}–reduction. And (A): II situations turn into Case I situations by associative learning.

Now it is not difficult to see how and when we can derive the effects of Horn's division of pragmatic labour. We need an initial situation with two co–extensive forms F_2 and F_3 which can develop into a Case I situation where F_2 is interpreted as one of t_0, t_1 and F_3 as the other. There are only two such initial situations: Case III a) and Case III b). For Case III a) the desired Case I situation emerges by a three–stage process:
[Diagram: Stages 0–2. In Stage 0 the forms F_2 and F_3 both mean t_0 ∨ t_1 (a Blutner square), alongside F_0 and F_1; after the first reduction (Stage 1) F_2 is associated with one basic type; after a second reduction or associative learning (Stage 2) F_3 is associated with the other. Hollow bullets mark types excluded from a form's actual interpretation.]
The second reduction law (R2) implies that the situation on the left turns into the situation in the middle by {t_0}–reduction. The case of {t_1}–reduction is symmetric. Then, either by the first reduction law or by the law of associative learning, the situation in the middle turns into the situation on the right. The hollow bullets in the rows for F_2 and F_3 indicate that the respective type is still part of the semantic meaning of the form but is excluded from its actual interpretation by the hearer. The case of III b) differs from III a) in that the first reduction must be a {t_0}–reduction. The first two rows of Stage 0 form a Blutner square (2). Let us call a situation like the one represented by the first two rows of Stage 2 a Horn situation. Blutner's principle of weak optimality, diachronically interpreted, predicts that the Blutner square in Stage 0 develops into a Horn situation. We can recover this principle as follows:
The Emergence of Horn Situations: A Horn situation can only develop out of III a) and III b) examples. It emerges as the result of the following two processes:

III a) → (by {t_i}–reduction) → II → (by {t_{1−i}}–reduction or learning) → Horn situation
III b) → (by {t_0}–reduction) → II → (by {t_1}–reduction or learning) → Horn situation
We can also see the result of turning a Case II example into a Case I example as a Horn situation. In this extended sense, there are three types of situations which can develop into Horn situations. For other situations, or other processes, we get counterexamples to Horn's division of pragmatic labour. It is not uninteresting to have a look at van Rooy's principle. It addresses situations of types III b) and III c). Let us assume that t_0 is the more expected state of affairs. Then van Rooy's principle predicts that the form F_2 with the wider meaning associates with t_1. But it is easy to see that by {t_0}–reduction F_2 associates with t_0. Of course, we cannot really apply van Rooy's principle here, due to the assumption of separatedness of language. We do get the effect of the principle if we assume first a {t_1}–reduction and subsequently a {t_0}–reduction: in the final stage F_2 goes together with t_1 although t_0 is more expected. But this is a two–stage process, not the one–stage process van Rooy's principle predicts. What could an example look like? Here is a hypothetical one: imagine a population that has only the words phone, mobile phone, and phone with wire to classify various types of telephones. In the first stage there are only telephones with wires (t_1). Our theory predicts that phone associates with t_1. But then the world changes: telephones with wires go out of use, and everybody uses only mobile telephones (t_0). Our theory predicts that this will not lead to a re–association of phone with t_0 but to use of the more complex mobile phone. The world can change in many ways, but meanings can only be strengthened.
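The hypothetical phone example can be simulated in the same spirit. The sketch below assumes, as in the text, that unused forms keep their interpretation (cf. (5.28)) and that interpretations can only be strengthened (cf. (5.25)); the encoding of meanings as sets of basic types and all names are illustrative.

```python
# Sequential reductions on the hypothetical phone / mobile phone /
# phone with wire example. Illustrative encoding, not the paper's formalism.

T0, T1 = "mobile", "wired"
WIDE = frozenset({T0, T1})

FORMS = ["phone", "mobile phone", "phone with wire"]   # increasing complexity
H = {"phone": WIDE,
     "mobile phone": frozenset({T0}),
     "phone with wire": frozenset({T1})}

def stage(H, realised):
    """Run one stage in which only the basic types in `realised` occur.
    Each entity is classified by the lightest form whose current
    interpretation covers its type; used forms are re-associated with
    the join of the types they actually classified."""
    uses = {}
    for t in realised:
        for F in FORMS:
            if {t} <= H[F]:
                uses.setdefault(F, set()).add(t)
                break
    newH = dict(H)
    for F, ts in uses.items():
        newH[F] = frozenset(ts)
    return newH

# {t1}-reduction: only wired phones exist; 'phone' strengthens to t1.
H = stage(H, [T1])
assert H["phone"] == frozenset({T1})

# A later {t0}-reduction: only mobiles exist. 'phone' can no longer be
# used (t0 ≰ H(phone)), so speakers fall back on 'mobile phone';
# 'phone' keeps its strengthened meaning t1 -- no re-association with t0.
H = stage(H, [T0])
assert H["phone"] == frozenset({T1})
```

The second stage makes the two-stage character of the effect explicit: the re-association van Rooy's principle would predict in one step never happens, because strengthening is irreversible in this model.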
5.2 Laws of Diachronic Change in Situations with more than two Basic Types
In general, more than two basic types are involved when speakers choose their forms and hearers interpret them. When we try to generalise the results for two basic types, we are faced with an exploding number of possible classes of situations. For n basic types there are 2^n − 1 forms to be considered. They can be ordered according to their complexity in (2^n − 1)! different ways. We can reduce this number if we again identify situations that can be derived from one another by renaming basic types; this divides the number of possible orders by n!. In the case n = 2 we got three classes. If we increase the number of basic types by just one, we get no fewer than 840 classes! Even worse, the case n = 3 is derivative of the case n = 4, because basic types are functions from features into {−1, 1}. This explosion of complexity is not a problem if we consider only isolated examples, but if we ask for overall regularities and laws describing these regularities, then we face a real problem. Fortunately, the results for two basic types can be generalised quite easily; the characterisation even simplifies. In this section we develop the necessary ideas. We restrict considerations to situations where conditions equivalent to (5.20) and
(5.21) hold. Let T be some non–empty subset of Type* and ⟨E^i, S^i, H^i⟩ a given synchronic stage. Then let Φ_1(E^i, T) denote the condition:

⟨e, f, f′⟩ ∈ E^i ⇒ f ∈ T.  (5.22)
This condition tells us that in stage i only entities of types in T are realised. Let Φ_2(E^i, T) denote the condition:

∀t ∈ T ∃e ∈ E^i e : t.  (5.23)

This condition tells us that in stage i all types in T are exemplified by at least one entity. Let Φ_3(E^i, T) denote condition (5.24), which guarantees that all possible speaker's intentions restricted by T are realised in at least one utterance situation:

∀⟨e, f, f′⟩ ∈ E^i ∀∅ ≠ X ⊆ T (f ≤ ∨X ⇒ ⟨e, f, ∨X⟩ ∈ E^i).  (5.24)
We say that E^i is of type T iff Φ_1(E^i, T) ∧ Φ_3(E^i, T). E^i is an R–reduction for R ≠ ∅ iff Φ_1(E^i, R) ∧ Φ_2(E^i, R). We assume throughout that for every stage i there are T, R^i such that stage i is of type T and an R^i–reduction with ∅ ≠ R^i ⊆ T. We want to find general characterisations of how meaning can be strengthened by associative learning. For notational convenience we introduce the following conventions: let X ⊆ Type*, f ∈ Type and Y := {t ∈ Type* | t ≤ f}; then we write f ≤ X iff Y ⊆ X, f ∩ X for Y ∩ X, and f = X iff Y = X. Let ⟨E^i, S^i, H^i⟩ be a given synchronic stage of fixed type T, and let F be a form. The following propositions, (5.25) through (5.29), provide first results on what we can say about the meaning of F in the next stage i + 1. The first proposition, (5.25), says that meanings can only be strengthened; this follows immediately from the definition of H^{i+1}(F):

∀t ∈ Type* (t ≰ H^i(F) ⇒ t ≰ H^{i+1}(F)).  (5.25)
We assume from now on that H^i(F) ≤ T. Let Ψ(F, t, i) denote the formula:

∀F′ ⪯ F (t ≤ H^i(F′) ≤ H^i(F) ⇒ F′ = F).  (5.26)
Ψ(F, t, i) tells us that there is no less complex and stronger form than F for classifying entities of type t in stage i. With (5.24) we arrive at the following characterisation of NL^{i+1}:

F ∈ NL^{i+1} iff ∃t ≤ H^i(F) ∩ R^i Ψ(F, t, i).  (5.27)
I.e. F is selected at least once iff there is a basic type t in R^i such that F is optimal for t. If F is never selected by the speaker, then its meaning cannot be strengthened:

F ∉ NL^{i+1} ⇒ H^{i+1}(F) = H^i(F).  (5.28)
The following proposition is the central condition for characterising strengthening of interpretation due to associative learning. For F ∈ NL^{i+1} and t ≤ H^i(F) ∩ R^i we find:

t ≤ H^{i+1}(F) iff Ψ(F, t, i);  (5.29)
i.e. if entities of type t can be classified using F, and if there is no stronger and less complex form F′ which can also classify entities of type t, then F can still be used in stage i + 1 for classifying entities of type t. This is again a direct consequence of conditions (5.22) through (5.24). We are interested in how forms with identical interpretations can change their meaning. In the situation with two basic types this meant that we were interested in what Blutner–squares can develop into. In the general situation this means that we have to look at the sets Synon(f, i) := {F ∈ NL | H^i(F) = f}, i.e. at sets of forms which are all interpreted by the same type f. We call Synon(f, i) a system of f–synonyms. Due to its importance we describe this situation by an explicit definition:
Definition 5.1 (f–Synonyms) Let ⟨S, D⟩ be a diachronic model where S = ⟨M, ≺⟩ is a simple synchronic system, M = ⟨E, Type, NL, [[·]], :⟩ and D = (Syn_i)_{i=0,...,n}, where each Syn_i is of the form ⟨E^i, S^i, H^i⟩. Let Synon(f, i) := {F ∈ NL | H^i(F) = f} = {F_j | j ≤ α} for f ∈ Type and some natural number α. We assume that F_0 ≺ F_1 ≺ F_2 ≺ .... Then we call (F_j)_{j≤α} a system of f–synonyms for stage Syn_i.

For two basic types and α = 2 a system of synonyms is just a Blutner–square (2). We first note an immediate consequence of (5.27): if (F_j)_{j≤α} is a system of f–synonyms for stage i, then

∀j > 0 F_j ∈ Synon(f, i + 1).  (5.30)
This means that in each stage only one form can strengthen its meaning: it is always the least complex form. It may of course be that Synon(f, i + 1) contains new forms. We note another important consequence of (5.29): if we want to know whether t ≤ H^{i+1}(F), then it is enough to consider forms F′ with t ≤ H^i(F′) ≤ H^i(F). This means: if we characterise the diachronic behaviour of forms in Synon(f, i), then it is enough to consider the sets Synon(f′, i) for f′ ≤ f. More precisely:

Lemma 5.2 Let ⟨S, D⟩ and ⟨S, D′⟩ be two diachronic models for the same simple synchronic system S. Let Syn_i = ⟨E^i, S^i, H^i⟩ and Syn′_i = ⟨E′^i, S′^i, H′^i⟩ both be of type T and R^i–reductions for the same ∅ ≠ R^i ⊆ T. Assume that f ≤ T. If Synon(f′, i) = Synon′(f′, i) for all f′ ≤ f, then ∀f′ ≤ f ∀F ∈ Synon(f, i) (F ∈ Synon(f′, i + 1) ⇔ F ∈ Synon′(f′, i + 1)).

As mentioned, there is still the possibility that Synon(f, i + 1) contains new forms. We can exclude this possibility if we assume that

∀F (f < H^i(F) ⇒ ∀t ∈ T (Ψ(F, t, i) ⇒ t ≰ f)).  (5.31)

With (5.31) it follows by (5.29) that Synon(f, i + 1) ⊆ Synon(f, i). With these preliminaries we can start to generalise the results found for two basic types. We introduce again a classification of situations which allows us to predict the diachronic development of interpretations of forms. Let ⟨S, D⟩ be a diachronic model as usual and (F_j)_{j≤α} a system of f–synonyms for stage Syn_i. We assume that (5.31) holds, and that E^i is of type T and an R^i–reduction for some ∅ ≠ R^i ⊆ T. Our observations show that Ψ is an important relation. For a given system of f–synonyms we collect all t which stand in relation Ψ to a form F_j in stage i, but such that for F_j we neglect all less complex forms F_l, l < j:

S^i(F_j) := {t ∈ T | Ψ_j(F_j, t, i)},
(5.32)
where Ψ_j is Ψ restricted to NL \ {F_l | l < j}; i.e. Ψ_j(F, t, i) holds iff for all F′ ∈ NL \ {F_l | l < j}:

F′ ⪯ F ∧ t ≤ H^i(F′) ≤ H^i(F) ⇒ F′ = F.  (5.33)

It follows that S^i(F_j) ⊆ S^i(F_{j′}) for j′ ≤ j. S^i(F_0) is the set of all basic types t in T for which F_0 is the most economical expression for classifying an entity of type t. For j > 0, S^i(F_j) is the set of all basic types t in T for which F_j would be the most economical form for classifying entities of type t if the forms F_l, l < j, did not exist. We will redefine the classes I–III introduced for classifying the situations with two basic types. We will see that the
class of the next diachronic stage i + 1 is determined by the class of stage i and by how S^i(F_0), S^i(F_1) and R^i are related to each other. The following observation tells us how to calculate the strengthened meaning of the lightest form of the f–synonyms:

H^{i+1}(F_0) = ∨(R^i ∩ S^i(F_0)) if F_0 ∈ NL^{i+1}.  (5.34)

Remember that NL^{i+1} denotes the set of forms that have actually been used in stage i. We call E^i a proper reduction for f iff {t ∈ Type* | t ≤ f} ⊈ R^i. We can distinguish three types of proper reductions:

1. E^i is an r0–reduction iff R^i ∩ S^i(F_0) = ∅;
2. E^i is an r1–reduction iff R^i ∩ S^i(F_0) ≠ ∅ & S^i(F_1) ⊆ R^i;
3. E^i is an r2–reduction iff R^i ∩ S^i(F_0) ≠ ∅ & S^i(F_1) ⊈ R^i.

E^i is an R^i–reduction, hence only entities of basic type t ∈ R^i can occur. It follows that in case of an r0–reduction there is no reason to use F_0; hence it will not be an element of NL^{i+1} and therefore cannot change its meaning. It always holds that S^i(F_1) ⊆ S^i(F_0) and S^i(F_j) ⊆ H^i(F_j), j = 0, 1.

[Diagram: the three reduction types, with the sets S^i(F_0) and S^i(F_1) drawn as dashed boxes inside T. For an r0–reduction R^i ∩ S^i(F_0) = ∅; for an r1–reduction R^i meets S^i(F_0) and contains S^i(F_1); for an r2–reduction R^i meets S^i(F_0) but does not contain S^i(F_1).]
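The strengthening step (5.34) is easy to state if meanings are coded as sets of basic types (a set representation of the kind discussed at the end of this section); the following is a minimal sketch with illustrative names.

```python
# Sketch of (5.34): the lightest f-synonym F0, if it is used at all in
# stage i, strengthens its meaning to the join of the realised basic types
# for which it is the most economical choice. With meanings as frozensets
# of basic types, the join ∨ over a set of basic types is just that set.

def strengthen_F0(H, F0, Ri, S0):
    """Return H^{i+1}(F0) = ∨(R^i ∩ S^i(F0)) if F0 ∈ NL^{i+1}, i.e. if
    R^i ∩ S^i(F0) is non-empty; otherwise F0 keeps its old meaning."""
    used_for = Ri & S0
    return frozenset(used_for) if used_for else H[F0]

H = {"F0": frozenset({"t0", "t1"})}
# An r1/r2-reduction realising only t0 strengthens F0 to t0:
assert strengthen_F0(H, "F0", frozenset({"t0"}), frozenset({"t0", "t1"})) \
    == frozenset({"t0"})
# An r0-reduction (R^i ∩ S^i(F0) = ∅) leaves F0 unchanged:
assert strengthen_F0(H, "F0", frozenset({"t1"}), frozenset()) \
    == frozenset({"t0", "t1"})
```

The non-empty intersection test doubles as the F_0 ∈ NL^{i+1} check of (5.27), since F_0 is used exactly when some realised basic type makes it optimal.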
More interesting are r1– and r2–reductions. In these cases we find that F_0 ∈ NL^{i+1}, hence it strengthens its meaning according to (5.34). With these distinctions we can formulate our central lemma characterising the effects of reduction on interpretation:

Lemma 5.3 Let ⟨S, D⟩ be a diachronic model where S = ⟨M, ≺⟩ is a simple synchronic system, M = ⟨E, Type, NL, [[·]], :⟩, and D = (Syn_i)_{i=0,...,n}. Each Syn_i is of the form ⟨E^i, S^i, H^i⟩. We assume that there is a set T ⊆ Type* such that each E^i is of type T and an R^i–reduction for some ∅ ≠ R^i ⊆ T. Let (F_j)_{j≤α} be a system of f–synonyms for stage Syn_i. Let f ≤ T be such that (5.31) holds. Let E^i be a proper reduction for f, i.e. {t ∈ Type* | t ≤ f} ⊈ R^i. Then we find that:

1. If E^i is an r0–reduction, then Synon(f, i + 1) = Synon(f, i).
2. If E^i is an r1– or r2–reduction, then Synon(f, i + 1) = Synon(f, i) \ {F_0} and H^{i+1}(F_0) = ∨(R^i ∩ S^i(F_0)).
3. If E^i is an r1–reduction, then ¬∃t ≤ f Ψ(F_1, t, i + 1); i.e. S^{i+1}(F_1) = ∅.
4. If E^i is an r2–reduction, then ∃t ≤ f Ψ(F_1, t, i + 1); i.e. ∅ ≠ S^{i+1}(F_1) ≠ f.

In the case of two basic types t_0, t_1 we classified situations according to the relation between the forms F_2, F_3 with wide meaning (t_0 ∨ t_1) and the forms F_0, F_1 with special meanings. In the general case there are too many forms to be considered. Instead of directly relating extensions of f–forms to extensions of forms with more special meanings, we classify situations according to the relation between S^i(F_0), S^i(F_1) and R^i. Let Synon(f, i) be given in a situation where the conditions of Lemma 5.3 hold. We again identify f with {t ∈ Type* | t ≤ f}. We distinguish three cases:

I: S^i(F_0) = ∅;  II: ∅ ≠ S^i(F_0) ≠ f;  III: S^i(F_0) = f.

If there are only two types involved, i.e. if f = t_0 ∨ t_1, then these classes are identical with those introduced on p. 26. The reduction laws and the law of associative learning generalise as follows:

General Reduction Laws:
(r1) II/III situations turn into I situations by r1–reduction;
(r2) II/III situations turn into II situations by r2–reduction;

Law of Associative Learning:
(A) II situations turn into I situations by associative learning.

Here the characterisation is even simpler than in the case of two basic types30. If we call an initial situation a II or III situation, then this is meant with respect to a system of f–synonyms (F_j)_{0≤j≤α}. If we classify the final situation as I or II, then it is meant with respect to (F_j)_{1≤j≤α}. In order to see that this is a generalisation of the old characterisations, we re–introduce the old sub–classes:

II a): ∅ ≠ S^i(F_1) ≠ f;  II b): S^i(F_1) = ∅;
III a): S^i(F_1) = f;  III b): ∅ ≠ S^i(F_1) ≠ f;  III c): S^i(F_1) = ∅.
If there are only two types under consideration and E^i is a proper reduction, then R^i can have only one element. r0–reductions have no effect, hence we can neglect them. In Case II there cannot be an r2–reduction; hence only r1–reductions remain, and their effect is identical with the effect of associative learning (Law 1). In Case III proper reductions may be either r1–reductions or r2–reductions. In Case III a) there cannot be an r1–reduction, even if there are more than two basic types. For two basic types there are two possible r2–reductions, which both lead to a Case II situation (Law 2). In Case III b) there can be exactly one r1–reduction (Law 4) and one r2–reduction (Law 3). For Case III c) there can only be an r1–reduction (Law 5). This comparison closes our discussion of principles of diachronic change.

We add some remarks about the representation of meaning as attribute–value functions31. The elements f ∈ Type are functions from features {m_0, m_1, m_2, ...} into the set {−1, 0, 1}. The join f ∨ f′ of two functions was defined by (f ∨ f′)(m) := f(m) if f(m) = f′(m), and (f ∨ f′)(m) := 0 otherwise. Of course, f ∨ f′ is intended to represent the information contained in f or f′, but it is much weaker: if we have a situation with two features {m_0, m_1} and attribute–value functions f_0 and f_1 with f_0(m_0) = f_1(m_1) = 1 and f_0(m_1) = f_1(m_0) = −1, and f′_0 and f′_1 with f′_0(m_i) = −f′_1(m_i) = 1, then f_0 ∨ f_1 = f′_0 ∨ f′_1. But this is not a serious problem for our theory. Instead of attribute–value functions we could have used subsets of Type* as interpretations of forms; i.e. instead of a function f we could have used {t ∈ Type* | t ≤ f}. Meets and joins can then be defined by intersection and union. In fact, for the discussion of principles of diachronic change this representation would have been more appropriate. But nothing depends on which version we use: all our constructions and proofs work for both, and for others32.

30 See p. 27.
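The general classification can also be made computational. The sketch below uses the set representation of meanings suggested in the remarks above; it computes S^i(F_j) following (5.32)/(5.33), the case labels I–III, and the reduction types r0/r1/r2 of Lemma 5.3. All concrete names are illustrative.

```python
# Computing S^i(F_j), the case classification, and the reduction type.
# Meanings are frozensets of basic types; illustrative encoding only.

def S_i(j, synonyms, NL, H, T):
    """S^i(F_j) per (5.32)/(5.33): the basic types t in T for which F_j
    would be the most economical form if the lighter synonyms
    synonyms[0..j-1] did not exist. NL lists all forms in order of
    increasing complexity; `synonyms` is its restriction to H[F] == f."""
    Fj = synonyms[j]
    dropped = set(synonyms[:j])                  # neglected lighter synonyms
    pool = [F for F in NL if F not in dropped]
    k = pool.index(Fj)
    return frozenset(
        t for t in T
        if {t} <= H[Fj]
        and all(F == Fj for F in pool[:k + 1] if {t} <= H[F] <= H[Fj]))

def case(S0, f):
    """Classify a system of f-synonyms by S^i(F_0): I, II or III."""
    if not S0:
        return "I"
    return "III" if S0 == f else "II"

def reduction_type(Ri, S0, S1):
    """r0/r1/r2 classification of a proper reduction E^i (Lemma 5.3)."""
    if not (Ri & S0):
        return "r0"
    return "r1" if S1 <= Ri else "r2"

# Example: a Case III situation with two wide forms F2 ≺ F3 (the system's
# F_0 and F_1 are F2 and F3 here, since both are interpreted as f).
NL = ["F2", "F3", "F0", "F1"]                    # increasing complexity
f = frozenset({"t0", "t1"})
H = {"F2": f, "F3": f, "F0": frozenset({"t0"}), "F1": frozenset({"t1"})}
T = {"t0", "t1"}
syn = ["F2", "F3"]
S0, S1 = S_i(0, syn, NL, H, T), S_i(1, syn, NL, H, T)
assert case(S0, f) == "III" and S1 == f          # sub-case III a)
# A {t0}-reduction of a III a) situation is an r2-reduction, which by the
# general law (r2) turns it into a II situation:
assert reduction_type(frozenset({"t0"}), S0, S1) == "r2"
```

Run on the two-type configurations of Section 5.1, this classification reproduces the sub-cases III a)–c) and the reduction laws above.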
6 Conclusions
We started out with the Bi–OT explanation of Horn's division of pragmatic labour and its application to Blutner squares (2). We ended up with a general theory which describes the diachronic development of systems of semantically co–extensive forms. It is restricted to the class of situations where all critical expressions have only one reading, as in the Kill–and–Cause–to–Die Example (1); i.e. the theory makes no claim about problems where interpreters must choose between ambiguous interpretations. We saw that alternative game theoretic explanations of Horn's division of pragmatic labour are inapplicable if we assume that language is a separating signalling system. Partial blocking has to be explained for languages that are separating and where forms have unique meanings. We follow game theoretic approaches and represent the speaker's choice of forms and the hearer's choice of interpretations by a pair of selection and interpretation strategies. A diachronic model describes how selection and interpretation strategies develop over time. We presented a theory which predicts partial blocking as the result of diachronic processes which involve associative learning and depend on the speaker's preferences over forms. We assume that we can divide these processes into separate stages. Associative learning and the speaker's preferences enter by the following two principles:

(H) If in every actual instance where the form F is used for classifying events or objects the classified event or object turns out to be at least of type f, then the hearer learns to associate F with f, i.e. he learns to interpret F as f.

(S) If in every actual instance where the form F is used for classifying events or objects the hearer turns out to interpret F as f, then the speaker learns that he can use F for expressing f.
A diachronic model consists of (1) a simple synchronic system, which is defined by the underlying semantics and fixed pragmatic constraints, and (2) a sequence of synchronic stages which contain all actual dialogue situations and the speaker's and hearer's selection and interpretation strategies. We divided all possible utterance situations into three classes. For the situation with two basic types t_0, t_1 they look as follows:

31 See Definition 4.1, p. 14.
32 I did not represent meaning by sets of basic attribute–value functions, sets of possible worlds, or functions from worlds to extensions, because I expect generalisations using typed feature structures (Carpenter, 1992) to be much more fruitful. But for the purposes of this paper I do not need to make a decision.
[Diagram: Cases I–III as introduced in Section 5.1: in each case F_0 is linked to t_0, F_1 to t_1, and F_2 to t_0 ∨ t_1; the cases differ in the relative complexity of F_2.]
The classification allowed us to provide a complete characterisation of diachronic processes in terms of diachronic laws. These laws divide into two groups: the law of associative learning and the reduction laws. Associative learning can lead to a strengthening of interpretation in Case II situations only:

Law of Associative Learning:
• (A) Case II situations turn into Case I situations by associative learning.

Reduction takes place if only a subset of all possible basic types is realised in a stage. Depending on which types are realised and how forms are ordered, we saw that Case III situations can develop into Case I or Case II situations, and Case II situations into Case I or Case II situations. The model allows us to predict some regularities:

• If a set of forms is given for one stage such that all forms are interpreted in the same way, then only the least complex form can change its meaning.
• Interpretations can only be strengthened.
• Horn and Anti–Horn situations need at least two stages to emerge. They can only develop out of Case III situations.

Finally, I want to emphasise that I see my model as a hypothetical model. For each component it is easy to think of another that could replace it and still lead to an interesting model, e.g. the representation of meaning, the specific properties of the learning model, or the set of context parameters. More data are necessary for evaluating the different possibilities. But all parts fit together into a coherent and rather new approach that offers significant advantages over previous explanations of partial blocking starting from radical pragmatics, OT and evolutionary game theory.
References

D. Beaver (2000): The Optimization of Discourse; ms., Stanford; to appear in Linguistics and Philosophy.
D. Beaver, H. Lee (2003): Input–Output Mismatches in OT; to appear in: R. Blutner, H. Zeevat (eds.): Optimality Theory and Pragmatics; Palgrave/Macmillan. Pre–paper downloadable from http://www.stanford.edu/~dib/.
A. Benz (2001): Towards a Framework for Bidirectional Optimality Theory in Dynamic Contexts; ms., Humboldt Universität Berlin. Available as ROA 4650901.
A. Benz (2003): On Coordinating Interpretations – Optimality and Rational Interaction; to appear in: P. Kühnlein, H. Rieser, H. Zeevat (eds.): Perspectives on Dialogue in the New Millennium; preliminary paper available from http://www.antonbenz.de.
R. Blutner (1998): Lexical Pragmatics; Journal of Semantics 15, pp. 115–162.
R. Blutner (2000): Some Aspects of Optimality in Natural Language Interpretation; in: H. de Hoop, H. de Swart (eds.): Papers on Optimality Theoretic Semantics; Utrecht Institute of Linguistics OTS, December 1999, pp. 1–21. Also: Journal of Semantics 17, pp. 189–216.
R. Blutner, G. Jäger (2000): Against Lexical Decomposition in Syntax; in: A.Z. Wyner (ed.): Proceedings of the Fifteenth Annual Conference, IATL 15, University of Haifa, pp. 113–137.
B. Carpenter (1992): The Logic of Typed Feature Structures; Cambridge University Press, Cambridge.
P. Dekker, R. van Rooy (2000): Bi–Directional Optimality Theory: An Application of Game Theory; Journal of Semantics 17, pp. 217–242.
L. Horn (1984): Towards a new taxonomy of pragmatic inference: Q–based and R–based implicature; in: D. Schiffrin (ed.): Meaning, Form, and Use in Context: Linguistic Applications; Georgetown University Press, Washington, pp. 11–42.
J. Groenendijk, M. Stokhof (1991): Dynamic Predicate Logic; Linguistics and Philosophy 14, pp. 39–100.
P. Hendriks, H. de Hoop (2001): Optimality Theoretic Semantics; Linguistics and Philosophy 24:1, pp. 1–32.
G. Jäger (September 2000): Some Notes on the Formal Properties of Bidirectional Optimality Theory; ms., ZAS Berlin.
D. Lewis (1969): Convention; Harvard University Press, Cambridge.
J. Mattausch (November 2000): On Optimization in Discourse Generation; master's thesis, Universiteit van Amsterdam.
J.D. McCawley (1978): Conversational Implicatures and the Lexicon; in: P. Cole (ed.): Syntax and Semantics, Vol. 9: Pragmatics; Academic Press, New York.
P. Parikh (1990): Situations, Games and Ambiguity; in: R. Cooper, K. Mukai, J. Perry (eds.): Situation Theory and its Applications I; CSLI Publications, Stanford.
P. Parikh (1991): Communication and Strategic Inference; Linguistics and Philosophy 17, pp. 473–513.
P. Parikh (2000): Communication, Meaning, and Interpretation; Linguistics and Philosophy 23, pp. 185–212.
P. Parikh (2001): The Use of Language; CSLI Publications, Stanford.
R. van Rooy (2002): Signalling Games Select Horn Strategies; ms., Universiteit van Amsterdam; to appear in Linguistics and Philosophy.