Optimal Stopping

Author(s): Herbert Robbins
Source: The American Mathematical Monthly, Vol. 77, No. 4 (Apr., 1970), pp. 333-343
Published by: Mathematical Association of America
Stable URL: http://www.jstor.org/stable/2316139


This content downloaded from 103.21.127.78 on Wed, 3 Sep 2014 08:55:34 AM All use subject to JSTOR Terms and Conditions

OPTIMAL STOPPING

HERBERT ROBBINS, Columbia University

The theory of probability began with efforts to calculate the odds in games of chance. In this context, optimal stopping problems concern the effect on a gambler's fortune of various possible systems for deciding when to stop playing a sequence of games. Such problems are of interest in statistics, where the experimenter must constantly ask whether the increase in information contained in further data will outweigh the cost of collecting it. Optimal stopping theory provides a general mathematical framework in which such problems can be precisely formulated and in some cases solved completely. The examples which we shall consider here are of a simpler nature than those arising in statistics, but will serve to illustrate some of the problems that arise in the general theory.

EXAMPLE 1. A fair coin is tossed repeatedly. After each toss we have the option of stopping or going on to the next toss, our decision at each stage being allowed to depend on the outcome thus far. We must stop after some finite (but not necessarily preassigned) number of tosses, and it is agreed that if we stop after the $n$th toss we are to receive a reward $x_n$ which is a given function of the outcomes of the first $n$ tosses. When should we stop so as to maximize our expected reward?

Let us introduce random variables $y_1, y_2, \ldots$ to represent the successive tosses, the $y_i$ being independent with the common probability distribution $P(y_i = 1) = P(y_i = -1) = 1/2$; $y_i = 1$ denoting heads on the $i$th toss and $y_i = -1$ tails. The reward sequence will then consist of a sequence of functions $x_1, x_2, \ldots$, where $x_n = f_n(y_1, \ldots, y_n)$. A stopping rule is then a random variable $t$ with values in the set $\{1, 2, 3, \ldots\}$ and such that the event $[t = n]$ depends solely on the values of $y_1, \ldots, y_n$ and not on future values $y_{n+1}, \ldots$. Using a stopping rule $t$ our reward $x_t$ will be a random variable whose expectation $Ex_t$ measures the performance on the average of the stopping rule $t$. The supremum $V = \sup\{Ex_t\}$ over the class $C$ of all possible stopping rules $t$ for which $Ex_t$ exists is called the value of the sequence $\{x_n\}$, and if a stopping rule $t$ exists such that $Ex_t = V$, $t$ is said to be optimal.
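To make these definitions concrete, here is a minimal sketch in Python. The names `expected_reward`, `heads_fraction` and `stop_at_first_head` are illustrative and do not come from the article, and the reward used is an arbitrary choice made only for demonstration. The sketch evaluates $Ex_t$ exactly for a rule that is forced to stop by a fixed horizon (an assumption that keeps the enumeration finite). The rule is only ever shown the tosses observed so far, which mirrors the requirement that the event $[t = n]$ depend on $y_1, \ldots, y_n$ alone.

```python
from fractions import Fraction


def expected_reward(reward, should_stop, horizon):
    """Exact E[x_t] for a fair coin, stopping no later than toss `horizon`.

    reward(prefix)      -> x_n for the observed tosses prefix = (y_1, ..., y_n)
    should_stop(prefix) -> True to stop now; it sees only past and present tosses
    """
    def value(prefix):
        n = len(prefix)
        if n >= 1 and (n == horizon or should_stop(prefix)):
            return Fraction(reward(prefix))
        # Otherwise toss again: heads (+1) and tails (-1) each have probability 1/2.
        return (value(prefix + (1,)) + value(prefix + (-1,))) / 2

    return value(())


def heads_fraction(prefix):
    """Illustrative reward (an assumption): fraction of heads among the tosses so far."""
    heads = sum(1 for y in prefix if y == 1)
    return Fraction(heads, len(prefix))


def stop_at_first_head(prefix):
    """Illustrative rule: stop as soon as the latest toss is a head."""
    return prefix[-1] == 1
```

With these illustrative choices, `expected_reward(heads_fraction, stop_at_first_head, 3)` evaluates to 2/3, whereas any rule that stops at a fixed toss number earns only 1/2 under the same reward.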

Prof. Robbins earned his Harvard Ph.D. under Hassler Whitney. He followed an assistantship under Marston Morse at the Institute and an NYU instructorship with a four-year career in the U.S. Navy. He then was Associate Prof. and Prof. at the Univ. of N. Carolina before assuming his present position at Columbia. He spent leaves at Berkeley, Minnesota, Purdue, Michigan, and I.A.S. He has served as President of the Institute of Math. Stat. and delivered the Rietz and Wald lectures. He was a Guggenheim Fellow in 1952-53. Since his thesis in topology, Robbins's main research has been in probability theory and mathematical statistics. His book (with Y. S. Chow and D. O. Siegmund) Great Expectations: The Theory of Optimal Stopping will be published shortly by Houghton Mifflin. His previous What is Mathematics? with Richard Courant (Oxford Univ. Press, 1941) is a landmark in mathematical exposition. Editor.


Thus far we have said nothing about the nature of the reward sequence $x_n = f_n(y_1, \ldots, y_n)$. To begin with, let us take

$$x_n = \frac{n2^n}{n+1} \prod_{i=1}^{n} \left(\frac{y_i + 1}{2}\right) \qquad (n = 1, 2, \ldots) \tag{1}$$

and analyze the resulting situation. Equation (1) is just a symbolic way of saying that if we stop after the $n$th toss with all heads we are to receive $n2^n/(n+1)$, while if any one of the first $n$ tosses has been a tail we are to receive nothing. A little reflection will show that we need consider only the class of stopping rules $\{t_k\}$, $k = 1, 2, \ldots$, where $t_k = k$; i.e., $t_k$ stops after the $k$th toss no matter what sequence of heads and tails has appeared. Clearly

$$Ex_{t_k} = \frac{1}{2^k} \cdot \frac{k2^k}{k+1} + \left(1 - \frac{1}{2^k}\right) \cdot 0 = \frac{k}{k+1},$$

and therefore $V = 1$ but no optimal stopping rule exists, since $k/(k+1) < 1$ for every $k$. We remark in passing that at any stage $n$ in which all heads have appeared, so that $x_n = n2^n/(n+1)$, the conditional expected reward for making one more toss before stopping is

$$E\left(x_{n+1} \,\middle|\, x_n = \frac{n2^n}{n+1}\right) = \frac{1}{2} \cdot \frac{(n+1)2^{n+1}}{n+2} = \frac{2^n(n+1)}{n+2} > \frac{n2^n}{n+1} = x_n.$$
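Both computations for Example 1 can be confirmed with exact rational arithmetic. The sketch below (function names are illustrative, not from the article) evaluates the expected reward of the fixed rule $t_k = k$ and the conditional expected reward of tossing once more after $n$ straight heads.

```python
from fractions import Fraction


def reward(n, all_heads):
    """Reward (1): n * 2^n / (n + 1) after n straight heads, 0 otherwise."""
    return Fraction(n * 2**n, n + 1) if all_heads else Fraction(0)


def expected_reward_fixed_k(k):
    """E[x_{t_k}] for the rule that always stops after toss k."""
    p_all_heads = Fraction(1, 2**k)
    return p_all_heads * reward(k, True) + (1 - p_all_heads) * reward(k, False)


def one_more_toss(n):
    """Conditional expected reward of stopping at n + 1, given n straight heads."""
    half = Fraction(1, 2)
    return half * reward(n + 1, True) + half * reward(n + 1, False)
```

For every $k$ tried, `expected_reward_fixed_k(k)` equals $k/(k+1)$, and `one_more_toss(n)` strictly exceeds `reward(n, True)`, in line with the inequality $(n+1)^2 > n(n+2)$.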

Hence it is always "foolish" to stop with all heads. But if we do not act "foolishly" at some point we shall wait for the first tail to occur and our final reward will always be 0. Thus acting "wisely" at each stage is the worst long-range policy.

EXAMPLE 2. The same, except that now

$$x_n = \frac{y_1 + \cdots + y_n}{n} \qquad (n = 1, 2, \ldots). \tag{2}$$

This problem is much harder than the preceding one because of the enormous family of possible stopping rules which must be considered and evaluated. A simple instance of such a rule is

$$t = \begin{cases} 1 & \text{if } y_1 = 1, \\ \text{the first integer } n \text{ such that } y_1 + \cdots + y_n = 0 & \text{otherwise.} \end{cases} \tag{3}$$

The first question we must ask is, does (3) define a legitimate stopping rule in the sense that $P(t < \infty) = 1$? In probability theory it is shown that in repeated tossings of a fair coin, for any fixed integer $k = 0, \pm 1, \pm 2, \ldots$, the probability is 1 that the difference (number of heads in first $n$ tosses) $-$ (number of tails in first $n$ tosses) will assume the value $k$ infinitely often. In our notation the case $k = 0$ corresponds to the event $y_1 + \cdots + y_n = 0$, and hence $P(t < \infty) = 1$. It remains to evaluate $Ex_t$ for (2) and (3). We have

$$Ex_t = \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot 0 = \frac{1}{2}.$$
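The value $1/2$ for rule (3) can be approached by an exact finite computation. The sketch below is an illustration under one added assumption: stopping is forced at a fixed toss number `horizon`, so that only finitely many paths need to be tracked. That truncation is not part of rule (3) itself. The second rule, `stop_when_ahead`, is an alternative chosen here for comparison and is not taken from the article.

```python
from fractions import Fraction


def truncated_value(should_stop, horizon):
    """Exact E[x_t] for x_n = s/n, where s = y_1 + ... + y_n, when the rule
    `should_stop(n, s)` is followed but stopping is forced at toss `horizon`."""
    alive = {0: Fraction(1)}  # partial sum -> probability of reaching it unstopped
    total = Fraction(0)
    for n in range(1, horizon + 1):
        reached = {}
        for s, prob in alive.items():
            for step in (1, -1):
                reached[s + step] = reached.get(s + step, Fraction(0)) + prob / 2
        alive = {}
        for s, prob in reached.items():
            if n == horizon or should_stop(n, s):
                total += prob * Fraction(s, n)  # collect the reward x_n = s/n
            else:
                alive[s] = prob
    return total


def rule_3(n, s):
    """Rule (3): stop at once on a first-toss head, otherwise when the sum returns to 0."""
    return (n == 1 and s == 1) or s == 0


def stop_when_ahead(n, s):
    """Illustrative alternative: stop the first time heads lead tails."""
    return s >= 1
```

In this sketch the truncated version of rule (3) comes out as exactly $1/2 - 1/(2N)$ for horizon $N$, which tends to $1/2$ as the horizon grows, while `truncated_value(stop_when_ahead, 60)` already exceeds $1/2$.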

It follows that $V \ge 1/2$, and of course $V \le 1$, since $x_n \le 1$ in all cases. By trial and error we can invent other stopping rules $t$ for which $Ex_t > 1/2$, but we will find none for which $Ex_t > .9$, say. However, it is not so easy to prove that $V$ [...] $v$, still less to decide whether [...]

In the present problem it has been proved [6, 11] that an optimal $t$ does exist and that $v = V$, but the exact description of $t$ and the value of $V$ are not known.

EXAMPLE 3. As in Example 1 but now with

$$x_n = \min(1, y_1 + \cdots + y_n) - \frac{n}{n+1} \qquad (n \ge 1).$$

Consider the stopping rule

$$t = \text{first } n \ge 1 \text{ such that } y_1 + \cdots + y_n = 1. \tag{4}$$

That $P(t < \infty) = 1$ follows as in Example 2, and since $n/(n+1) < 1$,

$$Ex_t = 1 - E\left(\frac{t}{t+1}\right) > 0$$

(the exact value of $Ex_t$ would require some probability theory to compute). A little thought will show that $t$ is in fact optimal for this example and hence that

$$V = Ex_t > 0.$$
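Although the exact value is not needed, the positivity of $Ex_t$ for rule (4) is easy to see numerically. Under rule (4) the reward at the stopping time is $1 - t/(t+1) = 1/(t+1)$, so any partial sum of $P(t = n)/(n+1)$ is a lower bound for $Ex_t$. The sketch below (names are illustrative) computes the first-passage probabilities $P(t = n)$ exactly by tracking the paths that have not yet reached 1.

```python
from fractions import Fraction


def first_passage_probs(n_max):
    """[P(t = 1), ..., P(t = n_max)] for t = first n with y_1 + ... + y_n = 1."""
    alive = {0: Fraction(1)}  # partial sum -> probability, for paths that never hit 1
    probs = []
    for _ in range(n_max):
        survivors = {}
        hit = Fraction(0)
        for s, prob in alive.items():
            for step in (1, -1):
                if s + step == 1:
                    hit += prob / 2  # the path reaches 1 for the first time now
                else:
                    survivors[s + step] = survivors.get(s + step, 0) + prob / 2
        probs.append(hit)
        alive = survivors
    return probs


def reward_lower_bound(n_max):
    """Partial sum of P(t = n) / (n + 1) over n <= n_max, a lower bound for E[x_t]."""
    probs = first_passage_probs(n_max)
    return sum(p * Fraction(1, n + 1) for n, p in enumerate(probs, start=1))
```

The first term alone contributes $1/4$, and for the horizons tried here the partial sums settle slightly above 0.30.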

On the other hand, since the $y_i$ are independent and identically distributed with $Ey_i = \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot (-1) = 0$, Wald's lemma (see below) shows that if, unlike (4), $t$ is any stopping rule for which $Et < \infty$, and hence in particular if $t \in C_N$ for some $N = 1, 2, \ldots$, then $E(y_1 + \cdots + y_t) = 0$, so that

$$Ex_t \le E(y_1 + \cdots + y_t) - E\left(\frac{t}{t+1}\right) \le -\frac{1}{2}.$$
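This bound can be checked by backward induction. The sketch below assumes that $C_N$ denotes the stopping rules that always stop by toss $N$ and that $v_N$ is the supremum of $Ex_t$ over $C_N$; the passage defining these symbols did not survive in this copy, so that reading is an assumption. Function names are illustrative. At each stage the better of stopping now and tossing once more is kept, starting from the forced stop at toss $N$.

```python
from fractions import Fraction


def reward(n, s):
    """Example 3 reward: min(1, s) - n / (n + 1), with s = y_1 + ... + y_n."""
    return min(1, s) - Fraction(n, n + 1)


def bounded_value(horizon):
    """Best expected reward over rules that must stop by toss `horizon`."""
    # At the horizon there is no choice: the reward is collected.
    value = {s: reward(horizon, s) for s in range(-horizon, horizon + 1, 2)}
    for n in range(horizon - 1, 0, -1):
        # Compare stopping now with the average over the two possible next tosses.
        value = {
            s: max(reward(n, s), (value[s + 1] + value[s - 1]) / 2)
            for s in range(-n, n + 1, 2)
        }
    # Before the first toss the sum is 0, and the first toss is compulsory.
    return (value[1] + value[-1]) / 2
```

Under that reading, `bounded_value(N)` returns exactly $-1/2$ for every horizon tried, so no bounded rule comes close to the positive reward earned by the unbounded rule (4).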

Hence $v_N \le -\frac{1}{2}$ for all $N \ge 1$, and hence $v = \lim_N v_N \le -\frac{1}{2} < 0 < V$. A slight variation on this example is given by

[...]
$$t = \text{first } n \ge 1 \text{ such that } y_n \ge b \quad (b \text{ any finite constant}), \tag{14}$$

and $V = Ex_t = +\infty$, while (ii) if $\int_0^\infty y\,dF(y) < \infty$ there exists a unique number $\beta$ such that

$$\int_\beta^\infty (y - \beta)\,dF(y) = c; \tag{15}$$

an optimal stopping rule is

$$t = \text{first } n \ge 1 \text{ such that } y_n \ge \beta, \tag{16}$$

and $V = Ex_t = \beta$.
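Part (ii) can be illustrated numerically for the reward $x_n = \max(y_1, \ldots, y_n) - cn$ that appears in (19). The sketch below is an illustration under assumptions not made in the article: the $y_i$ are taken to be exponential with mean 1, for which $\int_b^\infty (y - b)\,dF(y)$ equals $e^{-b}$ when $b \ge 0$ and $1 - b$ when $b < 0$, and all function names are hypothetical. It solves (15) for $\beta$ by bisection and then estimates $Ex_t$ for rule (16) by simulation.

```python
import math
import random


def excess_mean(b):
    """E[(y - b)^+] for y ~ Exponential(1), the left side of (15) at beta = b."""
    return math.exp(-b) if b >= 0 else 1.0 - b


def solve_threshold(c, lo=-100.0, hi=100.0, iterations=200):
    """Bisection for the unique beta with excess_mean(beta) = c (c > 0)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if excess_mean(mid) > c:
            lo = mid  # excess_mean is decreasing, so beta lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def average_reward(beta, c, runs, seed=0):
    """Monte Carlo estimate of E[x_t] for rule (16): stop at the first y_n >= beta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        n, best = 0, float("-inf")
        while True:
            n += 1
            y = rng.expovariate(1.0)
            best = max(best, y)
            if y >= beta:
                total += best - c * n  # reward x_n = max(y_1, ..., y_n) - c n
                break
    return total / runs
```

For example, with $c = 0.1$ the threshold is $\beta = \ln 10 \approx 2.30$, and the simulated average reward should land close to that value, up to sampling error.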

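Proof. (i) Let $p = P(y \ge b) = 1 - F(b) > 0$ and $q = 1 - p$. Then for (14)

$$P(t = n) = pq^{n-1}, \qquad P(t < \infty) = \sum_{n=1}^{\infty} pq^{n-1} = p \cdot \frac{1}{1-q} = 1,$$

$$Et = \sum_{n=1}^{\infty} npq^{n-1} = p\,\frac{d}{dq}\left(\sum_{n=0}^{\infty} q^n\right) = p\,\frac{d}{dq}\left(\frac{1}{1-q}\right) = \frac{p}{(1-q)^2} = \frac{1}{p}, \tag{17}$$

and

$$E\max(y_1, \ldots, y_t) = \frac{1}{p}\int_b^\infty y\,dF(y) = \infty.$$

Hence by (17)

$$Ex_t = \infty - \frac{c}{p} = \infty,$$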

so t is necessarilyoptimal. (ii) Consider the function +(b)

=

r0

(y-

b)dF(y),

which is equal to the area of the region in a y, z-plane bounded below by the curve z=F(y), on the left by the line y=b, and above by the line z=1. It is geometricallyevident that thereis a unique solutionof (15) forany given c >0, and fort definedby (16) we have as in (i)

Et =

-,

Ext =

-d

(p = 1- F

(y)-c)/p

Hence by (15) (18)

Ext =
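The step from the expression for $Ex_t$ to (18) uses only (15). Spelled out, with $p = 1 - F(\beta)$ read as the probability of the stopping region as in the proof above:

```latex
\begin{align*}
\int_\beta^\infty y\,dF(y)
  &= \int_\beta^\infty (y-\beta)\,dF(y) + \beta\int_\beta^\infty dF(y)
   = c + \beta p, \\
Ex_t
  &= \frac{\int_\beta^\infty y\,dF(y) - c}{p}
   = \frac{c + \beta p - c}{p}
   = \beta .
\end{align*}
```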

To prove that the $t$ defined by (16) is optimal, let $t'$ be any stopping rule such that $Ex_{t'}$ exists and is $> -\infty$. Choose any $b > \beta$, and observe that

$$x_n = \max(y_1, \ldots, y_n) - cn \le b + \sum_{i=1}^{n} \left((y_i - b)^+ - c\right), \tag{19}$$


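where we define $a^+ = \max(a, 0)$. The sequence of random variables $w_1, w_2, \ldots$, where $w_i = (y_i - b)^+ - c$, is independent and identically distributed with

$$\mu = Ew_i = \int_{-\infty}^{\infty} \left((y - b)^+ - c\right) dF(y) = \varphi(b) - c < 0,$$

since $b > \beta$ and $\beta$ satisfies (15). Now $Ex_{t'} > -\infty$ by hypothesis, so by (19)

$$E\left(\sum_{i=1}^{t'} w_i\right) \ge Ex_{t'} - b > -\infty, \tag{20}$$

and hence by Corollary 1, whether $Et'$ be finite or infinite,

$$E\left(\sum_{i=1}^{t'} w_i\right) = \mu\,Et' \tag{21}$$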