Advanced Financial Models Michael R. Tehranchi


Contents

1. Standing assumptions: complications we ignore
2. Prerequisite knowledge

Chapter 1. One-period models
1. One-period models
2. Arbitrage, martingale deflators and 1FTP
3. Numéraires and equivalent martingale measures
4. Contingent claim pricing
5. Markets with an infinite number of assets
6. Call prices from moment generating functions

Chapter 2. Discrete-time models
1. Investment and consumption
2. A motivating utility maximisation problem and arbitrage
3. Arbitrage and the first fundamental theorem
4. Elements of the proof of the harder direction of 1FTAP
5. Numéraires and equivalent martingale measures
6. Contingent claims
7. Super-replication of American claims

Chapter 3. Brownian motion and stochastic calculus
1. Brownian motion
2. Itô stochastic integration
3. Itô's formula
4. Girsanov's theorem
5. A martingale representation theorem

Chapter 4. Continuous-time models
1. The set-up
2. Admissible strategies
3. Arbitrage and local martingale deflators
4. The structure of local martingale deflators
5. Replication and super-replication
6. The Black–Scholes model and formula
7. Markovian markets and the Black–Scholes PDE
8. Black–Scholes volatility
9. Local volatility models
10. American claims in local volatility models - NOT LECTURED!

Chapter 5. Interest rate models
1. Bond prices and interest rates
2. Bank accounts to bond prices and interest rates
3. Short rate models
4. Markovian short rate models
5. The Heath–Jarrow–Morton framework

Chapter 6. Crash course on probability theory
1. Measures
2. Random variables
3. Expectations and variances
4. Special distributions
5. Conditional probability and expectation, independence
6. Probability inequalities
7. Characteristic functions
8. Fundamental probability results

Index

Financial mathematics as a subject is young (as compared to, say, number theory), but it is now mature enough that some consensus has emerged on the notation, vocabulary and important results. These notes are an attempt to present many of the main ingredients of this theory, mainly concerning the pricing and hedging of derivative securities. But before launching into the story, we begin by acknowledging some of the real-world complications that will not be discussed at length hereafter.

1. Standing assumptions: complications we ignore

Unfortunately, actual financial markets are very complicated. In order to develop a systematic financial theory, it is prudent to concentrate on the essential features of these markets and ignore the less essential complications. Therefore, the theory presented in these notes is concerned with the analysis of market models that have plenty of simplifying assumptions. That is not to say that these complications are unimportant. Indeed, there is active ongoing research attempting to remove these simplifying assumptions from the canonical theory. Below is a list of these assumptions.

1.1. Dividends. The total stock of a publicly traded firm is divided into a fixed number N of shares. The owner of each share is then entitled to the fraction 1/N of the total profit of the firm.[1] A portion of the firm's profit is usually reinvested by management, for instance by building new factories, but the rest of the profit is paid out to the shareholders. In particular, the owner of each share of stock will periodically receive a dividend payment. However, in this course, we will assume that there are no dividend payments. Actually, this assumption is not as terrible as it sounds. Example sheet 1 will show how to adapt the theory developed for assets that pay no dividends to incorporate assets that have non-zero dividend payments.

[1] Actually, things are even more complicated. For instance, stocks can be classified as either common or preferred, with implications for dividends, voting rights and claims on the firm's assets in case of bankruptcy. Also, the number N of shares outstanding is not necessarily fixed.

1.2. Tick size. Financial markets usually have a smallest increment of price, the tick. (The tick refers back to the days when prices were quoted on ticker tape.) The tick size can vary from market to market, and even between assets traded in the same market. There seems to be an industry-wide effort to harmonise tick sizes, but a quick Google search found this document http://cdn.batstrading.com/resources/participant resources/BATSEuro Ticks.pdf which highlights the complexity of the system in Europe. However, in this course, we will assume that the tick size is zero. This is a convenient assumption for those who prefer continuous mathematics to discrete. It is usually a harmless assumption, unless the prices of interest are very close to zero.

1.3. Transaction costs. Financial transactions are processed by a string of middlemen, each of whom charges a fee for their services. Usually the fee is nearly proportional to the size of the transaction. However, in this course, we will assume that there are no transaction costs. This assumption is justified by the fact that transaction costs are often very small relative to the size of typical transactions. But one must always remember that in some applications, it might not be wise to neglect these costs.

1.4. Short-selling constraints. In the real world, it is actually possible for someone to sell an asset that he does not own. The essential mechanism is to borrow a share of that asset from a broker, and then immediately to sell it to the market. This procedure is called short selling. Brokers, however, place constraints on this behaviour. Indeed, they usually require collateral and charge a fee for their service. Furthermore, if the market price of the asset increases, or if the price of the collateral decreases, the broker may ask the short seller to put up even more collateral. However, in this course, we will assume that there are no short-selling constraints. Indeed, the theory of discrete-time trading is cleaner without additional assumptions on the sizes of trades. But we will see that to overcome some technical problems in the theory of continuous-time trading, it will be natural to restrict trading to what are called admissible strategies.

1.5. Divisibility of assets. There is another real-world trading constraint of a rather technical nature. The smallest unit of stock is the share. A share cannot be further divided: it is generally impossible to buy half a share of a particular stock. However, in this course, we will assume that assets are infinitely divisible.

1.6. Bid-ask spread. Real-world trading is asymmetrical since the price to buy a share is usually higher than the price to sell it. The reason is that there are two different ways to buy or sell an asset listed on an exchange: the limit order and the market order. A limit buy order is an offer to buy a certain number of shares of the asset at a certain price. A limit sell order is defined similarly. The collection of unfilled limit orders is called the limit order book. At any time, there is a highest price for which there is an order to buy the asset. This is called the bid price. The lowest price for which there is an order to sell is called the ask price. The bid/ask spread is the difference between the two. Figure 1 illustrates the evolution of a hypothetical limit order book as various orders arrive and are filled.

A market order is an instruction to execute a transaction at the best available price. In particular, if the market order is to buy, then the lowest limit sell order is filled first. Therefore, for small market buy orders, the per-share price paid is the ask price. Similarly, if a market sell order arrives, then the highest limit buy order is filled first, and hence the per-share price received is the bid price. However, in this course, we will assume that there are no bid-ask spreads. This assumption is justified by the observation that in many markets, the spread is very small. However, in times of crisis, this assumption is not usually applicable, and hence the theory breaks down dramatically.

Figure 1. Top left. The bid price is £8 and the ask is £11. Top right. A limit sell order for three shares at £11 arrives. Bottom left. A limit buy order for two shares at £8 is cancelled. Bottom right. A market order to buy five shares arrives. Note that four shares are sold at £11 and one at £12. After the transaction, the ask price is £12.

1.7. Market depth. As described above, there are only finitely many limit orders on the book at any one time. If a large market buy order arrives, for instance, then the lowest limit sell order is filled first. But if the market order is bigger than the total number of shares available at the ask price, then the limit orders at the next-to-lowest price are filled, and so on up the book until the market order is finally filled. In this way, the ask price increases.
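The way a market buy order walks up the book can be sketched in a few lines of Python. This is only an illustration: the function name `fill_market_buy` is hypothetical, and the book below is an assumption chosen to be consistent with the caption of Figure 1 (four shares offered at £11 after the new limit order; the three shares at £12 are invented).

```python
def fill_market_buy(asks, quantity):
    """Fill a market buy order against the ask side of a limit order book.

    asks: list of (price, shares) limit sell orders.
    quantity: number of shares the market order wants to buy.
    Returns (fills, remaining_asks); fills is a list of (price, shares).
    """
    fills = []
    remaining = []
    for price, shares in sorted(asks):      # the lowest ask is filled first
        take = min(shares, quantity)
        if take > 0:
            fills.append((price, take))
            quantity -= take
        if shares > take:
            remaining.append((price, shares - take))
    if quantity > 0:
        raise ValueError("market order exceeds the total depth of the book")
    return fills, remaining


# Assumed ask side of the book (prices in pounds).
asks = [(11, 4), (12, 3)]
fills, remaining = fill_market_buy(asks, 5)
average_price = sum(price * shares for price, shares in fills) / 5
best_ask_after = remaining[0][0]
```

Here the five-share order pays more per share than the original ask price, which is exactly the price impact that the infinite-depth assumption removes.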

The market depth is the number of shares available to buy or sell at the ask or bid price respectively. Equivalently, the depth of a market is a measure of the size of a market order necessary to move quoted prices. However, in this course, we will assume that there is infinite market depth. Equivalently, we will assume that investors are small relative to the limit order book, so they are price takers, not price makers. However, the most recent financial crisis shows that this assumption does not always approximate reality: just ask the traders at Lehman Brothers!

2. Prerequisite knowledge

The emphasis of this course is on some of the mathematical aspects of financial market models. Very little is assumed of the reader's knowledge of the workings of financial markets. However, some mathematical background is needed. Our starting point is the famous observation (sometimes attributed to Niels Bohr) that it is difficult to make predictions, especially about the future. Indeed, anyone with even a passing acquaintance with finance knows that most of us cannot predict with absolute certainty how the price of an asset will fluctuate, otherwise we would be much richer! Therefore, the proper language in which to formulate the models that we will study is the language of probability theory.

An attempt is made to keep this course self-contained, but you should be familiar with the basics of the theory, including knowing the definition and key properties of the following concepts: random variable, expected value, variance, conditional probability/expectation, independence, Gaussian (normal) distribution, etc. Familiarity with measure-theoretic probability is helpful, though a crash course on probability theory is given in an appendix.

Please send all comments and corrections (including small typos and major blunders) to me at [email protected].


CHAPTER 1

One-period models

1. One-period models

We consider a market with $n$ assets. The identity of the assets is not important as long as the standing assumptions (zero dividends, zero tick size, zero transaction costs, no short-selling constraints, infinite divisibility, zero bid-ask spread, infinite market depth) are fulfilled. We usually think of the assets as being stocks and bonds, but they can also be more exotic things like pork belly futures. The models we will encounter in this course will be of the form $(P_t^1, \dots, P_t^n)_{t \in T}$, where $P_t^i$ models the price of the asset labelled $i$ at time $t \in T$. In this course, the index set $T$ will be one of the three sets
• $\{0, 1\}$ for single-period models,
• $\mathbb{Z}_+ = \{0, 1, 2, \dots\}$ when time is discrete, and
• $\mathbb{R}_+ = [0, \infty)$ when time is continuous.

As simple as it seems, many of the financial aspects of this course already appear in one-period models where $T = \{0, 1\}$. It is, therefore, appropriate to devote a significant portion of the course to this important special case.

We now describe our first model. As we shall see later, this set-up captures most of the essential features of the general discrete-time model. We think of time 0 as the present, where we have full information. Time 1 is the future, so outcomes are uncertain. We model this as follows: $P_0 \in \mathbb{R}^n$ is not random and $P_1$ is an $\mathbb{R}^n$-valued random vector.

How can we tell if a model is good? Note that the non-random vector $P_0$ is already observed, so the only thing left to model is the distribution of the random vector $P_1$. A statistical criterion would be to say a model is good if it fits data well. However, we are in a situation where we have only one realisation of $P_1$, which will happen in the future. In particular, at time 0 we have no observations of $P_1$ and hence conventional statistics is impossible! Therefore, we think about the question economically. We consider an investor in the market. Suppose his initial wealth is the non-random amount $x \in \mathbb{R}$.
He chooses a non-random portfolio $H \in \mathbb{R}^n$, where the real number $H^i$ denotes the number of shares held in asset $i$. (If $H^i > 0$ then the position is said to be long, and if $H^i < 0$ then the position is said to be short.) The investor consumes the remaining wealth $x - H \cdot P_0$, where we are using the notation
$$a \cdot b = \sum_{i=1}^n a^i b^i$$
for the usual Euclidean inner (or dot) product in $\mathbb{R}^n$. At time 1, the agent then liquidates the position, receiving the random amount $H \cdot P_1$, which is consumed. We are led to the following problem:

Problem. Given $x \in \mathbb{R}$ and a function $U : \mathbb{R}^2 \to \mathbb{R} \cup \{-\infty\}$ such that $U(\cdot, c_1)$ is strictly increasing for all $c_1 \in \mathbb{R}$ and $U(c_0, \cdot)$ is strictly increasing for all $c_0 \in \mathbb{R}$, find $H^* \in \mathbb{R}^n$ to
$$\text{maximise } \mathbb{E}[U(x - H \cdot P_0, H \cdot P_1)]. \qquad (*)$$

[Technical point. One should be careful about integrability. Recall that the expected value $\mathbb{E}(Z)$ is always defined whenever $Z \ge 0$ almost surely. But in this case, the expected value may take the value $+\infty$. On the other hand, the expected value $\mathbb{E}(Z)$ for a general real random variable $Z$ is defined only if either $\mathbb{E}(Z^+) < \infty$ or $\mathbb{E}(Z^-) < \infty$ (or both inequalities hold), in which case $\mathbb{E}(Z) = \mathbb{E}(Z^+) - \mathbb{E}(Z^-)$. This convention avoids having to define $\infty - \infty$. By the way, here we are using the standard notation $z^+ = \max\{z, 0\}$ and $z^- = \max\{-z, 0\}$ for real $z$.

To ensure that the objective function of (*) is defined for all $H$, we implicitly assume that the given initial wealth $x$, the utility function $U$, the initial price vector $P_0$, and the terminal price random vector $P_1$ have the following property:
$$\mathbb{E}[U(x - H \cdot P_0, H \cdot P_1)^+] < \infty \text{ for all } H.$$
Easy-to-check sufficient conditions which would imply this assumption are either that $P_1$ is bounded or that $U$ is bounded from above. Furthermore, we also assume implicitly that there is at least one portfolio $H$ such that
$$\mathbb{E}[U(x - H \cdot P_0, H \cdot P_1)] > -\infty,$$
since otherwise the problem is not very interesting. A sufficient condition is to assume $U(x, 0) > -\infty$, corresponding to the expected utility of the portfolio $H = 0$. These points were omitted in lecture to avoid getting too bogged down in technicalities...]

2. Arbitrage, martingale deflators and 1FTP

When does the above optimal investment problem (*) have a solution? To answer the question, we introduce a crucial definition:

Definition. An arbitrage is a portfolio $H \in \mathbb{R}^n$ such that
• $H \cdot P_0 \le 0 \le H \cdot P_1$ almost surely, and
• $\mathbb{P}(H \cdot P_0 = 0 = H \cdot P_1) < 1$.

Example. Consider a market with two assets with prices $(P^1, P^2)$ given by

    (3, 4) → (4, 7) with probability 1/2
    (3, 4) → (2, 3) with probability 1/2

(The above diagram should be read $P_0^1 = 3$, $\mathbb{P}(P_1^2 = 7) = 1/2$, etc.) Consider the portfolio $H = (-4, 3)$. It costs $H \cdot P_0 = 0$ to buy at time 0, but the time-1 wealth $H \cdot P_1$ is strictly positive in all states of the world:

    0 → 5 with probability 1/2
    0 → 1 with probability 1/2
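For a model with finitely many states, the two conditions in the definition of arbitrage can be checked mechanically. The sketch below does this for the example above; the helper names `dot` and `is_arbitrage` are hypothetical, and the code assumes the model is given as a list of (probability, price vector) pairs.

```python
def dot(a, b):
    """Euclidean inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def is_arbitrage(H, P0, outcomes):
    """Check the definition of arbitrage in a finite-state one-period model.

    outcomes: list of (probability, P1) pairs; states of probability zero
    are ignored, since the conditions only need to hold almost surely.
    """
    cost = dot(H, P0)
    payoffs = [dot(H, P1) for prob, P1 in outcomes if prob > 0]
    # First condition: H.P0 <= 0 <= H.P1 almost surely.
    if cost > 0 or any(v < 0 for v in payoffs):
        return False
    # Second condition: it is not the case that H.P0 = 0 = H.P1 almost surely.
    return cost < 0 or any(v > 0 for v in payoffs)


P0 = (3, 4)
outcomes = [(0.5, (4, 7)), (0.5, (2, 3))]
H = (-4, 3)
H_is_arbitrage = is_arbitrage(H, P0, outcomes)
```

The same function can be applied to any other candidate portfolio in this market.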

At this point, you should check that the portfolio $H = (-3, 2)$ is also an arbitrage in the example above.

Proposition. If the market has an arbitrage, then the optimal investment problem (*) has no solution. (Equivalently, if the problem has a solution then the market has no arbitrage.)

Proof. Let $H$ be an arbitrage. Pick a portfolio $K \in \mathbb{R}^n$. Note that since $U(\cdot, \cdot)$ is strictly increasing in both arguments we have
$$U(x - K \cdot P_0, K \cdot P_1) \le U(x - (K + H) \cdot P_0, (K + H) \cdot P_1) \text{ almost surely}$$
and
$$\mathbb{P}\big( U(x - K \cdot P_0, K \cdot P_1) < U(x - (K + H) \cdot P_0, (K + H) \cdot P_1) \big) > 0.$$
Hence, we have
$$\mathbb{E}[U(x - K \cdot P_0, K \cdot P_1)] < \mathbb{E}[U(x - K \cdot P_0 - H \cdot P_0, K \cdot P_1 + H \cdot P_1)]$$
and the portfolio $K + H$ is preferred to the original portfolio $K$. Hence $K$ is not the maximiser. Since this holds for all $K$, no maximiser can exist.

Markets with an arbitrage opportunity would be nice: we would all be a lot richer. But for the sake of building realistic models, we usually assume that markets are free of arbitrages. Indeed, the existence of arbitrage does not make much economic sense. In particular, note that if $H$ is an arbitrage, so are the portfolios $2H, 3H, \dots$. That is, if we spot an arbitrage in the market, we could scale it to larger and larger proportions and consume more and more. Eventually, the assumption that we are price takers (meaning we are so small relative to the market that we can trade with no price impact) becomes unrealistic.

Aside. There is a further reason why we traditionally look at markets with no arbitrage. To explain, we take a moment to ask: where do prices come from? The terminal price $P_1$ is unknown at time 0 but is revealed at time 1, so we model it as random. What determines the randomness? One could argue that all that matters is the beliefs of the market participants, not the underlying mechanism that causes the apparent randomness. So, we assume that there are $J$ investors, and each investor $j$ has a probability measure $\mathbb{P}_j$, where $j = 1, \dots, J$, modelling the distribution of $P_1$. We also assume that each agent $j$ comes to the market with initial capital $x_j$. The market already has the $n$ assets, with total supply of asset $i$ given by $S^i$ and $S = (S^1, \dots, S^n)$. The agents trade with each other until each arrives at an optimal allocation $H_j^*$, and collectively they determine an initial price $P_0^*$. To formalise this, we have the following definition:

Definition. Given initial wealths $x_j$, utility functions $U_j$ and probability measures $\mathbb{P}_j$, for $j = 1, \dots, J$, an equilibrium is a collection of portfolios $H_j^* \in \mathbb{R}^n$ and an initial price $P_0^* \in \mathbb{R}^n$ such that for all $j = 1, \dots, J$ we have
$$\mathbb{E}_j[U_j(x_j - H_j^* \cdot P_0^*, H_j^* \cdot P_1)] \ge \mathbb{E}_j[U_j(x_j - H \cdot P_0^*, H \cdot P_1)] \text{ for all } H \in \mathbb{R}^n$$
and
$$\sum_{j=1}^J H_j^* = S,$$
where the notation $\mathbb{E}_j$ denotes expectation with respect to $\mathbb{P}_j$.

Note that the above condition says that for the equilibrium initial price $P_0^*$, the agents' portfolios $H_j^*$ solve their version of the optimal investment problem (*). A consequence of the previous proposition is the following motivating result:

Proposition. If the market is in equilibrium then no agent can believe there is an arbitrage.

We now return to our market model $P_0, P_1$ and explore heuristically the implication of having a maximiser to an optimal investment problem. Consider the function
$$F(H) = \mathbb{E}[U(x - H \cdot P_0, H \cdot P_1)],$$
and suppose $H^* \in \mathbb{R}^n$ maximises $F$. Assuming that the ingredients $x$, $U$ and $P_1$ are regular enough that $F$ is smooth, the first order condition for a maximum reads
$$\nabla F(H^*) = 0 \;\Rightarrow\; \mathbb{E}\left[\frac{\partial U}{\partial c_0}(c_0^*, c_1^*)\right] P_0 = \mathbb{E}\left[\frac{\partial U}{\partial c_1}(c_0^*, c_1^*)\, P_1\right],$$
where $c_0^* = x - H^* \cdot P_0$ and $c_1^* = H^* \cdot P_1$. We will use this intuition to motivate the following definition:

Definition. A martingale deflator (also called a state price density, pricing kernel or stochastic discount factor) for a market model $P_0, P_1$ consists of a non-random $Y_0 > 0$ and a random variable $Y_1 > 0$ almost surely such that $\mathbb{E}(Y_1 |P_1^i|) < \infty$ and
$$\mathbb{E}(Y_1 P_1^i) = Y_0 P_0^i \text{ for all } i = 1, \dots, n.$$

The origin of the terminology martingale deflator is that the process $(Y_t P_t)_{t \in \{0,1\}}$ is a martingale, in the most trivial sense of the word. However, we will use this observation to define martingale deflators in both discrete and continuous time.

Also, note that if $H^*$ is the maximiser of the optimal investment problem (*) then we can construct a martingale deflator as follows:
$$Y_0 = \mathbb{E}\left[\frac{\partial U}{\partial c_0}(c_0^*, c_1^*)\right] \quad \text{and} \quad Y_1 = \frac{\partial U}{\partial c_1}(c_0^*, c_1^*).$$
Therefore, the economic interpretation of a martingale deflator is the marginal utility of consumption of an optimally invested agent.

Now we come to the first theorem of the course, and one of the most important theorems in financial mathematics. It is no surprise that it is often called the first fundamental theorem of asset pricing.
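The construction of a martingale deflator from marginal utilities can be checked numerically in a toy model. The sketch below is an illustration under stated assumptions, none of which come from the notes: a single asset with $P_0 = 1$ and $P_1 \in \{2, 1/2\}$ with equal probability, initial wealth $x = 1$, and the utility $U(c_0, c_1) = -e^{-c_0} - e^{-c_1}$. Since $F$ is strictly concave in this case, the maximiser is found by bisection on $F'$.

```python
import math

# Hypothetical one-asset model (assumptions for illustration only).
x = 1.0                                # initial wealth
P0 = 1.0                               # time-0 price
outcomes = [(0.5, 2.0), (0.5, 0.5)]    # (probability, time-1 price)


def F_prime(H):
    """Derivative of F(H) = E[U(x - H*P0, H*P1)], U(c0, c1) = -exp(-c0) - exp(-c1)."""
    return -P0 * math.exp(-(x - H * P0)) + sum(
        p * P1 * math.exp(-H * P1) for p, P1 in outcomes
    )


# F is strictly concave, so F_prime is decreasing: bisect for its unique root.
lo, hi = -20.0, 20.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if F_prime(mid) > 0:
        lo = mid
    else:
        hi = mid
H_star = 0.5 * (lo + hi)

# Marginal utilities at the optimum give the candidate martingale deflator.
c0_star = x - H_star * P0
Y0 = math.exp(-c0_star)                                   # dU/dc0, non-random here
Y1 = [math.exp(-H_star * P1) for _, P1 in outcomes]       # dU/dc1 in each state

lhs = sum(p * y * P1 for (p, P1), y in zip(outcomes, Y1))  # E[Y1 * P1]
rhs = Y0 * P0                                              # Y0 * P0
```

Up to numerical error, `lhs` and `rhs` agree, which is just the first order condition rewritten as the defining property of a martingale deflator.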

Theorem (First fundamental theorem of asset pricing). A market model has no arbitrage if and only if there exists a martingale deflator.

Proof of the easy direction. First we prove that if there exists a martingale deflator, then there is no arbitrage. Letting $X_t = H \cdot P_t$ for $t = 0, 1$, we have by the definition of martingale deflator and the linearity of expectations
$$\mathbb{E}(Y_1 X_1) = H \cdot \mathbb{E}(Y_1 P_1) = H \cdot (Y_0 P_0) = Y_0 X_0.$$
Now suppose $X_0 \le 0$ and $X_1 \ge 0$ almost surely. Since $Y_0 \ge 0$ and $Y_1 \ge 0$ almost surely, we have
$$0 \ge Y_0 X_0 = \mathbb{E}(Y_1 X_1) \ge 0$$
so that $Y_0 X_0 = 0 = \mathbb{E}(Y_1 X_1)$. Since $Y_0 \ne 0$, we see that $X_0 = 0$. Also, we see that $Y_1 X_1 = 0$ almost surely by the pigeonhole principle. (Recall the pigeonhole principle: if $Z \ge 0$ a.s. and $\mathbb{E}(Z) = 0$ then $Z = 0$ a.s.) Again, since $Y_1 \ne 0$ almost surely, we conclude that $X_1 = 0$ almost surely. In particular, the portfolio $H$ is not an arbitrage.

Proof of the hard direction. We now suppose that the market has no arbitrage, so that for any vector $H \in \mathbb{R}^n$ with the property that $H \cdot P_0 \le 0 \le H \cdot P_1$ almost surely, it must be the case that $H \cdot P_0 = 0 = H \cdot P_1$ almost surely. We will show that, given any positive random variable $Z$, there exists a martingale deflator $Y_0, Y_1$ such that $Y_1 \le Z$ almost surely. This extra boundedness property is much stronger than what we need, but it comes for free from the proof and we will find it useful later in the course.

Note that it is sufficient to find a martingale deflator $(Y_t)_{t \in \{0,1\}}$ with the property that $Y_1 \le CZ$ for some constant $C > 0$, since $Y'_t = Y_t / C$ is also a martingale deflator and $Y'_1 \le Z$.

Let $\zeta = e^{-\|P_1\|^2/2 - 1/Z}$. Define a function $F : \mathbb{R}^n \to \mathbb{R}$ by
$$F(H) = e^{H \cdot P_0} + \mathbb{E}[e^{-H \cdot P_1} \zeta].$$
The positive random variable $\zeta$ is introduced to ensure integrability. Indeed, note that the integrand satisfies $e^{-H \cdot P_1} \zeta \le e^{\|H\|^2/2}$ and so is bounded for each choice of $H$. In particular, the function $F$ is finite everywhere and (by the dominated convergence theorem) smooth.

We will show that no investment-consumption arbitrage implies that the function $F$ has a minimiser $H^*$. By the first order condition for a minimum, we have
$$0 = \nabla F(H^*) = e^{H^* \cdot P_0} P_0 - \mathbb{E}[e^{-H^* \cdot P_1} \zeta P_1]$$
and hence we may take
$$Y_0 = e^{H^* \cdot P_0} \quad \text{and} \quad Y_1 = e^{-H^* \cdot P_1} \zeta.$$
Note that $Y_1 \le CZ$ for some constant $C > 0$ (which depends on $H^*$ in general).

So let $(H_k)_k$ be a sequence such that $F(H_k) \to \inf_H F(H)$. If $(H_k)_k$ is bounded, we can pass to a convergent subsequence, by the Bolzano–Weierstrass theorem, such that $H_k \to H^*$. By the continuity of $F$ we have
$$\inf_H F(H) = \lim_k F(H_k) = F(\lim_k H_k) = F(H^*)$$
so $H^*$ is our desired minimiser.
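The minimisation used in the hard direction can be carried out numerically in a small arbitrage-free example. The sketch below is an assumption-laden illustration, not part of the notes: one asset with $P_0 = 3$ and $P_1 \in \{4, 2\}$ with equal probability, and the constant bound $Z = 1$. Because $F$ is convex in one variable, the root of $F'$ is found by bisection, and the resulting pair $(Y_0, Y_1)$ is then checked against the definition of a martingale deflator.

```python
import math

# Hypothetical arbitrage-free model with one asset (assumptions for illustration).
P0 = 3.0
outcomes = [(0.5, 4.0), (0.5, 2.0)]   # (probability, time-1 price)
Z = 1.0                                # the given positive bound, constant here


def zeta(P1):
    """zeta = exp(-|P1|^2 / 2 - 1 / Z), the integrability factor from the proof."""
    return math.exp(-P1 ** 2 / 2 - 1 / Z)


def grad_F(H):
    """Derivative of F(H) = exp(H*P0) + E[exp(-H*P1) * zeta]."""
    return P0 * math.exp(H * P0) - sum(
        p * P1 * math.exp(-H * P1) * zeta(P1) for p, P1 in outcomes
    )


# F is convex, so grad_F is increasing: bisect for its root.
lo, hi = -10.0, 10.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if grad_F(mid) < 0:
        lo = mid
    else:
        hi = mid
H_star = 0.5 * (lo + hi)

Y0 = math.exp(H_star * P0)
Y1 = [math.exp(-H_star * P1) * zeta(P1) for _, P1 in outcomes]
C = math.exp(H_star ** 2 / 2)          # constant with Y1 <= C * Z

expected_Y1_P1 = sum(p * y * P1 for (p, P1), y in zip(outcomes, Y1))
```

In this example `expected_Y1_P1` matches `Y0 * P0` up to numerical error, and each entry of `Y1` is bounded by `C * Z`, in line with the boundedness claim of the proof.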

It remains to show that no arbitrage implies that the sequence $(H_k)_k$ is bounded. So for the sake of finding a contradiction, suppose $(H_k)_k$ is unbounded.

Now we arrive at a little technicality. Let
$$\mathcal{U} = \{u \in \mathbb{R}^n : u \cdot P_0 = 0 = u \cdot P_1 \text{ a.s.}\} \subseteq \mathbb{R}^n$$
and let $\mathcal{V} = \mathcal{U}^\perp$. Notice that if $u \in \mathcal{U}$ and $v \in \mathcal{V}$ then $F(u + v) = F(v)$. Hence, we may assume $H_k \in \mathcal{V}$ for all $k$.

We can pass to a subsequence such that $\|H_k\| \uparrow \infty$. Now let
$$\hat{H}_k = \frac{H_k}{\|H_k\|}.$$
Note that $\|\hat{H}_k\| = 1$ and that $\hat{H}_k \in \mathcal{V}$. Since $(\hat{H}_k)_k$ is bounded, we can again pass to a convergent subsequence such that $\hat{H}_k \to \hat{H}$. Notice once more that $\|\hat{H}\| = 1$ and that $\hat{H} \in \mathcal{V}$.

We know that the sequence $F(H_k)$ is bounded (since it is convergent) but we also have
$$F(H_k) = (e^{\hat{H}_k \cdot P_0})^{\|H_k\|} + \mathbb{E}[(e^{-\hat{H}_k \cdot P_1})^{\|H_k\|} \zeta]$$
so we must conclude that $\hat{H} \cdot P_0 \le 0 \le \hat{H} \cdot P_1$ a.s. (since otherwise the right-hand side would blow up). By the assumption of no arbitrage we conclude that $\hat{H} \cdot P_0 = 0 = \hat{H} \cdot P_1$ a.s., which means $\hat{H} \in \mathcal{U}$. But we also know that $\hat{H} \in \mathcal{V}$. Since the subspaces are orthogonal, we have $\mathcal{U} \cap \mathcal{V} = \{0\}$, and in particular, we have $\hat{H} = 0$. But this contradicts the fact that $\|\hat{H}\| = 1$.

3. Numéraires and equivalent martingale measures

In this section, we introduce the concepts of numéraire portfolios and equivalent martingale measures. The primary purpose of this section is to reconcile concepts and terminology used by other authors with the theory developed so far. We will also find that equivalent martingale measures can be used to simplify some calculations later in the course.

In most discussions of arbitrage theory, there is the assumption that at least one asset is a numéraire:

Definition. An asset is a numéraire iff its price is strictly positive for all time, almost surely. More generally, a portfolio $\eta \in \mathbb{R}^n$ is a numéraire portfolio if $\eta \cdot P_0 > 0$ and $\eta \cdot P_1 > 0$ almost surely.

Having a numéraire in the market simplifies the story in some ways. For instance, when we discuss arbitrage theory, we no longer have to allow for intermediate consumption.

Definition. A terminal consumption arbitrage is a portfolio $K$ such that
• $K \cdot P_0 = 0 \le K \cdot P_1$ almost surely, and
• $\mathbb{P}(K \cdot P_1 > 0) > 0$.

Proposition. Suppose the market model has a numéraire portfolio. There exists a terminal consumption arbitrage if and only if there exists an arbitrage.

Proof. Note that a terminal consumption arbitrage is already an arbitrage. So suppose $H$ is an arbitrage. That is, $H \cdot P_0 \le 0 \le H \cdot P_1$ almost surely and $\mathbb{P}(H \cdot P_0 = 0 = H \cdot P_1) < 1$. To find the terminal consumption arbitrage, the idea is to let $K$ be the strategy that consists of holding the portfolio $H$ but, instead of consuming $-H \cdot P_0$ at time 0, investing this money in the numéraire portfolio. In notation, let
$$K = H - \frac{H \cdot P_0}{\eta \cdot P_0}\, \eta.$$
Note that $K \cdot P_0 = 0$ by construction. Also we have
$$K \cdot P_1 = H \cdot P_1 - \frac{\eta \cdot P_1}{\eta \cdot P_0}\, H \cdot P_0.$$
Note that $K \cdot P_1 \ge 0$ almost surely. Also, since $\frac{\eta \cdot P_1}{\eta \cdot P_0} > 0$ almost surely, we have
$$\mathbb{P}(K \cdot P_1 > 0) = 1 - \mathbb{P}(H \cdot P_0 = 0 = H \cdot P_1) > 0.$$
Hence $K$ is a terminal consumption arbitrage.
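The construction in this proof can be traced through with exact arithmetic. The sketch below reuses the two-asset example from Section 2 with the arbitrage $H = (-3, 2)$, and takes one share of asset 1 as the numéraire portfolio; that choice of $\eta$ is an assumption made here for illustration (it qualifies because the price of asset 1 is strictly positive in every state).

```python
from fractions import Fraction


def dot(a, b):
    """Euclidean inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


P0 = (Fraction(3), Fraction(4))
states = [(Fraction(4), Fraction(7)), (Fraction(2), Fraction(3))]  # possible values of P1

H = (Fraction(-3), Fraction(2))      # an arbitrage with H.P0 = -1 < 0
eta = (Fraction(1), Fraction(0))     # assumed numeraire portfolio: one share of asset 1

# K = H - (H.P0 / eta.P0) * eta: the amount -H.P0 is invested in the numeraire
# rather than consumed at time 0.
ratio = dot(H, P0) / dot(eta, P0)
K = tuple(h - ratio * e for h, e in zip(H, eta))

K_cost = dot(K, P0)
K_payoffs = [dot(K, P1) for P1 in states]
```

Here `K` has zero initial cost and a strictly positive payoff in both states, so it is a terminal consumption arbitrage.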



Our goal is to define an equivalent martingale measure. We begin with another definition:

Definition. Let $(\Omega, \mathcal{F})$ be a measurable space and let $\mathbb{P}$ and $\mathbb{Q}$ be two probability measures on $(\Omega, \mathcal{F})$. The measures $\mathbb{P}$ and $\mathbb{Q}$ are equivalent, written $\mathbb{P} \sim \mathbb{Q}$, iff
$$\mathbb{P}(A) = 1 \Leftrightarrow \mathbb{Q}(A) = 1,$$
or equivalently, iff
$$\mathbb{P}(A) = 0 \Leftrightarrow \mathbb{Q}(A) = 0.$$

The above definition says that equivalent probability measures have the same almost sure events. Complementarily, equivalent probability measures have the same null sets.

It turns out that equivalent measures can be characterised by the following theorem. When there is more than one probability measure floating around, we use the notation $\mathbb{E}^{\mathbb{P}}$ to denote expected value with respect to $\mathbb{P}$, etc.

Theorem (Radon–Nikodym theorem). The probability measure $\mathbb{Q}$ is equivalent to the probability measure $\mathbb{P}$ if and only if there exists a $\mathbb{P}$-a.s. and $\mathbb{Q}$-a.s. positive random variable $Z$ such that
$$\mathbb{Q}(A) = \mathbb{E}^{\mathbb{P}}(Z 1_A)$$
for each $A \in \mathcal{F}$.

The random variable $Z$ is called the density, or the Radon–Nikodym derivative, of $\mathbb{Q}$ with respect to $\mathbb{P}$, and is often denoted
$$Z = \frac{d\mathbb{Q}}{d\mathbb{P}}.$$
In fact, $\mathbb{P}$ also has a density with respect to $\mathbb{Q}$ given by
$$\frac{d\mathbb{P}}{d\mathbb{Q}} = \frac{1}{Z}.$$
For this course we only need the easy direction of the theorem, that the existence of a positive density implies equivalence. Here is a proof. The proof of the harder direction is omitted.

Proof. Suppose $\mathbb{P}(Z > 0) = 1$ and that $\mathbb{E}^{\mathbb{P}}(Z) = 1$. Define a set function $\mathbb{Q}$ by $\mathbb{Q}(A) = \mathbb{E}^{\mathbb{P}}(Z 1_A)$. Note that $\mathbb{Q}$ is countably additive by the monotone convergence theorem. Also, $\mathbb{Q}(\Omega) = \mathbb{E}^{\mathbb{P}}(Z) = 1$, so $\mathbb{Q}$ is a probability measure.

If $\mathbb{P}(A) = 0$, then the event $\{1_A = 0\}$ is $\mathbb{P}$-almost sure and hence $\mathbb{Q}(A) = \mathbb{E}^{\mathbb{P}}(Z 1_A) = 0$. Conversely, if $\mathbb{Q}(A) = 0$ we can conclude that $\{Z 1_A = 0\}$ is $\mathbb{P}$-a.s. by the pigeonhole principle, since $\{Z 1_A \ge 0\}$ is $\mathbb{P}$-a.s. But since $\{Z > 0\}$ is $\mathbb{P}$-a.s., we must conclude that $\{1_A = 0\}$ is $\mathbb{P}$-a.s., i.e. $\mathbb{P}(A) = 0$. Thus $\mathbb{Q}$ and $\mathbb{P}$ are equivalent.

Example. Consider the sample space $\Omega = \{1, 2, 3\}$ with the set $\mathcal{F}$ of events being all subsets of $\Omega$. Consider probability measures $\mathbb{P}$ and $\mathbb{Q}$ defined by
• $\mathbb{P}\{1\} = \frac{1}{2}$, $\mathbb{P}\{2\} = \frac{1}{2}$, and $\mathbb{P}\{3\} = 0$,
• $\mathbb{Q}\{1\} = \frac{1}{1000}$, $\mathbb{Q}\{2\} = \frac{999}{1000}$, and $\mathbb{Q}\{3\} = 0$.
Then $\mathbb{P}$ and $\mathbb{Q}$ are equivalent. We may take their density $Z = \frac{d\mathbb{Q}}{d\mathbb{P}}$ to be
$$Z(1) = \frac{1}{500}, \quad Z(2) = \frac{999}{500}, \quad Z(3) = 0.$$
(Since both measures don't 'see' the event $\{3\}$, we can let $Z(3)$ be any value.)

Definition. Let $(P_t)_{t \in \{0,1\}}$ be a market model defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The measure $\mathbb{P}$ is called the objective (or historical or statistical) measure for the model. Suppose the market has a numéraire portfolio $\eta$. Let $N_t = \eta \cdot P_t$ for $t = 0, 1$. An equivalent martingale measure relative to the numéraire is any probability measure $\mathbb{Q}$ equivalent to $\mathbb{P}$ such that $\mathbb{E}^{\mathbb{Q}}(|P_1^i| / N_1) < \infty$ and
$$\mathbb{E}^{\mathbb{Q}}\left(\frac{P_1^i}{N_1}\right) = \frac{P_0^i}{N_0}$$
for all $i \in \{1, \dots, n\}$.

Remark. The idea is that a numéraire can be used to count money. Hence, we can speak in terms of prices relative to (or discounted by) the numéraire. As a preview of what's to come, the term equivalent martingale measure is appropriate since the discounted price processes $(P_t^i / N_t)_{t \in \{0,1\}}$ are martingales for $\mathbb{Q}$ with respect to the filtration $(\mathcal{F}_t)_{t \in \{0,1\}}$ where $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $\mathcal{F}_1 = \mathcal{F}$. We will elaborate on this in the multi-period case.
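For the three-point example of equivalent measures above, the density can be computed and the identity $\mathbb{Q}(A) = \mathbb{E}^{\mathbb{P}}(Z 1_A)$ verified over every event. The following is a small sketch using exact fractions; the dictionary representation of the measures and the helper `Q_from_density` are conventions chosen here, not notation from the notes.

```python
from fractions import Fraction
from itertools import combinations

omega = [1, 2, 3]
P = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(0)}
Q = {1: Fraction(1, 1000), 2: Fraction(999, 1000), 3: Fraction(0)}

# Density Z = dQ/dP on the points that P charges; on P-null points the value
# is arbitrary, and 0 is used here as in the example.
Z = {w: (Q[w] / P[w] if P[w] > 0 else Fraction(0)) for w in omega}


def Q_from_density(A):
    """Compute E^P[Z * 1_A] for an event A given as a collection of sample points."""
    return sum(Z[w] * P[w] for w in A)


# Every event A, i.e. every subset of the sample space.
events = [set(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]
density_reproduces_Q = all(Q_from_density(A) == sum(Q[w] for w in A) for A in events)
same_null_sets = all(
    (sum(P[w] for w in A) == 0) == (sum(Q[w] for w in A) == 0) for A in events
)
```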
When the market has a numéraire, the notion of a martingale deflator and that of an equivalent martingale measure are essentially the same.

Proposition. Consider a market with numéraire portfolio $\eta$, and let $N_t = \eta \cdot P_t$ for $t = 0, 1$. If $(Y_t)_{t \in \{0,1\}}$ is a martingale deflator, then the measure $\mathbb{Q}$ with density
$$\frac{d\mathbb{Q}}{d\mathbb{P}} = \frac{Y_1 N_1}{Y_0 N_0}$$
is an equivalent martingale measure relative to the numéraire. Conversely, if $\mathbb{Q}$ is an equivalent martingale measure then
$$Y_0 = 1 \quad \text{and} \quad Y_1 = \frac{N_0}{N_1} \frac{d\mathbb{Q}}{d\mathbb{P}}$$
defines a martingale deflator.

Proof. This is a matter of chasing the definitions.
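The correspondence between martingale deflators and equivalent martingale measures can be traced through in a two-state example. Everything in the sketch below is an assumption made for illustration: a bank account with price 4 at time 0 and 5 at time 1, a stock with price 3 at time 0 and 6 or 3 at time 1, equal objective probabilities, the bank account as numéraire, and the deflator $Y_0 = 2$, $Y_1 = (4/5, 12/5)$.

```python
from fractions import Fraction as Fr

probs = [Fr(1, 2), Fr(1, 2)]                # objective measure P on two states
P0 = (Fr(4), Fr(3))                         # (bank account, stock) at time 0
P1 = [(Fr(5), Fr(6)), (Fr(5), Fr(3))]       # time-1 prices in each state
eta = (Fr(1), Fr(0))                        # numeraire portfolio: the bank account


def dot(a, b):
    """Euclidean inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


# An assumed martingale deflator: E[Y1 * P1^i] = Y0 * P0^i for each asset i.
Y0 = Fr(2)
Y1 = [Fr(4, 5), Fr(12, 5)]
is_deflator = all(
    sum(p * y * prices[i] for p, y, prices in zip(probs, Y1, P1)) == Y0 * P0[i]
    for i in range(2)
)

# Deflator -> measure: dQ/dP = Y1 * N1 / (Y0 * N0).
N0 = dot(eta, P0)
N1 = [dot(eta, prices) for prices in P1]
density = [y * n1 / (Y0 * N0) for y, n1 in zip(Y1, N1)]
Q = [p * z for p, z in zip(probs, density)]

# Martingale property under Q: E^Q[P1^i / N1] = P0^i / N0.
is_emm = sum(Q) == 1 and all(
    sum(q * prices[i] / n1 for q, prices, n1 in zip(Q, P1, N1)) == P0[i] / N0
    for i in range(2)
)

# Measure -> deflator: Y0 = 1 and Y1 = (N0 / N1) * dQ/dP.
Y1_back = [N0 / n1 * z for n1, z in zip(N1, density)]
```

The recovered deflator `Y1_back` equals the original `Y1` divided by `Y0`, reflecting the fact that martingale deflators are only determined up to a positive multiplicative constant.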



Putting these ingredients together yields the more common version of the first fundamental theorem:

Theorem (First fundamental theorem of asset pricing). Suppose that the market has a numéraire. Then there is no terminal consumption arbitrage if and only if there exists an equivalent martingale measure relative to the numéraire.

Finally, here are some more definitions which are frequently used in financial modelling.

Definition. An asset (or more generally a portfolio) in a one-period model is called riskless iff its time-1 price is not random.

Definition. An equivalent martingale measure relative to a riskless numéraire is called a risk-neutral measure.

4. Contingent claim pricing

The setting of this section is as follows. We find ourselves in a market with prices $(P_t)_{t \in \{0,1\}}$. A new asset (a contingent claim) is introduced to the market with time-1 price $\xi_1$, a given random variable. The question is this: what is a good value for the initial price $\xi_0$ of this new asset?

Example (Call option). A European call option gives the owner of the option the right, but not the obligation, to buy a given stock at time 1 at some fixed price $K$, called the strike of the option. Let $S_1$ denote the price of the stock at the maturity date 1. There are two cases. If $K \ge S_1$, then the option is worthless to the owner since there is no point paying a price above the market price for the underlying stock. On the other hand, if $K < S_1$, then the owner of the option can buy the stock for the price $K$ from the counterparty and immediately sell the stock for the price $S_1$ to the market, realising a profit of $S_1 - K$. Hence, the payout of the call option is
$$\xi_1 = (S_1 - K)^+,$$
where $a^+ = \max\{a, 0\}$ as usual. The 'hockey-stick' graph of the function $g(x) = (x - K)^+$ is below.
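The shape of the payout function can also be seen by tabulating it. The short sketch below evaluates $g(x) = (x - K)^+$ on a few points; the strike of 10 and the grid of stock prices are arbitrary choices made for illustration.

```python
def call_payoff(s1, strike):
    """Payout (s1 - strike)^+ of a European call option at maturity."""
    return max(s1 - strike, 0.0)


strike = 10.0
# Flat at zero up to the strike, then rising with slope one: the hockey stick.
graph = [(s, call_payoff(s, strike)) for s in (6.0, 8.0, 10.0, 12.0, 14.0)]
```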

We will assume that the original market $(P_t)_{t \in \{0,1\}}$ has no arbitrage, since otherwise it is difficult to formulate a reasonable answer to the question. Now recall that given a utility function $U$ and initial wealth $x$, the portfolio $H$ is preferred to $K$ iff
\[ \mathbb{E}[U(x - H \cdot P_0, H \cdot P_1)] \ge \mathbb{E}[U(x - K \cdot P_0, K \cdot P_1)], \]
where if the inequality is strict we say there is strict preference, and if there is equality we say there is indifference.

Now add the claim to the market. For which initial price $\xi_0$ would you be willing to hold one share of the claim? Using the reasoning above, you would be willing iff there exists a portfolio $H \in \mathbb{R}^n$ such that the augmented portfolio $(H, 1) \in \mathbb{R}^{n+1}$ (the portfolio of holding $H^i$ shares of asset $i$ for $i = 1, \ldots, n$ and holding one share of the claim) is preferred to the augmented portfolio $(K, 0)$ for all $K \in \mathbb{R}^n$. That is to say, you would be willing to pay $\xi_0$ for one share of the claim iff
\[ \max_H \mathbb{E}[U(x - \xi_0 - H \cdot P_0, \xi_1 + H \cdot P_1)] \ge \max_K \mathbb{E}[U(x - K \cdot P_0, K \cdot P_1)]. \]
This leads us to a definition:

Definition. Given the market $(P_t)_{t \in \{0,1\}}$, the payout of the claim $\xi_1$, the initial wealth $x$ and the utility function $U$, the utility indifference price $\xi_0^*$ is defined to be any solution to
\[ \max_H \mathbb{E}[U(x - \xi_0^* - H \cdot P_0, \xi_1 + H \cdot P_1)] = \max_K \mathbb{E}[U(x - K \cdot P_0, K \cdot P_1)], \]

assuming the right-hand side is finite.

The idea is that if $U$ is increasing in both arguments, so that you prefer to consume more rather than less, then the indifference price $\xi_0^*$ is an upper bound for the price you would be willing to pay for one share of the claim. Although the notion of the indifference price does provide an answer to our question, it is a bit unsatisfactory since it depends on both the model of the future prices and our preferences. While there is plenty of price data available and hence there are many popular statistical models for prices, we do not directly observe our utility function. So in practice, it is hard to compute the utility indifference price. The following concept tackles the pricing problem in a preference-independent way.

Definition. A portfolio $H \in \mathbb{R}^n$ super-replicates the claim with payout $\xi_1$ iff $H \cdot P_1 \ge \xi_1$ almost surely.

The connection between super-replication and indifference pricing is this:

Proposition. Suppose $H \cdot P_1 \ge \xi_1$ almost surely. If $U$ is strictly increasing in both arguments, then $H \cdot P_0 \ge \xi_0^*$.

This says that if a portfolio super-replicates the claim, then the initial cost of this portfolio is an upper bound for your utility indifference price for the claim, and we have already argued that your utility indifference price is an upper bound for the price you would be willing to pay for the claim.

Proof. For any portfolio $K$, using the assumption that $U(c_0, \cdot)$ is increasing, we have
\begin{align*}
\mathbb{E}[U(x - H \cdot P_0 - K \cdot P_0, \xi_1 + K \cdot P_1)] &\le \mathbb{E}[U(x - (H + K) \cdot P_0, (H + K) \cdot P_1)] \\
&\le \max_J \mathbb{E}[U(x - J \cdot P_0, J \cdot P_1)] \\
&= \max_J \mathbb{E}[U(x - \xi_0^* - J \cdot P_0, \xi_1 + J \cdot P_1)],
\end{align*}
where the last line is just the definition of the indifference price. Now by maximising over $K$ and using the assumption that $U(\cdot, c_1)$ is strictly increasing, we have $-H \cdot P_0 \le -\xi_0^*$, which is what we wanted to show. □

We finally come to a theorem which answers the question of when super-replication is possible.

Theorem (Characterisation of super-replication). Suppose the market $(P_t)_{t \in \{0,1\}}$ is free of arbitrage and that there is a constant $x$ such that $\mathbb{E}(Y_1 \xi_1) \le Y_0 x$ for all martingale deflators $(Y_t)_{t \in \{0,1\}}$ such that $Y_1 \xi_1$ is integrable. Then there exists a portfolio $H \in \mathbb{R}^n$ such that $H \cdot P_0 \le x$ and $H \cdot P_1 \ge \xi_1$ a.s.

Remark. Note the converse direction of the above theorem is also true, and the proof is trivial. Indeed, suppose that for some portfolio $H$ we have $H \cdot P_0 \le x$ and $H \cdot P_1 \ge \xi_1$ a.s. Then for any suitably integrable martingale deflator $Y$ we have
\[ \mathbb{E}(Y_1 \xi_1) \le \mathbb{E}(Y_1 H \cdot P_1) = Y_0 H \cdot P_0 \le Y_0 x. \]

Proof. Suppose $\mathbb{E}(Y_1 \xi_1) \le Y_0 x$ for every martingale deflator such that $Y_1 \xi_1$ is integrable. We need to show that there exists an $H^* \in \mathbb{R}^n$ such that $H^* \cdot P_0 \le x$ and $H^* \cdot P_1 \ge \xi_1$ a.s. To that end, let
\[ F_\gamma(H) = e^{-\gamma(x - H \cdot P_0)} + \mathbb{E}[e^{-\gamma(H \cdot P_1 - \xi_1)} \zeta], \]
where the factor $\zeta = e^{-\|P_1\|^2 - \xi_1^2}$ is introduced to ensure integrability. (This function is motivated by the utility maximisation problem introduced in the last chapter, where $u(c) = -e^{-\gamma c}$. The parameter $\gamma > 0$ is the investor's risk aversion. We plan to send $\gamma \to +\infty$, which corresponds to the limit where the investor can tolerate no losses.)

For each $\gamma > 0$, by the proof of the first fundamental theorem of asset pricing, there exists a unique $H_\gamma \in V$ such that
\[ F_\gamma(H_\gamma) = \inf_H F_\gamma(H), \]
where $V = \{u \in \mathbb{R}^n : u \cdot P_0 = 0 = u \cdot P_1 \text{ a.s.}\}^\perp$. By the first order condition for a minimum, $\nabla F_\gamma(H_\gamma) = 0$, we see that by setting
\[ Y_0^\gamma = e^{\gamma(H_\gamma \cdot P_0 - x)} \quad \text{and} \quad Y_1^\gamma = e^{\gamma(\xi_1 - H_\gamma \cdot P_1)} \zeta \]
we have found a martingale deflator. Note that
\begin{align*}
\frac{\partial}{\partial \gamma} F_\gamma(h)\Big|_{h = H_\gamma} &= Y_0^\gamma (H_\gamma \cdot P_0 - x) + \mathbb{E}[Y_1^\gamma (\xi_1 - H_\gamma \cdot P_1)] \\
&= H_\gamma \cdot (Y_0^\gamma P_0 - \mathbb{E}[Y_1^\gamma P_1]) + \mathbb{E}[Y_1^\gamma \xi_1] - Y_0^\gamma x \\
&\le 0,
\end{align*}
since $Y^\gamma$ is a martingale deflator and by the assumption that $\mathbb{E}(Y_1 \xi_1) \le Y_0 x$ for all $Y$. Also note that $\gamma \mapsto H_\gamma$ is differentiable. (Indeed, recall that $H_\gamma$ is defined as the root of the function $\nabla F_\gamma : V \to V$, and $D^2 F_\gamma$ is a strictly positive definite operator on $V$, so the differentiability of $H_\gamma$ follows from the implicit function theorem.) Furthermore,
\[ F_\gamma(H_\gamma) \le F_\gamma(H_{\gamma \pm \varepsilon}) \]
since $H_\gamma$ is the minimiser of $F_\gamma$, and hence
\[ \frac{\partial}{\partial \gamma} F_g(H_\gamma)\Big|_{g = \gamma} = 0. \]
Putting this together implies that $\gamma \mapsto F_\gamma(H_\gamma)$ is nonincreasing, and in particular
\[ \sup_{\gamma \ge 1} F_\gamma(H_\gamma) < \infty. \]

Now we consider the sequence $(H_k)_k$ where the risk-aversion parameter takes the values $\gamma = k \in \mathbb{N}$. If $(H_k)_k$ is bounded, then we can find a convergent subsequence such that $H_k \to H^*$. Note that since
\[ F_k(H_k) = (e^{H_k \cdot P_0 - x})^k + \mathbb{E}[(e^{\xi_1 - H_k \cdot P_1})^k \zeta], \]
we have by the boundedness of the sequence $(F_k(H_k))_k$ that $x \ge H^* \cdot P_0$ and $\xi_1 \le H^* \cdot P_1$ a.s.

So it remains to rule out the case that the sequence $(H_k)_k$ is unbounded. Suppose that it was unbounded. Then we can pass to a subsequence along which $\|H_k\| \uparrow \infty$. Again, let
\[ \hat{H}_k = \frac{H_k}{\|H_k\|} \]
and pass to a subsequence such that $\hat{H}_k \to \hat{H}$. Note that we have $\hat{H} \in V$ and $\|\hat{H}\| = 1$. But by the formula
\[ F_k(H_k) = \left(e^{\hat{H}_k \cdot P_0 - \frac{x}{\|H_k\|}}\right)^{k \|H_k\|} + \mathbb{E}\left[\left(e^{\frac{\xi_1}{\|H_k\|} - \hat{H}_k \cdot P_1}\right)^{k \|H_k\|} \zeta\right] \]
we see that boundedness forces $\hat{H} \cdot P_0 \le 0 \le \hat{H} \cdot P_1$. By no arbitrage, we have $\hat{H} \cdot P_0 = 0 = \hat{H} \cdot P_1$ a.s. Since $\hat{H} \in V$ we conclude that $\hat{H} = 0$, contradicting $\|\hat{H}\| = 1$. □

Definition. A contingent claim with payout $\xi_1$ is replicable or attainable iff there exists a portfolio $H$ such that $H \cdot P_1 = \xi_1$ almost surely.

Theorem (Characterisation of replicable claims). Suppose that the market model has no arbitrage, and let $\xi_1$ be the payout of a contingent claim. The claim is attainable if and only if there exists an $x \in \mathbb{R}$ such that $\mathbb{E}(Y_1 \xi_1) = Y_0 x$ for all martingale deflators $Y$ such that $Y_1 \xi_1$ is integrable.

Proof. Suppose $\mathbb{E}(Y_1 \xi_1) = Y_0 x$ for all suitably integrable martingale deflators $Y$. By the characterisation of super-replication (applied to $\xi_1$) there exists $H$ such that $H \cdot P_0 \le x$ and $H \cdot P_1 \ge \xi_1$ a.s. Similarly, the characterisation of super-replication (applied to $-\xi_1$) yields the existence of $K$ such that $K \cdot P_0 \le -x$ and $K \cdot P_1 \ge -\xi_1$ a.s. Adding gives us
\[ (H + K) \cdot P_0 \le 0 \le (H + K) \cdot P_1 \text{ a.s.} \]
Since there is no arbitrage by assumption, we have
\[ (H + K) \cdot P_0 = 0 = (H + K) \cdot P_1 \text{ a.s.} \]

Therefore, $\xi_1 \le H \cdot P_1 = -K \cdot P_1 \le \xi_1$, so both $H$ and $-K$ replicate the claim. By the same argument, both have initial price $H \cdot P_0 = -K \cdot P_0 = x$. □

Remark. For any contingent claim with payout $\xi_1$ there is an interval $[\underline{\xi}_0, \overline{\xi}_0]$ where
\[ \overline{\xi}_0 = \inf\{H \cdot P_0 : H \cdot P_1 \ge \xi_1\} \]
is the cost of the cheapest super-replicating portfolio and
\[ \underline{\xi}_0 = \sup\{H \cdot P_0 : H \cdot P_1 \le \xi_1\} \]
is the cost of the most expensive sub-replicating portfolio. However, if the claim is replicable, then the interval collapses into a single price, which can be calculated by computing the expected value of $\xi_1 Y_1 / Y_0$ for any martingale deflator $Y$.

Since attainable claims have unique no-arbitrage prices, we single out the markets for which every claim is attainable:

Definition. A market is complete if and only if every contingent claim is replicable. A market is incomplete otherwise.

We can characterise complete markets:

Theorem (Second Fundamental Theorem of Asset Pricing). An arbitrage-free market model is complete if and only if there exists a unique martingale deflator $Y$ such that $Y_0 = 1$.

Proof. Suppose that there is a unique martingale deflator such that $Y_0 = 1$. Let $\xi_1$ be any random variable. By the flexibility of the proof of the first fundamental theorem, we can choose the random variable $\zeta$ in such a way that we may suppose that $\xi_1 Y_1$ is integrable. In particular, there is a number $x$ such that $x = \mathbb{E}(Y_1 \xi_1)$ for all (the unique) martingale deflators with $Y_0 = 1$. By the characterisation of attainable claims, there exists a portfolio $H$ such that $H \cdot P_1 = \xi_1$. Hence the market is complete.

Conversely, suppose that the market is complete. Let $Y$ and $Y'$ be martingale deflators such that $Y_0 = Y_0' = 1$. Fix for the moment a claim with payout satisfying $|\xi_1| \le \frac{1}{Y_1 + Y_1'}$. (The bound is just to ensure integrability.) Since $\xi_1$ is attainable, there exists $x$ such that
\[ \mathbb{E}(Y_1 \xi_1) = x = \mathbb{E}(Y_1' \xi_1) \ \Rightarrow \ \mathbb{E}[\xi_1 (Y_1 - Y_1')] = 0. \]
Now letting
\[ \xi_1 = \frac{Y_1 - Y_1'}{(Y_1 + Y_1')^2}, \]
we have $\mathbb{E}[(Y_1 - Y_1')^2 Z] = 0$ where $Z = (Y_1 + Y_1')^{-2} > 0$, and the pigeon-hole principle yields $Y_1 = Y_1'$ almost surely as desired. □

This box summarises the fundamental theorems:

1FTAP: No arbitrage ⇔ Existence of martingale deflator
2FTAP: Completeness ⇔ Uniqueness of martingale deflator
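To make the two theorems concrete, here is a minimal sketch in a two-state, two-asset market. The prices, probabilities and the strike are hypothetical numbers chosen for illustration, and the helper `solve_2x2` is not part of the notes. Under these assumptions the deflator with $Y_0 = 1$ is found by solving the linear equations $\mathbb{E}(Y_1 P_1^i) = P_0^i$, and the replication cost of a call agrees with $\mathbb{E}(Y_1 \xi_1)$.

```python
from fractions import Fraction as F

def solve_2x2(a, b):
    """Solve the 2x2 linear system a z = b exactly by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("singular system")
    z0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    z1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [z0, z1]

# Hypothetical market: asset 0 is a bond, asset 1 is a stock; two states (up, down).
prob = [F(1, 2), F(1, 2)]              # real-world probabilities of the two states
P0 = [F(1), F(100)]                    # time-0 prices
P1 = [[F(1), F(120)], [F(1), F(90)]]   # P1[state][asset]

# Martingale deflator with Y0 = 1: solve sum_w prob[w] * Y1[w] * P1[w][i] = P0[i].
A = [[prob[w] * P1[w][i] for w in range(2)] for i in range(2)]
Y1 = solve_2x2(A, P0)

# Replicate a call with strike 100: solve H . P1[w] = xi1[w] in each state.
xi1 = [max(P1[w][1] - 100, 0) for w in range(2)]
H = solve_2x2(P1, xi1)

replication_cost = sum(h * p for h, p in zip(H, P0))
deflator_price = sum(prob[w] * Y1[w] * xi1[w] for w in range(2))
```

Both components of `Y1` come out strictly positive, so this toy market has no arbitrage, and because the linear system has a unique solution the deflator is unique, in line with completeness.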

Complete markets are convenient for a variety of reasons. For instance, complete markets have a riskless numéraire portfolio:

Proposition. Suppose the arbitrage-free market model $P$ is complete. Then there exists a portfolio $\eta$ such that $B_t = \eta \cdot P_t$ is strictly positive and non-random for $t = 0, 1$.

Proof. By completeness, there exists a portfolio $\eta$ such that $\eta \cdot P_1 = 1 > 0$ almost surely. By no-arbitrage, the initial price $\eta \cdot P_0$ of this portfolio is strictly positive. □

Complete markets have even more (arguably too much) structure:

Theorem. If the market model $P$ with $n$ assets is complete, then there exist at most $n$ disjoint events of positive probability. In particular, the $n$-dimensional random vector $P_1$ takes values in a set of at most $n$ elements.

Proof. Suppose $A_1, \ldots, A_k$ are a collection of disjoint events with $\mathbb{P}(A_i) > 0$ for all $i$. Claim: the set $\{1_{A_1}, \ldots, 1_{A_k}\}$ is linearly independent, and in particular, the dimension of the span of $\{1_{A_1}, \ldots, 1_{A_k}\}$ is exactly $k$. To prove this claim, we must show that if
\[ a_1 1_{A_1} + \ldots + a_k 1_{A_k} = 0 \text{ a.s.} \]
for some constants $a_1, \ldots, a_k$, then $a_1 = \cdots = a_k = 0$. To this end, note that if $i \ne j$ the sets $A_i$ and $A_j$ are disjoint and hence $1_{A_i} 1_{A_j} = 0$. By multiplying both sides of the equation by $1_{A_i}$ we get $a_i 1_{A_i} = 0$. But since $\mathbb{P}(A_i) > 0$ it must be the case that $a_i = 0$.

Now if the market is complete, each of the $1_{A_i}$ is replicable. Hence
\[ \mathrm{span}\{1_{A_1}, \ldots, 1_{A_k}\} \subseteq \{H \cdot P_1 : H \in \mathbb{R}^n\} = \mathrm{span}\{P_1^1, \ldots, P_1^n\}. \]
Looking at the dimensions of the spaces above, we must conclude $k \le n$. □



Example. (Put-call parity formula) Suppose we start with a market with three assets with prices $(B, S, C)$. The first asset is such that the time-1 price $B_1$ is non-random and strictly positive. The next asset is a stock. The last asset is a call option on that stock with strike $K$ and maturity 1, so that $C_1 = (S_1 - K)^+$. Suppose that this market is free of arbitrage.

Now we introduce another claim, called a put option. A put option gives the owner of the option the right, but not the obligation, to sell the stock for a fixed strike price at a fixed maturity date. If the strike is $K$ and the maturity date is 1, then by a similar argument to the one we used for the call option, the payout of a put option is $P_1 = (K - S_1)^+$.

It turns out that the put option is replicable in the market $(B, S, C)$. Indeed, we have the identity
\[ P_1 = (K - S_1)^+ = K - S_1 + (S_1 - K)^+ = (K/B_1, -1, +1) \cdot (B_1, S_1, C_1). \]
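The replication identity above can be checked numerically. The following is a small sketch; the function names and the sample numbers are illustrative assumptions, not part of the notes. It also evaluates the time-0 cost of the portfolio $(K/B_1, -1, +1)$ at given prices $(B_0, S_0, C_0)$.

```python
def call_payoff(s1, k):
    """Payout (S1 - K)^+ of a call with strike k."""
    return max(s1 - k, 0.0)

def put_payoff(s1, k):
    """Payout (K - S1)^+ of a put with strike k."""
    return max(k - s1, 0.0)

def replicating_payoff(s1, k, b1):
    """Time-1 value of the portfolio (K/B1, -1, +1) in the market (B, S, C)."""
    holdings = (k / b1, -1.0, 1.0)
    prices = (b1, s1, call_payoff(s1, k))
    return sum(h * p for h, p in zip(holdings, prices))

def replication_cost(b0, b1, s0, c0, k):
    """Time-0 cost of the same portfolio at prices (B0, S0, C0)."""
    return (k / b1) * b0 - s0 + c0
```

For example, `replicating_payoff(50.0, 100.0, 1.05)` agrees with `put_payoff(50.0, 100.0)` up to floating-point rounding, whichever side of the strike the stock ends on.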

Hence $H = (K/B_1, -1, +1)$ is a replicating portfolio. Now, suppose we want to assign a price $P_0$ to the put at time 0. The cost of replication is the unique indifference price (for all increasing utility functions) and also the unique price such that the augmented market $(B, S, C, P)$ has no arbitrage. Hence we usually set $P_0$ to satisfy
\[ P_0 - C_0 = \frac{B_0}{B_1} K - S_0. \]
This is the famous put-call parity formula.

5. Markets with an infinite number of assets

We now consider the seemingly unrealistic situation where the market is allowed to have an infinite number of assets. Rather than being an exercise for mathematicians to generalise needlessly, we will see shortly that this modelling framework does have practical applications.

We now let $I$ be an arbitrary index set, and we assume that for each $i \in I$ there is an asset $i$ with price $P_t^i$ at time $t$. For instance, if we let $I = \{1, \ldots, n\}$ we recover the case where there are $n$ assets. For a finite subset $J \subseteq I$ we will let $P_t^J$ denote the $|J|$-dimensional vector $(P_t^j)_{j \in J}$ of asset prices indexed by $J$. Consider a portfolio of holding $H^j$ shares of asset $j$ for each $j \in J$. We let $H^J$ denote the $|J|$-dimensional vector $(H^j)_{j \in J}$, and so the time $t$ price of this portfolio is given by
\[ H^J \cdot P_t^J = \sum_{j \in J} H^j P_t^j. \]

Definition. A sequence of finite-dimensional portfolios $(H^J)_{J \subseteq I}$ asymptotically super-replicates a contingent claim with payout $\xi_1$ iff
\[ \liminf_{|J| \to \infty} H^J \cdot P_1^J \ge \xi_1 \text{ almost surely.} \]
The sequence asymptotically replicates the claim iff
\[ \lim_{|J| \to \infty} H^J \cdot P_1^J = \xi_1 \text{ almost surely.} \]

Rather than give some abstract theory around these definitions, we move to a concrete example. We consider a market consisting of a risk-free bond whose time-1 price $B_1$ is non-random and strictly positive, a stock with time-1 price $S_1$ almost surely strictly positive, and a family of calls indexed by the strike $K \ge 0$ with time-1 price $C_1(K) = (S_1 - K)^+$.

Theorem. Given $g \in C^2(0, \infty)$, there exists a family of portfolios that asymptotically replicates the claim with payout $g(S_1)$.

Proof. Before we begin, note that put options are replicable by put-call parity, so we can and will assume that the market also has put options of all strikes. Now the following formula holds identically:
\[ g(S_1) = g(a) + g'(a)(S_1 - a) + \int_0^a g''(K)(K - S_1)^+ \, dK + \int_a^\infty g''(K)(S_1 - K)^+ \, dK \]
for any $a > 0$. Note that the integrand of the first integral is zero unless $\min\{S_1, a\} \le K \le a$. Similarly, the integrand of the second integral is zero unless $a \le K \le \max\{S_1, a\}$. In

particular, the ranges of both integrals are bounded intervals strictly in $(0, \infty)$ so both integrals are ordinary Riemann integrals. (One way to prove the identity is to fix $S_1$ and let $h(a)$ equal the right-hand side. By the standard rules of calculus, we have $h'(a) = 0$ and hence $h(a)$ is a constant. To evaluate that constant, let $a = S_1$ and note that both integrals vanish since the ranges of integration have zero length.)

To exhibit a family of portfolios approximating $g(S_1)$, we fix an $a > 0$ and consider a family of finite subsets of $(0, \infty)$ defined by $\mathcal{K}^n = \{K_1^n, \ldots, K_n^n\}$ such that $\Delta_i^n = K_{i+1}^n - K_i^n \ge 0$ for $1 \le i < n$, and such that
\[ \max_{1 \le i < n} \Delta_i^n \to 0 \quad \text{and} \quad K_n^n \to \infty \]
[…]
\[ \mathbb{Q}(S_1 > K) = -\frac{B_1}{B_0} \partial_K^+ C_0(K) \]
and
\[ \mathbb{Q}(S_1 \ge K) = -\frac{B_1}{B_0} \partial_K^- C_0(K). \]
If $K \mapsto C_0(K)$ is twice-differentiable then the law of the random variable $S_1$ has a density $f_{S_1}$ under $\mathbb{Q}$ given by
\[ f_{S_1}(K) = \frac{B_1}{B_0} \partial_{KK} C_0(K). \]

Proof. Note that
\begin{align*}
\partial_K^+ C_0(K) &= \lim_{\varepsilon \downarrow 0} \frac{C_0(K + \varepsilon) - C_0(K)}{\varepsilon} \\
&= -\frac{B_0}{B_1} \lim_{\varepsilon \downarrow 0} \mathbb{E}^{\mathbb{Q}}[g_\varepsilon(S_1 - K)],
\end{align*}
where
\[ g_\varepsilon(x) = \frac{x}{\varepsilon} 1_{[0, \varepsilon)}(x) + 1_{[\varepsilon, \infty)}(x). \]
Note that $g_\varepsilon$ is bounded and $g_\varepsilon \to 1_{(0, \infty)}$ pointwise, so the first formula is proven by the dominated convergence theorem. The formula for the left-derivative is proven similarly. Finally, if $C_0$ is twice-differentiable we have
\[ \partial_K C_0(K) = \partial_K^+ C_0(K) = \partial_K^- C_0(K), \]
so the density is recovered by differentiating once more with respect to $K$. □
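The two tail formulas can be illustrated with a discrete law for $S_1$, for which $K \mapsto C_0(K)$ is piecewise linear and the one-sided difference quotients are exact once the step is small enough. This is only a sketch: the law, the bond prices and the function names below are hypothetical, and the formulas are used with the minus sign that the proof produces, i.e. $\mathbb{Q}(S_1 > K) = -(B_1/B_0)\,\partial_K^+ C_0(K)$ and $\mathbb{Q}(S_1 \ge K) = -(B_1/B_0)\,\partial_K^- C_0(K)$.

```python
from fractions import Fraction as F

B0, B1 = F(1), F(11, 10)
# Hypothetical law of S1 under Q: value -> probability.
law = {F(80): F(1, 4), F(100): F(1, 2), F(120): F(1, 4)}

def call_price(k):
    """C0(K) = (B0/B1) * E^Q[(S1 - K)^+]."""
    return B0 / B1 * sum(q * max(s - k, 0) for s, q in law.items())

def right_derivative(k, eps=F(1, 1000)):
    """One-sided difference quotient from the right (exact here for small eps)."""
    return (call_price(k + eps) - call_price(k)) / eps

def left_derivative(k, eps=F(1, 1000)):
    """One-sided difference quotient from the left (exact here for small eps)."""
    return (call_price(k) - call_price(k - eps)) / eps

def tail_strict(k):
    """Q(S1 > K)."""
    return sum(q for s, q in law.items() if s > k)

def tail_weak(k):
    """Q(S1 >= K)."""
    return sum(q for s, q in law.items() if s >= k)
```

At the atom $K = 100$ the left and right derivatives differ, which is exactly the gap between $\mathbb{Q}(S_1 \ge K)$ and $\mathbb{Q}(S_1 > K)$; a discrete law has no density, so the second-derivative formula does not apply to this example.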



6. Call prices from moment generating functions

Since a portfolio of calls and puts on a stock can essentially replicate any European contingent claim, it is important to have models where the call prices can be computed easily. Unfortunately, there are few models where there exist nice, elementary formulae for the call prices. However, there are many models (especially when we get to continuous time) where the moment generating functions can be computed explicitly, and we will now see that given the moment generating function we can compute call prices by integration.

Consider a market model $(B, S, C_0(\cdot))$ as in the last section and let $\mathbb{Q}$ be the equivalent martingale measure under which the call prices are calculated. We will suppose that $S_1 > 0$ almost surely. For complex $\theta$ in the vertical strip
\[ \Theta = \{\theta = p + iq : 0 \le p \le 1, \ q \in \mathbb{R}\} \]
define the moment generating function of the log stock price by
\[ M(\theta) = \mathbb{E}^{\mathbb{Q}}(e^{\theta \log S_1}). \]

Note that we have for $\theta = p + iq \in \Theta$,
\[ \mathbb{E}^{\mathbb{Q}}(|e^{\theta \log S_1}|) = \mathbb{E}^{\mathbb{Q}}(S_1^p) \le \mathbb{E}^{\mathbb{Q}}(S_1)^p = \left(\frac{B_1 S_0}{B_0}\right)^p \]
[…]

Definition. Given initial wealth $x \ge 0$, an investment-consumption strategy is an $n$-dimensional predictable process $H$ satisfying the self-financing (with respect to the market $P$) condition
\[ H_1 \cdot P_0 \le x \quad \text{and} \quad H_t \cdot P_t \ge H_{t+1} \cdot P_t \text{ for } t \ge 1. \]
Given the strategy $H$ we define the corresponding consumption process $c$ via
\[ c_0 = x - H_1 \cdot P_0 \quad \text{and} \quad c_t = (H_t - H_{t+1}) \cdot P_t. \]

We now recall the definition of a predictable process. This definition is purely mathematical but is useful for us because it is the right way of eliminating clairvoyant investors.

Definition. A stochastic process $X = (X_t)_{t \ge 1}$ is predictable³ (with respect to a filtration $\mathbb{F}$) if $X_t$ is $\mathcal{F}_{t-1}$-measurable for all $t \ge 1$.

³ The term 'predictable' is used in the US, while the synonym 'previsible' is more common in the UK. I am American, so I will use 'predictable' out of habit. I hope this will not cause too much confusion.

Remark. Note that the time index set for a predictable process $(X_t)_{t \ge 1}$ is (usually) $\{1, 2, \ldots\}$, not $\{0, 1, \ldots\}$. Hence $X_0$ is not necessarily defined.

Remark. In discrete time, a process $X$ is predictable if and only if the process $Y$ is adapted, where $X_t = Y_{t-1}$. That is to say, the notion of predictability can be dispensed with by simply changing notation. However, in continuous time, there is a much deeper difference between the notions of predictability and adaptedness. Therefore, for the sake of a unified treatment of the discrete and continuous time cases, we keep it in.

2. A motivating utility maximisation problem and arbitrage

Now that we have our market model and we've introduced an investor into this market, our first challenge is to find out how to invest optimally. We consider one such optimal investment problem. Obviously, the following set-up can be generalised in many ways, but since the main motivation for studying this problem is to introduce the very important notion of a martingale deflator (also called a state price density), we try to keep it simple. Let $T > 0$ be some non-random time horizon, and let
\[ U(c) = \mathbb{E}\, u(c_0, \ldots, c_T), \]

where $u$ is a function on $[0, \infty)^{T+1}$. We will suppose that our investor prefers a consumption stream $c$ to $c'$ if and only if $U(c) > U(c')$. We will assume that $c_t \mapsto u(c_0, \ldots, c_T)$ is strictly increasing for each $t$, modelling the assumption that the investor strictly prefers more to less. (Usually we also assume that $u$ is strictly concave, so that the investor is risk-averse, strictly preferring to consume the non-random quantity $\mathbb{E}(C)$ to the random quantity $C$, for any non-constant random variable $C$.)

We suppose that the investor's initial wealth $x \ge 0$ is given. We also suppose that he will live exactly to age $T$ and, since he derives no utility from wealth in the afterlife, chooses a strategy $H$ such that $H_{T+1} = 0$ a.s. Summing up, the investor faces the problem

maximise $U(c)$ subject to $c_0 = x - H_1 \cdot P_0$, $(H_t - H_{t+1}) \cdot P_t = c_t$, and $H_{T+1} = 0$.

With this problem in mind, we introduce an important definition:

Definition. An arbitrage is an investment-consumption strategy $H$ such that there exists a non-random time $T > 0$ with the following properties, writing $c_0 = -H_1 \cdot P_0$ and $(H_t - H_{t+1}) \cdot P_t = c_t$:
• $c_t \ge 0$ almost surely for all $0 \le t \le T$,
• $H_{T+1} \cdot P_T = 0$ almost surely (so $c_T = H_T \cdot P_T$),
• $\mathbb{P}(c_t > 0 \text{ for some } 0 \le t \le T) > 0$.

Note that if $H^f$ is a feasible investment strategy for the above investment problem and if $H^a$ is an arbitrage, then $H^f + H^a$ is also feasible but has strictly higher expected utility, $U(c^f + c^a) > U(c^f)$. Inductively, the strategy $H^f + k H^a$ is feasible for every $k \ge 0$. In particular, if there is an arbitrage then there cannot be an optimal investment strategy for the utility maximisation problem.

2.1. Motivation: Lagrangian duality. As usual in a constrained optimisation problem, we apply the Lagrangian method. Recall that this involves replacing our given objective function with the so-called Lagrangian, which encodes the constraints on the processes $H$ and $c$. In this case the Lagrangian is
\[ L(H, c, Y) = \mathbb{E}[u(c_0, \ldots, c_T)] + \mathbb{E}\Big[Y_0 (x - H_1 \cdot P_0 - c_0) + \sum_{t=1}^T Y_t (H_t \cdot P_t - H_{t+1} \cdot P_t - c_t)\Big]. \]
To identify the dual problem, we seek to find conditions on the Lagrange multiplier process $Y$ such that the quantity
\[ \sup\{L(H, c, Y) : c_t \ge 0, \ H \text{ predictable}\} \]
is finite. To this end, we employ the standard trick of linear programming: using $H_{T+1} = 0$, we rewrite the Lagrangian as
\[ L(H, c, Y) = \mathbb{E}\Big[u(c_0, \ldots, c_T) - \sum_{t=0}^T Y_t c_t\Big] + \mathbb{E} \sum_{t=1}^T H_t \cdot (P_t Y_t - P_{t-1} Y_{t-1}) + x Y_0. \]

Now, looking at the first term, we see that if $Y_t \le 0$, there would not exist a finite maximum when we maximise over $c_t \ge 0$, since $u$ is strictly increasing. So we see that the dual variable $Y$ must satisfy
\[ Y_t > 0 \text{ almost surely for all } t \ge 0. \]
Look at the second term: since $H_t$ is an arbitrary $\mathcal{F}_{t-1}$-measurable random vector, the requirement of a finite maximum leads us to
\[ \mathbb{E}(P_t Y_t \mid \mathcal{F}_{t-1}) = P_{t-1} Y_{t-1}. \]
The notation $\mathbb{E}(X \mid \mathcal{G})$ denotes the conditional expectation of the random variable $X$ with respect to the sigma-field $\mathcal{G}$. The precise definition will be recalled below.

Note that there is nothing rigorous to this argument. The intention of this section is just to show that the definition of a martingale deflator, which we will present below, follows naturally from the utility maximisation problem.

We briefly recall some notions from probability.

Definition. Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, let $\mathcal{G} \subseteq \mathcal{F}$ be a sub-sigma-field of events. A random variable $X : \Omega \to \mathbb{R}$ is measurable with respect to $\mathcal{G}$ (or briefly, $\mathcal{G}$-measurable) if and only if the event $\{X \le x\}$ is an element of $\mathcal{G}$ for all $x \in \mathbb{R}$.

You know that the conditional expectation of an integrable random variable $X$ given a non-null event $G$ means
\[ \mathbb{E}(X \mid G) = \frac{\mathbb{E}(X 1_G)}{\mathbb{P}(G)}. \]
The next theorem leads to a definition of conditional expectation given a sigma-field:

Theorem (Existence and uniqueness of conditional expectations). Let $X$ be an integrable random variable defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and let $\mathcal{G} \subseteq \mathcal{F}$ be a sub-sigma-field of $\mathcal{F}$. Then there exists an integrable $\mathcal{G}$-measurable random variable $Y$ such that
\[ \mathbb{E}(1_G Y) = \mathbb{E}(1_G X) \text{ for all } G \in \mathcal{G}. \]
Furthermore, if there exists another $\mathcal{G}$-measurable random variable $Y'$ such that $\mathbb{E}(1_G Y') = \mathbb{E}(1_G X)$ for all $G \in \mathcal{G}$, then $Y = Y'$ almost surely.

Definition. Let $X$ be an integrable random variable and let $\mathcal{G} \subseteq \mathcal{F}$ be a sigma-field. The conditional expectation of $X$ given $\mathcal{G}$, written $\mathbb{E}(X \mid \mathcal{G})$, is a $\mathcal{G}$-measurable random variable with the property that
\[ \mathbb{E}[1_G \mathbb{E}(X \mid \mathcal{G})] = \mathbb{E}(1_G X) \text{ for all } G \in \mathcal{G}. \]

Example. (Sigma-field generated by a countable partition) Let $X$ be a non-negative random variable defined on $(\Omega, \mathcal{F}, \mathbb{P})$. Let $G_1, G_2, \ldots$ be a sequence of disjoint events with $\mathbb{P}(G_n) > 0$ for all $n$ and $\bigcup_{n \in \mathbb{N}} G_n = \Omega$. Let $\mathcal{G}$ be the smallest sigma-field containing $\{G_1, G_2, \ldots\}$. That is, every element of $\mathcal{G}$ is of the form $\bigcup_{n \in I} G_n$ where $I \subseteq \mathbb{N}$. Then
\[ \mathbb{E}(X \mid \mathcal{G})(\omega) = \mathbb{E}(X \mid G_n) = \frac{\mathbb{E}(X 1_{G_n})}{\mathbb{P}(G_n)} \quad \text{if } \omega \in G_n, \]
where the right-hand side denotes the conditional expectation given the event $G_n$.

More concretely, suppose $\Omega = \{HH, HT, TH, TT\}$ consists of two tosses of a coin, and let
\[ \mathcal{G} = \{\emptyset, \{HH, HT\}, \{TH, TT\}, \Omega\} \]
be the sigma-field containing the information revealed by the first toss. Suppose the coin is fair, so that each outcome is equally likely. Consider the random variable
\[ X(\omega) = \begin{cases} a & \text{if } \omega = HH \\ b & \text{if } \omega = HT \\ c & \text{if } \omega = TH \\ d & \text{if } \omega = TT. \end{cases} \]
Then
\[ \mathbb{E}(X \mid \mathcal{G})(\omega) = \begin{cases} (a + b)/2 & \text{if } \omega \in \{HH, HT\} \\ (c + d)/2 & \text{if } \omega \in \{TH, TT\}. \end{cases} \]
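The partition formula in the example above is easy to compute directly. Here is a minimal sketch on the two-toss space; the function name `conditional_expectation` and the numerical values standing in for $a, b, c, d$ are assumptions made for illustration.

```python
from fractions import Fraction as F

def conditional_expectation(x, prob, partition):
    """Return E(X | G) as a dict outcome -> value, where G is generated by `partition`."""
    result = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)               # P(G_n), assumed > 0
        mean = sum(prob[w] * x[w] for w in block) / p_block  # E(X 1_{G_n}) / P(G_n)
        for w in block:
            result[w] = mean
    return result

omega = ["HH", "HT", "TH", "TT"]
prob = {w: F(1, 4) for w in omega}              # fair coin, two tosses
X = {"HH": 1, "HT": 2, "TH": 3, "TT": 6}        # hypothetical values for a, b, c, d
first_toss = [["HH", "HT"], ["TH", "TT"]]       # partition generating G
cond = conditional_expectation(X, prob, first_toss)
```

The result is constant on each block of the partition, which is what $\mathcal{G}$-measurability means here, and it satisfies the defining property $\mathbb{E}[1_G \mathbb{E}(X \mid \mathcal{G})] = \mathbb{E}(1_G X)$ on each block.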

The important properties of conditional expectations are collected below:

Theorem. Let all random variables appearing below be such that the relevant conditional expectations are defined, and let $\mathcal{G}$ be a sub-sigma-field of the sigma-field $\mathcal{F}$ of all events.
• linearity: $\mathbb{E}(aX + bY \mid \mathcal{G}) = a\mathbb{E}(X \mid \mathcal{G}) + b\mathbb{E}(Y \mid \mathcal{G})$ for all constants $a$ and $b$.
• positivity: If $X \ge 0$ almost surely, then $\mathbb{E}(X \mid \mathcal{G}) \ge 0$ almost surely, with almost sure equality if and only if $X = 0$ almost surely.
• Jensen's inequality: If $f$ is convex, then $\mathbb{E}[f(X) \mid \mathcal{G}] \ge f[\mathbb{E}(X \mid \mathcal{G})]$.
• monotone convergence theorem: If $0 \le X_n \uparrow X$ a.s. then $\mathbb{E}(X_n \mid \mathcal{G}) \uparrow \mathbb{E}(X \mid \mathcal{G})$ a.s.
• Fatou's lemma: If $X_n \ge 0$ a.s. for all $n$, then $\mathbb{E}(\liminf_n X_n \mid \mathcal{G}) \le \liminf_n \mathbb{E}(X_n \mid \mathcal{G})$.
• dominated convergence theorem: If $\sup_n |X_n|$ is integrable and $X_n \to X$ a.s. then $\mathbb{E}(X_n \mid \mathcal{G}) \to \mathbb{E}(X \mid \mathcal{G})$ a.s.
• independence: If $X$ is independent of $\mathcal{G}$ (the events $\{X \le x\}$ and $G$ are independent for each $x \in \mathbb{R}$ and $G \in \mathcal{G}$) then $\mathbb{E}(X \mid \mathcal{G}) = \mathbb{E}(X)$. In particular, $\mathbb{E}(X \mid \mathcal{G}) = \mathbb{E}(X)$ if $\mathcal{G}$ is trivial.
• 'slot property': If $X$ is $\mathcal{G}$-measurable, then $\mathbb{E}(XY \mid \mathcal{G}) = X\mathbb{E}(Y \mid \mathcal{G})$. In particular, if $X$ is $\mathcal{G}$-measurable, then $\mathbb{E}(X \mid \mathcal{G}) = X$.
• tower property or law of iterated expectations: If $\mathcal{H} \subseteq \mathcal{G}$ then $\mathbb{E}[\mathbb{E}(X \mid \mathcal{G}) \mid \mathcal{H}] = \mathbb{E}[\mathbb{E}(X \mid \mathcal{H}) \mid \mathcal{G}] = \mathbb{E}(X \mid \mathcal{H})$.

Now we come to one of the most important concepts in financial mathematics, the martingale. A martingale is simply an adapted stochastic process that is constant on average in the following sense:

Definition. A martingale relative to a filtration $\mathbb{F}$ is an adapted stochastic process $M = (M_t)_{t \ge 0}$ with the following properties:
• $\mathbb{E}(|M_t|) < \infty$ for all $t \ge 0$,
• $\mathbb{E}(M_t \mid \mathcal{F}_s) = M_s$ for all $0 \le s \le t$.

Remark. The above definition of a martingale is the same for both discrete- and continuous-time processes. However, if the time index set is discrete, $\mathbb{T} = \mathbb{Z}_+$, it is an exercise to show that an integrable adapted process $M$ is a martingale if
\[ \mathbb{E}(M_{t+1} \mid \mathcal{F}_t) = M_t \text{ for all } t \ge 0. \]
That is, it is sufficient to verify the conditional expectations of the process one period ahead.

Below are some examples of martingales. Before listing them, it is convenient to introduce a definition:

Definition. Given a stochastic process $Y = (Y_t)_{t \ge 0}$, the natural filtration of $Y$ is the smallest filtration for which $Y$ is adapted. That is, it is the filtration $(\mathcal{F}_t)_{t \ge 0}$ where $\mathcal{F}_t = \sigma(Y_s, 0 \le s \le t)$.

In what follows, if a stochastic process is given but a filtration is not explicitly mentioned, then we are implicitly working with the natural filtration of the process.

Example. Let $\xi_1, \xi_2, \xi_3, \ldots$ be independent integrable random variables such that $\mathbb{E}(\xi_i) = 0$ for all $i$. The process $(S_t)_{t \ge 0}$ given by $S_0 = 0$ and
\[ S_t = \xi_1 + \ldots + \xi_t \]
is a martingale relative to its natural filtration. Indeed, the random variable $S_t$ is integrable since
\[ \mathbb{E}(|S_t|) \le \mathbb{E}(|\xi_1|) + \ldots + \mathbb{E}(|\xi_t|) \]
by the triangle inequality, and all the terms in this finite sum are finite by assumption. Also,
\begin{align*}
\mathbb{E}(S_{t+1} \mid \mathcal{F}_t) &= \mathbb{E}(S_t + \xi_{t+1} \mid \mathcal{F}_t) \\
&= \mathbb{E}(S_t \mid \mathcal{F}_t) + \mathbb{E}(\xi_{t+1} \mid \mathcal{F}_t) \\
&= S_t + \mathbb{E}(\xi_{t+1}) \\
&= S_t,
\end{align*}
where the conditional expectation $\mathbb{E}(\xi_{t+1} \mid \mathcal{F}_t)$ is replaced by the unconditional expectation $\mathbb{E}(\xi_{t+1})$ by the assumption that $\xi_{t+1}$ is independent of $\mathcal{F}_t = \sigma(S_1, \ldots, S_t) = \sigma(\xi_1, \ldots, \xi_t)$.

Example. We now construct one of the most important examples of a martingale. Let $X$ be an integrable random variable, and let
\[ M_t = \mathbb{E}(X \mid \mathcal{F}_t). \]
Then $M = (M_t)_{t \ge 0}$ is a martingale. Integrability follows from the theorem on the existence and uniqueness of conditional expectation. Indeed, note that by Jensen's inequality
\[ \mathbb{E}(|M_t|) = \mathbb{E}(|\mathbb{E}(X \mid \mathcal{F}_t)|) \le \mathbb{E}(\mathbb{E}(|X| \mid \mathcal{F}_t)) = \mathbb{E}(|X|). \]
Now, for every $0 \le s \le t$ we have
\[ \mathbb{E}(M_t \mid \mathcal{F}_s) = \mathbb{E}[\mathbb{E}(X \mid \mathcal{F}_t) \mid \mathcal{F}_s] = \mathbb{E}(X \mid \mathcal{F}_s) = M_s \]
by the tower property. Notice that this example also works in continuous time.

Sometimes we are given a process $(M_t)_{0 \le t \le T}$ where $T > 0$ is a fixed, non-random time horizon. To check that this process is a martingale, we need only check that
\[ M_t = \mathbb{E}(M_T \mid \mathcal{F}_t) \text{ for all } 0 \le t \le T, \]
because this corresponds to the construction above with $X = M_T$.

Example. This last example shows how to take one martingale and build another one. Let $M$ be a martingale and let $K$ be a bounded predictable process. Then the process $N$ defined by
\[ N_t = \sum_{s=1}^t K_s (M_s - M_{s-1}) \]
is a martingale. Indeed, by assumption, we have $\mathbb{E}(|M_t|) < \infty$ for all $t$ since $M$ is a martingale, and there exists a constant $C > 0$ such that $|K_t| \le C$ almost surely for all $t \ge 1$. Hence
\begin{align*}
\mathbb{E}(|N_t|) &\le \sum_{s=1}^t \mathbb{E}(|K_s| |M_s - M_{s-1}|) \\
&\le \sum_{s=1}^t C[\mathbb{E}(|M_s|) + \mathbb{E}(|M_{s-1}|)] < \infty.
\end{align*}
Using the predictability of $K$ and the slot property of conditional expectation, we have
\begin{align*}
\mathbb{E}(N_{t+1} - N_t \mid \mathcal{F}_t) &= \mathbb{E}(K_{t+1}(M_{t+1} - M_t) \mid \mathcal{F}_t) \\
&= K_{t+1} \mathbb{E}(M_{t+1} - M_t \mid \mathcal{F}_t) \\
&= 0
\end{align*}
and we're done.

Remark. The martingale $N$ above is often called a martingale transform or a discrete time stochastic integral. As we will see, it is one of the key building blocks for the continuous time theory to come.

3. Arbitrage and the first fundamental theorem

We are ready to rephrase the 1FTAP in discrete time. We put ourselves in the context of a market model with $n$-dimensional price process $P$. We begin with a definition, motivated by the heuristic analysis of the dual of a typical optimal investment problem.

Definition. A martingale deflator is an adapted process $Y$ such that $Y_t > 0$ for all $t \ge 0$ almost surely, and such that the $n$-dimensional process $PY = (P_t Y_t)_{t \ge 0}$ is a martingale.

Remark. A martingale deflator is also known as a state price density, a stochastic discount factor or a pricing kernel.

Theorem (First fundamental theorem of asset pricing). A market model has no arbitrage if and only if there exists a martingale deflator.

It turns out that to prove this theorem, even in the easy direction, we need a little more technology. With that introduction, we begin our study of local martingales. First we start with a definition.

Definition. A stopping time for a filtration $(\mathcal{F}_t)_{t \in \mathbb{T}}$ is a random variable $\tau$ taking values in $\mathbb{T} \cup \{\infty\}$ such that the event $\{\tau \le t\}$ is $\mathcal{F}_t$-measurable for all $t \in \mathbb{T}$.

Example. Obviously, non-random times are stopping times. That is, if τ = t0 for some fixed t0 ≥ 0, then {τ ≤ t} = Ω if t0 ≤ t and ∅ otherwise. Example. Here is a typical example of a stopping time. Let (Yt )t≥0 be a discrete-time adapted process and let A be a Borel set. Then the random variable τ = inf{t ≥ 0 : Yt ∈ A} (with the usual convention that inf ∅ = +∞) corresponding to the first time the process enters the set A is a stopping time. Indeed, {τ ≤ t} =

t [

{Ys ∈ A}

s=0

is Ft -measurable because each {Ys ∈ A} is Fs -measurable by the adaptedness of Y , and Fs ⊆ Ft by the definition of filtration. Stopping times can be used to stop processes. Definition. For an adapted process X (in discrete or continuous4 time) and a stopping time τ , the process X τ defined by Xtτ = Xt∧τ is said to be X stopped at τ . Stopping times interact well martingales: stopped martingales are still martingales. Proposition. Let X be a discrete-time martingale and let τ be a stopping time. Then X is a martingale. τ

Remark. A version of this result also holds for continuous-time martingales with continuous sample paths.

Proof. Note that
$$X^\tau_t = X_0 + \sum_{s=1}^{t} 1_{\{s \le \tau\}} (X_s - X_{s-1}).$$
Since the event $\{t \le \tau\} = \{\tau \le t-1\}^c$ is $\mathcal{F}_{t-1}$-measurable by the definition of stopping time, the process $K_t = 1_{\{t \le \tau\}}$ is predictable. Since $X^\tau$ is the martingale transform of the bounded predictable process $K$ with respect to the martingale $X$, it is a martingale. □

The above result says that the martingale property is stable under stopping. We use this property as motivation for the following definition.

Definition. A local martingale is an adapted process $X = (X_t)_{t \ge 0}$, in either discrete or continuous time, such that there exists an increasing sequence of stopping times $(\tau_N)$ with $\tau_N \uparrow \infty$ such that the stopped process $X^{\tau_N}$ is a martingale for each $N$.

Remark. Note that martingales are local martingales. Indeed, given a martingale $X$ and any sequence of stopping times $\tau_N \uparrow \infty$, the stopped process $X^{\tau_N}$ is a martingale.

Footnote 4. If time is continuous, we also need the extra technical assumption that $X$ is progressively measurable in order that the map $\omega \mapsto X_{\tau(\omega)}(\omega)$ is measurable. Fortunately, it is sufficient to assume that sample paths of $X$ are continuous, which will be enough for this course.
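The stopping result can be checked by brute force on a small example. The following is a minimal sketch, assuming a symmetric simple random walk started at 0 and a first hitting time of a set; the function names and the choice of target set are illustrative and not part of the notes. It enumerates all paths exactly and computes $E(X_{t \wedge \tau})$, which should equal $X_0 = 0$ for every $t$. Constant expectation is only a necessary condition for the martingale property, so this is a sanity check rather than a proof.

```python
from fractions import Fraction
from itertools import product


def first_hitting_time(path, target_set):
    """Return the first index t with path[t] in target_set, or None for +infinity."""
    for t, value in enumerate(path):
        if value in target_set:
            return t
    return None


def stopped_value(path, tau, t):
    """Value of the stopped process X_{t ∧ tau} along one path."""
    if tau is None or t <= tau:
        return path[t]
    return path[tau]


def expected_stopped_walk(horizon, target_set):
    """E(X_{t ∧ tau}) for t = 0..horizon, by exact enumeration of all 2^horizon paths."""
    totals = [Fraction(0)] * (horizon + 1)
    weight = Fraction(1, 2 ** horizon)  # each path of a symmetric walk is equally likely
    for steps in product((-1, 1), repeat=horizon):
        path = [0]
        for step in steps:
            path.append(path[-1] + step)
        tau = first_hitting_time(path, target_set)
        for t in range(horizon + 1):
            totals[t] += weight * stopped_value(path, tau, t)
    return totals


if __name__ == "__main__":
    print(expected_stopped_walk(horizon=6, target_set={2, -3}))
```

Note that `first_hitting_time` only inspects the path up to the returned index, which mirrors the fact that $\{\tau \le t\}$ is decided by $Y_0, \ldots, Y_t$.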

Remark. Note that the local martingale property is also stable under stopping. Indeed, let $X$ be a local martingale and $\tau$ a stopping time. Then by definition, there exists a sequence of stopping times $\sigma_N \uparrow \infty$ such that $X^{\sigma_N}$ is a martingale. Hence $(X^{\sigma_N})^\tau = X^{\sigma_N \wedge \tau}$ is again a martingale, since stopped martingales are martingales and $\sigma_N \wedge \tau$ is a stopping time. But note that $X^{\sigma_N \wedge \tau} = (X^\tau)^{\sigma_N}$, implying that the sequence of stopping times $\sigma_N \uparrow \infty$ is such that $(X^\tau)^{\sigma_N}$ is a martingale. This means $X^\tau$ is a local martingale.

Theorem. Suppose $X$ is a discrete-time local martingale and $K$ is a predictable process. Let
$$M_t = M_0 + \sum_{s=1}^{t} K_s (X_s - X_{s-1})$$
for $t \ge 1$, where $M_0$ is a constant. Then $M$ is a local martingale.

Remark. This is the martingale transform as before, but now we do not insist that $K$ is bounded or that $X$ is a true martingale. As a consequence, we cannot assert that $M$ is a true martingale, merely a local martingale. The idea is that by localising, we can study the algebraic and measurability structure of the martingale transform without worrying about integrability issues.

Proof. Since $X$ is a local martingale by assumption, there exists a sequence of stopping times $(\tau_n)_n$ with $\tau_n \uparrow \infty$ a.s. such that $X^{\tau_n}$ is a martingale. Let $u_n = \inf\{t \ge 0 : |K_{t+1}| > n\}$ with the convention $\inf \emptyset = +\infty$. Note that since $K$ is predictable we have
$$\{u_n \le t-1\}^c = \{u_n \ge t\} = \{|K_s| \le n \text{ for all } 1 \le s \le t\} \in \mathcal{F}_{t-1}$$
and hence that $u_n$ is a stopping time with $u_n \uparrow \infty$. Finally, let $v_n = \tau_n \wedge u_n$. Note $v_n \uparrow \infty$ and $v_n$ is a stopping time since $\{v_n \le t\} = \{\tau_n \le t\} \cup \{u_n \le t\}$. Now $X^{v_n} = (X^{\tau_n})^{u_n}$ is a stopped martingale, and hence a martingale. Also $(K_t 1_{\{t \le v_n\}})_{t \ge 1}$ is a predictable process, bounded by $n$. Writing
$$M^{v_n}_t = M_0 + \sum_{s=1}^{t} K_s 1_{\{s \le v_n\}} (X^{v_n}_s - X^{v_n}_{s-1})$$
we see that the stopped process $M^{v_n}$ is the martingale transform of a bounded predictable process with respect to the martingale $X^{v_n}$, and hence is a martingale. □

The next theorem gives a sufficient condition for a local martingale to be a true martingale.

Theorem. Let $X$ be a local martingale in either discrete or continuous time. Let $Y$ be a process such that $|X_s| \le Y_t$ almost surely for all $0 \le s \le t$. If $E(Y_t) < \infty$ for all $t \ge 0$, then $X$ is a true martingale.

Proof. Let $(\tau_N)_N$ be a localising sequence of stopping times for $X$. Note that $X_{t \wedge \tau_N} \to X_t$ a.s. since $\tau_N \uparrow \infty$. Furthermore, by assumption $|X_{t \wedge \tau_N}| \le Y_t$, which is integrable, so we may apply the conditional version of the dominated convergence theorem to conclude
$$E(X_t \mid \mathcal{F}_s) = E(\lim_N X_{t \wedge \tau_N} \mid \mathcal{F}_s) = \lim_N E(X_{t \wedge \tau_N} \mid \mathcal{F}_s) = \lim_N X_{s \wedge \tau_N} = X_s$$
for $0 \le s \le t$, where we have used the fact that the stopped process $(X_{t \wedge \tau_N})_{t \ge 0}$ is a martingale. □

The following corollary is useful:

Corollary. Suppose $X$ is a DISCRETE-TIME local martingale such that $E(|X_t|) < \infty$ for all $t \ge 0$. Then $X$ is a true martingale.

Proof. Let $Y_t = |X_0| + \ldots + |X_t|$. The process $Y$ is integrable by assumption and $|X_s| \le Y_t$ for all $0 \le s \le t$. The conclusion follows from the previous theorem. □

In the absence of integrability, the next best property is non-negativity. First we need some definitions.

Definition. A supermartingale relative to a filtration $(\mathcal{F}_t)_{t \ge 0}$ is an adapted stochastic process $(U_t)_{t \ge 0}$ with the following properties:
• $E(|U_t|) < \infty$ for all $t \ge 0$
• $E(U_t \mid \mathcal{F}_s) \le U_s$ for all $0 \le s \le t$.
A submartingale is an adapted process $(V_t)_{t \ge 0}$ with the following properties:
• $E(|V_t|) < \infty$ for all $t \ge 0$
• $E(V_t \mid \mathcal{F}_s) \ge V_s$ for all $0 \le s \le t$.

Remark. Hence a supermartingale decreases on average, while a submartingale increases on average. A martingale is a stochastic process that is both a supermartingale and a submartingale. As in the case of the definition of martingale, to show that an adapted, integrable process $U$ is a supermartingale in discrete time, it is enough to show that $E(U_{t+1} \mid \mathcal{F}_t) \le U_t$ for all $t \ge 0$.

Theorem. Suppose $X$ is a local martingale in either continuous or discrete time. If $X_t \ge 0$ for all $t \ge 0$, then $X$ is a supermartingale.

Proof. Let $(\tau_N)_N$ be a localising sequence for $X$. First we show that $X_t$ is integrable for each $t \ge 0$. Fatou's lemma yields
$$E(|X_t|) = E(X_t) = E(\lim_N X_{t \wedge \tau_N}) \le \liminf_N E(X_{t \wedge \tau_N}) = X_0 < \infty.$$

Now that we have established integrability, we can discuss conditional expectations. The conditional version of Fatou's lemma yields
$$E(X_t \mid \mathcal{F}_s) = E(\lim_N X_{t \wedge \tau_N} \mid \mathcal{F}_s) \le \liminf_N E(X_{t \wedge \tau_N} \mid \mathcal{F}_s) = \liminf_N X_{s \wedge \tau_N} = X_s$$
for $0 \le s \le t$, as claimed.



As before, discrete-time local martingales are particularly nice:

Corollary. If $X$ is a DISCRETE-TIME local martingale such that $X_t \ge 0$ a.s. for all $t \ge 0$, then $X$ is a martingale.

Proof. By the above theorem, we have that $E(|X_t|) = E(X_t) \le X_0 < \infty$. Since $X$ is integrable, the previous corollary implies $X$ is a martingale. □

Theorem. Suppose that
$$M_t = M_0 + \sum_{s=1}^{t} K_s (X_s - X_{s-1})$$
where $K$ is predictable, $X$ is a martingale and $M_0$ is a constant. If $M_T \ge 0$ a.s. for some non-random $T > 0$, then $(M_t)_{0 \le t \le T}$ is a true martingale.

Proof. Just as before, let $\tau_N = \inf\{t \ge 0 : |K_{t+1}| > N\}$. Note $M_s 1_{\{t \le \tau_N\}}$ is integrable for all $0 \le s \le t$, since $X$ is integrable by definition of martingale, and $K_s$ is bounded on $\{t \le \tau_N\}$. Hence we have
$$0 \le E[M_T 1_{\{T \le \tau_N\}} \mid \mathcal{F}_{T-1}]$$
$$= E[M_{T-1} 1_{\{T \le \tau_N\}} + K_T 1_{\{T \le \tau_N\}} (X_T - X_{T-1}) \mid \mathcal{F}_{T-1}]$$
$$= M_{T-1} 1_{\{T \le \tau_N\}} + K_T 1_{\{T \le \tau_N\}} E[X_T - X_{T-1} \mid \mathcal{F}_{T-1}]$$
$$= M_{T-1} 1_{\{T \le \tau_N\}}.$$
Taking $N \to \infty$ shows $M_{T-1} \ge 0$ a.s., and induction shows that $M_t \ge 0$ for all $0 \le t \le T$. Therefore $(M_t)_{0 \le t \le T}$ is a non-negative local martingale in discrete time and hence a true martingale. □

Proof of the 1FTAP, easier direction. First we suppose that there is a martingale deflator $Y$. Let $H$ be an $n$-dimensional predictable process and let $c_0 = -H_1 \cdot P_0$ and $c_t = (H_t - H_{t+1}) \cdot P_t$ for $t \ge 1$, and suppose $c_t \ge 0$ almost surely for all $t \ge 0$. Finally, suppose there is some non-random $T > 0$ such that $H_{T+1} \cdot P_T = 0$, or equivalently, $c_T = H_T \cdot P_T$. To show that $H$ is not an arbitrage, we must show that $c_t = 0$ almost surely for all $0 \le t \le T$. To this end, let
$$M_t = H_{t+1} \cdot P_t Y_t + \sum_{s=0}^{t} c_s Y_s.$$
Note that
$$M_T = \sum_{s=0}^{T} c_s Y_s.$$
Since $Y_s > 0$ for all $s$, we need only show that $M_T = 0$ almost surely. Since $M_T \ge 0$ almost surely, we need only show $E(M_T) = 0$ by the pigeon-hole principle (a non-negative random variable with zero mean vanishes almost surely). Now, we rewrite $M$ as
$$M_t = \sum_{s=1}^{t} H_s \cdot (P_s Y_s - P_{s-1} Y_{s-1}).$$
Note $M$ is the martingale transform of the predictable process $H$ with respect to the martingale $PY$. Hence $M$ is a local martingale. But since $M_T \ge 0$ a.s. we can conclude that $M$ is a true martingale. Therefore, we have $E(M_T) = M_0 = 0$, as desired.



In the proof above, the mathematical complication comes from the fact that the predictable process $H$ may be unbounded in general. This is why it was necessary to study local martingales. One way to simplify the proof is to assume the sample space $\Omega$ is finite. Indeed, if $\Omega$ is finite then $H$ can only take a finite number of values and, in particular, $H$ would be uniformly bounded.

One might argue that assuming the sample space is finite is not such a problem. Indeed, our cartoon model of the financial market already ignores plenty of other complications of reality. In particular, prices really move on a discrete grid (because tick sizes are positive), and one could assume that prices are bounded, say, by £$10^{100}$, so a large finite sample space might be enough for our modelling needs.

There are a couple of reasons why we will strive for results that hold on general probability set-ups. Firstly, many popular models are based on random variables with continuous distributions, such as normal random variables. It would be a shame if our theory could not handle such models. Secondly, often it really is possible to prove general results, and since these notes are aimed at a mathematical audience, we are trying to state and prove results with the minimum of assumptions. The downside, of course, is extra technical work.

4. Elements of the proof of the harder direction of 1FTAP

We have already seen the one-period case. The full multi-period proof is a little more difficult because of some technicalities involving measurability. We begin with two propositions that show that two of the existential-type results we needed in the one-period proof have measurable versions.

Proposition. Let $f : \mathbb{R}^n \times \Omega \to \mathbb{R}$ be such that $f(x, \cdot)$ is measurable for all $x$, and that $f(\cdot, \omega)$ is continuous and has a unique minimiser $X^*(\omega)$ for each $\omega$. Then $X^*$ is measurable.

Proof. Let $g : \mathbb{R}^n \to \mathbb{R}$ be continuous and suppose that it has a unique minimiser $x^* \in \mathbb{R}^n$. This means that for all $q \ne x^*$, we have $g(x^*) < g(q)$. Let $Q \subset \mathbb{R}^n$ be countable and dense; for instance, let $Q$ be the set of points with rational coordinates. By the continuity of $g$ and the density of $Q$, we have that for every $q \ne x^*$, there exists a $p \in Q$ such that $g(p) < g(q)$.

Let $A \subseteq \mathbb{R}^n$ be a closed set of the form $[a_1, b_1] \times \cdots \times [a_n, b_n]$. If $x^* \in A$, then for any $q \in A^c \cap Q$, there exists a $p \in A \cap Q$ such that $g(p) < g(q)$, since $q \ne x^*$ and $A \cap Q$ is dense in $A$. Conversely, suppose that for any $q \in A^c \cap Q$, there exists a $p \in A \cap Q$ such that $g(p) < g(q)$. This means
$$\inf_{x \in A \cap Q} g(x) \le \inf_{x \in A^c \cap Q} g(x).$$
By the continuity of $g$ the above inequality implies
$$\inf_{x \in A} g(x) = \inf_{x \in \mathbb{R}^n} g(x) = g(x^*)$$
and in particular, we have $x^* \in A$ since $A$ is closed.

We have proven that for any closed rectangle $A \subset \mathbb{R}^n$ we have
$$\{\omega : X^*(\omega) \in A\} = \bigcap_{q \in A^c \cap Q} \bigcup_{p \in A \cap Q} \{\omega : f(p, \omega) < f(q, \omega)\}$$

where $Q$ is a countable dense subset of $\mathbb{R}^n$. Since the Borel sigma-field is generated by such rectangles, this implies the measurability of $X^*$. □

We also need a useful measurable version of the Bolzano–Weierstrass theorem.

Proposition. Let $(\xi_i)_{i \ge 1}$ be a sequence of measurable functions $\xi_i : \Omega \to \mathbb{R}^n$ such that $\sup_i \|\xi_i(\omega)\| < \infty$ for all $\omega \in \Omega$. Then there exists an increasing sequence of integer-valued measurable functions $I_j$ and an $\mathbb{R}^n$-valued measurable function $\xi^*$ such that
$$\xi_{I_j(\omega)}(\omega) \to \xi^*(\omega) \text{ as } j \to \infty$$
for all $\omega \in \Omega$.

Proof. First we consider the $n = 1$ case. Let $\xi^*(\omega) = \limsup_i \xi_i(\omega)$. Note $\xi^*$ is finite-valued and measurable, and that for every $j > 0$ there exists an infinite number of $i$'s such that $\xi_i(\omega) \ge \xi^*(\omega) - 1/j$. Now let
$$I_j = \inf\{i \ge j : \xi_i \ge \xi^* - 1/j\}.$$
Since we have the representation of the event
$$\{I_j \le h\} = \bigcup_{i=j}^{h} \{\xi_i \ge \xi^* - 1/j\}$$
for each $h \ge 1$, the function $I_j$ is measurable and $\xi_{I_j} \to \xi^*$ as desired.

Now we prove the claim for any dimension $n \ge 1$ by induction. Suppose that the claim is true for dimension $n$. Let $(\xi_i)_i$ be a sequence of measurable functions valued in $\mathbb{R}^{n+1}$ such that $\sup_i \|\xi_i(\omega)\| < \infty$. Writing $\xi_i = (\zeta_i, \eta_i)$ where $\zeta_i$ takes values in $\mathbb{R}^n$ and $\eta_i$ takes values in $\mathbb{R}$, we have by assumption the existence of a measurable sequence $I_j$ and a measurable $\zeta^*$ such that $\zeta_{I_j} \to \zeta^*$.

Notice that $(\eta_{I_j(\omega)}(\omega))_j$ is bounded for each $\omega$, and hence by the $n = 1$ case, there exists an increasing measurable sequence $J_k$ and a measurable $\eta^*$ such that $\eta_{I_{J_k}} \to \eta^*$. In particular, $\xi_{I_{J_k}} \to (\zeta^*, \eta^*) = \xi^*$ as desired.



We are now ready to return to the proof of the 1FTAP.

Main ideas. For each $t \ge 1$, define $F_t : \mathbb{R}^n \times \Omega \to \mathbb{R}$ by
$$F_t(H, \omega) = e^{H \cdot P_{t-1}(\omega)} + E[e^{-H \cdot P_t} \hat\zeta_t \mid \mathcal{F}_{t-1}](\omega)$$
where $\hat\zeta_t$ is positive and $\mathcal{F}_t$-measurable, chosen for integrability. We will show that there exists an $\mathcal{F}_{t-1}$-measurable minimiser $H_t^*$. As before, once we have the existence of this minimiser, we can use $\nabla F_t(H_t^*) = 0$ to show that $E(Z_t P_t \mid \mathcal{F}_{t-1}) = P_{t-1}$ where
$$Z_t = \frac{e^{-H_t^* \cdot P_t} \hat\zeta_t}{e^{H_t^* \cdot P_{t-1}}}.$$
Hence, we can construct a martingale deflator by
$$Y_t = \prod_{s=1}^{t} Z_s.$$

We now show the main steps to prove that the $\mathcal{F}_{t-1}$-measurable minimiser $H_t^*$ exists. Fix $t \ge 1$ and drop it from the notation. Let
$$F_k(H, \omega) = F(H, \omega) + \|H\|^2 / k.$$
Now for fixed $\omega$, the function $F_k(\cdot, \omega)$ is smooth, strictly convex and $F_k(H, \omega) \to \infty$ as $\|H\| \to \infty$. In particular, there exists a unique minimiser $H_k(\omega)$, and by the first proposition $H_k$ is $\mathcal{F}_{t-1}$-measurable.

We will make use of two observations. First, note that $H_k$ enjoys a certain non-degeneracy property. To describe it, let
$$\mathcal{U}(\omega) = \{u \in \mathbb{R}^n : u \cdot P_{t-1}(\omega) = 0, \ \mathbb{P}(u \cdot P_t = 0 \mid \mathcal{F}_{t-1})(\omega) = 1\}$$
and let $\mathcal{V}(\omega) = \mathcal{U}(\omega)^\perp$. Note that the minimiser $H_k(\omega)$ is in $\mathcal{V}(\omega)$ for each $\omega$ since $F(u + v) = F(v)$ and hence $F_k(u + v) \ge F_k(v)$ whenever $u \in \mathcal{U}$ and $v \in \mathcal{V}$, with equality only if $u = 0$.

Second, note that $F(H_k) \to \inf_h F(h)$ almost surely, since
$$\limsup_k F(H_k) \le \limsup_k F_k(H_k) \le \limsup_k F_k(h) = F(h)$$
for all $h \in \mathbb{R}^n$.

The rest of the proof is the same as before. Let
$$A = \{\sup_k \|H_k\| < \infty\}$$
be the $\mathcal{F}_{t-1}$-measurable set on which the sequence $(H_k(\omega))_k$ is bounded. Hence, by the second proposition we can extract a measurable subsequence along which $H_k$ converges on $A$ to an $\mathcal{F}_{t-1}$-measurable $H^*$. We then show that $\mathbb{P}(A^c) = 0$ by first assuming that $\mathbb{P}(A^c) > 0$ and then using the measurable Bolzano–Weierstrass trick to find an arbitrage, contradicting the assumption that the market is free of arbitrage. □

5. Numéraires and equivalent martingale measures

Now let's return to our financial model.

Definition. A numéraire is a portfolio $\eta$ satisfying the self-financing condition $\eta_{t+1} \cdot P_t = \eta_t \cdot P_t$ almost surely for all $t \ge 1$ (i.e. no intermediate consumption is allowed) such that the portfolio value
$$N_t = \eta_t \cdot P_t = \eta_{t+1} \cdot P_t$$
is strictly positive for all $t \ge 0$.

Proposition. A market has an arbitrage if and only if there exists a terminal consumption arbitrage: a predictable process $(K_t)_{t \ge 1}$ such that $K_1 \cdot P_0 = 0$, $K_{t+1} \cdot P_t = K_t \cdot P_t$ almost surely for all $1 \le t \le T - 1$ and $K_{T+1} \cdot P_T = 0$ almost surely for some non-random time $T > 0$, where $K_T \cdot P_T \ge 0$ a.s. and $\mathbb{P}(K_T \cdot P_T > 0) > 0$.

Proof. Let $\eta$ be a numéraire strategy with corresponding wealth process $\eta \cdot P = N$. Let $H$ be a candidate (investment-consumption) arbitrage such that $c_0 = -H_1 \cdot P_0 \ge 0$, $c_t = (H_t - H_{t+1}) \cdot P_t \ge 0$ almost surely for all $1 \le t \le T$ and $H_{T+1} \cdot P_T = 0$ almost surely, so that $c_T = H_T \cdot P_T$. Let $K$ be the strategy that consists of holding at time $t$ the portfolio $H_t$ but, instead of consuming the amount $c_t$, investing this money in the numéraire portfolio. In notation, $K$ is defined by
$$K_t = H_t + \eta_t \sum_{s=1}^{t-1} \frac{c_s}{N_s}.$$

Note that
$$(K_t - K_{t+1}) \cdot P_t = (H_t - H_{t+1}) \cdot P_t - \eta_{t+1} \cdot P_t \frac{c_t}{N_t} + \sum_{s=1}^{t-1} \frac{c_s}{N_s} (\eta_t - \eta_{t+1}) \cdot P_t = 0$$
so $K$ is a pure investment strategy by the assumption that $\eta$ is pure-investment. Finally
$$K_T \cdot P_T = N_T \sum_{s=1}^{T} \frac{c_s}{N_s} \ge 0.$$

In particular, $K$ is a terminal consumption arbitrage if and only if $H$ is an investment-consumption arbitrage. □

Definition. Let $P$ be a market model defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The measure $\mathbb{P}$ is called the objective (or historical or statistical) measure for the model. Suppose that there exists a numéraire portfolio $\eta$ and let $N = \eta \cdot P$. An equivalent martingale measure relative to this numéraire is any probability measure $\mathbb{Q}$ equivalent to $\mathbb{P}$ such that the discounted price process
$$\left( \frac{P_t}{N_t} \right)_{t \ge 0}$$
is a martingale under $\mathbb{Q}$.

Remark. In many accounts of arbitrage theory, the concept of an equivalent martingale measure has taken centre stage. I believe that its importance has been overstressed. In particular, it is a numéraire-dependent concept, unlike that of a martingale deflator. For instance, if there are two assets that are both numéraires (for example, from the point of view of a British trader, both the euro and the US dollar are numéraires) then one must be very careful to specify which one is the numéraire.

Theorem (First Fundamental Theorem of Asset Pricing when there is a numéraire). Suppose the market has a numéraire, and fix a non-random time horizon $T > 0$. The market model $(P_t)_{0 \le t \le T}$ has no arbitrage if and only if there exists an equivalent martingale measure relative to the numéraire.

Proof. We already know that there is no arbitrage if and only if there exists a martingale deflator. We now show that there is essentially a one-to-one correspondence between martingale deflators and equivalent martingale measures once a finite horizon $T > 0$ is specified.

Let $Y$ be a process such that $Y_T > 0$ $\mathbb{P}$-a.s. and such that $Y_T P_T$ is $\mathbb{P}$-integrable. Define a new measure $\mathbb{Q}$ by the density
$$\frac{d\mathbb{Q}}{d\mathbb{P}} = \frac{Y_T N_T}{E^{\mathbb{P}}(Y_T N_T)}.$$

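Our analysis turns on the Bayes formula
$$E^{\mathbb{Q}}\left( \frac{P_T}{N_T} \,\Big|\, \mathcal{F}_t \right) = \frac{E^{\mathbb{P}}(P_T Y_T \mid \mathcal{F}_t)}{E^{\mathbb{P}}(N_T Y_T \mid \mathcal{F}_t)}.$$
Suppose $Y$ is a martingale deflator. In this case $Y_T P_T$ is integrable and we have
$$E^{\mathbb{P}}(P_T Y_T \mid \mathcal{F}_t) = P_t Y_t$$
by definition. Also note that
$$Y_t N_t - Y_{t-1} N_{t-1} = \eta_t \cdot (Y_t P_t - Y_{t-1} P_{t-1})$$
and hence $YN$ is a local martingale. However, since $YN$ is non-negative, we know from the last section that $YN$ is a true martingale. In particular
$$E^{\mathbb{P}}(N_T Y_T \mid \mathcal{F}_t) = N_t Y_t.$$
By the Bayes formula we have
$$E^{\mathbb{Q}}\left( \frac{P_T}{N_T} \,\Big|\, \mathcal{F}_t \right) = \frac{P_t}{N_t}$$
and hence $P/N$ is a $\mathbb{Q}$-martingale, i.e. $\mathbb{Q}$ is an equivalent martingale measure.

Conversely, suppose $\mathbb{Q}$ is an equivalent martingale measure. Let
$$Z_t = E^{\mathbb{P}}\left( \frac{d\mathbb{Q}}{d\mathbb{P}} \,\Big|\, \mathcal{F}_t \right).$$
Note that $Z$ is a positive $\mathbb{P}$-martingale. Let $Y_t = Z_t / N_t$. Since the random variable $P_T / N_T$ is $\mathbb{Q}$-integrable by the definition of martingale, we can conclude that $P_T Y_T$ is $\mathbb{P}$-integrable. Furthermore, the process $Y$ is positive and satisfies
$$E^{\mathbb{P}}(N_T Y_T \mid \mathcal{F}_t) = E^{\mathbb{P}}(Z_T \mid \mathcal{F}_t) = Z_t = N_t Y_t.$$
Hence by the Bayes formula
$$E^{\mathbb{P}}(P_T Y_T \mid \mathcal{F}_t) = E^{\mathbb{Q}}\left( \frac{P_T}{N_T} \,\Big|\, \mathcal{F}_t \right) E^{\mathbb{P}}(N_T Y_T \mid \mathcal{F}_t) = \frac{P_t}{N_t} (N_t Y_t) = P_t Y_t$$
so that $PY$ is a $\mathbb{P}$-martingale and hence $Y$ is a martingale deflator.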
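The passage from a martingale deflator to an equivalent martingale measure can be illustrated on a toy example. The sketch below assumes a hypothetical one-period market with two states and two assets (cash and a stock); all numbers and function names are illustrative choices, not taken from the notes. It builds $\mathbb{Q}$ from the density $Y_T N_T / E^{\mathbb{P}}(Y_T N_T)$ with the stock as numéraire and checks that the discounted cash price has the right $\mathbb{Q}$-expectation.

```python
from fractions import Fraction as F

# Hypothetical one-period market with two states (up, down); all numbers are illustrative.
prob_P = [F(1, 3), F(2, 3)]                    # objective measure P
cash_0, cash_T = F(1), [F(1), F(1)]            # first asset: cash
stock_0, stock_T = F(4), [F(6), F(3)]          # second asset: a stock
deflator_0, deflator_T = F(1), [F(1), F(1)]    # Y with E^P(Y_T P_T) = Y_0 P_0 for both assets


def expectation(prob, values):
    """Expectation of a random variable on a finite sample space."""
    return sum(p * v for p, v in zip(prob, values))


def measure_from_deflator(prob, deflator_T, numeraire_T):
    """Probabilities of Q defined by dQ/dP = Y_T N_T / E^P(Y_T N_T)."""
    weights = [p * y * n for p, y, n in zip(prob, deflator_T, numeraire_T)]
    total = sum(weights)
    return [w / total for w in weights]


# Use the stock as numeraire, so N = stock price.
prob_Q = measure_from_deflator(prob_P, deflator_T, stock_T)
discounted_cash_T = [c / n for c, n in zip(cash_T, stock_T)]

if __name__ == "__main__":
    print(prob_Q)
    print(expectation(prob_Q, discounted_cash_T), cash_0 / stock_0)
```

In this example $\mathbb{P}$ is already a martingale measure relative to cash, yet the measure relative to the stock numéraire is different, which illustrates the numéraire dependence stressed in the remark above.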



Remark (Not lectured). Notice that the version of the fundamental theorem stated above is for a finite-horizon model, as opposed to the version presented earlier. Here is an example showing that there may be no arbitrage and yet no equivalent martingale measure over the infinite horizon.

Let $\xi_1, \xi_2, \ldots$ be independent random variables with
$$\mathbb{P}(\xi_i = 1) = p, \qquad \mathbb{P}(\xi_i = -1) = q = 1 - p,$$
and let $S_t = \xi_1 + \ldots + \xi_t$ be a simple random walk, where we assume that it is not symmetric, that is, $p \ne q$. Define a two-asset market model with respect to the natural filtration $\mathcal{F}_t = \sigma(\xi_1, \ldots, \xi_t)$ by $P_t = (1, S_t)$. In particular, there is a numéraire with constant price $N_t = 1$, which can be interpreted as cash.

First let us compute all martingale deflators for the model. Fix $t$ and $\xi_1, \ldots, \xi_t$ and let $Z_u = Y_{t+1}/Y_t$ if $\xi_{t+1} = 1$, and $Z_d = Y_{t+1}/Y_t$ if $\xi_{t+1} = -1$. Since $PY$ is a martingale, we have
$$Y_t Z_u p + Y_t Z_d q = Y_t$$
$$(S_t + 1) Y_t Z_u p + (S_t - 1) Y_t Z_d q = S_t Y_t$$
so that $Z_u = 1/(2p)$ and $Z_d = 1/(2q)$. Hence, we have shown that all martingale deflators satisfy
$$Y_{t+1} = Y_t (4pq)^{-1/2} (q/p)^{\xi_{t+1}/2}$$
and hence
$$Y_t = Y_0 (4pq)^{-t/2} (q/p)^{S_t/2}.$$

Now fix a horizon $T > 0$ and let $\mathbb{P}_T$ be the restriction of $\mathbb{P}$ to $\mathcal{F}_T$. Let $\mathbb{Q}_T$ be the equivalent measure on $\mathcal{F}_T$ with density
$$\frac{d\mathbb{Q}_T}{d\mathbb{P}_T} = Y_T / Y_0.$$
By the above discussion, $\mathbb{Q}_T$ is the equivalent martingale measure for the finite-horizon model $(P_t)_{0 \le t \le T}$. It is an easy computation to verify that under the measure $\mathbb{Q}_T$, the random variables $\xi_1, \ldots, \xi_T$ are independent with
$$\mathbb{Q}_T(\xi_i = 1) = \frac{1}{2} = \mathbb{Q}_T(\xi_i = -1).$$

Let us consider the measure $\mathbb{Q}$ on $\mathcal{F}$ with the property that the random variables $\xi_1, \xi_2, \ldots$ are independent with
$$\mathbb{Q}(\xi_i = 1) = \frac{1}{2} = \mathbb{Q}(\xi_i = -1),$$
so that $\mathbb{Q}_T$ is the restriction of $\mathbb{Q}$ to $\mathcal{F}_T$. Is this measure $\mathbb{Q}$ an equivalent martingale measure for the infinite-horizon model $(P_t)_{t \ge 0}$? While it is true that $P$ is a $\mathbb{Q}$-martingale, it is not true that $\mathbb{P}$ and $\mathbb{Q}$ are equivalent. Indeed,
$$\mathbb{P}\left( \frac{S_t}{t} \to p - q \right) = 1, \quad \text{but} \quad \mathbb{Q}\left( \frac{S_t}{t} \to 0 \right) = 1.$$
Since we have assumed $p \ne q$, we see that these measures are inequivalent! Indeed, note that
$$\mathbb{P}(Y_t \to 0) = 1, \quad \text{but} \quad \mathbb{Q}(Y_t \to \infty) = 1.$$
Hence, the market has no arbitrage, yet there does not exist an equivalent martingale measure over the infinite horizon.
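The computations in this example are easy to check numerically. The following sketch assumes the illustrative value $p = 3/5$ (any $p \ne 1/2$ in $(0,1)$ would do) and hypothetical function names. It verifies the two one-step martingale equations exactly, and computes the expected one-step change of $\log Y$ under $\mathbb{P}$ and under $\mathbb{Q}$; by the law of large numbers the sign of this drift determines whether $Y_t \to 0$ or $Y_t \to \infty$.

```python
import math
from fractions import Fraction


def one_step_ratios(p):
    """Return (Z_u, Z_d) = (1/(2p), 1/(2q)) for up-probability p and q = 1 - p."""
    q = 1 - p
    return Fraction(1) / (2 * p), Fraction(1) / (2 * q)


def satisfies_martingale_equations(p, s):
    """Check the cash and stock equations for P Y, with current stock level S_t = s."""
    q = 1 - p
    z_up, z_down = one_step_ratios(p)
    cash_ok = z_up * p + z_down * q == 1
    stock_ok = (s + 1) * z_up * p + (s - 1) * z_down * q == s
    return cash_ok and stock_ok


def log_drift(p, up_probability):
    """Expected one-step change of log Y when the up-move has the given probability."""
    z_up, z_down = one_step_ratios(p)
    u = float(up_probability)
    return u * math.log(float(z_up)) + (1 - u) * math.log(float(z_down))


def closed_form_ratio(p, t, s):
    """Y_t / Y_0 = (4pq)^(-t/2) (q/p)^(S_t/2), evaluated in floating point."""
    p, q = float(p), float(1 - p)
    return (4 * p * q) ** (-t / 2) * (q / p) ** (s / 2)


if __name__ == "__main__":
    p = Fraction(3, 5)  # illustrative asymmetric walk
    print(satisfies_martingale_equations(p, s=7))
    print(log_drift(p, up_probability=p))               # drift of log Y under P
    print(log_drift(p, up_probability=Fraction(1, 2)))  # drift of log Y under Q
```

The drift under $\mathbb{P}$ is minus the relative entropy of the $(p, q)$ coin with respect to the fair coin, which is why it is strictly negative whenever $p \ne q$.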

6. Contingent claims

A contingent claim is any cash payment where the size of the payment is contingent on the prices of other assets or any other variable (for instance, the weather). There are two major types of contingent claims that we will study in these notes: European and American.

European: specified by a time horizon $T > 0$ and an $\mathcal{F}_T$-measurable random variable $\xi_T$ modelling the payout at the maturity date $T$.

American: specified by a time horizon $T > 0$ and an adapted process $(\xi_t)_{0 \le t \le T}$ where $\xi_t$ models the payout of the claim if the owner of the claim chooses to exercise at time $t$.

Example. An American call option gives the owner of the option the right, but not the obligation, to buy a given stock at any time $t \in [0, T]$ at some fixed strike price $K$. By the argument above, the payout of the call option exercised at time $t$ is given by $\xi_t = (S_t - K)^+$.

6.1. European claims and the second fundamental theorem of asset pricing.

Imagine that you find yourself in a market with prices $P = (P_t)_{t \ge 0}$, and you would like to sell a European contingent claim with payout $\xi_T$. What price should you ask at time 0 to offset this liability at time $T$? One criterion would be to ask for enough money to offset the cost of hedging away the liability by trading in the market.

Definition. An investment-consumption strategy $H$ super-replicates a European contingent claim with payout $\xi_T$ if $H_T \cdot P_T \ge \xi_T$ a.s.

The next theorem says that in an arbitrage-free market we can compute the amount of initial capital needed to super-replicate a given claim:

Theorem. Suppose the market is free of arbitrage, and suppose the process $(\xi_t)_{t \ge 0}$ has the property that $\xi Y$ is a supermartingale for each martingale deflator $Y$ such that $\xi Y$ is integrable. Then there exists an investment-consumption strategy $H$ such that $H_1 \cdot P_0 \le \xi_0$ and
$$H_{t+1} \cdot P_t \le \xi_t \le H_t \cdot P_t \text{ a.s. for all } t \ge 1.$$
In particular, the initial capital needed to super-replicate the claim with payout $\xi_T$ is at most $\xi_0$.

Remark. Given an $\mathcal{F}_T$-measurable random variable $\xi_T$, we can find the minimal process $(\xi_t)_{0 \le t \le T}$ such that $\xi Y$ is a supermartingale for each martingale deflator $Y$ as follows: let
$$\xi_t = \operatorname{ess\,sup} \left\{ \frac{1}{Y_t} E(\xi_T Y_T \mid \mathcal{F}_t) : Y \text{ a martingale deflator such that } \xi_T Y_T \text{ is integrable} \right\}.$$
The notation ess sup denotes the essential supremum and will be explained below. In the meantime, assuming that $(\xi_t)_{0 \le t \le T}$ is finite-valued, we can see that $\xi Y$ is a supermartingale for each $Y$ by first noting that we can express a martingale deflator as a product $Y_t = Y_0 Z_1 \cdots Z_t$ where $E(Z_t P_t \mid \mathcal{F}_{t-1}) = P_{t-1}$.

Hence we can apply the dynamic programming principle to assert that
$$\xi_t = \operatorname{ess\,sup} \{ E(Z_{t+1} \cdots Z_T \xi_T \mid \mathcal{F}_t) : Z_{t+1}, \ldots, Z_T \} = \operatorname{ess\,sup} \{ E(Z_{t+1} \xi_{t+1} \mid \mathcal{F}_t) : Z_{t+1} \}.$$

Remark. We now explain the notion of essential supremum used above. Let $\mathcal{X}$ be a collection of random variables, and let
$$\bar{X}(\omega) = \sup\{X(\omega) : X \in \mathcal{X}\}$$
for all $\omega \in \Omega$. Note that
• $\bar{X} \ge X$ everywhere for all $X \in \mathcal{X}$.
• If $Y \ge X$ everywhere for all $X \in \mathcal{X}$ then $Y \ge \bar{X}$ everywhere.
But there is a problem: if the collection $\mathcal{X}$ is uncountable, then $\bar{X}$ may not be a random variable, i.e. a measurable function on $\Omega$. (For example, let $\Omega = [0, 1]$ with $\mathbb{P}$ Lebesgue measure. Let $A$ be a subset of $[0, 1]$, and let $\mathcal{Z} = \{1_{\{t\}}(\omega) : t \in A\}$. Then
$$\bar{Z}(\omega) = \sup\{Z(\omega) : Z \in \mathcal{Z}\} = \sup\{1_{\{t\}}(\omega) : t \in A\} = 1_A(\omega).$$
Then $\bar{Z}$ is a random variable if and only if $A \subset [0, 1]$ is measurable.) But we have a measure-theoretic workaround:

Theorem. Let $\mathcal{X}$ be a collection of random variables. There exists a random variable $\hat{X}$ which is valued in $\mathbb{R} \cup \{+\infty\}$ and such that
• $\hat{X} \ge X$ almost surely for all $X \in \mathcal{X}$.
• If $Y \ge X$ almost surely for all $X \in \mathcal{X}$ then $Y \ge \hat{X}$ almost surely.
(The proof will be outlined in the second example sheet.)

Definition. We let $\hat{X} = \operatorname{ess\,sup} \mathcal{X}$, the essential supremum.

(Returning to the example, we see that $Z = 0$ almost surely for all $Z \in \mathcal{Z}$, so it follows that $\operatorname{ess\,sup} \mathcal{Z} = 0$.)

With this motivation, we introduce an important class of claims that can be perfectly hedged:

Definition. A European contingent claim with payout $\xi_T$ is replicable or attainable iff there exists a pure investment strategy $H$ such that $H_T \cdot P_T = \xi_T$ almost surely.

One of the reasons to single out attainable claims is that there is an unambiguous way to price them according to the no-arbitrage principle:

Theorem. Suppose that the market model with $n$-dimensional price process $P$ has no arbitrage. Let $\xi_T$ be the payout of an attainable European contingent claim with maturity date $T > 0$, and let $H$ be the $n$-dimensional replicating strategy. Suppose the claim has price $\xi_t$ for $0 \le t \le T$. If the augmented market with $(n+1)$-dimensional price process $(P, \xi)$ has no arbitrage, then
$$\xi_t = H_t \cdot P_t \text{ almost surely for all } 0 \le t \le T.$$

Proof. Let $X = H \cdot P$. The idea is that if $X_t \ne \xi_t$ for some $t$, then there would be an arbitrage in the augmented market. To construct an arbitrage, wait until the first time that the price of the replicating portfolio differs from the price of the claim, and then buy the cheap one, sell the expensive one and pocket the difference.

In mathematical notation, fix a $T > 0$ and let
$$\tau = \inf\{0 \le t \le T : X_t \ne \xi_t\},$$
with the usual convention that $\inf \emptyset = +\infty$. Consider the $(n+1)$-dimensional investment strategy
$$\bar{H}_t = \operatorname{sign}(\xi_\tau - X_\tau) 1_{\{t > \tau\}} (H_t, -1)$$
and consumption
$$c_t = |\xi_\tau - X_\tau| 1_{\{t = \tau + 1\}}.$$
Let $\bar{X} = \bar{H} \cdot (P, \xi)$ and note that $\bar{X}_0 = \bar{X}_T = 0$. If the augmented market has no arbitrage, then $c_t = 0$ a.s. for all $t$, implying $\tau = \infty$ a.s. as claimed. □

A difficulty in using the no-arbitrage principle to price an attainable contingent claim is that it requires knowing the replicating strategy. The following theorem gives a formula for the no-arbitrage price of the claim which does not require knowledge of this strategy, just that it exists.

Theorem. Suppose that the market model with $n$-dimensional price process $P$ has no arbitrage, and let $\xi_T$ be the payout of a European contingent claim with maturity date $T > 0$. The claim is attainable if and only if there exists an $x \in \mathbb{R}$ such that
$$E(Y_T \xi_T) = Y_0 x$$
for all martingale deflators $Y$ such that $Y_T \xi_T$ is integrable.

Proof. ('only if' direction) Since the claim is attainable there exists a pure investment strategy $H$ such that $H_T \cdot P_T = \xi_T$ a.s. Note that $H \cdot PY$ is a local martingale from our calculation in the last chapter. And from a result in the example sheet, we see that the assumption that $Y_T \xi_T$ is integrable is sufficient to conclude that $H \cdot PY$ is a true martingale. In particular, we have
$$E(Y_T \xi_T) = E(H_T \cdot P_T Y_T) = x Y_0$$
for any $Y$, where $x = H_0 \cdot P_0$ is the initial cost of replication.

('if' direction) Define $\hat\xi$ by
$$\hat\xi_t = \operatorname{ess\,sup} \left\{ \frac{1}{Y_t} E(\xi_T Y_T \mid \mathcal{F}_t) : Y \text{ a martingale deflator} \right\}$$
and note that $\hat\xi Y$ is a supermartingale. Similarly, let
$$\check\xi_t = \operatorname{ess\,inf} \left\{ \frac{1}{Y_t} E(\xi_T Y_T \mid \mathcal{F}_t) : Y \text{ a martingale deflator} \right\}$$
and note that $\check\xi Y$ is a submartingale. Since $\hat\xi_T = \xi_T = \check\xi_T$ and $\hat\xi_0 = x = \check\xi_0$, we can conclude that $\hat\xi = \check\xi$. Letting $\xi = \hat\xi = \check\xi$ we have proven that $\xi Y$ is a martingale for all $Y$.

Now, by the super-replication theorem above, there exists an investment-consumption strategy $H$ such that $H_{t+1} \cdot P_t \le \xi_t \le H_t \cdot P_t$ almost surely for all $t \ge 1$ and $H_1 \cdot P_0 \le x$. Now fix one such $Y$ and let
$$M_t = (H_t \cdot P_t - \xi_t) Y_t + \sum_{s=1}^{t-1} (H_s - H_{s+1}) \cdot P_s Y_s$$
$$= H_1 \cdot P_0 Y_0 - \xi_t Y_t + \sum_{s=1}^{t} H_s \cdot (Y_s P_s - Y_{s-1} P_{s-1}).$$
In particular, note that $M$ is a local martingale by our usual calculations such that $M_t \ge 0$ for $t \ge 1$, and hence $M$ is a true martingale. However, $E(M_t) = (H_1 \cdot P_0 - x) Y_0 \le 0$, and hence $M_t = 0$ for all $t \ge 0$. The conclusion follows.
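The characterisation of attainable claims can be seen concretely in a small incomplete market. The sketch below assumes a hypothetical one-period trinomial model with cash (price 1) and a stock with $S_0 = 2$ and $S_1 \in \{3, 2, 1\}$, each state having objective probability $1/3$; the model, the parametrisation of the deflators by `a` and the function names are illustrative assumptions. In this model every martingale deflator with $Y_0 = 1$ has the form $Y_1 = (a, 3 - 2a, a)$ with $0 < a < 3/2$. The code checks that $E(Y_1 \xi_1)$ does not depend on $a$ for the attainable payout $5 + 2 S_1$, while it does for a call option.

```python
from fractions import Fraction

# Hypothetical one-period trinomial market; all numbers are illustrative.
STATE_PROB = [Fraction(1, 3)] * 3                  # objective probabilities
STOCK_0 = Fraction(2)
STOCK_1 = [Fraction(3), Fraction(2), Fraction(1)]  # stock price in each state


def deflator(a):
    """Martingale deflator Y_1 with Y_0 = 1, indexed by 0 < a < 3/2."""
    a = Fraction(a)
    return [a, 3 - 2 * a, a]


def is_martingale_deflator(y):
    """Check positivity and E(Y_1 P_1) = Y_0 P_0 for both cash and stock."""
    cash_ok = sum(p * v for p, v in zip(STATE_PROB, y)) == 1
    stock_ok = sum(p * v * s for p, v, s in zip(STATE_PROB, y, STOCK_1)) == STOCK_0
    return all(v > 0 for v in y) and cash_ok and stock_ok


def deflated_expectation(payoff, a):
    """E(Y_1 xi_1) for the deflator with parameter a."""
    return sum(p * y * x for p, y, x in zip(STATE_PROB, deflator(a), payoff))


attainable = [5 + 2 * s for s in STOCK_1]   # replicated by 5 units of cash and 2 shares
call = [max(s - 2, 0) for s in STOCK_1]     # call with strike 2, not attainable here

if __name__ == "__main__":
    for a in (Fraction(1, 2), Fraction(1), Fraction(5, 4)):
        print(a, deflated_expectation(attainable, a), deflated_expectation(call, a))
```

For the attainable payout the common value is $5 + 2 S_0 = 9$, the initial cost of the replicating portfolio, whereas the call gives $a/3$, which varies with the deflator.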



For the sake of comparison, consider the following result:

Theorem. Suppose that the market model with $n$-dimensional price process $P$ has no arbitrage. Let $\xi_T$ be the payout of a (not necessarily attainable) contingent claim with maturity date $T > 0$. Suppose the claim has price $\xi_t$ for $0 \le t \le T$ and that the augmented market with $(n+1)$-dimensional price process $(P, \xi)$ has no arbitrage. Then there exists a martingale deflator $Y$ of the original market such that
$$\xi_t = \frac{1}{Y_t} E(\xi_T Y_T \mid \mathcal{F}_t)$$
for all $0 \le t \le T$.

Proof. This is just the first fundamental theorem of asset pricing applied to the augmented market with prices $(P, \xi)$. □

Remark. The message is this: if a claim is attainable it can be priced with any martingale deflator. On the other hand, the most one can say for a general claim is that there exists some martingale deflator that prices the claim.

Since attainable claims have unique no-arbitrage prices, we single out the markets for which every claim is attainable:

Definition. A market is complete if and only if every European contingent claim is attainable. A market is incomplete otherwise.

We can characterise complete markets:

Theorem (Second Fundamental Theorem of Asset Pricing). An arbitrage-free market model is complete if and only if there exists a unique martingale deflator $Y$ such that $Y_0 = 1$.

The proof is really almost identical to the one-period case, and thus omitted.

In discrete-time models complete markets have even more (arguably too much) structure:

Theorem. If the market model $P$ with $n$ assets is complete, then for each $t \ge 0$ the probability space $\Omega$ can be partitioned into no more than $n^t$ $\mathcal{F}_t$-measurable events of positive probability, and in particular, the $n$-dimensional random vector $P_t$ takes values in a set of at most $n^t$ elements.

Proof. Let $B_1, \ldots, B_N$ be a maximal partition of $\Omega$ into disjoint $\mathcal{F}_{t-1}$-measurable sets of positive measure. If a random vector $H_t$ is $\mathcal{F}_{t-1}$-measurable, then it takes exactly one value on each of the $B_j$'s for a total of at most $N$ values $H_1, \ldots, H_N$. Hence
$$\{H \cdot P_t : H \text{ is } \mathcal{F}_{t-1}\text{-meas.}\} = \{H_1 \cdot P_t 1_{B_1} + \ldots + H_N \cdot P_t 1_{B_N} : H_1, \ldots, H_N \in \mathbb{R}^n\}$$
$$= \operatorname{span}\{P^i_t 1_{B_j} : 1 \le i \le n, \ 1 \le j \le N\}$$
and the dimension of the space above is at most $nN$. Since the market is complete, every $\mathcal{F}_t$-measurable payout lies in this space, so the argument above proves that there are at most $nN$ disjoint $\mathcal{F}_t$-measurable sets of positive measure. Induction completes the proof. □

6.2. Bonds and bank accounts.

Definition. A (risk-free zero-coupon) bond with maturity $T > 0$ (and unit face value) is a European contingent claim whose payout is $\xi_T = 1$ almost surely.

Proposition. Suppose the market contains a bond. If the market has no arbitrage, then the bond is a numéraire.

Proof. This is an exercise.



Definition. A risk-free asset is one whose price process is predictable. More generally, a risk-free portfolio is a predictable pure-investment strategy $\eta$ (that is, such that $(\eta_t - \eta_{t+1}) \cdot P_t = 0$ for all $t \ge 1$) and such that $\eta_t \cdot P_t$ is $\mathcal{F}_{t-1}$-measurable for all $t \ge 1$.

In general, there is no reason to assume that the market has a risk-free asset. However, if there are bonds, we can manufacture one:

Proposition. Suppose the arbitrage-free market model has bonds of all maturities. Then there exists a risk-free numéraire strategy.

Proof. Let $B_{t,T}$ be the time-$t$ price of a bond with maturity $T$. Note that $B_{T,T} = 1$ and that $B_{t,T} > 0$ almost surely for all $0 \le t \le T$ by no-arbitrage. Now define the bank account process $\beta$ by
$$\beta_t = \prod_{s=1}^{t} (1 + r_s)$$
where
$$r_t = \frac{1}{B_{t-1,t}} - 1.$$
Note that $\beta$ is predictable and strictly positive. Now let's write $P_t = (B_{t,1}, B_{t,2}, \ldots)$ and let $e_T = (0, \ldots, 0, 1, 0, \ldots)$ be the portfolio of holding exactly one bond of maturity $T$. Let $\eta_t = \beta_t e_t$. First note that $\beta_t = \eta_t \cdot P_t$ since $B_{t,t} = e_t \cdot P_t = 1$. Finally note that the predictable process $\eta$ is a self-financing pure-investment strategy since
$$(\eta_{t+1} - \eta_t) \cdot P_t = (\beta_{t+1} e_{t+1} - \beta_t e_t) \cdot P_t = \beta_t [(1 + r_{t+1}) e_{t+1} - e_t] \cdot P_t = 0,$$
where the last equality uses $(1 + r_{t+1}) B_{t,t+1} = 1 = B_{t,t}$, as desired.
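The construction of the bank account from one-period bond prices is a simple recursion. Here is a minimal sketch along one scenario, assuming the prices $B_{t-1,t}$ are supplied as a list of exact fractions; the function names and the sample prices are hypothetical. It also checks the self-financing identity $\beta_{t+1} B_{t,t+1} = \beta_t$ used in the proof: rolling the bank account into the next one-period bond costs exactly the current balance.

```python
from fractions import Fraction


def bank_account(one_period_bond_prices):
    """Return [beta_0, ..., beta_T] given prices B_{t-1,t} for t = 1..T along one scenario."""
    beta = [Fraction(1)]                      # empty product: beta_0 = 1
    for price in one_period_bond_prices:
        rate = 1 / price - 1                  # r_t = 1 / B_{t-1,t} - 1
        beta.append(beta[-1] * (1 + rate))    # beta_t = beta_{t-1} (1 + r_t)
    return beta


def is_self_financing(one_period_bond_prices):
    """Check beta_{t+1} * B_{t,t+1} == beta_t at every roll date."""
    beta = bank_account(one_period_bond_prices)
    return all(
        beta[t + 1] * price == beta[t]
        for t, price in enumerate(one_period_bond_prices)
    )


if __name__ == "__main__":
    prices = [Fraction(9, 10), Fraction(4, 5)]  # illustrative B_{0,1}, B_{1,2}
    print(bank_account(prices))
    print(is_self_financing(prices))
```

Because $B_{t-1,t}$ is known at time $t-1$, the value $\beta_t$ is also known at time $t-1$, which is the predictability claimed in the proof.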


Note that the above proposition automatically applies to complete markets.

Proposition. If a no-arbitrage market is complete, then there is a risk-free numéraire.

Now we explore two useful types of equivalent martingale measures.

Definition. An equivalent martingale measure with respect to a risk-less numéraire (bank account) is called a risk-neutral measure.

Definition. An equivalent martingale measure with respect to a bond of maturity $T > 0$ is called a $T$-forward measure.

These notions coincide for $T = 1$, but differ in general. However, there is a special case where they coincide for $T > 1$:

Proposition. If the bank account process is not random (or equivalently the spot interest rate process is not random) then the $T$-forward measure is risk-neutral.

Proof. We need to show that the bond price process is predictable. Let $(B_{t,T})_{0 \le t \le T}$ be the bond price and $(\beta_t)_{0 \le t \le T}$ the bank account. If $\mathbb{Q}$ is a risk-neutral measure, we have
$$\frac{B_{t,T}}{\beta_t} = E^{\mathbb{Q}}\left( \frac{1}{\beta_T} \,\Big|\, \mathcal{F}_t \right).$$
But if $(\beta_t)_{0 \le t \le T}$ is not random, then
$$B_{t,T} = \frac{\beta_t}{\beta_T}$$
and hence the bond price is not random, and in particular predictable. □

7. Super-replication of American claims

We now discuss American claims. Here, things are quite different. The canonical example of an American claim is the American put option: a contract which gives the buyer the right (but not the obligation) to sell the underlying stock at a fixed strike price $K > 0$ at any time between time 0 and a fixed maturity date $T$. Hence, the payout of the option is $(K - S_\tau)^+$ where $\tau \in \{0, \ldots, T\}$ is a time chosen by the holder of the put to exercise the option.

The payout of an American claim is specified by two ingredients:
• a maturity date $T > 0$,
• an adapted process $(\xi_t)_{0 \le t \le T}$.
For instance, in the case of an American put, we may take $\xi_t = (K - S_t)^+$. Unlike the European claim, the holder of an American claim can choose to exercise the option at any time $\tau$ before or at maturity. However, to rule out clairvoyance, we insist that $\tau$ is a stopping time.
Now, if an American claim matures at T > 0 and is specified by the payout process $(\xi_t)_{0\le t\le T}$, then the actual payout of the claim is modelled by the random variable $\xi_\tau$, where τ is any stopping time for the filtration taking values in {0, . . . , T}. We can think of the American claim then as a family, indexed by the stopping time τ, of European claims with payouts $\xi_\tau$.

To simplify matters, we make the following assumption in this subsection: the market model $P = (P_t)_{0\le t\le T}$ is complete.

Let $Y = (Y_t)_{0\le t\le T}$ be the unique martingale deflator such that $Y_0 = 1$. Intuitively, the seller of such a claim should at time 0 charge at least the amount
$$\sup_{\tau\le T} E(Y_\tau \xi_\tau)$$
to be sure that he can hedge the option, where the supremum is taken over the set of stopping times smaller than or equal to T. Indeed, this is the case.

Theorem. Suppose that the adapted process $(\xi_t)_{0\le t\le T}$ specifies the payout of an American claim maturing at T > 0. There exists a self-financing trading strategy H such that
• $X_t \ge \xi_t$ for all 0 ≤ t ≤ T,
• $X_{\tau^*} = \xi_{\tau^*}$ for some stopping time $\tau^*$, and
• $X_0 = \sup_{\tau\le T} E(Y_\tau \xi_\tau)$,
where $X_t = H_t\cdot P_t = H_{t+1}\cdot P_t$.

Remark. The strategy H dominates the payout of the American claim at all times, but is conservative in the sense that it exactly replicates the optimally exercised claim.

The rest of this subsection is dedicated to proving this theorem.

*****

We will need a result of general interest:

Theorem (Doob decomposition theorem). Let U be a discrete-time supermartingale. Then there is a unique decomposition
$$U_t = U_0 + M_t - A_t$$
where M is a martingale and A is a predictable non-decreasing process with $M_0 = A_0 = 0$.

Proof. Let $M_0 = 0 = A_0$ and define
$$M_{t+1} = M_t + U_{t+1} - E(U_{t+1}\mid\mathcal{F}_t), \qquad A_{t+1} = A_t + U_t - E(U_{t+1}\mid\mathcal{F}_t)$$
for t ≥ 0. Since U is assumed to be a supermartingale, and hence integrable, the processes M and A are integrable. It is straightforward to check that M is a martingale, and since U is a supermartingale, that A is non-decreasing. Also by induction, we see that $A_{t+1}$ is $\mathcal{F}_t$-measurable. Summing up,
$$M_t - A_t = M_0 - A_0 + \sum_{s=1}^{t}(M_s - M_{s-1} - A_s + A_{s-1}) = \sum_{s=1}^{t}(U_s - U_{s-1}) = U_t - U_0.$$
To show uniqueness, assume that $U_t = U_0 + M_t - A_t = U_0 + M'_t - A'_t$. Then $M - M' = A - A'$ is a predictable discrete-time martingale, that is, a constant. Since $M_0 = M'_0 = 0$, this constant is zero. □

Now we introduce the key concept in optimal stopping theory:

Definition. Let $(Z_t)_{0\le t\le T}$ be a given integrable adapted discrete-time process. Define an adapted process $(U_t)_{0\le t\le T}$ by the recursion
$$U_T = Z_T, \qquad U_t = \max\{Z_t,\ E(U_{t+1}\mid\mathcal{F}_t)\} \quad\text{for } 0\le t\le T-1.$$
The process $(U_t)_{0\le t\le T}$ is called the Snell envelope of $(Z_t)_{0\le t\le T}$.

Remark. The Snell envelope clearly satisfies both $U_t \ge Z_t$ and $U_t \ge E(U_{t+1}\mid\mathcal{F}_t)$ almost surely. Thus, another way to describe the Snell envelope of a process is to say it is the smallest supermartingale dominating that process.

In our application Z will be the process Yξ, where Y is the martingale deflator and ξ is the process specifying the payout of the American claim.

Theorem. Let $(Z_t)_{0\le t\le T}$ be an integrable adapted process, and let $(U_t)_{0\le t\le T}$ be its Snell envelope with Doob decomposition $U_t = U_0 + M_t - A_t$. Let
$$\tau^* = \min\{t\in\{0,\dots,T\} : A_{t+1} > 0\}$$
with the convention $\tau^* = T$ on $\{A_t = 0 \text{ for all } t\}$. Then $\tau^*$ is a stopping time and
$$U_{\tau^*} = U_0 + M_{\tau^*} = Z_{\tau^*}.$$

Proof. That $\tau^*$ is a stopping time follows from the fact that the non-decreasing process $(A_t)_{0\le t\le T}$ is predictable. Now note that
$$E(U_{t+1}\mid\mathcal{F}_t) = E(U_0 + M_{t+1} - A_{t+1}\mid\mathcal{F}_t) = U_0 + M_t - A_{t+1}$$
since M is a martingale and A is predictable, so that by the definition of the Snell envelope
$$U_0 + M_t - A_t = \max\{Z_t,\ U_0 + M_t - A_{t+1}\}.$$
Note that $A_{\tau^*} = 0$. On the event $\{\tau^* = T\}$ the claimed equality holds. On the set $\{\tau^* \le T-1\}$ note that
$$U_0 + M_{\tau^*} = \max\{Z_{\tau^*},\ U_0 + M_{\tau^*} - A_{\tau^*+1}\}.$$
But since $A_{\tau^*+1} > 0$ we must conclude $U_{\tau^*} = U_0 + M_{\tau^*} = Z_{\tau^*}$. □

Theorem. Let Z be an adapted integrable process and let U be its Snell envelope. Then
$$U_0 = \sup_{\tau\le T} E(Z_\tau).$$


Proof. Since U is a supermartingale, $U_0 \ge E(U_\tau)$ for any stopping time τ by the optional sampling theorem. (See example sheet 2.) But since $U_t \ge Z_t$ by construction, $U_0 \ge E(Z_\tau)$ for any stopping time τ. But letting $\tau^* = \min\{t\in\{0,\dots,T\} : A_{t+1} > 0\}$, where $U = U_0 + M - A$ is the Doob decomposition of U, we have
$$U_0 = U_0 + E(M_{\tau^*}) = E(Z_{\tau^*}),$$
again by the optional sampling theorem and the previous result. □

Remark. By a similar argument, one can show that
$$U_t = \operatorname*{ess\,sup}_{t\le\tau\le T} E(Z_\tau\mid\mathcal{F}_t)$$
for all 0 ≤ t ≤ T. This formula allows us to define the Snell envelope for the infinite horizon case T = ∞ and also in the continuous time case.

Definition. If Z is an integrable adapted process, a stopping time σ such that $E(Z_\sigma) = \sup_{\tau\le T} E(Z_\tau)$ is called an optimal stopping time.

Obviously the stopping time $\tau^*$ defined above is an optimal stopping time. Example sheet 2 shows how to find another one.

*****

Returning to finance, let $(\xi_t)_{0\le t\le T}$ be the process specifying the payout of an American option, and let $(U_t)_{0\le t\le T}$ be the Snell envelope of Yξ with Doob decomposition $U_t = U_0 + M_t - A_t$. We now will use the assumption that the market is complete: let H be a strategy such that $X_T = (U_0 + M_T)/Y_T$, where $X_t = H_t\cdot P_t$. The process XY is a local martingale from before; since the market is complete, it is also bounded, and hence it is a true martingale. By the martingale property, we have $X_t Y_t = U_0 + M_t$ for all 0 ≤ t ≤ T. In particular,
• $X_t = (U_0 + M_t)/Y_t \ge U_t/Y_t \ge \xi_t$ for all 0 ≤ t ≤ T,
• $X_{\tau^*} = \xi_{\tau^*}$, and
• $X_0 = \sup_{\tau\le T} E(Y_\tau\xi_\tau)$,
completing the proof of the theorem.
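The Snell envelope recursion can be run by hand on a small tree. The following is a minimal sketch in Python, assuming a recombining binomial model with constant interest rate (this model, the function name `put_prices` and the parameter values are illustrative assumptions, not taken from the text above). In such a model the recursion can be carried out under the risk-neutral up-probability q on discounted payouts, which gives the same time-0 value as the supremum of E(Y_τ ξ_τ).

```python
def put_prices(s0, strike, up, down, rate, steps):
    """Return (american, european) time-0 put prices in a binomial model.

    Assumed model: each period the stock is multiplied by `up` or `down`,
    and the bank account grows by the factor 1 + rate.
    """
    q = (1 + rate - down) / (up - down)  # risk-neutral up-probability
    if not 0 < q < 1:
        raise ValueError("need down < 1 + rate < up for no arbitrage")

    def discounted_payout(t, ups):
        # Z_t at the node reached after `ups` up-moves out of t moves
        stock = s0 * up ** ups * down ** (t - ups)
        return max(strike - stock, 0.0) / (1 + rate) ** t

    # Terminal condition U_T = Z_T; the European value starts from the same payout.
    snell = [discounted_payout(steps, j) for j in range(steps + 1)]
    euro = list(snell)
    for t in range(steps - 1, -1, -1):
        # Conditional expectation E(U_{t+1} | F_t) at each node of time t
        cont = [q * snell[j + 1] + (1 - q) * snell[j] for j in range(t + 1)]
        # Snell recursion U_t = max(Z_t, E(U_{t+1} | F_t))
        snell = [max(discounted_payout(t, j), cont[j]) for j in range(t + 1)]
        euro = [q * euro[j + 1] + (1 - q) * euro[j] for j in range(t + 1)]
    return snell[0], euro[0]


american, european = put_prices(s0=4.0, strike=5.0, up=2.0, down=0.5, rate=0.25, steps=2)
print(american, european)
```

With a positive interest rate the American value is strictly larger than the European one in this example, reflecting the value of early exercise. With rate = 0 the discounted payout is a submartingale under the risk-neutral measure, so the two values agree.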


CHAPTER 3

Brownian motion and stochastic calculus

Despite the elegance of discrete-time financial theory, there is at least one glaring problem: explicit computations are difficult. For instance, the fundamental theorems are stated in terms of state price densities, but it is very difficult to classify them except in a few simple examples. The continuous-time theory has the convenient feature that explicit formulae are easy to find. Indeed, one of our first results will be the general formula for a state price density in a continuous-time market model.

Before we can describe the continuous-time financial theory, we need to first learn about stochastic integration. Recall that in discrete time, the self-financing condition and budget constraint imply that the wealth process X corresponding to a pure investment strategy H satisfies
$$X_t Y_t = X_0 Y_0 + \sum_{s=1}^{t} H_s\cdot(Y_s P_s - Y_{s-1}P_{s-1}).$$
Recall that when Y is a martingale deflator, the process M = YP is a martingale and the process XY is a local martingale. The continuous time analogue ought to be something like
$$X_t = X_0 + \int_0^t H_s\cdot dM_s.$$
What does the integral on the right mean? If we assume that the sample paths $t\mapsto M_t$ are differentiable, we could interpret the integral as the Lebesgue integral
$$\int_0^t H_s\cdot\frac{dM_s}{ds}\,ds.$$
Unfortunately, it turns out that life is not that simple. A theorem of stochastic calculus says that a continuous martingale with everywhere differentiable sample paths is necessarily constant. So if we insist that our price processes have differentiable sample paths, we will have a very boring theory.

This chapter is concerned with an integration theory where we use the martingale property, rather than the differentiability of the sample paths, as the key ingredient. This theory is nice, and indeed something like the fundamental theorem of calculus holds. This means we can do explicit computations.

The most basic example of a continuous martingale is Brownian motion. We will build up our theory by first defining Brownian motion, then constructing the Brownian stochastic integral, and finally learning the rules of the resulting calculus. This chapter will provide an extremely brief introduction to this theory.

1. Brownian motion

In this section, we introduce one of the most fundamental continuous-time stochastic processes, Brownian motion. As hinted above, our primary interest in this process is that it will be the building block for all of the continuous-time market models studied in these lectures.

Definition. A Brownian motion $W = (W_t)_{t\ge 0}$ is a collection of random variables such that
• $W_0(\omega) = 0$ for all ω ∈ Ω,
• for all $0\le t_0 < t_1 < \dots < t_n$ the increments $W_{t_{i+1}} - W_{t_i}$ are independent, and the distribution of $W_t - W_s$ is $N(0, |t-s|)$,
• the sample path $t\mapsto W_t(\omega)$ is continuous for all ω ∈ Ω.

It is not clear that Brownian motion exists. That is, does there exist a probability space (Ω, F, P) on which the uncountable collection of random variables $(W_t)_{t\ge 0}$ can be simultaneously defined in such a way that the above definition holds? The answer, of course, is yes, and the proof of this fact is due to Wiener in 1923. Therefore, the Brownian motion is also often called the Wiener process, especially in the U.S.

Although the sample paths of Brownian motion are continuous, they are very irregular. Below is a computer simulation of a one-dimensional Brownian motion:

2. Itô stochastic integration

We now have sufficient motivation to construct a stochastic integral with respect to a Wiener process. What follows is the briefest of sketches of the theory. There are now plenty of places to turn for a proper treatment of the subject. For instance, please consult one of the following references:
• L.C.G. Rogers and D. Williams, Diffusions, Markov Processes, and Martingales: Volume 2
• I. Karatzas and S.E. Shreve, Brownian Motion and Stochastic Calculus.

2.1. The L2 theory. To get things started, let W be a scalar Brownian motion. We will assume that W is adapted to a filtration $(\mathcal{F}_t)_{t\ge 0}$. For the record, we will assume that the filtration satisfies what are called the usual conditions of right-continuity $\mathcal{F}_t = \bigcap_{\varepsilon>0}\mathcal{F}_{t+\varepsilon}$ and that $\mathcal{F}_0$ contains all P-null events. These are technical assumptions that ensure the existence of stopping times with the right properties. We also will assume that for each 0 ≤ s < t the increment $W_t - W_s$ is independent of $\mathcal{F}_s$.

The first building blocks of the theory are the simple predictable integrands.

Definition. A simple predictable process is an adapted process $\alpha = (\alpha_t)_{t\ge 0}$ of the form
$$\alpha_t(\omega) = \sum_{n=1}^{N} 1_{(t_{n-1},t_n]}(t)\,a_n(\omega)$$


[Figure: Sample path of Brownian motion. Horizontal axis: t (from 0 to 10); vertical axis: W.]

where $a_n$ is bounded and $\mathcal{F}_{t_{n-1}}$-measurable for some $0\le t_0 < t_1 < \dots < t_N < \infty$. For simple predictable processes we define the stochastic integral by the formula
$$\int_0^\infty \alpha_s\,dW_s = \sum_{n=1}^{N} a_n\,(W_{t_n} - W_{t_{n-1}}).$$

Theorem (Itô's isometry). For a simple predictable integrand α, we have
$$E\left[\left(\int_0^\infty \alpha_s\,dW_s\right)^2\right] = E\left(\int_0^\infty \alpha_s^2\,ds\right).$$
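The isometry can be checked numerically for a concrete simple predictable integrand. The following Monte Carlo sketch in Python is an illustration only: the choice of integrand (equal to 1 on (0,1] and to W_1 truncated to [-1,1] on (1,2]), the function name `isometry_check`, the sample size and the seed are all assumptions made for this example.

```python
import random


def isometry_check(num_paths=20000, seed=0):
    """Estimate both sides of Ito's isometry for a simple predictable integrand."""
    rng = random.Random(seed)
    lhs = 0.0  # running sum for E[(int alpha dW)^2]
    rhs = 0.0  # running sum for E[int alpha^2 ds]
    for _ in range(num_paths):
        dw1 = rng.gauss(0.0, 1.0)  # W_1 - W_0
        dw2 = rng.gauss(0.0, 1.0)  # W_2 - W_1, independent of F_1
        a1 = 1.0                        # constant, hence F_0-measurable
        a2 = max(-1.0, min(1.0, dw1))   # bounded and F_1-measurable
        integral = a1 * dw1 + a2 * dw2  # sum of a_n times Brownian increments
        lhs += integral ** 2
        rhs += a1 ** 2 + a2 ** 2        # both intervals have length 1
    return lhs / num_paths, rhs / num_paths


lhs, rhs = isometry_check()
print(lhs, rhs)
```

The two estimates should agree up to Monte Carlo error. The key point is that a_2 only uses information available at time 1, before the increment W_2 - W_1 is revealed.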

Proof. Note that
$$\left(\int_0^\infty \alpha_s\,dW_s\right)^2 = \sum_{n} a_n^2\,(W_{t_n} - W_{t_{n-1}})^2 + 2\sum_{m<n} a_m a_n\,(W_{t_m} - W_{t_{m-1}})(W_{t_n} - W_{t_{n-1}}).$$

[…]

Remark. For comparison, consider a continuously differentiable function f : [0, 1] → R. Recall that for such functions there exists a constant C > 0 such that |f(t) − f(s)| ≤ C|t − s| for all s, t ∈ [0, 1]. Hence we have
$$\sum_{n=1}^{N}\big[f(n/N) - f((n-1)/N)\big]^2 \le \sum_{n=1}^{N} C^2/N^2 = C^2/N \to 0.$$
Since the quadratic variation of a Brownian motion is positive, the typical Brownian sample path is not a continuously differentiable function of time.
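The contrast in this remark is easy to see numerically. The sketch below (Python, standard library only) computes the sum of squared increments on the grid n/N for a simulated Brownian path on [0, 1] and for a smooth function. The helper names, the choice f = sin, the grid size and the seed are assumptions for illustration, and the simulated path is only a discrete approximation built from independent Gaussian increments.

```python
import math
import random


def quadratic_variation(values):
    """Sum of squared increments of a path sampled on a grid."""
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))


def brownian_path(num_steps, horizon=1.0, seed=0):
    """Sample W on a uniform grid using independent N(0, dt) increments."""
    rng = random.Random(seed)
    dt = horizon / num_steps
    path = [0.0]
    for _ in range(num_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


num_steps = 10000
qv_brownian = quadratic_variation(brownian_path(num_steps))
qv_smooth = quadratic_variation([math.sin(n / num_steps) for n in range(num_steps + 1)])
print(qv_brownian, qv_smooth)
```

For the Brownian path the sum stays close to the length of the time interval, while for the smooth function it is of order 1/N, in line with the bound C²/N above.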

With the notion of quadratic variation, we can rewrite Itô's formula once more in a particularly easy to remember form:
$$df(X_t) = f'(X_t)\,dX_t + \tfrac{1}{2} f''(X_t)\,d\langle X\rangle_t.$$
In this form, the idea of the proof becomes clear:

Idea of proof of Itô's formula. Fix a partition $0 = t_0 < t_1 < \dots < t_N = t$ of [0, t]. By telescoping a sum and considering a second-order Taylor approximation we have the following:
$$\begin{aligned}
f(X_t) - f(X_0) &= \sum_{n=1}^{N}\big[f(X_{t_n}) - f(X_{t_{n-1}})\big] \\
&\approx \sum_{n=1}^{N}\Big[f'(X_{t_{n-1}})(X_{t_n} - X_{t_{n-1}}) + \tfrac{1}{2} f''(X_{t_{n-1}})(X_{t_n} - X_{t_{n-1}})^2\Big] \\
&\approx \int_0^t f'(X_s)\,dX_s + \frac{1}{2}\int_0^t f''(X_s)\,d\langle X\rangle_s.
\end{aligned}$$
□

3.2. The multi-dimensional version. We now introduce the vector version of Itô's formula. It is basically the same as before, but with worse notation. An n-dimensional Itô process $(X_t)_{t\ge 0}$ is defined by
$$X_t = X_0 + \int_0^t \alpha_s\,dW_s + \int_0^t \beta_s\,ds,$$

interpreted component-wise as
$$X_t^{(i)} = X_0^{(i)} + \int_0^t \sum_{k=1}^{d} \alpha_s^{(i,k)}\,dW_s^{(k)} + \int_0^t \beta_s^{(i)}\,ds$$
where $(W_t)_{t\ge 0}$ is a d-dimensional Brownian motion, so that $W^{(1)},\dots,W^{(d)}$ are independent scalar Brownian motions, the predictable process $(\alpha_t)_{t\ge 0}$ is valued in the space of n × d matrices, and the predictable process $(\beta_t)_{t\ge 0}$ is valued in $\mathbb{R}^n$. We insist that
$$\int_0^t \sum_{i=1}^{n}\sum_{k=1}^{d} (\alpha_s^{(i,k)})^2\,ds < \infty \quad\text{and}\quad \int_0^t \sum_{i=1}^{n} |\beta_s^{(i)}|\,ds < \infty$$
almost surely for all t ≥ 0 so that all of the integrals are defined. The aim of this section is to give a formula for the Itô decomposition of $f(t, X_t)$.

Now in the scalar case we needed a notion of quadratic variation $(dX_t)^2 = d\langle X\rangle_t$. In the multi-dimensional case, we now introduce the notion of quadratic co-variation $(dX_t^{(i)})(dX_t^{(j)}) = d\langle X^{(i)}, X^{(j)}\rangle_t$.

Theorem. There exists a continuous process of finite variation $\langle X^{(i)}, X^{(j)}\rangle$, called the quadratic co-variation of $X^{(i)}$ and $X^{(j)}$, such that
$$\langle X^{(i)}, X^{(j)}\rangle_t = \lim_{N\to\infty} \sum_{n=1}^{N} \big(X^{(i)}_{nt/N} - X^{(i)}_{(n-1)t/N}\big)\big(X^{(j)}_{nt/N} - X^{(j)}_{(n-1)t/N}\big)$$
for each t ≥ 0, where the limit is in probability, given by
$$d\langle X^{(i)}, X^{(j)}\rangle_t = \sum_{k=1}^{d} \alpha_t^{(i,k)}\alpha_t^{(j,k)}\,dt.$$
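As a numerical illustration of this formula, the sketch below (Python, standard library only) takes a constant 2 × 2 matrix α with zero drift, so that the co-variation over [0, 1] should equal the sum over k of α^(1,k) α^(2,k), and compares this with the sum of products of increments along one simulated path. The function name, the particular matrix (built from a correlation parameter `rho`), the grid size and the seed are assumptions chosen for the example.

```python
import math
import random


def covariation_estimate(alpha, num_steps=10000, horizon=1.0, seed=1):
    """Sum of products of increments of X^(1) and X^(2), where dX = alpha dW."""
    rng = random.Random(seed)
    dt = horizon / num_steps
    total = 0.0
    for _ in range(num_steps):
        # independent Brownian increments over one grid step
        dw = (rng.gauss(0.0, math.sqrt(dt)), rng.gauss(0.0, math.sqrt(dt)))
        dx1 = alpha[0][0] * dw[0] + alpha[0][1] * dw[1]
        dx2 = alpha[1][0] * dw[0] + alpha[1][1] * dw[1]
        total += dx1 * dx2
    return total


rho = 0.6
alpha = ((1.0, 0.0), (rho, math.sqrt(1.0 - rho ** 2)))
estimate = covariation_estimate(alpha)
# Right-hand side of the theorem with constant alpha and t = 1
exact = sum(alpha[0][k] * alpha[1][k] for k in range(2)) * 1.0
print(estimate, exact)
```

Here X^(1) and X^(2) are two Brownian motions with correlation `rho`, and the estimate approaches `rho` times the time horizon as the grid is refined.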

The following multiplication table might help you remember how to compute quadratic co-variation, where $W$ and $W^\perp$ denote independent Brownian motions:
$$(dt)^2 = 0, \qquad (dt)(dW_t) = 0, \qquad (dW_t)^2 = dt, \qquad (dW_t)(dW_t^\perp) = 0.$$
Now we are ready for the statement of the theorem:

Theorem (Itô's formula, multi-dimensional version). Let $f : \mathbb{R}_+\times\mathbb{R}^n \to \mathbb{R}$, where $(t, x)\mapsto f(t, x)$, be continuously differentiable in the t variable and twice-continuously differentiable in the x variable. Then
$$df(t, X_t) = \frac{\partial f}{\partial t}(t, X_t)\,dt + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(t, X_t)\,dX_t^{(i)} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_i\,\partial x_j}(t, X_t)\,d\langle X^{(i)}, X^{(j)}\rangle_t.$$

4. Girsanov's theorem

As we have seen in discrete time, the economic notion of an arbitrage-free market model is tied to the existence of an equivalent measure for which the asset prices, when discounted by a numéraire, are martingales.

Recall that an equivalent measure is related to a positive random variable via the Radon–Nikodym theorem. Indeed, let (Ω, F, P) be our probability space and let Q be equivalent to P. Then, by the Radon–Nikodym theorem there exists a density
$$Z = \frac{dQ}{dP}$$
such that Z > 0 has unit P-expectation. Conversely, if Z > 0 and $E^P(Z) = 1$, we can define an equivalent measure Q with density Z.

Motivated by the above discussion, we aim to understand how martingales arise within the context of the Itô stochastic integration theory. Consider the stochastic process $(Z_t)_{t\ge 0}$ given by
$$Z_t = \exp\left(-\frac{1}{2}\int_0^t |\alpha_s|^2\,ds + \int_0^t \alpha_s\cdot dW_s\right)$$
where $(W_t)_{t\ge 0}$ is an m-dimensional Brownian motion and $(\alpha_t)_{t\ge 0}$ is an m-dimensional predictable process with $\int_0^t |\alpha_s|^2\,ds < \infty$ a.s. for all t ≥ 0. This process is clearly positive. Furthermore, notice that by Itô's formula we have
$$dZ_t = Z_t\,\alpha_t\cdot dW_t$$

so that $(Z_t)_{t\ge 0}$ is a local martingale, as it is a stochastic integral with respect to a Brownian motion.

Recall that since Z is a positive local martingale, it is automatically a supermartingale. Hence, if $E(Z_T) = 1$ for some non-random T > 0, then $(Z_t)_{0\le t\le T}$ is a true martingale. In this case, what happens to the Brownian motion when we change to an equivalent measure with density $Z_T$?

Theorem (Cameron–Martin–Girsanov Theorem). Let (Ω, F, P) be a probability space on which an m-dimensional Brownian motion $(W_t)_{t\ge 0}$ is defined, and let $(\mathcal{F}_t)_{t\ge 0}$ be a filtration satisfying the usual conditions. Let
$$Z_t = \exp\left(-\frac{1}{2}\int_0^t \|\alpha_s\|^2\,ds + \int_0^t \alpha_s\cdot dW_s\right)$$
and suppose $(Z_t)_{0\le t\le T}$ is a martingale. Define the equivalent measure Q on $(\Omega, \mathcal{F}_T)$ by the density
$$\frac{dQ}{dP} = Z_T.$$
Then the m-dimensional process $(\hat W_t)_{0\le t\le T}$ defined by
$$\hat W_t = W_t - \int_0^t \alpha_s\,ds$$
is a Brownian motion on $(\Omega, \mathcal{F}_T, Q)$.

Now, you may be asking yourself: When is the process $(Z_t)_{t\ge 0}$ not just a local martingale, but a true martingale?

Theorem (Novikov's criterion). If
$$E\left(e^{\frac{1}{2}\int_0^T \|\alpha_s\|^2\,ds}\right) < \infty$$
then
$$E\left(e^{-\frac{1}{2}\int_0^T \|\alpha_s\|^2\,ds + \int_0^T \alpha_s\cdot dW_s}\right) = 1.$$

5. A martingale representation theorem

In this section we will see that all continuous martingales are essentially stochastic integrals with respect to Brownian motion. This will have applications to our continuous-time financial models in the next chapter.

Theorem (Itô's Martingale Representation Theorem). Let (Ω, F, P) be a probability space on which an m-dimensional Brownian motion $W = (W_t)_{t\ge 0}$ is defined, and let the filtration $(\mathcal{F}_t)_{t\ge 0}$ be the filtration generated by W. Let $X = (X_t)_{t\ge 0}$ be a continuous local martingale. Then there exists a unique predictable m-dimensional process $(\alpha_t)_{t\ge 0}$ such that $\int_0^t \|\alpha_s\|^2\,ds < \infty$ almost surely for all t ≥ 0 and
$$X_t = X_0 + \int_0^t \alpha_s\cdot dW_s.$$
Furthermore, if $X_t > 0$ for all t ≥ 0 then there exists a predictable β such that $\int_0^t \|\beta_s\|^2\,ds < \infty$ and
$$X_t = X_0 \exp\left(-\frac{1}{2}\int_0^t \|\beta_s\|^2\,ds + \int_0^t \beta_s\cdot dW_s\right).$$
Hs k2 ds < ∞ a.s. for all t ≥ 0 0

0


However, in moving from discrete to continuous time, we have to be careful. We will now see that this condition isn't strong enough to make our economic analysis interesting.

Example. Consider a discrete-time market model with two assets P = (1, S) where S is a simple symmetric random walk:
$$S_t = \xi_1 + \dots + \xi_t$$
where the random variables $\xi_1, \xi_2, \dots$ are independent and $P(\xi_t = 1) = P(\xi_t = -1) = 1/2$. Obviously this market has no arbitrage as P is a martingale. Nevertheless, let's explore how to approximate an arbitrage in some sense.

Given a predictable process π, let
$$\phi_t = -\sum_{s=1}^{t-1} (\pi_{s+1} - \pi_s)\,S_s.$$
Then the pair (φ, π) defines a self-financing pure investment strategy with associated wealth process
$$X_t = \phi_t + \pi_t S_t = \sum_{s=1}^{t} \pi_s\,(S_s - S_{s-1}).$$
In particular, $X_0 = 0$.

A simple strategy that resembles an arbitrage is constructed as follows: first define the stopping time
$$\sigma = \inf\{t\ge 0 : S_t > 0\}$$
and consider the strategy with $\pi_t = 1_{\{t\le\sigma\}}$. Note that the associated wealth process is $X_t = S_{t\wedge\sigma}$. Since σ < ∞ a.s., the conclusion is that if you are willing to wait a while, investing in this strategy will result in an almost sure gain $X_\sigma = 1$. But the amount of time you have to wait is very long: one can show that E(σ) = +∞.

One can improve upon the above idea by taking larger and larger bets, effectively 'speeding up the clock'. Indeed, define the stopping time
$$\tau = \inf\{t\ge 0 : \xi_t = 1\}$$
and consider the strategy $\pi_t = 2^{t-1} 1_{\{t\le\tau\}}$. In this case, the associated wealth process is
$$X_t = 1 - 2^t\,1_{\{t\le\tau-1\}}.$$
This is the classical 'martingale' or doubling strategy. Note that E(τ) = 2, so an investor following this strategy does not have to wait very long on average to realise the gain $X_\tau = 1$. But although τ is small on average, it is not bounded, and hence this strategy does not qualify as an arbitrage.
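To see the doubling strategy in action, here is a small Python sketch that computes the wealth path for a given sequence of coin flips. The function name `doubling_wealth` and the sample sequence are illustrative assumptions; the stake 2^(t-1) up to and including the first winning flip is the strategy described in the example.

```python
def doubling_wealth(increments):
    """Wealth path X_0, X_1, ... of the doubling strategy.

    `increments` is the sequence xi_1, xi_2, ... of +1/-1 coin flips.
    The stake at time t is 2**(t-1) until (and including) the first +1.
    """
    wealth = [0]
    stopped = False
    for t, xi in enumerate(increments, start=1):
        stake = 0 if stopped else 2 ** (t - 1)
        wealth.append(wealth[-1] + stake * xi)
        if xi == 1:
            stopped = True  # tau has occurred: no further bets
    return wealth


path = doubling_wealth([-1, -1, 1, -1])
print(path)
```

In this sample the first win occurs at the third flip, and before that the wealth is 1 - 2^t, which shows how quickly the debt grows while the investor waits.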

Example. A technical problem with continuous time models is that events that will happen eventually can be made to happen in bounded time by speeding up the clock.

Consider the market with prices P = (1, W) where W is a Brownian motion. We will now construct a pure investment trading strategy such that the corresponding wealth process has $X_0 = 0$ and $X_T = K$ a.s., where T > 0 is an arbitrary (non-random) time horizon and the constant K > 0 is also arbitrary. More concretely, by writing H = (φ, π), we will find a real-valued adapted process $(\pi_t)_{t\in[0,T]}$ such that $\int_0^T \pi_s^2\,ds < \infty$ almost surely, but
$$\int_0^T \pi_s\,dW_s = K \quad\text{a.s.}$$
Let f : [0, T] → [0, ∞] be a strictly increasing, differentiable function such that f(0) = 0 and f(T) = ∞. In particular we assume that $f'(t) > 0$ for all t and there exists an inverse function $f^{-1} : [0, \infty] \to [0, T]$ such that $f\circ f^{-1}(u) = u$. For instance, to be explicit, we may take
$$f(t) = \frac{t}{T-t} \quad\text{and}\quad f^{-1}(u) = \frac{uT}{1+u}.$$
Now define $(Z_u)_{u\ge 0}$ by
$$Z_u = \int_0^{f^{-1}(u)} (f'(s))^{1/2}\,dW_s.$$
Note that Z is a local martingale in the filtration $(\mathcal{F}_{f^{-1}(u)})_{u\ge 0}$ and that the quadratic variation is
$$\langle Z\rangle_u = \int_0^{f^{-1}(u)} f'(s)\,ds = f(f^{-1}(u)) - f(0) = u,$$
so by Lévy's characterisation $(Z_u)_{u\ge 0}$ is a Brownian motion. Define the stopping time τ by
$$\tau = \inf\{u\ge 0 : Z_u = K\}.$$
Since $(Z_u)_{u\ge 0}$ is a Brownian motion, we have τ < ∞ almost surely since $\sup_{u\ge 0} Z_u = \infty$ almost surely. Now let
$$\pi_t = (f'(t))^{1/2}\,1_{\{t\le f^{-1}(\tau)\}} \quad\text{and}\quad X_t = \int_0^t \pi_s\,dW_s$$
for 0 ≤ t ≤ T. Note that since
$$\int_0^T \pi_s^2\,ds = \int_0^{f^{-1}(\tau)} f'(s)\,ds = \tau < \infty$$
the stochastic integral is well-defined. The strange fact is that $(X_t)_{t\in[0,T]}$ is a local martingale with $X_0 = 0$, but $X_T = Z_\tau = K$ almost surely.

We see that the integrand $(\pi_s)_{s\in[0,T]}$ roughly corresponds to a gambler starting at noon with £0, employing a doubling strategy (with borrowed money) at a quicker and quicker pace, until finally he gains £K almost surely before the clock strikes one o'clock. This situation is rather unrealistic, particularly since the gambler must go arbitrarily far into debt in order to

secure the £K winning. Indeed, if such strategies were a good model for investor behaviour, we all could be much richer by just spending some time trading over the internet.

The above discussion shows that the integrability necessary to define the stochastic integral is not really sufficient for our needs. At this stage, there are several reasonable options. In this course we will insist that the investor cannot go into debt.

Definition. A trading strategy H is admissible iff $H_t\cdot P_t \ge 0$ for all t ≥ 0 almost surely.

Note that the doubling strategy is not admissible, since the investor now has only a finite credit line. However, a suicide strategy, that is, a doubling strategy in which the object is to lose a fixed amount K by time T, is admissible.

3. Arbitrage and local martingale deflators

To see that our restriction to admissible strategies is reasonable, let's now consider continuous-time arbitrage theory.

Definition. An admissible strategy H is called an absolute arbitrage iff there is a non-random time T such that
$$H_0\cdot P_0 = 0 \le H_T\cdot P_T \ \text{a.s.} \quad\text{and}\quad P(H_T\cdot P_T > 0) > 0.$$
An admissible strategy H is called an arbitrage relative to an admissible strategy K iff there is a non-random time T such that
$$H_0\cdot P_0 = K_0\cdot P_0, \qquad H_T\cdot P_T \ge K_T\cdot P_T \ \text{a.s.}, \qquad P(H_T\cdot P_T > K_T\cdot P_T) > 0.$$

Remark. Note that if H is an absolute arbitrage and K is admissible, then the strategy H + K is an arbitrage relative to K. On the other hand, if H′ is an arbitrage relative to K, then H′ − K is an absolute arbitrage only if H′ − K is admissible. In particular, an absolute arbitrage is an arbitrage relative to the strategy K = 0 of holding no assets.

In discrete time, the notions of absolute arbitrage and relative arbitrage are essentially equivalent since we did not have to worry about admissibility. In continuous time, we will soon find examples of the surprising fact that there exist continuous-time markets that have relative arbitrage but no absolute arbitrage. Such market models are sometimes considered models of price bubbles.

The point of all of this is to warn you to be careful when making arbitrage arguments in continuous time, since reasonable people can disagree on what kind of strategies should be called arbitrages.

As in the discrete-time theory, we now introduce martingale deflators.

Definition. A (local) martingale deflator is a positive Itô process Y such that $YP = (Y_t P_t)_{t\ge 0}$ is an n-dimensional (local) martingale.

Our continuous-time version of the first fundamental theorem follows. Unfortunately, to get a clean statement of this result we need to up the technical ante.

Theorem. Suppose there exists a local martingale deflator Y for the market model P. If K is an admissible strategy such that the process $(K\cdot P)Y$ is a true martingale, then there is no arbitrage relative to K. In particular, there is no absolute arbitrage.

The proof of this fact is based on an important lemma:

Lemma. Suppose H is a self-financing pure investment strategy and let
$$X_t = H_t\cdot P_t = X_0 + \int_0^t H_s\cdot dP_s.$$
Then
$$d(X_t Y_t) = H_t\cdot d(Y_t P_t)$$
for any Itô process Y. In particular, if Y is a local martingale deflator and H is admissible then XY is a supermartingale.

Proof of lemma. First note
$$Y_t\,dX_t = Y_t\,(H_t\cdot dP_t) \quad\text{and}\quad X_t\,dY_t = H_t\cdot P_t\,dY_t.$$
Finally, note that
$$d\langle X, Y\rangle_t = (H_t\cdot dP_t)(dY_t) = \sum_{i=1}^{n} H_t^i\,d\langle P^i, Y\rangle_t.$$
Putting this together with Itô's formula yields
$$\begin{aligned}
d(X_t Y_t) &= Y_t\,dX_t + X_t\,dY_t + d\langle X, Y\rangle_t \\
&= \sum_i H_t^i\,\big(Y_t\,dP_t^i + P_t^i\,dY_t + d\langle Y, P^i\rangle_t\big) \\
&= H_t\cdot d(Y_t P_t)
\end{aligned}$$
as claimed.

Now if Y is a local martingale deflator, then PY is a local martingale. In particular the process XY can be expressed as the stochastic integral with respect to a continuous local martingale, and hence is itself a local martingale. Finally, if H is admissible, then XY is a non-negative local martingale. Non-negative local martingales are supermartingales by Fatou's lemma. □

Proof that existence of a local martingale deflator implies no arbitrage. Let Y be a local martingale deflator, and let H and K be admissible strategies such that $H_0\cdot P_0 = K_0\cdot P_0$ and $H_T\cdot P_T \ge K_T\cdot P_T$. Furthermore, suppose that $(K\cdot P)Y$ is a martingale. We must show that $H_T\cdot P_T = K_T\cdot P_T$.

By the above lemma, since H is admissible and Y is positive, the process $(H\cdot P)Y$ is a supermartingale. Hence
$$K_0\cdot P_0\,Y_0 = H_0\cdot P_0\,Y_0 \ge E(H_T\cdot P_T\,Y_T) \ge E(K_T\cdot P_T\,Y_T) = K_0\cdot P_0\,Y_0.$$
This shows that $H_T\cdot P_T\,Y_T = K_T\cdot P_T\,Y_T$. Since Y is strictly positive, the conclusion now follows. □

Remark. Note that the above theorem doesn't say that no relative arbitrage implies the existence of a local martingale deflator. A weaker notion of relative arbitrage, called 'free-lunch-with-vanishing-risk,' is needed to have the converse implication. See the recent book of Delbaen and Schachermayer, The Mathematics of Arbitrage, for an account of the modern theory.

Remark. Here is an example of a market with a relative arbitrage and no absolute arbitrage. Fix T > 0 and let $(\pi_t)_{0\le t\le T}$ be a predictable process such that $\int_0^T \pi_s^2\,ds < \infty$ and let $S_t = \int_0^t \pi_s\,dW_s$ where W is a Brownian motion. Suppose $S_t \le 1$ for all t ≤ T and $S_T = 1$ almost surely. (See the previous section on doubling strategies for an explicit construction of such a process π.)

Now consider the market with prices P = (1, S). Note that P is a two-dimensional local martingale, hence there exists a local martingale deflator: just set $Y_t = 1$ for all t. Therefore, there is no absolute arbitrage.

However, consider the strategy K = (1, −1). Note that $K_t\cdot P_t = 1 - S_t \ge 0$ so K is admissible. We will show that there exists an arbitrage relative to K. Indeed, let H = (1, 0). Note that $H_0\cdot P_0 = 1 = K_0\cdot P_0$ but $H_T\cdot P_T = 1 > K_T\cdot P_T = 0$.

The point of this example is that the asset with price S seems like a good deal: it costs nothing at time 0 but pays a positive amount at time T. However, holding one share of the asset, corresponding to the strategy (0, 1) = H − K, is not admissible.

4. The structure of local martingale deflators

In this section we will parametrise a fairly general Itô market with n = d + 1 assets. All assets in this market are numéraires, and we use the notation P = (B, S).
We will assume the dynamics of the prices are given by the following equations
$$dB_t = B_t\,r_t\,dt, \qquad dS_t^i = S_t^i\left(\mu_t^i\,dt + \sum_{j=1}^{m} \sigma_t^{ij}\,dW_t^j\right) \quad\text{for } i = 1,\dots,d,$$
where the processes $r, \mu^i, \sigma^{ij}$ are predictable and suitably integrable, and the $W^j$ are independent Brownian motions. The first asset can be thought of as a bank account, and the random variable $r_t$ is the spot interest rate at time t. The (random) ordinary differential equation can be solved:
$$B_t = B_0\,e^{\int_0^t r_s\,ds}.$$

The d assets can be thought of as risky stocks. The random variable $\mu_t^i$ is interpreted as the mean instantaneous return of asset i, while the spot volatility is $\big(\sum_j (\sigma_t^{ij})^2\big)^{1/2}$. Note that Itô's formula yields
$$S_t^i = S_0^i \exp\left(\int_0^t \Big[\mu_s^i - \frac{1}{2}\sum_j (\sigma_s^{ij})^2\Big]\,ds + \int_0^t \sum_j \sigma_s^{ij}\,dW_s^j\right).$$
We will use the notation
$$\mu_t = \begin{pmatrix}\mu_t^1\\ \vdots\\ \mu_t^d\end{pmatrix} \quad\text{and}\quad \sigma_t = \begin{pmatrix}\sigma_t^{11} & \cdots & \sigma_t^{1m}\\ \vdots & \ddots & \vdots\\ \sigma_t^{d1} & \cdots & \sigma_t^{dm}\end{pmatrix}$$
for the d × 1 vector of means and d × m matrix of volatilities, respectively.

With this more explicit parametrisation, we can describe the structure of state price densities:

Theorem. Let λ be a predictable m-dimensional process such that $\int_0^t \|\lambda_s\|^2\,ds < \infty$ a.s. for all t ≥ 0 and that
$$\sigma_t\lambda_t = \mu_t - r_t\mathbf{1} \quad\text{for almost all } (t, \omega),$$
where $\mathbf{1} = (1,\dots,1)^\top$ is the d × 1 vector with the constant 1 in each component. Let
$$Y_t = Y_0 \exp\left(-\int_0^t \big(r_s + \|\lambda_s\|^2/2\big)\,ds - \int_0^t \lambda_s\cdot dW_s\right)$$
for a constant $Y_0 > 0$, or in equivalent differential form
$$dY_t = Y_t\,(-r_t\,dt - \lambda_t\cdot dW_t).$$
Then Y is a state price density. Furthermore, if the filtration is generated by the m-dimensional Brownian motion W, all state price densities have this form.

Remark. The m-dimensional random vector $\lambda_t$ appearing in the theorem is a generalisation of the Sharpe ratio. The process $\lambda = (\lambda_t)_{t\ge 0}$ is often called the market price of risk for the state price density, since it measures in some sense the excess return of the stocks per unit of volatility.

Proof. We need to show that YB and YS are local martingales. Note that by Itô's formula
$$d(Y_t B_t) = -Y_t B_t\,\lambda_t\cdot dW_t$$
so YB is a local martingale since it is the stochastic integral with respect to a Brownian motion W. Also, by Itô's formula
$$d(Y_t S_t^i) = Y_t S_t^i\,\big[-r_t + \mu_t^i - (\sigma_t\lambda_t)^i\big]\,dt + Y_t S_t^i\,(\sigma_t^{i\cdot} - \lambda_t)\cdot dW_t = Y_t S_t^i\,(\sigma_t^{i\cdot} - \lambda_t)\cdot dW_t$$
where $\sigma_t^{i\cdot}$ denotes the i-th row of $\sigma_t$ and we have used the identity $\sigma_t\lambda_t = \mu_t - r_t\mathbf{1}$ to cancel the dt term.

Conversely, if the filtration is generated by the Brownian motion, the martingale representation theorem says that all positive local martingales M are of the form
$$M_t = M_0 \exp\left(-\frac{1}{2}\int_0^t \|\lambda_s\|^2\,ds - \int_0^t \lambda_s\cdot dW_s\right)$$

for some predictable λ, or in differential form
$$dM_t = -M_t\,\lambda_t\cdot dW_t.$$
Hence, if YB = M is a positive local martingale then
$$dY_t = -Y_t\,(r_t\,dt + \lambda_t\cdot dW_t)$$
by Itô's formula. Furthermore, if YS is a local martingale, then Itô's formula shows that in order to cancel the drift we must have the identity $\sigma_t\lambda_t = \mu_t - r_t\mathbf{1}$. □

In the discrete time case, corresponding to a martingale deflator Y, there is an equivalent martingale measure (with respect to the bank account) with the Radon–Nikodym density
$$\frac{dQ}{dP} = \frac{B_T Y_T}{B_0 Y_0}$$
for some time horizon T > 0. Recall that in the discrete time case, the definition of martingale deflator implies that BY is a true martingale. What if Y is a local martingale deflator, so the product BY is only a local martingale? Then we must check that BY is a true martingale in order to claim the density above defines an equivalent probability measure. In discrete time, there are no problems since positive local martingales are true martingales. However, in continuous time, we must be more careful.

Theorem. Suppose that λ is a predictable process such that $\sigma_t\lambda_t = \mu_t - r_t\mathbf{1}$. If
$$M_t = \exp\left(-\frac{1}{2}\int_0^t \|\lambda_s\|^2\,ds - \int_0^t \lambda_s\cdot dW_s\right)$$
is a true martingale, then the measure Q defined by
$$\frac{dQ}{dP} = M_T$$
is an equivalent martingale measure (with respect to the bank account). In particular, the dynamics of the stock prices are given by
$$dS_t^i = S_t^i\left(r_t\,dt + \sum_j \sigma_t^{ij}\,d\hat W_t^j\right)$$
where $\hat W_t = W_t + \int_0^t \lambda_s\,ds$ is a Q-Brownian motion.

Proof. Note that by Itô's formula
$$\begin{aligned}
d\left(\frac{S_t^i}{B_t}\right) &= \frac{S_t^i}{B_t}\left([\mu_t^i - r_t]\,dt + \sum_j \sigma_t^{ij}\,dW_t^j\right) \\
&= \frac{S_t^i}{B_t}\sum_j \sigma_t^{ij}\,(\lambda_t^j\,dt + dW_t^j) \\
&= \frac{S_t^i}{B_t}\sum_j \sigma_t^{ij}\,d\hat W_t^j.
\end{aligned}$$

Now Girsanov's theorem says that $\hat W_t = W_t + \int_0^t \lambda_s\,ds$ is a Q-Brownian motion. Therefore, each $S^i/B$ is the stochastic integral with respect to a Q-Brownian motion, and hence is a Q-local martingale as claimed. □

As before, given a market model P we can introduce a contingent claim. Recall that a European contingent claim maturing at a time T > 0 is modelled as a random variable ξ that is $\mathcal{F}_T$-measurable. We shall assume that there exists at least one martingale deflator, so that, in particular, there are no absolute arbitrages.

5. Replication and super-replication

First a simple result:

Theorem. Suppose H is an admissible super-replication strategy of $\xi_T$ and Y a local martingale deflator. Then
$$H_t\cdot P_t \ge \frac{1}{Y_t}\,E(\xi_T Y_T\mid\mathcal{F}_t).$$

Proof. This is the same as the proof that the existence of a local martingale deflator implies no arbitrage:
$$E(\xi_T Y_T\mid\mathcal{F}_t) \le E(H_T\cdot P_T\,Y_T\mid\mathcal{F}_t) \le H_t\cdot P_t\,Y_t$$
since $(H\cdot P)Y$ is a supermartingale. □



Now we will impose more structure, by assuming that the market model P = (B, S) has dynamics
$$dB_t = B_t\,r_t\,dt, \qquad dS_t^i = S_t^i\left(\mu_t^i\,dt + \sum_{j=1}^{m} \sigma_t^{ij}\,dW_t^j\right) \quad\text{for } i = 1,\dots,d$$
as before, or in vector notation, these equations can be written as
$$dS_t = \operatorname{diag}(S_t)\,(\mu_t\,dt + \sigma_t\,dW_t)$$
where
$$\operatorname{diag}(s_1,\dots,s_d) = \begin{pmatrix} s_1 & 0 & \cdots & 0\\ 0 & s_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & s_d\end{pmatrix}.$$
We will work in the filtration generated by W, so that all state price densities Y are of the form
$$dY_t = Y_t\,(-r_t\,dt - \lambda_t\cdot dW_t)$$
where $\sigma_t\lambda_t = \mu_t - r_t\mathbf{1}$.

The following will serve as a version of the second fundamental theorem of asset pricing in continuous time.

Theorem. Suppose the filtration is generated by $W$, and suppose $m = d$ and that the $d \times d$ matrix $\sigma_t$ is invertible for all $(t, \omega)$, so that in particular, there is a unique (up to scaling) martingale deflator $Y$ of the form
$$dY_t = Y_t(-r_t\,dt - \lambda_t \cdot dW_t)$$
where $\lambda_t = \sigma_t^{-1}(\mu_t - r_t \mathbf{1})$. Let $\xi_T$ be non-negative, $\mathcal{F}_T$-measurable and such that $\xi_T Y_T$ is integrable. Then there exists an admissible strategy $H$ such that
$$H_t \cdot P_t = \frac{1}{Y_t} E^P(Y_T \xi_T \,|\, \mathcal{F}_t).$$
In particular, the strategy replicates the payout $\xi_T$.

Remark. That is to say, the quantity $E(Y_T \xi_T)/Y_0$ is the minimal amount of money needed to replicate the claim among reasonable trading strategies. Of course, if you could employ a doubling strategy, you could replicate the claim with strictly less money. Equally, you could replicate the claim with more initial capital by running a suicide strategy on top of the replication strategy.

Remark. If $BY$ is a true martingale, then the above formula can be written
$$H_t \cdot P_t = B_t\, E^Q\Big( \frac{\xi_T}{B_T} \,\Big|\, \mathcal{F}_t \Big)$$
where $Q$ is the unique equivalent martingale measure relative to the bank account, i.e. the risk-neutral measure.

Proof. Let $M_t = E(Y_T \xi_T \,|\, \mathcal{F}_t)$. Then $M$ is a martingale, and since the filtration is generated by the Brownian motion $W$ the martingale representation theorem tells us that there exists a $d$-dimensional predictable process $\alpha$ such that $dM_t = \alpha_t \cdot dW_t$. By Itô's formula we have
$$d\Big(\frac{M_t}{Y_t}\Big) = \frac{M_t}{Y_t}\, r_t\,dt + \frac{(M_t \lambda_t + \alpha_t)}{Y_t} \cdot (dW_t + \lambda_t\,dt).$$
Now let
$$\pi_t = \mathrm{diag}(S_t)^{-1} (\sigma_t^\top)^{-1} \frac{(M_t \lambda_t + \alpha_t)}{Y_t} \quad \text{and} \quad \phi_t = \frac{1}{B_t}\Big( \frac{M_t}{Y_t} - \pi_t \cdot S_t \Big).$$
Note that $\phi_t B_t + \pi_t \cdot S_t = M_t/Y_t$ and that (after some tedious algebra, using $\mu_t = r_t \mathbf{1} + \sigma_t \lambda_t$)
$$\phi_t\,dB_t + \pi_t \cdot dS_t = d\Big(\frac{M_t}{Y_t}\Big).$$
This means $H = (\phi, \pi)$ is a self-financing strategy and
$$H_t \cdot P_t = \frac{M_t}{Y_t} \quad \text{for all } 0 \le t \le T.$$
It is admissible since $M_t \ge 0$, and its wealth process $X_t = H_t \cdot P_t$ satisfies $X_T = M_T/Y_T = \xi_T$ and $X_0 = E(\xi_T Y_T)/Y_0$ as desired. □

If we consider the equation $\sigma_t \lambda_t = \mu_t - r_t \mathbf{1}$ where $\sigma_t$ is a $d \times m$ matrix, one expects from the rules of linear algebra for there to be no solution if $m < d$, exactly one solution if $m = d$, and many solutions if $m > d$. Of course, this is not a theorem, just a rule of thumb. Financially, the rule of thumb becomes:

$m < d$ '⇒' The market has arbitrage.
$m = d$ '⇒' The market has no arbitrage and is complete.
$m > d$ '⇒' The market has no arbitrage and is incomplete.
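To make the case $m = d$ concrete, the following is a minimal sketch that solves $\sigma \lambda = \mu - r\mathbf{1}$ for a market with two stocks and two Brownian motions. The numerical values of `sigma`, `mu` and `r`, and the helper names `solve_2x2` and `market_price_of_risk`, are illustrative assumptions and do not come from the notes.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if det == 0.0:
        # Singular sigma: the rule of thumb for m = d does not apply.
        raise ValueError("sigma is singular: no unique market price of risk")
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - b[0] * a21) / det
    return [x1, x2]


def market_price_of_risk(sigma, mu, r):
    """Return lambda solving sigma lambda = mu - r 1 when m = d = 2."""
    excess = [m_i - r for m_i in mu]
    return solve_2x2(sigma, excess)


# Hypothetical constant coefficients (assumed for illustration only).
sigma = [[0.20, 0.00],
         [0.06, 0.08]]
mu = [0.07, 0.05]
r = 0.02

lam = market_price_of_risk(sigma, mu, r)

# Residual of sigma lambda - (mu - r 1); it should vanish up to rounding.
residual = [
    sum(sigma[i][j] * lam[j] for j in range(2)) - (mu[i] - r)
    for i in range(2)
]
print(lam, residual)
```

When $m < d$ the analogous system is overdetermined and typically has no solution, and when $m > d$ it is underdetermined, which is the linear-algebra content of the rule of thumb.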

6. The Black–Scholes model and formula

We will consider the simplest possible model of the type introduced above. Consider the case of a market with two assets. We will assume that all coefficients are constant, so the price dynamics are given by the pair of equations
$$dB_t = B_t r\,dt$$
$$dS_t = S_t(\mu\,dt + \sigma\,dW_t)$$
for real constants $r, \mu, \sigma$ where $\sigma > 0$. We will assume that the filtration is generated by the scalar Brownian motion $W$. This is often called the Black–Scholes model.

We are interested in finding the replication cost of a European contingent claim with payout $\xi_T = g(S_T)$, where $g$ is a given function which we assume to be non-negative and suitably integrable. We know from before that the unique state price density with $Y_0 = 1$ is given by
$$Y_t = e^{-(r + \lambda^2/2)t - \lambda W_t}$$
where $\lambda = (\mu - r)/\sigma$. Hence, from our existence result there is a trading strategy $H$ which replicates the payout with time-$t$ cost
$$X_t = \frac{1}{Y_t} E[Y_T\, g(S_T) \,|\, \mathcal{F}_t].$$
This is where we see the advantage of working with equivalent martingale measures rather than state price densities. Indeed, define the equivalent martingale measure $Q$ by the density
$$\frac{dQ}{dP} = e^{-\lambda^2 T/2 - \lambda W_T}$$
and recall that by the Cameron–Martin–Girsanov theorem the process $\hat{W}_t = W_t + \lambda t$ is a $Q$-Brownian motion. The price of the stock can be written explicitly:
$$S_t = S_0\, e^{(\mu - \sigma^2/2)t + \sigma W_t} = S_0\, e^{(r - \sigma^2/2)t + \sigma \hat{W}_t}$$
and hence
$$H_t \cdot P_t = e^{-r(T-t)} E^Q\Big[ g\big( S_0\, e^{(r - \sigma^2/2)T + \sigma \hat{W}_T} \big) \,\Big|\, \mathcal{F}_t \Big]$$
$$= e^{-r(T-t)} E^Q\Big[ g\big( S_t\, e^{(r - \sigma^2/2)(T-t) + \sigma(\hat{W}_T - \hat{W}_t)} \big) \,\Big|\, \mathcal{F}_t \Big]$$
$$= e^{-r(T-t)} \int_{-\infty}^{\infty} g\big( S_t\, e^{(r - \sigma^2/2)(T-t) + \sigma\sqrt{T-t}\,z} \big) \frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz.$$

A famous example is the case of the European call option where the payout function is of the form $g(S) = (S - K)^+$. In this case, we have the Nobel-prize-winning Black–Scholes formula:
$$C_t(T, K) = S_t\, \Phi\Big( -\frac{\log(K/S_t)}{\sigma\sqrt{T-t}} + (r/\sigma + \sigma/2)\sqrt{T-t} \Big) - K e^{-r(T-t)}\, \Phi\Big( -\frac{\log(K/S_t)}{\sigma\sqrt{T-t}} + (r/\sigma - \sigma/2)\sqrt{T-t} \Big)$$
where $\Phi(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\,dy$ is the standard normal distribution function. (You are asked to derive this formula on Example Sheet 3.)

We have argued that the martingale representation theorem asserts the existence of a replicating strategy $H$, but unfortunately, it gives us no information about how to compute $H$. This problem will be tackled in the next section.

7. Markovian markets and the Black–Scholes PDE

We now have a sufficient condition that a contingent claim can be replicated. However, at this stage we can only assert the existence of a replicating strategy for a given claim, but we do not yet know how to actually compute it. This problem is the subject of this section.

The first step is to pose a model for the asset prices $(B_t, S_t)_{t \ge 0}$. A good model should give a reasonable statistical fit to the actual market data. Furthermore, a useful model is one in which the prices and hedges of contingent claims can be computed reasonably easily. In this section, we will study models in which the asset prices are Markov processes. These models are useful in the above sense, though there seems to be some controversy over how well they fit actual market data.

Now suppose that the $d + 1$ assets have Itô dynamics which can be expressed as
$$dB_t = B_t\, r(t, S_t)\,dt$$
$$dS_t = \mathrm{diag}(S_t)\big( \mu(t, S_t)\,dt + \sigma(t, S_t)\,dW_t \big)$$
where the non-random functions $r : [0, \infty) \times \mathbb{R}^d \to \mathbb{R}$, $\mu : [0, \infty) \times \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : [0, \infty) \times \mathbb{R}^d \to \mathbb{R}^{d \times m}$ are given. Notice that this is a special case of the set-up of the last section, as now (with an abuse of notation) $r_t(\omega) = r(t, S_t(\omega))$, $\mu_t(\omega) = \mu(t, S_t(\omega))$, and $\sigma_t(\omega) = \sigma(t, S_t(\omega))$.
In this special situation, the asset prices $(S_t)_{t \ge 0}$ are a $d$-dimensional Markov process.

The next theorem says how to find a replicating strategy for a contingent claim maturing at time $T$ with payout $\xi_T = g(S_T)$ for some non-random function $g : \mathbb{R}^d \to [0, \infty)$.

Theorem. Suppose the function $V : [0, T] \times \mathbb{R}^d \to [0, \infty)$ satisfies the partial differential equation
$$\frac{\partial V}{\partial t} + \sum_{i=1}^d r S^i \frac{\partial V}{\partial S^i} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d a^{i,j} S^i S^j \frac{\partial^2 V}{\partial S^i \partial S^j} = rV$$
$$V(T, S) = g(S)$$
where $a = \sigma\sigma^\top$, and where all functions in the PDE are evaluated at the same point $(t, S) \in [0, T) \times \mathbb{R}^d$. Then there exists an admissible strategy $H$ such that $H_t \cdot P_t = V(t, S_t)$. In particular, this strategy replicates the contingent claim with payout $g(S_T)$. Furthermore, if $H = (\phi, \pi)$ then the strategy can be calculated as
$$\pi_t = \mathrm{grad}\, V(t, S_t) = \Big( \frac{\partial V}{\partial S^1}(t, S_t), \ldots, \frac{\partial V}{\partial S^d}(t, S_t) \Big)$$
and
$$\phi_t = \frac{V(t, S_t) - \pi_t \cdot S_t}{B_t}.$$

The above theorem says that if the market model is Markovian, the price (i.e. replication cost) of a claim contingent on the future risky asset prices can be written as a deterministic function $V$ of the current market prices. Furthermore, the pricing function $V$ can be found by solving a certain linear parabolic partial differential equation [2] with terminal data to match the payout of the claim. Solving this equation may be difficult to do by hand, but it can usually be done by computer if the dimension $d$ is reasonably small. And most importantly for the banker selling such a contingent claim: the replicating portfolio $\pi_t$ can be calculated as the gradient of the pricing function $V$ with respect to the spatial variables, evaluated at time $t$ and current price $S_t$.

[2] Sometimes called the Feynman–Kac PDE. If $r = 0$, the PDE reduces to the (backward) Kolmogorov equation.

Proof. By Itô's formula we have
$$dV(t, S_t) = \frac{\partial V}{\partial t}\,dt + \sum_i \frac{\partial V}{\partial S^i}\,dS^i_t + \frac{1}{2} \sum_{i,j} \frac{\partial^2 V}{\partial S^i \partial S^j}\,d\langle S^i, S^j \rangle_t$$
$$= \Big( \frac{\partial V}{\partial t} + \frac{1}{2} \sum_{i,j} S^i S^j a^{ij} \frac{\partial^2 V}{\partial S^i \partial S^j} \Big) dt + \sum_i \frac{\partial V}{\partial S^i}\,dS^i_t$$
$$= r \Big( V - \sum_i S^i \frac{\partial V}{\partial S^i} \Big) dt + \sum_i \frac{\partial V}{\partial S^i}\,dS^i_t$$
where we have used the assumption that $V$ solves the PDE to go from the second to the third line above. Now letting $\phi$ and $\pi$ be as in the statement of the theorem we have that
$$V(t, S_t) = \phi_t B_t + \pi_t \cdot S_t$$
$$dV(t, S_t) = \phi_t\,dB_t + \pi_t \cdot dS_t.$$
Hence $H = (\phi, \pi)$ is a self-financing strategy with associated wealth process $X_t(H) = V(t, S_t)$ as claimed. It is admissible since $V \ge 0$ by assumption. □

We have seen that there are two distinct ways to find replication costs for certain contingent claims: by computing expectations or by solving a PDE. Furthermore, the PDE method also gives the replicating portfolio. But how do you solve the PDE? In many cases, the easiest way to solve the PDE is to compute the expectations. This is illustrated by the Black–Scholes model:

Example (Black–Scholes continued). Let's return to the Black–Scholes model
$$dB_t = B_t r\,dt$$
$$dS_t = S_t(\mu\,dt + \sigma\,dW_t)$$
with constant coefficients $r, \sigma, \mu$. If we would like to replicate a claim with payout $g(S_T)$, the previous theorem says we should solve the Black–Scholes PDE
$$\frac{\partial V}{\partial t} + rS \frac{\partial V}{\partial S} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} = rV$$
$$V(T, S) = g(S).$$
Now, let's specialise to the case of the call option where $g(S) = (S - K)^+$. From last section we have
$$V(t, S) = S\, \Phi\Big( -\frac{\log(K/S)}{\sigma\sqrt{T-t}} + (r/\sigma + \sigma/2)\sqrt{T-t} \Big) - K e^{-r(T-t)}\, \Phi\Big( -\frac{\log(K/S)}{\sigma\sqrt{T-t}} + (r/\sigma - \sigma/2)\sqrt{T-t} \Big).$$
The delta, i.e. the replicating portfolio, in this case is (by a miracle of algebra)
$$\frac{\partial V}{\partial S}(t, S) = \Phi\Big( -\frac{\log(K/S)}{\sigma\sqrt{T-t}} + (r/\sigma + \sigma/2)\sqrt{T-t} \Big).$$
Note that an agent attempting to replicate a call option using the Black–Scholes theory will always hold a fraction of shares of the underlying stock between 0 and 1. Also note that the gamma, i.e. the sensitivity of the replicating portfolio to the price of the underlying, is given by the formula
$$\frac{\partial^2 V}{\partial S^2}(t, S) = \frac{1}{S\sigma\sqrt{T-t}}\, \phi\Big( -\frac{\log(K/S)}{\sigma\sqrt{T-t}} + (r/\sigma + \sigma/2)\sqrt{T-t} \Big)$$
where $\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. Since the gamma is always positive, the hedger will buy more shares of the underlying if the price goes up.
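The call price, delta and gamma formulas of this example are easy to check numerically. The sketch below evaluates them with the standard library only and compares the closed-form delta and gamma against central finite differences of the price. The function names and the parameter values are illustrative assumptions; `tau` stands for the time to maturity $T - t$.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density phi."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def d_plus(S, K, r, sigma, tau):
    """Argument of Phi in the first term of the call formula."""
    root = math.sqrt(tau)
    return -math.log(K / S) / (sigma * root) + (r / sigma + sigma / 2.0) * root


def call_price(S, K, r, sigma, tau):
    """Black-Scholes call price V(t, S) with tau = T - t."""
    d1 = d_plus(S, K, r, sigma, tau)
    d2 = d1 - sigma * math.sqrt(tau)  # same argument with -sigma/2
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def call_delta(S, K, r, sigma, tau):
    """Replicating portfolio dV/dS."""
    return norm_cdf(d_plus(S, K, r, sigma, tau))


def call_gamma(S, K, r, sigma, tau):
    """Second derivative d^2V/dS^2."""
    return norm_pdf(d_plus(S, K, r, sigma, tau)) / (S * sigma * math.sqrt(tau))


# Illustrative parameters (assumed, not taken from the notes).
S, K, r, sigma, tau = 100.0, 100.0, 0.05, 0.2, 1.0
h = 1e-2  # bump size for the finite differences

price = call_price(S, K, r, sigma, tau)
up = call_price(S + h, K, r, sigma, tau)
down = call_price(S - h, K, r, sigma, tau)
fd_delta = (up - down) / (2.0 * h)
fd_gamma = (up - 2.0 * price + down) / (h * h)

print(price, call_delta(S, K, r, sigma, tau), fd_delta)
print(call_gamma(S, K, r, sigma, tau), fd_gamma)
```

The agreement between the analytic and bumped values is a quick sanity check that the delta really is the first term's $\Phi$ factor, which is the "miracle of algebra" mentioned above.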

8. Black–Scholes volatility

What made the Black–Scholes formula so popular after its publication in 1973 is the fact that the right-hand side depends only on six quantities: the current calendar time $t$, the option's maturity time $T$, the option's strike $K$, the spot interest rate $r$, the underlying stock's price $S_t$ at time $t$, and a volatility parameter $\sigma$. Of these six numbers, only the volatility parameter is neither specified by the option contract nor quoted in the market. To use the Black–Scholes formula to find the price of real call options, one must first estimate the volatility $\sigma$.

8.1. Estimation: statistics. In the Black–Scholes model, the drift $\mu$ and volatility $\sigma$ are not directly observable. Nevertheless, they can be estimated by appealing to standard statistical theory. Suppose that we have observed the stock price $(S_t)_{-T \le t \le 0}$. If we sample at times $t_i = (i/n - 1)T$, we see that the $n$ random variables $Y_i = \log(S_{t_i}/S_{t_{i-1}})$ are independent with distribution
$$Y_i = (\mu - \sigma^2/2)(t_i - t_{i-1}) + \sigma(W_{t_i} - W_{t_{i-1}}) \sim N(aT/n,\ \sigma^2 T/n)$$
where $a = \mu - \sigma^2/2$. The maximum likelihood estimator of $a$ is
$$\hat{a} = \frac{1}{T} \sum_{i=1}^n Y_i$$
and of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{1}{T} \sum_{i=1}^n (Y_i - \hat{a}T/n)^2.$$
Notice that this estimator $\hat{a}$ can be rewritten as
$$\hat{a} = \frac{1}{T} \log(S_0/S_{-T}),$$
and hence does not depend on $n$! That is to say, there is no advantage in going to higher and higher frequency data to estimate the drift $\mu$. Fortunately, a careful reading of the previous section shows that the drift parameter $\mu$ is not needed to find either the replication cost or the replicating strategy. This is good news for the Black–Scholes theory. [3]

On the other hand, the variance of $\hat{\sigma}^2$ is $2(n-1)\sigma^4/n^2 \approx 2\sigma^4/n \to 0$ as $n \to \infty$. Hence, there is some hope of accurately estimating the volatility parameter by sampling the historical stock prices frequently enough. If one were to truly believe that the stock price is a geometric Brownian motion, that is, of the form $S_t = S_0 e^{at + \sigma W_t}$, then one could insert the value $\hat{\sigma}^2$ into the Black–Scholes formula to obtain the price of a call option. Notice that we have done the statistics under the objective measure $P$, not the equivalent martingale measure $Q$.

[3] However, this is very bad news for optimal investment. For instance, consider the problem of maximising $E \log(X_T)$ over all admissible trading strategies. It turns out that the optimal fraction of wealth to hold in the stock is given by
$$\frac{\pi^*_t S_t}{X^*_t} = \frac{\mu - r}{\sigma^2}.$$
However, this formula is useless unless the parameters on the right-hand side can be estimated accurately.
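The contrast between estimating the drift and estimating the volatility can be seen in a small simulation. The sketch below draws log-returns of a geometric Brownian motion on a fine grid, computes the two maximum likelihood estimators of Section 8.1, and then recomputes them after aggregating the same path to a coarser grid. The parameter values, the seed and the function names are illustrative assumptions.

```python
import math
import random


def simulate_log_returns(a, sigma, T, n, rng):
    """Draw n i.i.d. log-returns Y_i ~ N(a T/n, sigma^2 T/n)."""
    dt = T / n
    return [a * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) for _ in range(n)]


def mle(y, T):
    """Maximum likelihood estimators (a_hat, sigma2_hat) from log-returns y."""
    n = len(y)
    a_hat = sum(y) / T
    sigma2_hat = sum((y_i - a_hat * T / n) ** 2 for y_i in y) / T
    return a_hat, sigma2_hat


# Hypothetical model parameters (assumed for illustration only).
mu, sigma, T = 0.08, 0.2, 1.0
a = mu - 0.5 * sigma ** 2

rng = random.Random(0)
y_fine = simulate_log_returns(a, sigma, T, 20000, rng)

# Same price path observed 100 times less often: log-returns simply add up.
y_coarse = [sum(y_fine[k:k + 100]) for k in range(0, len(y_fine), 100)]

a_fine, var_fine = mle(y_fine, T)
a_coarse, var_coarse = mle(y_coarse, T)

print(a_fine, a_coarse)      # identical up to rounding: both equal log(S_0/S_{-T})/T
print(var_fine, var_coarse)  # the fine-grid estimate is typically closer to sigma^2
```

The drift estimate is the same on both grids because it only sees the endpoints of the path, while the volatility estimate benefits from the extra observations.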

8.2. Calibration: implied volatility. A completely different approach to finding the volatility parameter is to observe the prices of contingent claims from the market, and then try to work out which $\sigma$ to put into the Black–Scholes formula to get the right price. The Black–Scholes formula says that in the context of a Black–Scholes model the call price is given by
$$C_t(T, K) = C^{BS}(t, T, K, S_t, r, \sigma) \qquad (*)$$
for an explicit function $C^{BS}$ written out in the previous section. But in reality we do not know $\sigma$ but can observe the call prices. Therefore, rather than compute the call price from the parameters, we turn the story around by defining the implied volatility of the option to be the unique number $\sigma$ such that equation (*) holds. We denote by $\Sigma_t(T, K)$ the implied volatility at time $t$ of an option with maturity $T$ and strike $K$.

If the market were pricing call options by the Black–Scholes formula, then there would exist one parameter $\sigma$ such that $\Sigma_t(T, K) = \sigma$ for all $0 \le t < T$ and $K > 0$. However, in real-world markets, it is usually the case that the implied volatility surface $(T, K) \mapsto \Sigma_t(T, K)$ is not flat. Indeed, for fixed $T$, the graph of the function $K \mapsto \Sigma_t(T, K)$ often resembles a convex parabola [4], at least for strikes $K$ close to the money, i.e. such that $K e^{-r(T-t)}/S_t \approx 1$. That is why practitioners refer to the function $K \mapsto \Sigma_t(T, K)$ as the implied volatility smile or smirk.

[4] ...but be careful: for large $K$, the graph can grow no faster than $\sqrt{2 \log K/(T - t)}$. See example sheet 4.

One could either conclude that the Black–Scholes model is the true model of the stock price and that the market is mispricing options, or that the Black–Scholes model does not quite match reality. The second approach is more prudent. Then, why even consider implied volatility? As Rebonato famously put it: implied volatility is the wrong number to put into the wrong formula to obtain the correct price. However, thanks to the enormous influence of the Black–Scholes theory, the implied volatility is now used as a common language to quote option prices.

8.3. Robustness of Black–Scholes - NOT LECTURED!. As argued above, since real markets tend to exhibit implied volatility smiles, the Black–Scholes model cannot be considered an adequate description of how stock prices fluctuate. However, it should be considered an approximation of reality, and we will now do a calculation to quantify how good this approximation is.

Suppose a banker wants to sell a contingent claim with payout $\xi_T = g(S_T)$.
The banker believes that the underlying stock prices are given by the Black–Scholes model, so that the initial price of the claim is given by $V(0, S_0, \sigma)$ where
$$V(t, S, \sigma) = e^{-r(T-t)} \int_{-\infty}^{\infty} g\big( S\, e^{(r - \sigma^2/2)(T-t) + \sigma\sqrt{T-t}\,z} \big) \frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz,$$
for some $\sigma$ to be determined. Now, the claim is already traded on the market with initial price $\xi_0$, so the banker chooses a $\sigma = \hat{\sigma}$ such that $V(0, S_0, \hat{\sigma}) = \xi_0$, i.e. $\hat{\sigma}$ is the initial implied volatility for the claim.

Now, the banker wants to hedge away the liability associated with the payout of the claim, so again believing the Black–Scholes theory, he puts the initial wealth of $X_0 = \xi_0$ in his account and holds a portfolio of
$$\pi_t = \frac{\partial V}{\partial S}(t, S_t, \hat{\sigma})$$
shares of the stock at all times. His wealth then evolves as
$$dX_t = r(X_t - \pi_t S_t)\,dt + \pi_t\,dS_t. \qquad (*)$$
The banker knows that according to the Black–Scholes theory his strategy should replicate the claim: $X_T = g(S_T)$ a.s. Suppose, however, that the true dynamics of the market are given by
$$dB_t = B_t r\,dt$$
$$dS_t = S_t(\mu\,dt + \sigma_t\,dW_t)$$
where $r$ and $\mu$ are the same constants as before, but now $(\sigma_t)_{t \ge 0}$ is some predictable process. How big is the hedging error $X_T - g(S_T)$? First note that $V$ solves the Black–Scholes PDE:
$$\frac{\partial V}{\partial t} + rS \frac{\partial V}{\partial S} + \frac{1}{2} \hat{\sigma}^2 S^2 \frac{\partial^2 V}{\partial S^2} = rV$$
$$V(T, S) = g(S).$$
Now note that by Itô's formula and the Black–Scholes PDE
$$dV(t, S_t, \hat{\sigma}) = \frac{\partial V}{\partial t}\,dt + \frac{\partial V}{\partial S}\,dS_t + \frac{1}{2} \frac{\partial^2 V}{\partial S^2}\,d\langle S \rangle_t$$
$$= rV\,dt + \pi_t(dS_t - rS_t\,dt) + \frac{1}{2} S_t^2 (\sigma_t^2 - \hat{\sigma}^2) \frac{\partial^2 V}{\partial S^2}\,dt.$$
Subtracting this equation from equation (*) and solving the resulting linear equation for $X_t - V(t, S_t, \hat{\sigma})$ yields
$$\frac{1}{2} \int_0^T e^{r(T-t)} (\hat{\sigma}^2 - \sigma_t^2) S_t^2 \frac{\partial^2 V}{\partial S^2}(t, S_t, \hat{\sigma})\,dt = X_T - V(T, S_T, \hat{\sigma}) - e^{rT}\big( X_0 - V(0, S_0, \hat{\sigma}) \big) = X_T - g(S_T)$$
since $X_0 = \xi_0$ by assumption and $\xi_0 = V(0, S_0, \hat{\sigma})$ by the definition of $\hat{\sigma}$.

The above formula shows that the naive Black–Scholes hedger does reasonably well in a world where the implied volatility is close to the actual spot volatility. For many claims, such as call options, the gamma $\frac{\partial^2 V}{\partial S^2}$ is positive. Therefore, the naive hedger's strategy may fall short if the implied volatility is smaller than the realised spot volatility.

9. Local volatility models

In the previous section we have considered the Black–Scholes model, a two-asset market model in which the risky asset price is a geometric Brownian motion. The Black–Scholes formula gives an explicit representation of the prices $C_t(T, K)$ of call options in this model in terms of the calendar time $t$, the current stock price $S_t$, spot interest rate $r$, the option maturity $T$ and strike $K$, and a volatility parameter $\sigma$.
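Because the Black–Scholes call price is strictly increasing in $\sigma$, the implied volatility $\Sigma_t(T, K)$ of Section 8.2 can be computed by a one-dimensional root search. The following is a minimal sketch using bisection. The bracket `[lo, hi]`, the tolerance and the function names are assumptions made for illustration, and a production implementation would more likely use a faster root finder.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes call price with time to maturity tau = T - t."""
    root = math.sqrt(tau)
    d1 = -math.log(K / S) / (sigma * root) + (r / sigma + sigma / 2.0) * root
    d2 = d1 - sigma * root
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def implied_vol(price, S, K, r, tau, lo=1e-6, hi=5.0, tol=1e-10):
    """Solve bs_call(S, K, r, sigma, tau) = price for sigma by bisection.

    Assumes the target price lies strictly between the model prices at
    the (arbitrarily chosen) volatility bounds lo and hi.
    """
    if not bs_call(S, K, r, lo, tau) < price < bs_call(S, K, r, hi, tau):
        raise ValueError("price is not attainable for sigma in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, mid, tau) < price:
            lo = mid  # model price too low: volatility must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip on hypothetical inputs: price with a known sigma, then invert.
S, K, r, tau = 100.0, 110.0, 0.05, 1.0
true_sigma = 0.35
quoted = bs_call(S, K, r, true_sigma, tau)
recovered = implied_vol(quoted, S, K, r, tau)
print(recovered)
```

Applied strike by strike to observed market prices, the same routine would trace out the smile $K \mapsto \Sigma_t(T, K)$ discussed earlier.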

However, since the implied volatility surface $\Sigma_t(T, K)$ of real-world option prices is usually not flat, practitioners and researchers have proposed various generalisations of the Black–Scholes model to better match the observed implied volatility surface. We now consider another Markovian model which can match a given implied volatility surface exactly.

We consider a model given by
$$dB_t = B_t r\,dt$$
$$dS_t = S_t\big( \mu\,dt + \sigma(t, S_t)\,dW_t \big).$$
That is, the idea is to replace the constant volatility parameter in the Black–Scholes model with a local volatility function $\sigma : [0, \infty) \times (0, \infty) \to (0, \infty)$. We will assume that $\sigma$ is smooth and bounded from below and above. As always, let $Q$ be the equivalent martingale measure with density
$$\frac{dQ}{dP} = e^{-\frac{1}{2} \int_0^T \lambda_s^2\,ds - \int_0^T \lambda_s\,dW_s}$$
where $\lambda_t = (\mu - r)/\sigma(t, S_t)$. Recall that by Girsanov's theorem $d\hat{W}_t = dW_t + \lambda_t\,dt$ defines a $Q$-Brownian motion. The next theorem in the present context is usually attributed to Dupire's 1994 paper.

Theorem. Suppose that $C_0(T, K) = E^Q[e^{-rT}(S_T - K)^+]$. Then
$$\frac{\partial C_0}{\partial T}(T, K) + rK \frac{\partial C_0}{\partial K}(T, K) = \frac{\sigma(T, K)^2}{2} K^2 \frac{\partial^2 C_0}{\partial K^2}(T, K).$$

Remark. We have already seen a PDE for the replication cost of options in Markovian models. In that PDE, the solution $V(t, S_t)$ was the time-$t$ value of a replication strategy for the given claim, and the derivatives were with respect to the calendar time $t$ and the current price of the underlying asset $S_t$. In contrast, Dupire's PDE is for the initial replication cost of a call option, and the derivatives are with respect to the maturity date $T$ and the strike $K$.

Remark. The point of the above theorem is this: Suppose you believe that the stock price is generated by a local volatility model, but you do not know what the local volatility function is. If you can observe today's call price surface $\{C_0(T, K) : T > 0, K > 0\}$ then you can solve for the local volatility in Dupire's PDE to arrive at Dupire's formula
$$\sigma(T, K) = \left( \frac{2\big[ \frac{\partial C_0}{\partial T}(T, K) + rK \frac{\partial C_0}{\partial K}(T, K) \big]}{K^2 \frac{\partial^2 C_0}{\partial K^2}(T, K)} \right)^{1/2}.$$
Furthermore, assuming Dupire's PDE has a unique solution (it will if $\sigma$ is smooth and bounded as assumed) then we have found a model that can reproduce the observed call prices. Of course, plugging the call surface $C_0(T, K) = C^{BS}(t = 0, T, K, S_0, r, \sigma_0)$ into Dupire's formula yields
$$\left( \frac{2\big[ \frac{\partial C_0}{\partial T}(T, K) + rK \frac{\partial C_0}{\partial K}(T, K) \big]}{K^2 \frac{\partial^2 C_0}{\partial K^2}(T, K)} \right)^{1/2} = \sigma_0,$$
as it should. In general, however, the local volatility surface need not be flat.
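The claim that a Black–Scholes call surface is mapped back to the constant $\sigma_0$ by Dupire's formula can be checked numerically. The sketch below approximates the three partial derivatives by central finite differences. The step sizes `hT` and `hK`, the parameter values and the function names are illustrative assumptions, and on real market data the derivatives would have to be estimated from a smoothed, arbitrage-free interpolation of quoted prices rather than from a closed-form surface.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S0, K, r, sigma, T):
    """Initial Black-Scholes call price C^BS(0, T, K, S0, r, sigma)."""
    root = math.sqrt(T)
    d1 = -math.log(K / S0) / (sigma * root) + (r / sigma + sigma / 2.0) * root
    d2 = d1 - sigma * root
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def dupire_local_vol(call, T, K, r, hT=1e-4, hK=0.1):
    """Dupire's formula with central finite differences.

    `call` is any function (T, K) -> C_0(T, K).
    """
    dC_dT = (call(T + hT, K) - call(T - hT, K)) / (2.0 * hT)
    dC_dK = (call(T, K + hK) - call(T, K - hK)) / (2.0 * hK)
    d2C_dK2 = (call(T, K + hK) - 2.0 * call(T, K) + call(T, K - hK)) / hK ** 2
    return math.sqrt(2.0 * (dC_dT + r * K * dC_dK) / (K ** 2 * d2C_dK2))


# Hypothetical flat surface generated by a Black-Scholes model.
S0, r, sigma0 = 100.0, 0.05, 0.2


def surface(T, K):
    return bs_call(S0, K, r, sigma0, T)


local_vols = [dupire_local_vol(surface, T, K, r)
              for (T, K) in [(1.0, 100.0), (0.5, 90.0), (2.0, 120.0)]]
print(local_vols)  # each entry should be close to sigma0
```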

Sketch of proof of Dupire's formula. To outline the argument, we proceed formally:
$$(S_T - K)^+ = (S_0 - K)^+ + \int_0^T \mathbf{1}_{\{S_t \ge K\}}\,dS_t + \frac{1}{2} \int_0^T \delta_K(S_t)\,d\langle S \rangle_t$$
$$= (S_0 - K)^+ + \int_0^T \Big( \mathbf{1}_{\{S_t \ge K\}} S_t\, r + \frac{1}{2} \delta_K(S_t) S_t^2 \sigma(t, S_t)^2 \Big) dt + \int_0^T \mathbf{1}_{\{S_t \ge K\}} S_t\, \sigma(t, S_t)\,d\hat{W}_t$$
where we have appealed to Itô's formula [5] with $g(x) = (x - K)^+$, $g'(x) = \mathbf{1}_{[K, \infty)}(x)$, and $g''(x) = \delta_K(x)$, the Dirac delta 'function'. Now, by the assumption of smoothness and the bounds on the volatility function, for each $t$ the $Q$-law of the random variable $S_t$ has a density function $f_t$. Computing expected values of both sides (and assuming the stochastic integral has mean zero)
$$e^{rT} C_0(T, K) = (S_0 - K)^+ + \int_0^T \int_K^\infty f_t(y)\, y\, r\,dy\,dt + \frac{1}{2} \int_0^T f_t(K) K^2 \sigma(t, K)^2\,dt \qquad (1)$$
and then differentiating both sides with respect to $T$ yields
$$e^{rT} \Big( \frac{\partial C_0}{\partial T}(T, K) + r C_0(T, K) \Big) = \int_K^\infty f_T(y)\, y\, r\,dy + \frac{1}{2} f_T(K) K^2 \sigma(T, K)^2.$$
Now we use the following Breeden–Litzenberger identities
$$e^{rT} C_0(T, K) = \int_K^\infty f_T(y)\, y\,dy - K \int_K^\infty f_T(y)\,dy$$
$$e^{rT} \frac{\partial C_0}{\partial K}(T, K) = -\int_K^\infty f_T(y)\,dy$$
$$e^{rT} \frac{\partial^2 C_0}{\partial K^2}(T, K) = f_T(K)$$
to finish the argument.
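For completeness, the final substitution can be spelled out as follows. This is a sketch of the omitted algebra, using only the three Breeden–Litzenberger identities and the differentiated equation above; all functions are evaluated at $(T, K)$.

```latex
% Combine the first two identities to express the integral of y f_T(y):
\int_K^\infty f_T(y)\, y \, dy
  = e^{rT} C_0 + K \int_K^\infty f_T(y)\, dy
  = e^{rT}\Big( C_0 - K \frac{\partial C_0}{\partial K} \Big).

% Substitute this and the third identity into the differentiated equation:
e^{rT}\Big( \frac{\partial C_0}{\partial T} + r C_0 \Big)
  = r\, e^{rT}\Big( C_0 - K \frac{\partial C_0}{\partial K} \Big)
  + \frac{1}{2}\, e^{rT}\, \frac{\partial^2 C_0}{\partial K^2}\, K^2 \sigma(T,K)^2 .

% Divide by e^{rT} and cancel the term r C_0 on both sides:
\frac{\partial C_0}{\partial T} + r K \frac{\partial C_0}{\partial K}
  = \frac{\sigma(T,K)^2}{2}\, K^2\, \frac{\partial^2 C_0}{\partial K^2}.
```

The last line is exactly Dupire's PDE as stated in the theorem.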



9.1. Moment generating functions for stochastic volatility. In this section we revisit the computation of call prices as a Fourier integral. To apply this technique, we need to be able to compute the moment generating function for some interesting models. We first consider a general stochastic volatility model:
$$dB_t = B_t r\,dt$$
$$dS_t = S_t(r\,dt + \sqrt{v_t}\,dW^S_t)$$
$$dv_t = A(v_t)\,dt + B(v_t)\,dW^v_t.$$
Here $W^S$ and $W^v$ are assumed to be correlated Brownian motions in a fixed equivalent martingale measure $Q$, with correlation $\rho$. Correlated Brownian motions can be constructed, for instance, by letting $W^v$ and $W^\perp$ be independent Brownian motions and setting
$$W^S_t = \rho W^v_t + \sqrt{1 - \rho^2}\, W^\perp_t.$$

[5] A version of Itô's formula for non-smooth convex functions, called Tanaka's formula, can actually be rigorously stated in terms of a quantity called local time.

Theorem. For each $\theta \in \Theta$, let $F(\cdot, \cdot; \theta)$ solve the PDE
$$\frac{\partial F}{\partial t} + [\theta r + (\theta^2 - \theta)v/2] F + (A + \theta\sqrt{v} B \rho) \frac{\partial F}{\partial v} + \frac{1}{2} B^2 \frac{\partial^2 F}{\partial v^2} = 0$$
with boundary condition $F(T, v; \theta) = 1$. Then $(M_t)_{0 \le t \le T}$ is a local martingale, where
$$M_t = e^{\theta \log S_t} F(t, v_t; \theta).$$

Proof. This is just another application of Itô's formula.



The significance of this result is that if we can prove the local martingale is a true martingale, then
$$E^Q[e^{\theta \log S_T}] = e^{\theta \log S_0} F(0, v_0; \theta)$$
and hence we have found the moment generating function. To use this result, we need to solve a PDE in one spatial variable. Since the PDE for the option prices would involve two spatial variables $(v, S)$, we are in a better position finding the moment generating function first via the above theorem, though we still need to evaluate the Bromwich integral.

9.2. The Heston model. We now explore a model where the moment generating function can be computed explicitly. It was introduced by Heston in 1993:
$$dB_t = B_t r\,dt$$
$$dS_t = S_t(r\,dt + \sqrt{v_t}\,dW^S_t)$$
$$dv_t = \lambda(\bar{v} - v_t)\,dt + c\sqrt{v_t}\,dW^v_t$$
with $\langle W^S, W^v \rangle_t = \rho t$. This is just a special case of the stochastic volatility model in the previous subsection with $A(v) = \lambda(\bar{v} - v)$ and $B(v) = c\sqrt{v}$ for some positive constants $\lambda, \bar{v}, c$. In this model the squared volatility $v$ is a mean-reverting process, i.e. an ergodic Markov process, at least under $Q$. The interpretation of $\bar{v}$ is the level of mean reversion, while $\lambda$ is the speed of mean reversion. We will come across this stochastic process again in the context of the Cox–Ingersoll–Ross rate model. It was first studied by Feller in the 1950s.

The Heston PDE is then
$$\frac{\partial F}{\partial t} + [\theta r + (\theta^2 - \theta)v/2] F + [\lambda\bar{v} + (\theta c \rho - \lambda)v] \frac{\partial F}{\partial v} + \frac{1}{2} c^2 v \frac{\partial^2 F}{\partial v^2} = 0.$$
It turns out that this PDE can be solved explicitly. The trick is to make the ansatz
$$F(t, v; \theta) = e^{R(T-t;\theta)v + Q(T-t;\theta)}.$$
Note that the boundary condition $F(T, v; \theta) = 1$ forces $R(0; \theta) = Q(0; \theta) = 0$. The PDE becomes
$$-\dot{R}v - \dot{Q} + [\theta r + (\theta^2 - \theta)v/2] + [\lambda\bar{v} + (\theta c \rho - \lambda)v] R + \frac{1}{2} c^2 v R^2 = 0,$$

where the dot indicates differentiation with respect to the time variable. Notice that the equation can be written in the form
$$\alpha(T - t; \theta)\,v + \beta(T - t; \theta) = 0.$$
Now, the above equation should hold for all $v$, so $\alpha(T - t; \theta) = 0 = \beta(T - t; \theta)$, i.e.
$$\dot{R} = (\theta^2 - \theta)/2 + (\theta c \rho - \lambda) R + \frac{1}{2} c^2 R^2$$
$$\dot{Q} = \theta r + \lambda\bar{v} R.$$
The equation for $R$ is a Riccati equation which can be solved explicitly. In fact, we do not even have to make any tricky substitutions; separation of variables and partial fractions work well enough:
$$\dot{R} = \frac{1}{2} c^2 (R - R_+)(R - R_-)$$
$$\Rightarrow \frac{\dot{R}}{(R - R_+)(R - R_-)} = \frac{1}{2} c^2$$
$$\Rightarrow \frac{1}{R_+ - R_-}\, \dot{R} \Big( \frac{1}{R - R_+} - \frac{1}{R - R_-} \Big) = \frac{1}{2} c^2$$
$$\Rightarrow \log\Big( \frac{1 - R(\tau)/R_+}{1 - R(\tau)/R_-} \Big) = \gamma\tau$$
$$\Rightarrow R(\tau; \theta) = (\theta^2 - \theta)\, \frac{e^{\gamma(\theta)\tau} - 1}{(\gamma(\theta) - \theta c \rho + \lambda) e^{\gamma(\theta)\tau} + (\gamma(\theta) + \theta c \rho - \lambda)}$$
where $\gamma(\theta) = \sqrt{(\lambda - \theta c \rho)^2 - (\theta^2 - \theta)c^2}$ and $R_\pm(\theta) = [(\lambda - \theta c \rho) \pm \gamma(\theta)]/c^2$, so that $R_+ - R_- = 2\gamma/c^2$ and $R_+ R_- = (\theta^2 - \theta)/c^2$. And the second equation can be solved:
$$Q(\tau; \theta) = \theta r \tau + \lambda\bar{v} \int_0^\tau R(s; \theta)\,ds$$
$$= \Big( \theta r - \frac{(\theta^2 - \theta)\lambda\bar{v}}{\gamma(\theta) + \theta c \rho - \lambda} \Big) \tau - \frac{2\lambda\bar{v}}{c^2} \log\Big( \frac{(\gamma(\theta) - \theta c \rho + \lambda) e^{\gamma(\theta)\tau} + (\gamma(\theta) + \theta c \rho - \lambda)}{2\gamma(\theta)} \Big).$$
It can be shown that for $\theta \in \Theta$
$$E^Q(e^{\theta \log S_T}) = e^{\theta \log S_0 + R(T;\theta)v_0 + Q(T;\theta)}.$$

What is the point of this calculation? Although the formula for the moment generating function is hard to call beautiful, it is very explicit. In particular, given the set of model parameters $(v_0, \lambda, \bar{v}, c, \rho)$, the function can be evaluated very quickly on a computer, and hence the Bromwich integral for call prices can be computed numerically quickly. Hence, it is possible to calibrate the Heston model to market data in a reasonable amount of time. This is one of the main reasons for its popularity.

10. American claims in local volatility models - NOT LECTURED!

This section is not examinable, but you might find it interesting.

We have previously considered American claims in a general complete market in discrete time. Our main tool was the Snell envelope. In this section we will consider American claims in a continuous-time model. We will make a Markovian assumption so that PDE techniques are available.

We work in a market with $d + 1$ assets, a bank account with dynamics
$$dB_t = B_t r\,dt$$
and $d$ stocks with dynamics
$$dS_t = \mathrm{diag}(S_t)\big( \mu\,dt + \sigma(t, S_t)\,dW_t \big)$$
where $r \ge 0$ and $\mu \in \mathbb{R}^d$ are constants, $\sigma : [0, \infty) \times \mathbb{R}^d \to \mathbb{R}^{d \times m}$ is a given function and $W$ is an $m$-dimensional Brownian motion.

Consider the problem faced by a banker who has sold an American contingent claim with maturity $T$ which pays $g(S_t)$ if the claim is exercised at time $t \in [0, T]$, and who wishes to trade in the underlying market to hedge his exposure to the optimally exercised claim. The main result is this:

Theorem. Let $L$ be the differential operator defined by
$$LV = \sum_{i=1}^d r S^i \frac{\partial V}{\partial S^i} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d a^{i,j} S^i S^j \frac{\partial^2 V}{\partial S^i \partial S^j} - rV$$
where $a = \sigma\sigma^\top$. Suppose $V : [0, T] \times \mathbb{R}^d \to [0, \infty)$ solves the variational inequality
$$\max\Big\{ \frac{\partial V}{\partial t} + LV,\ g - V \Big\} = 0$$
$$V(T, S) = g(S).$$
Let $X$ be the wealth process started with $X_0 = V(0, S_0)$ and with $\pi_t = \mathrm{grad}\, V(t, S_t)$ shares of stock at time $t$. Then $X_t \ge g(S_t)$ for all $t \in [0, T]$, and there exists a stopping time $\tau_*$ such that $X_{\tau_*} = g(S_{\tau_*})$.

Remark. To see where this variational inequality comes from, let's consider heuristically the Snell envelope of $Y\xi$, which should satisfy an equation like
$$Z_t = \max\{ Y_t \xi_t,\ E[Z_{t+\delta} \,|\, \mathcal{F}_t] \}$$
where $\delta > 0$ is a small increment of time, the process $\xi$ specifies the payout of the claim and $Y$ is the state price density. First, let $U = Z/Y$ and let $Q$ be the equivalent martingale measure corresponding to $Y$ and the numéraire $B$. Then $U$ should satisfy
$$U_t = \max\{ \xi_t,\ E^Q[e^{-r\delta} U_{t+\delta} \,|\, \mathcal{F}_t] \}.$$
Now, since $\xi_t = g(S_t)$ and $S$ is a Markov process under $Q$, we suspect that $U_t = V(t, S_t)$ for some function $V$. By Itô's formula
$$e^{-r\delta} V(t + \delta, S_{t+\delta}) = V(t, S_t) + \int_t^{t+\delta} e^{-r(s-t)} \Big( \frac{\partial V}{\partial t} + LV \Big)(s, S_s)\,ds + \int_t^{t+\delta} e^{-r(s-t)}\, \mathrm{grad}\, V(s, S_s) \cdot \mathrm{diag}(S_s)\sigma(s, S_s)\,d\hat{W}_s$$
where $\hat{W}$ is a $Q$-Brownian motion. Assuming the stochastic integral is mean-zero, we have
$$V(t, S) = \max\Big\{ g(S),\ V(t, S) + E^Q\Big[ \int_t^{t+\delta} e^{-r(s-t)} \Big( \frac{\partial V}{\partial t} + LV \Big)(s, S_s)\,ds \,\Big|\, S_t = S \Big] \Big\}.$$

Subtracting $V(t, S)$ from both sides, dividing by $\delta$ and sending $\delta \downarrow 0$ yields the variational inequality appearing in the theorem.

Remark. It should not be too surprising that the differential operator $L$ also appeared in our discussion of hedging European contingent claims.

Proof. Let $X$ be the wealth process. As usual we have
$$dX_t = rX_t\,dt + \pi_t \cdot (dS_t - rS_t\,dt).$$
Also by Itô's formula we have
$$dV(t, S_t) = \Big( \frac{\partial V}{\partial t} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d a^{i,j} S^i S^j \frac{\partial^2 V}{\partial S^i \partial S^j} \Big) dt + \pi_t \cdot dS_t$$
and hence by subtraction
$$d(X_t - V(t, S_t)) = \Big[ r(X_t - V(t, S_t)) - \Big( \frac{\partial V}{\partial t} + LV \Big)(t, S_t) \Big] dt. \qquad (*)$$
Since $X_0 = V(0, S_0)$ and $\frac{\partial V}{\partial t} + LV \le 0$ by assumption, equation (*) implies $X_t \ge V(t, S_t)$ for all $t \in [0, T]$. The fact that $X_t \ge g(S_t)$ follows from the assumption that $g - V \le 0$.

Finally, let
$$\tau_* = \inf\{ t \ge 0 : V(t, S_t) = g(S_t) \}.$$
We will show that $X_{\tau_*} = V(\tau_*, S_{\tau_*}) = g(S_{\tau_*})$. Note that on $\{\tau_* = 0\}$ we have $X_{\tau_*} = X_0 = V(0, S_0) = V(\tau_*, S_{\tau_*})$ by assumption. Also, on $\{\tau_* > 0\}$ note that on the interval $t \in [0, \tau_*)$ we have $g - V < 0$, so that $\frac{\partial V}{\partial t} + LV = 0$ at $(t, S_t)$, and hence by equation (*) we have $X_t = V(t, S_t)$ for all $t \in [0, \tau_*]$. □

Remark. We can identify two subsets of $[0, T] \times \mathbb{R}^d$ as follows:
$$\mathcal{S} = \{ (t, S) : g(S) - V(t, S) = 0 \}$$
$$\mathcal{C} = \{ (t, S) : g(S) - V(t, S) < 0 \}.$$
The set $\mathcal{S}$ is called the stopping region since if $(t, S_t) \in \mathcal{S}$ it is optimal (from the point of view of the buyer) to exercise the American claim. Similarly, the set $\mathcal{C}$ is called the continuation region, since if $(t, S_t) \in \mathcal{C}$ it is optimal to wait. See the figure.

In general, it is impossible to find an explicit solution to the American options PDE. However, there is one case [6] where all the calculations can be done.

Example (Infinite horizon put in the Black–Scholes model). Let $d = 1$ and assume $\sigma > 0$ is constant. That is, the stock price is given by the Black–Scholes model. We will study the American option PDE in the case of the infinite horizon put, where $g(S) = (K - S)^+$ and $T = \infty$. Since $(S_t)_{t \ge 0}$ is a time-homogeneous process, we can restrict ourselves to functions $V$ that depend on $S$ but not on $t$. Also, since the payout function is decreasing, we guess that there exists some $q \in (0, K)$ such that we should continue if $S > q$ and stop if $S < q$. Hence we guess that
$$V(S) = K - S \quad \text{if } S < q \qquad (S)$$
$$\frac{1}{2} \sigma^2 S^2 V'' + rSV' - rV = 0 \quad \text{if } S > q. \qquad (C)$$
By the usual techniques of ODEs, the general solution to equation (C) is given by
$$V(S) = A_0 S + A_1 S^{-a}$$
for some constants $A_0$ and $A_1$, where $a = 2r/\sigma^2$. Since we expect $V(S) \to 0$ as $S \to \infty$, we can conclude that $A_0 = 0$. It remains to solve for the constants $q$ and $A_1$. Since we expect $V$ to be continuous at $S = q$, we have
$$A_1 q^{-a} = K - q.$$
To find another equation, we assume the smooth pasting condition that the derivative $V'$ is continuous at $S = q$:
$$-aA_1 q^{-(a+1)} = -1.$$
From this, we have
$$q = \frac{aK}{a + 1} \quad \text{and} \quad A_1 = \frac{K^{a+1} a^a}{(a + 1)^{a+1}}.$$
Finally, we must check that this candidate function $V$ satisfies the variational PDE...

[6] Going back at least to H. McKean. Appendix: A free boundary problem for the heat equation arising from a problem in mathematical economics. Industrial Management Review 6: 32-39. (1965)
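The candidate solution for the infinite horizon put can be checked numerically. The sketch below computes $a$, $q$ and $A_1$ from the formulas above, verifies continuity and smooth pasting at $S = q$, and checks on a grid that the candidate dominates the payout, which is part of the variational inequality. The parameter values, the grid and the function name `perpetual_put` are illustrative assumptions.

```python
def perpetual_put(K, r, sigma):
    """Return (a, q, A1, V) for the infinite horizon American put candidate."""
    a = 2.0 * r / sigma ** 2
    q = a * K / (a + 1.0)                              # exercise boundary
    A1 = K ** (a + 1.0) * a ** a / (a + 1.0) ** (a + 1.0)

    def V(S):
        # Stopping region: V = K - S. Continuation region: V = A1 S^{-a}.
        return K - S if S <= q else A1 * S ** (-a)

    return a, q, A1, V


# Hypothetical parameters (assumed for illustration only).
K, r, sigma = 100.0, 0.05, 0.2
a, q, A1, V = perpetual_put(K, r, sigma)

# Continuity and smooth pasting at the boundary S = q.
continuity_gap = A1 * q ** (-a) - (K - q)
smooth_pasting_gap = -a * A1 * q ** (-(a + 1.0)) - (-1.0)

# ODE (C) on the continuation region, using the exact derivatives of A1 S^{-a}.
S_test = 1.5 * q
V1 = -a * A1 * S_test ** (-(a + 1.0))
V2 = a * (a + 1.0) * A1 * S_test ** (-(a + 2.0))
ode_residual = 0.5 * sigma ** 2 * S_test ** 2 * V2 + r * S_test * V1 - r * V(S_test)

# The candidate should dominate the payout (K - S)^+ everywhere.
grid = [0.5 * k for k in range(1, 601)]               # S from 0.5 to 300
dominates = all(V(S) >= max(K - S, 0.0) - 1e-9 for S in grid)

print(a, q, A1, continuity_gap, smooth_pasting_gap, ode_residual, dominates)
```

On the stopping region one can also see directly that $LV = -rS - r(K - S) = -rK \le 0$, which is the remaining half of the variational inequality.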

CHAPTER 5

Interest rate models 1. Bond prices and interest rates In this last chapter, we explore models for the interest rate term structure. The basic financial instruments in this setting are the zero-coupon bonds. Definition. A (zero-coupon) bond with maturity T is a European contingent claim that pays exactly1 one unit of currency at time T . We denote by P (t, T ) the price at time t ∈ [0, T ] of the bond. To get a feel for how we should model the bond prices, note that a typical sample path t 7→ P (t, T ) of a zero-coupon bond price will look similar to the sample path of any other asset price. However, note that at maturity the bond is worth its principal value, so P (T, T ) = 1. On the other hand, since people prefer to be paid sooner rather than later, the map T 7→ P (t, T ) is usually decreasing. Of course, there are only a finite number of maturities of bonds traded on the the fixed income market. But since this number is very large, it is common practice to represent the zero-coupon bond prices as a continuous curve, rather than a discrete set of points.

¹ We assume that the bond issuer is absolutely creditworthy, and there is exactly zero probability of default. Therefore, we are not discussing corporate bonds, mortgage-backed securities or the debt of some countries (for instance, Russia famously defaulted in 1998). In fact, there is probably no real-world example of a perfectly risk-free bond. Nevertheless, many practitioners probably still regard U.S. Treasury bonds, which are backed by the ‘full faith and credit’ of the U.S. government, as virtually risk-free. Though with the current political situation in Washington, this may well change.

Rather than speak of bond prices, it is often easier to speak of interest rates. A popular interest rate is the yield $y(t, T)$ at time $t$ of a bond maturing at time $T$, defined by the formula
\[ y(t, T) = -\frac{1}{T - t} \log P(t, T). \]
For us, a more useful interest rate is the forward rate $f(t, T)$ at time $t$ for maturity $T$, defined by
\[ f(t, T) = -\frac{\partial}{\partial T} \log P(t, T). \]
The yield curve, the forward rate curve and the bond price curve contain the same information, since
\[ P(t, T) = e^{-(T - t)\, y(t, T)} = e^{-\int_t^T f(t, s)\, ds}. \]
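The equivalence of the three curves can be illustrated numerically. The sketch below is only an illustration: the linear forward curve `f0` and the helper names are assumptions made here, not objects from the notes. It builds a bond price from a forward curve by quadrature, converts it to a yield, and then recovers the forward rate by differentiating $\log P$ in the maturity.

```python
import math


def bond_price(forward, t, T, n=1000):
    """P(t,T) = exp(-int_t^T f(t,s) ds), using the trapezoidal rule."""
    h = (T - t) / n
    total = 0.5 * (forward(t) + forward(T))
    for k in range(1, n):
        total += forward(t + k * h)
    return math.exp(-h * total)


def bond_yield(P, t, T):
    """y(t,T) = -log P(t,T) / (T - t)."""
    return -math.log(P) / (T - t)


def forward_from_prices(price, T, eps=1e-4):
    """f(t,T) = -d/dT log P(t,T), approximated by a central difference."""
    return -(math.log(price(T + eps)) - math.log(price(T - eps))) / (2 * eps)


def f0(s):
    # Hypothetical initial forward curve: 2% plus 1% per year of maturity.
    return 0.02 + 0.01 * s


P05 = bond_price(f0, 0.0, 5.0)
y05 = bond_yield(P05, 0.0, 5.0)
f05 = forward_from_prices(lambda T: bond_price(f0, 0.0, T), 5.0)
print(P05, y05, f05)
```

For this particular curve the integral of the forward rate over $[0, 5]$ is $0.225$, so the yield is the average forward rate $0.045$, while the recovered forward rate at maturity $5$ is $f(0,5) = 0.07$.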

The term structure of interest rates refers to the function $T \mapsto P(t, T)$, or equivalently, the price data encoded in either of the functions $T \mapsto y(t, T)$ or $T \mapsto f(t, T)$.

There are at least two perspectives to bond price modelling. One is to assume that bonds are derivative securities, where the underlying asset is a bank or money market account. A complementary perspective is to consider the bonds as fundamental and the bank account as a derivative asset (see example sheet 1). We will mostly explore the first perspective in this chapter, but will return to the second perspective in our study of HJM models.

2. Bank accounts to bond prices and interest rates

Adopting the first perspective mentioned above, we assume that there is a numéraire asset, the bank account, with price dynamics
\[ dB_t = B_t r_t\, dt \]
where the process $r = (r_t)_{t \ge 0}$ is called the spot interest rate or the short interest rate. Of course, the above differential equation has the solution
\[ B_t = B_0\, e^{\int_0^t r_s\, ds}. \]

Now, we formulate a condition so that for any collection of maturities $T_1 < \ldots < T_d$, the market $(B_t, P(t, T_1), \ldots, P(t, T_d))_{t \in [0, T_1]}$ has no arbitrage.

Theorem. There is no arbitrage relative to the numéraire if there exists an equivalent measure $\mathbb{Q}$ such that the discounted bond price process $(P(t, T)/B_t)_{t \in [0, T]}$ is a local martingale for all $T > 0$. In particular, there is no arbitrage if
\[ P(t, T) = \mathbb{E}^{\mathbb{Q}}\big( e^{-\int_t^T r_s\, ds} \,\big|\, \mathcal{F}_t \big) \]
for all $0 \le t \le T$.

Notice that if
\[ P(t, T) = \mathbb{E}^{\mathbb{Q}}\big( e^{-\int_t^T r_s\, ds} \,\big|\, \mathcal{F}_t \big), \]
and $r$ is suitably well-behaved, we can differentiate the bond price with respect to maturity to recover the forward rate:
\[ f(t, T) = \frac{\mathbb{E}^{\mathbb{Q}}\big( r_T\, e^{-\int_t^T r_s\, ds} \,\big|\, \mathcal{F}_t \big)}{\mathbb{E}^{\mathbb{Q}}\big( e^{-\int_t^T r_s\, ds} \,\big|\, \mathcal{F}_t \big)}. \]

Notice that
\[ \lim_{T \downarrow t} f(t, T) = r_t \]
so the short rate is the left-hand end point of the forward rate curve. (The long rate $\lim_{T \uparrow \infty} f(t, T)$ is the far right-hand end of the curve.)

From common experience, it seems that we should like to model the interest rate $(r_t)_{t \ge 0}$ as a non-negative process. Indeed, if $r_t \ge 0$ for all $t \ge 0$ then the map $T \mapsto P(t, T)$ is decreasing. However, for the sake of tractability, this modelling requirement is frequently dropped.

3. Short rate models

We begin with a market that has just the bank account $B$. We will consider an Itô process short interest rate model of the form
\[ dr_t = a_t\, dt + \beta_t\, dW_t \]
for adapted processes $(a_t)_{t \ge 0}$ and $(\beta_t)_{t \ge 0}$, and a Brownian motion $(W_t)_{t \in \mathbb{R}_+}$ for $\mathbb{P}$. Note that while in a complete stock market model there was only one equivalent martingale measure, here there is no such canonical choice, since the short rate is not traded. However, we know that there is no arbitrage if the market somehow picks an equivalent martingale measure $\mathbb{Q}$ to price the bonds. We will assume that the market price of risk is given by the process $(\lambda_t)_{t \ge 0}$ so that
\[ dr_t = \alpha_t\, dt + \beta_t\, d\hat{W}_t \]
where $d\hat{W}_t = dW_t + \lambda_t\, dt$ defines a Brownian motion for the measure $\mathbb{Q}$ whose martingale density process $M$ is given by $dM_t = -M_t \lambda_t\, dW_t$, and where $\alpha_t = a_t - \beta_t \lambda_t$ defines the risk-neutral drift.

Since we are interested in pricing and hedging, there is no need to model the processes $(a_t)_{t \ge 0}$ and $(\lambda_t)_{t \ge 0}$ separately. However, we must be careful to realize that it is impossible to estimate the distribution of the random variable $\alpha_t$ directly from a time series $r_{t_1}, \ldots, r_{t_n}$.

3.1. Vasicek model. In 1977, Vasicek proposed the following model for the short rate:
\[ dr_t = \lambda(\bar{r} - r_t)\, dt + \sigma\, d\hat{W}_t \]
for a parameter $\bar{r} > 0$ interpreted as a mean short rate, a mean-reversion parameter $\lambda > 0$, and a volatility parameter $\sigma > 0$. This stochastic differential equation can be solved explicitly to yield
\[ r_t = e^{-\lambda t} r_0 + (1 - e^{-\lambda t})\bar{r} + \int_0^t e^{-\lambda(t - s)} \sigma\, d\hat{W}_s. \]
Note that the short interest rate in the Vasicek model follows an Ornstein–Uhlenbeck process, and in particular, that for each $t \ge 0$ the random variable $r_t$ is Gaussian under the measure $\mathbb{Q}$ with
\[ \mathbb{E}^{\mathbb{Q}}(r_t) = e^{-\lambda t} r_0 + (1 - e^{-\lambda t})\bar{r} \quad \text{and} \quad \mathrm{Var}^{\mathbb{Q}}(r_t) = \int_0^t e^{-2\lambda(t - s)} \sigma^2\, ds = \frac{\sigma^2}{2\lambda}(1 - e^{-2\lambda t}). \]
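The Gaussian mean and variance formulas can be checked by simulation. The sketch below is an illustration under assumed parameter values; the helper names are hypothetical. It uses the fact that, by the explicit solution and time-homogeneity, the short rate at the end of a step of length $h$ is Gaussian given its value at the start, with the same mean and variance formulas applied over $h$, so the stepping scheme samples the law of $r_t$ under $\mathbb{Q}$ exactly.

```python
import math
import random


def vasicek_mean(r0, rbar, lam, t):
    """Q-mean of r_t given r_0."""
    return math.exp(-lam * t) * r0 + (1.0 - math.exp(-lam * t)) * rbar


def vasicek_var(sigma, lam, t):
    """Q-variance of r_t given r_0."""
    return sigma**2 * (1.0 - math.exp(-2.0 * lam * t)) / (2.0 * lam)


def simulate_vasicek(r0, rbar, lam, sigma, t, steps, rng):
    """Sample r_t by composing exact Gaussian transitions over `steps` sub-intervals."""
    h = t / steps
    r = r0
    for _ in range(steps):
        r = vasicek_mean(r, rbar, lam, h) + math.sqrt(vasicek_var(sigma, lam, h)) * rng.gauss(0.0, 1.0)
    return r


# Illustrative parameters (assumptions, not taken from the notes).
r0, rbar, lam, sigma, t = 0.03, 0.05, 0.5, 0.01, 2.0
rng = random.Random(0)
samples = [simulate_vasicek(r0, rbar, lam, sigma, t, 10, rng) for _ in range(5000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / (len(samples) - 1)
print(sample_mean, vasicek_mean(r0, rbar, lam, t))
print(sample_var, vasicek_var(sigma, lam, t))
```

The sample moments should agree with the closed-form values up to Monte Carlo error. Note that the simulation is carried out under $\mathbb{Q}$; it says nothing about the behaviour of the short rate under $\mathbb{P}$.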

Moreover, one can show that the process is ergodic and converges to the invariant distribution $N\big(\bar{r}, \frac{\sigma^2}{2\lambda}\big)$. In particular, we have
\[ \frac{1}{T} \int_0^T r_s\, ds \to \bar{r} \quad \mathbb{Q}\text{-almost surely.} \]
Please note, however, that in the present framework we can say absolutely nothing about the distribution of $r_t$ for the objective measure $\mathbb{P}$, unless we have a model for the market price of risk.

Since the short rate $r_t$ is Gaussian, the advantage of this type of model is that it is relatively easy to compute prices, for instance of bonds, explicitly. A disadvantage of this model is that there is a chance that $r_t < 0$ for some time $t > 0$. Recall that a normal random variable can take any real value, both positive and negative. However, for sensible parameter values, the $\mathbb{Q}$-probability of the event $\{r_t < 0\}$ is pretty small.

We have learned from example sheet 3 that
\[
\begin{aligned}
\int_0^T r_t\, dt &= \int_0^T \big[e^{-\lambda t} r_0 + (1 - e^{-\lambda t})\bar{r}\big]\, dt + \int_0^T \int_0^t e^{-\lambda(t - s)} \sigma\, d\hat{W}_s\, dt \\
&= \int_0^T \big[e^{-\lambda t} r_0 + (1 - e^{-\lambda t})\bar{r}\big]\, dt + \int_0^T \Big( \int_s^T e^{-\lambda(t - s)}\, dt \Big) \sigma\, d\hat{W}_s \\
&\sim N\Big( \int_0^T \big[e^{-\lambda t} r_0 + (1 - e^{-\lambda t})\bar{r}\big]\, dt,\ \frac{\sigma^2}{\lambda^2} \int_0^T (1 - e^{-\lambda t})^2\, dt \Big)
\end{aligned}
\]
under $\mathbb{Q}$, so that, using the moment generating function of a Gaussian random variable, we have
\[
\begin{aligned}
P(0, T) &= \mathbb{E}^{\mathbb{Q}}\big[e^{-\int_0^T r_t\, dt}\big] \\
&= \exp\Big( -\int_0^T \Big[ e^{-\lambda t} r_0 + (1 - e^{-\lambda t})\bar{r} - \frac{\sigma^2}{2\lambda^2}(1 - e^{-\lambda t})^2 \Big] dt \Big)
\end{aligned}
\]
so that
\[ f(0, T) = e^{-\lambda T} r_0 + (1 - e^{-\lambda T})\bar{r} - \frac{\sigma^2}{2\lambda^2}(1 - e^{-\lambda T})^2. \]
By the time-homogeneity of the Vasicek model, we can actually deduce the formula
\[ f(t, t + x) = r_t\, e^{-\lambda x} + \bar{r}(1 - e^{-\lambda x}) - \frac{\sigma^2}{2\lambda^2}(1 - e^{-\lambda x})^2. \]
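The Vasicek forward rate formula for $f(t, t + x)$ can be integrated over $x$ to obtain bond prices. The sketch below is an illustration with assumed parameter values and hypothetical function names. It compares a numerical integration of the forward curve with the closed form obtained by carrying out the same integral by hand, writing $B = (1 - e^{-\lambda x})/\lambda$.

```python
import math


def vasicek_forward(r, x, rbar, lam, sigma):
    """Forward rate f(t, t+x) given the current short rate r."""
    e = math.exp(-lam * x)
    return r * e + rbar * (1.0 - e) - sigma**2 / (2.0 * lam**2) * (1.0 - e) ** 2


def vasicek_bond_numeric(r, x, rbar, lam, sigma, n=2000):
    """P(t, t+x) = exp(-int_0^x f(t, t+u) du), using the midpoint rule."""
    h = x / n
    integral = h * sum(vasicek_forward(r, (k + 0.5) * h, rbar, lam, sigma) for k in range(n))
    return math.exp(-integral)


def vasicek_bond_closed(r, x, rbar, lam, sigma):
    """Same integral evaluated in closed form, with B = (1 - e^{-lam x}) / lam."""
    B = (1.0 - math.exp(-lam * x)) / lam
    log_p = -r * B + (rbar - sigma**2 / (2.0 * lam**2)) * (B - x) - sigma**2 * B**2 / (4.0 * lam)
    return math.exp(log_p)


# Illustrative parameters (assumptions, not taken from the notes).
r, rbar, lam, sigma = 0.03, 0.05, 0.5, 0.01
p_num = vasicek_bond_numeric(r, 5.0, rbar, lam, sigma)
p_closed = vasicek_bond_closed(r, 5.0, rbar, lam, sigma)
print(p_num, p_closed)
```

The closed form makes it visible that $\log P(t, t+x)$ depends on the short rate only through the term $-r_t B$, that is, linearly.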

This formula says that for the Vasicek model, the forward rates at time $t$ are an affine function of the short rate at time $t$. (An affine function is of the form $g(x) = ax + b$, that is, its graph is a line.)

4. Markovian short rate models

We now study the case when the short rate is Markovian. Assume that
\[ dr_t = \alpha(t, r_t)\, dt + \beta(t, r_t)\, d\hat{W}_t \]
for some non-random functions $\alpha : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$ and $\beta : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$.

As we have learned for Markovian stock models, the price of contingent claims can be expressed in terms of the solution of a PDE:

Theorem. Fix $T > 0$ and suppose $V : [0, T] \times \mathbb{R} \to \mathbb{R}$ satisfies the PDE
\[ \frac{\partial V}{\partial t}(t, r) + \alpha(t, r) \frac{\partial V}{\partial r}(t, r) + \frac{1}{2}\beta(t, r)^2 \frac{\partial^2 V}{\partial r^2}(t, r) = r V(t, r), \]
\[ V(T, r) = 1. \]
Suppose $P(t, T) = V(t, r_t)$. Then the discounted price process
\[ e^{-\int_0^t r_s\, ds} P(t, T) \]
is a $\mathbb{Q}$-local martingale.

Proof. Itô's formula implies
\[
\begin{aligned}
d\Big( e^{-\int_0^t r_s\, ds} V(t, r_t) \Big) &= -r_t\, e^{-\int_0^t r_s\, ds} V(t, r_t)\, dt \\
&\quad + e^{-\int_0^t r_s\, ds} \Big( \frac{\partial V}{\partial t}(t, r_t)\, dt + \frac{\partial V}{\partial r}(t, r_t)\, dr_t + \frac{1}{2} \frac{\partial^2 V}{\partial r^2}(t, r_t)\, d\langle r \rangle_t \Big) \\
&= e^{-\int_0^t r_s\, ds} \Big( \frac{\partial V}{\partial t}(t, r_t) + \alpha(t, r_t) \frac{\partial V}{\partial r}(t, r_t) + \frac{1}{2}\beta(t, r_t)^2 \frac{\partial^2 V}{\partial r^2}(t, r_t) - r_t V(t, r_t) \Big) dt \\
&\quad + e^{-\int_0^t r_s\, ds} \frac{\partial V}{\partial r}(t, r_t)\, \beta(t, r_t)\, d\hat{W}_t.
\end{aligned}
\]
Since the drift vanishes by assumption, $(P(t, T)/B_t)_{t \in [0, T]}$ is a local martingale. □

Remark. In the proof of the preceding theorem, notice that we can only conclude that $M_t = e^{-\int_0^t r_s\, ds} P(t, T)$ is a local martingale since we are using Itô's formula. When is it a true martingale? Here is a sufficient condition. Suppose that we can show that $r_t \ge 0$ and that $0 \le P(t, T) \le 1$ for all $t \ge 0$. In this case, we would have $0 \le M_t \le 1$ and hence $M$ is a true martingale (recall that bounded local martingales are true martingales). In particular, we have the formula
\[ P(t, T) = \mathbb{E}^{\mathbb{Q}}\big( e^{-\int_t^T r_s\, ds} \,\big|\, \mathcal{F}_t \big). \]

4.1. Cox–Ingersoll–Ross model. In 1985, Cox, Ingersoll, and Ross proposed the following model for the short rate:
\[ dr_t = \lambda(\bar{r} - r_t)\, dt + \sigma \sqrt{r_t}\, d\hat{W}_t \]
for a parameter $\bar{r} > 0$ interpreted as a mean short rate, a mean-reversion parameter $\lambda > 0$, and a volatility parameter $\sigma > 0$. The process $(r_t)_{t \in \mathbb{R}_+}$ satisfying the above stochastic differential equation is often called a square-root diffusion or CIR process, though this stochastic process was studied as early as 1951 by Feller. This process was also used by Heston to model the spot volatility process in an equity market.

Although the CIR stochastic differential equation cannot be solved explicitly, one can say quite a lot about this process. For instance, one can show that the process is ergodic and its invariant distribution is a gamma distribution with mean $\bar{r}$. An advantage of this model over the Vasicek model is that the short rate $r_t$ is non-negative for all $t \ge 0$. Furthermore, explicit formulae are still available for the bond prices.

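We can also use the above theorem to compute bond prices. Indeed, fix $T > 0$ and consider the PDE
\[ \frac{\partial V}{\partial t}(t, r) + \lambda(\bar{r} - r) \frac{\partial V}{\partial r}(t, r) + \frac{1}{2}\sigma^2 r \frac{\partial^2 V}{\partial r^2}(t, r) = r V(t, r), \]
\[ V(T, r) = 1. \]
As we did in the Heston model, we can make the ansatz
\[ V(t, r) = e^{r R(T - t) + Q(T - t)} \]
for some functions $R$ and $Q$ which satisfy the boundary conditions $R(0) = Q(0) = 0$. Substituting this into the PDE yields
\[ (-\dot{R} r - \dot{Q}) + \lambda(\bar{r} - r) R + \frac{\sigma^2}{2} r R^2 = r. \]
This time we have
\[ \dot{R} = -\lambda R + \frac{\sigma^2}{2} R^2 - 1, \qquad \dot{Q} = \lambda \bar{r} R. \]
The equation for $R$ is a Riccati equation, whose solution is
\[ R(\tau) = -\frac{2(e^{\gamma \tau} - 1)}{(\gamma + \lambda) e^{\gamma \tau} + (\gamma - \lambda)}, \qquad Q(\tau) = \lambda \bar{r} \int_0^\tau R(s)\, ds, \]
where $\gamma = \sqrt{\lambda^2 + 2\sigma^2}$. The bond prices are too messy to write down, but the forward rates are given by
\[ f(t, t + x) = \frac{4\gamma^2 e^{\gamma x}}{[(\gamma + \lambda) e^{\gamma x} + (\gamma - \lambda)]^2}\, r_t + \frac{2\lambda \bar{r}(e^{\gamma x} - 1)}{(\gamma + \lambda) e^{\gamma x} + (\gamma - \lambda)}. \]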
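The CIR formulas above can be sanity-checked numerically. The sketch below uses assumed parameter values and hypothetical helper names. It verifies by finite differences that $R$ solves the Riccati equation, and that the forward rate formula agrees with $f(t, t + x) = -r_t \dot{R}(x) - \lambda \bar{r} R(x)$, which is what differentiating $-\log V$ in the maturity gives for the exponential ansatz.

```python
import math


def cir_gamma(lam, sigma):
    return math.sqrt(lam**2 + 2.0 * sigma**2)


def cir_R(tau, lam, sigma):
    """Solution R of the Riccati equation with R(0) = 0."""
    g = cir_gamma(lam, sigma)
    e = math.exp(g * tau)
    return -2.0 * (e - 1.0) / ((g + lam) * e + (g - lam))


def cir_forward(r, x, rbar, lam, sigma):
    """Forward rate f(t, t+x) given the current short rate r."""
    g = cir_gamma(lam, sigma)
    e = math.exp(g * x)
    D = (g + lam) * e + (g - lam)
    return 4.0 * g**2 * e / D**2 * r + 2.0 * lam * rbar * (e - 1.0) / D


def dR(tau, lam, sigma, h=1e-5):
    """Central finite-difference approximation of the derivative of R."""
    return (cir_R(tau + h, lam, sigma) - cir_R(tau - h, lam, sigma)) / (2.0 * h)


# Illustrative parameters (assumptions, not taken from the notes).
r, rbar, lam, sigma, x = 0.03, 0.05, 0.5, 0.1, 2.0
R = cir_R(x, lam, sigma)
riccati_residual = dR(x, lam, sigma) - (-lam * R + 0.5 * sigma**2 * R**2 - 1.0)
forward_gap = cir_forward(r, x, rbar, lam, sigma) - (-r * dR(x, lam, sigma) - lam * rbar * R)
print(riccati_residual, forward_gap)
```

Both quantities should be zero up to finite-difference error, and at $x = 0$ the forward rate reduces to the short rate.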

In particular, the forward rates for the CIR model are again given by an affine function of the short rate.

5. The Heath–Jarrow–Morton framework

Starting from a short-rate model, the derived bond prices are necessarily Itô processes. There is no arbitrage in a factor model since, by construction, there exists an equivalent martingale measure $\mathbb{Q}$ such that all discounted bond prices $(P(t, T)/B_t)_{t \in [0, T]}$ are local martingales. The insight of Heath, Jarrow, and Morton in 1992 was that we can change perspectives by modelling the bond prices directly.

Motivation. Indeed, suppose we start out with just the bond market, but without the bank account. We can construct the bank account by considering an investor holding his wealth in just-maturing bonds. More concretely, suppose at time 0 the investor has $B_0$ units of wealth. Fix a sequence $0 \le t_0 < t_1 < \ldots$ of times and suppose that during the interval $(t_{i-1}, t_i]$ the investor holds all of his wealth in the bond which matures at time $t_i$. If the investor's wealth at time $t$ is denoted by $B_t$, and the number of shares of the just-maturing bond by $\pi_t$, the budget constraint is
\[ B_{t_{i-1}} = \pi_{t_i} P(t_{i-1}, t_i) \]
and the self-financing condition is
\[ B_{t_i} = \pi_{t_i} \]
since $P(t, t) = 1$ for all $t$. Hence, the rate of change of the wealth is given by
\[ \frac{B_{t_i} - B_{t_{i-1}}}{t_i - t_{i-1}} = \frac{B_{t_{i-1}}}{P(t_{i-1}, t_i)} \cdot \frac{1 - P(t_{i-1}, t_i)}{t_i - t_{i-1}}. \]
By taking the limit as $t_i - t_{i-1} \to 0$, we can define the spot rate by
\[ r_t = -\frac{\partial}{\partial T} P(t, T)\Big|_{T = t} \]

so that $dB_t = B_t r_t\, dt$ as before.

The usual formulation of the HJM idea is in terms of the forward rates. As usual, we put ourselves in the context of a probability space $(\Omega, \mathcal{F}, \mathbb{Q})$ on which we can define a $d$-dimensional Brownian motion $(\hat{W}_t)_{t \ge 0}$.

Theorem. Suppose for each $T$, the forward rate process $(f(t, T))_{t \in [0, T]}$ has dynamics
\[ df(t, T) = b(t, T)\, dt + \sum_{i=1}^d \sigma^{(i)}(t, T)\, d\hat{W}_t^{(i)} \]
for some suitably regular adapted processes $(b(t, T))_{t \in [0, T]}$ and $(\sigma^{(i)}(t, T))_{t \in [0, T]}$. Let the short rate be given by $r_t = f(t, t)$ and the bank account dynamics by $dB_t = B_t r_t\, dt$. Finally, let the bond prices be given by
\[ P(t, T) = e^{-\int_t^T f(t, s)\, ds}. \]
The discounted bond prices $P(t, T)/B_t$ are local martingales if and only if
\[ b(t, T) = \sum_{i=1}^d \sigma^{(i)}(t, T) \int_t^T \sigma^{(i)}(t, s)\, ds. \]

Remark. The upshot of the HJM result is that the drift and the volatility of the forward rate dynamics cannot be prescribed independently. Indeed, they must be related by the famous formula
\[ b(t, T) = \sigma(t, T) \cdot \int_t^T \sigma(t, s)\, ds, \]
usually called the HJM drift condition. Notice that this drift/volatility constraint is not present in the factor models from the previous sections.
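To see the drift condition at work, one can compute the drift implied by a given volatility structure by numerical integration. The following sketch treats only the one-dimensional case $d = 1$ with deterministic volatility; the function `hjm_drift` and the parameter values are assumptions for illustration. It compares the result with the integrals evaluated by hand for a constant and for an exponentially decaying volatility, the two structures used in the examples of this section.

```python
import math


def hjm_drift(sigma, t, T, n=1000):
    """b(t,T) = sigma(t,T) * int_t^T sigma(t,s) ds, using the midpoint rule (d = 1)."""
    h = (T - t) / n
    integral = h * sum(sigma(t, t + (k + 0.5) * h) for k in range(n))
    return sigma(t, T) * integral


# Illustrative parameters (assumptions, not taken from the notes).
sigma0, lam = 0.01, 0.5


def constant_vol(t, T):
    return sigma0


def exponential_vol(t, T):
    return sigma0 * math.exp(-lam * (T - t))


t, T = 1.0, 4.0
drift_const = hjm_drift(constant_vol, t, T)
drift_exp = hjm_drift(exponential_vol, t, T)

# The same integrals evaluated by hand.
closed_const = sigma0**2 * (T - t)
closed_exp = sigma0**2 / lam * math.exp(-lam * (T - t)) * (1.0 - math.exp(-lam * (T - t)))
print(drift_const, closed_const)
print(drift_exp, closed_exp)
```

In both cases the drift is of order $\sigma_0^2$, which is why it is often small in practice relative to the volatility term, yet it cannot be set to zero without introducing arbitrage.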

The difference with the short rate models is that we are now trying to model the dynamics of the whole term structure. Indeed, in the HJM framework, we can initialize the model with any initial forward rate curve $T \mapsto f(0, T)$. Nevertheless, note that any of the short rate or factor models can be put into the HJM framework, just by choosing the initial forward rate curve to match the one predicted by the model.

Proof. Applying some formal manipulations (we assume enough regularity that we can appeal to a stochastic Fubini theorem) we have
\[
\begin{aligned}
d\Big( \int_0^t r_s\, ds + \int_t^T f(t, s)\, ds \Big) &= (r_t - f(t, t))\, dt + \int_t^T df(t, s)\, ds \\
&= \Big( \int_t^T b(t, s)\, ds \Big) dt + \Big( \int_t^T \sigma(t, s)\, ds \Big) \cdot d\hat{W}_t.
\end{aligned}
\]
Now, fixing $T$ and letting $M_t = P(t, T)/B_t$ we have by Itô's formula that
\[ dM_t = M_t \Big( \frac{1}{2} \Big| \int_t^T \sigma(t, s)\, ds \Big|^2 - \int_t^T b(t, s)\, ds \Big) dt - M_t \Big( \int_t^T \sigma(t, s)\, ds \Big) \cdot d\hat{W}_t. \]
The process $M$ is a local martingale if and only if the $dt$ term vanishes, so that
\[ \frac{1}{2} \Big| \int_t^T \sigma(t, s)\, ds \Big|^2 = \int_t^T b(t, s)\, ds. \]
And since this identity should hold for all $T$, we can differentiate both sides to recover the HJM drift condition. □

We conclude this section with some examples. In these examples, the forward rates are Gaussian under the measure $\mathbb{Q}$, and hence are vulnerable to the criticism that there is a positive probability that the rates become negative.

5.1. Ho–Lee. (1986) This model is the simplest possible HJM model. Let $d = 1$ and $\sigma(t, T) = \sigma_0$ be constant. Then
\[ df(t, T) = \sigma_0^2 (T - t)\, dt + \sigma_0\, d\hat{W}_t \]
or
\[ f(t, T) = f(0, T) + \sigma_0^2 (Tt - t^2/2) + \sigma_0 \hat{W}_t. \]
Here is an unusual feature of this model: if the initial forward rate curve $T \mapsto f(0, T)$ is bounded from below, then for positive times $t$ the forward rates $f(t, T) \to \infty$ as $T \to \infty$. The short rate is then given by
\[ r_t = f(0, t) + \sigma_0^2 t^2/2 + \sigma_0 \hat{W}_t. \]
Hence, writing $f_0(t) = f(0, t)$, the Ho–Lee model corresponds to the following short rate model:
\[ dr_t = (f_0'(t) + \sigma_0^2 t)\, dt + \sigma_0\, d\hat{W}_t. \]
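As a consistency check on the Ho–Lee short rate formula, one can verify that it reprices the initial bond curve, whatever the initial forward curve is. The following worked computation is a sketch; it uses only the standard fact that $\int_0^T \hat{W}_t\, dt$ is Gaussian under $\mathbb{Q}$ with mean zero and variance $T^3/3$, together with the moment generating function of a Gaussian random variable.

```latex
\begin{align*}
\int_0^T r_t\,dt
  &= \int_0^T f(0,t)\,dt + \frac{\sigma_0^2 T^3}{6} + \sigma_0 \int_0^T \hat{W}_t\,dt, \\
\mathbb{E}^{\mathbb{Q}}\!\left[ e^{-\sigma_0 \int_0^T \hat{W}_t\,dt} \right]
  &= \exp\!\left( \frac{\sigma_0^2}{2} \cdot \frac{T^3}{3} \right) = e^{\sigma_0^2 T^3/6}, \\
\mathbb{E}^{\mathbb{Q}}\!\left[ e^{-\int_0^T r_t\,dt} \right]
  &= e^{-\int_0^T f(0,t)\,dt}\; e^{-\sigma_0^2 T^3/6}\; e^{\sigma_0^2 T^3/6}
   = e^{-\int_0^T f(0,t)\,dt} = P(0,T).
\end{align*}
```

The convexity term $\sigma_0^2 t^2/2$ in the short rate is exactly what is needed to offset the Jensen correction from the Gaussian integral.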

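5.2. Vasicek–Hull–White. (1990) Again let $d = 1$ but now $\sigma(t, T) = \sigma_0 e^{-\lambda(T - t)}$ for positive constants $\sigma_0$ and $\lambda$. Then
\[ df(t, T) = \frac{\sigma_0^2}{\lambda} e^{-\lambda(T - t)} (1 - e^{-\lambda(T - t)})\, dt + \sigma_0 e^{-\lambda(T - t)}\, d\hat{W}_t. \]
The short rates are given by
\[
\begin{aligned}
r_t &= f(0, t) + \int_0^t \frac{\sigma_0^2}{\lambda} e^{-\lambda(t - s)} (1 - e^{-\lambda(t - s)})\, ds + \int_0^t \sigma_0 e^{-\lambda(t - s)}\, d\hat{W}_s \\
&= f(0, t) + \frac{\sigma_0^2}{2\lambda^2} (1 - e^{-\lambda t})^2 + \int_0^t \sigma_0 e^{-\lambda(t - s)}\, d\hat{W}_s.
\end{aligned}
\]
The short rate dynamics are given by
\[
\begin{aligned}
dr_t &= \Big( f_0'(t) + \frac{\sigma_0^2}{\lambda} e^{-\lambda t} (1 - e^{-\lambda t}) \Big) dt + \sigma_0\, d\hat{W}_t - \lambda \Big( \int_0^t \sigma_0 e^{-\lambda(t - s)}\, d\hat{W}_s \Big) dt \\
&= \lambda \Big( f_0(t) + \frac{f_0'(t)}{\lambda} + \frac{\sigma_0^2}{2\lambda^2} (1 - e^{-2\lambda t}) - r_t \Big) dt + \sigma_0\, d\hat{W}_t,
\end{aligned}
\]
where $f_0(t) = f(0, t)$. Hence, the Hull–White extension of the Vasicek model essentially replaces the mean interest rate $\bar{r}$ with a time-varying, but non-random, mean rate $\bar{r}(t)$.

5.3. Kennedy. (1994) Note that for the HJM models discussed above, the forward rates are given by
\[ f(t, T) = f(0, T) + \int_0^t \sigma(u, T) \cdot \int_u^T \sigma(u, s)\, ds\, du + \int_0^t \sigma(u, T) \cdot d\hat{W}_u. \]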

If $\sigma$ is not random, then the distribution of $f(t, T)$ under the risk-neutral measure $\mathbb{Q}$ is Gaussian with mean
\[ \mathbb{E}^{\mathbb{Q}}[f(t, T)] = f_0(T) + \int_0^t \sigma(u, T) \cdot \int_u^T \sigma(u, s)\, ds\, du \]
and covariance
\[ \mathrm{Cov}^{\mathbb{Q}}[f(s, S), f(t, T)] = \int_0^{s \wedge t} \sigma(u, S) \cdot \sigma(u, T)\, du. \]
Kennedy reversed this logic, and considered a Gaussian random field $\{f(t, T) : 0 \le t \le T\}$ with mean $\mu(t, T)$ and covariance $C(s, t; S, T)$. Suppose that the covariance has the special form
\[ C(s, t; S, T) = c_{s \wedge t}(S, T) \]
so that, for each fixed $T > 0$, the increments of $(f(t, T))_{t \in [0, T]}$ are independent. Then the discounted bond prices are local martingales (actually true martingales since everything is Gaussian and we can compute the conditional expectations by hand) when the mean is given by
\[ \mu(t, T) = f(0, T) + \int_0^T c_{t \wedge s}(s, T)\, ds. \]

An advantage of this formulation of the Gaussian HJM model is that one is no longer restricted to finite dimensional Brownian motions, and, therefore, there is much more flexibility to specify the correlation of the increments. For instance, one choice is to have the correlation of the increments decay exponentially in the difference of the maturities:
\[ \text{‘}\mathrm{corr}(df(t, t + x), df(t, t + y)) = e^{-\beta|x - y|}.\text{’} \]
Since the operator on $L^2(\mathbb{R}_+)$ with kernel $e^{-\beta|x - y|}$ is not of finite rank, the above correlation could not be realised by a finite rank HJM model. However, since the operator is positive definite, it can be the correlation of a Gaussian random field. Actually, this model can be realised as an HJM model driven by an infinite dimensional Brownian motion. See the book Interest Rate Models: an Infinite Dimensional Stochastic Analysis Perspective by René Carmona and me for details.

CHAPTER 6

Crashcourse on probability theory

These notes are a list of many of the definitions and results of probability theory needed to follow the Advanced Financial Models course. Since they are free from any motivating exposition or examples, and since no proofs are given for any of the theorems, these notes should be used only as a reference. A table of notation is in the appendix.

1. Measures

Definition. Let $\Omega$ be a set. A sigma-field on $\Omega$ is a non-empty set $\mathcal{F}$ of subsets of $\Omega$ such that
(1) if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$,
(2) if $A_1, A_2, \ldots \in \mathcal{F}$ then $\bigcup_{i=1}^\infty A_i \in \mathcal{F}$.

The terms sigma-field and sigma-algebra are interchangeable. The Borel sigma-field $\mathcal{B}$ on $\mathbb{R}$ is the smallest sigma-field containing every open interval. More generally, if $\Omega$ is a topological space, for instance $\mathbb{R}^n$, the Borel sigma-field on $\Omega$ is the smallest sigma-field containing every open set.

Definition. Let $\Omega$ be a set and let $\mathcal{F}$ be a sigma-field on $\Omega$. A measure $\mu$ on the measurable space $(\Omega, \mathcal{F})$ is a function $\mu : \mathcal{F} \to [0, \infty]$ such that
(1) $\mu(\emptyset) = 0$,
(2) if $A_1, A_2, \ldots \in \mathcal{F}$ are disjoint then $\mu\big( \bigcup_{i=1}^\infty A_i \big) = \sum_{i=1}^\infty \mu(A_i)$.

Theorem. There exists a unique measure $\mathrm{Leb}$ on $(\mathbb{R}, \mathcal{B})$ such that $\mathrm{Leb}(a, b] = b - a$ for every $b > a$. This measure is called Lebesgue measure.

Definition. A probability measure $\mathbb{P}$ on $(\Omega, \mathcal{F})$ is a measure such that $\mathbb{P}(\Omega) = 1$.

Let $\Omega$ be a set, $\mathcal{F}$ a sigma-field on $\Omega$, and $\mathbb{P}$ a probability measure on $(\Omega, \mathcal{F})$. The triple $(\Omega, \mathcal{F}, \mathbb{P})$ is called a probability space. The set $\Omega$ is called the sample space, and an element of $\Omega$ is called an outcome. A subset of $\Omega$ which is an element of $\mathcal{F}$ is called an event.

Let $A \in \mathcal{F}$ be an event. If $\mathbb{P}(A) = 1$ then $A$ is called an almost sure event, and if $\mathbb{P}(A) = 0$ then $A$ is called a null event. The phrase ‘almost surely’ is often abbreviated a.s. A sigma-field is called trivial if each of its elements is either almost sure or null.

2. Random variables

Definition. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. A random variable is a function $X : \Omega \to \mathbb{R}$ such that the set $\{\omega \in \Omega : X(\omega) \le t\}$ is an element of $\mathcal{F}$ for all $t \in \mathbb{R}$.

Let $A$ be a subset of $\mathbb{R}$, and let $X$ be a random variable. We use the notation $\{X \in A\}$ to denote the set $\{\omega \in \Omega : X(\omega) \in A\}$. For instance, the event $\{X \le t\}$ denotes $\{\omega \in \Omega : X(\omega) \le t\}$. The distribution function of $X$ is the function $F_X : \mathbb{R} \to [0, 1]$ defined by
\[ F_X(t) = \mathbb{P}(X \le t) \]
for all $t \in \mathbb{R}$.

We also use the term random variable to refer to measurable functions $X$ from $\Omega$ to more general spaces. In particular, we call a function $X : \Omega \to \mathbb{R}^n$ a random variable or random vector if $X(\omega) = (X_1(\omega), \ldots, X_n(\omega))$ and $X_i$ is a random variable for each $i \in \{1, \ldots, n\}$.

Definition. Let $A$ be an event in $\Omega$. The indicator function of the event $A$ is the random variable $1_A : \Omega \to \{0, 1\}$ defined by
\[ 1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \in A^c \end{cases} \]
for all $\omega \in \Omega$.

3. Expectations and variances

Definition. Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$. The expected value of $X$ is denoted by $\mathbb{E}(X)$ and is defined as follows:
• $X$ is simple, i.e. takes only a finite number of values $x_1, \ldots, x_n$:
\[ \mathbb{E}(X) = \sum_{i=1}^n x_i\, \mathbb{P}(X = x_i). \]
• $X \ge 0$ almost surely:
\[ \mathbb{E}(X) = \sup\{\mathbb{E}(Y) : Y \text{ simple and } 0 \le Y \le X \text{ a.s.}\}. \]
Note that the expected value of a non-negative random variable may take the value $\infty$.
• Either $\mathbb{E}(X^+)$ or $\mathbb{E}(X^-)$ is finite:
\[ \mathbb{E}(X) = \mathbb{E}(X^+) - \mathbb{E}(X^-). \]
• $X$ is vector valued and $\mathbb{E}(|X|) < \infty$:
\[ \mathbb{E}[(X_1, \ldots, X_d)] = (\mathbb{E}[X_1], \ldots, \mathbb{E}[X_d]). \]

A random variable $X$ is integrable iff $\mathbb{E}(|X|) < \infty$ and is square-integrable iff $\mathbb{E}(X^2) < \infty$. The terms expected value, expectation, and mean are interchangeable.

The variance of an integrable random variable $X$, written $\mathrm{Var}(X)$, is
\[ \mathrm{Var}(X) = \mathbb{E}\{[X - \mathbb{E}(X)]^2\} = \mathbb{E}(X^2) - \mathbb{E}(X)^2. \]
The covariance of square-integrable random variables $X$ and $Y$, written $\mathrm{Cov}(X, Y)$, is
\[ \mathrm{Cov}(X, Y) = \mathbb{E}\{[X - \mathbb{E}(X)][Y - \mathbb{E}(Y)]\} = \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y). \]
If neither $X$ nor $Y$ is almost surely constant, then their correlation, written $\rho(X, Y)$, is
\[ \rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)^{1/2}\, \mathrm{Var}(Y)^{1/2}}. \]

Random variables $X$ and $Y$ are called uncorrelated if $\mathrm{Cov}(X, Y) = 0$.

Theorem. Let $X$ and $Y$ be integrable random variables.
• linearity: $\mathbb{E}(aX + bY) = a\mathbb{E}(X) + b\mathbb{E}(Y)$ for constants $a, b$.
• positivity: Suppose $X \ge 0$ almost surely. Then $\mathbb{E}(X) \ge 0$ with equality if and only if $X = 0$ almost surely.

Definition. For $p \ge 1$, the space $L^p$ is the collection of random variables such that $\mathbb{E}(|X|^p) < \infty$. The space $L^\infty$ is the collection of random variables which are bounded almost surely.

Theorem (Jensen's inequality). Let $X$ be a random variable and $g : \mathbb{R} \to \mathbb{R}$ be a convex function. Then
\[ \mathbb{E}[g(X)] \ge g(\mathbb{E}[X]) \]
whenever the expectations exist. If $g$ is strictly convex, the above inequality is strict unless $X$ is constant.

Theorem (Hölder's inequality). Let $X$ and $Y$ be random variables and let $p, q > 1$ with $\frac{1}{p} + \frac{1}{q} = 1$. If $X \in L^p$ and $Y \in L^q$ then
\[ \mathbb{E}(XY) \le \mathbb{E}(|X|^p)^{1/p}\, \mathbb{E}(|Y|^q)^{1/q} \]
with equality if and only if either $X = 0$ almost surely or $X$ and $Y$ have the same sign and $|Y| = a|X|^{p-1}$ almost surely for some constant $a \ge 0$. The case when $p = q = 2$ is called the Cauchy–Schwarz inequality.

Definition. A random variable $X$ is called discrete if $X$ takes values in a countable set; i.e. there is a countable set $S$ such that $X \in S$ almost surely. If $X$ is discrete, the function $p_X : \mathbb{R} \to [0, 1]$ defined by $p_X(t) = \mathbb{P}(X = t)$ is called the mass function of $X$.

The random variable $X$ is absolutely continuous (with respect to Lebesgue measure) if and only if there exists a function $f_X : \mathbb{R} \to [0, \infty)$ such that
\[ \mathbb{P}(X \le t) = \int_{-\infty}^t f_X(x)\, dx \]
for all $t \in \mathbb{R}$, in which case the function $f_X$ is called the density function of $X$.

If $X$ is a random vector taking values in $\mathbb{R}^n$, then the density of $X$, if it exists, is the function $f_X : \mathbb{R}^n \to [0, \infty)$ such that
\[ \mathbb{P}(X \in A) = \int_A f_X(x)\, dx \]
for all Borel subsets $A \subseteq \mathbb{R}^n$.

Theorem. Let the function $g : \mathbb{R} \to \mathbb{R}$ be such that $g(X)$ is integrable. If $X$ is a discrete random variable with probability mass function $p_X$ taking values in a countable set $S$ then
\[ \mathbb{E}(g(X)) = \sum_{t \in S} g(t)\, p_X(t). \]
If $X$ is an absolutely continuous random variable with density function $f_X$ then
\[ \mathbb{E}(g(X)) = \int_{-\infty}^\infty g(x)\, f_X(x)\, dx. \]
More generally, if $X$ is a random vector valued in $\mathbb{R}^n$ with density $f_X$ and $g : \mathbb{R}^n \to \mathbb{R}$ then
\[ \mathbb{E}(g(X)) = \int_{\mathbb{R}^n} g(x)\, f_X(x)\, dx. \]

4. Special distributions

Definition. Let $X$ be a discrete random variable taking values in $\mathbb{Z}_+$ with mass function $p_X$. The random variable $X$ is called
• Bernoulli with parameter $p$ if
\[ p_X(0) = 1 - p \quad \text{and} \quad p_X(1) = p \]
where $0 < p < 1$. Then $\mathbb{E}(X) = p$ and $\mathrm{Var}(X) = p(1 - p)$.
• binomial with parameters $n$ and $p$, written $X \sim \mathrm{bin}(n, p)$, if
\[ p_X(k) = \binom{n}{k} p^k (1 - p)^{n - k} \quad \text{for all } k \in \{0, 1, \ldots, n\} \]
where $n \in \mathbb{N}$ and $0 < p < 1$. Then $\mathbb{E}(X) = np$ and $\mathrm{Var}(X) = np(1 - p)$.
• Poisson with parameter $\lambda$ if
\[ p_X(k) = \frac{\lambda^k}{k!} e^{-\lambda} \quad \text{for all } k = 0, 1, 2, \ldots \]
where $\lambda > 0$. Then $\mathbb{E}(X) = \lambda$.
• geometric with parameter $p$ if
\[ p_X(k) = p(1 - p)^{k - 1} \quad \text{for all } k = 1, 2, 3, \ldots \]
where $0 < p < 1$. Then $\mathbb{E}(X) = 1/p$.

Definition. Let $X$ be a continuous random variable with density function $f_X$. The random variable $X$ is called
• uniform on the interval $(a, b)$, written $X \sim \mathrm{unif}(a, b)$, if
\[ f_X(t) = \frac{1}{b - a} \quad \text{for all } a < t < b \]
for some $a < b$. Then $\mathbb{E}(X) = \frac{a + b}{2}$.
• normal or Gaussian with mean $\mu$ and variance $\sigma^2$, written $X \sim N(\mu, \sigma^2)$, if
\[ f_X(t) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\Big( -\frac{(t - \mu)^2}{2\sigma^2} \Big) \quad \text{for all } t \in \mathbb{R} \]
for some $\mu \in \mathbb{R}$ and $\sigma^2 > 0$. Then $\mathbb{E}(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.
• exponential with rate $\lambda$, if
\[ f_X(t) = \lambda e^{-\lambda t} \quad \text{for all } t \ge 0 \]
for some $\lambda > 0$. Then $\mathbb{E}(X) = 1/\lambda$.
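The stated means and variances of the discrete distributions can be checked directly from the mass functions. The sketch below uses illustrative parameter values and hypothetical helper names, and truncates the infinite supports of the Poisson and geometric distributions at a point where the neglected tail mass is negligible.

```python
import math


def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)


def geometric_pmf(k, p):
    return p * (1.0 - p) ** (k - 1)


def mass_mean_variance(pmf, support):
    """Total mass, mean and variance of a mass function over a finite support."""
    support = list(support)
    total = sum(pmf(k) for k in support)
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    return total, mean, var


# Illustrative parameters (assumptions, not taken from the notes).
bin_total, bin_mean, bin_var = mass_mean_variance(lambda k: binomial_pmf(k, 10, 0.3), range(0, 11))
poi_total, poi_mean, poi_var = mass_mean_variance(lambda k: poisson_pmf(k, 2.5), range(0, 60))
geo_total, geo_mean, geo_var = mass_mean_variance(lambda k: geometric_pmf(k, 0.2), range(1, 400))
print(bin_mean, bin_var, poi_mean, geo_mean)
```

With these parameters one expects $np = 3$ and $np(1 - p) = 2.1$ for the binomial, $\lambda = 2.5$ for the Poisson mean, and $1/p = 5$ for the geometric mean.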

If $X$ is a random vector valued in $\mathbb{R}^n$ with density
\[ f_X(x) = (2\pi)^{-n/2} \det(V)^{-1/2} \exp\Big( -\frac{1}{2} (x - \mu) \cdot V^{-1} (x - \mu) \Big) \]
for a positive definite $n \times n$ matrix $V$ and vector $\mu \in \mathbb{R}^n$, then $X$ is said to have the $n$-dimensional normal (or Gaussian) distribution with mean $\mu$ and variance $V$, written $X \sim N_n(\mu, V)$. Then $\mathbb{E}(X_i) = \mu_i$ and $\mathrm{Cov}(X_i, X_j) = V_{ij}$.

5. Conditional probability and expectation, independence

Definition. Let $B$ be an event with $\mathbb{P}(B) > 0$. The conditional probability of an event $A$ given $B$, written $\mathbb{P}(A|B)$, is
\[ \mathbb{P}(A|B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}. \]
The conditional expectation of $X$ given $B$, written $\mathbb{E}(X|B)$, is
\[ \mathbb{E}(X|B) = \frac{\mathbb{E}(X 1_B)}{\mathbb{P}(B)}. \]

Theorem (The law of total probability). Let $B_1, B_2, \ldots$ be disjoint, non-null events such that $\bigcup_{i=1}^\infty B_i = \Omega$. Then
\[ \mathbb{P}(A) = \sum_{i=1}^\infty \mathbb{P}(A|B_i)\, \mathbb{P}(B_i) \]
for all events $A$.

Definition. Let $A_1, A_2, \ldots$ be events. If
\[ \mathbb{P}\Big( \bigcap_{i \in I} A_i \Big) = \prod_{i \in I} \mathbb{P}(A_i) \]
for every finite subset $I \subset \mathbb{N}$ then the events are said to be independent. Random variables $X_1, X_2, \ldots$ are called independent if the events $\{X_1 \le t_1\}, \{X_2 \le t_2\}, \ldots$ are independent for all $t_1, t_2, \ldots \in \mathbb{R}$. The phrase ‘independent and identically distributed’ is often abbreviated i.i.d.

Theorem. If $X$ and $Y$ are independent and integrable, then $\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y)$.

6. Probability inequalities

Theorem (Markov's inequality). Let $X$ be a positive random variable. Then
\[ \mathbb{P}(X \ge \varepsilon) \le \frac{\mathbb{E}(X)}{\varepsilon} \]
for all $\varepsilon > 0$.

Corollary (Chebychev's inequality). Let $X$ be a random variable with $\mathbb{E}(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$. Then
\[ \mathbb{P}(|X - \mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2} \]
for all $\varepsilon > 0$.
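Both inequalities are easy to test on a distribution whose tail probabilities are known in closed form. The sketch below takes $X$ exponential with rate $\lambda$; the rate and the test values of $\varepsilon$ are arbitrary choices, and the variance $1/\lambda^2$ of the exponential distribution is a standard fact not stated in the list above.

```python
import math

lam = 2.0                      # assumed rate of the exponential distribution
mean = 1.0 / lam               # E(X) = 1/lambda
var = 1.0 / lam**2             # Var(X) = 1/lambda^2 (standard fact)


def tail(eps):
    """Exact P(X >= eps) for the exponential distribution."""
    return math.exp(-lam * eps)


def two_sided(eps):
    """Exact P(|X - mean| >= eps) for the exponential distribution."""
    upper = math.exp(-lam * (mean + eps))
    lower = 1.0 - math.exp(-lam * (mean - eps)) if eps < mean else 0.0
    return upper + lower


epsilons = [0.1, 0.5, 1.0, 2.0]
markov_ok = all(tail(eps) <= mean / eps for eps in epsilons)
chebychev_ok = all(two_sided(eps) <= var / eps**2 for eps in epsilons)
print(markov_ok, chebychev_ok)
```

For small $\varepsilon$ the bounds exceed one and carry no information, which illustrates that these inequalities are crude but universal.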

7. Characteristic functions

Definition. The characteristic function of a real-valued random variable $X$ is the function $\phi_X : \mathbb{R} \to \mathbb{C}$ defined by
\[ \phi_X(t) = \mathbb{E}(e^{itX}) \]
for all $t \in \mathbb{R}$, where $i = \sqrt{-1}$. More generally, if $X$ is a random vector valued in $\mathbb{R}^n$ then $\phi_X : \mathbb{R}^n \to \mathbb{C}$ defined by
\[ \phi_X(t) = \mathbb{E}(e^{it \cdot X}) \]
is the characteristic function of $X$.

Theorem (Uniqueness of characteristic functions). Let $X$ and $Y$ be real-valued random variables with distribution functions $F_X$ and $F_Y$. Let $\phi_X$ and $\phi_Y$ be the characteristic functions of $X$ and $Y$. Then $\phi_X(t) = \phi_Y(t)$ for all $t \in \mathbb{R}$ if and only if $F_X(t) = F_Y(t)$ for all $t \in \mathbb{R}$.

8. Fundamental probability results

Definition (Modes of convergence). Let $X_1, X_2, \ldots$ and $X$ be random variables.
• $X_n \to X$ almost surely if $\mathbb{P}(X_n \to X) = 1$
• $X_n \to X$ in $L^p$, for $p \ge 1$, if $\mathbb{E}|X|^p < \infty$ and $\mathbb{E}|X_n - X|^p \to 0$
• $X_n \to X$ in probability if $\mathbb{P}(|X_n - X| > \varepsilon) \to 0$ for all $\varepsilon > 0$
• $X_n \to X$ in distribution if $F_{X_n}(t) \to F_X(t)$ for all points $t \in \mathbb{R}$ of continuity of $F_X$

Theorem. The following implications hold:
\[ \left. \begin{array}{c} X_n \to X \text{ almost surely} \\ \text{or} \\ X_n \to X \text{ in } L^p,\ p \ge 1 \end{array} \right\} \Rightarrow X_n \to X \text{ in probability} \Rightarrow X_n \to X \text{ in distribution.} \]
Furthermore, if $r \ge p \ge 1$ then $X_n \to X$ in $L^r$ $\Rightarrow$ $X_n \to X$ in $L^p$.

Definition. Let $A_1, A_2, \ldots$ be events. The term eventually is defined by
\[ \{A_n \text{ eventually}\} = \bigcup_{N \in \mathbb{N}} \bigcap_{n \ge N} A_n \]
and infinitely often by
\[ \{A_n \text{ infinitely often}\} = \bigcap_{N \in \mathbb{N}} \bigcup_{n \ge N} A_n. \]
[The phrase ‘infinitely often’ is often abbreviated i.o.]

Theorem (The first Borel–Cantelli lemma). Let $A_1, A_2, \ldots$ be a sequence of events. If
\[ \sum_{n=1}^\infty \mathbb{P}(A_n) < \infty \]
then $\mathbb{P}(A_n \text{ infinitely often}) = 0$.

Theorem (The second Borel–Cantelli lemma). Let $A_1, A_2, \ldots$ be a sequence of independent events. If
\[ \sum_{n=1}^\infty \mathbb{P}(A_n) = \infty \]
then $\mathbb{P}(A_n \text{ infinitely often}) = 1$.

Theorem (Monotone convergence theorem). Let $X_1, X_2, \ldots$ be positive random variables with $X_n \le X_{n+1}$ almost surely for all $n \ge 1$, and let $X = \sup_{n \in \mathbb{N}} X_n$. Then $X_n \to X$ almost surely and
\[ \mathbb{E}(X_n) \to \mathbb{E}(X). \]

Theorem (Fatou's lemma). Let $X_1, X_2, \ldots$ be positive random variables. Then
\[ \mathbb{E}\big( \liminf_{n \uparrow \infty} X_n \big) \le \liminf_{n \uparrow \infty} \mathbb{E}(X_n). \]

Theorem (Dominated convergence theorem). Let $X_1, X_2, \ldots$ and $X$ be random variables such that $X_n \to X$ almost surely. If $\mathbb{E}(\sup_{n \ge 1} |X_n|) < \infty$ then
\[ \mathbb{E}(X_n) \to \mathbb{E}(X). \]

Theorem (A strong law of large numbers). Let $X_1, X_2, \ldots$ be independent and identically distributed integrable random variables with common mean $\mathbb{E}(X_i) = \mu$. Then
\[ \frac{X_1 + \ldots + X_n}{n} \to \mu \quad \text{almost surely.} \]

Theorem (Central limit theorem). Let $X_1, X_2, \ldots$ be independent and identically distributed with $\mathbb{E}(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$ for each $i = 1, 2, \ldots$, and let
\[ Z_n = \frac{X_1 + \ldots + X_n - n\mu}{\sigma \sqrt{n}}. \]
Then $Z_n \to Z$ in distribution, where $Z \sim N(0, 1)$.
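The central limit theorem can be illustrated without simulation when the $X_i$ are Bernoulli, because the sum is then binomial and the distribution function of $Z_n$ is available exactly. The sketch below uses hypothetical helper names and arbitrary values of $n$, $p$ and $z$. It measures the gap between $\mathbb{P}(Z_n \le z)$ and the standard normal distribution function.

```python
import math


def binomial_cdf(m, n, p):
    """P(S_n <= m) for S_n ~ bin(n, p), via log-gamma to avoid overflow."""
    total = 0.0
    for k in range(0, min(m, n) + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return total


def normal_cdf(z):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def clt_gap(z, n, p):
    """|P(Z_n <= z) - P(Z <= z)| for sums of n Bernoulli(p) variables."""
    mu, sd = p, math.sqrt(p * (1.0 - p))
    m = math.floor(n * mu + z * sd * math.sqrt(n))   # Z_n <= z  iff  S_n <= m
    return abs(binomial_cdf(m, n, p) - normal_cdf(z))


gap_small_n = clt_gap(0.0, 100, 0.5)
gap_large_n = clt_gap(0.0, 1000, 0.5)
gap_at_one = clt_gap(1.0, 1000, 0.5)
print(gap_small_n, gap_large_n, gap_at_one)
```

The gap shrinks as $n$ grows, as convergence in distribution requires at every continuity point of the limit distribution function.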

Table 1. Notation

$\mathbb{R}$                the set of real numbers
$\mathbb{R}_+$              the set of non-negative real numbers $[0, \infty)$
$\mathbb{N}$                the set of natural numbers $\{1, 2, \ldots\}$
$\mathbb{C}$                the set of complex numbers
$\mathbb{Z}$                the set of integers $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$
$\mathbb{Z}_+$              the set of non-negative integers $\{0, 1, 2, \ldots\}$
$A^c$                       the complement of a set $A$, $A^c = \{\omega \in \Omega, \omega \notin A\}$

$F_X$                       the distribution function of a random variable $X$
$p_X$                       the mass function of a discrete random variable $X$
$f_X$                       the density function of an absolutely continuous random variable $X$
$\phi_X$                    the characteristic function of $X$

$\mathbb{E}(X)$             the expected value of the random variable $X$
$\mathrm{Var}(X)$           the variance of $X$
$\mathrm{Cov}(X, Y)$        the covariance of $X$ and $Y$
$\mathbb{E}(X|B)$           the conditional expectation of $X$ given the event $B$

$a \wedge b$                $\min\{a, b\}$
$a \vee b$                  $\max\{a, b\}$
$a^+$                       $\max\{a, 0\}$
$\limsup_{n \uparrow \infty} x_n$   the limit superior of the sequence $x_1, x_2, \ldots$
$\liminf_{n \uparrow \infty} x_n$   the limit inferior of the sequence $x_1, x_2, \ldots$

$a \cdot b$                 Euclidean inner (or dot) product in $\mathbb{R}^n$, $a \cdot b = \sum_{i=1}^n a_i b_i$
$|a|$                       Euclidean norm in $\mathbb{R}^n$, $|a| = (a \cdot a)^{1/2}$

$X \sim \nu$                the random variable $X$ is distributed as the probability measure $\nu$
$1_A$                       the indicator function of the event $A$
$N(\mu, \sigma^2)$          the normal distribution with mean $\mu$ and variance $\sigma^2$
$N_n(\mu, V)$               the $n$-dimensional normal distribution with mean $\mu \in \mathbb{R}^n$ and variance $V \in \mathbb{R}^{n \times n}$
$\mathrm{bin}(n, p)$        the binomial distribution with parameters $n$ and $p$
$\mathrm{unif}(a, b)$       the uniform distribution on the interval $(a, b)$
$L^p$                       the set of random variables $X$ with $\mathbb{E}|X|^p < \infty$

Index

T-forward measure, 53
1FTAP
  continuous time, 73
  discrete time, 36
  one-period, 13
2FTAP
  continuous time, 77
  discrete time, 21, 51

a.s., 103
absolutely continuous random variable, 105
adapted process, 29
admissible trading strategy, 72
almost sure event, 103
American contingent claims, 53
arbitrage
  absolute
    continuous time, 72
    discrete time, 32
    one-period, 10
  relative
    continuous time, 72
attainable
  discrete-time, 49
  one-period, 20

Bernoulli random variable, 106
binomial random variable, 106
Black–Scholes formula, 80
Black–Scholes model, 79
Black–Scholes PDE, 82
bond, 93
Borel sigma-field, 103
Borel–Cantelli lemmas, 108, 109
Breeden–Litzenberger formula, 25
Bromwich integral, 26
Brownian motion, 58

call option
  American, 48
  European, 17
Cameron–Martin–Girsanov theorem, 67
Cauchy residue theorem, 26
Cauchy–Schwarz inequality, 105
central limit theorem, 109
characterisation of replicable (attainable) claims
  one-period, 20
characterisation of super-replication
  one-period, 19
Chebychev’s inequality, 107
CIR model, 97
complete market
  discrete time, 51
  one-period, 21
conditional expectation
  existence and uniqueness, 33
  given a sigma-field, 33
  given an event, 107
continuation region, 91
contour integration, 26
Cox–Ingersoll–Ross model, 97

density function, 105
discounted price relative to a numéraire, 45
discrete random variable, 105
dominated convergence theorem, 109
Doob decomposition, 54
Dupire’s formula, 86

equivalent martingale measure
  discrete time, 45
  one-period, 16
equivalent measures, 15
exponential random variable, 106

Fatou’s lemma, 109
Feynman–Kac PDE, 81
filtration, 29
forward measure, 53
forward rate, 94
fundamental theorem of asset pricing
  first
    continuous time, 73
    discrete time, 36
    one-period, 13
  second
    continuous time, 77
    discrete time, 21, 51

Gaussian random variable, 106
Gaussian random vector, 107
geometric random variable, 106
Girsanov’s theorem, 67

Hölder’s inequality, 105
Heath–Jarrow–Morton drift condition, 99
Heston model, 88
historical probability measure, 16, 45
HJM drift condition, 99
Ho–Lee model, 100
Hull–White extension of Vasicek, 101

i.i.d., 107
implied volatility, 84
incomplete market
  discrete time, 51
  one-period, 21
independent events, 107
independent random variables, 107
indicator function, 104
integrable random variable, 104
interest rate term structure, 94
Itô process, 62
Itô’s formula
  multi-dimensional version, 66
  scalar version, 62
Itô’s isometry, 59

Jensen’s inequality, 105

Kennedy model, 101
Kolmogorov equation, 81

law of iterated expectations, 34
Lebesgue measure, 103
local martingale, 37
local volatility, 86
long interest rate, 95

market price of risk, 75
Markov’s inequality, 107
martingale, 34
martingale deflator
  one-period, 12
martingale representation theorem, 67
martingale transform, 36
mass function, 105
mean-reverting process, 88
measurable with respect to a sigma-field, 33
measure, 103
  probability, 103
monotone convergence theorem, 109
multivariate Gaussian, 107
multivariate normal distribution, 107

natural filtration of a process, 35
normal random variable, 106
normal random vector, 107
Novikov’s criterion, 67
null event, 103
numéraire, 14

objective probability measure, 16, 45
optimal stopping time, 56

Poisson random variable, 106
predictable
  discrete time, 31
predictable process
  continuous time, 60
predictable sigma-field, 60
previsible
  discrete time, 31
pricing kernel, 12
probability density function, 105
probability mass function, 105
put option, 22
put-call parity, 22, 23

quadratic co-variation, 65
quadratic variation, 64

Radon–Nikodym derivative, 15
Radon–Nikodym theorem, 15
replicable
  discrete-time, 49
  one-period, 20
replication
  asymptotic, 23
Riccati equation, 89
risk-neutral measure, 17, 53
riskless
  one-period, 17

self-financing
  continuous time, 69
  discrete time, 31
short interest rate, 94
sigma-algebra, 103
sigma-field, 103
simple predictable process, 58
simple random variable, 104
smile, implied volatility, 84
smirk, implied volatility, 84
smooth pasting, 92
Snell envelope
  discrete time, 55
spot interest rate, 94
square-integrable random variable, 104
state price density, 12
statistical probability measure, 16, 45
stochastic discount factor, 12
stochastic integral
  discrete time, 36
stopping region, 91
stopping time, 36
strike price, 17
strong law of large numbers, 109
submartingale, 39
suicide strategy, 72
super-replication, 48
  asymptotic, 23
supermartingale, 39

term structure of interest rates, 94
tower property, 34
trivial sigma-field, 30, 103

uniform random variable, 106
usual conditions, 58

Vasicek model, 95

Wiener process, 58

yield curve, 93

zero-coupon bond, 93
