February 1, 2011

13:41

9in x 6in

b1016-ch01

Financial Valuation and Econometrics

Chapter 1 PROBABILITY DISTRIBUTION AND STATISTICS

Key Points of Learning: Random variable, Joint probability distribution, Marginal probability distribution, Conditional probability distribution, Expected value, Variance, Covariance, Correlation, Independence, Normal distribution function, Chi-square distribution, Student-t distribution, F-distribution, Data types and categories, Sampling distribution, Hypothesis, Statistical test

1.1. PROBABILITY

Joint probability, marginal probability, and conditional probability are important basic tools in financial valuation and regression analyses. These concepts and their usefulness in financial data analyses will become clearer by the end of the chapter.

To motivate the idea of a joint probability distribution, let us begin by looking at a time series plot or graph of two financial economic variables over time, \(X_t\) and \(Y_t\): for example, the S&P 500 Index aggregate price-to-earnings ratio \(X_t\) and the S&P 500 Index return rate \(Y_t\). The values that \(X_t\) and \(Y_t\) will take are uncertain before they happen, i.e. before time \(t\). At time \(t\), both economic variables take realised values \(x_t\) and \(y_t\), which are said to be realised jointly or simultaneously at the same time \(t\). Thus, we can describe their values as a joint pair \((x_t, y_t)\). If their order is preserved, it is called an ordered pair. Note that the subscript \(t\) represents the time index.

The P/E or price-to-earnings ratio of a stock or a portfolio is a financial ratio showing the price paid for the stock relative to the annual net income or profit per share earned by the firm for the year. The reciprocal of the P/E ratio is called the earnings yield. The earnings yield or E/P reflects the risky annual accounting rate of return, R, on the stock. This is easily shown by the relationship $E = $P × R. In other words, P/E = 1/R.

FINANCIAL VALUATION AND ECONOMETRICS © World Scientific Publishing Co. Pte. Ltd. http://www.worldscibooks.com/economics/7782.html

Electronic copy available at: http://ssrn.com/abstract=1884698


Figure 1.1. S&P 500 Index Portfolio Return Rate and Price-Earnings Ratio 1872–2009 (Data from Prof Shiller, Yale University). [Plot of the S&P 500 Index return rate and the S&P 500 Index aggregate P/E ratio against year, 1870 to 2010.]

In Fig. 1.1, it seems that low returns corresponded to, or lagged, high P/E ratios, especially at the beginnings of the years 1929–1930, 1999–2002, and 2008–2009. Conversely, high returns followed relatively low P/E ratios at the beginnings of the years 1949–1954, 1975–1982, and 2006–2007. We shall explore the issue of the predictability of stock returns in more detail in Chap. 8. The idea that random variables correspond with each other over time, or display some form of association, is called statistical correlation. It is defined, or has interpretative meaning, only when there exists a joint probability distribution describing the random variables.

In Fig. 1.2, we plot U.S. national aggregate consumption versus national disposable income in US$ billion. Disposable income is defined as Personal Income less personal taxes. Personal Income is National Income less corporate taxes and corporate-retained earnings. In turn, National Income is Gross Domestic Product (GDP) less depreciation and indirect business taxes such as sales tax. GDP is essentially the total dollar output or gross income of the country. If we include repatriations from citizens working abroad, then it becomes Gross National Product (GNP).

In Fig. 1.2, it appears that consumption increases with disposable income. The relationship is approximately linear. This is intuitive as on a per capita basis, we


Figure 1.2. U.S. Annual National Aggregate Consumption versus Disposable Income 1999–2009 (Data from Federal Reserve Board of U.S. in $billion). [Scatter plot with consumption on the vertical axis and disposable income on the horizontal axis.]

would expect that for each person, when his or her disposable income rises, he or she would consume more. In life-cycle models of financial economics theory, some types of individual preferences could lead to consumption as an increasing function of individual wealth, which consists of inheritance as well as fresh income. Sometimes, the analysis of income also breaks it down into a permanent part and a transitory part. More on these can be found in economics articles on life-cycle models and hypotheses.

In Fig. 1.3, we evaluate the annual year-to-year changes in consumption and disposable income and plot them on an X–Y graph. The point P1 refers to the bivariate values \((x_1, y_1)\), where \(x_1\) is the change in disposable income and \(y_1\) is the change in consumption in 2000. P2 refers to the bivariate values \((x_2, y_2)\), where \(x_2\) is the change in disposable income and \(y_2\) is the change in consumption in 2001, and so on. Subscripts to \(x\) and \(y\) indicate time. A time index may be construed as the end of a time period and the beginning of the next time period. In this case, subscript 1 refers to time \(t_1\), the end of year 2000.


Figure 1.3. U.S. Annual Year-to-Year Change in National Aggregate Consumption versus Change in Disposable Income 2000–2009 (Data from Federal Reserve Board of U.S. in $billion). [Scatter plot of the points P1 to P10, with change in consumption on the vertical axis and change in disposable income on the horizontal axis.]

The pattern in Fig. 1.3 reveals that the change in disposable income dropped from \(t = 1\) to \(t = 2\), then rose back at \(t = 3\). After that, there was a sharp drop at \(t = 4\) before a wild swing back up at \(t = 5\), and so on. The changes seem to be cyclical. A cyclical but decreasing trend can be seen in consumption. However, what is more interesting is that consumption and disposable income visibly increased and decreased together. Thus, if we construe consumption as the purchases of goods and services, then the plot displays the positive income effect on such effective demand.

Theoretically, each \(X_t\) and each \(Y_t\) for every time \(t\) is a random variable. A random variable is a variable that takes on different values, each with a given probability. It is a variable with an associated probability distribution. For the above scatter plot, since \(X_t\) and \(Y_t\) occur jointly in \((X_t, Y_t)\), the pair is a bivariate random variable, and thus has a joint bivariate probability distribution.

There are two generic classes of probability distributions: discrete probability distribution, where the random variable takes on only a countable set of possible values, and continuous probability distribution, where the random variable takes


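Table 1.1. Discrete Bivariate Joint Probability of Two Stock Return Rates. Entries are \(P(x_{t+1}, y_{t+1})\); columns are values of \(X_{t+1}\) and rows are values of \(Y_{t+1}\).

| \(Y_{t+1}\) \ \(X_{t+1}\) | a1 | a2 | a3 | a4 | a5 | a6 | \(P(y_{t+1})\) |
|---|---|---|---|---|---|---|---|
| b1 | 0.005 | 0.03 | 0.03 | 0.015 | 0.005 | 0.01 | 0.095 |
| b2 | 0.015 | 0.02 | 0.04 | 0.015 | 0.005 | 0.02 | 0.115 |
| b3 | 0.015 | 0.025 | 0.05 | 0.02 | 0.015 | 0.05 | 0.175 |
| b4 | 0.03 | 0.03 | 0.07 | 0.08 | 0.025 | 0.035 | 0.27 |
| b5 | 0.02 | 0.06 | 0.04 | 0.05 | 0.045 | 0.02 | 0.235 |
| b6 | 0.015 | 0.035 | 0.02 | 0.02 | 0.005 | 0.015 | 0.11 |
| \(P(x_{t+1})\) | 0.1 | 0.2 | 0.25 | 0.2 | 0.1 | 0.15 | 1 |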

on an uncountable number of possible values. In what follows, we construct a bivariate discrete probability distribution of the return rates on two stocks.

Let \(t\) denote the day number. Thus, time \(t = 1\) is the end of day 1, \(t = 2\) is the end of day 2, and so on. Let \(P_t\) be the price in $ of stock ABC at time \(t\). Let \(X_{t+1}\) be stock ABC's holding or discrete return rate at time \(t + 1\): \(X_{t+1} = P_{t+1}/P_t - 1\). The corresponding continuously compounded return rate at \(t + 1\) is \(\ln(P_{t+1}/P_t)\), which is approximately \(X_{t+1}\) when \(X_{t+1}\) is close to 0. Another stock XYZ has discrete return rate \(Y_{t+1}\) at time \(t + 1\).

In Table 1.1, we must take care to distinguish between the random variable \(X_{t+1}\) and the realised value it takes in an outcome, e.g. \(x_{t+1} \equiv a_3\). For example, \(a_3\) could be 0.03 or 3%. In the bivariate discrete probability distribution shown in the table, \(X_{t+1}\) takes one of six possible values, viz. \(a_1\), \(a_2\), \(a_3\), \(a_4\), \(a_5\), and \(a_6\). The probability of any one of these six events or outcomes is given by \(P(X_{t+1} = x_{t+1} \equiv a_k)\), or in short \(P(x_{t+1})\), and is shown in the last row of the table. The probability function \(P(.)\) for a discrete probability distribution is also called a probability mass function (pmf). We should think of a probability or chance as a function that maps or assigns a number in \([0, 1] \subset \mathbb{R}\) to each realised value of the random variable. \(\mathbb{R}\) denotes the real line or \((-\infty, +\infty)\). Likewise, the probability of any one of the six outcomes of the random variable \(Y_{t+1}\) is given by \(P(y_{t+1})\) and is shown in the last column of the table. Note that the probabilities of events that make up all the possibilities must sum to 1.

The joint probability of the event or outcome with realised values \((x_{t+1}, y_{t+1})\) is given by \(P(X_{t+1} = x_{t+1}, Y_{t+1} = y_{t+1})\). These probabilities are shown in the table. For example, \(P(a_3, b_5) = 0.04\). This means that the probability or chance of \(X_{t+1} = a_3\) and \(Y_{t+1} = b_5\) simultaneously occurring is 0.04 or 4%. Clearly, the sum of all the joint probabilities within the table must equal 1.


The marginal probability of \(Y_{t+1} = b_3\) in the context of the (bivariate) joint probability distribution is the probability that \(Y_{t+1}\) takes the realised value \(y_{t+1} \equiv b_3\) regardless of the simultaneous value of \(x_{t+1}\). We write this marginal probability as \(P_Y(Y_{t+1} = b_3)\). The subscript \(Y\) to the probability function \(P(.)\) is to highlight that it is the marginal probability of \(Y\). Sometimes, this is omitted. Note that this marginal probability is also a univariate probability. In this case,

\[ P_Y(b_3) = P(a_1, b_3) + P(a_2, b_3) + P(a_3, b_3) + P(a_4, b_3) + P(a_5, b_3) + P(a_6, b_3). \]

Notice that we simplify the notation, with the \(a_j\)'s and \(b_k\)'s indicating values of \(x_{t+1}\) and \(y_{t+1}\), respectively, where the context is understood. In full summation notation,

\[ P_Y(Y_{t+1} = b_3) = \sum_{j=1}^{6} P(X_{t+1} = a_j, Y_{t+1} = b_3). \]

This is the sum of the numbers in the row involving \(b_3\) and is equal to 0.175. The marginal probability of \(X_{t+1} = a_2\) is given by:

\[ P_X(X_{t+1} = a_2) = \sum_{k=1}^{6} P(X_{t+1} = a_2, Y_{t+1} = b_k) = 0.2. \]

Thus, given the joint probability distribution, the marginal probability distribution of any one of the joint random variables can be found. What is \(\sum_{j=1}^{6} \sum_{k=1}^{6} P(X_{t+1} = a_j, Y_{t+1} = b_k)\)? Employing the concept of marginal probability that we just learned,

\[ \sum_{j=1}^{6} \sum_{k=1}^{6} P(X_{t+1} = a_j, Y_{t+1} = b_k) = \sum_{j=1}^{6} P_X(X_{t+1} = a_j) = 1. \]
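The marginal sums above can be checked numerically. The following is a minimal Python sketch, not part of the original text, that stores the joint probabilities of Table 1.1 as a nested list (rows for b1 to b6, columns for a1 to a6) and sums rows and columns. The names `JOINT`, `marginal_x`, and `marginal_y` are illustrative choices.

```python
# Joint pmf of Table 1.1: rows are b1..b6 (values of Y), columns are a1..a6 (values of X).
JOINT = [
    [0.005, 0.03, 0.03, 0.015, 0.005, 0.01],   # b1
    [0.015, 0.02, 0.04, 0.015, 0.005, 0.02],   # b2
    [0.015, 0.025, 0.05, 0.02, 0.015, 0.05],   # b3
    [0.03, 0.03, 0.07, 0.08, 0.025, 0.035],    # b4
    [0.02, 0.06, 0.04, 0.05, 0.045, 0.02],     # b5
    [0.015, 0.035, 0.02, 0.02, 0.005, 0.015],  # b6
]


def marginal_y(joint):
    """P_Y(b_k): sum each row over the six values a_1..a_6 of X."""
    return [sum(row) for row in joint]


def marginal_x(joint):
    """P_X(a_j): sum each column over the six values b_1..b_6 of Y."""
    return [sum(col) for col in zip(*joint)]


p_y = marginal_y(JOINT)
p_x = marginal_x(JOINT)
total = sum(p_x)  # double sum over all 36 joint probabilities

print("P_Y:", [round(p, 3) for p in p_y])
print("P_X:", [round(p, 3) for p in p_x])
print("total:", round(total, 3))
```

The third entry of `p_y` corresponds to \(P_Y(b_3)\) and the second entry of `p_x` to \(P_X(a_2)\), matching the row and column totals of Table 1.1.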

In the bivariate probability case, we know that future risk or uncertainty is characterised by one and only one of the 36 pairs of values \((a_j, b_k)\) that will occur. Suppose the event has occurred, and we know only that it is event \(\{X_{t+1} = a_2\}\) that occurred, but without knowing which of the events \(b_1\), \(b_2\), \(b_3\), \(b_4\), \(b_5\), or \(b_6\) had occurred in simultaneity. An interesting question is to ask what is the probability that \(\{Y_{t+1} = b_3\}\) had occurred, given that we know \(\{X_{t+1} = a_2\}\) occurred. This is called a conditional probability and is denoted by \(P(Y_{t+1} = b_3 \mid X_{t+1} = a_2)\). The symbol “|” represents “given” or “conditional on”. From Table 1.1, we focus on the column where it is given that \(\{x_{t+1} \equiv a_2\}\) occurred. This is shown in Table 1.2. The highlighted 0.025 is the joint probability of \((a_2, b_3)\).


Table 1.2. Joint Probability of Two Stock Return Rates when \(X_{t+1} = a_2\).

| \(Y_{t+1}\) | \(P(a_2, y_{t+1})\) |
|---|---|
| b1 | 0.03 |
| b2 | 0.02 |
| b3 | 0.025 (highlighted) |
| b4 | 0.03 |
| b5 | 0.06 |
| b6 | 0.035 |

Intuitively, the higher (lower) this number, the higher (lower) is the conditional probability that \(b_3\) in fact had occurred simultaneously. Given that \(a_2\) had occurred, the conditional probabilities given \(\{x_{t+1} \equiv a_2\}\) form a proper probability distribution in themselves and thus must add to 1. Then, the conditional probability must be the size of 0.025 relative to the sum of all the joint probabilities in the above column. We recall the conditional probability rule on event sets, which underlies Bayes' rule:

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \]

where \(A\) and \(B\) are events or event sets in a universe. We can think of the outcome \(\{X_{t+1} = a_2\}\) as event \(B\) and the outcome \(\{Y_{t+1} = b_3\}\) as event \(A\). Events can be more general, as occurrences \(\{X_{t+1} = a_j\}\), \(\{Y_{t+1} = b_k\}\), \(\{X_{t+1} = a_j, Y_{t+1} = b_k\}\) are all events or event sets. More exactly,

\[ P(b_3 \mid a_2) = \frac{P(a_2, b_3)}{P_X(a_2)} = \frac{0.025}{0.2} = 0.125. \]
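The calculation above can be reproduced in a few lines. This is a minimal Python sketch under the assumption that the column of Table 1.2 is stored as a list ordered b1 to b6; the names `JOINT_A2` and `conditional_given_column` are illustrative and do not come from the text.

```python
# Column of Table 1.1 for X = a2: joint probabilities P(a2, b_k), k = 1..6.
JOINT_A2 = [0.03, 0.02, 0.025, 0.03, 0.06, 0.035]


def conditional_given_column(joint_column):
    """Return P(b_k | a_j) = P(a_j, b_k) / P_X(a_j) for every k."""
    p_x = sum(joint_column)  # marginal probability P_X(a_j)
    return [p / p_x for p in joint_column]


cond = conditional_given_column(JOINT_A2)
print("P(b3 | a2) =", round(cond[2], 3))
print("sum of conditional probabilities =", round(sum(cond), 3))
```

Dividing every entry of the column by the same marginal probability is what makes the conditional probabilities add to 1.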

In general,

\[ P(Y_{t+1} = b_k \mid X_{t+1} = a_j) = \frac{P(X_{t+1} = a_j, Y_{t+1} = b_k)}{P_X(X_{t+1} = a_j)} = \frac{P(X_{t+1} = a_j, Y_{t+1} = b_k)}{\sum_{l=1}^{6} P(X_{t+1} = a_j, Y_{t+1} = b_l)}. \]

When we move from discrete probability distribution, where event sets consist of discrete elements, to continuous probability distribution, where event sets are continuous, such as intervals on the real line, we have to deal with continuous functions. The continuous joint probability density function (pdf) of bivariate \((X_{t+1}, Y_{t+1})\) is represented by a continuous function \(f(x, y)\) where \(X_{t+1} = x\) and


\(Y_{t+1} = y\), and \(x\), \(y\) are usually numbers on the real line \(\mathbb{R}\). Note that we simplify the notation of the realised values by dropping their time subscripts here. For a continuous probability distribution, the events are described not as point values, e.g. \(x = 3\), \(y = 4\), but rather as intervals, e.g. event \(A = \{(x, y): -2 < y < 3\}\) and event \(B = \{(x, y): 0 < x < 9.5\}\). Then,

\[ P(A, B) = P(0 < x < 9.5, -2 < y < 3) = \int_{0}^{9.5} \int_{-2}^{3} f(x, y)\, dy\, dx. \]

The “support” for a random variable such as \(X_{t+1}\) is the range of \(x\). For joint normal densities, the ranges are usually \((-\infty, \infty)\). Thus, \(Y_{t+1}\) also has the same support. It is usually harmless to use \((-\infty, \infty)\) as supports even if the range is finite \([a, b]\), since the probabilities of null events \((-\infty, a)\) and \((b, \infty)\) are zeros. However, when more advanced mathematics is involved, it is typically better to be precise. In addition, notice that probability is essentially an integral of a function, whether continuous or discrete, and is the area under the pdf curve.

The marginal probability density functions of \(X_{t+1}\) and \(Y_{t+1}\) are given by:

\[ f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy \quad \text{and} \quad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx. \]

Notice that while \(f(x, y)\) is a function containing both \(x\) and \(y\), \(f_Y(y)\) is a function containing only \(y\) since \(x\) is integrated out. Likewise, \(f_X(x)\) is a function that contains only \(x\).

The conditional probability density functions are: \(f(x \mid y) = f(x, y)/f_Y(y)\) and \(f(y \mid x) = f(x, y)/f_X(x)\). These conditional pdf's contain both \(x\) and \(y\) in their arguments.
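To make the integrals above concrete, the sketch below evaluates a rectangle probability, a marginal pdf, and a conditional pdf numerically. It is only an illustration: the joint pdf \(f(x, y) = x + y\) on the unit square is a hypothetical choice (it is not from the text), and the midpoint rule in `integrate` is one simple numerical method among many.

```python
def f(x: float, y: float) -> float:
    """Hypothetical joint pdf: f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0) else 0.0


def integrate(g, lo: float, hi: float, n: int = 400) -> float:
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def marginal_x(x: float) -> float:
    """f_X(x): integrate y out of the joint pdf over its support [0, 1]."""
    return integrate(lambda y: f(x, y), 0.0, 1.0)


def marginal_y(y: float) -> float:
    """f_Y(y): integrate x out of the joint pdf over its support [0, 1]."""
    return integrate(lambda x: f(x, y), 0.0, 1.0)


def conditional_x_given_y(x: float, y: float) -> float:
    """f(x | y) = f(x, y) / f_Y(y)."""
    return f(x, y) / marginal_y(y)


def rectangle_probability(x_lo, x_hi, y_lo, y_hi) -> float:
    """P(x_lo < X < x_hi, y_lo < Y < y_hi) as an iterated integral of f."""
    return integrate(lambda x: integrate(lambda y: f(x, y), y_lo, y_hi), x_lo, x_hi)


total = rectangle_probability(0.0, 1.0, 0.0, 1.0)   # should be 1 for a valid pdf
corner = rectangle_probability(0.0, 0.5, 0.0, 0.5)
print("total probability:", round(total, 6))
print("P(0 < X < 0.5, 0 < Y < 0.5):", round(corner, 6))
print("f_X(0.3):", round(marginal_x(0.3), 6))
print("f(0.3 | 0.5):", round(conditional_x_given_y(0.3, 0.5), 6))
```

For this particular pdf the marginal is \(f_X(x) = x + 1/2\), so the numerical results can be checked by hand.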

1.2. EXPECTATIONS

The expected value of random variable \(X_{t+1}\) is given by

\[ E(X_{t+1}) = \sum_{j=1}^{6} a_j P_X(a_j) = \mu_X \]

for the discrete distribution in Table 1.1,


and for continuous pdf,

\[ E(X_{t+1}) = \int_{-\infty}^{\infty} x f_X(x)\, dx = \mu_X. \]

The conditional expected value or the conditional expectation of \(X_{t+1} \mid b_4\) is given by:

\[ E(X_{t+1} \mid b_4) = \sum_{j=1}^{6} a_j P(a_j \mid b_4) \]

for the discrete distribution in Table 1.1, and for continuous pdf,

\[ E(X_{t+1} \mid y) = \int_{-\infty}^{\infty} x f(x \mid y)\, dx. \]
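The discrete formulas above can be evaluated once numerical values are attached to \(a_1, \ldots, a_6\). The text only suggests that \(a_3\) could be 0.03, so the grid `A_VALUES` below is a hypothetical assumption made purely for illustration; the joint probabilities are those of Table 1.1. A minimal Python sketch:

```python
# Hypothetical return values for a1..a6 (only a3 = 0.03 is suggested in the text).
A_VALUES = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]

# Joint pmf of Table 1.1: rows are b1..b6, columns are a1..a6.
JOINT = [
    [0.005, 0.03, 0.03, 0.015, 0.005, 0.01],
    [0.015, 0.02, 0.04, 0.015, 0.005, 0.02],
    [0.015, 0.025, 0.05, 0.02, 0.015, 0.05],
    [0.03, 0.03, 0.07, 0.08, 0.025, 0.035],
    [0.02, 0.06, 0.04, 0.05, 0.045, 0.02],
    [0.015, 0.035, 0.02, 0.02, 0.005, 0.015],
]


def expected_x(joint, a_values):
    """E(X) = sum_j a_j * P_X(a_j), with P_X obtained as column sums."""
    p_x = [sum(col) for col in zip(*joint)]
    return sum(a * p for a, p in zip(a_values, p_x))


def conditional_expected_x(joint, a_values, row_index):
    """E(X | Y = b_k) = sum_j a_j * P(a_j | b_k); row_index is k - 1."""
    row = joint[row_index]
    p_y = sum(row)  # marginal probability P_Y(b_k)
    return sum(a * p / p_y for a, p in zip(a_values, row))


mu_x = expected_x(JOINT, A_VALUES)
mu_x_given_b4 = conditional_expected_x(JOINT, A_VALUES, 3)
print("E(X) =", round(mu_x, 6))
print("E(X | b4) =", round(mu_x_given_b4, 6))
```

With a different assumed grid of return values the numbers change, but the two functions stay the same.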

Notice that for the continuous pdf, the conditional expected value given \(y\) is a function containing only \(y\). This means that one can further evaluate more specific conditional expectations based on given sets of \(y\) values, e.g. \(\{y: -2 < y < 3\}\). Then, \(E(X_{t+1} \mid -2 < y < 3)\) is found via:

\[
\begin{aligned}
\int_{-\infty}^{\infty} x f(x \mid -2 < y < 3)\, dx
&= \int_{-\infty}^{\infty} x\, \frac{f(x, -2 < y < 3)}{\int_{-2}^{3} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy}\, dx \\
&= \frac{\int_{-\infty}^{\infty} x \int_{-2}^{3} f(x, y)\, dy\, dx}{\int_{-2}^{3} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy} \\
&= \frac{\int_{-2}^{3} \int_{-\infty}^{\infty} x f(x, y)\, dx\, dy}{\int_{-2}^{3} f_Y(y)\, dy}.
\end{aligned}
\]

The interchange of integrals in the last step in the above equation uses the Fubini Theorem, assuming some mild regularity conditions are satisfied by the functions.

The variance of a continuous random variable \(X_{t+1}\) is given by

\[ \operatorname{var}(X_{t+1}) = \sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\, dx. \]

Variance measures the degree of movement or the variability of the random variable itself. The standard deviation (s.d.) of a random variable Xt+1 is the square root of the variance. Standard deviation is sometimes referred to as volatility and sometimes as “risk” in the finance literature.


The covariance between two continuous random variables \(X_{t+1}\) and \(Y_{t+1}\) is given by:

\[ \operatorname{cov}(X_{t+1}, Y_{t+1}) = \sigma_{XY} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f(x, y)\, dx\, dy. \]

Covariance measures the degree of co-movement between two random variables. If the two random variables tend to move together, i.e. when one increases (decreases), the probability of the other increasing (decreasing) is high, then the covariance will be a positive number. If they vary inversely, then the covariance will be a negative number. If there is no co-moving relationship and each random variable moves independently, then their covariance is zero. Notice that the covariance is also an expectation or integral. The co-movement of two random variables is typically better characterised by their correlation coefficient, which is the covariance normalised, or divided, by the product of their s.d.'s:

\[ \operatorname{corr}(X_{t+1}, Y_{t+1}) = \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}. \]

One other advantage of using the correlation coefficient rather than the covariance is that the correlation coefficient is not denominated in the value units of X or Y but is a ratio. It is important to understand that correlation measures association but not causality. In Fig. 1.3, changes in consumption and income are clearly strongly positively correlated. If one concludes from this that increasing consumption will increase income, the resulting action could be disastrous. Even if one simply concludes (based on some understanding of macroeconomics theory or by intuition) that increased income causes increased consumption, it may still be premature, as there are many other possibilities and qualifications. For example, some other variable such as the general education level could lead to increases in both income and consumption. Or, suppose we think of \(Y_{t+1}\) as GDP and \(X_{t+1}\) as population. Both increase with time due to various economic and geo-political reasons. But it would be disastrous for policy to think that increasing population leads to or causes an increase in GDP. Such a conclusion has to assume fairly constant employment and output per capita.
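For a discrete joint distribution, the integrals in the definitions of variance, covariance, and correlation become sums. The sketch below computes them for Table 1.1. It is a hedged illustration: the numerical grids `A_VALUES` and `B_VALUES` are hypothetical assumptions (the text does not specify them), and `expectation` is an illustrative helper name.

```python
import math

# Hypothetical return values for a1..a6 and b1..b6 (not given in the text).
A_VALUES = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
B_VALUES = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]

# Joint pmf of Table 1.1: rows are b1..b6, columns are a1..a6.
JOINT = [
    [0.005, 0.03, 0.03, 0.015, 0.005, 0.01],
    [0.015, 0.02, 0.04, 0.015, 0.005, 0.02],
    [0.015, 0.025, 0.05, 0.02, 0.015, 0.05],
    [0.03, 0.03, 0.07, 0.08, 0.025, 0.035],
    [0.02, 0.06, 0.04, 0.05, 0.045, 0.02],
    [0.015, 0.035, 0.02, 0.02, 0.005, 0.015],
]


def expectation(g):
    """E[g(X, Y)]: sum of g(a_j, b_k) * P(a_j, b_k) over all 36 outcomes."""
    return sum(
        g(a, b) * JOINT[k][j]
        for k, b in enumerate(B_VALUES)
        for j, a in enumerate(A_VALUES)
    )


mu_x = expectation(lambda a, b: a)
mu_y = expectation(lambda a, b: b)
var_x = expectation(lambda a, b: (a - mu_x) ** 2)
var_y = expectation(lambda a, b: (b - mu_y) ** 2)
cov_xy = expectation(lambda a, b: (a - mu_x) * (b - mu_y))
cov_shortcut = expectation(lambda a, b: a * b) - mu_x * mu_y  # E(XY) - mu_X * mu_Y
rho = cov_xy / math.sqrt(var_x * var_y)

print("var(X) =", var_x, " var(Y) =", var_y)
print("cov(X, Y) =", cov_xy, " corr(X, Y) =", rho)
```

The two covariance expressions agree up to floating-point error, and the correlation is unit-free whatever scale is assumed for the return values.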


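For general random variables X and Y (dropping time subscripts), we can write their means, variances, and covariance as follows.

\[
\begin{aligned}
E(X) &= \mu_X, \qquad E(Y) = \mu_Y, \\
\operatorname{var}(X) &= E(X - \mu_X)^2 = E(X^2) - \mu_X^2, \\
\operatorname{var}(Y) &= E(Y - \mu_Y)^2 = E(Y^2) - \mu_Y^2, \\
\operatorname{cov}(X, Y) &= E(X - \mu_X)(Y - \mu_Y) = E(XY) - \mu_X \mu_Y.
\end{aligned}
\]

Covariance is a linear operator in each of its two arguments. A function is \(f : A \to B\) or \(\{f : f(a) = b;\ a \in A,\ b \in B\}\), in which \(A\) is the domain set and \(B\) the range set, and each \(a\) is mapped onto one and only one element \(b\) in \(B\). We can think of an operator as a special case of a function where the domain and range are spaces such as normed vector spaces. These technicalities are not important except in more advanced courses.

Now consider \(N\) random variables \(X_i\), where \(i = 1, 2, \ldots, N\). A very useful property of a covariance is shown below.

\[
\begin{aligned}
\operatorname{cov}\left( \sum_{i=1}^{N} X_i, \sum_{j=1}^{N} X_j \right)
&= E\left\{ \sum_{i=1}^{N} [X_i - E(X_i)] \sum_{j=1}^{N} [X_j - E(X_j)] \right\} \\
&= E\left\{ \sum_{i=1}^{N} \sum_{j=1}^{N} [X_i - E(X_i)][X_j - E(X_j)] \right\} \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} E\{[X_i - E(X_i)][X_j - E(X_j)]\} \\
&= \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{cov}(X_i, X_j).
\end{aligned}
\]

A special case of the above is

\[
\begin{aligned}
\operatorname{var}(X + Y) &= \operatorname{cov}(X + Y, X + Y) \\
&= \operatorname{cov}(X, X) + \operatorname{cov}(X, Y) + \operatorname{cov}(Y, X) + \operatorname{cov}(Y, Y) \\
&= \operatorname{var}(X) + \operatorname{var}(Y) + 2 \operatorname{cov}(X, Y).
\end{aligned}
\]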


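A convenient property of a correlation coefficient \(\rho\) is that it lies between −1 and +1. This is shown as follows. For any real \(\theta\),

\[ \operatorname{var}(X - \theta Y) = \sigma_X^2 + \theta^2 \sigma_Y^2 - 2\theta\rho\sigma_X\sigma_Y \ge 0. \]

Put \(\theta = \rho \frac{\sigma_X}{\sigma_Y}\). Then,

\[ \sigma_X^2 + \rho^2 \sigma_X^2 - 2\rho^2 \sigma_X^2 \ge 0. \]

Thus, for any random variables X and Y, \(\sigma_X^2 (1 - \rho^2) \ge 0\), and hence \((1 - \rho^2) \ge 0\), or \(\rho^2 \le 1\). Therefore, \(-1 \le \rho \le 1\).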

1.3. DISTRIBUTIONS

Continuous probability distributions are commonly employed in regression analyses. The commonest probability distribution is the normal (Gaussian) distribution. The pdf of a normally distributed random variable X is given by

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} \quad \text{for } -\infty < x < \infty, \]

where the mean of X is \(\mu\) and the s.d. of X is \(\sigma\). \(\mu\) and \(\sigma\) are given constants.

\[ E(X) = \int_{-\infty}^{+\infty} x f(x)\, dx = \mu, \]

\[ \operatorname{Var}(X) = E(X - \mu)^2 = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\, dx = \sigma^2. \]

The cumulative distribution function (cdf) of X is

\[ F(x) = \int_{-\infty}^{x} f(u)\, du. \]

We can write the distribution of X as \(X \overset{d}{\sim} N(\mu, \sigma^2)\), in which the arguments indicate the mean and variance of the normal random variable. Suppose we define a corresponding random variable

\[ Z = \frac{X - \mu}{\sigma} \quad \text{or} \quad X = \mu + \sigma Z, \]

where the symbol “=” means “to define”. The second “equality” is interpreted as not just equivalence in distribution, but that whenever Z takes value z, then X


takes value \(x = \mu + \sigma z\). Then, \(E(Z) = 0\) and \(\operatorname{Var}(Z) = 1\).

Since a constant multiple of a normal random variable is normally distributed, and a sum of independent normal random variables (or of a normal random variable and a constant) is also a normal random variable, \(Z \overset{d}{\sim} N(0, 1)\). Z has the standard normal pdf \(f(z)\), where \(z = \frac{x - \mu}{\sigma}\), and is called the standard normal variable. For normal distribution \(N(\mu, \sigma^2)\),

\[ F(x) = \int_{-\infty}^{\frac{x - \mu}{\sigma}} f(z)\, dz, \]

where \(f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\) is the standard normal pdf. The standard normal cdf is often written as \(\Phi(z)\). For the standard normal Z, \(P(a \le z \le b) = \Phi(b) - \Phi(a)\).

The normal distribution is a familiar workhorse in statistical estimation and testing. The normal distribution pdf curve is “bell-shaped”. Areas under the curve are associated with probabilities. Figure 1.4 shows a standard normal pdf N(0,1) and the associated probability as area under the curve. The areas corresponding to selected z values of random variable (r.v.) Z can be seen in the standard normal distribution Table 1.3. For example, the probability \(P(-\infty < Z < 1.5) = 0.933\). This same probability can be written as \(P(-\infty \le Z < 1.5) = 0.933\), \(P(-\infty < Z \le 1.5) = 0.933\), or \(P(-\infty \le Z \le 1.5) = 0.933\). This is because for continuous pdf, \(P(Z = 1.5) = 0\). From the symmetry of the normal pdf, \(P(-a < Z < \infty) = P(-\infty < Z < a)\), we can also compute further probabilities, as shown after Table 1.3.

Figure 1.4. Standard Normal Probability Density Function of Z. [Bell-shaped pdf curve against Z, centred at 0, with the 5% area to the left of a = –1.645 marked.]


Table 1.3.

| Z | Area under curve from –∞ to z | Z | Area under curve from –∞ to z |
|---|---|---|---|
| 0.000 | 0.500 | 1.600 | 0.945 |
| 0.100 | 0.540 | 1.645 | 0.950 |
| 0.200 | 0.579 | 1.700 | 0.955 |
| 0.300 | 0.618 | 1.800 | 0.964 |
| 0.400 | 0.655 | 1.960 | 0.975 |
| 0.500 | 0.691 | 2.000 | 0.977 |
| 0.600 | 0.726 | 2.100 | 0.982 |
| 0.700 | 0.758 | 2.200 | 0.986 |
| 0.800 | 0.788 | 2.300 | 0.989 |
| 0.900 | 0.816 | 2.330 | 0.990 |
| 1.000 | 0.841 | 2.400 | 0.992 |
| 1.100 | 0.864 | 2.500 | 0.994 |
| 1.282 | 0.900 | 2.576 | 0.995 |
| 1.300 | 0.903 | 2.600 | 0.995 |
| 1.400 | 0.919 | 2.700 | 0.997 |
| 1.500 | 0.933 | 2.800 | 0.997 |
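Areas such as those in Table 1.3 can be reproduced with the standard normal cdf. The sketch below is one possible way to do this in Python, using `statistics.NormalDist` from the standard library (available from Python 3.8); the helper name `area_to_left` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal N(0, 1)


def area_to_left(z: float) -> float:
    """Phi(z) = P(-inf < Z <= z), the area under the pdf from -inf to z."""
    return Z.cdf(z)


p_above_1_5 = 1.0 - area_to_left(1.5)                   # P(Z > 1.5)
p_below_minus_1 = area_to_left(-1.0)                    # P(Z <= -1.0), equal to P(Z > 1.0) by symmetry
p_between = area_to_left(1.5) - area_to_left(-1.0)      # P(-1.0 < Z < 1.5)
z_upper_5pct = Z.inv_cdf(0.95)                          # z with 5% of the area to its right

for z in (0.0, 1.0, 1.282, 1.5, 1.645, 1.96, 2.33, 2.576):
    print(f"z = {z:5.3f}  area = {area_to_left(z):.3f}")
print(f"P(Z > 1.5) = {p_above_1_5:.3f}")
print(f"P(-1.0 < Z < 1.5) = {p_between:.3f}")
```

Probabilities obtained by adding or subtracting three-decimal table entries can differ from the directly computed values in the last decimal place because of rounding.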

Using Table 1.3 and the symmetry of the normal pdf:

\[
\begin{aligned}
P(Z > 1.5) &= 1 - P(-\infty < Z \le 1.5) = 1 - 0.933 = 0.067, \\
P(-\infty < Z \le -1.0) &= P(Z > 1.0) = 1 - 0.841 = 0.159, \\
P(-1.0 < Z < 1.5) &= P(-\infty < Z < 1.5) - P(-\infty < Z \le -1.0) = 0.933 - 0.159 = 0.774, \\
P(Z \le -1.0 \text{ or } Z \ge 1.5) &= 1 - P(-1.0 < Z < 1.5) = 1 - 0.774 = 0.226.
\end{aligned}
\]

Several values of Z under N(0,1) are commonly encountered, viz. 1.282, 1.645, 1.960, 2.330, and 2.576.

\(P(Z > 1.282) = 0.10\) or 10%.
\(P(Z < -1.645 \text{ or } Z > 1.645) = 0.10\) or 10%.
\(P(Z > 1.960) = 0.025\) or 2.5%.
\(P(Z < -1.960 \text{ or } Z > 1.960) = 0.05\) or 5%.
\(P(Z > 2.330) = 0.01\) or 1%.
\(P(Z < -2.576 \text{ or } Z > 2.576) = 0.01\) or 1%.

The case for \(P(Z < -1.645) = 5\%\) is shown in Fig. 1.4.

The bivariate normal distribution of random variables X, Y is given by

\[ f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}}\, e^{-\frac{1}{2} q}, \tag{1.1} \]

where

\[ q = \frac{1}{1 - \rho^2} \left[ \left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho \left(\frac{x - \mu_X}{\sigma_X}\right) \left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2 \right] \]

and \(\frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \rho\).

The multivariate normal distribution pdf (p-variate normal pdf) is given by

\[ f(x_1, x_2, \ldots, x_p) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right], \]

where \(x\) is the vector of random variables \(X_1\) to \(X_p\), \(\mu\) is the \(p \times 1\) vector of means of \(x\), and \(\Sigma\) is the \(p \times p\) covariance matrix of \(x\). If \(p = 2\) is substituted into the above, the bivariate pdf shown in Eq. (1.1) can be obtained.

The kth moment of random variable X is \(\int x^k f(x)\, dx\), where \(f(x)\) is the pdf of X. If \(\mu = E(X)\) is the mean of X, the kth central moment of X is \(\int (x - \mu)^k f(x)\, dx\). Notice that the variance is the second central moment of X. The third central moment ÷ variance\(^{3/2}\) is known as skewness. The fourth central moment ÷ variance\(^2\) is known as kurtosis. The normal distribution r.v. \(X \sim N(\mu, \sigma^2)\) has mean \(\mu\), variance \(\sigma^2\), skewness 0, and kurtosis that is equal to 3. Hence, the standard normal variate \(Z \sim N(0, 1)\) has a mean 0, variance 1, skewness 0, and kurtosis 3.

Many financial variables, e.g. daily stock returns, currency rates of change, etc., display skewness as well as large kurtosis compared with the benchmark normal distribution with symmetrical pdf, skewness = 0, and kurtosis = 3. Departure from normality is illustrated by a pdf in Fig. 1.5. The shaded area in Fig. 1.5 shows a normal pdf. The unshaded curve shows the pdf of a random variable with negative skewness, a kurtosis larger than that of the normal random variable, and mean \(\mu < 0\).

The concept of stochastic independence between random variables is important. Two random variables X and Y are said to be stochastically independent if and only if their joint pdf can be expressed as follows: \(f(x, y) = f_X(x) f_Y(y)\). One implication of the above is that for any function \(h(.)\) of X and any function \(g(.)\) of Y, the expectation of their product can be found as: \(E(h(X)g(Y)) = E(h(X))E(g(Y))\).
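The factorisation condition for independence has a direct discrete analogue: a joint pmf is independent when every joint probability equals the product of its two marginal probabilities. The following minimal Python sketch, under that discrete-analogue assumption, tests the condition on Table 1.1 and on a small product table built to be independent. The helper names `is_independent` and `product_table` are illustrative.

```python
# Joint pmf of Table 1.1: rows are b1..b6, columns are a1..a6.
JOINT = [
    [0.005, 0.03, 0.03, 0.015, 0.005, 0.01],
    [0.015, 0.02, 0.04, 0.015, 0.005, 0.02],
    [0.015, 0.025, 0.05, 0.02, 0.015, 0.05],
    [0.03, 0.03, 0.07, 0.08, 0.025, 0.035],
    [0.02, 0.06, 0.04, 0.05, 0.045, 0.02],
    [0.015, 0.035, 0.02, 0.02, 0.005, 0.015],
]


def is_independent(joint, tol: float = 1e-9) -> bool:
    """True if P(a_j, b_k) = P_X(a_j) * P_Y(b_k) for every cell of the table."""
    p_y = [sum(row) for row in joint]
    p_x = [sum(col) for col in zip(*joint)]
    return all(
        abs(joint[k][j] - p_x[j] * p_y[k]) <= tol
        for k in range(len(p_y))
        for j in range(len(p_x))
    )


def product_table(p_x, p_y):
    """Joint pmf of two independent variables with the given marginals."""
    return [[px * py for px in p_x] for py in p_y]


print("Table 1.1 independent?", is_independent(JOINT))
print("Product table independent?", is_independent(product_table([0.5, 0.5], [0.3, 0.7])))
```

For Table 1.1 the check fails already at the cell \((a_3, b_5)\), where the joint probability is 0.04 while the product of the marginals is 0.25 × 0.235.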


Figure 1.5. Example of a Pdf with Negative Skewness and Large Kurtosis. [Plot of f(x) against x, with the mean µ marked to the left of 0; annotations: “Negative or left skewness (longer left tail)” and “Fat tails with kurtosis > 3”.]
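The moment-based definitions of skewness and kurtosis can be verified numerically for the normal benchmark. The sketch below integrates \((x - \mu)^k f(x)\) with a simple midpoint rule; the parameters \(\mu = 1\) and \(\sigma = 2\) are hypothetical, and the integration range of 12 standard deviations on either side of the mean is an assumption chosen so that the neglected tails are negligible.

```python
from statistics import NormalDist


def central_moment(pdf, mean: float, k: int, lo: float, hi: float, n: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of (x - mean)^k * pdf(x) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += (x - mean) ** k * pdf(x)
    return total * h


def variance_skewness_kurtosis(pdf, mean: float, lo: float, hi: float):
    """Variance, skewness (3rd central moment / var^1.5), kurtosis (4th / var^2)."""
    variance = central_moment(pdf, mean, 2, lo, hi)
    skewness = central_moment(pdf, mean, 3, lo, hi) / variance ** 1.5
    kurtosis = central_moment(pdf, mean, 4, lo, hi) / variance ** 2
    return variance, skewness, kurtosis


dist = NormalDist(mu=1.0, sigma=2.0)  # hypothetical parameters
lo = dist.mean - 12 * dist.stdev
hi = dist.mean + 12 * dist.stdev
variance, skewness, kurtosis = variance_skewness_kurtosis(dist.pdf, dist.mean, lo, hi)
print(f"variance = {variance:.6f}, skewness = {skewness:.6f}, kurtosis = {kurtosis:.6f}")
```

Passing a different pdf to `variance_skewness_kurtosis` gives the corresponding measures for a non-normal distribution, provided the integration range covers its support adequately.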

A special case is the covariance operator. If X and Y are (stochastically) independent, then it implies that their covariance is zero:

\[ \operatorname{cov}(X, Y) = E(X - \mu_X)(Y - \mu_Y) = E(X - \mu_X)E(Y - \mu_Y) = 0. \]

The converse is not always true. It is true only for special cases such as when X and Y are jointly normally distributed. When X and Y are jointly normally distributed, then if they have zero covariance, they are stochastically independent.

For bivariate normal pdf, the conditional pdf is

\[ g(x \mid y) = \frac{f(x, y)}{f_Y(y)}. \]

Or,

\[
\begin{aligned}
g(x \mid y)
&= \frac{\dfrac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}}\, e^{-\frac{1}{2} q}}{\dfrac{1}{\sigma_Y\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y - \mu_Y}{\sigma_Y}\right)^2}} \\
&= \frac{1}{\sigma_X\sqrt{2\pi}\sqrt{1 - \rho^2}}\, \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \frac{x - \mu_X}{\sigma_X} - \rho\, \frac{y - \mu_Y}{\sigma_Y} \right]^2 \right\} \\
&= \frac{1}{\sqrt{2\pi\sigma_X^2 (1 - \rho^2)}}\, \exp\left\{ -\frac{1}{2(1 - \rho^2)\sigma_X^2} \left[ (x - \mu_X) - \rho\, \frac{\sigma_X}{\sigma_Y} (y - \mu_Y) \right]^2 \right\} \\
&= \frac{1}{\sqrt{2\pi\sigma_{X|Y}^2}}\, \exp\left\{ -\frac{1}{2\sigma_{X|Y}^2} (x - \mu_{X|Y})^2 \right\},
\end{aligned}
\]


where \(\sigma_{X|Y}^2 = (1 - \rho^2)\sigma_X^2\) is the variance of X conditional on \(Y = y\), and \(\mu_{X|Y} = \mu_X + \rho\, \frac{\sigma_X}{\sigma_Y} (y - \mu_Y)\) is the mean of X conditional on \(Y = y\).

There are some common continuous probability distributions that are related to the normal distribution. If random variable \(X \overset{d}{\sim} N(\mu, \sigma^2)\), then random variable

\[ V = \left( \frac{X - \mu}{\sigma} \right)^2 \overset{d}{\sim} \chi_1^2 \]

is a chi-square distribution with 1 degree of freedom. If \(X_1, X_2, X_3, \ldots, X_n\) are n random variables each independently drawn from the same population distribution \(N(\mu, \sigma^2)\), or think of \(\{X_i\}_{i = 1, \ldots, n}\) as a random sample of size n, then

\[ \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \overset{d}{\sim} \chi_n^2 \]

is a chi-square distribution with n degrees of freedom.

If \(X \overset{d}{\sim} N(0, 1)\), and \(V \overset{d}{\sim} \chi_r^2\), and both X and V are stochastically independent, then

\[ \frac{X}{\sqrt{V r^{-1}}} \]

is a Student-t distribution with r degrees of freedom. If \(U \overset{d}{\sim} \chi_{r_1}^2\), \(V \overset{d}{\sim} \chi_{r_2}^2\), and both U and V are stochastically independent, then

\[ \frac{U r_1^{-1}}{V r_2^{-1}} \overset{d}{\sim} F_{r_1, r_2} \]

is an F-distribution with degrees of freedom \(r_1\) and \(r_2\). If random variable \(X \overset{d}{\sim} N(\mu, \sigma^2)\) and \(Y = \exp(X)\) or \(X = \ln(Y)\), then Y is a random variable with a lognormal distribution.
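The constructions above translate directly into a simulation. The sketch below is illustrative only: it builds chi-square, Student-t, and F variates from normal draws exactly as described, then checks that the simulated chi-square mean is close to its degrees of freedom and that the Student-t draws are centred near zero. The seed, the numbers of draws, and the degrees of freedom (5 and 10) are arbitrary assumptions.

```python
import math
import random


def chi_square_draw(rng: random.Random, n: int, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Sum of n squared standardised N(mu, sigma^2) draws: one chi-square(n) variate."""
    return sum(((rng.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(n))


def student_t_draw(rng: random.Random, r: int) -> float:
    """X / sqrt(V / r) with X ~ N(0, 1) and an independent V ~ chi-square(r)."""
    x = rng.gauss(0.0, 1.0)
    v = chi_square_draw(rng, r)
    return x / math.sqrt(v / r)


def f_draw(rng: random.Random, r1: int, r2: int) -> float:
    """(U / r1) / (V / r2) with independent U ~ chi-square(r1), V ~ chi-square(r2)."""
    u = chi_square_draw(rng, r1)
    v = chi_square_draw(rng, r2)
    return (u / r1) / (v / r2)


rng = random.Random(12345)  # fixed seed so the run is repeatable
n_draws = 20000
chi2_mean = sum(chi_square_draw(rng, 5) for _ in range(n_draws)) / n_draws
t_mean = sum(student_t_draw(rng, 10) for _ in range(n_draws)) / n_draws
f_draws = [f_draw(rng, 5, 10) for _ in range(1000)]

print(f"simulated mean of chi-square(5): {chi2_mean:.3f}")
print(f"simulated mean of Student-t(10): {t_mean:.3f}")
print(f"smallest simulated F(5, 10) draw: {min(f_draws):.4f}")
```

A lognormal draw could be obtained in the same spirit as `math.exp(rng.gauss(mu, sigma))`.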

1.4. STATISTICAL ESTIMATION

Suppose a random variable X with a fixed normal distribution \(N(\mu, \sigma^2)\) is given. Suppose there is a random draw of a number or an outcome from this distribution. This is the same as stating that random variable X takes a realised value x. Let this value be \(x_1\); it may be, say, 3.89703. Suppose we repeatedly make random draws and thus form a sample of n observations: \(x_1, x_2, x_3, \ldots, x_{n-1}, x_n\). This is called a random sample with a sample size of n. Each \(x_i\) comes from the same distribution \(N(\mu, \sigma^2)\), and any two of them, \(x_i\) and \(x_j\) with \(i \ne j\), are realisations from independent sampling.

We next compute a statistic, which is a function of the realised values \(\{x_k\}\), \(k = 1, 2, \ldots, n\). Consider a statistic, the sample mean,

\[ \bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k. \]

Another common sample statistic is the unbiased sample variance

\[ s^2 = \frac{1}{n - 1} \sum_{k=1}^{n} (x_k - \bar{x})^2. \]
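As an illustration of these two statistics, the sketch below draws a hypothetical random sample and computes the sample mean and the unbiased sample variance from the formulas above, comparing them with `statistics.mean` and `statistics.variance` from the Python standard library. The population parameters, the seed, and the sample size are assumptions made only for this example.

```python
import random
import statistics

# Hypothetical population parameters and sample size (not from the text).
MU, SIGMA, N = 0.05, 0.20, 30

rng = random.Random(2011)  # fixed seed so the sample is repeatable
sample = [rng.gauss(MU, SIGMA) for _ in range(N)]


def sample_mean(xs):
    """x-bar = (1/n) * sum of x_k."""
    return sum(xs) / len(xs)


def unbiased_sample_variance(xs):
    """s^2 = (1/(n-1)) * sum of (x_k - x-bar)^2."""
    x_bar = sample_mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)


x_bar = sample_mean(sample)
s2 = unbiased_sample_variance(sample)
print(f"sample mean = {x_bar:.6f}")
print(f"unbiased sample variance = {s2:.6f}")
```

Note the divisor \(n - 1\) rather than \(n\); `statistics.variance` uses the same convention, whereas `statistics.pvariance` divides by \(n\).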

Each time we select a random sample of size n, we obtain a realisation \(\bar{x}\). Thus, \(\bar{x}\) is itself a realisation of a random variable, and this r.v. can be

February 1, 2011

13:41

9in x 6in

18

b1016-ch01

Financial Valuation and Econometrics

Financial Valuation and Econometrics

denoted by ¯n = X

1 Xk , n N

k=1

¯ n is a where Xk above is clearly the random variable from N(µ, σ 2 ) itself. X random variable and its probability distribution is called the sampling distribution of the mean or perhaps more clearly, the distribution of the sample mean. ¯ n? What is the exact probability distribution of X ¯ n) = E(X

1 1 1 E Xk = E(Xk ) = µ = µ. n n n n

n

k=1

¯ n) = var(X

1 var n2

n

k=1

n

Xk =

k=1

1 n2

k=1

n

var(Xk ) =

k=1

nσ 2 σ2 . = n2 n

¯ n is a normal random variable, therefore, Since X σ2 ¯ Xn ∼ N µ, . n The standardised normal random variable then becomes √ ¯ n − µ) ¯n−µ n(X X ∼ N(0, 1). = σ σ2 n
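The result that the sample mean has mean µ and variance σ²/n can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the text: the parameter values µ = 2, σ = 3, n = 25, the seed, and the number of repeated samples are all hypothetical. It draws many random samples of size n, records each sample mean, and compares the mean and variance of those sample means with µ and σ²/n.

```python
import random
import statistics

mu, sigma, n = 2.0, 3.0, 25      # hypothetical population parameters and sample size
num_samples = 20000              # number of repeated random samples (an assumption)
rng = random.Random(2011)        # arbitrary seed, only for reproducibility

# Each entry is one realisation of the sample mean from a fresh sample of size n.
sample_means = []
for _ in range(num_samples):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)
theoretical_var = sigma ** 2 / n   # sigma^2 / n from the derivation

print("mean of sample means:", mean_of_means, "vs mu =", mu)
print("variance of sample means:", var_of_means, "vs sigma^2/n =", theoretical_var)
```

The two simulated quantities should agree with µ and σ²/n only up to simulation noise, which shrinks as the number of repeated samples grows.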

On the other hand, $E(s^2) = \sigma^2$. But $s^2$ is itself a random variable with a sampling distribution:
\[ (n-1)\frac{s^2}{\sigma^2} \sim \chi^2_{n-1}. \]
It can be seen that $E(\chi^2_{n-1}) = n - 1$, the number of degrees of freedom of the chi-square random variable. Therefore, using the fact that $\bar{X}_n$ and $s^2$ are independent for a normal random sample,
\[ \frac{\sqrt{n}(\bar{X}_n - \mu)/\sigma}{\sqrt{s^2/\sigma^2}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{s} \]
is distributed as Student-t with (n − 1) degrees of freedom and zero mean. Denote the random variable with t-distribution, n − 1 degrees of freedom, as $t_{n-1}$. Then,
\[ \frac{\sqrt{n}(\bar{X}_n - \mu)}{s} \sim t_{n-1}. \]
Suppose we find (−a, +a), a > 0, such that $\mathrm{Prob}(-a \le t_{n-1} \le +a) = 95\%$. Since $t_{n-1}$ is symmetrically distributed, $\mathrm{Prob}(-a \le t_{n-1}) = 97.5\%$ and $\mathrm{Prob}(t_{n-1} \le +a) = 97.5\%$. Thus,
\[ \mathrm{Prob}\left(-a \le \frac{\sqrt{n}(\bar{X}_n - \mu)}{s} \le a\right) = 0.95. \]
Also,
\[ \mathrm{Prob}\left(\bar{X}_n - a\frac{s}{\sqrt{n}} \le \mu \le \bar{X}_n + a\frac{s}{\sqrt{n}}\right) = 0.95. \]

Suppose $x_1, x_2, x_3, \ldots, x_{n-1}, x_n$ are randomly sampled from $X \sim N(\mu, \sigma^2)$. Sample size n = 30. The t-statistic value such that $\mathrm{Prob}(t_{29} \le a) = 97.5\%$ is a = 2.045. Then,
\[ \mathrm{Prob}\left(\bar{X}_n - 2.045\frac{s}{\sqrt{30}} \le \mu \le \bar{X}_n + 2.045\frac{s}{\sqrt{30}}\right) = 0.95. \]
Hence, the 95% confidence interval estimate of µ is given by
\[ \left(\bar{X}_n - 2.045\frac{s}{\sqrt{30}},\; \bar{X}_n + 2.045\frac{s}{\sqrt{30}}\right) \]
when the estimated s is entered.
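The interval above can be computed directly from a sample of size 30. The following sketch is illustrative only: the function name, the seed, and the hypothetical parameters µ = 0.05 and σ = 0.20 are assumptions, while the critical value 2.045 is the one quoted in the text for 29 degrees of freedom. It also repeats the sampling many times to check how often the interval covers the true µ.

```python
import math
import random
import statistics

T_CRIT_29 = 2.045   # 97.5th percentile of t with 29 d.f., as quoted in the text


def confidence_interval_95(sample, t_crit=T_CRIT_29):
    """Return (lower, upper) = x_bar -/+ t_crit * s / sqrt(n) for the given sample."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)            # square root of the unbiased sample variance
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


rng = random.Random(30)       # arbitrary seed, only for reproducibility
mu, sigma, n = 0.05, 0.20, 30  # hypothetical true parameters; n = 30 as in the text

sample = [rng.gauss(mu, sigma) for _ in range(n)]
lower, upper = confidence_interval_95(sample)
print("95% confidence interval for mu:", (lower, upper))

# Repeat the experiment to estimate how often the interval contains the true mu.
trials = 5000
covered = 0
for _ in range(trials):
    lo, hi = confidence_interval_95([rng.gauss(mu, sigma) for _ in range(n)])
    if lo <= mu <= hi:
        covered += 1
coverage = covered / trials
print("simulated coverage:", coverage)
```

The simulated coverage should be close to 0.95, which is the sense in which the interval is a 95% confidence interval: the random interval, not the fixed parameter µ, varies from sample to sample.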

1.5. STATISTICAL TESTING

In many situations, there is a priori (or ex-ante) information about the value of the mean µ, and it may be desirable to use observed data to test if the information is correct. µ is called a parameter of the population or fixed distribution N(µ, σ²). A statistical hypothesis is an assertion about the true value of the population parameter, in this case µ. A simple hypothesis specifies a single value for the parameter, while a composite hypothesis will specify more than one value. We will work with the simple null hypothesis H0 (sometimes this is called the maintained hypothesis), which is what is postulated to be true. The alternative hypothesis HA is what will be the case if the null hypothesis is rejected. Together the values specified under H0 and HA should form the total universe of possibilities of the parameter. For example,

H0 : µ = 1
HA : µ ≠ 1.

A statistical test of the hypothesis is a decision rule that, given the inputs from the sample values and hence sampling distribution, chooses to either reject or else not reject (intuitively similar in meaning to “accept”) the null H0. Given this rule, the set of sample outcomes or sample values that lead to the rejection of H0 is called the critical region. If H0 is true but is rejected, a Type I error is committed. If H0 is false but is accepted, a Type II error is committed.

Figure 1.6. Critical Region Under the Null Hypothesis H0 : µ = 1. [Plot of the $t_{n-1}$ density against X, with the critical values −a and +a marked on either side of 0.]

The statistical rule on H0 : µ = 1, HA : µ ≠ 1, is that if the test statistic
\[ t_{n-1} = \frac{\sqrt{n}(\bar{X}_n - 1)}{s}, \]
which is t-distributed with (n − 1) degrees of freedom, falls within the critical region (shaded), defined as $\{t_{n-1} < -a \text{ or } t_{n-1} > +a\}$, a > 0, as shown in Fig. 1.6, then H0 is rejected in favour of HA. Otherwise, H0 is not rejected and is “accepted”. If H0 is true, then the t-distribution would be correct, and therefore the probability of rejecting H0 would be the area of the critical region, say 5% in this case. Notice that for n = 61, $P(-2.00 < t_{60} < 2.00) = 0.95$. Moreover, the t-distribution is symmetrical, so each of the right and left shaded tails makes up 2.5%. This is called a two-tailed test with a significance level of 5%. The significance level is the probability of committing a Type I error when H0 is true. In the above example, if the sample t-statistic is 1.045, then its p-value is $2 \times P(t_{60} > 1.045) = 2 \times 0.15 = 0.30$ or 30%. Another way to carry out the test is to compare the p-value with the significance level: if the p-value < test significance level, reject H0; otherwise H0 cannot be rejected.

In theory, if we reduce the probability of Type I error, the probability of Type II error increases, and vice versa. This is illustrated in Fig. 1.7. Suppose H0 is false, and µ > 1, so the true $t_{n-1}$ distribution is represented by the dotted curve in Fig. 1.7. The critical region $\{t_{n-1} < -2.00 \text{ or } t_{n-1} > 2.00\}$ remains the same, so the probability of committing a Type II error is 1 − sum of shaded areas. Clearly, this probability increases as we reduce the critical region in order to reduce Type I error. Although it is ideal to reduce both types of errors, the tradeoff forces us to choose between the two. In practice, we fix the probability of Type I error when H0 is true, i.e. determine a fixed significance level, e.g. 10%, 5%, or 1%. The power of a test is the probability of rejecting H0 when it is false.
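The p-value rule can be sketched numerically without statistical tables. The code below is a hedged illustration, not the author's procedure: it evaluates the Student-t density and integrates it with Simpson's rule (a numerical-integration choice assumed here) to obtain a two-tailed p-value. The function names and the number of integration steps are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p_value(t_stat, df, steps=2000):
    """P(|t_df| > |t_stat|), using Simpson's rule on [0, |t_stat|]; steps must be even."""
    b = abs(t_stat)
    if b == 0.0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3          # approximates P(0 < t_df < |t_stat|)
    return 1.0 - 2.0 * central_half       # symmetry of the t-distribution


def reject_null(t_stat, df, significance=0.05):
    """Decision rule: reject H0 when the p-value is below the significance level."""
    return two_tailed_p_value(t_stat, df) < significance


p_value = two_tailed_p_value(1.045, 60)   # the example from the text, n = 61
print("two-tailed p-value:", p_value)
print("reject H0 at 5%?", reject_null(1.045, 60))
```

For the t-statistic 1.045 with 60 degrees of freedom, the computed p-value should be close to the 0.30 stated in the text, so H0 is not rejected at the 5% level.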

Figure 1.7. [Plot of the pdf f(X) of $t_{n-1}$ against X, with the critical values −2 and 2 marked on either side of 0.]

Thus, power = 1 − P(Type II error). Or, power equals the shaded area in Fig. 1.7. Clearly, this power is a function of the alternative parameter value µ ≠ 1. We may determine such a power function of µ ≠ 1. Thus, reducing the significance level also reduces power, and vice versa. In statistics, it is customary to want to design a test so that its power function of µ ≠ 1 equals or exceeds that of any other test with equal significance level for all plausible parameter values µ ≠ 1 in HA. If this test is found, it is called a uniformly most powerful test.

We have seen the performance of a two-tailed test. Sometimes, we embark instead on a one-tailed test such as H0 : µ = 1, HA : µ > 1, in which we theoretically rule out the possibility of µ < 1, i.e. P(µ < 1) = 0. In this case, it makes sense to limit the critical region to only the right side, for when µ > 1, $t_{n-1}$ will become larger. Thus, at the one-tail 5% significance level, the critical region under H0 is $\{t_{n-1} > t_{n-1,95\%} = 1.671\}$ for n = 61, where $t_{n-1,95\%}$ is the 95th percentile of the t distribution with n − 1 d.f.
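The notions of significance level and power can be illustrated by simulation. The sketch below is a rough illustration under stated assumptions: σ = 1, the alternative value µ = 1.3, the seed, and the number of trials are hypothetical, while n = 61 and the critical values 2.000 and 1.671 are taken from the text. It estimates how often H0 : µ = 1 is rejected when H0 is true (Type I error rate) and when µ = 1.3 (power), for both the two-tailed and the one-tailed test.

```python
import math
import random
import statistics

N = 61                    # sample size used in the text
TWO_TAILED_CRIT = 2.000   # two-tailed 5% critical value for t with 60 d.f. (from the text)
ONE_TAILED_CRIT = 1.671   # one-tailed 5% critical value for t with 60 d.f. (from the text)


def t_statistic(sample, mu_null=1.0):
    """sqrt(n) * (sample mean - mu_null) / s for the given sample."""
    n = len(sample)
    return math.sqrt(n) * (statistics.fmean(sample) - mu_null) / statistics.stdev(sample)


def rejection_rates(mu_true, sigma, trials, rng):
    """Fractions of simulated samples in which the two-tailed and one-tailed tests reject H0."""
    two_tailed = one_tailed = 0
    for _ in range(trials):
        t = t_statistic([rng.gauss(mu_true, sigma) for _ in range(N)])
        two_tailed += abs(t) > TWO_TAILED_CRIT
        one_tailed += t > ONE_TAILED_CRIT
    return two_tailed / trials, one_tailed / trials


rng = random.Random(7)    # arbitrary seed, only for reproducibility
sigma = 1.0               # hypothetical population standard deviation

size_two, size_one = rejection_rates(1.0, sigma, 4000, rng)    # H0 true
power_two, power_one = rejection_rates(1.3, sigma, 4000, rng)  # H0 false, mu = 1.3

print("Type I error rates (two-tailed, one-tailed):", size_two, size_one)
print("power at mu = 1.3 (two-tailed, one-tailed):", power_two, power_one)
```

Under H0 both rejection rates should be near 5%. Under the alternative µ = 1.3 the one-tailed test rejects more often than the two-tailed test, which is why the one-tailed critical region is attractive when µ < 1 can be ruled out.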

1.6. DATA TYPES

Consider the types of data series that are commonly encountered in regression analyses. There are four generic types, viz.

(a) Time series,
(b) Cross-sectional,
(c) Pooled time series cross-sectional, and
(d) Panel/longitudinal/micropanel.

Time series are the most prevalent in empirical studies in finance. They are data indexed by time. Each data point is a realisation of a random variable at a particular point in time. The data occur as a series over time. A sample of such data is typically a collection of the realised data over time, such as the history of ABC stock’s prices on a daily basis from 1970 January 2 till 2002 December 31. Cross-sectional data are also common in finance. An example is the reported annual net profit of all companies listed on an exchange for a specific year. If we collect the cross sections for each year over a 20-year period, then we have a pooled time series cross section of companies over 20 years. Panel data are less used in finance. They are data collected by tracking specific individuals or subjects over time and across subjects.

The nature of data also differs according to the following categories.

(a) Quantitative,
(b) Ordinal, e.g. very good, good, average, and poor, and
(c) Nominal/categorical, e.g. married/not married, college graduate/nongraduate.

Quantitative data such as return rates, prices, volume of trades, etc. have the fewest limitations and therefore the greatest use in finance. These data provide not only ordinal rankings or comparisons of magnitudes, but also exact degrees of comparisons. There are some limitations and therefore special considerations to the use of the other categories of data. In the treatment of ordinal and nominal data, we may have to use specific tools such as dummy variables in regression.
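As a small illustration of the dummy variables mentioned above, the sketch below converts nominal and ordinal labels into 0/1 columns. The function name and the sample labels are hypothetical, and omitting one baseline category is a common convention assumed here (it avoids perfect collinearity with a regression intercept); the text itself does not prescribe an encoding.

```python
def dummy_columns(values, baseline):
    """Map categorical labels to 0/1 dummy columns, omitting the baseline category."""
    categories = sorted(set(values) - {baseline})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical observations, using the category labels given in the text.
marital_status = ["married", "not married", "married", "married"]
ratings = ["good", "poor", "very good", "average", "good"]

married_dummies = dummy_columns(marital_status, baseline="not married")
rating_dummies = dummy_columns(ratings, baseline="poor")

print(married_dummies)
print(rating_dummies)
```

Each resulting column can then enter a regression as an ordinary 0/1 regressor, with the omitted baseline category absorbed by the intercept.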

1.7. PROBLEM SET

(1.1) X, Y, Z are r.v.’s with a joint pdf f(X, Y, Z) that is integrable. Show using the concept of marginal pdf’s that E(X + Y + Z) = E(X) + E(Y) + E(Z) by integrating over (X + Y + Z).

(1.2) Show how one could express $\mathrm{cov}\left(\sum_{i=1}^{N} X_i, \sum_{j=1}^{N} X_j\right)$ in terms of the N by N covariance matrix $\Sigma_{N \times N}$.

(1.3) The following is the probability distribution table of a trivariate U1, U2, and U3.

U1                −1     −1     −1     −1      1      1      1      1
U2                −2     −2      2      2     −2     −2      2      2
U3                −3      3     −3      3     −3      3     −3      3
P(U1, U2, U3)   0.125  0.125  0.125  0.125  0.125  0.125  0.125  0.125

Find the bivariate probability distribution P(U1, U2). Find the marginal P(U3).

(1.4) In the probability distribution table of a trivariate U1, U2, and U3,

U1                −1     −1     −1     −1      1      1      1      1
U2                −2     −2      2      2     −2     −2      2      2
U3                −3      3     −3      3     −3      3     −3      3
P(U1, U2, U3)   0.125  0.125  0.125  0.125  0.125  0.125  0.125  0.125

after finding P(U1, U2), suppose $Y_i = bX_i + U_i$, i = 1, 2, and $X_1 = 1$, $X_2 = 2$.
(i) Find the $E(U_i)$’s and $\mathrm{cov}(U_1, U_2)$.
(ii) Find the probability distribution of estimator $\hat{b} = \sum_{i=1}^{2} X_i Y_i \left(\sum_{i=1}^{2} X_i^2\right)^{-1}$. This probability distribution of the estimator is called the sampling distribution of $\hat{b}$.
(iii) Find the mean and variance of $\hat{b}$ from its probability distribution.

(1.5) X and Y have joint pdf f(X, Y) = exp(−X − Y) for 0 < X, Y < ∞, and the pdf is 0 elsewhere. Find the marginal pdf’s of X and Y. Are X and Y stochastically dependent?

(1.6) X and Y have a joint pdf f(X, Y) = 1 in the set {0 ≤ X ≤ 2, 0 ≤ Y ≤ X/2}.
(i) Find the marginal distributions of X and Y.
(ii) Find the variances of X and Y, and the covariance of X and Y.
(iii) Find the conditional means E(X | Y), E(Y | X), and conditional variances var(X | Y), var(Y | X).

(1.7) $X_{it}$ is distributed as independent univariate normal, N(0, 1), for i = 1, 2, 3, and t = 1, 2, . . . , 60. $Y_t = 0.5X_{1t} + 0.3X_{2t} + 0.2X_{3t}$. What are the mean and the standard deviation of $Y_t$? If a computer program runs and churns out 3K random values $Z_j$ belonging to the univariate normal N(0, 1) distribution, and $W_i = 0.5Z_{3i-2} + 0.3Z_{3i-1} + 0.2Z_{3i}$ for i = 1, 2, . . . , K, what is the variance of the sampling mean $K^{-1} \sum_{i=1}^{K} W_i$?

(1.8) Suppose r.v. $X_i \sim N(0, \frac{1}{60})$ for i = 1, 2, . . . , K, and $X_i$ and $X_j$ are independent when i ≠ j. If $AX_i \sim N(0, 1)$ where A is a constant, what is A? If random vector $Y = (X_1, X_2, \ldots, X_K)$, what is the distribution of $YY^T$?

(1.9) If cov(a, b) = 0.1, cov(c, a) = 0.2, cov(d, a) = 0.3, and x = b + 2c + 3d, what is cov(a, x)?

(1.10) Suppose X, Y, and Z are jointly distributed as follows.

Probability     X     Y     Z
0.5            +1    −1     0
0.5            −1     0    +1

Find cov(X, Y), cov(X, Z), and cov(Y, Z).

FURTHER RECOMMENDED READINGS

1. Mood, A.M., F.A. Graybill and D.C. Boes, Introduction to the Theory of Statistics, Third or later editions, McGraw-Hill.
2. Hogg, R.V. and A.T. Craig, Introduction to Mathematical Statistics, Fourth or later editions, Collier Macmillan.


Chapter 1 PROBABILITY DISTRIBUTION AND STATISTICS

Key Points of Learning Random variable, Joint probability distribution, Marginal probability distribution, Conditional probability distribution, Expected value, Variance, Covariance, Correlation, Independence, Normal distribution function, Chi-square distribution, Student-t distribution, F-distribution, Data types and categories, Sampling distribution, Hypothesis, Statistical test

1.1. PROBABILITY

Joint probability, marginal probability, and conditional probability are important basic tools in financial valuation and regression analyses. These concepts and their usefulness in financial data analyses will become clearer at the end of the chapter. To motivate the idea of a joint probability distribution, let us begin by looking at a time series plot or graph of two financial economic variables over time: Xt and Yt, for example, S&P 500 Index aggregate price-to-earnings ratio Xt, and S&P 500 Index return rate Yt. The values or numbers that variables Xt and Yt will take are uncertain before they happen, i.e. before time t. At time t, both economic variables take realised values or numbers xt and yt. xt and yt are said to be realised jointly or simultaneously at the same time t. Thus, we can describe their values as a joint pair (xt, yt). If their order is preserved, it is called an ordered pair. Note that the subscript t represents the time index.

The P/E or price-to-earnings ratio of a stock or a portfolio is a financial ratio showing the price paid for the stock relative to the annual net income or profit per share earned by the firm for the year. The reciprocal of the P/E ratio is called the earnings yield. The earnings yield or E/P reflects the risky annual accounting rate of return, R, on the stock. This is easily shown by the relationship $E = $P × R. In other words, P/E = 1/R.

Electronic copy available at: http://ssrn.com/abstract=1884698

Figure 1.1. S&P 500 Index Portfolio Return Rate and Price-Earning Ratio 1872–2009 (Data from Prof Shiller, Yale University). [Time series plot of the S&P 500 Index return rate and the S&P 500 Index aggregate P/E ratio against year, 1870 to 2010.]

In Fig. 1.1, it seems that low returns corresponded to, or lagged, high P/E ratios, especially at the beginnings of the years 1929–1930, 1999–2002, and 2008–2009. Conversely, high returns followed relatively low P/E ratios at the beginnings of the years 1949–1954, 1975–1982, and 2006–2007. We shall explore the issue of the predictability of stock return in more detail in Chap. 8. The idea that random variables correspond with each other over time, or display some form of association, is called a statistical correlation, which is defined, or which has interpretative meaning, only when there exists a joint probability distribution describing the random variables.

In Fig. 1.2, we plot the U.S. national aggregate consumption versus national disposable income in US$ billion. Disposable income is defined as Personal Income less personal taxes. Personal Income is National Income less corporate taxes and corporate-retained earnings. In turn, National Income is Gross Domestic Product (GDP) less depreciation and indirect business taxes such as sales tax. GDP is essentially the total dollar output or gross income of the country. If we include repatriations from citizens working abroad, then it becomes Gross National Product (GNP).

Figure 1.2. U.S. Annual National Aggregate Consumption versus Disposable Income 1999–2009 (Data from Federal Reserve Board of U.S. in $billion). [Scatter plot; horizontal axis: disposable income; vertical axis: consumption.]

In Fig. 1.2, it appears that consumption increases in disposable income. The relationship is approximately linear. This is intuitive as, on a per capita basis, we would expect that for each person, when his or her disposable income rises, he or she would consume more. In life-cycle models of financial economics theory, some types of individual preferences could lead to consumption as an increasing function of individual wealth that consists of inheritance as well as fresh income. Sometimes, the analysis on income also breaks it down into a permanent part and a transitory part. More on these can be read in economics articles on life-cycle models and hypotheses.

In Fig. 1.3, we evaluate the annual year-to-year change in consumption and disposable income and plot them on an X–Y graph. The point P1 refers to the bivariate values (x1, y1), where x1 is the change in disposable income and y1 is the change in consumption in 2000. P2 refers to the bivariate values (x2, y2), where x2 is the change in disposable income and y2 is the change in consumption in 2001, and so on. Subscripts to x and y indicate time. A time subscript may be construed as the end of a time period and the beginning of the next time period. In this case, subscript 1 refers to time t1, end of year 2000.

Figure 1.3. U.S. Annual Year-to-Year Change in National Aggregate Consumption versus Change in Disposable Income 2000–2009 (Data from Federal Reserve Board of U.S. in $billion). [Scatter plot of the points P1 to P10; horizontal axis: change in disposable income; vertical axis: change in consumption.]

The pattern in Fig. 1.3 reveals that disposable income change dropped from t = 1 to t = 2, then rose back at t = 3. After that, there was a sharp drop at t = 4 before a wild swing back up at t = 5, and so on. The changes seem to be cyclical. A cyclical but decreasing trend can be seen in consumption. However, what is more interesting is that consumption and disposable income visibly increased and decreased together. Thus, if we construe consumption as the purchases of goods and services, then the plot displays the positive income effect on such effective demand.

Theoretically, each $X_t$ and each $Y_t$ for every time t is a random variable. A random variable is a variable that takes on different values each with a given probability. It is a variable with an associated probability distribution. For the above scatter plot, since $X_t$ and $Y_t$ occur jointly together in $(X_t, Y_t)$, the pair is a bivariate random variable, and thus has a joint bivariate probability distribution. There are two generic classes of probability distributions: discrete probability distribution, where the random variable takes on only a countable set of possible values, and continuous probability distribution, where the random variable takes on an uncountable number of possible values. In what follows, we construct a bivariate discrete probability distribution of the return rates on two stocks.

Let t denote the day number. Thus, time t = 1 is the end of day 1, t = 2 is the end of day 2, and so on. Let $P_t$ be the price in $ of stock ABC at time t. Let $X_{t+1}$ be stock ABC’s holding or discrete return rate at time t + 1: $X_{t+1} = P_{t+1}/P_t - 1$. The corresponding continuously compounded return rate at t + 1 is $\ln(P_{t+1}/P_t)$, which is approximately $X_{t+1}$ when $X_{t+1}$ is close to 0. Another stock XYZ has discrete return rate $Y_{t+1}$ at time t + 1.

Table 1.1. Discrete Bivariate Joint Probability of Two Stock Return Rates.

P(x_{t+1}, y_{t+1})                      X_{t+1}
Y_{t+1}         a1      a2      a3      a4      a5      a6      P(y_{t+1})
b1              0.005   0.03    0.03    0.015   0.005   0.01    0.095
b2              0.015   0.02    0.04    0.015   0.005   0.02    0.115
b3              0.015   0.025   0.05    0.02    0.015   0.05    0.175
b4              0.03    0.03    0.07    0.08    0.025   0.035   0.27
b5              0.02    0.06    0.04    0.05    0.045   0.02    0.235
b6              0.015   0.035   0.02    0.02    0.005   0.015   0.11
P(x_{t+1})      0.1     0.2     0.25    0.2     0.1     0.15    1

In Table 1.1, we must take care to distinguish between random variable $X_{t+1}$ and the realised value it takes in an outcome, e.g. $x_{t+1} \equiv a_3$. For example, $a_3$ could be 0.03 or 3%. In the bivariate discrete probability distribution shown in the table, $X_{t+1}$ takes one of six possible values, viz. $a_1, a_2, a_3, a_4, a_5$, and $a_6$. The probability of any one of these six events or outcomes is given by $P(X_{t+1} = x_{t+1} \equiv a_k)$, or in short $P(x_{t+1})$, and is shown in the last row of the table. The probability function P(.) for a discrete probability distribution is also called a probability mass function (pmf). We should think of a probability or chance as a function that maps or assigns a number in [0, 1] ⊂ R to each realised value of the random variable. R denotes the real line or (−∞, +∞). Likewise, the probability of any one of the six outcomes of the random variable $Y_{t+1}$ is given by $P(y_{t+1})$ and is shown in the last column of the table. Note that the probabilities of events that make up all the possibilities must sum up to 1.

The joint probability of an event or outcome with realised values $(x_{t+1}, y_{t+1})$ is given by $P(X_{t+1} = x_{t+1}, Y_{t+1} = y_{t+1})$. These probabilities are shown in the table. For example, $P(a_3, b_5) = 0.04$. This means that the probability or chance of $X_{t+1} = a_3$ and $Y_{t+1} = b_5$ simultaneously occurring is 0.04 or 4%. Clearly, the sum of all the joint probabilities within the table must equal 1.


The marginal probability of $Y_{t+1} = b_3$ in the context of the (bivariate) joint probability distribution is the probability that $Y_{t+1}$ takes the realised value $y_{t+1} \equiv b_3$ regardless of the simultaneous value of $x_{t+1}$. We write this marginal probability as $P_Y(Y_{t+1} = b_3)$. The subscript Y to probability function P(.) is to highlight that it is the marginal probability of Y. Sometimes, this is omitted. Note that this marginal probability is also a univariate probability. In this case, $P_Y(b_3) = P(a_1, b_3) + P(a_2, b_3) + P(a_3, b_3) + P(a_4, b_3) + P(a_5, b_3) + P(a_6, b_3)$. Notice that we simplify the notations, indicating that the $a_j$’s and $b_k$’s are values of $x_{t+1}$ and $y_{t+1}$, respectively, where the context is understood. In a full summation notation,
\[ P_Y(Y_{t+1} = b_3) = \sum_{j=1}^{6} P(X_{t+1} = a_j, Y_{t+1} = b_3). \]
This is obviously the sum of numbers in the row involving $b_3$ and is equal to 0.175. The marginal probability of $X_{t+1} = a_2$ is given by:
\[ P_X(X_{t+1} = a_2) = \sum_{k=1}^{6} P(X_{t+1} = a_2, Y_{t+1} = b_k) = 0.2. \]
Thus, given the joint probability distribution, the marginal probability distribution of any one of the joint random variables can be found. What is $\sum_{j=1}^{6} \sum_{k=1}^{6} P(X_{t+1} = a_j, Y_{t+1} = b_k)$? Employing the concept of marginal probability that we just learned,
\[ \sum_{j=1}^{6} \sum_{k=1}^{6} P(X_{t+1} = a_j, Y_{t+1} = b_k) = \sum_{j=1}^{6} P_X(X_{t+1} = a_j) = 1. \]
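The marginal probabilities above can be reproduced by summing the rows and columns of Table 1.1. The following is a minimal sketch; the variable and function names are illustrative assumptions, while the numbers are the joint probabilities from the table, stored with one row per $b_k$ and one column per $a_j$.

```python
# Joint probabilities from Table 1.1: JOINT[k][j] = P(X = a_{j+1}, Y = b_{k+1}).
JOINT = [
    [0.005, 0.030, 0.030, 0.015, 0.005, 0.010],   # b1
    [0.015, 0.020, 0.040, 0.015, 0.005, 0.020],   # b2
    [0.015, 0.025, 0.050, 0.020, 0.015, 0.050],   # b3
    [0.030, 0.030, 0.070, 0.080, 0.025, 0.035],   # b4
    [0.020, 0.060, 0.040, 0.050, 0.045, 0.020],   # b5
    [0.015, 0.035, 0.020, 0.020, 0.005, 0.015],   # b6
]


def marginal_y(joint):
    """P_Y(b_k): sum each row over all values a_j of X."""
    return [sum(row) for row in joint]


def marginal_x(joint):
    """P_X(a_j): sum each column over all values b_k of Y."""
    return [sum(column) for column in zip(*joint)]


p_y = marginal_y(JOINT)
p_x = marginal_x(JOINT)
total = sum(p_x)

print("P_Y(b3) =", round(p_y[2], 3))
print("P_X(a2) =", round(p_x[1], 3))
print("sum of all joint probabilities =", round(total, 3))
```

The row sum for $b_3$ and the column sum for $a_2$ should match the values 0.175 and 0.2 stated in the text, and the grand total should equal 1 up to floating-point rounding.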

In the bivariate probability case, we know that future risk or uncertainty is characterised by one and only one of the 36 pairs of values $(a_j, b_k)$ that will occur. Suppose the event has occurred, and we know only that it is event $\{X_{t+1} = a_2\}$ that occurred, but without knowing which of the events $b_1, b_2, b_3, b_4, b_5$, or $b_6$ had occurred in simultaneity. An interesting question is to ask what is the probability that $\{Y_{t+1} = b_3\}$ had occurred, given that we know $\{X_{t+1} = a_2\}$ occurred. This is called a conditional probability and is denoted by $P(Y_{t+1} = b_3 \mid X_{t+1} = a_2)$. The symbol “|” represents “given” or “conditional on”. From Table 1.1, we focus on the column where it is given that $\{x_{t+1} \equiv a_2\}$ occurred. This is shown in Table 1.2. The highlighted 0.025 is the joint probability of $(a_2, b_3)$.

Table 1.2. Joint Probability of Two Stock Return Rates when $X_{t+1} = a_2$.

b1      0.03
b2      0.02
b3      0.025   (highlighted)
b4      0.03
b5      0.06
b6      0.035

Intuitively, the higher (lower) this number, the higher (lower) is the conditional probability that $b_3$ in fact had occurred simultaneously. Given that $a_2$ had occurred, we are finding the conditional probability given $\{x_{t+1} \equiv a_2\}$, which is in itself a proper probability distribution and thus must have probabilities that add to 1. Then, the conditional probability must be the size of 0.025 relative to the sum of the joint probabilities in the above column. We recall Bayes’ rule on event sets, that:
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \]
where A and B are events or event sets in a universe. We can think of the outcome $\{X_{t+1} = a_2\}$ as event B and the outcome $\{Y_{t+1} = b_3\}$ as event A. Events can be more general, as occurrences $\{X_{t+1} = a_j\}$, $\{Y_{t+1} = b_k\}$, $\{X_{t+1} = a_j, Y_{t+1} = b_k\}$ are all events or event sets. More exactly,
\[ P(b_3 \mid a_2) = \frac{P(a_2, b_3)}{P_X(a_2)} = \frac{0.025}{0.2} = 0.125. \]
In general,
\[ P(Y_{t+1} = b_k \mid X_{t+1} = a_j) = \frac{P(X_{t+1} = a_j, Y_{t+1} = b_k)}{P_X(X_{t+1} = a_j)} = \frac{P(X_{t+1} = a_j, Y_{t+1} = b_k)}{\sum_{m=1}^{6} P(X_{t+1} = a_j, Y_{t+1} = b_m)}. \]

When we move from discrete probability distribution, where event sets consist of discrete elements, to continuous probability distribution, where event sets are continuous, such as intervals on a real line, we have to deal with continuous functions. The continuous joint probability density function (pdf) of bivariate $(X_{t+1}, Y_{t+1})$ is represented by a continuous function f(x, y) where $X_{t+1} = x$ and $Y_{t+1} = y$, and x, y are usually numbers on the real line R. Note that we simplify the notations of the realised values by dropping their time subscripts here. For a continuous probability distribution, the events are described not as point values, e.g. x = 3, y = 4, but rather as intervals, e.g. event A = {(x, y): −2 < y < 3} and event B = {(x, y): 0 < x < 9.5}. Then,
\[ P(A, B) = P(0 < x < 9.5, -2 < y < 3) = \int_{0}^{9.5} \int_{-2}^{3} f(x, y)\, dy\, dx. \]

The “support” for a random variable such as $X_{t+1}$ is the range of x. For joint normal densities, the ranges are usually (−∞, ∞). Thus, $Y_{t+1}$ also has the same support. It is usually harmless to use (−∞, ∞) as supports even if the range is finite [a, b], since the probabilities of null events (−∞, a) and (b, ∞) are zeros. However, when more advanced mathematics is involved, it is typically better to be precise. In addition, notice that probability is essentially an integral of a function, whether continuous or discrete, and is the area under the pdf curve. The marginal probability density functions of $X_{t+1}$ and $Y_{t+1}$ are given by:
\[ f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy \]
and
\[ f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx. \]
Notice that while f(x, y) is a function containing both x and y, $f_Y(y)$ is a function containing only y since x is integrated out. Likewise, $f_X(x)$ is a function that contains only x. The conditional probability density functions are: $f(x \mid y) = f(x, y)/f_Y(y)$ and $f(y \mid x) = f(x, y)/f_X(x)$. These conditional pdf’s contain both x and y in their arguments.
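Returning to the discrete case, the conditional probabilities given $\{X_{t+1} = a_2\}$ follow by dividing each joint probability in Table 1.2 by the column total, in the same way that a conditional pdf divides the joint pdf by a marginal pdf. The sketch below is illustrative; the names are assumptions, and the six numbers are the joint probabilities listed in Table 1.2.

```python
# Joint probabilities P(X = a2, Y = b_k), k = 1, ..., 6, as listed in Table 1.2.
COLUMN_A2 = [0.03, 0.02, 0.025, 0.03, 0.06, 0.035]


def conditional_distribution(joint_column):
    """Divide each joint probability by the marginal P_X(a2), the column total."""
    marginal = sum(joint_column)
    return [p / marginal for p in joint_column]


cond = conditional_distribution(COLUMN_A2)

for k, p in enumerate(cond, start=1):
    print("P(b%d | a2) = %.3f" % (k, p))
print("sum of conditional probabilities =", round(sum(cond), 3))
```

The third entry reproduces $P(b_3 \mid a_2) = 0.125$, and the conditional probabilities sum to 1, as required of a proper probability distribution.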

1.2. EXPECTATIONS

The expected value of random variable $X_{t+1}$ is given by
\[ E(X_{t+1}) = \sum_{j=1}^{6} a_j P_X(a_j) = \mu_X \]
for the discrete distribution in Table 1.1, and for continuous pdf,
\[ E(X_{t+1}) = \int_{-\infty}^{\infty} x f_X(x)\, dx = \mu_X. \]
The conditional expected value or the conditional expectation of $X_{t+1} \mid b_4$ is given by:
\[ E(X_{t+1} \mid b_4) = \sum_{j=1}^{6} a_j P(a_j \mid b_4) \]
for the discrete distribution in Table 1.1, and for continuous pdf,
\[ E(X_{t+1} \mid y) = \int_{-\infty}^{\infty} x f(x \mid y)\, dx. \]
Notice that for the continuous pdf, the conditional expected value given y is a function containing only y. This means that one can further evaluate more specific conditional expectations based on given sets of y values, e.g. {y: −2 < y < 3}. Then, $E(X_{t+1} \mid -2 < y < 3)$ is found via:
\[
\begin{aligned}
\int_{-\infty}^{\infty} x f(x \mid -2 < y < 3)\, dx
&= \int_{-\infty}^{\infty} x\, \frac{f(x, -2 < y < 3)}{\int_{-2}^{3} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy}\, dx \\
&= \frac{\int_{-\infty}^{\infty} x \int_{-2}^{3} f(x, y)\, dy\, dx}{\int_{-2}^{3} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy} \\
&= \frac{\int_{-2}^{3} \int_{-\infty}^{\infty} x f(x, y)\, dx\, dy}{\int_{-2}^{3} f_Y(y)\, dy}.
\end{aligned}
\]
The interchange of integrals in the last step in the above equation uses the Fubini Theorem, assuming some mild regularity conditions are satisfied by the functions. The variance of a continuous random variable $X_{t+1}$ is given by
\[ \mathrm{var}(X_{t+1}) = \sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\, dx. \]

Variance measures the degree of movement or the variability of the random variable itself. The standard deviation (s.d.) of a random variable Xt+1 is the square root of the variance. Standard deviation is sometimes referred to as volatility and sometimes as “risk” in the finance literature.


The covariance between two continuous random variables $X_{t+1}$ and $Y_{t+1}$ is given by:
\[ \mathrm{cov}(X_{t+1}, Y_{t+1}) = \sigma_{XY} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f(x, y)\, dx\, dy. \]
Covariance measures the degree of co-movements between two random variables. If the two random variables tend to move together, i.e. when one increases (decreases), the probability of the other increasing (decreasing) is high, then the covariance will be a positive number. If they vary inversely, then the covariance will be a negative number. If there is no co-moving relationship and each random variable moves independently, then their covariance is zero. Notice that the covariance is also an expectation or integral. The co-movement of two random variables is typically better characterised by their correlation coefficient, which is the covariance normalised or divided by their s.d.’s:
\[ \mathrm{corr}(X_{t+1}, Y_{t+1}) = \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}. \]
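For a discrete distribution, the integrals above become probability-weighted sums. The sketch below computes means, variances, the covariance, and the correlation coefficient for a small joint pmf. The four outcomes and their probabilities are hypothetical return rates invented for illustration; they are not taken from Table 1.1, whose values $a_j$ and $b_k$ are left symbolic in the text.

```python
import math

# Hypothetical joint pmf of two return rates: {(x, y): P(X = x, Y = y)}.
JOINT_PMF = {
    (-0.02, -0.01): 0.3,
    (-0.02, 0.03): 0.2,
    (0.04, -0.01): 0.1,
    (0.04, 0.03): 0.4,
}


def expectation(pmf, g):
    """E[g(X, Y)] as a probability-weighted sum over all outcomes."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


mu_x = expectation(JOINT_PMF, lambda x, y: x)
mu_y = expectation(JOINT_PMF, lambda x, y: y)
var_x = expectation(JOINT_PMF, lambda x, y: (x - mu_x) ** 2)
var_y = expectation(JOINT_PMF, lambda x, y: (y - mu_y) ** 2)
cov_xy = expectation(JOINT_PMF, lambda x, y: (x - mu_x) * (y - mu_y))
corr_xy = cov_xy / math.sqrt(var_x * var_y)

print("means:", mu_x, mu_y)
print("variances:", var_x, var_y)
print("covariance:", cov_xy)
print("correlation:", corr_xy)
```

Rescaling both return rates, say from decimals to percentages, changes the covariance by a factor of 10,000 but leaves the correlation coefficient unchanged.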

One other advantage of using the correlation coefficient rather than the covariance is that the correlation coefficient is not denominated in the value units of X or Y but is a ratio. It is important to understand that correlation measures association but not causality. In Fig. 1.3, changes in consumption and income are clearly strongly positively correlated. If one were to conclude that increasing consumption will increase income, the resulting action would be disastrous. Or, even if one simply concludes (based on some understanding of macroeconomics theory or by intuition) that increased income causes increased consumption, it may still be premature, as there are so many other possibilities and qualifications. For example, some other variables such as general education level could lead to increases in both income and consumption. Or, suppose we think of $Y_{t+1}$ as GDP and $X_{t+1}$ as population. Both increase with time due to various economic and geo-political reasons. But it would be disastrous for policy to conclude that increasing population leads to or causes an increase in GDP. Such a conclusion has to assume fairly constant employment and output per capita.


For general random variables X and Y (dropping time subscripts), we can write their means, variances, and covariance as follows.

\[
\begin{aligned}
E(X) &= \mu_X, \qquad E(Y) = \mu_Y, \\
\mathrm{var}(X) &= E(X - \mu_X)^2 = E(X^2) - \mu_X^2, \\
\mathrm{var}(Y) &= E(Y - \mu_Y)^2 = E(Y^2) - \mu_Y^2, \\
\mathrm{cov}(X, Y) &= E(X - \mu_X)(Y - \mu_Y) = E(XY) - \mu_X \mu_Y.
\end{aligned}
\]

Covariances are linear operators. A function is $f : A \to B$ or $\{f : f(a) = b;\ a \in A,\ b \in B\}$ in which A is the domain set and B the range set, and each a is mapped onto one and only one element b in B. We can think of an operator as a special case of a function where the domain and range consist of normed spaces such as vector spaces. These technicalities are not important except in more advanced courses. Now consider N random variables $X_i$, where $i = 1, 2, \ldots, N$. A very useful property of a covariance is shown below.

\[
\begin{aligned}
\mathrm{cov}\left(\sum_{i=1}^{N} X_i,\ \sum_{j=1}^{N} X_j\right)
&= E\left[\sum_{i=1}^{N} [X_i - E(X_i)] \sum_{j=1}^{N} [X_j - E(X_j)]\right] \\
&= E\left[\sum_{i=1}^{N}\sum_{j=1}^{N} [X_i - E(X_i)][X_j - E(X_j)]\right] \\
&= \sum_{i=1}^{N}\sum_{j=1}^{N} E\{[X_i - E(X_i)][X_j - E(X_j)]\} \\
&= \sum_{i=1}^{N}\sum_{j=1}^{N} \mathrm{cov}(X_i, X_j).
\end{aligned}
\]

A special case of the above is var(X + Y ) = cov(X + Y, X + Y ) = cov(X, X) + cov(X, Y ) + cov(Y, X) + cov(Y, Y ) = var(X) + var(Y ) + 2 cov(X, Y ).
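The double-sum property and its special case for var(X + Y) also hold exactly for sample covariances, which makes them easy to verify numerically. The sketch below uses only the Python standard library; the three simulated series and the helper name `cov` are hypothetical choices made for illustration.

```python
import random
import statistics


def cov(xs, ys):
    """Sample covariance with the (n - 1) divisor; cov(xs, xs) is the variance."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


random.seed(1)
n_obs = 500
# Three hypothetical, mutually correlated series.
x1 = [random.gauss(0.0, 1.0) for _ in range(n_obs)]
x2 = [0.5 * a + random.gauss(0.0, 1.0) for a in x1]
x3 = [-0.3 * a + random.gauss(0.0, 2.0) for a in x1]
series = [x1, x2, x3]

# Left side: covariance of the sum with itself, i.e. the variance of the sum.
total = [sum(vals) for vals in zip(*series)]
lhs = cov(total, total)

# Right side: double sum of all pairwise covariances.
rhs = sum(cov(a, b) for a in series for b in series)

# Special case: var(X + Y) = var(X) + var(Y) + 2 cov(X, Y).
pair_sum = [a + b for a, b in zip(x1, x2)]
var_pair = cov(pair_sum, pair_sum)
var_formula = cov(x1, x1) + cov(x2, x2) + 2.0 * cov(x1, x2)

print(lhs, rhs)
print(var_pair, var_formula)
```

The two sides agree up to floating-point rounding, whatever data are used.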


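A convenient property of a correlation coefficient ρ is that it lies between −1 and +1. This is shown as follows. For any real θ,

\[
\mathrm{var}(X - \theta Y) = \sigma_X^2 + \theta^2 \sigma_Y^2 - 2\theta\rho\,\sigma_X\sigma_Y \ge 0.
\]

Put $\theta = \rho\,\sigma_X/\sigma_Y$. Then,

\[
\sigma_X^2 + \rho^2\sigma_X^2 - 2\rho^2\sigma_X^2 \ge 0.
\]

Thus, for any random variables X and Y, $\sigma_X^2(1 - \rho^2) \ge 0$, and hence (for $\sigma_X^2 > 0$) $(1 - \rho^2) \ge 0$, or $\rho^2 \le 1$. Therefore, $-1 \le \rho \le 1$.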

1.3. DISTRIBUTIONS

Continuous probability distributions are commonly employed in regression analyses. The commonest probability distribution is the normal (Gaussian) distribution. The pdf of a normally distributed random variable X is given by

\[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \quad \text{for } -\infty < x < \infty,
\]

where the mean of X is µ and the s.d. of X is σ. µ and σ are given constants.

\[
E(X) = \int_{-\infty}^{+\infty} x f(x)\, dx = \mu,
\]

\[
\mathrm{Var}(X) = E(X - \mu)^2 = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\, dx = \sigma^2.
\]

The cumulative distribution function (cdf) of X is

\[
F(x) = \int_{-\infty}^{x} f(u)\, du.
\]

We can write the distribution of X as $X \overset{d}{\sim} N(\mu, \sigma^2)$, in which the arguments indicate the mean and variance of the normal random variable. Suppose we define a corresponding random variable

\[
Z = \frac{X - \mu}{\sigma} \quad \text{or} \quad X = \mu + \sigma Z,
\]

where the symbol "=" means "to define". The second "equality" is interpreted as not just equivalence in distribution, but that whenever Z takes value z, then X takes value x = µ + σz. Then,

\[
E(Z) = 0 \quad \text{and} \quad \mathrm{Var}(Z) = 1.
\]

Since a constant multiple of a normal random variable is normally distributed and a sum of normal random variables is also a normal random variable, $Z \overset{d}{\sim} N(0, 1)$. Z is called the standard normal variable, and its pdf is written here as $\phi(z)$ to distinguish it from the pdf f(x) of X. For the normal distribution $N(\mu, \sigma^2)$,

\[
F(x) = \int_{-\infty}^{\frac{x-\mu}{\sigma}} \phi(z)\, dz,
\]

where $\phi(z)$ is the standard normal pdf and $z = \frac{x-\mu}{\sigma}$. The standard normal cdf is often written as $\Phi(z)$. For the standard normal Z, $P(a \le Z \le b) = \Phi(b) - \Phi(a)$.

The normal distribution is a familiar workhorse in statistical estimation and testing. The normal distribution pdf curve is "bell-shaped". Areas under the curve are associated with probabilities. Figure 1.4 shows a standard normal pdf N(0,1) and the associated probability as area under the curve. The z values of random variable (r.v.) Z and the corresponding areas can be seen in the standard normal distribution Table 1.3. For example, the probability P(−∞ < Z < 1.5) = 0.933. This same probability can be written as P(−∞ ≤ Z < 1.5) = 0.933, P(−∞ < Z ≤ 1.5) = 0.933, or P(−∞ ≤ Z ≤ 1.5) = 0.933. This is because for a continuous pdf, P(Z = 1.5) = 0. From the symmetry of the normal pdf, P(−a < Z < ∞) = P(−∞ < Z < a), we can also compute the probabilities given after Table 1.3.

Figure 1.4. Standard Normal Probability Density Function of Z. (The plot marks 0 and a = −1.645 on the Z axis; the shaded left tail below a has area 5%.)


Table 1.3.

Z        Area under curve from −∞ to z      Z        Area under curve from −∞ to z
0.000    0.500                              1.600    0.945
0.100    0.539                              1.645    0.950
0.200    0.579                              1.700    0.955
0.300    0.618                              1.800    0.964
0.400    0.655                              1.960    0.975
0.500    0.691                              2.000    0.977
0.600    0.726                              2.100    0.982
0.700    0.758                              2.200    0.986
0.800    0.788                              2.300    0.989
0.900    0.816                              2.330    0.990
1.000    0.841                              2.400    0.992
1.100    0.864                              2.500    0.994
1.282    0.900                              2.576    0.995
1.300    0.903                              2.600    0.996
1.400    0.919                              2.700    0.997
1.500    0.933                              2.800    0.998
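Areas such as those in Table 1.3 can also be computed directly rather than read from a table. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the values of `mu`, `sigma`, and `x` in the last part are hypothetical and only illustrate the standardisation $z = (x - \mu)/\sigma$.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_between(a, b, dist=std_normal):
    """P(a < Z < b) = Phi(b) - Phi(a) for a continuous distribution."""
    return dist.cdf(b) - dist.cdf(a)


p_below = std_normal.cdf(1.5)                    # P(Z <= 1.5)
p_above = 1.0 - p_below                          # P(Z > 1.5)
p_mid = prob_between(-1.0, 1.5)                  # P(-1.0 < Z < 1.5)
two_tail = 2.0 * (1.0 - std_normal.cdf(1.96))    # P(|Z| > 1.96)
z_upper_5pct = std_normal.inv_cdf(0.95)          # z with 5% area to its right

# A general N(mu, sigma^2) probability via standardisation.
# mu, sigma and x are hypothetical values chosen for illustration.
mu, sigma, x = 0.08, 0.20, 0.30
direct = NormalDist(mu, sigma).cdf(x)
standardised = std_normal.cdf((x - mu) / sigma)

print(round(p_below, 3), round(p_above, 3), round(p_mid, 4))
print(round(two_tail, 4), round(z_upper_5pct, 3))
print(direct, standardised)
```

Computed this way, P(−1.0 < Z < 1.5) is about 0.7745. Hand calculations that start from three-decimal table entries can differ from this in the last digit.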

P(Z > 1.5) = 1 − P(−∞ < Z ≤ 1.5) = 1 − 0.933 = 0.067.
P(−∞ < Z ≤ −1.0) = P(Z > 1.0) = 1 − 0.841 = 0.159.
P(−1.0 < Z < 1.5) = P(−∞ < Z < 1.5) − P(−∞ < Z ≤ −1.0) = 0.933 − 0.159 = 0.774.
P(Z ≤ −1.0 or Z ≥ 1.5) = 1 − P(−1.0 < Z < 1.5) = 1 − 0.774 = 0.226.

Several values of Z under N(0,1) are commonly encountered, viz. 1.282, 1.645, 1.960, 2.330, and 2.576.

P(Z > 1.282) = 0.10 or 10%.
P(Z < −1.645 or Z > 1.645) = 0.10 or 10%.
P(Z > 1.960) = 0.025 or 2.5%.
P(Z < −1.960 or Z > 1.960) = 0.05 or 5%.
P(Z > 2.330) = 0.01 or 1%.
P(Z < −2.576 or Z > 2.576) = 0.01 or 1%.

The case for P(Z < −1.645) = 5% is shown in Fig. 1.4.

The bivariate normal distribution of random variables X, Y is given by

\[
f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}}\, e^{-\frac{1}{2}q}, \tag{1.1}
\]

where

\[
q = \frac{1}{1 - \rho^2}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]
\]

and $\dfrac{\mathrm{cov}(X, Y)}{\sigma_X\sigma_Y} = \rho$. The multivariate normal distribution pdf (p-variate normal pdf) is given by

\[
f(x_1, x_2, \ldots, x_p) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right],
\]

where x is the vector of random variables $X_1$ to $X_p$, µ is the p × 1 vector of means of x, and Σ is the p × p covariance matrix of x. If p = 2 is substituted into the above, the bivariate pdf shown in Eq. (1.1) can be obtained.

The kth moment of random variable X is $\int x^k f(x)\, dx$, where f(x) is the pdf of X. If µ = E(X) is the mean of X, the kth central moment of X is $\int (x - \mu)^k f(x)\, dx$. Notice that the variance is the second central moment of X. The third central moment ÷ variance$^{3/2}$ is known as skewness. The fourth central moment ÷ variance$^{2}$ is known as kurtosis. The normal distribution r.v. $X \sim N(\mu, \sigma^2)$ has mean µ, variance σ², skewness 0, and kurtosis equal to 3. Hence, the standard normal variate $Z \sim N(0, 1)$ has mean 0, variance 1, skewness 0, and kurtosis 3. Many financial variables, e.g. daily stock returns, currency rates of change, etc., display skewness as well as large kurtosis compared with the benchmark normal distribution with symmetrical pdf, skewness = 0, and kurtosis = 3.

Departure from normality is illustrated by a pdf in Fig. 1.5. The shaded area in Fig. 1.5 shows a normal pdf. The unshaded curve shows the pdf of a random variable with negative skewness, a kurtosis larger than that of the normal random variable, and mean µ < 0.

The concept of stochastic independence between random variables is important. Two random variables X and Y are said to be stochastically independent if and only if their joint pdf can be expressed as follows:

\[
f(x, y) = f_X(x)\, f_Y(y).
\]

One implication of the above is that for any function h(.) of X and any function g(.) of Y, their expectation can be found as:

\[
E(h(X)g(Y)) = E(h(X))\,E(g(Y)).
\]
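The skewness and kurtosis definitions above can be applied to data by replacing the integrals with sample averages. The sketch below uses only the Python standard library. The "shocked" series is a purely hypothetical construction (normal draws with occasional large negative jumps) meant to mimic the left-skewed, fat-tailed shape described for Fig. 1.5; it is not data from the text.

```python
import random
import statistics


def central_moment(xs, k):
    """Sample analogue of the kth central moment: average of (x - mean)^k."""
    m = statistics.fmean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    """Third central moment divided by variance^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """Fourth central moment divided by variance^2 (equals 3 for a normal)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


random.seed(2)
normal_draws = [random.gauss(0.0, 1.0) for _ in range(50000)]

# Hypothetical non-normal variable: with probability 5%, a draw is hit by
# an additional large negative shock.
shocked = [
    z - (3.0 * abs(random.gauss(0.0, 1.0)) if random.random() < 0.05 else 0.0)
    for z in normal_draws
]

print("normal :", skewness(normal_draws), kurtosis(normal_draws))
print("shocked:", skewness(shocked), kurtosis(shocked))
```

For the normal draws the two statistics should be close to 0 and 3, up to sampling error. For the shocked series the skewness is negative and the kurtosis exceeds 3.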


Figure 1.5. Example of a Pdf with Negative Skewness and Large Kurtosis. (The plot of f(x) against x is annotated "Negative or left skewness (longer left tail)" and "Fat tails with kurtosis > 3", with µ and 0 marked on the x axis.)

A special case is the covariance operator. If X and Y are (stochastically) independent, then it implies that their covariance is zero:

\[
\mathrm{cov}(X, Y) = E(X - \mu_X)(Y - \mu_Y) = E(X - \mu_X)\,E(Y - \mu_Y) = 0.
\]

The converse is not always true. It is true only for special cases such as when X and Y are jointly normally distributed. When X and Y are jointly normally distributed, then if they have zero covariance, they are stochastically independent. For the bivariate normal pdf, the conditional pdf is

\[
g(x \mid y) = \frac{f(x, y)}{f_Y(y)}.
\]

Or,

\[
\begin{aligned}
g(x \mid y)
&= \frac{\dfrac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}q}}{\dfrac{1}{\sigma_Y\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2}} \\
&= \frac{1}{\sigma_X\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{x-\mu_X}{\sigma_X} - \rho\,\frac{y-\mu_Y}{\sigma_Y}\right]^2\right\} \\
&= \frac{1}{\sqrt{2\pi\sigma_X^2(1-\rho^2)}} \exp\left\{-\frac{1}{2(1-\rho^2)\sigma_X^2}\left[(x-\mu_X) - \rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right]^2\right\} \\
&= \frac{1}{\sqrt{2\pi\sigma_{X|Y}^2}} \exp\left\{-\frac{1}{2\sigma_{X|Y}^2}\,(x-\mu_{X|Y})^2\right\},
\end{aligned}
\]

where $\sigma_{X|Y}^2 = (1 - \rho^2)\sigma_X^2$ is the variance of X conditional on Y = y, and $\mu_{X|Y} = \mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y - \mu_Y)$ is the mean of X conditional on Y = y.

There are some common continuous probability distributions that are related to the normal distribution. If random variable $X \overset{d}{\sim} N(\mu, \sigma^2)$, then random variable

\[
V = \left(\frac{X - \mu}{\sigma}\right)^2 \sim \chi^2_1
\]

is a chi-square distribution with 1 degree of freedom. If $X_1, X_2, X_3, \ldots, X_n$ are n random variables each independently drawn from the same population distribution $N(\mu, \sigma^2)$, or think of $\{X_i\}_{i=1 \text{ to } n}$ as a random sample of size n, then

\[
\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n
\]

is a chi-square distribution with n degrees of freedom.

If $X \overset{d}{\sim} N(0, 1)$ and $V \overset{d}{\sim} \chi^2_r$, and both X and V are stochastically independent, then

\[
\frac{X}{\sqrt{V r^{-1}}}
\]

is a Student-t distribution with r degrees of freedom. If $U \overset{d}{\sim} \chi^2_{r_1}$ and $V \overset{d}{\sim} \chi^2_{r_2}$, and both U and V are stochastically independent, then

\[
\frac{U r_1^{-1}}{V r_2^{-1}} \overset{d}{\sim} F_{r_1, r_2}
\]

is an F-distribution with degrees of freedom $r_1$ and $r_2$. If random variable $X \overset{d}{\sim} N(\mu, \sigma^2)$ and Y = exp(X) or X = ln(Y), then Y is a random variable with a lognormal distribution.
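The constructions of the chi-square and Student-t random variables from standard normals can be imitated by simulation. The sketch below uses only the Python standard library. The degrees of freedom `r = 10` and the number of draws are arbitrary illustrative choices, and the benchmark var(t_r) = r/(r − 2) for r > 2 is a standard result that is not derived in the text.

```python
import math
import random
import statistics


def chi_square_draw(r):
    """Sum of r squared independent N(0, 1) draws: one chi-square(r) draw."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(r))


def student_t_draw(r):
    """X / sqrt(V / r), with X ~ N(0, 1) and V ~ chi-square(r) independent."""
    x = random.gauss(0.0, 1.0)
    v = chi_square_draw(r)
    return x / math.sqrt(v / r)


random.seed(3)
r = 10             # degrees of freedom (illustrative choice)
n_draws = 20000    # number of simulated draws (illustrative choice)

chi_draws = [chi_square_draw(r) for _ in range(n_draws)]
t_draws = [student_t_draw(r) for _ in range(n_draws)]

chi_mean = statistics.fmean(chi_draws)   # should be near r
t_mean = statistics.fmean(t_draws)       # should be near 0
t_var = statistics.pvariance(t_draws)    # should be near r / (r - 2)

print("chi-square mean:", chi_mean, "vs", r)
print("t mean:", t_mean, "t variance:", t_var, "vs", r / (r - 2))
```

An F-distributed draw could be built in the same way, as the ratio of two independent chi-square draws each divided by its degrees of freedom.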

1.4. STATISTICAL ESTIMATION

Suppose a random variable X with a fixed normal distribution $N(\mu, \sigma^2)$ is given. Suppose there is a random draw of a number or an outcome from this distribution. This is the same as stating that random variable X takes a realised value x. Let this value be $x_1$; it may be, say, 3.89703. Suppose we repeatedly make random draws and thus form a sample of n observations: $x_1, x_2, x_3, \ldots, x_{n-1}, x_n$. This is called a random sample with a sample size of n. Each $x_i$ comes from the same distribution $N(\mu, \sigma^2)$, and any two $x_i$ and $x_j$ ($i \ne j$) are realisations from independent sampling. We next compute a statistic, which is a function of the realised values $\{x_k\}$, $k = 1, 2, \ldots, n$. Consider a statistic, the sample mean,

\[
\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k.
\]

Another common sample statistic is the unbiased sample variance

\[
s^2 = \frac{1}{n-1}\sum_{k=1}^{n} (x_k - \bar{x})^2.
\]

Each time we select a random sample of size n, we obtain a realisation $\bar{x}$. Thus, $\bar{x}$ is itself a realisation of a random variable, and this r.v. can be denoted by

\[
\bar{X}_n = \frac{1}{n}\sum_{k=1}^{n} X_k,
\]

where $X_k$ above is clearly the random variable from $N(\mu, \sigma^2)$ itself. $\bar{X}_n$ is a random variable and its probability distribution is called the sampling distribution of the mean or, perhaps more clearly, the distribution of the sample mean. What is the exact probability distribution of $\bar{X}_n$?

\[
E(\bar{X}_n) = \frac{1}{n} E\left(\sum_{k=1}^{n} X_k\right) = \frac{1}{n}\sum_{k=1}^{n} E(X_k) = \frac{1}{n}\sum_{k=1}^{n} \mu = \mu.
\]

\[
\mathrm{var}(\bar{X}_n) = \frac{1}{n^2}\,\mathrm{var}\left(\sum_{k=1}^{n} X_k\right) = \frac{1}{n^2}\sum_{k=1}^{n} \mathrm{var}(X_k) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.
\]

The second equality in the variance uses the independence of the $X_k$'s, so that all cross covariances are zero. Since $\bar{X}_n$ is a normal random variable, therefore,

\[
\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right).
\]

The standardised normal random variable then becomes

\[
\frac{\bar{X}_n - \mu}{\sqrt{\sigma^2/n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \sim N(0, 1).
\]

On the other hand, $E(s^2) = \sigma^2$. But $s^2$ is itself a random variable with a sampling distribution:

\[
(n - 1)\frac{s^2}{\sigma^2} \overset{d}{\sim} \chi^2_{n-1}.
\]

It can be seen that $E(\chi^2_{n-1}) = n - 1$, the number of degrees of freedom of the chi-square random variable. Therefore (using also the fact that, for a normal random sample, $\bar{X}_n$ and $s^2$ are independent),

\[
\frac{\sqrt{n}(\bar{X}_n - \mu)/\sigma}{\sqrt{s^2/\sigma^2}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{s}
\]

is distributed as Student-t with (n − 1) degrees of freedom and zero mean. Denote the random variable with t-distribution, n − 1 degrees of freedom, as $t_{n-1}$. Then,

\[
\frac{\sqrt{n}(\bar{X}_n - \mu)}{s} \overset{d}{\sim} t_{n-1}.
\]

Suppose we find (−a, +a), a > 0, such that $\mathrm{Prob}(-a \le t_{n-1} \le +a) = 95\%$. Since $t_{n-1}$ is symmetrically distributed, then $\mathrm{Prob}(-a \le t_{n-1}) = 97.5\%$ and $\mathrm{Prob}(t_{n-1} \le +a) = 97.5\%$. Thus,

\[
\mathrm{Prob}\left(-a \le \frac{\sqrt{n}(\bar{X}_n - \mu)}{s} \le a\right) = 0.95.
\]

Also,

\[
\mathrm{Prob}\left(\bar{X}_n - a\frac{s}{\sqrt{n}} \le \mu \le \bar{X}_n + a\frac{s}{\sqrt{n}}\right) = 0.95.
\]

Suppose $x_1, x_2, x_3, \ldots, x_{n-1}, x_n$ are randomly sampled from $X \overset{d}{\sim} N(\mu, \sigma^2)$. Sample size n = 30. The t-statistic value such that $\mathrm{Prob}(t_{29} \le a) = 97.5\%$ is a = 2.045. Then,

\[
\mathrm{Prob}\left(\bar{X}_n - 2.045\frac{s}{\sqrt{30}} \le \mu \le \bar{X}_n + 2.045\frac{s}{\sqrt{30}}\right) = 0.95.
\]

Hence, the 95% confidence interval estimate of µ is given by

\[
\left(\bar{X}_n - 2.045\frac{s}{\sqrt{30}},\ \bar{X}_n + 2.045\frac{s}{\sqrt{30}}\right)
\]

when the estimated s and the realised sample mean are entered.
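The 95% confidence interval above can be computed for a sample of size 30, and its coverage property can be checked by repeated sampling. This is a minimal sketch using only the Python standard library. The population values `mu` and `sigma`, the number of trials, and the function name `confidence_interval` are hypothetical; the critical value 2.045 is the one quoted in the text for 29 degrees of freedom.

```python
import math
import random
import statistics

T_CRIT_29 = 2.045  # from the text: Prob(t_29 <= 2.045) = 97.5%


def confidence_interval(sample, t_crit):
    """Interval (xbar - a*s/sqrt(n), xbar + a*s/sqrt(n)) for the mean."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # square root of unbiased s^2
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


random.seed(4)
mu, sigma, n = 0.05, 0.20, 30   # hypothetical population mean and s.d.

# One sample, one interval estimate.
one_sample = [random.gauss(mu, sigma) for _ in range(n)]
lower, upper = confidence_interval(one_sample, T_CRIT_29)
print("95% confidence interval:", lower, upper)

# Repeated sampling: the fraction of intervals that contain the true mu
# should be close to 0.95.
trials = 5000
covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    lo, hi = confidence_interval(sample, T_CRIT_29)
    if lo <= mu <= hi:
        covered += 1
coverage = covered / trials
print("empirical coverage:", coverage)
```

The 95% statement refers to the random interval over repeated samples. Any single realised interval either contains µ or does not.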

1.5. STATISTICAL TESTING

In many situations, there is a priori (or ex-ante) information about the value of the mean µ, and it may be desirable to use observed data to test if the information is correct. µ is called a parameter of the population or fixed distribution $N(\mu, \sigma^2)$. A statistical hypothesis is an assertion about the true value of the population parameter, in this case µ. A simple hypothesis specifies a single value for the parameter, while a composite hypothesis will specify more than one value. We will work with the simple null hypothesis $H_0$ (sometimes this is called the maintained hypothesis), which is what is postulated to be true. The alternative hypothesis $H_A$ is what will be the case if the null hypothesis is rejected. Together the values specified under $H_0$ and $H_A$ should form the total universe of possibilities of the parameter. For example,

\[
H_0 : \mu = 1, \qquad H_A : \mu \ne 1.
\]

A statistical test of the hypothesis is a decision rule that, given the inputs from the sample values and hence sampling distribution, chooses to either reject or else not reject (intuitively similar in meaning to "accept") the null $H_0$. Given this rule, the set of sample outcomes or sample values that lead to the rejection of $H_0$ is called the critical region. If $H_0$ is true but is rejected, a Type I error is committed. If $H_0$ is false but is accepted, a Type II error is committed.


Figure 1.6. Critical Region Under the Null Hypothesis $H_0$: µ = 1. (The plot shows the pdf of $t_{n-1}$, with −a, 0, and +a marked on the horizontal axis.)

The statistical rule on $H_0$: µ = 1, $H_A$: µ ≠ 1, is that if the test statistic

\[
t_{n-1} = \frac{\bar{X}_n - 1}{s/\sqrt{n}},
\]

which is t-distributed with (n − 1) degrees of freedom, falls within the critical region (shaded), defined as $\{t_{n-1} < -a \text{ or } t_{n-1} > +a\}$, a > 0, as shown in Fig. 1.6, then $H_0$ is rejected in favour of $H_A$. Otherwise, $H_0$ is not rejected and is "accepted". If $H_0$ is true, then the t-distribution would be correct, and therefore the probability of rejecting $H_0$ would be the area of the critical region, say 5% in this case. Notice that for n = 61, $P(-2.00 < t_{60} < 2.00) = 0.95$. Moreover, the t-distribution is symmetrical, so each of the right and left shaded tails makes up 2.5%. This is called a two-tailed test with a significance level of 5%. The significance level is the probability of committing a Type I error when $H_0$ is true.

In the above example, if the sample t-statistic is 1.045, then it lies outside the critical region and $H_0$ is not rejected. The corresponding p-value is $2 \times P(t_{60} > 1.045) = 2 \times 0.15 = 0.30$ or 30%. Another way to verify the test is that if the p-value < test significance level, reject $H_0$; otherwise $H_0$ cannot be rejected.

In theory, if we reduce the probability of Type I error, the probability of Type II error increases, and vice versa. This is illustrated in Fig. 1.7. Suppose $H_0$ is false, and µ > 1, so the true $t_{n-1}$ distribution is represented by the dotted curve in Fig. 1.7. The critical region $\{t_{n-1} < -2.00 \text{ or } t_{n-1} > 2.00\}$ remains the same, so the probability of committing Type II error is 1 − sum of shaded areas. Clearly, this probability increases as we reduce the critical region in order to reduce Type I error. Although it is ideal to reduce both types of errors, the tradeoff forces us to choose between the two. In practice, we fix the probability of Type I error when $H_0$ is true, i.e. determine a fixed significance level, e.g. 10%, 5%, or 1%. The power of a test is the probability of rejecting $H_0$ when it is false.


Figure 1.7. (The plot shows the pdf f(X) of $t_{n-1}$, with −2, 0, and 2 marked on the horizontal axis.)

Thus, power = 1 − P(Type II error). Or, power equals the shaded area in Fig. 1.7. Clearly, this power is a function of the alternative parameter value µ ≠ 1. We may determine such a power function of µ ≠ 1. Thus, reducing the significance level also reduces power, and vice versa. In statistics, it is customary to want to design a test so that its power function of µ ≠ 1 equals or exceeds that of any other test with equal significance level for all plausible parameter values µ ≠ 1 in $H_A$. If this test is found, it is called a uniformly most powerful test.

We have seen the performance of a two-tailed test. Sometimes, we embark instead on a one-tailed test such as $H_0$: µ = 1, $H_A$: µ > 1, in which we theoretically rule out the possibility of µ < 1, i.e. P(µ < 1) = 0. In this case, it makes sense to limit the critical region to only the right side, for when µ > 1, then $t_{n-1}$ will become larger. Thus, at the one-tail 5% significance level, the critical region under $H_0$ is $\{t_{n-1} > t_{n-1,95\%} = 1.671\}$ for n = 61, where $t_{n-1,95\%}$ is the 95th percentile of the t distribution with n − 1 d.f.
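The two-tailed decision rule, the Type I error probability, and the power of the test can be illustrated by simulation. The sketch below uses only the Python standard library. It takes n = 61 and the critical value 2.00 from the discussion above. The population s.d. `sigma = 1.0`, the alternative value µ = 1.3, the number of trials, and the function names are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def t_statistic(sample, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)) for the null hypothesis mu = mu0."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    return (xbar - mu0) / (s / math.sqrt(n))


def rejection_rate(true_mu, mu0=1.0, n=61, crit=2.00, trials=4000, sigma=1.0):
    """Fraction of simulated samples whose t-statistic is in the critical region."""
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(true_mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu0)) > crit:
            rejections += 1
    return rejections / trials


random.seed(5)
# H0 true (mu = 1): the rejection rate estimates P(Type I error), about 5%.
type_one_rate = rejection_rate(true_mu=1.0)

# H0 false (hypothetical mu = 1.3): the rejection rate estimates the power,
# and 1 - power estimates P(Type II error) at this alternative.
power_at_alt = rejection_rate(true_mu=1.3)
type_two_rate = 1.0 - power_at_alt

print("estimated P(Type I error):", type_one_rate)
print("estimated power at mu = 1.3:", power_at_alt)
print("estimated P(Type II error) at mu = 1.3:", type_two_rate)
```

Re-running `rejection_rate` with a larger `crit` lowers the estimated Type I error rate and also lowers the estimated power, which is the tradeoff described above.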

1.6. DATA TYPES

Consider the types of data series that are commonly encountered in regression analyses. There are four generic types, viz.

(a) Time series,
(b) Cross-sectional,
(c) Pooled time series cross-sectional, and
(d) Panel/longitudinal/micropanel.

Time series are the most prevalent in empirical studies in finance. They are data indexed by time. Each data point is a realisation of a random variable at a particular point in time. The data occur as a series over time. A sample of such data is typically a collection of the realised data over time, such as the history of ABC stock's prices on a daily basis from 1970 January 2 till 2002 December 31. Cross-sectional data are also common in finance. An example is the reported annual net profit of all companies listed on an exchange for a specific year. If we collect the cross sections for each year over a 20-year period, then we have a pooled time series cross section of companies over 20 years. Panel data are less used in finance. They are data collected by tracking specific individuals or subjects over time and across subjects.

The nature of data also differs according to the following categories.

(a) Quantitative,
(b) Ordinal, e.g. very good, good, average, and poor, and
(c) Nominal/categorical, e.g. married/not married, college graduate/nongraduate.

Quantitative data such as return rates, prices, volume of trades, etc. have the least limitations and therefore the greatest use in finance. These data provide not only ordinal rankings or comparisons of magnitudes, but also exact degrees of comparisons. There are some limitations and therefore special considerations to the use of the other categories of data. In the treatment of ordinal and nominal data, we may have to use specific tools such as dummy variables in regression.
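As an illustration of the dummy-variable idea, nominal or ordinal observations can be recoded into 0/1 indicator columns, one for each category other than a chosen baseline. The following is a minimal sketch in plain Python; the function name `dummy_encode`, the choice of baseline, and the sample observations are hypothetical.

```python
def dummy_encode(values, baseline):
    """Return one 0/1 indicator list per category other than the baseline."""
    categories = sorted(set(values) - {baseline})
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
    }


# Nominal data with two categories: one dummy variable is enough.
status = ["married", "not married", "married", "married", "not married"]
status_dummies = dummy_encode(status, baseline="not married")
print(status_dummies)

# Ordinal data with four categories: three dummies relative to the baseline.
rating = ["good", "poor", "very good", "average", "good"]
rating_dummies = dummy_encode(rating, baseline="poor")
for name, column in rating_dummies.items():
    print(name, column)
```

One category is left out as the baseline so that the indicator columns do not sum to a column of ones, which would duplicate the intercept of a regression.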

1.7. PROBLEM SET

(1.1) X, Y, Z are r.v.'s with a joint pdf f(X, Y, Z) that is integrable. Show using the concept of marginal pdf's that E(X + Y + Z) = E(X) + E(Y) + E(Z) by integrating over (X + Y + Z).

(1.2) Show how one could express $\mathrm{cov}\left(\sum_{i=1}^{N} X_i, \sum_{j=1}^{N} X_j\right)$ in terms of the N by N covariance matrix $\Sigma_{N \times N}$.

(1.3) The following is the probability distribution table of a trivariate U1, U2, and U3.

U1               −1      −1      −1      −1       1       1       1       1
U2               −2      −2       2       2      −2      −2       2       2
U3               −3       3      −3       3      −3       3      −3       3
P(U1, U2, U3)   0.125   0.125   0.125   0.125   0.125   0.125   0.125   0.125

Find the bivariate probability distribution P(U1, U2). Find the marginal P(U3).

(1.4) In the probability distribution table of a trivariate U1, U2, and U3,

U1               −1      −1      −1      −1       1       1       1       1
U2               −2      −2       2       2      −2      −2       2       2
U3               −3       3      −3       3      −3       3      −3       3
P(U1, U2, U3)   0.125   0.125   0.125   0.125   0.125   0.125   0.125   0.125

after finding P(U1, U2), suppose $Y_i = bX_i + U_i$, i = 1, 2, and $X_1 = 1$, $X_2 = 2$.
(i) Find the $E(U_i)$'s and $\mathrm{cov}(U_1, U_2)$.
(ii) Find the probability distribution of the estimator $\hat{b} = \left(\sum_{i=1}^{2} X_i Y_i\right)\left(\sum_{i=1}^{2} X_i^2\right)^{-1}$. This probability distribution of the estimator is called the sampling distribution of $\hat{b}$.
(iii) Find the mean and variance of $\hat{b}$ from its probability distribution.

(1.5) X and Y have joint pdf f(X, Y) = exp(−X − Y) for 0 < X, Y < ∞, and the pdf is 0 elsewhere. Find the marginal pdf's of X and Y. Are X and Y stochastically dependent?

(1.6) X and Y have a joint pdf f(X, Y) = 1 in the set {0 ≤ X ≤ 2, 0 ≤ Y ≤ X/2}.
(i) Find the marginal distributions of X and Y.
(ii) Find the variances of X and Y, and the covariance of X and Y.
(iii) Find the conditional means E(X | Y), E(Y | X), and conditional variances var(X | Y), var(Y | X).

(1.7) $X_{it}$ is distributed as independent univariate normal, N(0, 1), for i = 1, 2, 3, and t = 1, 2, . . . , 60. $Y_t = 0.5X_{1t} + 0.3X_{2t} + 0.2X_{3t}$. What are the mean and the standard deviation of $Y_t$? If a computer program runs and churns out 3K random values $Z_j$ belonging to the univariate normal N(0, 1) distribution, and $W_i = 0.5Z_{3i-2} + 0.3Z_{3i-1} + 0.2Z_{3i}$ for i = 1, 2, . . . , K, what is the variance of the sampling mean $K^{-1}\sum_{i=1}^{K} W_i$?

(1.8) Suppose r.v. $X_i \sim N(0, \frac{1}{60})$ for i = 1, 2, . . . , K, and $X_i$ and $X_j$ are independent when i ≠ j. If $AX_i \sim N(0, 1)$ where A is a constant, what is A? If random vector $Y = (X_1, X_2, \ldots, X_K)$, what is the distribution of $YY^T$?

(1.9) If cov(a, b) = 0.1, cov(c, a) = 0.2, cov(d, a) = 0.3, and x = b + 2c + 3d, what is cov(a, x)?


(1.10) Suppose X, Y, and Z are jointly distributed as follows.

Probability    X     Y     Z
0.5           +1    −1     0
0.5           −1     0    +1

Find cov(X, Y), cov(X, Z), and cov(Y, Z).

FURTHER RECOMMENDED READINGS

1. Mood, A.M., F.A. Graybill and D.C. Boes, Introduction to the Theory of Statistics, Third or later editions, McGraw-Hill publisher.
2. Hogg, R.V. and A.T. Craig, Introduction to Mathematical Statistics, Fourth or later editions, Collier MacMillan publisher.
