Solved Exercises and Problems of Statistical Inference

David Casado
Complutense University of Madrid
∟ Faculty of Economic and Business Sciences
∟ Department of Statistics and Operational Research II
∟ David Casado de Lucas

5 June 2015

You can decide not to print this file and consult it in digital format—paper and ink will be saved. Otherwise, print it on recycled paper, double-sided and with less ink. Be ecological. Thank you very much.

Contents

Links, Keywords and Descriptions

Inference Theory (IT)
    Framework and Scope of the Methods
    Some Remarks
    Sampling Probability Distribution

Point Estimations (PE)
    Methods for Estimating
    Properties of Estimators
    Methods and Properties

Confidence Intervals (CI)
    Methods for Estimating
    Minimum Sample Size
    Methods and Sample Size

Hypothesis Tests (HT)
    Parametric
        Based on T
        Based on Λ
        Analysis of Variance (ANOVA)
    Nonparametric
    Parametric and Nonparametric

PE – CI – HT

Additional Exercises

Appendixes
    Probability Theory
        Some Reminders
        Markov's Inequality. Chebyshev's Inequality
        Probability and Moments Generating Functions. Characteristic Function
    Mathematics
        Some Reminders
        Limits

References

Tables of Statistics

Probability Tables

Index

Prologue
These exercises and problems are a necessary complement to the theory included in Notes of Statistical Inference, available at http://www.casado-d.org/edu/NotesStatisticalInference-Slides.pdf. Nevertheless, some important theoretical details are also included in the remarks at the beginning of each chapter. Those Notes are intended for teaching purposes, and they do not include the advanced mathematical justifications and calculations included in this document. Although we can study only linearly and step by step, it is worth noticing that in Statistical Inference methods are usually related—as tasks are in the real world. Thus, in most exercises and problems we have made it clear what the suppositions are and how they should be properly justified. In some cases, several statistical methods have been “naturally” combined in the statement. Many steps and even sentences are repeated in most exercises of the same type, both to insist on them and to facilitate reading the exercises individually. The advanced exercises have been marked with the symbol (*). The code with which we have done some calculations, using the programming language R, is written in Courier New font style—you can copy and paste this code from the file. I include some notes to help, to the best of my knowledge, students whose mother language is not English.

Acknowledgements
This document has been created with Linux, LibreOffice, OpenOffice.Org, GIMP and R. I thank those who make this software available for free. I donate funds to these kinds of projects from time to time.

Links, Keywords and Descriptions

Inference Theory (IT)
Framework and Scope of the Methods > [Keywords] infinite populations, independent populations, normality, asymptoticness, descriptive statistics. > [Description] The conditions under which the statistics considered here can be applied are listed.

Some Remarks > [Keywords] partial knowledge, randomness, certainty, dimensional analysis, validity, use of the samples, calculations. > [Description] Partial knowledge justifies both the random character of the mathematical variables used to model the real-world variables and the impossibility of reaching total certainty when samples are used instead of the whole population. The validity of the results must be understood within the scenario made of the assumptions, the methods, the certainty and the data.

Sampling Probability Distribution
Exercise 1it-spd > [Keywords] inference theory, joint distribution, sampling distribution, sample mean, probability function. > [Description] From a simple probability distribution for X, the joint distribution of a sample (X1, X2) and the sampling distribution of the sample mean X̄ are determined.

Point Estimations (PE)
Methods for Estimating
Exercise 1pe-m > [Keywords] point estimations, binomial distribution, Bernoulli distribution, method of the moments, maximum likelihood method, plug-in principle. > [Description] For the binomial distribution, the two methods are applied to estimate the second parameter (probability), when the first (number of trials) is known. In the second method, the maximum can be found by looking at the derivatives. Both methods provide the same estimator. The plug-in principle allows using the previous estimator to obtain others for the mean and the variance.

Exercise 2pe-m > [Keywords] point estimations, geometric distribution, method of the moments, maximum likelihood method, plug-in principle. > [Description] For the geometric distribution, the two methods are applied to estimate the parameter. In the second method, the maximum can be found by looking at the derivatives. Both methods provide the same estimator. The plug-in principle is applied to use the previous estimator to obtain others for the mean and the variance.

Exercise 3pe-m > [Keywords] point estimations, Poisson distribution, method of the moments, maximum likelihood method, plug-in principle. > [Description] For the Poisson distribution, the two methods are applied to estimate the parameter. In the second method, the maximum can be found by looking at the derivatives. The two methods provide the same estimator. The plug-in principle is applied to use the previous estimator to obtain others for the mean and the variance.

Exercise 4pe-m > [Keywords] point estimations, normal distribution, method of the moments, maximum likelihood method. > [Description] For the normal distribution, the two methods are applied to estimate at the same time the two parameters of this distribution. In the second method, the maximum can be found by looking at the derivatives. The two methods provide the same estimator.

Exercise 5pe-m > [Keywords] point estimations, (continuous) uniform distribution, method of the moments, maximum likelihood method, plug-in principle, integrals. > [Description] For the continuous uniform distribution, the two methods are applied to estimate the parameter. In the second method, the maximum cannot be found by looking at the derivatives and this task is done by applying simple qualitative reasoning. The two methods provide different estimators. The plug-in principle allows using the previous estimator to obtain others for the mean and the variance. As a mathematical exercise, the theoretical expression of the mean and the variance are calculated.

Exercise 6pe-m > [Keywords] point estimations, (translated) exponential distribution, method of the moments, maximum likelihood method, plug-in principle, integrals. > [Description] For a translation of the exponential distribution, the two methods are applied to estimate the parameter. In the second method, the maximum can be found by looking at the derivatives. The two methods provide the same estimator. The plug-in principle is applied to use the previous estimator to obtain others for the mean. As a mathematical exercise, the theoretical expression of the mean and the variance of the distribution are calculated.

Exercise 7pe-m > [Keywords] point estimations, method of the moments, maximum likelihood method, plug-in principle, integrals. > [Description] For a distribution given through its density function, the two methods are applied to estimate the parameter. In the second method, the maximum cannot be found by looking at the derivatives and this task is done by applying simple qualitative reasoning. The two methods provide different estimators. The plug-in principle is applied to obtain other estimators for the mean and the variance. Additionally, the theoretical expressions of the mean and the variance of this distribution are calculated.

Properties of Estimators
Exercise 1pe-p > [Keywords] point estimations, probability, normal distribution, sample mean, completion (standardization). > [Description] For a normal distribution with known parameters, the probability that the sample mean is larger than a given value is calculated.

Exercise 2pe-p > [Keywords] point estimations, probability, normal distribution, sample quasivariance, completion. > [Description] For a normal distribution with known standard deviation, the probability that the sample quasivariance is larger than a given value is calculated.

Exercise 3pe-p > [Keywords] point estimations, probability, Bernoulli distribution, sample proportion, completion (standardization), asymptoticness. > [Description] For a Bernoulli distribution with known parameter, the probability that the sample proportion is between two given values is calculated.

Exercise 4pe-p > [Keywords] point estimations, probability and quantile, normal distribution, sample mean, sample quasivariance, completion. > [Description] For two (independent) normal distributions with known parameters, probabilities and quantiles of several events involving the sample mean or the sample quasivariance are calculated or found out, respectively.

Exercise 5pe-p > [Keywords] point estimations, probability, normal distribution, total sum, completion, bound. > [Description] For two (independent) normal distributions with known parameters, the probabilities of several events involving the total sum are calculated.

Exercise 6pe-p > [Keywords] point estimations, trimmed sample mean, mean square error, consistency, sample mean, rate of convergence. > [Description] To study the population mean, the mean square error and the consistency are studied for the trimmed sample mean. Its speed of convergence is analysed through a comparison with that of the (ordinary) sample mean.

Exercise 7pe-p > [Keywords] point estimations, chi-square distribution, mean square error, consistency. > [Description] To study twice the mean of a chi-square population, the mean square error and the consistency are studied for a given estimator.

Exercise 8pe-p > [Keywords] point estimations, mean square error, relative efficiency. > [Description] For a sample of size two, the mean square errors of two given estimators are calculated and compared by using the relative efficiency.

Exercise 9pe-p > [Keywords] point estimations, sample mean, mean square error, consistency, efficiency (under normality), Cramér-Rao's lower bound. > [Description] That the sample mean is always a consistent estimator of the population mean is proved. When the population is normally distributed, this estimator is also efficient.

Exercise 10pe-p > [Keywords] point estimations, (continuous) uniform distribution, probability function, sample mean, consistency, efficiency, unbiasedness. > [Description] For a population variable following the continuous uniform distribution, the density function is plotted. The consistency and the efficiency of the sample mean, as an estimator of the population mean, are studied. Looking at the bias obtained, a new unbiased estimator of the population mean is built, and its consistency is proved.

Exercise 11pe-p > [Keywords] point estimations, geometric distribution, sufficiency, likelihood function, factorization theorem. > [Description] When a population variable follows the geometric distribution, a (minimum-dimension) sufficient statistic for studying the parameter is found by applying the factorization theorem.

Exercise 12pe-p (*) > [Keywords] point estimations, basic estimators, population mean, Bernoulli distribution, population proportion, normality, population variance, mean square error, consistency, rate of convergence. > [Description] The mean square error is calculated for all basic estimators of the mean, the proportion (for Bernoulli populations) and the variance (for normal populations). Then, their consistencies in mean of order two and in probability are studied. For two populations, the two-variable limits that appear are studied by splitting them into two one-variable limits or by bounding them.

Exercise 13pe-p (*) > [Keywords] point estimations, basic estimators, normality, population variance, mean square error, consistency, rate of convergence. > [Description] For the basic estimators of the variance of normal populations, the mean square errors are compared for one and two populations. The computer is used to compare graphically the coefficients that appear in the expression of the mean square errors. Besides, the consistency is also graphically studied.

Exercise 14pe-p (*) > [Keywords] point estimations, Bernoulli distribution, normal distribution, mean square error, consistency, pooled sample proportion, pooled sample variance, rate of convergence. > [Description] The mean square error is calculated for some pooled estimators of the proportion (for Bernoulli populations) and the variance (for normal populations). Then, their consistencies in mean of order two and in probability are studied. For pooled estimators, one sample size tending to infinity suffices, that is, one sample can “do the whole work”. Each pooled estimator—for the proportion of a Bernoulli population and for the variance of a normal population—is compared with the “natural” estimator consisting of the semisum of the estimators of the two populations. The computer is also used to compare graphically the coefficients that appear in the expression of the mean square errors. The consistency can be studied graphically.

Methods and Properties
Exercise 1pe > [Keywords] point estimations, method of the moments, mean square error, consistency, maximum likelihood method. > [Description] Given the density function of a population variable, the method of the moments is applied to find an estimator of the parameter; the mean square error of this estimator is calculated; finally, its consistency is studied. On the other hand, the maximum likelihood method is applied too; the maximum cannot be found by using the derivatives and some qualitative reasoning is necessary. A simple analytical calculation suffices to see how the likelihood function depends upon the parameter. The two methods provide different estimators.

Exercise 2pe > [Keywords] point estimations, Rayleigh distribution, method of the moments, mean square error, consistency, maximum likelihood method. > [Description] Supposed a population variable following the Rayleigh distribution, the method of the moments is applied to build an estimator of the parameter; the mean square error of this estimator is calculated and its consistency is studied. The maximum likelihood method is also applied to build an estimator of the parameter. For this population distribution, both methods provide different estimators. As a mathematical exercise, the expressions of the mean and the variance are calculated.

Exercise 3pe > [Keywords] point estimations, exponential distribution, method of the moments, maximum likelihood method, sufficiency, likelihood function, factorization theorem, sample mean, efficiency, consistency, plug-in principle. > [Description] A deep statistical study of the exponential distribution is carried out. To estimate the parameter, two estimators are obtained by applying both the method of the moments and the maximum likelihood method. For this population distribution, both methods provide the same estimator. A sufficient statistic is found. The sample mean is studied as an estimator of the parameter and the inverse of the parameter. In this exercise, it is highlighted how important the mathematical notation may be in doing calculations.

Confidence Intervals (CI)
Methods for Estimating
Exercise 1ci-m > [Keywords] confidence intervals, method of the pivot, asymptoticness, normal distribution, margin of error. > [Description] The method of the pivot is applied twice to construct asymptotic confidence intervals for the mean and the standard deviation of a normally distributed population variable with unknown mean and variance. For the first interval, the expression of the margin of error is used to obtain the confidence when the length of the interval is one unit.

Exercise 2ci-m > [Keywords] confidence intervals, method of the pivot, asymptoticness, normal distribution, margin of error. > [Description] The method of the pivot is applied to construct an asymptotic confidence interval for the mean of a population variable with unknown variance. There was a previous estimate of the mean that is inside the interval obtained. The value of the margin of error is explicitly given.

Exercise 3ci-m > [Keywords] confidence intervals, method of the pivot, Bernoulli distribution, asymptoticness. > [Description] The method of the pivot is applied to construct an asymptotic confidence interval for the proportion of a population variable following the Bernoulli distribution.

Exercise 4ci-m > [Keywords] confidence intervals, asymptoticness, method of the pivot, Bernoulli distribution, pooled sample proportion. > [Description] A confidence interval for the difference between two proportions is constructed by applying the method of the pivot. The interval allows us to make a decision about the equality of the proportions, which is equivalent to applying a two-tailed hypothesis test. As an advanced task, the exercise is repeated with the pooled sample proportion in the denominator of the statistic (estimation of the variances of the populations), not in the numerator (estimation of the difference between the means).

Minimum Sample Size
Exercise 1ci-s > [Keywords] confidence intervals, minimum sample size, normal distribution, method of the pivot, margin of error, Chebyshev's inequality. > [Description] To find the minimum number of data necessary to guarantee theoretically the desired precision, two methods are applied: one based on the expression of the margin of error and the other based on Chebyshev's inequality.

Methods and Sample Size
Exercise 1ci > [Keywords] confidence intervals, minimum sample size, normal distribution, method of the pivot, margin of error, Chebyshev's inequality. > [Description] A confidence interval for the mean of a normal population is built by applying the method of the pivotal quantity. The dependence of the length of the interval on the confidence is analysed qualitatively. Given all the other quantities, the minimum sample size is calculated in two different ways: with the method based on the expression of the margin of error and with the method based on Chebyshev's inequality.

Exercise 2ci > [Keywords] confidence intervals, minimum sample size, asymptoticness, normal distribution, method of the pivot, margin of error, Chebyshev's inequality. > [Description] An asymptotic confidence interval for the mean of a population random variable is constructed by applying the method of the pivotal quantity. The equivalent exact confidence interval can be obtained under the supposition that the variable is normally distributed. Given all the other quantities, the minimum sample size is calculated in two different ways: with the method based on the expression of the margin of error and with the method based on Chebyshev's inequality.

Exercise 3ci > [Keywords] confidence intervals, minimum sample size, normal distribution, method of the pivot, margin of error, Chebyshev's inequality. > [Description] A confidence interval for the mean of a normal population is built by applying the method of the pivotal quantity. Given all the other quantities, the minimum sample size is calculated in two different ways: with the method based on the expression of the margin of error and with the method based on Chebyshev's inequality. The dependence of the length of the interval upon the confidence is analysed qualitatively.

Exercise 4ci > [Keywords] confidence intervals, minimum sample size, normal distribution, method of the pivot, margin of error, Chebyshev's inequality. > [Description] The method of the pivot allows us to construct a confidence interval for the difference between the means of two (independent) normal populations. Given the other quantities and supposing equal sample sizes, the minimum value is calculated by applying two different methods: one based on the expression of the margin of error and the other based on Chebyshev's inequality.

Hypothesis Tests (HT)
Parametric
Based on T
Exercise 1ht-T > [Keywords] hypothesis tests, normal distribution, two-tailed test, population mean, critical region, p-value, type I error, type II error, power function. > [Description] A decision on the equality of the population mean (of a variable) to a given number is made by applying a two-sided test and looking at both the critical values and the p-value. The two types of error are determined. With the help of a computer, the power function is plotted.

Exercise 2ht-T > [Keywords] hypothesis tests, normal population, one-tailed test, population standard deviation, critical region, p-value, type I error, type II error, power function. > [Description] A decision on whether the population standard deviation (of a variable) is smaller than a given number is made by applying a one-tailed test and looking at both the critical values and the p-value. The expression of the type II error is found. With the help of a computer, the power function is plotted. A qualitative analysis of the form of the alternative hypothesis is carried out. The assumption that the population variable follows the normal distribution is necessary to apply the results for studying the variance.

Exercise 3ht-T > [Keywords] hypothesis tests, normal population, one- and two-tailed tests, population variance, critical region, p-value, type I error, type II error, power function. > [Description] The equality of the population variance (of a variable) to a given number is tested by considering both one- and two-tailed alternative hypotheses. Decisions are made after looking at both the critical values and the p-value. In the two cases, the expression of the type II error is found and the power function is plotted with the help of a computer. The power functions are graphically compared, and the figure shows that the one-sided test is uniformly more powerful than the two-sided test.

Exercise 4ht-T > [Keywords] hypothesis tests, normal population, one- and two-tailed tests, population variance, critical region, p-value, type I error, type II error, power function, statistical cooking. > [Description] From the hypotheses of a one-sided test on the population variance (of a variable), different ways are qualitatively and quantitatively considered for the opposite decision to be made.

Exercise 5ht-T > [Keywords] hypothesis tests, normal populations, one- and two-tailed tests, population standard deviation, critical region, p-value, type I error, type II error, power function. > [Description] A decision on whether the population standard deviation (of a variable) is equal to a given value is made by applying three possible alternative hypotheses and looking at both the critical values and the p-value. The type II error is calculated and the power function is plotted. The power functions are graphically compared: the figure shows that the one-sided tests are uniformly more powerful than the two-sided test.

Exercise 6ht-T > [Keywords] hypothesis tests, Bernoulli populations, one-tailed tests, population proportion, critical region, p-value, type I error, type II error, power function. > [Description] A decision on whether the population proportion is higher in one population is made after placing this inequality first in the null hypothesis and then in the alternative hypothesis. Two methodologies are considered, one based on the critical values and the other based on the p-value. In both tests, the type II error is calculated and the power function is plotted. The symmetry of the power functions of the two cases is highlighted. As an advanced section, the pooled sample proportion is used to estimate the variance of the populations (in the denominator of the statistic), but not to estimate the difference between the population proportions (in the numerator of the statistic).

Based on Λ
Exercise 1ht-Λ > [Keywords] hypothesis tests, Neyman-Pearson's lemma, likelihood ratio test, critical region, Poisson distribution, exponential distribution, Bernoulli distribution, normal distribution. > [Description] The critical region is theoretically studied for the null hypothesis that a parameter of the distribution equals a given value against four different alternative hypotheses. The form of the region is related to the maximum likelihood estimator.

Analysis of Variance (ANOVA)
Exercise 1ht-av > [Keywords] hypothesis tests, normal populations, analysis of variance, critical region, p-value, type I error, type II error. > [Description] The analysis of variance is applied to test whether the means of three independent normal populations—whose variances are supposed to be equal—are the same. Calculations are repeated three times with different levels of “manual work”.

Nonparametric
Exercise 1ht-np > [Keywords] hypothesis tests, chi-square tests, independence tests, critical region, p-value, type I error, table of frequencies. > [Description] The independence between two qualitative variables or factors is tested by applying the chi-square statistic.

Exercise 2ht-np > [Keywords] hypothesis tests, chi-square tests, goodness-of-fit tests, critical region, p-value, type I error, table of frequencies. > [Description] The goodness-of-fit to the whole Poisson family, firstly, and to a member of the Poisson distribution family, secondly, is tested by applying the chi-square statistic. The importance of using the sample information, instead of poorly justified assumptions, is highlighted when the results of both sections are compared.

Exercise 3ht-np > [Keywords] hypothesis tests, chi-square tests, goodness-of-fit tests, independence tests, homogeneity tests, critical region, p-value, type I error, table of frequencies. > [Description] Just the same table of frequencies is looked at as coming from three different scenarios. Chi-square goodness-of-fit, independence and homogeneity tests are respectively applied.

Parametric and Nonparametric
Exercise 1ht > [Keywords] hypothesis tests, Bernoulli distribution, goodness-of-fit chi-square test, position signs test, critical region, p-value, type I error, type II error, power function, table of frequencies. > [Description] Just the same problem is dealt with by considering three different approaches: one parametric test and two kinds of nonparametric test. In this case, the same decision is made.

PE – CI – HT
Exercise 1pe-ci-ht > [Keywords] point estimations, confidence intervals, method of the pivot, normal distribution, t distribution, pooled sample variance. > [Description] The probability of an event involving the difference between the means of two independent normal populations is calculated with and without the supposition that the variances of the populations are the same. The method of the pivot is applied to construct a confidence interval for the quotient of the standard deviations.

Exercise 2pe-ci-ht > [Keywords] confidence intervals, point estimations, normal distribution, method of the pivot, probability, pooled sample variance. > [Description] For the difference of the means of two (independent) normally distributed variables, a confidence interval is constructed by applying the method of the pivotal quantity. Since the equality of the means is included in a high-confidence interval, the pooled sample variance is considered in calculating a probability involving the difference of the sample means.

Exercise 3pe-ci-ht > [Keywords] hypothesis tests, confidence intervals, Bernoulli populations, one-tailed tests, population proportion, critical region, p-value, type I error, type II error, power function, method of the pivot. > [Description] A decision on whether the population proportion in one population is smaller than or equal to that in the other is made by looking at both the critical values and the p-value. The type II error is calculated and the power function is plotted. By applying the method of the pivot, a confidence interval for the difference of the population proportions is built. This interval can be seen as the acceptance region of the equivalent two-sided hypothesis test. In this case, the same decision is made with the test and with the interval.

Exercise 4pe-ci-ht > [Keywords] point estimations, hypothesis tests, standard power function density, method of the moments, maximum likelihood method, plug-in principle, Neyman-Pearson's lemma, likelihood ratio tests, critical region. > [Description] Given the probability function of a population random variable, estimators are built by applying both the method of the moments and the maximum likelihood method. Then, the plug-in principle allows us to obtain estimators for the mean and the variance of the distribution of the variable. In testing the equality of the parameter to a given value, the form of the critical region is theoretically studied when four different types of alternative hypothesis are considered.

Additional Exercises
(Solved, but neither ordered by difficulty, nor described, nor referred to in the final index.)

Appendixes
Probability Theory (PT)
Some Reminders
Markov's Inequality. Chebyshev's Inequality
Probability and Moments Generating Functions. Characteristic Function.

Exercise 1pt > [Keywords] probability, quantile, probability tables, probability function, binomial distribution, Poisson distribution, uniform distribution, normal distribution, chi-square distribution, t distribution, F distribution. > [Description] For each of these distributions, the probability of a simple event is calculated both by using probability tables and by using the mass function, or, conversely, a quantile is found by using the probability tables or a statistical software program.

Exercise 2pt > [Keywords] probability, normal distribution, total sum, sample mean, completion (standardization). > [Description] For a quantity that follows the normal distribution with known parameters, the probability of an event involving the quantity is calculated after properly completing the two sides of the inequality, that is, after properly rewriting the event.

Exercise 3pt (*) > [Keywords] probability, Bernoulli distribution, binomial distribution, geometric distribution, Poisson distribution, exponential distribution, normal distribution, raw or crude population moments, series, integral, probability generating function, moment generating function, characteristic function, differential equation, integral equation, complex analysis. > [Description] For the distributions mentioned, the first two raw or crude population moments are calculated in as many ways as possible. Their levels of difficulty are different, but the aim is to practice. Some calculations require strong mathematical justifications. Several interesting analytical techniques are used: changing the order of summation in series, using Taylor series, characterizing a function through a differential or integral equation, et cetera.

Mathematics (M)
Some Reminders
Limits
Exercise 1m (*) > [Keywords] real analysis, integral, exponential function, bound, Fubini's theorem, integration by substitution, multiple integrals, polar coordinates. > [Description] It is well known that the function exp(−x²) has no elementary antiderivative. The definite integral is calculated in three cases that appear frequently, e.g. when working with the density function of the normal or the Rayleigh distributions. By applying Fubini's theorem for improper integrals, calculations are translated to the two-dimensional real space, where polar coordinates are used to solve the multiple integral easily.

Exercise 2m > [Keywords] real analysis, limits, sequence, indeterminate forms > [Description] Several limits of one-variable sequences, similar to those necessary for other exercises, are calculated.

Exercise 3m (*) > [Keywords] real analysis, limits, sequence, indeterminate forms, polar coordinates. > [Description] Several limits of two-variable sequences, similar to those necessary for other exercises, are calculated.

Exercise 4m (*) > [Keywords] algebra, geometry, real analysis, linear transformation, rotation, movement, frontier, rectangular coordinates. > [Description] Several approaches are used to find the frontier and the regions determined by a discrete relation in the plane.

References

Tables of Statistics (T) > [Keywords] estimators, statistics T, parametric tests, likelihood ratio, Analysis of Variance (ANOVA), nonparametric tests, chi-square tests, Kolmogorov–Smirnov tests, runs test (of randomness), signs test (of position), Wilcoxon signed-rank test (of position). > [Description] The statistics applied in the exercises are tabulated in this appendix. Some theoretical remarks are included.

Probability Tables (P) > [Keywords] normal distribution, t distribution, chi-square distribution, F distribution. > [Description] A probability table with the most frequently used values is included for each of the abovementioned distributions.

Index


Inference Theory [IT]

Framework and Scope of the Methods

Populations
[Ap1] When the entire population can be studied, no inference is needed. Thus, here we suppose that we do not have such total knowledge.
[Ap2] Populations will be supposed to be independent—matched or paired data must be treated in a slightly different way.

Samples
[As1] Sample sizes are supposed to be much smaller than population sizes—a correction factor is not necessary for these (practically) infinite populations.
[As2] At the same time, we consider either any amount of normally distributed data or many data (large samples) from any distribution.
[As3] Data will be supposed to have been selected randomly, with the same probability and independently; that is, by applying simple random sampling.

Methods
[Am1] Before applying inferential methods, data should be analysed to guarantee that nothing strange will spoil the inference—we suppose that such descriptive analysis and data treatment have been done.
[Am2] We are able to learn only linearly, but in practice the methods need not be applied in the order in which they are presented here—e.g. nonparametric hypothesis tests may be used to check assumptions before applying parametric methods.

[IT] Some Remarks

Partial Knowledge and Randomness
The partial knowledge mentioned in the previous section has crucial consequences. The use of only some elements of the population implies that—we can only hypothesize about the other elements—variables must be assigned a random character, on the one hand, and results will have no total certainty, in the sense that statements will be made with some probability, on the other hand. For example, a 95% confidence in applying a method must be interpreted like any other probability: the results are true with probability 0.95 and false with probability 1−0.95 (frequently, we will never know whether the method has failed or not). See remark 1pt, in the appendix of Probability Theory, on the interpretation of the concept of probability. (A simulation of this interpretation is sketched below.)

In Probability Theory, random variables are dimensionless quantities; in real-life problems, variables almost always are not. Since this fact does not usually cause trouble in Statistics, we do not pay much attention to the units of measurement, and we can understand that the magnitude of the real-life variable, with no unit of measurement, is the part that is being modeled by using the proper probability distribution with the proper parameter values (of course, units of measurement are not random). To get used to paying attention to the units of measurement and to managing them, they have been written in most numerical expressions.
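The following short R simulation (an addition, not part of the original text) illustrates the frequentist reading of “95% confidence”: across many repetitions, roughly 95% of the intervals constructed from fresh samples contain the true mean, although for any single sample we cannot tell whether ours is one of the failures. The population N(0, 1), with known variance, and the sample size are arbitrary illustrative choices:

set.seed(1)                          # for reproducibility
n = 25; mu = 0                       # illustrative population N(0, 1), known sigma = 1
covers = replicate(10000, {
  x = rnorm(n, mean=mu, sd=1)
  half = qnorm(0.975) / sqrt(n)      # margin of error of the 95% interval (sigma = 1 known)
  (mean(x) - half <= mu) & (mu <= mean(x) + half)   # does the interval cover mu?
})
mean(covers)                         # close to 0.95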


Regarding the interpretation of the whole statistical processes that we will apply, either to practice their use or to solve particular real-world problems, we highlight the main points on which results are usually based:
(i) Assumptions.
(ii) The method applied, including particular details of its steps, mathematical theorems, statistic T, etc.
(iii) Certainty with which the method is applied: probability, confidence or significance.
(iv) The data available.

In Statistics, results may change severely when the assumptions are actually false, another method is applied, a different certainty is considered, or the data carry no proper information (quantity, quality, representativeness, etc.). Throughout this document, we insist on the caution that statisticians and readers of statistical works must exercise in interpreting results. Even if you are not interested in “statistically cooking” data, you had better know the recipes... (Some of them have been included in the notes mentioned in the prologue.)

Use of the Samples
Let X = (X1,...,Xn) be the data from a population. The information they contain is extracted and used through appropriate mathematical functions: estimators and statistics. When applying the methods, since we usually need to calculate a probability or to find a quantile, expressions must be written in terms of those appropriate quantities whose sampling distribution is known. In trying to make estimators or statistics appear, some mathematics is needed. We do not repeat it every time it is applied in this document. For example, standardization is a strictly increasing transformation that does not change inequalities when it is applied to both sides, and the positive branch of the square root must be taken to work with population or sample variances and standard deviations (these concepts are nonnegative by definition, while the square root is a general mathematical tool applied to this particular situation). As an example of those mathematical explanations not repeated again and again, we include the following:

Remark: Since variances are nonnegative by definition and the positive branch of the square root function is strictly increasing, it holds that $\sigma_X^2 = \sigma_Y^2 \Leftrightarrow \sigma_X = \sigma_Y$ (similarly for inequalities). For general numbers a and b, it holds only that $a^2 = b^2 \Leftrightarrow |a| = |b|$. From a strict mathematical point of view, for the standard deviation we should write $\sigma = +\sqrt{\sigma^2} = |\sqrt{\sigma^2}|$.

Finally, at the end of the (possibly) theoretical part of the exercises, we do not insist that in practice a sample (X1,...,Xn) would be used by entering its values into the theoretical expressions obtained as a solution. Estimators and statistics are random quantities until specific data are used.

Useful Questions
When constructing the answer, users may find it useful to ask themselves:

On the Populations
● How many populations are there?
● Are their probability distributions known?

On the Samples
● If populations are not normally distributed, are the sample sizes large enough to apply asymptotic results?
● Do we know the data themselves, or only some quantities calculated from them?


On the Assumptions
● What is supposed to be true? Does it seem reasonable? Do we need to prove it?
● Should it be checked for the populations: the random character, the independence of the populations, the goodness-of-fit to the supposed models, the homogeneity between the populations, et cetera?
● Should it be checked for the samples: the within-sample randomness and independence, the between-samples independence, et cetera?
● Are there other assumptions (neither mathematical nor statistical)?

On the Statistical Problem
● What are the quantities to be studied statistically?
● Concretely, what is the statistical problem: point estimation, confidence interval, hypothesis test, etc.?

On the Statistical Tools
● Which are the estimators, the statistics and the methods that will be applied?

On the Quantities
● Which are the units of measurement? Are all the units equal?
● How large are the magnitudes? Do they seem reasonable? Are all of them coherent (variability is positive, probabilities and relative frequencies are between 0 and 1, etc.)?

On the Interpretation
● What is the statistical interpretation of the solution?
● How is the statistical solution interpreted in the framework of the problem we are working on?
● Do the qualitative results seem reasonable (as expected)?
● Do the quantities seem reasonable (signs, order of magnitude, etc.)?

They may want to consult some other pieces of advice that we have written in Guide for Students of Statistics, available at http://www.casado-d.org/edu/GuideForStudentsOfStatistics-Slides.pdf.

[IT] Sampling Probability Distribution

Remark 1it: The notation and the expressions of the most basic estimators, for one population, are

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad \hat\eta = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad V^2 = \frac{1}{n}\sum_{j=1}^{n} (X_j - \mu)^2 \qquad s^2 = \frac{1}{n}\sum_{j=1}^{n} (X_j - \bar X)^2 \qquad S^2 = \frac{1}{n-1}\sum_{j=1}^{n} (X_j - \bar X)^2$$

For two populations, other basic estimators are made with these:

$$\bar X - \bar Y \qquad \frac{V_X^2}{V_Y^2} \qquad \frac{s_X^2}{s_Y^2} \qquad \frac{S_X^2}{S_Y^2} \qquad \hat\eta_X - \hat\eta_Y$$

Finally, all these estimators are used to make statistics whose sampling distribution is known.
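As a small illustration (an addition to the original text), these one-sample estimators can be computed in R for some made-up data; the value of μ is assumed known only for V²:

x = c(2.1, 1.8, 2.4, 2.0, 1.7)    # made-up data for illustration
n = length(x)
xbar = mean(x)                     # sample mean
s2 = sum((x - xbar)^2) / n         # sample variance s^2 (divides by n)
S2 = var(x)                        # sample quasivariance S^2 (divides by n-1)
mu = 2                             # population mean, assumed known only for V^2
V2 = sum((x - mu)^2) / n
c(xbar, V2, s2, S2)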

Exercise 1it-spd
Given a population (variable) X following the probability distribution determined by the following values and probabilities

    Value x        1    2    3
    Probability p  3/9  1/9  5/9

Determine:
(a) The joint probability distribution of the sample X = (X1, X2)
(b) The sampling probability distribution of the sample mean X̄
(Based on an exercise of the materials in Spanish prepared by my workmates.)

Discussion: The distribution of X is totally determined, since we know all the information necessary to calculate any quantity—e.g. the mean:

$$\mu_X = E(X) = \sum_{\Omega} x_j \cdot P_X(x_j) = \sum_{\{1,2,3\}} x_j \cdot p_j = 1\cdot\frac{3}{9} + 2\cdot\frac{1}{9} + 3\cdot\frac{5}{9} = \frac{20}{9} = 2.222222$$

Instead of a table, a function is sometimes used to provide the values and the probabilities—the mass or density function. We can represent this function with the computer:

values = c(1, 2, 3)
probabilities = c(3/9, 1/9, 5/9)
plot(values, probabilities, type='h', xlab='Value', ylab='Probability', ylim=c(0,1), main='Mass Function', lwd=7)
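The mean can be checked in R with the vectors defined above (an addition to the original text):

sum(values * probabilities)   # mean of X: 20/9 = 2.222222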

The sampling probability distribution of X̄ is determined once we give the possible values and the probabilities with which they can be taken. Before doing that, we describe the probability distribution of the random vector X = (X1, X2).

(A) Joint probability distribution of the sample
Since the Xj are independent in any simple random sample, the probability that X = (X1, X2) takes the value x1 = (1,1), for example, is calculated as follows (note the intersection):

$$f_{\mathbf{X}}(1,1) = P_{\mathbf{X}}(1,1) = P\big(\{X_1 = 1\} \cap \{X_2 = 1\}\big) = P(X_1 = 1)\cdot P(X_2 = 1) = \frac{3}{9}\cdot\frac{3}{9} = \frac{1}{9}$$

To fill in the following table, the other probabilities are calculated in the same way.

Joint Probability Distribution of (X1, X2)

    Value (x1,x2)   (1,1)    (1,2)    (1,3)    (2,1)    (2,2)    (2,3)    (3,1)    (3,2)    (3,3)
    Product         3/9·3/9  3/9·1/9  3/9·5/9  1/9·3/9  1/9·1/9  1/9·5/9  5/9·3/9  5/9·1/9  5/9·5/9
    Probability     1/9      1/27     5/27     1/27     1/81     5/81     5/27     5/81     25/81

Notice that (1,3) and (3,1), for example, contain the same information. The values and their probabilities can be given by extension (table or figure) or by comprehension (function).

## Install this package if you don't have it (run the following line without #)
# install.packages('scatterplot3d')
valuesX1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3)
valuesX2 = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
probabilities = c(1/9, 1/27, 5/27, 1/27, 1/81, 5/81, 5/27, 5/81, 25/81)
library('scatterplot3d')   # To load the package
scatterplot3d(valuesX1, valuesX2, probabilities, type='h', xlab='Value X1', ylab='Value X2', zlab='Probability', xlim=c(0, 4), ylim=c(0, 4), zlim=c(0,1), main='Mass Function', lwd=7)

That the total sum of probabilities is equal to one can be checked:

$$\sum_{\Omega} f_{\mathbf{X}}(x_j) = \frac{1}{9}+\frac{1}{27}+\frac{5}{27}+\frac{1}{27}+\frac{1}{81}+\frac{5}{81}+\frac{5}{27}+\frac{5}{81}+\frac{25}{81} = \frac{9+3+15+3+1+5+15+5+25}{81} = \frac{81}{81} = 1$$

From the information in the table it is possible to calculate any quantity—e.g. the first-order joint moment:

$$\mu_{\mathbf{X}}^{1,1} = E(X_1 \cdot X_2) = \sum_{\Omega} x_1 x_2\, f_{\mathbf{X}}(x_1, x_2) = 1\cdot 1\cdot\frac{1}{9} + 1\cdot 2\cdot\frac{1}{27} + \cdots + 3\cdot 2\cdot\frac{5}{81} + 3\cdot 3\cdot\frac{25}{81} = \frac{400}{81} = 4.938272$$
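As a computational cross-check (an addition to the original text), the whole joint table and the joint moment can be obtained in R with outer():

values = c(1, 2, 3)
probabilities = c(3/9, 1/9, 5/9)
joint = outer(probabilities, probabilities)   # joint mass: P(X1 = x1) * P(X2 = x2)
sum(joint)                                    # total probability: 1
sum(outer(values, values) * joint)            # E(X1 * X2) = 400/81 = 4.938272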

(B) Sampling probability distribution of the sample mean
The sample mean X̄(X) = X̄(X1, X2) is a random quantity, since so are X1 and X2. Each pair of values (x1, x2) of (X1, X2) gives one value x̄ of X̄; on the contrary, a value x̄ of X̄ can correspond to different pairs of values (x1, x2). Then, we will fill in a table with all the values and merge those that are equal. For example:

$$\bar X(1,1) = \frac{1+1}{2} = \frac{2}{2} = 1$$

The other values x̄ of X̄ are calculated in the same way to fill in the following table:

    Value (x1,x2)   (1,1)  (1,2)  (1,3)  (2,1)  (2,2)  (2,3)  (3,1)  (3,2)  (3,3)
    Probability     1/9    1/27   5/27   1/27   1/81   5/81   5/27   5/81   25/81
    Value x̄ of X̄    1      3/2    2      3/2    2      5/2    2      5/2    3

The sample mean X̄ can take five different values, while (X1, X2) could take nine different possible values (x1, x2). Thus, the probability that X̄ takes the value 2, for example, is calculated as follows (note the union):

$$P_{\bar X}(2) = P\big(\{(1,3)\} \cup \{(2,2)\} \cup \{(3,1)\}\big) = P\big((1,3)\big) + P\big((2,2)\big) + P\big((3,1)\big) = \frac{5}{27} + \frac{1}{81} + \frac{5}{27} = \frac{31}{81}$$

In the same way,

$$P_{\bar X}(1) = P\big(\{(1,1)\}\big) = \frac{1}{9}$$

$$P_{\bar X}\left(\frac{3}{2}\right) = P\big(\{(1,2)\} \cup \{(2,1)\}\big) = \frac{1}{27} + \frac{1}{27} = \frac{2}{27}$$

$$P_{\bar X}\left(\frac{5}{2}\right) = P\big(\{(2,3)\} \cup \{(3,2)\}\big) = \frac{5}{81} + \frac{5}{81} = \frac{10}{81}$$

$$P_{\bar X}(3) = P\big(\{(3,3)\}\big) = \frac{25}{81}$$

Then, the sampling probability distribution of the sample mean X̄ is determined, in this case, by

Probability Distribution of X̄

    Value x̄           1     3/2    2      5/2    3
    Probability of x̄  1/9   2/27   31/81  10/81  25/81

We can check that the total sum of probabilities is equal to one:

$$\sum_{\Omega} P_{\bar X}(\bar x_j) = \frac{1}{9} + \frac{2}{27} + \frac{31}{81} + \frac{10}{81} + \frac{25}{81} = \frac{9+6+31+10+25}{81} = \frac{81}{81} = 1$$

From the information in the table above it is possible to calculate any quantity—e.g. the mean:

$$\mu_{\bar X} = E(\bar X) = \sum_{\Omega} \bar x_j \cdot P_{\bar X}(\bar x_j) = 1\cdot\frac{1}{9} + \frac{3}{2}\cdot\frac{2}{27} + 2\cdot\frac{31}{81} + \frac{5}{2}\cdot\frac{10}{81} + 3\cdot\frac{25}{81} = \frac{9+9+62+25+75}{81} = \frac{180}{81} = 2.222222$$

It is worth noticing that this value is equal to the value that we obtained at the beginning, which agrees with the well-known theoretical property:

$$\mu_{\bar X} = E(\bar X) = E(X) = \mu_X$$

Values and probabilities can also be provided by using a function—the mass or density function, which can be represented with the help of a computer:

values = c(1, 3/2, 2, 5/2, 3)
probabilities = c(1/9, 2/27, 31/81, 10/81, 25/81)
plot(values, probabilities, type='h', xlab='Value', ylab='Probability', ylim=c(0,1), main='Mass Function', lwd=7)
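As an extra check (an addition to the original text), the sampling distribution of X̄ can also be approximated by simulation; the relative frequencies of the simulated sample means should approach the probabilities in the table above:

set.seed(1)   # for reproducibility
popValues = c(1, 2, 3)
popProbs = c(3/9, 1/9, 5/9)
# Draw many simple random samples of size 2 and compute their sample means
means = replicate(100000, mean(sample(popValues, size=2, replace=TRUE, prob=popProbs)))
table(means) / length(means)   # compare with 1/9, 2/27, 31/81, 10/81, 25/81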

Conclusion: For a simple distribution of X and a small sample size, X = (X1, X2), we have written both the joint probability distribution of the sample X and the sampling distribution of X̄. This helps us to understand the concept of the sampling distribution of any random quantity (not only the sample mean), whether we are able to write it explicitly or only to know it (e.g. thanks to a theorem).
My notes:


Point Estimations [PE] Methods for Estimating

Remark 1pe: When necessary, the expectations E(X) and E(X²) are usually given in the statement; once E(X) is given, either Var(X) or E(X²) can equivalently be given, since Var(X) = E(X²) − E(X)². If not given, these expectations can be calculated from their definitions by summing or integrating, for discrete and continuous variables, respectively (this is sometimes an advanced mathematical exercise).

Remark 2pe: If the method of the moments is used to estimate m parameters (frequently 1 or 2), the first m equations of the system usually suffice; nevertheless, if not all the parameters appear in the first-order moments of X, the smallest m moments—and equations—in which the parameters appear must be considered. For example, if μ1 = 0, or if the interest lies directly in σ² because μ is known, the first-order equation μ1 = μ = E(X) = m1 does not involve σ and hence the second-order equation μ2 = E(X²) = Var(X) + E(X)² = σ² + μ² = m2 must be considered instead.

Remark 3pe: When looking for local maxima or minima of differentiable functions, the first-order derivatives are set equal to zero. After that, to discriminate between maxima and minima, the second-order derivatives are studied. For most of the functions we will work with, this second step can be solved by applying some qualitative reasoning on the sign of the quantities involved and the possible values of the data xi. When this does not suffice, the values found in the first step, say θ0, must be substituted into the expression of the second step. On the other hand, global maxima and minima cannot in general be found using the derivatives, and some qualitative reasoning must be applied. It is important to highlight that, in applying the maximum likelihood method, the purpose is to find the maximum, whichever the mathematical way.

Exercise 1pe-m
If X is a population variable that follows a binomial distribution of parameters κ and η, and X = (X1,...,Xn) is a simple random sample:
(a) Apply the method of the moments to obtain an estimator of the parameter η.
(b) Apply the maximum likelihood method to obtain an estimator of the parameter η.
(c) When κ = 10 and x = (x1,...,x5) = (4, 4, 3, 5, 6), use the estimators obtained in the two previous sections to construct final estimates of the parameter η and the measures μ and σ².
Hint: (i) In the first two sections, treat the parameter κ as if it were known. (ii) In the likelihood function, join the combinatorial terms into a product; this product does not depend on the parameter η and hence its derivative will be zero.

Discussion: This statement is mathematical, although in the last section we are given some data to be substituted. In practice, that the binomial distribution can be used to explain X should be supported. The variable X is dimensionless. For the binomial distribution,

$$E(X) = \kappa\,\eta \qquad\text{and}\qquad Var(X) = \kappa\,\eta\,(1-\eta)$$

(See the appendixes for how the mean and the variance of this distribution can be calculated.) In particular, the results obtained here can be applied to the Bernoulli distribution with κ = 1.

(a) Method of the moments
(a1) Population and sample moments: The probability distribution has two parameters originally, but we have to study only one. The first-order moments are

$$\mu_1(\eta) = E(X) = \kappa\,\eta \qquad\text{and}\qquad m_1(x_1, x_2, \ldots, x_n) = \frac{1}{n}\sum_{j=1}^{n} x_j = \bar x$$

(a2) System of equations: Since the parameter of interest η appears in the first-order population moment of X, the first equation is enough to apply the method:

$$\mu_1(\eta) = m_1(x_1, x_2, \ldots, x_n) \quad\rightarrow\quad \kappa\,\eta = \frac{1}{n}\sum_{j=1}^{n} x_j = \bar x \quad\rightarrow\quad \eta = \frac{\bar x}{\kappa}$$

(a3) The estimator:

$$\hat\eta_M = \frac{\bar X}{\kappa}$$
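Anticipating section (c) (an addition to the original text), the method-of-moments estimate can already be evaluated in R with the data given there, κ = 10 and x = (4, 4, 3, 5, 6):

x = c(4, 4, 3, 5, 6)
kappa = 10
mean(x) / kappa   # method-of-moments estimate of eta: 0.44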

(b) Maximum likelihood method
(b1) Likelihood function: For the binomial distribution the mass function is

$$f(x; \kappa, \eta) = \binom{\kappa}{x}\,\eta^{x}\,(1-\eta)^{\kappa-x}$$

We are interested only in η, so

$$L(x_1, x_2, \ldots, x_n; \eta) = \prod_{j=1}^{n} f(x_j; \eta) = \prod_{j=1}^{n} \binom{\kappa}{x_j}\,\eta^{x_j}(1-\eta)^{\kappa-x_j} = \left[\prod_{j=1}^{n} \binom{\kappa}{x_j}\right] \eta^{\sum_{j=1}^{n} x_j}\,(1-\eta)^{n\kappa - \sum_{j=1}^{n} x_j}$$

(b2) Optimization problem: The logarithm function is applied to facilitate the calculations,

$$\log L(x_1, x_2, \ldots, x_n; \eta) = \log\left[\prod_{j=1}^{n} \binom{\kappa}{x_j}\right] + \left(\sum_{j=1}^{n} x_j\right)\log(\eta) + \left(n\kappa - \sum_{j=1}^{n} x_j\right)\log(1-\eta)$$

To discover the local or relative extreme values, the necessary condition is

$$0 = \frac{d}{d\eta}\log L(x_1, x_2, \ldots, x_n; \eta) = 0 + \left(\sum_{j=1}^{n} x_j\right)\frac{1}{\eta} - \left(n\kappa - \sum_{j=1}^{n} x_j\right)\frac{1}{1-\eta}$$

$$\rightarrow\; \frac{n\kappa - \sum_{j=1}^{n} x_j}{1-\eta} = \frac{\sum_{j=1}^{n} x_j}{\eta} \;\rightarrow\; \eta\, n\kappa - \eta\sum_{j=1}^{n} x_j = \sum_{j=1}^{n} x_j - \eta\sum_{j=1}^{n} x_j \;\rightarrow\; \eta\, n\kappa = \sum_{j=1}^{n} x_j \;\rightarrow\; \eta_0 = \frac{\sum_{j=1}^{n} x_j}{n\kappa} = \frac{1}{\kappa}\,\frac{1}{n}\sum_{j=1}^{n} x_j = \frac{\bar x}{\kappa}$$

To verify that the only candidate is a local or relative maximum, the sufficient condition is

$$\frac{d^2}{d\eta^2}\log L(x_1, x_2, \ldots, x_n; \eta) = -\frac{\sum_{j=1}^{n} x_j}{\eta^2} - \frac{n\kappa - \sum_{j=1}^{n} x_j}{(1-\eta)^2} < 0$$

which holds since 0 ≤ xj ≤ κ makes both terms nonpositive (and not both zero for nondegenerate samples). Hence η0 is indeed a maximum.

(b3) The estimator:

$$\hat\eta_{ML} = \frac{\bar X}{\kappa}$$

Both methods provide the same estimator.

Note (from the end of Exercise 6pe-m): […] for x > δ (and zero elsewhere) is termed the two-parameter exponential distribution. It is a translation of size δ of the usual exponential distribution. A particular, simple case is obtained for θ = 1 and δ = 0, since f(x) = e⁻ˣ, x > 0.

My notes:

Exercise 7pe-m
A random quantity X is supposed to follow a distribution whose probability function is, for θ > 0,

$$f(x; \theta) = \begin{cases} \dfrac{3x^2}{\theta^3} & \text{if } 0 \le x \le \theta \\[4pt] 0 & \text{otherwise} \end{cases}$$

A) Apply the method of the moments to find an estimator of the parameter θ.
B) Apply the maximum likelihood method to find an estimator of the parameter θ.
C) Use the estimators obtained to build estimators of the mean μ and the variance σ².
Hint: Use that E(X) = 3θ/4 and Var(X) = 3θ²/80.

Discussion: This statement is mathematical. The random variable X is supposed to be dimensionless. The probability function and the first two moments are given, which is enough to apply the two methods. In the last step, the plug-in principle will be applied.

Note: If E(X) had not been given in the statement, it could have been calculated by integrating:

$$E(X) = \int_{-\infty}^{+\infty} x\, f(x;\theta)\, dx = \int_{0}^{\theta} x\,\frac{3x^2}{\theta^3}\, dx = \frac{3}{\theta^3}\left[\frac{x^4}{4}\right]_0^{\theta} = \frac{3}{4}\,\theta$$

On the other hand, if Var(X) had not been given in the statement, it could have been calculated by using a property and integrating:

$$E(X^2) = \int_{-\infty}^{+\infty} x^2\, f(x;\theta)\, dx = \int_{0}^{\theta} x^2\,\frac{3x^2}{\theta^3}\, dx = \frac{3}{\theta^3}\left[\frac{x^5}{5}\right]_0^{\theta} = \frac{3}{5}\,\theta^2$$

Now,

$$\mu = E(X) = \frac{3}{4}\theta \qquad\text{and}\qquad \sigma^2 = Var(X) = E(X^2) - E(X)^2 = \frac{3}{5}\theta^2 - \left(\frac{3}{4}\theta\right)^2 = \left(\frac{3}{5} - \frac{9}{16}\right)\theta^2 = \frac{3}{80}\,\theta^2$$

A) Method of the moments
a1) Population and sample moments: There is only one parameter, so one equation suffices. The first-order moments of the model X and the sample x are, respectively,

$$\mu_1(\theta) = E(X) = \frac{3}{4}\theta \qquad\text{and}\qquad m_1(x_1, x_2, \ldots, x_n) = \frac{1}{n}\sum_{j=1}^{n} x_j = \bar x$$

a2) System of equations: Since the parameter of interest θ appears in the first-order moment of X, the first equation suffices:

$$\mu_1(\theta) = m_1(x_1, x_2, \ldots, x_n) \quad\rightarrow\quad \frac{3}{4}\theta = \frac{1}{n}\sum_{j=1}^{n} x_j = \bar x \quad\rightarrow\quad \theta_0 = \frac{4}{3}\bar x$$

a3) The estimator:

$$\hat\theta_M = \frac{4}{3}\bar X$$

B) Maximum likelihood method
b1) Likelihood function: For this probability distribution, the density function is f(x; θ) = 3x²/θ³, so

$$L(x_1, x_2, \ldots, x_n; \theta) = \prod_{j=1}^{n} f(x_j; \theta) = \prod_{j=1}^{n} \frac{3x_j^2}{\theta^3} = \frac{3^n}{\theta^{3n}} \prod_{j=1}^{n} x_j^2$$

b2) Optimization problem: The logarithm function is applied to make calculations easier,

$$\log L(x_1, x_2, \ldots, x_n; \theta) = n\log(3) - 3n\log(\theta) + \log\left(\prod_{j=1}^{n} x_j^2\right)$$

Now, if we try to find the maximum by looking at the first-order derivatives, a useless equation is obtained:

$$0 = \frac{d}{d\theta}\log L(x_1, x_2, \ldots, x_n; \theta) = -3n\,\frac{1}{\theta} \quad\rightarrow\quad ?$$

Then, we realize that global minima and maxima cannot in general be found through the derivatives (only if they are also local). It is easy to see that the function L monotonically increases when θ decreases (this pattern, or just the opposite, tends to happen when the probability function changes monotonically with the parameter, e.g. when the parameter appears only once in the expression). As a consequence, L has no local extreme values. On the other hand, the density is nonzero only when 0 ≤ xj ≤ θ for all j, so the likelihood is positive only for θ ≥ maxj{xj}; hence the smallest admissible value maximizes L:

$$\theta_0 = \max_j \{x_j\}$$

b3) The estimator:

$$\hat\theta_{ML} = \max_j \{X_j\}$$

C) Estimation of μ and σ²
c1) For the mean: By using the hint and the plug-in principle,
● From the method of the moments: $\hat\mu_M = \frac{3}{4}\hat\theta_M = \frac{3}{4}\cdot\frac{4}{3}\bar X = \bar X$.
● From the maximum likelihood method: $\hat\mu_{ML} = \frac{3}{4}\hat\theta_{ML} = \frac{3}{4}\max_j\{X_j\}$.
c2) For the variance: By using that principle again,
● From the method of the moments: $\hat\sigma^2_M = \frac{3}{80}\hat\theta_M^2 = \frac{3}{80}\left(\frac{4}{3}\bar X\right)^2 = \frac{1}{15}(\bar X)^2$.
● From the maximum likelihood method: $\hat\sigma^2_{ML} = \frac{3}{80}\hat\theta_{ML}^2 = \frac{3}{80}\left(\max_j\{X_j\}\right)^2$.
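As an illustration (an addition to the original text), the two estimators of θ can be compared by simulation. Since the distribution function is F(x) = (x/θ)³ on [0, θ], a draw of X can be generated by inversion as θ·U^(1/3) with U uniform on (0, 1); the true value θ = 2 and the sample size n = 50 are arbitrary choices:

set.seed(1)
theta = 2; n = 50
thetaM  = replicate(10000, { x = theta * runif(n)^(1/3); (4/3) * mean(x) })   # method of the moments
thetaML = replicate(10000, { x = theta * runif(n)^(1/3); max(x) })            # maximum likelihood
c(mean(thetaM), mean(thetaML))                            # the first is unbiased; the second slightly underestimates theta
c(mean((thetaM - theta)^2), mean((thetaML - theta)^2))    # estimated mean square errors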

Conclusion: For this model, the two methods provide different estimators. The quality of the estimators obtained should be studied. We have used the estimators of θ to obtain estimators of μ and σ².
My notes:

[PE] Properties of Estimators

Remark 4pe: As regards the sample sizes, we can talk about static situations, where we study the dependence of the concepts on the sizes or the possible relation between the sizes, say nX = c·nY. On the other hand, we can talk about dynamic situations, where the same dependences are studied asymptotically while the sample sizes are always increasing, say nX(k) = c(k)·nY(k), where k is the index of a sequence of statistical schemes with those sample sizes. (Statistically, we are interested in sequences with nondecreasing sample sizes; mathematically, all possible sequences should be taken into account.) The static and the dynamic situations are respectively represented in two figures (omitted here).

Remark 5pe: We do not usually use the definition of the mean square error but the result at the end of the following equalities:

MSE(θ̂) = E([θ̂−θ]²) = E([θ̂−E(θ̂) + E(θ̂)−θ]²) = E([θ̂−E(θ̂)]²) + [E(θ̂)−θ]² + 2[E(θ̂)−θ]·E[θ̂−E(θ̂)] = Var(θ̂) + b(θ̂)²

since E[θ̂−E(θ̂)] = 0.

Remark 6pe: To study the consistency in probability we have been taught a sufficient—but not necessary—condition that is equivalent to the consistency in mean of order two (managing the definition is quite complex). Thus, this type of consistency is proved when the condition is fulfilled, which is sufficient—but not necessary—for the consistency in probability. By using Chebyshev's inequality:

P(|θ̂−θ| ≥ ε) ≤ E((θ̂−θ)²)/ε² = MSE(θ̂)/ε²  →  limₙ→∞ P(|θ̂−θ| ≥ ε) ≤ limₙ→∞ MSE(θ̂)/ε²

If the sufficient condition is not fulfilled, the estimator under study is not consistent in mean of order two, but it can still be consistent in probability—this type of consistency should then be studied in a different way. Additionally, since MSE(θ̂) = b(θ̂)² + Var(θ̂) and both terms are nonnegative, the mean square error is zero if and only if the other two are zero at the same time, and vice versa. The same happens for their limits. That is why we are allowed to split the limit of the mean square error into two limits.
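As a quick numerical illustration of the decomposition MSE = b² + Var (not part of the original text; the exponential population and the values θ = 2, n = 25 are arbitrary assumptions), in R:

# Monte Carlo check of MSE(theta.hat) = bias^2 + variance for the sample mean
set.seed(1)
theta = 2; n = 25; B = 100000
est = replicate(B, mean(rexp(n, rate = 1/theta)))   # E(X) = theta
mse  = mean((est - theta)^2)
bias = mean(est) - theta
c(MSE = mse, bias2.plus.var = bias^2 + var(est))    # approximately equal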

Exercise 1pe-p The efficiency (in lumens per watt, u) of light bulbs of a certain type has a population mean of 9.5u and standard deviation of 0.5u, according to production specifications. The specifications for a room in which eight of these bulbs (the simple random sample) are to be installed call for the average efficiency of the eight bulbs to exceed 10u. Find the probability that this specification for the room will be met, assuming that efficiency measurements are normally distributed. (From Mathematical Statistics with Applications, Mendenhall, W., D.D. Wackerly and R.L. Scheaffer, Duxbury Press.)

Discussion: The supposition that efficiency measurements follow the distribution N(μ = 9.5u, σ² = 0.5²u²) should be tested by applying an appropriate statistical technique. The event is defined in terms of X̄. We think about making the proper statistic appear, and hence being allowed to use its sampling distribution.

Identification of the variable and selection of the statistic: The variable is the efficiency of the light bulbs, while the estimator is the sample mean of eight elements. Since the population is normal and the two population parameters are known, we will consider the (dimensionless) statistic:

T(X; μ) = (X̄ − μ)/√(σ²/n) ∼ N(0, 1)

Rewriting the event: Although in this case the sampling distribution of X̄ is known, as X̄ ∼ N(μ, σ²/n), we need to standardize before consulting the table of the standard normal distribution:

P(X̄ > 10) = P( (X̄−μ)/√(σ²/n) > (10−μ)/√(σ²/n) ) = P( T > (10−9.5)/√(0.5²/8) ) = P( T > (0.5/0.5)·√8 ) = P(T > √8) = 0.0023

where in this case the language R has been used:

> 1 - pnorm(sqrt(8), 0, 1)
[1] 0.002338867

Conclusion: The production specifications will be met, for the room mentioned, with a probability of 0.0023, that is, they will hardly be met. My notes:
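The value 0.0023 can also be cross-checked by simulation (a sketch in the spirit of the other R snippets in this document; not part of the original solution):

# Proportion of samples of n = 8 N(9.5, 0.5^2) efficiencies whose mean exceeds 10u
set.seed(1)
mean(replicate(100000, mean(rnorm(8, mean = 9.5, sd = 0.5)) > 10))
# approximately 0.0023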

Exercise 2pe-p When a production process is working properly, the resistance of the components follows a normal distribution with standard deviation 4.68u. A simple random sample with four components is taken. What is the probability that the sample quasivariance will be bigger than 30u²?

Discussion: In this exercise, the supposition that the normal distribution reasonably explains the variable resistance should be evaluated by using proper statistical techniques. The question involves S². Again, it is necessary to make the proper statistic appear, in order to use its sampling distribution.

Identification of the variable: R ≡ Resistance (of one component)

R ∼ N(μ, σ² = 4.68²u²)

Sample and statistic: R₁, R₂, R₃, R₄ (The resistance of four components is measured.) → n = 4

S² = (1/(4−1))∑ⱼ₌₁⁴ (Rⱼ − R̄)²   (sample quasivariance)

Search for a known distribution: The quantity required is P(S² > 30). To calculate the probability of an event, we need to know the distribution of the random quantity involved. In this case, we do not know the sampling distribution of S², but since R follows a normal distribution we are allowed to use

T = (n−1)S²/σ² ∼ χ²ₙ₋₁

Then, by completing the inequality with the necessary constants (until making T appear):

P(S² > 30) = P( (n−1)S²/σ² > (n−1)·30/σ² ) = P( T > (4−1)·30/4.68² ) = P(T > 4.11)

where T ∼ χ²₃. Multiplying and dividing by positive quantities has not changed the inequality.

Table of the χ² distribution: Since n−1 = 4−1 = 3, it is necessary to look at the third row. The probabilities in the table are given for events of the form P(T < x) (or P(T ≤ x), as the distribution is continuous), and therefore the complementary of the event must be considered:

P(T > 4.11) = 1 − P(T ≤ 4.11) = 1 − 0.75 = 0.25

Conclusion: The probability of the event is 0.25. This means that S2 will sometimes take a value larger than 30u2, when evaluated at specific data x coming from the mentioned distribution. My notes:
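In R, the same probability can be computed directly (a sketch consistent with the R snippets used elsewhere in this document):

# P(S^2 > 30) for n = 4 normal observations with sigma = 4.68
1 - pchisq((4 - 1) * 30 / 4.68^2, df = 3)
# approximately 0.25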

Exercise 3pe-p A simple random sample of 270 homes was taken from a large population of older homes to estimate the proportion of homes with unsafe wiring. If, in fact, 20% of homes have unsafe wiring, what is the probability that the sample proportion will be between 16% and 24%? Hint: Since probabilities and proportions are measured in a 0-to-1 scale, write all quantities in this scale. (From Statistics for Business and Economics, Newbold, P., W.L. Carlson and B.M. Thorne, Pearson.)

LINGUISTIC NOTE (From: The Careful Writer: A Modern Guide to English Usage. Bernstein, T.M. Atheneum) home, house. It is a tribute to the unquenchable sentimentalism of users of English that one of the matters of usage that seem to agitate them the most is the use of home to designate a structure designed for residential purposes. Their contention is that what the builder erects is a house and that the occupants then fashion it into a home. That is, or at least was, basically true, but the distinction has become blurred. Nor is this solely the doing of the real estate operators. They do, indeed, lure prospective buyers not with the thought of mere masonry but with glowing picture of comfort, congeniality, and family collectivity that make a house into a home. But the prospective buyers are their co-conspirators; they, too, view the premises not as a heap of stone and wood but as a potential abode. There may be areas in which the words are not used interchangeably. In legal or quasi-legal terminology we speak of a “house and lot,” not a “home and lot.” The police and fire departments usually speak of a robbery or a fire in a house, not a home, at Main Street and First Avenue. And the individual most often buys a home, but sells his house (there, apparently, speaks sentiment again). But in most areas the distinction between the words has become obfuscated. When a flood or a fire destroys a community, it wipes out not merely houses but homes as well, and homes has come to be accepted in this sense. No one would discourage the sentimentalists from trying to pry the two words apart, but it would be rash to predict much success for them.

Discussion: The information of this “real-world study” must be translated into the mathematical language. Since there are two possible situations, each home can be “modeled” by using a Bernoulli variable. Although given in a 0-to-100 scale, the population and sample proportions—always in a 0-to-1 scale—are involved. The dimensionless character of a proportion is due to its definition. Note that if the data (x1,...,xn) are taken and we have access to them, there is nothing random any longer. The lack of knowledge, as if we had to select n elements to build (X1,...,Xn), justifies the use of Probability Theory.

Identification of the variable and selection of the statistic: The variable having unsafe wiring can take two possible values: 0 (not having unsafe wiring) and 1 (having it, if one wants to register or count this fact). The theoretical proportion of older homes with unsafe wiring is known: η = 0.20 (20%). For this framework—a large sample from a Bernoulli population with parameter η—we select the dimensionless, asymptotic statistic:

T(X; η) = (η̂ − η)/√(?(1−?)/n) →ᵈ N(0, 1)

where ? is substituted by the best information available about the parameter: η or η̂. Here we know η.

Rewriting the event: We are asked for the probability P(0.16 < η̂ < 0.24), but to calculate it we need to rewrite the event until making T appear:

P(0.16 < η̂ < 0.24) = P( (0.16−0.20)/√(0.2·0.8/270) < (η̂−η)/√(η(1−η)/n) < (0.24−0.20)/√(0.2·0.8/270) ) = P(−1.64 < T < 1.64) = P(T ≤ 1.64) − P(T ≤ −1.64) = 0.9495 − (1 − 0.9495) = 0.8990

Conclusion: The sample proportion will fall between 16% and 24% with a probability of about 0.90. My notes:
Exercise 4pe-p Let X and Y be independent populations with X ∼ N(μX = 1, σX² = 1) and Y ∼ N(μY = 2, σY² = 0.5), from which simple random samples of sizes nX = 11 and nY = 6 are taken. Find: (1) the probability P(SY² ≤ 1.5); (2) the value c such that P(X̄ > c) = 0.25; (3) the probability P(X̄ − 0.1 > 0.1 + Ȳ); (4) the value c such that P(SX²/SY² ≤ c) = 0.9. (Advanced Item) The probability P(X̄ − 0.1 > 0.1 − Ȳ).

Discussion: There are two independent normal populations whose parameters are known. The variances, not the standard deviations, are given. It is required to calculate probabilities or find quantiles for events involving the sample means and the sample quasivariances. In the first two sections, only one of the populations is involved. Sample sizes are 11 and 6, respectively. The variables X and Y are dimensionless, and so are both sides of the inequalities.

(1) The event involves the estimator SY², which reminds us of the statistic T = (nY−1)SY²/σY² ∼ χ²ₙᵧ₋₁. Then,

P(SY² ≤ 1.5) = P( (nY−1)SY²/σY² ≤ (nY−1)·1.5/σY² ) = P( T ≤ (6−1)·1.5/0.5 ) = P(T ≤ 15) = 0.99

(2) The event involves X̄, so we think about the statistic T = (X̄−μX)/√(σX²/nX) ∼ N(0, 1). Then,

0.25 = P(X̄ > c) = P( (X̄−μX)/√(σX²/nX) > (c−μX)/√(σX²/nX) ) = P( T > (c−1)/√(1/11) )

or, equivalently,

1 − 0.25 = 0.75 = P( T ≤ (c−1)/√(1/11) )

Now, the quantile found in the table of the standard normal distribution must verify

r₀.₂₅ = l₀.₇₅ = 0.674 = (c−1)/√(1/11)  →  c = 0.674·√(1/11) + 1 = 1.20

(3) To work with the means of two populations, we use

T = [(X̄−Ȳ) − (μX−μY)] / √(σX²/nX + σY²/nY) ∼ N(0, 1)

so

P(X̄ − 0.1 > 0.1 + Ȳ) = P(X̄ − Ȳ > 0.2) = P( T > [0.2 − (μX−μY)]/√(σX²/nX + σY²/nY) ) = P( T > (0.2 − (1−2))/√(1/11 + 0.5/6) ) = P( T > 1.2/√(1/11 + 1/12) ) = P(T > 2.87) = 1 − P(T ≤ 2.87) = 1 − 0.9979 = 0.0021

(4) To work with the variances of two populations, T = (SX²/SY²)(σY²/σX²) ∼ F(nX−1, nY−1) is used:

0.9 = P( SX²/SY² ≤ c ) = P( (σY²/σX²)(SX²/SY²) ≤ c·σY²/σX² ) = P( T ≤ c·0.5/1 ) = P(T ≤ c/2)

The quantile found in the table of the distribution F(nX−1, nY−1) = F(11−1, 6−1) = F(10, 5) is 3.30, which allows us to find the unknown c:

r₀.₁ = l₀.₉ = 3.30 = c/2  →  c = 6.60

> qf(0.9, 10, 5)
[1] 3.297402

(Advanced Item) In this case, allocating the two sample means in the first side of the inequality leads to

P(X̄ − 0.1 > 0.1 − Ȳ) = P(X̄ + Ȳ > 0.2)

We remember that

X̄ ∼ N(μX, σX²/nX)   and   Ȳ ∼ N(μY, σY²/nY)

so the rules that govern the sums—and hence subtractions—of normally distributed variables imply both

X̄ − Ȳ ∼ N(μX − μY, σX²/nX + σY²/nY)   and   X̄ + Ȳ ∼ N(μX + μY, σX²/nX + σY²/nY)

(Note that in both cases the variances are added—uncertainty increases.) Although the difference is used more frequently, to compare two populations, the sampling distribution of the sum of the sample means is also known thanks to the rules for normal variables; alternatively, we could still use the first result by writing X̄ + Ȳ = X̄ − (−Ȳ) and using that −Ȳ has mean −μY and variance σY²/nY. Either way, after standardizing:

T = [(X̄+Ȳ) − (μX+μY)] / √(σX²/nX + σY²/nY) ∼ N(0, 1)

This is the “mathematical tool” necessary to work with X̄ + Ȳ. Now,

P(X̄ − 0.1 > 0.1 − Ȳ) = P(X̄ + Ȳ > 0.2) = P( T > [0.2 − (μX+μY)]/√(σX²/nX + σY²/nY) ) = P( T > (0.2 − (1+2))/√(1/11 + 1/12) ) = P(T > −6.71) = 1 − P(T ≤ −6.71) = 1

The quantile 6.71 is not usually in the tables of the N(0,1), so we can consider that P(T ≤ −6.71) ≈ 0. Or, if we use the programming language R:

> 1 - pnorm(-6.71, 0, 1)
[1] 1

Conclusion: For each case, we have selected the appropriate statistic. After completing the expression of the event, the statistic T appears. Then, since the (sampling) distribution of T is known, the tables can be used to calculate probabilities or to find quantiles. In the latter case, the unknown c is found after the quantile of T.

My notes:
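All four items can be cross-checked in R (a sketch in the spirit of the other R snippets in this document; not part of the original solution):

# (1) P(S_Y^2 <= 1.5) with n_Y = 6, sigma_Y^2 = 0.5
pchisq((6 - 1) * 1.5 / 0.5, df = 5)                  # ~0.99
# (2) c with P(Xbar > c) = 0.25, Xbar ~ N(1, 1/11)
1 + qnorm(0.75) * sqrt(1 / 11)                       # ~1.20
# (3) P(Xbar - Ybar > 0.2), difference ~ N(-1, 1/11 + 0.5/6)
1 - pnorm((0.2 - (1 - 2)) / sqrt(1/11 + 0.5/6))      # ~0.0021
# (4) c with P(S_X^2 / S_Y^2 <= c) = 0.9
qf(0.9, 10, 5) / 0.5                                 # ~6.60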

Exercise 5pe-p Suppose that you manage a bank where the amounts of daily deposits and daily withdrawals are given by independent random variables with normal distributions. For deposits, the mean is ₤12,000 and the standard deviation is ₤4,000; for withdrawals, the mean is ₤10,000 and the standard deviation is ₤5,000. (a) For a week, calculate or bind the probability that the five withdrawals will add up to more than ₤55,000. (b) For a particular day, calculate or bind the probability that withdrawals will exceed deposits by more than ₤5,000. Imagine that you are to launch a new monthly product. A prospective study indicated that profits (in million pounds) can be modeled through the random quantity Q = (X+1)/2.325, where X follows a t distribution with twenty degrees of freedom. (c) For a particular month, calculate or bind the probability that profits will be smaller than ₤10⁶ (one million pounds). (Based on an exercise of Business Statistics, Douglas Downing and Jeffrey Clark, Barron's.)

Discussion: There are several suppositions implicit in the statement, namely: (i) the normal distribution can reasonably be used to model the two variables of interest D and W; (ii) withdrawals and deposits are independent; and (iii) X can reasonably be modeled by using the t distribution. These suppositions should firstly be evaluated by using proper statistical techniques. To solve this exercise, the rules on sums and differences of normally distributed variables must be used.

Identification of variables and distributions: If D and W represent the random variables daily sum of deposits and daily sum of withdrawals, respectively, from the statement we have that

D ∼ N(μD = ₤12,000, σD² = ₤²4,000²)   and   W ∼ N(μW = ₤10,000, σW² = ₤²5,000²)

(a) Since the variables are measured daily, in a week we have five measurements (one for each working day).

Translation into the mathematical language: We are asked for the probability

P(W₁ + W₂ + W₃ + W₄ + W₅ > 55,000) = P( ∑ⱼ₌₁⁵ Wⱼ > 55,000 )

Search for a known distribution: To calculate or bind this probability, we need to know the distribution of the sum or, alternatively, to relate it to any quantity whose distribution we know. By using the rules that govern the sums and subtractions of normal variables,

∑ⱼ₌₁⁵ Wⱼ ∼ N(5μW, 5σW²)

Rewriting the event: We can easily rewrite the event in terms of the standardized version of this normal distribution:

P( ∑ⱼ₌₁⁵ Wⱼ > 55,000 ) = P( (∑ⱼ₌₁⁵ Wⱼ − 5μW)/√(5σW²) > (55,000 − 5μW)/√(5σW²) ) = P( Z > (55,000 − 50,000)/√(5·5,000²) ) = P(Z > 0.4472)

Consulting the table: Finally, it is enough to consult the table of the standard normal distribution Z. On the one hand, in the table we are given values for the quantiles 0.44 and 0.45, so we could round the value 0.4472 to the closest 0.45 or, more exactly, we can bind the probability. On the other hand, our table provides lower-tail probabilities, so we will consider the complementary of some events. It is easy to deduce that

P(Z > 0.44) > P(Z > 0.4472) > P(Z > 0.45)
1 − P(Z ≤ 0.44) > P(Z > 0.4472) > 1 − P(Z ≤ 0.45)
1 − 0.6700 > P(Z > 0.4472) > 1 − 0.6736
0.3300 > P(Z > 0.4472) > 0.3264

Then,

0.3264 < P( ∑ⱼ₌₁⁵ Wⱼ > 55,000 ) < 0.3300

Note: It is also possible to relate the total sum to the sample mean,

P( ∑ⱼ₌₁⁵ Wⱼ > 55,000 ) = P( (1/5)∑ⱼ₌₁⁵ Wⱼ > 55,000/5 ) = P(W̄ > 11,000)

and use that

W̄ = (1/5)∑ⱼ₌₁⁵ Wⱼ ∼ N(μW, σW²/5)  →  (W̄ − μW)/√(σW²/5) ∼ N(0, 1)

(b) Translation into the mathematical language: We are asked for the probability P(W > D + 5,000).

Search for a known distribution: To calculate or bind this probability, we rewrite the event until all random quantities are on the left side of the inequality: P(W > D + 5,000) = P(W − D > 5,000). Now we need to know the distribution of W − D or, alternatively, of a quantity involving this difference. By again using the rules that govern the sums and differences of normal variables, it holds that

W − D ∼ N(μW − μD, σW² + σD²) = N(₤10,000 − ₤12,000, ₤²5,000² + ₤²4,000²)

Rewriting the event: We can easily express the event in terms of the standardized version of W − D:

P(W − D > 5,000) = P( [(W−D) − (μW−μD)]/√(σW²+σD²) > [5,000 − (μW−μD)]/√(σW²+σD²) ) = P( Z > [5,000 − (−2,000)]/√(25·10⁶ + 16·10⁶) ) = P( Z > 7·10³/√(41·10⁶) ) = P(Z > 1.0932)

Consulting the table: We can bind the probability as follows:

P(Z > 1.0900) > P(Z > 1.0932) > P(Z > 1.1000)
1 − P(Z ≤ 1.0900) > P(Z > 1.0932) > 1 − P(Z ≤ 1.1000)
1 − 0.8621 > P(Z > 1.0932) > 1 − 0.8643
0.1379 > P(Z > 1.0932) > 0.1357

Then,

0.1357 < P(W > D + 5,000) < 0.1379
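Both bounds can be cross-checked exactly in R (a sketch in the style of the other R snippets; not part of the original solution):

# (a) P(sum of five withdrawals > 55,000); the sum is N(50,000, 5 * 5,000^2)
1 - pnorm((55000 - 50000) / sqrt(5 * 5000^2))        # ~0.327, inside (0.3264, 0.3300)
# (b) P(W - D > 5,000); W - D is N(-2,000, 5,000^2 + 4,000^2)
1 - pnorm((5000 - (-2000)) / sqrt(5000^2 + 4000^2))  # ~0.137, inside (0.1357, 0.1379)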

(c) Translation into the mathematical language: Profits are smaller than ₤10⁶ when Q·10⁶ = [(X+1)/2.325]·10⁶ < 10⁶, that is, when X < 2.325 − 1 = 1.325.

Search for a known distribution: X itself follows a t distribution with twenty degrees of freedom, so no rewriting is needed. The table of this distribution shows that 1.325 is its quantile of order 0.90, and hence

P(Q·10⁶ < 10⁶) = P(X < 1.325) = 0.90

Conclusion: The weekly withdrawals will add up to more than ₤55,000 with probability about 0.33; daily withdrawals will exceed deposits by more than ₤5,000 with probability about 0.14; and monthly profits will be smaller than one million pounds with probability 0.90. My notes:

(There is a similar figure for any other θ.) This plot is not necessary for the following sections.

(b) Study the consistency (in probability) of X̄ as an estimator of θ

We apply the sufficient condition, the consistency in mean of order two:

limₙ→∞ MSE(θ̂) = 0  ↔  limₙ→∞ b(θ̂) = 0 and limₙ→∞ Var(θ̂) = 0

(b1) Bias: By applying a property of the sample mean and the information of the statement,

E(X̄) = E(X) = θ − 1/2  →  b(X̄) = E(X̄) − θ = θ − 1/2 − θ = −1/2  →  limₙ→∞ b(X̄) = limₙ→∞ (−1/2) = −1/2

(It is asymptotically biased.) Since one condition of the pair is not verified, it is not necessary to check the other, and neither the fulfillment of the consistency in probability nor the opposite can be proved using this way (though the estimator is not consistent in the mean-square sense).

(c) Study the efficiency of X̄ as an estimator of θ

The definition of efficiency consists of two conditions: unbiasedness and minimum variance (the latter is checked by comparing the variance and the Cramér-Rao bound).

(c1) Unbiasedness: In the previous section it has been proved that X̄ is a biased estimator of θ. The first condition does not hold, and hence it is not necessary to check the second one. The conclusion is that X̄ is not an efficient estimator of θ.

(d) An unbiased estimator of θ and its consistency

In (b) we found that b(X̄) = −1/2, which suggests correcting the previous estimator by adding 1/2, that is: θ̂ = X̄ + 1/2. To study its consistency (in probability), we apply the sufficient condition mentioned in section (b) (the consistency in mean of order two).

(d1) Bias: By applying a property of the sample mean and the information of the statement,

E(θ̂) = E(X̄) + 1/2 = θ − 1/2 + 1/2 = θ  →  b(θ̂) = E(θ̂) − θ = 0  →  limₙ→∞ b(θ̂) = limₙ→∞ 0 = 0

(d2) Variance: By applying a property of the sample mean and the information of the statement,

Var(θ̂) = Var(X̄ + 1/2) = Var(X̄) = Var(X)/n = 3/(4n)  →  limₙ→∞ Var(θ̂) = limₙ→∞ 3/(4n) = 0

As a conclusion, the mean square error (MSE) tends to zero and hence the proposed estimator θ̂ = X̄ + 1/2 is a consistent—in mean square error and hence in probability—estimator of θ.

Conclusion: We could prove neither the consistency nor the efficiency. Nevertheless, the bias has allowed us to build an unbiased, consistent estimator of the parameter. The efficiency of this new estimator could be studied, but it is not required in the statement. My notes:
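The behaviour of θ̂ = X̄ + 1/2 can be watched numerically (a sketch, not from the original text; only E(X) = θ − 1/2 and Var(X) = 3/4 are given, so the normal population and the value θ = 3 are arbitrary assumptions):

# Simulated MSE of Xbar + 1/2 shrinks like 3/(4n) as n grows
set.seed(1)
theta = 3
for (n in c(10, 100, 1000)) {
  est = replicate(20000, mean(rnorm(n, theta - 1/2, sqrt(3/4))) + 1/2)
  cat("n =", n, " MSE =", mean((est - theta)^2), "\n")   # compare with 3/(4*n)
}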

Exercise 11pe-p A population random quantity X is supposed to follow a geometric distribution. Let X = (X1,...,Xn) be a simple random sample. By applying the factorization theorem, find a sufficient statistic T(X) = T(X1,...,Xn) for the parameter. Give explanations.

Discussion: The factorization theorem can be applied both to prove that a given statistic is sufficient and to find sufficient statistics. On the other hand, for the distribution involved we know that f(x; η) = η(1−η)^(x−1), x = 1, 2, ...

Likelihood function:

L(X; η) = ∏ⱼ₌₁ⁿ f(Xⱼ; η) = f(X₁; η)·f(X₂; η)⋯f(Xₙ; η) = η(1−η)^(X₁−1)·η(1−η)^(X₂−1)⋯η(1−η)^(Xₙ−1) = ηⁿ(1−η)^(X₁−1+X₂−1+⋯+Xₙ−1) = ηⁿ(1−η)^((∑ⱼ₌₁ⁿ Xⱼ)−n)

Theorem: We must try allocating each term of the likelihood function:

➔ ηⁿ depends only on the parameter, not on the Xⱼ. Then, it would be part of g.

➔ (1−η)^((∑ⱼ₌₁ⁿ Xⱼ)−n) depends on both the parameter and the data Xⱼ, and these two kinds of information neither are mixed nor can mathematically be separated. Then, it would be part of g, and the only possible sufficient statistic, if the theorem holds, is T = ∑ⱼ₌₁ⁿ Xⱼ.

By considering g(T(X); η) = ηⁿ(1−η)^(−n)(1−η)^(∑ⱼ₌₁ⁿ Xⱼ) and h(X) = 1, the theorem holds and hence the statistic T(X) = ∑ⱼ₌₁ⁿ Xⱼ is sufficient for studying η. The idea behind this kind of statistics is that they “summarize the important information (about the parameter)” contained in the sample. In fact, the statistic T has essentially the same information as any one-to-one transformation of it, particularly the sample mean: T(X) = ∑ⱼ₌₁ⁿ Xⱼ = nX̄.

Conclusion: The factorization theorem has been used to find a sufficient statistic (for the parameter). Since the total sum appears, we complete the expression to write the result in terms of the sample mean. Both statistics contain the same information about the parameter of the distribution. My notes:
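Sufficiency can be illustrated numerically: two samples with the same total have identical likelihoods, whatever η. A minimal R sketch (not from the original text; note that dgeom counts failures, hence the shift x − 1):

# Two geometric samples with the same total give the same likelihood
lik = function(x, eta) prod(dgeom(x - 1, eta))  # f(x; eta) = eta (1-eta)^(x-1)
x1 = c(1, 2, 6); x2 = c(3, 3, 3)                # same sum: 9
c(lik(x1, 0.4), lik(x2, 0.4))                   # equal values
c(lik(x1, 0.7), lik(x2, 0.7))                   # equal values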

Exercise 12pe-p (*) For population variables X and Y, simple random samples of size nX and nY are taken. Calculate the mean square error of the following estimators, possibly by using proper statistics (involving them) whose sampling distribution is known.

(A) For any populations: X̄ and X̄ − Ȳ
(B) For Bernoulli populations: η̂ and η̂X − η̂Y
(C) For normal populations: V², VX²/VY², s², sX²/sY², S² and SX²/SY²

Suppose that the two populations are independent. Study the consistency in mean of order two and then the consistency in probability.

Discussion: In this exercise, the most important estimators are involved. The basic properties of the expectation and the variance allow us to calculate the mean square error. In most cases, the estimators will be completed for a proper quantity (with known sampling distribution) to appear, whose properties are then used. Although the estimators of the third section can be used for any X and Y, the calculations for normally distributed variables are easier due to the use of additional information—the knowledge about statistics and their sampling distribution. Thus, the results of this section are based on the normality of the variables X and Y. (Some of the quantities are also valid for any variables.)

The mean square errors are found for static situations, but the idea of limit involves dynamic situations. Statistically speaking, we want to study the behaviour of the estimators when the number of data increases—we can imagine a sequence of schemes where more and more data are added to the samples, that is, with the sample sizes always increasing. (From the mathematical point of view, limits must be studied for any possible way in which the sample sizes tend to infinite.)

Fortunately, the limits of the two-variable functions—sequences, really—that appear in this exercise can easily be solved either by decomposing them into two limits of one-variable functions or by binding the two-variable sequences. That the limits are studied when nX and nY tend to infinite facilitates the calculations (e.g. a constant like −2 is negligible when it appears in a factor).

(A) For any populations

(a1) For the sample mean X̄. It holds that

E(X̄) = E((1/n)∑ⱼ₌₁ⁿ Xⱼ) = (1/n)∑ⱼ₌₁ⁿ E(Xⱼ) = (1/n)·n·E(X) = E(X) = μ
Var(X̄) = Var((1/n)∑ⱼ₌₁ⁿ Xⱼ) = (1/n²)∑ⱼ₌₁ⁿ Var(Xⱼ) = (1/n²)·n·Var(X) = Var(X)/n = σ²/n
MSE(X̄) = [E(X̄) − μ]² + Var(X̄) = 0² + σ²/n = σ²/n

Then,
• The estimator X̄ is unbiased for μ, whatever the sample size.
• The estimator X̄ is consistent (in mean of order two and therefore in probability) for μ, since limₙ→∞ MSE(X̄) = limₙ→∞ σ²/n = 0. It is sufficient and necessary the sample size tending to infinite—see the mathematical appendix.

(a2) For the difference between the sample means X̄ − Ȳ. By using the previous results,

E(X̄ − Ȳ) = E(X̄) − E(Ȳ) = μX − μY
Var(X̄ − Ȳ) = Var(X̄) + Var(Ȳ) = σX²/nX + σY²/nY
MSE(X̄ − Ȳ) = [E(X̄ − Ȳ) − (μX − μY)]² + Var(X̄ − Ȳ) = σX²/nX + σY²/nY

The mean square error of X̄ − Ȳ is the sum of the mean square errors of X̄ and Ȳ. On the other hand,
• The estimator X̄ − Ȳ is unbiased for μX − μY, whatever the sample sizes.
• The estimator X̄ − Ȳ is consistent (in the mean-square sense and hence in probability) for μX − μY, as lim MSE(X̄ − Ȳ) = lim (σX²/nX + σY²/nY) = 0 when nX → ∞ and nY → ∞. It is sufficient and necessary the two sample sizes tending to infinite—see the mathematical appendix.

(B) For Bernoulli populations

(b1) For the sample proportion η̂. Since η̂ is a particular case of the sample mean,

E(η̂) = μ = η
Var(η̂) = σ²/n = η(1−η)/n
MSE(η̂) = [E(η̂) − η]² + Var(η̂) = η(1−η)/n

Then,
• The estimator η̂ is unbiased whatever the sample size.
• It is consistent for η, being sufficient and necessary the sample size tending to infinite.

(b2) For the difference between the sample proportions η̂X − η̂Y. Again, this is a particular case of a difference between sample means:

E(η̂X − η̂Y) = μX − μY = ηX − ηY
Var(η̂X − η̂Y) = σX²/nX + σY²/nY = ηX(1−ηX)/nX + ηY(1−ηY)/nY
MSE(η̂X − η̂Y) = ηX(1−ηX)/nX + ηY(1−ηY)/nY

Then,
• The estimator η̂X − η̂Y is unbiased for ηX − ηY, whatever the sample sizes.
• It is also consistent for ηX − ηY, being sufficient and necessary the two sample sizes tending to infinite.

(C) For normal populations

(c1) For the variance of the sample V². By using T = nV²/σ² ∼ χ²ₙ and the properties of the chi-square distribution,

E(V²) = (σ²/n)E(nV²/σ²) = (σ²/n)·n = σ²
Var(V²) = (σ⁴/n²)Var(nV²/σ²) = (σ⁴/n²)·2n = 2σ⁴/n
MSE(V²) = [E(V²) − σ²]² + Var(V²) = 2σ⁴/n

Then,
• The estimator V² is unbiased for σ², whatever the sample size.
• The estimator V² is consistent (in mean of order two and therefore in probability) for σ², since limₙ→∞ MSE(V²) = limₙ→∞ 2σ⁴/n = 0. It is sufficient and necessary the sample size tending to infinite—see the mathematical appendix.

In another exercise, this estimator is compared with the other two estimators of the variance. (For the expectation, it is easy to find in the literature direct calculations that lead to the same value for any variables—not necessarily normal.)

(c2) For the quotient between the variances of the samples VX²/VY². By using T = (VX²/VY²)(σY²/σX²) ∼ F(nX, nY) and the properties of the F distribution,

E(VX²/VY²) = (σX²/σY²)E((VX²/VY²)(σY²/σX²)) = (σX²/σY²)·nY/(nY−2)   (nY > 2)
Var(VX²/VY²) = (σX⁴/σY⁴)Var((VX²/VY²)(σY²/σX²)) = (σX⁴/σY⁴)·2nY²(nX+nY−2)/[nX(nY−2)²(nY−4)]   (nY > 4)
MSE(VX²/VY²) = { [nY/(nY−2) − 1]² + 2nY²(nX+nY−2)/[nX(nY−2)²(nY−4)] }·σX⁴/σY⁴   (nY > 4)

Then,
• The estimator VX²/VY² is biased for σX²/σY², but it is asymptotically unbiased since

lim E(VX²/VY²) = (σX²/σY²)·lim nY/(nY−2) = (σX²/σY²)·lim 1/(1 − 2/nY) = σX²/σY²

Mathematically, only nY must tend to infinite. Statistically, since populations can be named and allocated in either order, it is deduced that both sample sizes must tend to infinite. In fact, it is sufficient and necessary the two sample sizes tending to infinite—see the mathematical appendix.
• The estimator VX²/VY² is consistent (in mean of order two and therefore in probability) for σX²/σY², since it is asymptotically unbiased and, multiplying and dividing by nX·nY³,

lim Var(VX²/VY²) = (σX⁴/σY⁴)·lim 2nY²(nX+nY−2)/[nX(nY−2)²(nY−4)] = (σX⁴/σY⁴)·lim 2(1/nX + 1/nY − 2/(nXnY)) / [(1−2/nY)²(1−4/nY)] = 0

The numerator tends to zero if and only if so do both sample sizes. In short, it is sufficient and necessary the two sample sizes tending to infinite—this limit has been studied in the mathematical appendix.

In another exercise, this estimator is compared with the other two estimators of the quotient of variances.

(c3) For the sample variance s². By using T = ns²/σ² ∼ χ²ₙ₋₁ and the properties of the chi-square distribution,

E(s²) = (σ²/n)E(ns²/σ²) = (σ²/n)(n−1) = ((n−1)/n)σ²
Var(s²) = (σ⁴/n²)Var(ns²/σ²) = (σ⁴/n²)·2(n−1) = (2(n−1)/n²)σ⁴
MSE(s²) = [E(s²) − σ²]² + Var(s²) = [((n−1)/n)σ² − σ²]² + (2(n−1)/n²)σ⁴ = (2/n − 1/n²)σ⁴

Then,
• The estimator s² is biased but asymptotically unbiased (for σ²), since limₙ→∞ E(s²) = σ²·limₙ→∞ (1 − 1/n) = σ². It is sufficient and necessary the sample size tending to infinite—see the mathematical appendix.
• The estimator s² is consistent (in mean of order two and therefore in probability) for σ², since limₙ→∞ MSE(s²) = limₙ→∞ (2/n − 1/n²)σ⁴ = 0. It is sufficient and necessary the sample size tending to infinite—see the mathematical appendix.

In another exercise, this estimator is compared with the other two estimators of the variance. (For the expectation, it is easy to find in the literature direct calculations that lead to the same value for any variables—not necessarily normal.)

(c4) For the quotient between the sample variances sX²/sY². By using T = [nX(nY−1)sX²σY²]/[nY(nX−1)sY²σX²] ∼ F(nX−1, nY−1) and the properties of the F distribution,

E(sX²/sY²) = [nY(nX−1)/(nX(nY−1))]·(σX²/σY²)·(nY−1)/((nY−1)−2) = [nY(nX−1)/(nX(nY−3))]·σX²/σY²   (nY−1 > 2)
Var(sX²/sY²) = [nY²(nX−1)²/(nX²(nY−1)²)]·(σX⁴/σY⁴)·2(nY−1)²(nX−1+nY−1−2)/[(nX−1)(nY−1−2)²(nY−1−4)] = 2nY²(nX−1)(nX+nY−4)/[nX²(nY−3)²(nY−5)]·σX⁴/σY⁴   (nY−1 > 4)
MSE(sX²/sY²) = { [nY(nX−1)/(nX(nY−3)) − 1]² + 2nY²(nX−1)(nX+nY−4)/[nX²(nY−3)²(nY−5)] }·σX⁴/σY⁴   (nY−1 > 4)

Then,
• The estimator sX²/sY² is biased for σX²/σY², but it is asymptotically unbiased since

lim E(sX²/sY²) = lim [nY(nX−1)/(nX(nY−3))]·σX²/σY² = lim [(1 − 1/nX)/(1 − 3/nY)]·σX²/σY² = σX²/σY²

It is sufficient and necessary the two sample sizes tending to infinite—see the mathematical appendix.
• The estimator sX²/sY² is consistent (in mean of order two and therefore in probability) for σX²/σY², as it is asymptotically unbiased and, multiplying and dividing by nX²·nY³,

lim Var(sX²/sY²) = (σX⁴/σY⁴)·lim 2(1 − 1/nX)(1/nX + 1/nY − 4/(nXnY)) / [(1−3/nY)²(1−5/nY)] = 0

It is sufficient and necessary the two sample sizes tending to infinite—see the mathematical appendix.

In another exercise, this estimator is compared with the other two estimators of the quotient of variances.

(c5) For the sample quasivariance S². By using T = (n−1)S²/σ² ∼ χ²ₙ₋₁ and the properties of the chi-square distribution,

E(S²) = (σ²/(n−1))E((n−1)S²/σ²) = (σ²/(n−1))(n−1) = σ²
Var(S²) = (σ⁴/(n−1)²)Var((n−1)S²/σ²) = (σ⁴/(n−1)²)·2(n−1) = 2σ⁴/(n−1)
MSE(S²) = [E(S²) − σ²]² + Var(S²) = 2σ⁴/(n−1)

Then,
• The estimator S² is unbiased for σ², whatever the sample size.
• The estimator S² is consistent (in mean of order two and therefore in probability) for σ², since limₙ→∞ MSE(S²) = limₙ→∞ 2σ⁴/(n−1) = 0. It is sufficient and necessary the sample size tending to infinite—see the mathematical appendix.

In another exercise, this estimator is compared with the other two estimators of the variance. (For the expectation, it is easy to find in the literature direct calculations that lead to the same value for any variables—not necessarily normal.)

(c6) For the quotient between the sample quasivariances SX²/SY². By using T = (SX²/SY²)(σY²/σX²) ∼ F(nX−1, nY−1) and the properties of the F distribution,

E(SX²/SY²) = (σX²/σY²)·(nY−1)/((nY−1)−2) = ((nY−1)/(nY−3))·σX²/σY²   (nY−1 > 2)
Var(SX²/SY²) = (σX⁴/σY⁴)·2(nY−1)²(nX−1+nY−1−2)/[(nX−1)(nY−1−2)²(nY−1−4)] = 2(nY−1)²(nX+nY−4)/[(nX−1)(nY−3)²(nY−5)]·σX⁴/σY⁴   (nY−1 > 4)
MSE(SX²/SY²) = { [(nY−1)/(nY−3) − 1]² + 2(nY−1)²(nX+nY−4)/[(nX−1)(nY−3)²(nY−5)] }·σX⁴/σY⁴   (nY−1 > 4)

Then,
• The estimator SX²/SY² is biased for σX²/σY², but it is asymptotically unbiased since

lim E(SX²/SY²) = lim ((nY−1)/(nY−3))·σX²/σY² = lim [(1 − 1/nY)/(1 − 3/nY)]·σX²/σY² = σX²/σY²

Mathematically, only nY must tend to infinite. Statistically, since populations can be named and allocated in either order, it is deduced that both sample sizes must tend to infinite. In fact, it is sufficient and necessary the two sample sizes tending to infinite—see the mathematical appendix.
• The estimator SX²/SY² is consistent (in mean of order two and therefore in probability) for σX²/σY², as it is asymptotically unbiased and, multiplying and dividing by nX·nY³,

lim Var(SX²/SY²) = (σX⁴/σY⁴)·lim 2(1 − 1/nY)²(1/nX + 1/nY − 4/(nXnY)) / [(1−1/nX)(1−3/nY)²(1−5/nY)] = 0

It is sufficient and necessary the two sample sizes tending to infinite—see the mathematical appendix.

In another exercise, this estimator is compared with the other two estimators of the quotient of variances.

Conclusion: For the most important estimators, the mean square error has been calculated either directly (in a few cases) or by making a proper statistic appear. The consistencies in mean square error of order two and in probability have been proved. Some limits for functions of two variables arose. These kinds of limit are not trivial in general, as there is an infinite amount of ways for the sizes to tend to infinite. Nevertheless, those appearing here could be calculated directly or after doing some simple algebraic transformation (multiplying and dividing by the proper quantity, as they were limits of sequences of the indeterminate form infinite-over-infinite). On the other hand, it is worth noticing that there are in general several matters to be considered in selecting among different estimators of the same quantity: (a) The error can be measured by using a quantity different to the mean square error. (b) For large sample sizes, the differences provided by the formulas above may be negligible. (c) The computational or manual effort in calculating the quantities must also be taken into account—not all of them require the same number of operations. (d) We may have some quantities already available. My notes:
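The three one-sample formulas can be verified by Monte Carlo (a sketch, not from the original text; with σ = 1 and n = 10 the predicted MSEs are 2/n, 2/n − 1/n² and 2/(n−1)):

# Simulated vs. theoretical MSE of V^2, s^2 and S^2 for N(0, 1), n = 10
set.seed(1)
n = 10; B = 200000
V2 = s2 = S2 = numeric(B)
for (b in 1:B) {
  x = rnorm(n)                      # mu = 0 known, sigma^2 = 1
  V2[b] = mean((x - 0)^2)           # uses the known mean
  s2[b] = mean((x - mean(x))^2)     # sample variance
  S2[b] = var(x)                    # sample quasivariance (denominator n - 1)
}
rbind(simulated   = c(mean((V2 - 1)^2), mean((s2 - 1)^2), mean((S2 - 1)^2)),
      theoretical = c(2/n, 2/n - 1/n^2, 2/(n - 1)))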

Exercise 13pe-p (*) In the following situations, compare the mean square error of the following estimators when simple random samples, taken from normal populations, are considered:

(A) V², s² and S²
(B) VX²/VY², sX²/sY² and SX²/SY² (consider only the case nX = n = nY)

In the second section, suppose that the populations are independent.

Discussion: The expressions of the mean square error of these estimators have been calculated in another exercise. Comparing the coefficients is easy in some cases, but sequences may sometimes cross one another and the comparisons must be done analytically—by solving equalities and inequalities—or graphically. We plot the sequences (lines between dots are used to facilitate the identification). The mean square errors were found for static situations, but the idea of limit involves dynamic situations. By using a computer, it is also possible to study—either analytically or graphically—the asymptotic behaviour of the estimators (but it is not a whole mathematical proof). It is worth noticing that the formulas and results of this exercise are valid for normal populations (because of the theoretical results on which they are based); in the general case, the expressions for the mean square error of these estimators are more complex. For two populations, there is an infinite amount of mathematical ways for the two sample sizes to tend to infinite; the case nX = n = nY will be considered.

(A) For V², s² and S²

The expressions of their mean square error are:

MSE(V²) = (2/n)σ⁴   MSE(s²) = (2/n − 1/n²)σ⁴   MSE(S²) = (2/(n−1))σ⁴

Since σ⁴ appears in all these positive quantities, by looking at the coefficients it is easy to see that, for n larger than two,

MSE(s²) < MSE(V²) < MSE(S²)

That is, the sequences—indexed by n—do not cross one another. We can plot the coefficients (they are also the mean square errors when σ = 1):

# Grid of values for 'n'
n = seq(from=2, to=10, by=1)
# The three sequences of coefficients
coeff1 = 2/n
coeff2 = 2/n - 1/(n^2)
coeff3 = 2/(n-1)
# The plot
allValues = c(coeff1, coeff2, coeff3)
yLim = c(min(allValues), max(allValues)); x11(); par(mfcol=c(1,4))
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='Coefficients 1', type='l')
plot(n, coeff2, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='Coefficients 2', type='b')
plot(n, coeff3, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='Coefficients 3', type='b')
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='All coefficients', type='l')
points(n, coeff2, type='b')
points(n, coeff3, type='b')

This code generates an array of four figures (omitted here). Asymptotically, the three estimators behave similarly, since 2/n − 1/n² ≈ 2/n ≈ 2/(n−1).

(B) For VX²/VY², sX²/sY² and SX²/SY²

The expressions of their mean square error, when nX = n = nY, are:

MSE(VX²/VY²) = { [n/(n−2) − 1]² + 4n(n−1)/[(n−2)²(n−4)] }·σX⁴/σY⁴   (n > 4)
MSE(sX²/sY²) = { [(n−1)/(n−3) − 1]² + 4(n−1)(n−2)/[(n−3)²(n−5)] }·σX⁴/σY⁴   (n−1 > 4)
MSE(SX²/SY²) = { [(n−1)/(n−3) − 1]² + 4(n−1)(n−2)/[(n−3)²(n−5)] }·σX⁴/σY⁴   (n−1 > 4)

For equal sample sizes, the mean square error of the last two estimators is the same (but they may behave differently under criteria other than the mean square error, e.g. even their expectation). We can plot the coefficients (they are also the mean square errors when σX = σY), for n > 5.

# Grid of values for 'n'
n = seq(from=6, to=15, by=1)
# The three sequences of coefficients
coeff1 = ((n/(n-2))-1)^2 + (4*n*(n-1))/(((n-2)^2)*(n-4))
coeff2 = (((n-1)/(n-3))-1)^2 + (4*(n-1)*(n-2))/(((n-3)^2)*(n-5))
coeff3 = coeff2
# The plot
allValues = c(coeff1, coeff2, coeff3)
yLim = c(min(allValues), max(allValues)); x11(); par(mfcol=c(1,4))
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='Coefficients 1', type='l')
plot(n, coeff2, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='Coefficients 2', type='b')
plot(n, coeff3, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='Coefficients 3', type='b')
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='All coefficients', type='l')
points(n, coeff2, type='b')
points(n, coeff3, type='b')

This code generates an array of four figures (omitted here), which shows that, for normal populations and samples of sizes nX = n = nY, it seems that

MSE(VX²/VY²) ≤? MSE(sX²/sY²) = MSE(SX²/SY²)

and the sequences do not cross one another. Really, a figure is not a mathematical proof, so we do the following calculations:

[n/(n−2) − 1]² + 4n(n−1)/[(n−2)²(n−4)]  ≤?  [(n−1)/(n−3) − 1]² + 4(n−1)(n−2)/[(n−3)²(n−5)]
[4(n−4) + 4n(n−1)]/[(n−2)²(n−4)]  ≤?  [4(n−5) + 4(n−1)(n−2)]/[(n−3)²(n−5)]
(n−2)(n+2)/[(n−2)²(n−4)]  ≤?  (n−3)(n+1)/[(n−3)²(n−5)]
(n+2)(n−3)(n−5)  ≤?  (n+1)(n−2)(n−4)
n³ − 6n² − n + 30  ≤?  n³ − 5n² + 2n + 8
22  ≤?  n(n+3)

This inequality is true for n ≥ 4, since it is true for n = 4 and the second side increases with n. Thus, we can guarantee that, for n > 5,

MSE(VX²/VY²) ≤ MSE(sX²/sY²) = MSE(SX²/SY²)

Asymptotically, by using equivalent infinites (nY − 2 ≈ nY, nX + nY − 2 ≈ nX + nY, and so on),

lim MSE(VX²/VY²) = lim { [nY/(nY−2) − 1]² + 2nY²(nX+nY−2)/[nX(nY−2)²(nY−4)] }·σX⁴/σY⁴ = lim [ (2/nY)² + 2(nX+nY)/(nXnY) ]·σX⁴/σY⁴ = lim 2(nX+nY)/(nXnY)·σX⁴/σY⁴ = 0

and, in the same way,

lim MSE(sX²/sY²) = lim 2(nX+nY)/(nXnY)·σX⁴/σY⁴ = 0   and   lim MSE(SX²/SY²) = lim 2(nX+nY)/(nXnY)·σX⁴/σY⁴ = 0

The three estimators behave similarly, since the quantitative behaviour of their mean square errors is characterized by the same limit, namely lim 2(nX+nY)/(nXnY)·σX⁴/σY⁴ = 0 when nX → ∞ and nY → ∞. (It is worth noticing that this asymptotic behaviour arises when the limits are solved by using infinites—it cannot be seen when the limits are solved in other ways.)

Conclusion: The expressions of the mean square error of these estimators allow us to compare them, to study their consistency and even their rate of convergence. We have proved the following result:

Proposition
(1) For a normal population, MSE(s²) < MSE(V²) < MSE(S²).
(2) For two independent normal populations, when nX = n = nY,

MSE(VX²/VY²) ≤ MSE(sX²/sY²) = MSE(SX²/SY²)

Note: For one population, V² has higher error than s², even if the information about the value of the population mean μ is used by the former while it is estimated in the other two estimators. For two populations, the information about the value of the two population means μX and μY is used in the first quotient while they must be estimated in the other two estimators. Either way, the population mean in itself does not play an important role in studying the variance, which is based on relative distances, but any estimation using the same data reduces the amount of information available and the degrees of freedom by one unit. Again, it is worth noticing that there are in general several matters to be considered in selecting among different estimators of the same quantity: (a) The error can be measured by using a quantity different to the mean square error. (b) For large sample sizes, the differences provided by the formulas above may be negligible. (c) The computational or manual effort in calculating the quantities must also be taken into account—not all of them require the same number of operations. (d) We may have some quantities already available. My notes:
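The two-population comparison in the proposition can also be checked by simulation (a sketch, not from the original text; N(0,1) populations and nX = nY = 10 are arbitrary assumptions, so the true ratio is 1):

# Simulated MSEs of the three variance-ratio estimators
set.seed(1)
n = 10; B = 200000
r1 = r2 = r3 = numeric(B)
for (b in 1:B) {
  x = rnorm(n); y = rnorm(n)
  r1[b] = mean(x^2) / mean(y^2)                          # V_X^2 / V_Y^2 (known means)
  r2[b] = mean((x - mean(x))^2) / mean((y - mean(y))^2)  # s_X^2 / s_Y^2
  r3[b] = var(x) / var(y)                                # S_X^2 / S_Y^2
}
c(V = mean((r1 - 1)^2), s = mean((r2 - 1)^2), S = mean((r3 - 1)^2))
# Note: with n_X = n_Y the last two estimators are numerically identical,
# and the first one shows the smallest simulated mean square error.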

Exercise 14pe-p (*) For population variables X and Y, simple random samples of size n X and nY are taken. Calculate the mean square error of the following estimators (use results of previous exercises). 1 ^ ) ( η^ + η 2 X Y

(A) For two independent Bernoulli populations: (B) For two independent normal populations: 1 2 1 2 2 2 (V +V Y ) (s + s ) 2 X 2 X Y where nX η ̂ X + nY η̂ Y n X V 2X +nY V 2Y 2 η̂ p= V p= n X + nY n X +nY

η^ p

1 2 2 (S + S ) 2 X Y 2

2

2

Vp 2

n X s X + nY s Y s = n X + nY 2 p

2

sp

Sp 2

2

(n X −1) S X +( nY −1)S Y S = n X +nY −2 2 p

(Similarly for Y.) Try to compare the mean square errors. Study the consistency in mean of order two and then the consistency in probability.

Discussion: The expressions of the mean square error of the basic estimators involved in this exercise has been calculated in another exercise, and they will be used in calculating the mean square errors of the new estimators. The errors are calculated for static situations, but limits are studied in dynamic situations

Comparing the coefficients is easy in some cases, but sequences can sometimes cross one another and the comparisons must be done analitically—by solving equalities and inequalities—or graphically. By using a computer, it is also possible to study—either analytically or graphically—the behaviour of the estimators. The results obtained here are valid for two independent Bernoulli populations and two independent normal populations, respectively. On the other hand, we must find the expression of the error for the new estimators based on semisums:

[(

2

] (

1 1 1 MSE ( θ^1 + θ^2 ) = E ( θ^1 + θ^2) −θ +Var ( θ^1 + θ^2 ) 2 2 2 and, for unbiased estimators, 1 1 MSE ( θ^1 + θ^2 ) = 0+ [Var ( θ^1)+Var ( θ^2 )] 2 4

(

)

(

(A) For Bernoulli populations:

)

)

1 ^ ) ( η^ + η 2 X Y

(a1) For the semisum of the sample proportions

and η^ p

1 ^ ) ( η^ + η 2 X Y

By using previous results and that μ=η and σ2=η(1–η), E

( 12 ( η^ + η^ )) = 12 [ E( η^ )+ E( η^ )]= 12 (η +η )=η X

Y

X

Y

54

X

Y

Solved Exercises and Problems of Statistical Inference

)

) η (1−η ) 1 1 1 + ( 12 ( η^ + η^ )) == 12 [ Var ( η^ )+Var ( η^ )]= 14 ( η (1−η )= 4 ( n + n ) η(1−η) n n 1 1 1 η (1−η ) η (1−η ) 1 1 1 MSE ( ( η^ + η^ )) = [ (η +η )−μ ] + + )= 4 ( n + n ) η(1−η) 2 2 4( n n X

Var

X

Y

X

X

Y

Y

Y

X

Y

X

Y

2

X

X

Y

X

X

Y

Y

Y

X

Then, • •

Y

X

Y

1 ^ ) is unbiased for μ, whatever the sample sizes. ( η^ + η 2 X Y 1 ^ ) is consistent (in the mean-square sense and therefore in probability) for η. The estimator ( η^ + η 2 X Y 1 1 1 1 lim n →∞ MSE ( η^ X + η^ Y ) = lim n → ∞ + η(1−η) 2 4 n X nY n →∞ n →∞ The estimator

(

X

Y

)

[( )

X Y

]

It is sufficient and necessary that both sample sizes must tend to infinite—see the mathematical appendix. (a2) For the pooled sample proportion η^ p Firstly, we write η^ p=

1 ^ ). Now, by using previous results, (n η^ +n η n X + nY X X Y Y

E (η ^ p) =

n η +n η 1 [nX E (η ^ X )+n Y E ( η ^ Y )]= X X Y Y =η n X +n Y n X +n Y

Var ( η ^ p) =

n η (1−ηX )+ nY ηY (1−ηY ) 1 1 [n2X Var ( η^ X )+ nY2 Var ( η ^ Y )]= X X = η(1−η) 2 2 n (n X + nY ) (n X +nY ) X + nY

MSE( η^ p ) =

(

2 n X ηX +nY ηY n η (1−ηX )+nY ηY (1−ηY ) 1 −η + X X = η(1−η) 2 n X + nY n X + nY (n X +nY )

)

Then, • The estimator η^ p is unbiased for η, whatever the sample sizes. •

The estimator η^ p is consistent (in mean of order two and therefore in probability) for η, since η(1−η) lim n →∞ MSE ( η^ p ) = lim n →∞ =0 n X + nY n →∞ n →∞ X

X

Y

Y

If the mean square error is compared with those of the two populations, we can see that the new denominator is the sum of both sample sizes. Again, it is worth noticing that it is sufficient and necessary at least one sample size tending to infinite, but not both. In this case, the denominator tends to infinite. The interpretation of this fact is that, in estimating, one sample can do “the whole work.” (a3) Comparison of

1 ^ ) and η^ p ( η^ + η 2 X Y

Case nX = n = nY MSE

^ ) = MSE ( η ( 12 ( η^ + η^ )) = η(1−η) 2n X

Y

p

1 ^ + η^ ) in this case. In fact, by looking at the expressions of the estimators themselves, η^ p= ( η 2 X Y General case The expressions of their mean square error are (the sample proportion is unbiased): 55

Solved Exercises and Problems of Statistical Inference

MSE

(

1 1 1 1 ^ Y) = ( η^ X + η + η(1−η) 2 4 n X nY

) (

)

MSE( η^ p ) =

1 η(1−η) n X + nY

Then n +n 1 1 1 1 ↔ (n X +nY ) X Y ≤4 ↔ n 2X + n2Y + 2 n X n Y ≤4 n X nY ↔ (n X −nY )2≤0 + ≤ n X nY 4 n X nY n X + nY

(

(

)

)

Then, the pooled estimator is always better or equal than the semisum of the sample proportions. Both estimators have the same mean square error—their behaviour may be different under other criteria different to the mean square error—only when nX=nY. Besides, Thus, (nX–nY)2 can be seen as a measure of the convenience of using the pooled sample proportion, since it shows how different the two errors are. The inequality also shows a symmetric situation, in the sense that it does not matter which sample size is bigger: the measure depends on the difference. We have proved the following result: Proposition For two independent Bernoulli populations with the same parameter, the pooled sample proportion has smaller or equal mean square error than the semisum of the sample proportions. Besides, both are equivalent only when the sample sizes are equal. We can plot the coefficients (they are also the mean square errors when η(1– η)=1) for a sequence of sample sizes, indexed by k, such that nY(k)=2nX(k), for example (but this only one possible way for the sample sizes to tend to infinite): # Grid of values for 'n' c = 2 n = seq(from=2,to=10,by=1) # The sequences of coefficients coeff1 = (1 + 1/c)/(4*n) coeff2 = 1/((1+c)*n) # The plot allValues = c(coeff1, coeff2) yLim = c(min(allValues), max(allValues)); x11(); par(mfcol=c(1,3)) plot(n, coeff1, xlim=c(min(n),max(n)), ylim = yLim, xlab=' ', ylab=' ', main='Coefficients 1', type='l') plot(n, coeff2, xlim=c(min(n),max(n)), ylim = yLim, xlab=' ', ylab=' ', main='Coefficients 2', type='b') plot(n, coeff1, xlim=c(min(n),max(n)), ylim = yLim, xlab=' ', ylab=' ', main='All coefficients', type='l') points(n, coeff2, type='b')

This code generates the following array of figures:

The reader can repeat this figure by using values closer to and farther from 1 than c(k) = 2.

(B) For normal populations (b1) For the semisum of the variance of the samples

1 2 2 (V +V Y ) 2 X

By using previous results,

56

Solved Exercises and Problems of Statistical Inference

( 12 (V +V )) = 12 [ E (V )+ E (V ) ]= 12 (σ +σ )=σ 1 1 1 σ σ 1 1 1 Var ( (V +V ) ) = [ Var ( V ) +Var ( V ) ]= ( + )= ( + )σ 2 2 n n 2 n n 2 2 X

E

2 Y

2 X

MSE Then,

(

2 X

2 Y

2 Y

2 X

2 X

2

2 Y

2

4 X

4 Y

X

Y

2 Y

X

4

Y

4 4 2 1 2 1 2 1 σX σY 1 1 1 4 2 2 2 (V X +V Y ) = ( σ X +σ Y )−σ + + = + σ 2 2 2 nX nY 2 n X nY

] (

) [

) (

)

1 2 2 (V +V Y ) is unbiased for σ2, whatever the sample sizes. 2 X 1 2 2 The estimator (V X +V Y ) is consistent (in the mean-square sense and therefore in probability) for σ2 2 since, 1 1 1 4 lim n →∞ MSE ( η^ p ) = lim n →∞ + σ =0 2 nX nY n →∞ n →∞ The estimator

• •

X

X

Y

Y

(

)

It is sufficient and necessary that both sample sizes must tend to infinite—see the mathematical appendix. 1 2 2 (s + s ) 2 X Y

(b2) For the semisum of the sample variances By using previous results, E

(

1 2 2 1 1 n X −1 2 nY −1 2 1 n X −1 nY −1 2 ( s X + s Y ) = E ( s 2X ) + E ( s 2Y ) = σX + σY = + σ 2 2 2 nX nY 2 nX nY

Var

)

(

[

(

]

) (

)

] [

[

]

1 2 2 1 1 n X −1 4 nY −1 4 1 (n X −1) (nY −1) 4 (s X + sY ) = 2 Var ( s 2X ) +Var ( s 2Y ) = σ X + 2 σY = + σ 2 2 n2X 2 2 nY n2X n2Y

)

[

]

) [(

)

[(

2

2

] [

1 1 n X −1 2 n Y −1 2 1 n X −1 4 n Y −1 4 MSE (s2X + s 2Y ) = σX+ σY −σ 2 + σ X + 2 σY 2 2 nX nY 2 n2X nY

(

] [

]

]

1 n X −1 n Y −1 2 1 n X −1 n Y −1 4 = + σ −σ 2 + + 2 σ 2 nX nY 2 n 2X nY

{[

)

1 1 1 = − + 2 n X nY

(

[

n X nY2 −nY2 + n2X nY −n2X +2 4 n 2X n2Y

]

[

=

1 n X + nY ( n X −n Y ) 4 1 1 1 ( n X −n Y ) − σ= + − σ4 2 2 2 2 2 n X nY 2 n n 2 n X nY 2 n X nY X Y

2

2

2

2 Y

4n n

[

The estimator

2 X 2

] [

]} [

(n X + nY )2 n X n2Y −n 2Y + n2X nY −n2X 4 σ= +2 σ 4 n 2X nY2 4 n 2X n 2Y 4

=

2

2 n X n Y + 2 n X nY + 2 n X nY −n X −n Y

Then, •

)]

2

4

σ=

2

2 n X nY (n X + nY )−(n X −nY ) 2 X

4n n

] [

2

2 Y

]

σ4

]

1 2 2 ( s + s ) is biased but asymptotically unbiased for σ2, since 2 X Y 1 2 2 1 n X −1 nY −1 2 21 2 lim n →∞ E ( s X +s Y ) = σ lim n → ∞ + =σ (1+1 )=σ 2 2 nX nY 2 n →∞ n →∞ X

Y

57

(

)

X Y

(

)

Solved Exercises and Problems of Statistical Inference



It is sufficient and necessary the two sample sizes tending to infinite—see the mathematical appendix. 1 2 2 The estimator (s + s ) is consistent (in the mean-square sense and therefore in probability) for σ2, 2 X Y because it is asymptotically unbiased and 1 1 n X −1 nY −1 lim n →∞ Var (s 2X + s2Y ) = σ4 lim n → ∞ + 2 =0 2 2 2 n nY n →∞ n →∞ X

(

X

Y

)

(

X Y

)

Again, it is sufficient and necessary the two sample sizes tending to infinite—see the mathematical appendix. 1 2 2 (S + S ) 2 X Y

(b3) For the semisum of the sample quasivariances, (1/2)(S_X² + S_Y²)

By using previous results,

E( (1/2)(S_X² + S_Y²) ) = (1/2)[ E(S_X²) + E(S_Y²) ] = (1/2)(σ_X² + σ_Y²) = σ²

Var( (1/2)(S_X² + S_Y²) ) = (1/4)[ Var(S_X²) + Var(S_Y²) ] = (1/4)( 2σ_X⁴/(n_X−1) + 2σ_Y⁴/(n_Y−1) ) = (1/2)( 1/(n_X−1) + 1/(n_Y−1) )σ⁴

MSE( (1/2)(S_X² + S_Y²) ) = [ (1/2)(σ_X² + σ_Y²) − σ² ]² + (1/2)( 1/(n_X−1) + 1/(n_Y−1) )σ⁴ = (1/2)( 1/(n_X−1) + 1/(n_Y−1) )σ⁴

Then,
• The estimator (1/2)(S_X² + S_Y²) is unbiased for σ², whatever the sample sizes.
• The estimator (1/2)(S_X² + S_Y²) is consistent (in the mean-square sense and therefore in probability) for σ², since

lim_{n_X→∞, n_Y→∞} MSE( (1/2)(S_X² + S_Y²) ) = lim_{n_X→∞, n_Y→∞} (1/2)( 1/(n_X−1) + 1/(n_Y−1) )σ⁴ = 0

It is both sufficient and necessary that the two sample sizes tend to infinity—see the mathematical appendix.

(b4) For the pooled variance of the samples, V_p²

We can write V_p² = (n_X V_X² + n_Y V_Y²)/(n_X + n_Y) = (1/(n_X + n_Y))( n_X V_X² + n_Y V_Y² ). By using previous results,

E(V_p²) = [ n_X E(V_X²) + n_Y E(V_Y²) ]/(n_X + n_Y) = ( n_X σ_X² + n_Y σ_Y² )/(n_X + n_Y) = σ²

Var(V_p²) = [ n_X² Var(V_X²) + n_Y² Var(V_Y²) ]/(n_X + n_Y)² = ( 2 n_X σ_X⁴ + 2 n_Y σ_Y⁴ )/(n_X + n_Y)² = 2σ⁴/(n_X + n_Y)

MSE(V_p²) = [ ( n_X σ_X² + n_Y σ_Y² )/(n_X + n_Y) − σ² ]² + 2σ⁴/(n_X + n_Y) = 2σ⁴/(n_X + n_Y)

Then,
• The estimator V_p² is unbiased for σ², whatever the sample sizes.
• The estimator V_p² is consistent (in mean of order two and therefore in probability) for σ², since

lim_{n_X→∞, n_Y→∞} MSE(V_p²) = σ⁴ lim_{n_X→∞, n_Y→∞} 2/(n_X + n_Y) = 0

It is worth noticing that here the necessary and sufficient condition is that at least one sample size tends to infinity, not necessarily both: any of them makes the denominator tend to infinity. The interpretation of this fact is that, in estimating, one sample can do “the whole work.”

(b5) For the pooled sample variance, s_p²

We can write s_p² = (n_X s_X² + n_Y s_Y²)/(n_X + n_Y) = (1/(n_X + n_Y))( n_X s_X² + n_Y s_Y² ). By using previous results,

E(s_p²) = [ n_X E(s_X²) + n_Y E(s_Y²) ]/(n_X + n_Y) = [ (n_X−1)σ_X² + (n_Y−1)σ_Y² ]/(n_X + n_Y) = ( (n_X + n_Y − 2)/(n_X + n_Y) )σ²

Var(s_p²) = [ n_X² Var(s_X²) + n_Y² Var(s_Y²) ]/(n_X + n_Y)² = 2[ (n_X−1)σ_X⁴ + (n_Y−1)σ_Y⁴ ]/(n_X + n_Y)² = 2( (n_X + n_Y − 2)/(n_X + n_Y)² )σ⁴

MSE(s_p²) = [ ( (n_X + n_Y − 2)/(n_X + n_Y) )σ² − σ² ]² + 2( (n_X + n_Y − 2)/(n_X + n_Y)² )σ⁴ = [ 4 + 2(n_X + n_Y − 2) ]/(n_X + n_Y)² σ⁴ = 2σ⁴/(n_X + n_Y)

Then,
• The estimator s_p² is biased for σ², but asymptotically unbiased:

lim_{n_X→∞, n_Y→∞} E(s_p²) = lim_{n_X→∞, n_Y→∞} ( (n_X + n_Y − 2)/(n_X + n_Y) )σ² = σ²

(The calculation above for the mean suggests that a −2 in the denominator of the definition would provide an unbiased estimator—see the estimator in the following section.)
• The estimator s_p² is consistent (in mean of order two and therefore in probability) for σ², since

lim_{n_X→∞, n_Y→∞} MSE(s_p²) = σ⁴ lim_{n_X→∞, n_Y→∞} 2/(n_X + n_Y) = 0

It is worth noticing that, again, the necessary and sufficient condition is that at least one sample size tends to infinity, not necessarily both: any of them makes the denominator tend to infinity. The interpretation of this fact is that, in estimating, one sample can do “the whole work.”

(b6) For the (bias-corrected) pooled sample variance, S_p²

We can write S_p² = ((n_X−1)S_X² + (n_Y−1)S_Y²)/(n_X + n_Y − 2) = (1/(n_X + n_Y − 2))[ (n_X−1)S_X² + (n_Y−1)S_Y² ]. By using previous results,

E(S_p²) = [ (n_X−1)E(S_X²) + (n_Y−1)E(S_Y²) ]/(n_X + n_Y − 2) = [ (n_X−1)σ_X² + (n_Y−1)σ_Y² ]/(n_X + n_Y − 2) = σ²

Var(S_p²) = [ (n_X−1)² Var(S_X²) + (n_Y−1)² Var(S_Y²) ]/(n_X + n_Y − 2)² = 2[ (n_X−1)σ_X⁴ + (n_Y−1)σ_Y⁴ ]/(n_X + n_Y − 2)² = 2σ⁴/(n_X + n_Y − 2)

MSE(S_p²) = [ ( (n_X−1)σ_X² + (n_Y−1)σ_Y² )/(n_X + n_Y − 2) − σ² ]² + 2σ⁴/(n_X + n_Y − 2) = 2σ⁴/(n_X + n_Y − 2)

Then,
• The estimator S_p² is unbiased for σ², whatever the sample sizes.
• The estimator S_p² is consistent (in mean of order two and therefore in probability) for σ², since

lim_{n_X→∞, n_Y→∞} MSE(S_p²) = lim_{n_X→∞, n_Y→∞} 2σ⁴/(n_X + n_Y − 2) = 0

It is worth noticing that, once more, the necessary and sufficient condition is that at least one sample size tends to infinity, not necessarily both: any of them makes the denominator tend to infinity. The interpretation of this fact is that, in estimating, one sample can do “the whole work.”

(b7) Comparison of (1/2)(V_X² + V_Y²), (1/2)(s_X² + s_Y²), (1/2)(S_X² + S_Y²), V_p², s_p² and S_p²

Case n_X = n = n_Y

MSE( (1/2)(V_X² + V_Y²) ) = (1/2)(2/n)σ⁴ = (1/n)σ⁴
MSE( (1/2)(s_X² + s_Y²) ) = [ (1/2)(2/n) − 0 ]σ⁴ = (1/n)σ⁴
MSE( (1/2)(S_X² + S_Y²) ) = (1/2)( 2/(n−1) )σ⁴ = (1/(n−1))σ⁴
MSE(V_p²) = (2/(2n))σ⁴ = (1/n)σ⁴
MSE(s_p²) = (2/(2n))σ⁴ = (1/n)σ⁴
MSE(S_p²) = (2/(2n−2))σ⁴ = (1/(n−1))σ⁴

Since σ⁴ appears in all these positive quantities, by looking at the coefficients it is easy to see the relation

MSE( (1/2)(s_X² + s_Y²) ) = MSE( (1/2)(V_X² + V_Y²) ) = MSE(V_p²) = MSE(s_p²) < MSE(S_p²) = MSE( (1/2)(S_X² + S_Y²) )

(For individual estimators, the order MSE(s²) < MSE(V²) < MSE(S²) was obtained in another exercise.) This relation has been obtained for the case n_X = n = n_Y and (independent) normal populations. We can plot the coefficients (they are also the mean square errors when σ = 1).

# Grid of values for 'n'
n = seq(from=10, to=20, by=1)
# The six sequences of coefficients
coeff1 = 1/n
coeff2 = coeff1
coeff3 = 1/(n-1)
coeff4 = coeff1
coeff5 = coeff1
coeff6 = coeff3
# The plot
allValues = c(coeff1, coeff2, coeff3, coeff4, coeff5, coeff6)
yLim = c(min(allValues), max(allValues))
x11(); par(mfcol=c(1,7))
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='Coefficients 1', type='l')
plot(n, coeff2, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='Coefficients 2', type='l')
plot(n, coeff3, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='Coefficients 3', type='b')
plot(n, coeff4, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='Coefficients 4', type='l')
plot(n, coeff5, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='Coefficients 5', type='l')
plot(n, coeff6, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='Coefficients 6', type='b')
plot(n, coeff1, xlim=c(min(n),max(n)), ylim=yLim, xlab=' ', ylab=' ', main='All coefficients', type='l')
points(n, coeff2, type='l')
points(n, coeff3, type='b')
points(n, coeff4, type='l')
points(n, coeff5, type='l')
points(n, coeff6, type='b')

This code generates an array of figures: one panel for each sequence of coefficients and a last panel superimposing all of them.

By using this code, it is also possible to study—either analytically or graphically—the asymptotic behaviour of these estimators (but only with simulated data from some particular distributions for X, which would not be a “whole mathematical proof”). It is worth noticing that the formulas obtained in this exercise are valid for normal populations (because of the theoretical results on which they are based). In the general case, the expressions for the mean square error of these estimators are more complex.

General case

The expressions of their mean square errors are:

MSE( (1/2)(V_X² + V_Y²) ) = (1/2)( 1/n_X + 1/n_Y )σ⁴
MSE( (1/2)(s_X² + s_Y²) ) = [ (1/2)( 1/n_X + 1/n_Y ) − (n_X − n_Y)²/(4 n_X² n_Y²) ]σ⁴
MSE( (1/2)(S_X² + S_Y²) ) = (1/2)( 1/(n_X−1) + 1/(n_Y−1) )σ⁴
MSE(V_p²) = 2σ⁴/(n_X + n_Y)
MSE(s_p²) = 2σ⁴/(n_X + n_Y)
MSE(S_p²) = 2σ⁴/(n_X + n_Y − 2)

We have simplified the expressions as much as possible, and now a general comparison can be tackled by doing some pairwise comparisons. Firstly, by looking at the coefficients,

MSE( (1/2)(s_X² + s_Y²) ) ≤ MSE( (1/2)(V_X² + V_Y²) ) < MSE( (1/2)(S_X² + S_Y²) )

and the equality is reached only when n_X = n = n_Y. On the other hand,

MSE(V_p²) = MSE(s_p²) < MSE(S_p²)

Now, we would like to allocate V_p², s_p² and S_p² in the first chain. To compare V_p² and s_p² with (1/2)(V_X² + V_Y²),

2/(n_X + n_Y) ≤ (1/2)( 1/n_X + 1/n_Y ) ↔ 4 n_X n_Y ≤ (n_X + n_Y)² ↔ 4 n_X n_Y ≤ n_X² + n_Y² + 2 n_X n_Y ↔ 0 ≤ (n_X − n_Y)²

That is,

MSE(V_p²) = MSE(s_p²) ≤ MSE( (1/2)(V_X² + V_Y²) )

and the equality is attained only when n_X = n = n_Y. To compare S_p² with (1/2)(V_X² + V_Y²),

2/(n_X + n_Y − 2) ≤ (1/2)( 1/n_X + 1/n_Y ) ↔ 4 n_X n_Y ≤ (n_X + n_Y)(n_X + n_Y − 2) ↔ 2(n_X + n_Y) ≤ (n_X − n_Y)²

That is,

MSE(S_p²) ≤ MSE( (1/2)(V_X² + V_Y²) ) if 2(n_X + n_Y) ≤ (n_X − n_Y)²
MSE(S_p²) ≥ MSE( (1/2)(V_X² + V_Y²) ) if 2(n_X + n_Y) ≥ (n_X − n_Y)²

Intuitively, in the region around the bisector line the difference of the sample sizes is small, and therefore the pooled sample variance is worse; on the other hand, in the complementary region the square of the difference is bigger than twice the sum of the sizes, and, therefore, the pooled sample variance is better. The frontier seems to be parabolic. Some work can be done to find the frontier determined by the equality and the two regions on both sides—this is done in the mathematical appendix. Now, we write some “brute-force” lines for the computer to plot the points in the frontier:

N = 100
vectorNx = vector(mode="numeric", length=0)
vectorNy = vector(mode="numeric", length=0)
for (nx in 1:N) {
  for (ny in 1:N) {
    if (2*(nx+ny) == (nx-ny)^2) {
      vectorNx = c(vectorNx, nx); vectorNy = c(vectorNy, ny)
    }
  }
}
plot(vectorNx, vectorNy, xlim=c(0,N+1), ylim=c(0,N+1), xlab='nx', ylab='ny', main='Frontier of the region', type='p')

To compare S_p² with (1/2)(S_X² + S_Y²),

2/(n_X + n_Y − 2) ≤ (1/2)( 1/(n_X−1) + 1/(n_Y−1) ) ↔ 4(n_X−1)(n_Y−1) ≤ (n_X + n_Y − 2)²
↔ 4 n_X n_Y − 4 n_X − 4 n_Y + 4 ≤ n_X² + n_Y² + 2 n_X n_Y + 4 − 4 n_X − 4 n_Y ↔ 0 ≤ (n_X − n_Y)²

That is,

MSE(S_p²) ≤ MSE( (1/2)(S_X² + S_Y²) )

and the equality is attained only if the sample sizes are the same. We can summarize all the results of this section in the following statement:

Proposition. For two independent normal populations, when n_X = n = n_Y,

(a) MSE( (1/2)(s_X² + s_Y²) ) = MSE( (1/2)(V_X² + V_Y²) ) = MSE(V_p²) = MSE(s_p²) < MSE(S_p²) = MSE( (1/2)(S_X² + S_Y²) )

In the general case, when the sample sizes can be different,

(b) MSE( (1/2)(s_X² + s_Y²) ) ≤ MSE( (1/2)(V_X² + V_Y²) ) < MSE( (1/2)(S_X² + S_Y²) )
(c) MSE(V_p²) = MSE(s_p²) < MSE(S_p²)
(d) MSE(V_p²) = MSE(s_p²) ≤ MSE( (1/2)(V_X² + V_Y²) )
(e) MSE(S_p²) ≤ MSE( (1/2)(V_X² + V_Y²) ) if 2(n_X + n_Y) ≤ (n_X − n_Y)², and MSE(S_p²) ≥ MSE( (1/2)(V_X² + V_Y²) ) if 2(n_X + n_Y) ≥ (n_X − n_Y)²
(f) MSE(S_p²) ≤ MSE( (1/2)(S_X² + S_Y²) )

In (b), (d) and (f), the equality is attained when n_X = n = n_Y.

Note: I have tried to compare V_p², s_p² and S_p² with (1/2)(s_X² + s_Y²), but I have not managed to solve the inequalities. On the other hand, these relations show that, for two independent normal populations, there exist estimators with smaller mean square error than the pooled sample variance S_p². Nevertheless, there are other criteria different from the mean square error, and, additionally, the pooled sample variance has also some advantages (see the advanced theory at the end).
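The relations in the proposition can also be checked numerically. The following lines are a minimal R sketch of such a check (the values of μ, σ, the sample sizes and the number of replications are arbitrary choices for the illustration); the six estimated mean square errors can be compared with the theoretical expressions above:

# Monte Carlo check of the six mean square errors (illustrative values)
M <- 1e5; nx <- 10; ny <- 15; mu <- 0; sigma <- 1
set.seed(1)
X <- matrix(rnorm(M*nx, mu, sigma), nrow=M)
Y <- matrix(rnorm(M*ny, mu, sigma), nrow=M)
Vx <- rowMeans((X-mu)^2);          Vy <- rowMeans((Y-mu)^2)          # variances of the samples (known mean)
sx <- rowMeans((X-rowMeans(X))^2); sy <- rowMeans((Y-rowMeans(Y))^2) # sample variances
Sx <- nx*sx/(nx-1);                Sy <- ny*sy/(ny-1)                # sample quasivariances
mse <- function(est) mean((est - sigma^2)^2)
c(mse((Vx+Vy)/2), mse((sx+sy)/2), mse((Sx+Sy)/2),
  mse((nx*Vx+ny*Vy)/(nx+ny)), mse((nx*sx+ny*sy)/(nx+ny)),
  mse(((nx-1)*Sx+(ny-1)*Sy)/(nx+ny-2)))   # compare with the six formulas above

Setting nx equal to ny reproduces the equalities in part (a) of the proposition.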

Conclusion: For some pooled estimators, the mean square errors have been calculated either directly or by making a proper statistic appear. The consistency in mean square error of order two and in probability has been proved. By using theoretical expressions for the mean square error, the behaviour of the pooled estimators for the proportion (Bernoulli populations) and for the variance (normal populations) has been compared with “natural” estimators consisting of the semisum of the individual estimators for each population. Once more, it is worth noticing that there are in general several matters to be considered in selecting among different estimators of the same quantity: (a) The error can be measured by using a quantity different from the mean square error. (b) For large sample sizes, the differences provided by the formulas above may be negligible. (c) The computational or manual effort in calculating the quantities must also be taken into account—not all of them require the same number of operations. (d) We may have some quantities already available.

Advanced Theory: The previous estimators can be written as a sum ω_X θ̂_X + ω_Y θ̂_Y with weights ω = (ω_X, ω_Y) such that ω_X + ω_Y = 1. As regards the interpretation of the weights, they can be seen as a measure of the importance that each estimator is given in the global formula. For some weights that depend on the sample sizes, it is possible for one estimator to acquire all the importance when the sample sizes increase in the proper way. On the contrary, when the weights are constant the possible effect—positive or negative—due to each estimator is bounded. The errors were calculated when the data are representative of the population, but if the quality of one sample is always poor, the other sample cannot do the whole estimation if the weights do not depend on the sizes.

My notes:

[PE] Methods and Properties

Exercise 1pe
We have reliable information that suggests the probability distribution with density function

f(x; θ) = (2/θ²)(θ − x),  x ∈ [0, θ],

as a model for studying the population quantity X. Let X = (X₁,...,Xₙ) be a simple random sample.
(a) Apply the method of the moments to find an estimator θ̂_M of the parameter θ.
(b) Calculate the bias and the mean square error of the estimator θ̂_M.
(c) Study the consistency of θ̂_M.
(d) Try to apply the maximum likelihood method to find an estimator θ̂_ML of the parameter θ.
(e) Obtain estimators of the mean and the variance.
Hint: Use that μ = E(X) = θ/3 and E(X²) = θ²/6.

Discussion: This statement is mathematical. The assumptions are supposed to have been checked. We are given the density function of the distribution of X (a dimensionless quantity). The exercise involves two methods of estimation, the definition of the bias, the mean square error and the sufficient condition for the consistency (in probability). The two first population moments are provided.

Note: If E(X) and E(X²) had not been given in the statement, they could have been calculated by applying the definition and solving the integrals:

E(X) = ∫_{−∞}^{+∞} x f(x; θ) dx = ∫₀^θ x · 2(θ−x)/θ² dx = (2/θ²)( ∫₀^θ θx dx − ∫₀^θ x² dx ) = (2/θ²)( θ³/2 − θ³/3 ) = (2/θ²) θ³/6 = θ/3

E(X²) = ∫_{−∞}^{+∞} x² f(x; θ) dx = ∫₀^θ x² · 2(θ−x)/θ² dx = (2/θ²)( ∫₀^θ θx² dx − ∫₀^θ x³ dx ) = (2/θ²)( θ⁴/3 − θ⁴/4 ) = (2/θ²) θ⁴/12 = θ²/6

(a) Method of the moments

(a1) Population and sample moments: The distribution has only one parameter, so one equation suffices. By using the information in the hint:

μ₁(θ) = θ/3  and  m₁(x₁, x₂,..., xₙ) = (1/n) Σ_{j=1}^n x_j = x̄

(a2) System of equations:

μ₁(θ) = m₁(x₁, x₂,..., xₙ)  →  θ/3 = x̄  →  θ₀ = (3/n) Σ_{j=1}^n x_j = 3x̄

(a3) The estimator: It is obtained after substituting the lower case letters x_j by upper case letters X_j:

θ̂_M = (3/n) Σ_{j=1}^n X_j = 3X̄

(b) Bias and mean square error

(b1) Bias: To apply the definition b(θ̂_M) = E(θ̂_M) − θ we need to calculate the expectation:

E(θ̂_M) = E(3X̄) = 3E(X̄) = 3E(X) = 3(θ/3) = θ

where we have used the properties of the expectation, a property of the sample mean and the information in the statement. Now

b(θ̂_M) = E(θ̂_M) − θ = θ − θ = 0

and we can see that the estimator is unbiased (we could see it also from the calculation of the expectation).

(b2) Mean square error: We do not usually apply the definition MSE(θ̂_M) = E((θ̂_M − θ)²) but a property derived from it, for which we need to calculate the variance:

Var(θ̂_M) = Var(3X̄) = 3² Var(X)/n = (9/n)[ E(X²) − E(X)² ] = (9/n)( θ²/6 − θ²/9 ) = (9/n)(θ²/18) = θ²/(2n)

where we have used the properties of the variance, a property of the sample mean and the information in the statement. Then

MSE(θ̂_M) = b(θ̂_M)² + Var(θ̂_M) = 0² + θ²/(2n) = θ²/(2n)

(c) Consistency

We try applying the sufficient condition lim_{n→∞} MSE(θ̂) = 0 or, equivalently, lim_{n→∞} b(θ̂) = 0 and lim_{n→∞} Var(θ̂) = 0. Since

lim_{n→∞} MSE(θ̂_M) = lim_{n→∞} θ²/(2n) = 0

it is concluded that the estimator is consistent (in mean of order two and hence in probability) for estimating θ.
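These properties can also be illustrated by simulation. The following lines are a minimal R sketch (the values θ = 2, n = 50 and the number of replications M are arbitrary choices); sampling from f uses the inverse-transform method, since the distribution function is F(x) = 2x/θ − (x/θ)², whose inverse is F⁻¹(u) = θ(1 − √(1−u)):

theta <- 2; n <- 50; M <- 1e5    # illustrative parameter, sample size and replications
set.seed(1)
U <- matrix(runif(M*n), nrow=M)
X <- theta*(1 - sqrt(1-U))       # inverse-transform sampling from f(x; theta)
thetaM <- 3*rowMeans(X)          # method-of-moments estimator for each sample
mean(thetaM)                     # close to theta = 2 (unbiasedness)
mean((thetaM - theta)^2)         # close to theta^2/(2n) = 0.04 (mean square error)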

(d) Maximum likelihood method

(d1) Likelihood function: The density function is f(x; θ) = 2(θ−x)/θ² for 0 ≤ x ≤ θ, so

L(x₁, x₂,..., xₙ; θ) = ∏_{j=1}^n f(x_j; θ) = (2ⁿ/θ^{2n}) ∏_{j=1}^n (θ − x_j)

(d2) Optimization problem: First, we try to find the maximum by applying the technique based on the derivatives. The logarithm function is applied,

log[L(x₁, x₂,..., xₙ; θ)] = n log(2) − 2n log(θ) + Σ_{j=1}^n log(θ − x_j)

and the first-order condition leads to a useless equation:

0 = (d/dθ) log[L(x₁, x₂,..., xₙ; θ)] = −2n/θ + Σ_{j=1}^n 1/(θ − x_j)  →  ?

Then, we realize that global minima and maxima cannot always be found through the derivatives (only if they are also local extremes). In this case, it is difficult even to know whether L monotonically decreases with θ or not, since part of L increases and another decreases—which one changes more? We study the j-th element of the product, that is, f(x_j; θ). Its first derivative is

f'(x_j; θ) = 2[θ² − (θ − x_j)2θ]/θ⁴ = 2θ(2x_j − θ)/θ⁴

so it has an extreme at θ = 2x_j. This implies that L is the product of n terms, each having its extreme at a different point, so L does not change monotonically with the parameter θ.

(d3) The estimator: → ?

(e) Estimators of the mean and the variance

To obtain estimators of the mean, we take into account that μ = E(X) = θ/3 and apply the plug-in principle:

μ̂_M = θ̂_M/3 = 3X̄/3 = X̄    μ̂_ML = θ̂_ML/3 = ?

To obtain estimators of the variance, since σ² = Var(X) = θ²/6 − θ²/9 = θ²/18,

σ̂_M² = θ̂_M²/18 = (3X̄)²/18 = X̄²/2    σ̂_ML² = ?

Conclusion: The method of the moments is applied to obtain an estimator that is unbiased for any sample size n and has good behaviour for large n (many data). The maximum likelihood method cannot be applied, since it is difficult to optimize the likelihood function by considering either its expression or the behaviour of the density function.

My notes:

Exercise 2pe
Let X be a random variable following the Rayleigh distribution, whose probability function is

f(x; θ) = (x/θ²) e^{−x²/(2θ²)},  x ≥ 0  (θ > 0),

such that E(X) = θ√(π/2) and Var(X) = ((4−π)/2)θ². Let X = (X₁,...,Xₙ) be a simple random sample.




(a) Apply the method of the moments to find an estimator θ̂_M of the parameter θ.
(b) For θ̂_M, calculate the bias and the mean square error, and study the consistency.
(c) Apply the maximum likelihood method to find an estimator θ̂_ML of the parameter θ.

CULTURAL NOTE (From: Wikipedia.) In probability theory and statistics, the Rayleigh distribution is a continuous probability distribution for positive-valued random variables. A Rayleigh distribution is often observed when the overall magnitude of a vector is related to its directional components. One example where the Rayleigh distribution naturally arises is when wind velocity is analyzed into its orthogonal 2-dimensional vector components. Assuming that the magnitudes of each component are uncorrelated, normally distributed with equal variance, and zero mean, then the overall wind speed (vector magnitude) will be characterized by a Rayleigh distribution. A second example of the distribution arises in the case of random complex numbers whose real and imaginary components are i.i.d. (independently and identically distributed) Gaussian with equal variance and zero mean. In that case, the absolute value of the complex number is Rayleigh-distributed. The distribution is named after Lord Rayleigh.

Discussion: This is a theoretical exercise where we must apply two methods of point estimation. The basic properties must be considered for the estimator obtained through the first method.

Note: If E(X) had not been given in the statement, it could have been calculated by applying integration by parts (since polynomials and exponentials are functions “of different type”):

E(X) = ∫_{−∞}^{+∞} x f(x; θ) dx = ∫₀^∞ x · (x/θ²) e^{−x²/(2θ²)} dx = [ −x e^{−x²/(2θ²)} ]₀^∞ − ∫₀^∞ 1·( −e^{−x²/(2θ²)} ) dx
= 0 + ∫₀^∞ e^{−x²/(2θ²)} dx = ∫₀^∞ e^{−t²} √(2θ²) dt = √(2θ²) (√π/2) = θ√(π/2)

where ∫ u(x)·v'(x) dx = u(x)·v(x) − ∫ u'(x)·v(x) dx has been used with

u = x → u' = 1
v' = (x/θ²) e^{−x²/(2θ²)} → v = ∫ (x/θ²) e^{−x²/(2θ²)} dx = −e^{−x²/(2θ²)}

Then, we have applied the change of variable

x/√(2θ²) = t → x = t√(2θ²) → dx = √(2θ²) dt

We calculate the variance by using the first two moments. For the second moment, we can apply integration by parts again (the power of x decreases each time):

E(X²) = ∫₀^∞ x² · (x/θ²) e^{−x²/(2θ²)} dx = [ −x² e^{−x²/(2θ²)} ]₀^∞ − ∫₀^∞ 2x·( −e^{−x²/(2θ²)} ) dx = 0 + 2θ² ∫₀^∞ (x/θ²) e^{−x²/(2θ²)} dx = 2θ²

where the integration by parts has been used with

u = x² → u' = 2x
v' = (x/θ²) e^{−x²/(2θ²)} → v = −e^{−x²/(2θ²)}

(the last integral equals 1 because the integrand is the density itself). The variance is

Var(X) = E(X²) − E(X)² = 2θ² − θ²(π/2) = ((4−π)/2)θ²

(In substituting the limits, the fact that eˣ changes faster than x^k for any k has been taken into account. On the other hand, in an advanced table of integrals like those physicists or engineers use, one can find ∫₀^{+∞} e^{−ax²} dx (see the appendixes of Mathematics) or ∫₀^{+∞} x² e^{−ax²} dx directly.)


(a) Method of the moments

Since there appears only one parameter in the density function, one equation suffices; moreover, since the expression of μ = E(X) involves θ, the equation and its solution are:

μ₁(θ) = x̄ → θ√(π/2) = x̄ → θ = x̄/√(π/2) = √(2/π) x̄ → θ̂_M = √(2/π) X̄

(b) Bias, mean square error and consistency

Mean or expectation: E(θ̂_M) = E( √(2/π) X̄ ) = √(2/π) E(X̄) = √(2/π) E(X) = √(2/π) θ√(π/2) = θ, so θ̂_M is an unbiased estimator of θ.

Bias: b(θ̂_M) = E(θ̂_M) − θ = θ − θ = 0

Variance: Var(θ̂_M) = Var( √(2/π) X̄ ) = (2/π) Var(X̄) = (2/π) Var(X)/n = (2/π)·((4−π)/2)θ²/n = ((4−π)/(πn))θ²

Mean square error: MSE(θ̂_M) = b(θ̂_M)² + Var(θ̂_M) = 0 + ((4−π)/(πn))θ² = ((4−π)/(πn))θ²

Consistency: lim_{n→∞} MSE(θ̂_M) = lim_{n→∞} ((4−π)/(πn))θ² = 0, and therefore θ̂_M is consistent (for θ).

(c) Maximum likelihood method

Likelihood function:

L(X; θ) = ∏_{j=1}^n f(x_j; θ) = f(x₁; θ) ⋯ f(xₙ; θ) = (x₁/θ²)e^{−x₁²/(2θ²)} ⋯ (xₙ/θ²)e^{−xₙ²/(2θ²)} = ( ∏_{j=1}^n x_j / θ^{2n} ) e^{−(1/(2θ²)) Σ_{j=1}^n x_j²}

Log-likelihood function: To facilitate the differentiation, θ^{2n} is moved to the numerator and a property of the logarithm is applied:

log( L(X; θ) ) = log( ∏_{j=1}^n x_j ) − ( Σ_{j=1}^n x_j² )/(2θ²) − 2n log(θ)

Search for the maximum:

0 = (d/dθ) log( L(X; θ) ) = ( Σ_{j=1}^n x_j² )/θ³ − 2n/θ → 2n/θ = ( Σ_{j=1}^n x_j² )/θ³ → θ² = ( Σ_{j=1}^n x_j² )/(2n)

Now we prove the condition on the second derivative:

(d²/dθ²) log( L(X; θ) ) = (d/dθ)[ ( Σ_{j=1}^n x_j² )/θ³ − 2n/θ ] = −3( Σ_{j=1}^n x_j² )/θ⁴ + 2n/θ²

The first term is negative and the second is positive, but it is difficult to check qualitatively whether the second is larger in absolute value than the first. Then, the extreme obtained is substituted:

(d²/dθ²) log( L(X; θ) ) |_{θ² = Σx_j²/(2n)} = −3( Σ_{j=1}^n x_j² )(2n)²/( Σ_{j=1}^n x_j² )² + 2n(2n)/( Σ_{j=1}^n x_j² ) = −3·4n²/( Σ_{j=1}^n x_j² ) + 4n²/( Σ_{j=1}^n x_j² ) = −2·4n²/( Σ_{j=1}^n x_j² ) < 0

so the extreme is indeed a maximum, and the maximum likelihood estimator is

θ̂_ML = √( Σ_{j=1}^n X_j² /(2n) )
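As a numerical illustration of the two estimators, the following lines are a minimal R sketch (θ = 3, n = 40 and the number of replications M are arbitrary choices); Rayleigh samples are generated by the inverse-transform method, since F(x) = 1 − e^{−x²/(2θ²)} gives F⁻¹(u) = θ√(−2 log(1−u)):

theta <- 3; n <- 40; M <- 1e5                        # illustrative values
set.seed(1)
X <- matrix(theta*sqrt(-2*log(runif(M*n))), nrow=M)  # inverse transform (1-U is also uniform)
thetaM  <- sqrt(2/pi)*rowMeans(X)                    # method-of-moments estimator
thetaML <- sqrt(rowMeans(X^2)/2)                     # maximum likelihood: sqrt(sum(x^2)/(2n))
c(mean(thetaM), mean(thetaML))                       # both close to theta = 3
c(mean((thetaM-theta)^2), mean((thetaML-theta)^2))   # the ML estimator usually shows the smaller value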

where r_{α/2} is the value of the standard normal distribution verifying P(Z > r_{α/2}) = α/2.

Substitution: We need to calculate the quantities involved in the previous formula,
• n_A = 100 and n_B = 100.
Theoretical (simple random) sample: A₁,...,A₁₀₀ s.r.s. (each value is 1 or 0). Empirical sample: a₁,...,a₁₀₀ → Σ_{j=1}^{100} a_j = 75 → η̂_A = (1/100) Σ_{j=1}^{100} a_j = 75/100 = 0.75
Theoretical (simple random) sample: B₁,...,B₁₀₀ s.r.s. (each value is 1 or 0). Empirical sample: b₁,...,b₁₀₀ → Σ_{j=1}^{100} b_j = 65 → η̂_B = (1/100) Σ_{j=1}^{100} b_j = 65/100 = 0.65
• 95% → 1−α = 0.95 → α = 0.05 → α/2 = 0.025 → r_{α/2} = 1.96.

Then,

I₀.₉₅ = (0.75 − 0.65) ∓ 1.96 √( 0.75(1−0.75)/100 + 0.65(1−0.65)/100 ) = [−0.0263, 0.226]

The case η_A = η_B is included in the interval.
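The interval can be reproduced with a few lines of R (the values are those of the exercise; qnorm gives the quantile 1.96 used above):

pA <- 0.75; pB <- 0.65; nA <- 100; nB <- 100
z <- qnorm(0.975)                                     # about 1.96
(pA-pB) + c(-1,1)*z*sqrt(pA*(1-pA)/nA + pB*(1-pB)/nB) # about [-0.0263, 0.226]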

Conclusion: The lack-of-effect case (ηA = ηB) cannot be excluded when the decision has 95% confidence. Since η ∈( 0,1), any “reasonable” estimator of η should provide values in this range or close to it. Because of the natural uncertainty of the sampling process (randomness and variability), in this case the smaller endpoint of the interval was –0.0263, which can be interpreted as being 0. When an interval of high confidence is far from 0, the case ηA = ηB can clearly be discarded or rejected. Finally, it is important to notice that a confidence interval can be used to make decisions about hypotheses on the parameter values—it is equivalent to a two-sided hypothesis test, as the interval is also two-sided. (Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.)

Advanced theory: When the assumption η_A = η = η_B seems reasonable (notice that this case is included in the 95% confidence interval just calculated), it makes sense to try to estimate the common variance of the estimator as well as possible. This can be done by using the pooled sample proportion η̂_p = (n_A η̂_A + n_B η̂_B)/(n_A + n_B) in estimating η(1−η) for the denominator; nonetheless, the pooled estimator should not be considered in the numerator, as (η̂_p − η̂_p) = 0 whatever the data are. The statistic would be:

T̃(A, B) = [ (η̂_A − η̂_B) − (η_A − η_B) ] / √( η̂_p(1−η̂_p)/n_A + η̂_p(1−η̂_p)/n_B ) → N(0,1)

Now, the expression of the interval would be

Ĩ_{1−α} = [ (η̂_A − η̂_B) − r_{α/2} √( η̂_p(1−η̂_p)/n_A + η̂_p(1−η̂_p)/n_B ), (η̂_A − η̂_B) + r_{α/2} √( η̂_p(1−η̂_p)/n_A + η̂_p(1−η̂_p)/n_B ) ]

The quantities involved in the previous formula are
• n_A = 100 and n_B = 100
• η̂_A = 0.75 and η̂_B = 0.65; since n_A = n_B = n, the pooled estimate is η̂_p = (n_A η̂_A + n_B η̂_B)/(n_A + n_B) = n(η̂_A + η̂_B)/(2n) = (0.75 + 0.65)/2 = 0.70
• 95% → 1−α = 0.95 → α = 0.05 → α/2 = 0.025 → r_{α/2} = 1.96

Then,

Ĩ₀.₉₅ = (0.75 − 0.65) ∓ 1.96 √( 2·0.70(1−0.70)/100 ) = [−0.0270, 0.227]

One way to measure how different the results are consists in directly comparing the lengths—twice the margin of error—in both cases:

L = 0.226 − (−0.0263) = 0.2523    L̃ = 0.227 − (−0.0270) = 0.254

Even if the latter length is larger, it is theoretically more trustworthy than the former when η_A = η = η_B is true. The general expressions of these lengths can be found too:

L = 2r_{α/2} √( η̂_A(1−η̂_A)/n_A + η̂_B(1−η̂_B)/n_B )    L̃ = 2r_{α/2} √( η̂_p(1−η̂_p)/n_A + η̂_p(1−η̂_p)/n_B )

Another way to measure how different the results are can be based on comparing the statistics:

T̃(A, B) = [ √( η̂_A(1−η̂_A)/n_A + η̂_B(1−η̂_B)/n_B ) / √( η̂_p(1−η̂_p)/n_A + η̂_p(1−η̂_p)/n_B ) ] T(A, B)

and

L̃/L = √( η̂_p(1−η̂_p)/n_A + η̂_p(1−η̂_p)/n_B ) / √( η̂_A(1−η̂_A)/n_A + η̂_B(1−η̂_B)/n_B ) = T/T̃  (so L·T = L̃·T̃)

Thus, since n_A = n = n_B, the quantity

√( η̂_A(1−η̂_A)/n + η̂_B(1−η̂_B)/n ) / √( η̂_p(1−η̂_p)/n + η̂_p(1−η̂_p)/n ) = √( η̂_A(1−η̂_A) + η̂_B(1−η̂_B) ) / √( 2 η̂_p(1−η̂_p) ) = 0.994

can be seen as a measure of the effect of using the pooled sample proportion. This effect is small in this exercise, but it could be larger in other situations. As regards the case η_A = η = η_B, it is also included in this interval, which is not informative as it has been used as an assumption; nevertheless, its exclusion would have contradicted the initial assumption.
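The pooled interval and the 0.994 ratio can be reproduced in the same way (values from this exercise):

pA <- 0.75; pB <- 0.65; n <- 100
z  <- qnorm(0.975)                                # about 1.96
pP <- (pA + pB)/2                                 # pooled proportion (equal sample sizes)
(pA-pB) + c(-1,1)*z*sqrt(2*pP*(1-pP)/n)           # about [-0.0270, 0.227]
sqrt(pA*(1-pA) + pB*(1-pB)) / sqrt(2*pP*(1-pP))   # about 0.994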

My notes:

[CI] Minimum Sample Size

Remark 4ci: In calculating the minimum sample size that guarantees a given precision by applying the method based on the margin of error, the result relies on other results: the theorem giving the sampling distribution of the pivot T and the method of the pivot. When the proper statistic T is based on the supposition that the population variable X follows a given parametric probability distribution, the whole process can be seen as a parametric approach; when T is based on an asymptotic result, the nonparametric Central Limit Theorem is indirectly being applied. On the other hand, the method based on Chebyshev's inequality is valid whichever the probability distribution of the population variable X and the nonnegative function h(x). The Central Limit Theorem, being a nonparametric result, seems more powerful than Chebyshev's inequality, which is based on a rough bound (see the appendixes). As a consequence, we expect the method based on this inequality to overestimate the minimum sample size. On the contrary, the number provided by the method based on the margin of error may be less trustworthy if the assumptions on which it is based are false.

Remark 5ci: Once there is a discrete quantity in an equation, the unknown cannot take any possible value. This implies that, strictly speaking, equalities like

E = r_{α/2} √(σ²/n)    σ²/(nE²) = α

may never be fulfilled for continuous E, α, σ and discrete n. Solving the equality and rounding the result upward is an alternative to solving the inequalities

E_g ≥ E = r_{α/2} √(σ²/n)    σ²/(n E_g²) ≤ α

where the purpose is to find the minimum n for which the (possible discrete values of the) margin of error is smaller than or equal to the given precision E_g.

Exercise 1ci-s The lengths (in millimeters, mm) of metal rods produced by an industrial process are normally distributed with a standard deviation of 1.8mm. Based on a simple random sample of nine observations from this population, the 99% confidence interval was found for the population mean length to extend from 194.65mm to 197.75mm. Suppose that a production manager believes that the interval is too wide for practical use and, instead, requires a 99% confidence interval extending no further than 0.50mm on each side of the sample mean. How large a sample is needed to achieve such an interval? Apply both the method based on the confidence interval and the method based on the Chebyshev's inequality. (From: Statistics for Business and Economics, Newbold, P., W.L. Carlson and B.M. Thorne, Pearson.)

Discussion: There is one normal population with known standard deviation. By using a sample of nine elements, a 99% confidence interval was built, I1 = [194.65mm, 197.75mm], of length 197.75mm – 194.65mm = 3.1mm and margin of error 3.1mm/2 = 1.55mm. A narrower interval is desired, and the number of data necessary in the new sample must be calculated. More data will be necessary for the new margin of error to be smaller (0.50 < 1.55) while the other quantities—standard deviation and confidence—are the same.

Identification of the variable:

X ≡ Length (of one metal rod)    X ~ N(μ, σ² = 1.8²mm²)

Sample information: Theoretical (simple random) sample: X₁,..., Xₙ s.r.s. (the lengths of n rods are taken)

Margin of error: We need the expression of the margin of error. If we do not remember it, we can apply the method of the pivot to take the expression from the formula of the interval:

I_{1−α} = [ X̄ − r_{α/2} √(σ²/n), X̄ + r_{α/2} √(σ²/n) ]

If we remembered the expression, we could use it directly. Either way, the margin of error (for one normal population with known variance) is:

E = r_{α/2} √(σ²/n)

Sample size

Method based on the confidence interval: We want the margin of error E to be smaller than or equal to the given E_g:

E_g ≥ E = r_{α/2} √(σ²/n) → E_g² ≥ r_{α/2}² σ²/n → n ≥ r_{α/2}² σ²/E_g² = 2.58² (1.8mm)²/(0.5mm)² = 86.27 → n ≥ 87

since r_{α/2} = r_{0.01/2} = r_{0.005} = 2.58. (The inequality changes neither when multiplying or dividing by positive quantities nor when squaring, while it changes when inverting.)

Method based on the Chebyshev's inequality: For unbiased estimators, it holds that:

P(|θ̂ − θ| ≥ E) = P(|θ̂ − E(θ̂)| ≥ E) ≤ Var(θ̂)/E² ≤ α

so, with Var(θ̂) = Var(X̄) = σ²/n,

σ²/(n E_g²) ≤ α → n ≥ (1/α)(σ²/E_g²) = (1/0.01)(1.8²mm²)/(0.5²mm²) = 1296 → n ≥ 1296
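Both minimum sample sizes can be reproduced in R (2.58 is the rounded table value of the quantile used above):

sigma <- 1.8; Eg <- 0.5; alpha <- 0.01
r <- 2.58                          # table value of the N(0,1) quantile
ceiling((r*sigma/Eg)^2)            # 87, method based on the confidence interval
ceiling(sigma^2/(alpha*Eg^2))      # 1296, method based on Chebyshev's inequality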

Conclusion: At least n data are necessary to guarantee that the margin of error is at most 0.50 (this margin can be thought of as “the maximum error in probability”, in the sense that the distance or error |θ̂ − θ| will be smaller than E_g with a probability of 1−α = 0.99, but larger with a probability of α = 0.01). Any number of data larger than n would guarantee—and go beyond—the precision desired. As expected, more data are necessary (87 > 9) to increase the accuracy (narrower interval) with the same confidence. The minimum sample sizes provided by the two methods are quite different (see remark 4ci). (Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.)

̂ ∣ will can be thought of as “the maximum error in probability”, in the sense that the distance or error ∣θ−θ be smaller that Eg with a probability of 1–α = 0.99, but larger with a probability of α = 0.01). Any number of data larger than n would guarantee—and go beyond—the precision desired. As expected, more data are necessary (86 > 9) to increase the accuracy (narrower interval) with the same confidence. The minimum sample sizes provided by the two methods are quite different (see remark 4ci). (Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.) My notes:

[CI] Methods and Sample Size

Exercise 1ci
The mark of an aptitude exam follows a normal distribution with standard deviation equal to 28.2. A simple random sample with nine students yields the following results:

Σ_{j=1}^{9} x_j = 1,098    Σ_{j=1}^{9} x_j² = 138,148

a) Find a 90% confidence interval for the population mean μ. b) Discuss without calculations whether the length of a 95% confidence interval will be smaller, greater or equal to the length of the interval of the previous section. c) How large must the minimum sample size be to obtain a 90% confidence interval with length (distance between the endpoints) equal to 10? Apply the method based on the confidence interval and also the method based on the Chebyshev's inequality.


Discussion: The supposition that the normal distribution is an appropriate model for the variable mark should be evaluated. The method of the pivot will be applied. After obtaining the theoretical expression of the interval, it is possible to reason on the relation confidence-length. Given the length of the interval, the expression also allows us to calculate the minimum number of data necessary. The mark can be seen as a quantity without any dimension. Finally, it is worth noticing that an approximation is used, since the mark is a discrete variable while the normal distribution is continuous.

Identification of the variable:

X ≡ Mark (of one student)    X ~ N(μ, σ² = 28.2²)

Sample information: Theoretical (simple random) sample: X₁,..., X₉ s.r.s. (the marks of nine students are to be taken) → n = 9

Empirical sample: x₁,...,x₉ → Σ_{j=1}^{9} x_j = 1,098, Σ_{j=1}^{9} x_j² = 138,148 (the marks have been taken)

We can see that the sample values xj themselves are unknown in this exercise; instead, information calculated from them is provided; this information must be sufficient for carrying out the calculations.

a) Method of the pivotal quantity: To choose the proper statistic with which the confidence interval is calculated, we take into account that:
• The variable follows a normal distribution
• We are given the value of the population standard deviation σ
• The sample size is small, n = 9, so asymptotic formulas cannot be applied

From a table of statistics (e.g. in [T]), the pivot

T(X; μ) = (X̄ − μ)/√(σ²/n) ∼ N(0,1)

is selected. Then

1−α = P( l_{α/2} ≤ T(X; μ) ≤ r_{α/2} ) = P( −r_{α/2} ≤ (X̄ − μ)/√(σ²/n) ≤ +r_{α/2} ) = P( −r_{α/2} √(σ²/n) ≤ X̄ − μ ≤ +r_{α/2} √(σ²/n) )
= P( −X̄ − r_{α/2} √(σ²/n) ≤ −μ ≤ −X̄ + r_{α/2} √(σ²/n) ) = P( X̄ + r_{α/2} √(σ²/n) ≥ μ ≥ X̄ − r_{α/2} √(σ²/n) )

so

I_{1−α} = [ X̄ − r_{α/2} √(σ²/n), X̄ + r_{α/2} √(σ²/n) ]

where r_{α/2} is the value of the standard normal distribution verifying P(Z > r_{α/2}) = α/2, that is, the value such that an area equal to α/2 is on the right (upper tail).

Substitution: We calculate the quantities in the formula,
• x̄ = (1/9) Σ_{j=1}^{9} x_j = 1,098/9 = 122
• A 90% confidence level implies that α = 0.1, and the quantile r_{α/2} = r_{0.05} = 1.645 is in the table.
• From the statement, σ = 28.2
• Finally, n = 9

Thus, the interval is

I₀.₉ = [ 122 − 1.645·28.2/√9, 122 + 1.645·28.2/√9 ] = [106.54, 137.46]

b) Length of the interval: To answer this question it is possible to argue that, when all the parameters but the length are fixed, if higher certainty is desired it is necessary to widen the interval, that is, to increase the distance between the two endpoints. The formal way to justify this idea consists in using the formula of the interval:

L = ( X̄ + r_{α/2} √(σ²/n) ) − ( X̄ − r_{α/2} √(σ²/n) ) = 2 r_{α/2} √(σ²/n)

Now, if σ and n remain unchanged, to study how L changes with α it is enough to see how the quantile “moves”. For the 95% interval:
• α = 0.05 → α decreases with respect to the value in section (a)
• Now r_{α/2} must leave less area (probability) on the right → r_{α/2} increases → L increases

In short, when the tails (α) get smaller the interval (1−α) gets wider, and vice versa.

c) Sample size: Method based on the confidence interval: Now the 90% confidence interval of the first section is revisited. For given α and L_g, the value of n must be found. From the expression of the length,

L_g ≥ L = 2 r_{α/2} √(σ²/n) → L_g² ≥ 4 r_{α/2}² σ²/n → n ≥ (2 r_{α/2} σ/L_g)² = (2·1.645·28.2/10)² = 86.08 → n ≥ 87

(Only when inverting must the inequality be changed.)

Method based on the Chebyshev's inequality: For unbiased estimators:

P(|θ̂ − θ| ≥ E) = P(|θ̂ − E(θ̂)| ≥ E) ≤ Var(θ̂)/E² ≤ α

so, with Var(θ̂) = Var(X̄) = σ²/n and the margin of error E_g = L_g/2 = 5,

σ²/(n E_g²) ≤ α → n ≥ σ²/(α E_g²) = 28.2²/(0.1·(10/2)²) = 318.10 → n ≥ 319
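The interval and the two minimum sample sizes can be reproduced in R (1.645 is the table value of the quantile used above):

xbar <- 1098/9; sigma <- 28.2; n <- 9
r <- 1.645                          # N(0,1) quantile for alpha/2 = 0.05
xbar + c(-1,1)*r*sigma/sqrt(n)      # about [106.54, 137.46]
Lg <- 10
ceiling((2*r*sigma/Lg)^2)           # 87, method based on the confidence interval
ceiling(sigma^2/(0.1*(Lg/2)^2))     # 319, method based on Chebyshev's inequality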

Conclusion: Given the other quantities, confidence grows with the length, and vice versa. If a value greater than n were considered, a higher-accuracy interval would be obtained; nevertheless, in practice this would usually also imply a higher expense of both time and money. The minimum sample sizes provided by the two methods are quite different (see remark 4ci). (Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.)


Exercise 2ci A 64-element simple random sample of petrol consumption (litres per 100 kilometers, u) in private cars has been taken, yielding a mean consumption of 9.36u and a standard deviation of 1.4u. Then: a) Obtain a 96% confidence interval for the mean consumption. b) Assume both normality (for the consumption) and variance σ2 = 2u2. How large must the sample be if, with the same confidence, we want the maximum error to be a quarter of litre? Apply the method based on the confidence interval and the method based on the Chebyshev's inequality. (From 2007's exams for accessing to the Spanish university.)

Discussion: For 64 data, asymptotic results can be applied. The method of the pivotal quantity will be applied. The role of the number 100 is no other than being part of the units in which the data are measured. For the second section, additional suppositions—added by myself—are considered; in a real-world situation they should be evaluated.

Identification of the variable:

C ≡ Consumption (of one private car, measured in litres per 100 kilometers)    C ~ ?

Sample information: Theoretical (simple random) sample: C = (C₁,...,C₆₄) s.r.s. → n = 64

Empirical sample: c = (c₁,...,c₆₄) → c̄ = 9.36u, s = 1.4u

The values c_j of the sample are unknown; instead, the evaluation of some statistics is given. These quantities must be sufficient for the calculations, so formulas must involve C̄ and s².

a) Confidence interval: To select the pivot, we take into account:
• Nothing is said about the probability distribution of the variable of interest
• The sample size is big, n = 64 (>30), so an asymptotic expression can be used
• The population variance is unknown, but it is estimated by the sample variance

From a table of statistics (e.g. in [T]), the following pivot is selected

T(C; μ) = (C̄ − μ)/√(S²/n) → N(0,1)

where S² will be calculated by applying the relation n s² = (n−1) S². By applying the method of the pivot:

1−α = P( l_{α/2} ≤ T(C; μ) ≤ r_{α/2} ) = P( −r_{α/2} ≤ (C̄ − μ)/√(S²/n) ≤ +r_{α/2} ) = P( −r_{α/2} √(S²/n) ≤ C̄ − μ ≤ +r_{α/2} √(S²/n) )
= P( −C̄ − r_{α/2} √(S²/n) ≤ −μ ≤ −C̄ + r_{α/2} √(S²/n) ) = P( C̄ + r_{α/2} √(S²/n) ≥ μ ≥ C̄ − r_{α/2} √(S²/n) )

Then, the confidence interval is

I_{1−α} = [ C̄ − r_{α/2} √(S²/n), C̄ + r_{α/2} √(S²/n) ]

where r_{α/2} is the quantile such that P(Z > r_{α/2}) = α/2.

Substitution: We calculate the quantities in the formula,
• Sample mean: c̄ = 9.36u.
• For a confidence of 96%, α = 0.04 and r_{α/2} = r_{0.04/2} = r_{0.02} = l_{0.98} = 2.054.
• The sample quasivariance is S² = n s²/(n−1) = (64/63)·1.4²u² = 1.99u².
• Finally, n = 64.

The interval is

I₀.₉₆ = [ 9.36u − 2.054 √(1.99u²/64), 9.36u + 2.054 √(1.99u²/64) ] = [9.00u, 9.72u]

b) Minimum sample size: Method based on the confidence interval: To select the pivot, we take into account the new suppositions:
• The variable of interest follows a normal distribution
• The population mean is being studied
• The population variance is known

From a table of statistics (e.g. in [T]), the following pivot is selected (now the exact sampling distribution is known, instead of the asymptotic distribution)

T(C; μ) = (C̄ − μ)/√(σ²/n) ∼ N(0,1)

By doing calculations similar to those of the previous section or exercise, the interval is

I_{1−α} = [ C̄ − r_{α/2} √(σ²/n), C̄ + r_{α/2} √(σ²/n) ]

from which the expression of the margin of error is obtained, namely: E = r_{α/2} √(σ²/n). Values can be substituted either before or after breaking an inequality; this time let us use numbers from the beginning:

E_g = (1/4)u ≥ E = 2.054 √(2u²/n) → (1/4)²u² ≥ 2.054²·2u²/n → n ≥ 4²·2.054²·2 = 135.01 → n ≥ 136

(When inverting, the inequality must be changed.)

Method based on the Chebyshev's inequality: For unbiased estimators:

P(|θ̂ − θ| ≥ E) = P(|θ̂ − E(θ̂)| ≥ E) ≤ Var(θ̂)/E² ≤ α

so, with Var(θ̂) = Var(X̄) = σ²/n,

σ²/(n E_g²) ≤ α → n ≥ σ²/(α E_g²) = 2u²/(0.04·(1/4)²u²) = 800
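The interval and the two minimum sample sizes can be reproduced in R (2.054 is the table value of the quantile used above):

n <- 64; cbar <- 9.36; s <- 1.4
S2 <- n*s^2/(n-1)                   # quasivariance from the sample variance
r <- 2.054                          # N(0,1) quantile for alpha/2 = 0.02
cbar + c(-1,1)*r*sqrt(S2/n)         # about [9.00, 9.72]
ceiling(r^2*2/(1/4)^2)              # 136 (sigma^2 = 2, Eg = 1/4)
ceiling(2/(0.04*(1/4)^2))           # 800, method based on Chebyshev's inequality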

Conclusion: The unknown mean petrol consumption of the population of private cars belongs to the interval obtained with 96% confidence. For 64 data, the margin of error was 2.054 √(1.99u²/64) = 0.36u, while 136 data are needed for the margin to be 1/4 = 0.25u. The minimum sample sizes provided by the two methods are quite different (see remark 4ci). (Remember: statistical results depend on the assumptions, the methods, the certainty and the data.)

My notes:

Exercise 3ci
You have been hired by a consortium of dairy farmers to conduct a survey about the consumption of milk. Based on results from a pilot study, assume that σ = 8.7oz. Suppose that the amount of milk is normally distributed. If you want to estimate the mean amount of milk consumed daily by adults:
(a) How many adults must you survey if you want 95% confidence that your sample mean is in error by no more than 0.5oz? Apply both the method based on the confidence interval and the method based on the Chebyshev's inequality.
(b) Calculate the margin of error if the number of data in the sample were twice the minimum (rounded) value that you obtained. Is the margin of error now half the value it was?
(Based on an exercise of: Elementary Statistics. Triola M.F. Pearson.)

CULTURAL NOTE (From: Wikipedia.) A fluid ounce (abbreviated fl oz, fl. oz. or oz. fl., old forms ℥, fl ℥, f℥, ƒ ℥) is a unit of volume (also called capacity) typically used for measuring liquids. It is equivalent to approximately 30 millilitres. Whilst various definitions have been used throughout history, two remain in common use: the imperial and the United States customary fluid ounce. An imperial fluid ounce is 1⁄20 of a imperial pint, 1⁄160 of an imperial gallon or approximately 28.4 ml. A US fluid ounce is 1⁄16 of a US fluid pint, 1⁄128 of a US fluid gallon or approximately 29.6 ml. The fluid ounce is distinct from the ounce, a unit of mass; however, it is sometimes referred to simply as an "ounce" where context makes the meaning clear.

Discussion: There is one normal population with known standard deviation. In both sections, the answer can be found by using the expression of the margin of error.

Identification of the variable:

X ≡ Amount of milk (consumed daily by an adult)    X ~ N(μ, σ² = 8.7²oz²)

Sample information: Theoretical (simple random) sample: X1,...,Xn s.r.s. (the amount is measured for n adults)

Formula for the margin of error: We need the expression of the margin of error. If we do not remember it, we can apply the method of the pivot to take the expression from the formula of the interval:

I_{1−α} = [ X̄ − r_{α/2} √(σ²/n), X̄ + r_{α/2} √(σ²/n) ]

If we remembered the expression, we could directly use it. Either way, the margin of error (for one normal population with known variance) is:

E = r_{α/2} √(σ²/n)



(a) Sample size

Method based on the confidence interval: The equation involves four quantities, and we can calculate any of them once the others are known. Here:

E_g ≥ E = r_{α/2} √(σ²/n) → E_g² ≥ r_{α/2}² σ²/n → n ≥ r_{α/2}² σ²/E_g² = 1.96² (8.7oz)²/(0.5oz)² = 1163.08 → n ≥ 1164

since r_{α/2} = r_{0.05/2} = r_{0.025} = 1.96. (The inequality changes neither when multiplying or dividing by positive quantities nor when squaring, while it changes when inverting.)

Method based on the Chebyshev's inequality: For unbiased estimators:

P(|θ̂ − θ| ≥ E) = P(|θ̂ − E(θ̂)| ≥ E) ≤ Var(θ̂)/E² ≤ α

so, with Var(θ̂) = Var(X̄) = σ²/n,

σ²/(n E_g²) ≤ α → n ≥ (1/α)(σ²/E_g²) = (1/0.05)·8.7²oz²/(0.5²oz²) = 6055.2 → n ≥ 6056

(b) Margin of error

Way 1: Just by substituting,

E = r_{α/2} √(σ²/n) = 1.96 √(8.7²oz²/(2·1164)) = 0.3534oz

When the sample size is doubled, the margin of error is not reduced by half but by less than this amount.

Way 2 (suggested to me by a student): By managing the algebraic expression,

Ẽ = r_{α/2} √(σ²/ñ) = r_{α/2} √(σ²/(2n)) = (1/√2) r_{α/2} √(σ²/n) = E/√2 = 0.5oz/√2 = 0.3535oz

Now it is easy to see that if the sample size is multiplied by 2, the margin of error is divided by √2. Besides, more generally:

Proposition. For the confidence interval estimation of the mean of a normal population with known variance, based on the method of the pivot, when the sample size is multiplied by any scalar c the margin of error is divided by √c.

(Notice that 0.5 is slightly smaller than the real margin of error after rounding n upward; that is why there is a small difference between the results of the two ways.)
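A few lines of R reproduce these quantities (1.96 is the table value of the quantile):

sigma <- 8.7; r <- 1.96
n1 <- ceiling((r*sigma/0.5)^2)      # 1164, minimum sample size from (a)
r*sigma/sqrt(2*n1)                  # about 0.3534, margin of error with 2*n1 data
0.5/sqrt(2)                         # about 0.3536, the E/sqrt(2) shortcut of Way 2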

Conclusion: At least 1164 or 6056 data are necessary to guarantee that the margin of error is at most 0.50 (this margin can be thought of as “the maximum error in probability”, in the sense that the distance or error |θ̂ − θ| will be smaller than E_g with a probability of 1−α = 0.95, but larger with a probability of α = 0.05). When the sample size is multiplied by c, the margin of error is divided by √c. Using more data would also guarantee the precision desired. The minimum sample sizes provided by the two methods are quite different (see remark 4ci). (Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.)

My notes:

Exercise 4ci
A company makes two products, A and B, that can be considered independent and whose demands follow the distributions N(μ_A, σ_A² = 70²u²) and N(μ_B, σ_B² = 60²u²), respectively. After analysing 500 shops, the two simple random samples yield ā = 156 and b̄ = 128.
(a) Build 95 and 98 percent confidence intervals for the difference between the population means.
(b) What are the margins of error? If sales are measured in the unit u = number of boxes, what is the unit of measure of the margin of error?
(c) A margin of error equal to 10 is desired; how many shops are necessary? Apply both the method based on the confidence interval and the method based on the Chebyshev's inequality.
(d) If only product A is considered, as if product B had not been analysed, how many shops are necessary to guarantee a margin of error equal to 10? Again, apply the two methods.

LINGUISTIC NOTE (From: Longman Dictionary of Common Errors. Turton, N.D., and J.B. Heaton. Longman.)
company. an organization that makes or sells goods or that sells services: 'My father works for an insurance company.' 'IBM is one of the biggest companies in the electronics industry.'
factory. a place where goods such as furniture, carpets, curtains, clothes, plates, toys, bicycles, sports equipment, drinks and packaged food are produced: 'The company's UK factory produces 500 golf trolleys a week.'
industry. (1) all the people, factories, companies etc involved in a major area of production: 'the steel industry', 'the clothing industry' (2) all industries considered together as a single thing: 'Industry has developed rapidly over the years at the expense of agriculture.'
mill. (1) a place where a particular type of material is made: 'a cotton mill', 'a textile mill', 'a steel mill', 'a paper mill' (2) a place where flour is made from grain: 'a flour mill'
plant. a factory or building where vehicles, engines, weapons, heavy machinery, drugs or industrial chemicals are produced, where chemical processes are carried out, or where power is generated: 'Vauxhall-Opel's UK car plants', 'Honda's new engine plant at Swindon', 'a sewage plant', 'a wood treatment plant', 'ICI's ₤100m plant', 'the Sellafield nuclear reprocessing plant in Cumbria'
works. an industrial building where materials such as cement, steel, and bricks are produced, or where industrial processes are carried out: 'The drop in car and van sales has led to redundancies in the country's steel works.'

Discussion: The supposition that the normal distribution is appropriate to model both variables should be statistically tested, and the independence of the two populations should be tested as well. The method of the pivot will be applied. After obtaining the theoretical expression of the interval, it is possible to argue about the relation confidence-length. Given the length of the interval, the expression allows us to calculate the minimum number of data necessary. The numbers of units demanded can be seen as dimensionless quantities. An approximation is implicitly being used in this exercise, since the number of units demanded is a discrete variable while the normal distribution is continuous.

(a) Confidence interval

The variables are

A ≡ Number of units of product A sold (in one shop)    A ~ N(μ_A, σ_A² = 70²u²)
B ≡ Number of units of product B sold (in one shop)    B ~ N(μ_B, σ_B² = 60²u²)

(a1) Pivot: We know that
• There are two independent normal populations
• We are interested in μ_A − μ_B
• Variances are known

Then, from a table of statistics (e.g. in [T]), we select

T(A, B; μ_A, μ_B) = [ (Ā − B̄) − (μ_A − μ_B) ] / √( σ_A²/n_A + σ_B²/n_B ) ∼ N(0,1)

(a2) Event rewriting

1−α = P( l_{α/2} ≤ T(A, B; μ_A, μ_B) ≤ r_{α/2} ) = P( −r_{α/2} ≤ [ (Ā − B̄) − (μ_A − μ_B) ] / √( σ_A²/n_A + σ_B²/n_B ) ≤ +r_{α/2} )
= P( −r_{α/2} √( σ_A²/n_A + σ_B²/n_B ) ≤ (Ā − B̄) − (μ_A − μ_B) ≤ +r_{α/2} √( σ_A²/n_A + σ_B²/n_B ) )
= P( −(Ā − B̄) − r_{α/2} √( σ_A²/n_A + σ_B²/n_B ) ≤ −(μ_A − μ_B) ≤ −(Ā − B̄) + r_{α/2} √( σ_A²/n_A + σ_B²/n_B ) )
= P( (Ā − B̄) + r_{α/2} √( σ_A²/n_A + σ_B²/n_B ) ≥ μ_A − μ_B ≥ (Ā − B̄) − r_{α/2} √( σ_A²/n_A + σ_B²/n_B ) )

(a3) The interval

I_{1−α} = [ (Ā − B̄) − r_{α/2} √( σ_A²/n_A + σ_B²/n_B ), (Ā − B̄) + r_{α/2} √( σ_A²/n_A + σ_B²/n_B ) ]

Substitution: The quantities in the formula are
• ā = 156u and b̄ = 128u
• σ_A² = 70²u² and σ_B² = 60²u²
• n_A = 500 and n_B = 500
• At 95%, 1−α = 0.95 → α = 0.05 → α/2 = 0.025 → r_{α/2} = r_{0.025} = l_{0.975} = 1.96
• At 98%, 1−α = 0.98 → α = 0.02 → α/2 = 0.01 → r_{α/2} = r_{0.01} = l_{0.99} = 2.326

Thus, at 95%

I₀.₉₅ = [ (156−128) − 1.96 √(70²/500 + 60²/500), (156−128) + 1.96 √(70²/500 + 60²/500) ] = [19.92, 36.08]

and at 98%

I₀.₉₈ = [ (156−128) − 2.326 √(70²/500 + 60²/500), (156−128) + 2.326 √(70²/500 + 60²/500) ] = [18.41, 37.59]

(b) Margin of error: Regarding the units, they can be treated as any other algebraic letter representing a numerical quantity. The quantile and the sample sizes are dimensionless, while the variances are expressed in the unit u²—because of the square in the definition σ² = E([X−E(X)]²)—when data X are measured in the unit u. At 95%

E₀.₉₅ = r_{α/2} √( σ_A²/n_A + σ_B²/n_B ) = 1.96 √( 70²u²/500 + 60²u²/500 ) = 1.96 √( (70² + 60²)/500 ) u = 8.08u

and at 98%

E₀.₉₈ = r_{α/2} √( σ_A²/n_A + σ_B²/n_B ) = 2.326 √( 70²u²/500 + 60²u²/500 ) = 2.326 √( (70² + 60²)/500 ) u = 9.59u

(c) Minimum sample sizes

Method based on the confidence interval: Since here both sample sizes are equal to the number of shops,

E_g ≥ E = r_{α/2} √( σ_A²/n + σ_B²/n ) → E_g² ≥ r_{α/2}² (σ_A² + σ_B²)/n → n ≥ r_{α/2}² (σ_A² + σ_B²)/E_g² = r_{α/2}² (σ_A/E_g)² + r_{α/2}² (σ_B/E_g)²

and hence at 95% and 98%, respectively,

n ≥ 1.96² (70²u² + 60²u²)/(10²u²) = 326.54 → n ≥ 327    and    n ≥ 2.326² (70²u² + 60²u²)/(10²u²) = 459.87 → n ≥ 460

Method based on the Chebyshev's inequality: For unbiased estimators:

P(|θ̂ − θ| ≥ E) = P(|θ̂ − E(θ̂)| ≥ E) ≤ Var(θ̂)/E² ≤ α

If Var(θ̂) = Var(Ā − B̄) = Var(Ā) + Var(B̄) = σ_A²/n + σ_B²/n, then

(σ_A²/n + σ_B²/n)/E_g² ≤ α → n ≥ (σ_A² + σ_B²)/(α E_g²) = (1/α)(σ_A/E_g)² + (1/α)(σ_B/E_g)²

so

n ≥ (70²u² + 60²u²)/(0.05·10²u²) = 1700    and    n ≥ (70²u² + 60²u²)/(0.02·10²u²) = 4250

(d) Minimum sample size n_A

Method based on the confidence interval: In this case, when the method of the pivotal quantity is applied (we do not repeat the calculations here), the interval and the margin of error are, respectively,

I_{1−α} = [ Ā − r_{α/2} √(σ_A²/n_A), Ā + r_{α/2} √(σ_A²/n_A) ]    and    E = r_{α/2} √(σ_A²/n_A)

(Note that this case can be thought of as a particular case where the second population has values B = 0, μ_B = 0 and σ_B² = 0.) Then,

E_g ≥ E = r_{α/2} √(σ_A²/n_A) → E_g² ≥ r_{α/2}² σ_A²/n_A → n_A ≥ r_{α/2}² σ_A²/E_g²

and hence at 95% and 98%, respectively,

n_A ≥ 1.96²·70²u²/(10²u²) = 188.24 → n_A ≥ 189    and    n_A ≥ 2.326²·70²u²/(10²u²) = 265.10 → n_A ≥ 266

Method based on the Chebyshev's inequality: For unbiased estimators:

P(|θ̂ − θ| ≥ E) = P(|θ̂ − E(θ̂)| ≥ E) ≤ Var(θ̂)/E² ≤ α

If Var(θ̂) = Var(Ā) = σ_A²/n_A, then

σ_A²/(n_A E_g²) ≤ α → n_A ≥ σ_A²/(α E_g²)

so

n_A ≥ 70²u²/(0.05·10²u²) = 980    and    n_A ≥ 70²u²/(0.02·10²u²) = 2450
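The intervals, margins and minimum sample sizes of this exercise can be reproduced in R (1.96 and 2.326 are the table values of the quantiles):

a <- 156; b <- 128; sA2 <- 70^2; sB2 <- 60^2; n <- 500
for (r in c(1.96, 2.326)) {                       # quantiles at 95% and 98%
  E <- r*sqrt(sA2/n + sB2/n)                      # margin of error
  print(c(E, (a-b)-E, (a-b)+E))                   # margin and interval endpoints
}
ceiling(1.96^2*(sA2+sB2)/10^2)                    # (c) 327 shops at 95%
ceiling((sA2+sB2)/(0.05*10^2))                    # (c) 1700 shops by Chebyshev at 95%
ceiling(1.96^2*sA2/10^2)                          # (d) 189 shops at 95%, product A alone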

Conclusion: As expected, when the probability of the tails α decreases the margin of error—and hence the length—increases. For either one or two products and given the margin of error, the more confidence (less significance) we want the more data we need. Since 500 shops were really considered to attain this margin of error, there has been a waste of time and money—fewer shops would have sufficed for the desired accuracy (95% or 98%). When two independent quantities are added or subtracted, the error or uncertainty of the result can be as large as the total of the two individual errors or uncertainties; this also holds for random quantities (if they are dependent, a correction term—covariance—appears); for this reason, to guarantee the same margin of error, more data are necessary in each of the two samples—notice that for two populations the minimum value is larger than or equal to the sum of the minimum values that would be necessary for each population individually (for the same precision and confidence). The minimum sample sizes provided by the two methods are quite different (see remark 4ci). (Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.) My notes:


Hypothesis Tests

Remark 1ht: Like confidence, the concept of significance can be interpreted as a probability (so they are, although we sometimes use a 0-to-100 scale). See remark 1pt, in the appendix of Probability Theory, on the interpretation of the concept of probability.
Remark 2ht: The quantities α, p-value, β, 1−β and φ are probabilities, so their values must be between 0 and 1.
Remark 3ht: For two-tailed tests, since there is an infinite number of pairs of quantiles such that P(a₁ ≤ T₀ ≤ a₂) = 1−α, those that determine tails of probability α/2 are considered by convention. This criterion is also applied for confidence intervals.
Remark 4ht: To apply the second methodology, bounding the p-value is sometimes enough to compare it with α. To do that, the proper closest value included in the table is used.
Remark 5ht: In calculating the p-value for two-tailed tests, by convention the probability of the tail determined by T₀(x,y) is doubled. When T₀(X,Y) follows an asymmetric distribution, it is difficult to identify the tail if the value of T₀(x,y) is close to the median. In fact, knowing the median is not necessary, since if we select the wrong tail, twice its probability will be greater than 1 and we will realize that the other tail must have been considered. Alternatively, it is always possible to calculate the two probabilities (on the left and on the right) and double the minimum of them (this is useful in writing code for software programs).
Remark 6ht: When more than one test can be applied to make a decision about the same hypotheses, the most powerful should be considered (if it exists).
Remark 7ht: After making a decision, it is possible to evaluate the strength with which it was made: for the first methodology, by comparing the distance from the statistic to the critical values—or, better, the area between this set of values and the density function of T₀—and, for the second methodology, by looking at the magnitude of the p-value.
Remark 8ht: For small sample sizes, n = 2 or n = 3, the critical region—obtained by applying any methodology—can be plotted in the two- or three-dimensional space.

[HT] Parametric Remark 9ht: There are four types of pair of hypotheses: (1) simple versus simple (2) simple versus one-sided composite (3) one-sided composite versus one-sided composite (4) simple versus two-sided composite We will directly apply Neyman-Pearson's lemma for the first case. When the solution of the first case does not depend upon any particular value of the parameter θ1 under H1, the same test will be uniformly most powerful for the second case. In addition, when there is a uniformly most powerful test for the second case, it will also be uniformly most powerful for the third case. Remark 10ht: Given H0 and α, different decisions can be made for one- and two-tailed tests. That is why: (i) describing the details of the framework is of great important in Statistics; and (ii) as a general rule, all trustworthy information must be used, which implies that a one-sided test should be used when there is information that strongly suggests so—compare the estimate calculated from the sample with the hypothesized values.

α (θ)= P (Reject H 0 ∣ θ∈Θ0) and 1−β(θ) = P ( Reject H 0 ∣ θ∈Θ1) , so to plot the power function ϕ(θ) = P ( Reject H 0 ∣ θ∈Θ0 ∪Θ1) it is usually enough to enter θ∈Θ0 in the analytical expression of 1−β(θ). This is the method that we have used in some exercises where the computer has been used. Remark 11ht: For parametric tests,

Remark 12ht: A reasonable testing process should verify that

1−β(θ1 )=P (T 0 ∈Rc ∣ θ∈Θ1) > P (T 0 ∈ Rc ∣ θ∈Θ0 ) = α(θ 0 ) with 1–β(θ1) ≈ α(θ0) when θ1 ≈ θ0. This can be noticed in the power functions plotted in some exercises, where there is a local minimum at θ0. Remark 13ht: Since one-sided tests are, in its range of parameter values, more powerful than the corresponding two-sided test, the best way of testing an equality consists in accepting it when it is compared with the two types of inequality. Similarly, the best way 94

Solved Exercises and Problems of Statistical Inference

to test an inequality consists in accepting it when it is allocated either in the null hypothesis or in the alternative hypothesis. (This ideas, among others, are rigurously explained in the materials of professor Alfonso Novales Cinca.)

[HT-p] Based on T Exercise 1ht-T The lifetime of a machine (measured in years, y) follows a normal distribution with variance equal to 4y 2. A simple random sample of size 100 yields a sample mean equal to 1.3y. Test the null hypothesis that the population mean is equal to 1.5y, by applying a two-tailed test with 5 percent significance level. What is the type I error? Calculate the type II error when the population mean is 2y. Find the general expression of the type II error and then use a computer to plot the power function.

Discussion: First of all, the supposition that the normal distribution reasonably explains the lifetime of the machine should be evaluated by using proper statistical techniques. Nevertheless, the purpose of this exercise is basically to apply the decision-making methodologies.

Statistic: Since • •

There is one normal population The population variance is known

the statistic T ( X ; μ)=

̄ −μ X



σ2 n

∼ N (0,1)

is selected from a table of statistics (e.g. in [T]). Two particular cases of T will be used: ̄ −μ1 X̄ −μ 0 X and T 0 ( X )= ∼ N (0,1) T ( X )= ∼ N (0,1) 1 σ2 σ2 n n





To apply any of the two methodologies, the value of T0 at the specific sample x = (x1,...,x100) is necessary: T 0 ( x)=

̄x −μ 0



2

σ n

=

1.3−1.5 −0.2⋅10 = =−1 2 4 100



Hypotheses: The two-tailed test is determined by H 0 : μ = μ 0 = 1.5

and

H 1 : μ = μ 1 ≠ 1.5

For these hypotheses

95

Solved Exercises and Problems of Statistical Inference

Decision: To make the final decision about the hypotheses, two main methodologies are available. To apply the first one, the critical values a1 and a2 that determine the rejection region are found by applying the definition of type I error, with α = 0.05 at μ0 = 1.5, and the criterion of leaving half the probability in each tail: α (1.5) = P (Type I error )= P (Reject H 0 ∣ H 0 true)= P (T ( X ; μ)∈Rc ∣ H 0 ) = P ( {T 0 ( X ) a 2 })

{

α (1.5) = P(T 0 ( X )< a1) → a1=l α / 2=−1.96 2 α (1.5) =P (T 0 (X )>a 2 ) → a2=r α/ 2=+1.96 2

→ Rc ={T 0 ( X )+1.96 }={∣T 0 ( X )∣>+1.96 } The decision is: T 0 ( x)=−1 → T 0 ( x)∉ Rc → H0 is not rejected. The second methodology is based on the calculation of the p-value: pV =P ( X more rejecting than x ∣ H 0 true)=P (∣T 0 ( X )∣>∣T 0 ( x )∣) =P (∣T 0 ( X )∣>∣−1∣)=2⋅P (T 0 (X ) 0.05=α → H0 is not rejected.

Type II error: To calculate β, we have to work under H1, that is, with T1. Nonetheless, the critical region is expressed in terms of T0. Thus, the mathematical trick of adding and subtracting the same quantity is applied: β(μ 1) = P(Type II error) = P ( Accept H 0 ∣ H 1 true) = P (T 0 ( X )∉ Rc ∣ H 1 )= P (∣T 0 ( X )∣≤1.96 ∣ H 1) ̄ −μ 0 X

(

= P (−1.96≤T 0 ( X )≤+1.96 ∣ H 1 ) = P −1.96≤

(

= P −1.96≤

̄ −μ 1 +μ 1−μ 0 X





∣) (

σ2 n

∣)

≤+1.96 H 1

≤+1.96 H 1 = P −1.96−

μ 1−μ 0



σ2 σ2 n n μ −μ μ −μ = P T 1 ( X )≤+1.96− 1 0 − P T 1 (X ) pnorm(-0.54,0,1)-pnorm(-4.46,0,1) [1] 0.2945944

β(2) = P ( T 1 ( X )≤−0.54 )− P ( T 1 ( X ) 2 or H 1 : σ=σ 1 ≠ 2 , which one would you have selected? Why? Hint: Be careful to use S2 and σ2 wherever you work with a variance instead of a standard deviation. (Based on an exercise of Statistics for Business and Economics. Newbold, P., W.L. Carlson and B.M. Thorne. Pearson.)

LINGUISTIC NOTE (From: Longman Dictionary of Common Errors. Turton, N.D., and J.B. Heaton. Longman.) actual = real (as opposed what is believed, planned or expected): 'People think he is over fifty but his actual age is forty-eight.' 'Although buses are supposed to run every fifteen minutes, the actual waiting time can be up to an hour.' present/current = happening or existing now: 'No one can drive that car in its present condition.' 'Her current boyfriend works for Shell.' LINGUISTIC NOTE (From: Common Errors in English Usage. Brians, P. William, James & Co.) “Device” is a noun. A can-opener is a device. “Devise” is a verb. You can devise a plan for opening a can with a sharp rock instead. Only in law is “devise” properly used as a noun, meaning something deeded in a will.

97

Solved Exercises and Problems of Statistical Inference

Discussion: Because of the mathematical theorems available, we are able to study the variance only for normally distributed random variables. Thus, we need the supposition that the temperature follows a normal distribution. In practice, this normality should be evaluated.

Statistic: We know that • •

There is one normal population The population mean is unknown

and hence the following (dimensionless) statistic, involving the sample quasivariance, is chosen T ( X ; σ)=

(n−1) S 2 ∼ χ 2n−1 2 σ

We will work with the two following particular cases: ( n−1) S 2 T 0 ( X )= ∼ χ 2n−1 2 σ0

and

(n−1) S 2 T 1 ( X )= ∼ χ2n−1 2 σ1

To make the decision, we need to evaluate the statistic T0 at the specific data available x: T 0 ( x)=

(20−1) 2.39 2 F 2 =27.13 22 F 2

Hypothesis test Hypotheses:

H 0 : σ 2 = σ 20 ≤ 22 and

H 1 : σ 2 = σ 21 > 22

Then,

Decision: To determine the rejection region, under H0, the critical value a is found by applying the definition of type I error, with α = 0.05 at σ02 = 4ºF2 :

α (4) = P (Type I error ) = P ( Reject H 0 ∣ H 0 true)= P (T ( X ;θ)∈ Rc ∣ H 0 ) = P (T 0 (X )>a) → a=r α=r 0.05=30.14 → Rc = {T 0 ( X )>30.14 } To make the final decision: T 0 ( x)=27.13 < 30.14 → T 0 ( x)∉ Rc → H0 is not rejected. The second methodology requires the calculation of the p-value: pV =P ( X more rejecting than x | H 0 true)=P (T 0 ( X )> T 0 ( x))=P (T 0 ( X )>27.13)=0.102

→ pV =0.102> 0.05=α → H0 is not rejected.

> 1 - pchisq(27.13, 20-1) [1] 0.1016613

Type II error: To calculate β, we have to work under H1, that is, with T1. Since the critical region is already expressed in terms of T0, the mathematical trick of multiplying and dividing by same quantity is applied: β(σ12) = P (Type II error ) = P ( Accept H 0 | H 1 true) = P (T 0 ( X )∉ Rc | H 1 ) = P (T 0 ( X )≤30.14 | H 1 ) 98

Solved Exercises and Problems of Statistical Inference

=P

(

| ) (

2 30.14⋅σ20 n s2 n s2 σ 1 ≤30.14 H = P ≤30.14 H = P T (X )≤ 1 1 1 σ 20 σ12 σ 20 σ12

| ) (

)

For the particular value σ12 = 4.5ºF2,

(

β(4.5) = P T 1 ( X )≤

30.14⋅4 = P ( T 1 ( X )≤26.79 ) = 0.89 4.5

)

> pchisq(26.79, 20-1) [1] 0.8903596

By using a computer, many other values σ12 ≠ 4.5ºF2 can be considered so as to numerically determine the power curve 1–β(σ12) of the test and to plot the power function. ϕ(σ 2 ) = P ( Reject H 0) =

{

α( σ2 ) if σ ∈Θ0 1−β(σ 2) if σ∈Θ1

# Sample and inference n = 20 alpha = 0.05 theta0 = 4

# Value under the null hypothesis H0

q = qchisq(1-alpha,n-1) theta1 = seq(from=4,to=15,0.01) paramSpace = sort(unique(c(theta1,theta0))) PowerFunction = 1 - pchisq(q*theta0/paramSpace, n-1) plot(paramSpace, PowerFunction, xlab='Theta', ylab='Probability of rejecting theta0', main='Power Function', type='l')

Conclusion: The null hypothesis H 0 : σ=σ 0 ≤ 2 is not rejected. When any of these factors is different, the decision might be the opposite. As regards the most appropriate alternative hypothesis, the value of S suggests that the test with σ1 > 2 is more powerful than the test with σ1 ≠ 2 (the test with σ1 < 2 against the equality would be the least powerful as both the methodologies—H0 is the default hypothesis—and the data “tend to help H0”). (Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.) My notes:

Exercise 3ht-T Let X = (X1,...,Xn) be a simple random sample with 25 data taken from a normal population variable X. The sample information is summarized in 99

Solved Exercises and Problems of Statistical Inference

25

∑ j=1 x j=105

and

25

∑ j=1 x 2j=579.24

(a) Should the hypothesis H0: σ2 = 4 be rejected when H1: σ2 > 4 and α = 0.05? Calculate β(5). (b) And when H1: σ2 ≠ 4 and α = 0.05? Calculate β(5). Use a computer to plot the power function.

Discussion: The supposition that the normal distribution is appropriate to model X should be statistically proved. This statement is theoretical.

Statistic: We know that • •

There is one normal population The population mean is unknown

and hence the following statistic is selected T ( X ; σ)=

n s2 ∼ χ 2n−1 2 σ

We will work with the two following particular cases: T 0 ( X )=

n s2 ∼ χ 2n−1 2 σ0

and

T 1 ( X )=

n s2 ∼ χ2n−1 2 σ1

To make the decision, we need to evaluate the statistic at the specific data available x: 25 T 0 ( x)=

[

1 1 x 2j − ∑ ∑ xk 25 25 4

(

) ] = 25⋅5.53 =34.56 2

where to calculate the sample variance, the general property s 2=

4

1 n 1 n 2 X − ∑ ∑ X n j =1 j n j=1

(

j

)

2

has been used.

(a) One-tailed alternative hypothesis Hypotheses:

2 2 H 0 : σ = σ 0 = 4 and

2

2

H 1: σ = σ1 > 4

For these hypotheses,

Decision: To determine the rejection region, under H0, the critical value a is found by applying the definition of type I error, with α = 0.05 at σ02 = 4:

α (4) = P (Type I error ) = P ( Reject H 0 ∣ H 0 true)= P (T ( X ;θ)∈ Rc ∣ H 0 ) = P (T 0 (X )>a)

100

Solved Exercises and Problems of Statistical Inference

→ a=r α=r 0.05=36.4 → Rc = {T 0 ( X )>36.4 } To make the final decision: T 0 ( x)=34.56 < 36.4 → T 0 ( x)∉ Rc → H0 is not rejected. The second methodology requires the calculation of the p-value:

pV =P ( X more rejecting than x ∣ H 0 true)=P (T 0 ( X )>T 0 ( x))=P (T 0 ( X )>34.56)=0.075 > 1 - pchisq(34.56, 25-1) [1] 0.07519706

→ pV =0.075> 0.05=α → H0 is not rejected.

Type II error: To calculate β, we have to work under H1, that is, with T1. Since the critical region is expressed in terms of T0, the mathematical trick of multiplying and dividing by same quantity is applied: β(σ12) = P (Type II error ) = P ( Accept H 0 ∣ H 1 true)= P (T 0 ( X )∉R c ∣ H 1) = P (T 0 ( X )≤36.4 ∣ H 1)

∣ ) (

2 36.4⋅σ20 n s2 n s2 σ 1 =P ≤36.4 H 1 = P ≤36.4 H 1 = P T 1 (X )≤ σ 20 σ12 σ 20 σ12

∣ ) (

(

)

For the particular value σ12 = 5,

(

β(5) = P T 1 ( X )≤

36.4⋅4 = P ( T 1 ( X )≤29.12 ) = 0.78 5

)

> pchisq(29.12, 25-1) [1] 0.7843527

By using a computer, many other values σ12 ≠ 5 can be considered so as to numerically determine the power curve 1–β(σ12) of the test and to plot the power function. ϕ(σ 2 ) = P ( Reject H 0) =

{

α( σ2 ) if σ ∈Θ0 1−β(σ 2) if σ∈Θ1

# Sample and inference n = 25 alpha = 0.05 theta0 = 4

# Value under the null hypothesis H0

q = qchisq(1-alpha,n-1) theta1 = seq(from=4,to=15,0.01) paramSpace = sort(unique(c(theta1,theta0))) PowerFunction = 1 - pchisq(q*theta0/paramSpace, n-1) plot(paramSpace, PowerFunction, xlab='Theta', ylab='Probability of rejecting theta0', main='Power Function', type='l')

(b) Two-tailed alternative hypothesis Hypotheses:

2 2 H 0 : σ = σ 0 = 4 and

101

2

2

H 1: σ = σ1 ≠ 4

Solved Exercises and Problems of Statistical Inference

For these hypotheses,

Decision: Now there are two tails, determined by two critical values a1 and a2 that are found by applying the definition of type I error, with α = 0.05 at σ02 = 4, and the criterion of leaving half the probability in each tail: α (4)= P(Type I error )=P ( Reject H 0 ∣ H 0 true)= P(T ( X ; θ)∈R c ∣ H 0 )=P (T 0 ( X )
a 2 ) We always consider two tails with the same probability,

{

α (4) =P (T 0 ( X )< a1) → a1=r 1−α/ 2=12.4 2 α (4) =P (T 0 ( X )>a 2) → a 2=r α / 2=39.4 2



Rc ={T 0 ( X ) 39.4 }

To make the final decision: T 0 ( x)=34.56 → T 0 ( x)∉Rc → H0 is not rejected To base the decision on the p-value, we calculate twice the probability of the tail: pV =P ( X more rejecting than x ∣ H 0 true)=2⋅P (T 0 ( X )> T 0 (x )) =2⋅P (T 0 ( X )>34.56)=2⋅0.075=0.15

> 1 - pchisq(34.56, 25-1) [1] 0.07519706

→ pV =0.15> 0.05=α → H0 is not rejected

Note: The wrong tail would have been selected if we had obtained a p-value bigger than 1.

Type II error: To calculate β, β(σ12) = P (Type II error ) = P ( Accept H 0 ∣ H 1 true) = 1−P (T ( X ; θ)∈ Rc ∣ H 1)

[(

= 1−P ({T 0 ( X )< 12.4 }∪{T 0 ( X )>39.4 }| H 1 ) = 1− P

[(

= 1− P

2

∣ ) (

2

| ) (

| )]

n s2 n s2 39.4 H 1 1 σ 20 σ02

∣ )]

n s2 12.4⋅σ 0 n s 2 39.4⋅σ0 < H +1−P ≤ H1 1 σ 21 σ 21 σ 21 σ 21

(

2

| ) (

2

| ) (

2

) (

2

39.4⋅σ0 12.4⋅σ0 n s 2 12.4⋅σ 0 n s 2 39.4⋅σ0 =−P < H + P ≤ H = P T ( X )≤ −P T ( X )< 1 1 1 1 σ 21 σ21 σ 21 σ12 σ21 σ12

)

For the particular value σ12 = 5, β(5) = P ( T 1 ( X )≤31.52 ) −P ( T 1 ( X )< 9.92 ) = 0.86−0.0051 = 0.85

> pchisq(c(9.92, 31.52), 25-1) [1] 0.00513123 0.86065162

Again, the computer allows the power function to be plotted. # Sample and inference n = 25 alpha = 0.05 theta0 = 4

# Value under the null hypothesis H0

q = qchisq(c(alpha/2,1-alpha/2),25-1)

102

Solved Exercises and Problems of Statistical Inference

theta1 = seq(from=0,to=15,0.01) paramSpace = sort(unique(c(theta1,theta0))) PowerFunction = 1 - pchisq(q[2]*theta0/paramSpace, n-1) + pchisq(q[1]*theta0/paramSpace, n-1) plot(paramSpace, PowerFunction, xlab='Theta', ylab='Probability of rejecting theta0', main='Power Function', type='l')

Comparison of the power functions: For the one-tailed test, the power of the test at σ12 = 5 is 1–β(5) = 1–0.78 = 0.22, while for the two-tailed test it is 1–β(5) = 1–0.85 = 0.15. As expected, this latter test has smaller power (higher type II error), since in the former test additional information is being used when one tail is previously discarded. Now we compare the power functions of the two tests graphically, for the common values (> 4), by using the code # Sample and inference n = 25 alpha = 0.05 theta0 = 4 # Value under the null hypothesis H0 q = qchisq(c(alpha/2,1-alpha/2),25-1) theta1 = seq(from=0,to=15,0.01) paramSpace1 = sort(unique(c(theta1,theta0))) PowerFunction1 = 1 - pchisq(q[2]*theta0/paramSpace1, n-1) + pchisq(q[1]*theta0/paramSpace1, n-1) q = qchisq(1-alpha,n-1) theta1 = seq(from=4,to=15,0.01) paramSpace2 = sort(unique(c(theta1,theta0))) PowerFunction2 = 1 - pchisq(q*theta0/paramSpace2, n-1) plot(paramSpace1, PowerFunction1, xlim=c(0,15), xlab='Theta', ylab='Probability of rejecting theta0', main='Power Function', type='l') lines(paramSpace2, PowerFunction2, lty=2)

It can be noticed that the curve of the one-sided test is over the curve of the two-sided test for any σ2 > 4, 103

Solved Exercises and Problems of Statistical Inference

which makes it uniformly more powerful. In this exercise, from the sample information we could have calculated the estimator S2 of σ2 so as to see if its value is far from 4 and therefore one of the two one-sided tests should be considered better.

Conclusion: The hypothesis that the population variance is equal to 4 is not rejected in either of the two sections. Although it has not happened in this case, different decisions may be made for the one- and two-tailed cases. (Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.) My notes:

Exercise 4ht-T Imagine that you are hired as a cook. Not an ordinary one but a “statistical cook.” For a normal population, in testing the two hypotheses 2 2 H 0 : σ = σ0 =4 H 1 : σ 2 = σ21 >4

{

the data (sample x of size n = 11 such that S2=7.6u2) and the significance (α=0.05) have led to rejecting the null hypothesis because

1−α

r 0.05=18.31

T 0 ( x)=19

where T0 is the usual statistic. A decision depends on several factors:  Methodology  Statistic T0  Form of the alternative hypothesis H1  Significance α  Data x

(edu.glogster.com/)

Since the chef—your boss—wants the null hypothesis H0 not to be rejected, find three different ways to scientifically make the opposite decision by changing any of the previous factors. Give qualitative explanations and, if possible, quantitative ones.

Discussion: Metaphorically, Statistics can be thought of as the kitchen with its utensils and appliances, the first two factors as the recipe, and the next three items as the ingredients—if H1, α or x are inappropriate, there is little to do and it does not matter how good the kitchen, the recipe and you are. Our statistical knowledge allows us to change only the last three elements. The statistic to study the variance of a normal population is T ( X )=

(n−1) S 2 ∼ χ2n −1 2 σ

104

so, under H0, T 0 ( x)=

(n−1) S 2 (11−1)7.6 u2 76 = = =19. 4 σ 20 4 u2

Solved Exercises and Problems of Statistical Inference

Qualitative reasoning: By looking at the figure above, we consider that: A) If a two-tailed test is considered (H1: σ2 = σ12 ≠ 4), the critical value would be r α / 2 instead of r α and, then, the evaluation T 0 (x) may not lie in the rejection region (tails). B) Equivalently, for the original one-tailed test, the critical value r α increases when the significance α decreases, perhaps with the same implication as in the previous item. C) Finally, for the same one-sided alternative hypothesis and significance, that is, for the same critical value r α , the evaluation T 0 (x) would lie out ot the critical region (tail) if the data x—the values themselves or only the sample size—are such that T 0 (x) < r α=18.31 . D) Additionally, a fourth way could consist of some combinations of the previous ways.

Quantitative reasoning: The previous qualitative explanations can be supported with calculations. A) For the two-tailed test, now the critical value would be r 0.05 /2=r 0.025=20.48 . Then T 0 ( x)=19 < 20.48=r 0.025 → T 0 ( x)∉Rc → H0 is not rejected.

B) The same effect is obtained if, for the original one-tailed H1, the significance is taken to be 0.025 instead of 0.05. Any other value smaller than 0.025 would lead to the same result. Is 0.025—suggested by the previous item—the smallest possible value? The answer is made by using the p-value, since it is sometimes defined as the smallest significance level at which the null hypothesis is rejected. Then, since pV =P ( X more rejecting than x | H 0 true)=P (T 0 ( X )> 19)=0.0403

> 1 - pchisq(19, 11-1) [1] 0.04026268

for any α < 0.0403 it would hold that 0.0403= pV > α → H0 is not rejected C) Finally, for the original test and the same value for n, since ~ 2 ~2 (n−1) S 2 ~ S 2 (n−1) S S ~ T 0 ( x)= = 2 = 19 < 18.31=r α 2 2 σ0 S σ0 7.6 u 2 the opposite decision would be made for any sample quasivariance such that 2

7.6 u ~2 2 S < 18.31 =7.324 u → T 0 ( x)∉ Rc → H0 is not rejected 19 On the other hand, for the original test and the same value for S, since 2 2 (~ n −1) S ( ~ n −1) ( n−1) S (~ n −1) T~0 ( x)= = = 19 < 18.31=r α 2 2 (n−1) (11−1) σ0 σ0 the opposite decision would be made for any sample size such that (11−1) ~ n ≤ 10 → T 0 ( x)∉ Rc → H0 is not rejected n < 18.31 +1=10.63684 ↔ ~ 19 D) Some combinations can easily be proved to lead to rejecting H0.

Conclusion: This exercise highlights how much careful one must be in either writing or reading statistical works. My notes:

105

Solved Exercises and Problems of Statistical Inference

Exercise 5ht-T The distribution of a variable is supposed to be normally distributed in two independent biological populations. The two population variances must be compared. After gathering information through simple random samples of sizes nX = 11, nY = 10, respectively, we are given the value of the estimators 2

SX=

n 1 2 ( x j− ¯x ) =6.8 ∑ j=1 n X −1

2

X

sY =

n 1 2 ( y j− ¯y ) =7.1 ∑ j=1 nY Y

For α = 0.1, test: (a) H0: σX = σY against H1: σX < σY (b) H0: σX = σY against H1: σX > σY (c) H0: σX = σY against H1: σX ≠ σY In each section, calculate the analytical expression of the type II error and plot the power function by using a computer.

Discussion: In a real-world situation, suppositions should be proved. We must pay careful attention to the details: the sample quasivariance is provided for one group, while the sample variance is given for the other.

Statistic: From the information in the statement, • •

There are two independent normal populations The population means are unknown

the statistic T ( X , Y ; σ X , σ Y )=

S 2X σ2X S 2Y

=

S 2X σ2Y S 2Y σ 2X

∼ Fn

X

−1 ,nY −1

σ2Y is selected from a table of statistics (e.g. in [T]). It will be used in two forms (we can write σX2/ σY2 = θ1):

T 0 ( X ,Y )=

S 2X σ 2X S 2Y

2

=

S 2X S 2Y

∼ Fn

X

−1 ,nY −1

and

σ2Y

T 1 ( X , Y )=

SX 2 θ 1⋅σY S 2Y

2

1 SX = ∼ Fn θ1 S 2 Y

X

−1 , nY −1

σ2Y

(On the other hand, the pooled sample variance Sp2 should not be considered even under H0: σX = σ = σY, as T 0=( S 2p /S 2p )=1 whatever the data are.) To apply any of the two methodologies we need to evaluate T0 at the samples x and y: 2

T 0 ( x , y )=

2

SX SX 6.8 = = =0.86 2 nY 2 10 SY 7.1 s nY −1 Y 10−1

Since we were given the sample quasivariance of population X, but the sample variance of population Y, the general property n s 2 = (n−1) S 2 has been used to calculate SY2.

106

Solved Exercises and Problems of Statistical Inference

(a) One-tailed alternative hypothesis σX < σY H 0 : σ 2X =σ 2Y

Hypotheses:

and

σ 2X Or, equivalently, H 0 : 2 = θ0 = 1 σY

H 1 : σ 2X < σ2Y and

σ2X H 1 : 2 = θ1 < 1 σY

For these hypotheses,

Decision: To determine the critical region, under H0, the critical value a is found by applying the definition of type I error, with α = 0.1 at θ0 = 1: α (1)= P (Type I error )=P (Reject H 0 ∣ H 0 true)= P (T ( X , Y )
T 0 ( X ,Y ) a

)



1 =2.35 a

(From the definition of the F distribution, it is easy to see that if X follows a Fk1,k2 then 1/X follows a Fk2,k1. We use this property to consult our table.)

1 =0.43 → Rc = {T 0 ( X , Y )< 0.43} 2.35

To make the final decision about the hypotheses:

T 0 ( x , y )=0.86 → T 0 ( x)∉ Rc → H0 is not rejected. The second methodology requires the calculation of the p-value:

pV =P ( X ,Y more rejecting than x , y ∣ H 0 true) =P (T 0 (X , Y ) pf(0.86, 11-1, 10-1) [1] 0.406005

→ pV =0.41> 0.1=α → H0 is not rejected.

Power function: To calculate β, we have to work under H1, that is, with T1. Since in this case the critical region is already expressed in terms of T0, the mathematical trick of multiplying and dividing by the same quantity is applied:

β(θ1 ) = P (Type II error) = P( Accept H 0 ∣ H 1 true) = P (T 0 ( X )∉ Rc ∣ H 1 ) = P (T 0 ( X )≥0.43 ∣ H 1 ) =P

(

2

SX

∣ ) (

2

∣ )

1 S 1 0.43 0.43 ≥0.43 H 1 = P θ X2 ≥ θ 0.43 H 1 = P T 1 ( X )≥ θ = 1−P T 1 ( X )< θ 2 1 1 1 1 SY SY

(

)

(

)

By using a computer, many values θ1 can be considered so as to determine the power curve 1–β(θ1) of the test and to plot the power function. ϕ(θ) = P ( Reject H 0 ) = α (θ) if θ∈Θ0 1−β(θ) if θ ∈Θ1

{

# Sample and inference nx = 11;

ny = 10

alpha = 0.1 theta0 = 1 q = qf(alpha,nx-1,ny-1) theta1 = seq(from=0,to=1,0.01) paramSpace = sort(unique(c(theta1,theta0))) PowerFunction = pf(q/paramSpace, nx-1, ny-1) plot(paramSpace, PowerFunction, xlab='Theta', ylab='Probability of rejecting theta0', main='Power Function', type='l')

107

Solved Exercises and Problems of Statistical Inference

(b) One-tailed alternative hypothesis σX > σY H 0 : σ 2X =σY2

Hypotheses:

and

H 1 : σ 2X > σ2Y

2

σX Or, equivalently, H 0 : 2 = θ0 = 1 σY

2

and

σ H 1 : X2 = θ1 > 1 σY

For these hypotheses,

Decision: To apply the methodology based on the rejection region, the critical value a is found by applying the definition of type I error, with α = 0.1 at θ0 = 1: α (1)= P (Type I error )=P ( Reject H 0 ∣ H 0 true)= P (T ( X , Y )>a ∣ H 0 )=P (T 0 ( X , Y )> a) → a=r α=2.42 → Rc = {T 0 ( X , Y )> 2.42 } The final decision is: T 0 ( x , y )=0.86 → T 0 ( x)∉ Rc → H0 is not rejected. The second methodology requires the calculation of the p-value: pV =P ( X ,Y more rejecting than x , y | H 0 true)= P(T 0 ( X , Y )>T 0 ( x , y )) =P (T 0 (X , Y )> 0.86)= 1−0.41=0.59

→ pV =0.59> 0.1=α → H0 is not rejected.

> pf(0.86, 11-1, 10-1) [1] 0.406005

Power function: Now β(θ1 )= P (Type II error )= P ( Accept H 0 | H 1 true) = P(T 0 ( X )∉ Rc | H 1) = P (T 0 (X )≤2.42 | H 1) =P

(

2

SX

| ) (

≤2.42 H 1 = P 2

SY

2

| )

1 SX 1 2.42 ≤ 2.42 H 1 = P T 1 ( X )≤ θ 1 S 2 θ1 θ1 Y

(

)

By using a computer, many values θ1 can be considered so as to plot the power function. 108

Solved Exercises and Problems of Statistical Inference

# Sample and inference nx = 11;

ny = 10

alpha = 0.1 theta0 = 1 q = qf(1-alpha,nx-1,ny-1) theta1 = seq(from=1,to=15,0.01) paramSpace = sort(unique(c(theta1,theta0))) PowerFunction = 1 - pf(q/paramSpace, nx-1, ny-1) plot(paramSpace, PowerFunction, xlab='Theta', ylab='Probability of rejecting theta0', main='Power Function', type='l')

(c) Two-tailed alternative hypothesis σX ≠ σY Hypotheses:

2

2

H 0 : σ X =σ Y

and

2

2

H 1 : σ X ≠σY

σ 2X Or, equivalently, H 0 : 2 = θ0 = 1 and σY

σ2X H 1 : 2 = θ1 ≠ 1 σY

For these hypotheses,

Decision: For the first methodology, the critical region must be determined by applying the definition of type I error, with α = 0.1 at θ1 = 1, and the criterion of leaving half the probability in each tail: α (1)= P (Type I error )= P( Reject H 0 | H 0 true)=P (T 0 ( X ,Y )
a2 )



{

α (1) = P(T 0 ( X , Y )a 2 ) → a2=r α / 2=3.14 2

→ Rc ={T 0 ( X , Y ) 3.14 }

> qf(c(0.05, 0.95), 11-1, 10-1) [1] 0.3310838 3.1372801

The decision depends on whether the evaluation of T0 is in the rejection region: T 0 ( x , y )=0.86 → T 0 ( x)∉ Rc → H0 is not rejected. 109

Solved Exercises and Problems of Statistical Inference

To apply the methodology based on the p-value, we calculate the median qf(0.5, 11-1, 101)=1.007739; thus, since T(x,y) is in the left-hand tail: pV =P ( X ,Y more rejecting than x , y | H 0 true)=2⋅P (T 0 ( X , Y ) ηR (one-tailed test). Should this hypothesis be written as a null or as an alternative hypothesis? In general, since we fix the type I error in our methodologies, a strong sample evidence is necessary to reject H0. Thus, the decision of allocating the condition to be tested in H0 or H1 depends on our choice (usually on what “making a wrong decision” means or implies for the specific framework we are working in). We are going to solve both cases. From a theoretical point of view, H0: ηI ≥ ηR is essentialy the same as H0: ηI = ηR. As a final remark, in this exercise it holds that 0.53 + 0.47 = 1; this happens just by chance, since these two quantities are independent and can take any value in [0,1]. On the other hand, proportions are always dimensionless.

Statistic: We know that • •

There are two independent Bernoulli populations The sample sizes are larger than 30

so we use the asymptotic result involving two proportions: T ( I , R)=

^ I − η^ R )−(ηI −ηR ) (η



? I (1−? I ) ? R (1−? R ) + nI nR

d

→ N (0,1)

where each ? must be substituted by the best possible information: supposed or estimated. Two particular versions of this statistic will be used: T 0 ( I , R)=

^ I −η ^ R)−θ0 (η



^ R )−θ 1 (η^ I − η

d

η ^ I (1−η^ I ) η ^ (1− η ^ R) + R nI nR

→ N (0,1) and T 1 (I , R)=



η ^ I (1− η ^ I) η ^ (1−η^ R ) + R nI nR

d

→ N (0,1)

To determine the critical region or to calculate the p-value, both under H0, we need the value of the statistic for the particular samples available: (0.53−0.47)−0 T 0 (i , r )= =2.25 0.53(1−0.53) 0.47(1−0.47) + 700 700



1) Question in H0 Hypotheses: If we want to allocate the question in the null hypothesis to reject it only when the data strongly suggest so, H 0 : ηI −ηR = θ 0 ≥ 0 and H 1 : ηI −ηR = θ1 < 0 By looking at the alternative hypothesis, we deduce the form of the critical region:

112

Solved Exercises and Problems of Statistical Inference

The quantity c can be thought of as a margin over θ0 not to exclude cases where ηI – ηR = θ0 = 0 really holds while values slightly smaller than θ0 are due to mere random effects. Decision: To apply the first methodology, the critical value a that determines the rejection region is found by applying the definition of type I error, with the value α = 1 – 0.99 = 0.01 at θ0 = 0: α (0) = P (Type I error) = P( Reject H 0 | H 0 true) = P( T (I , R)∈R c | H 0 )= P (T 0 ( I , R) 0 By looking at the alternative hypothesis, we deduce the form of the critical region:

The quantity c can be thought of as a margin over θ0 not to exclude cases where ηI – ηR = θ0 = 0 really holds while values slightly larger than θ0 are due to mere random effects. Decision: To apply the first methodology, the critical value a is calculated as follows: α (0)= P (Type I error) = P( Reject H 0 | H 0 true) = P( T (I , R)∈R c | H 0 )= P (T 0 ( I , R)>a)

→ a=r 0.01=2.326 → Rc = {T 0 ( I , R)> 2.326 } The decision is: T 0 ( i , r )=2.25 → T 0 (i , r )∉ Rc → H0 is not rejected. The second methodology consists in doing:

114

Solved Exercises and Problems of Statistical Inference

pV =P ( I , R more rejecting than i , r | H 0 true)= P (T 0 (I , R) > T 0 (i , r )) =P (T 0 (I , R) > 2.25)=1−P (T 0 ( I , R) ≤ 2.25)=0.0122 → pV =0.0122 > 0.01=α → H0 is not rejected. Type II error: Finally, to calculate β:

β(θ1 )= P (Type II error) = P( Accept H 0 ∣ H 1 true) = P (T 0 ( I , R)∉ Rc | H 1) = P

=P

(√

(√

(η ^ I − η^ R )+ 0−θ 1 ^ I (1− η ^ I) η ^ ( 1− η ^ R) η + R nI nR

(

= P T 1 ( I , R)≤2.326−

+

(η ^ I −η ^ R)−θ0 ^ I (1− η^ I ) η ^ (1− η ^ R) η + R nI nR θ1



^ I (1− η ^ I) η ^ (1−η^ R ) η + R nI nR θ1



|) |)

≤2.326 H 1

0.53 (1−0.53) 0.47(1−0.47) + 700 700

≤2.326 H 1

)

For the particular value θ1 = 0.1,

(

β(0.1)= P T 1 ( I , R)≤2.326−



0.1 = P ( T 1 ( I , R)≤−1.42 ) =0.078 0.53(1−0.53) 0.47(1−0.47) + 700 700

)

By using a computer, many more values θ1 ≠ 0.1 can be considered so as to numerically determine the power of the test curve 1–β(θ1) and to plot the power function. ϕ(θ) = P ( Reject H 0 ) =

{

α (θ) if θ∈Θ0 1−β(θ) if θ ∈Θ1

# Sample and inference ni = 700; nr = 700 sPi = 0.53; sPr = 0.47 alpha = 0.01 theta0 = 0 # Value under the null hypothesis H0

115

Solved Exercises and Problems of Statistical Inference

q = qnorm(1-alpha,0,1) theta1 = seq(from=0,to=+0.25,0.01) paramSpace = sort(unique(c(theta1,theta0))) PowerFunction = 1 - pnorm(q-paramSpace/sqrt(sPi*(1-sPi)/ni + sPr*(1-sPr)/nr),0,1) plot(paramSpace, PowerFunction, xlab='Theta', ylab='Probability of rejecting theta0', main='Power Function', type='l')

This code generates the figure above.

Conclusion: The hypothesis that the two proportions are equal is not rejected when the question is allocated in either the alternative or the null hypothesis (the best way of testing an equality). That is, it seems that both populations wish to visit Spain with the same desire. The sample information η^ I =0.53 and η^ R =0.47 suggested the alternative hypothesis H1: ηI – ηR > 0. The two power functions show how symmetric the situations are. (Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.)

Advanced theory: Under the hypothesis H0: ηI = η = ηR, it makes sense to try to estimate the common variance η(1–η) of the estimator—in the denominator—as well as possible. This can be done by using the n η^ + n η^ pooled sample proportion η^ p= I I R R . Nevertheless, the pooled estimator should not be considered in n I + nR the numerator, since ( η^ p− η^ p)=0 whatever the data are. Now, the statistic under the null hypothesis is:

T~0 ( I , R)=

(η ^ I − η^ R )−θ0



η ^ p (1− η ^ p ) η^ p (1−η^ p ) + nI nR

= T 0 ( I , R)

Then, η^ p =

√ √

=

( η^ I −η^ R )−θ 0



η^ I (1− η ^ I ) η^ R (1−η^ R ) + nI nR

η^ I (1− η ^ I ) η^ R (1−η^ R ) + nI nR η^ p (1− η ^ p) η ^ (1− η ^ p) + p nI nR

η ^ p (1− η ^ p) η ^ (1−η^ p ) + p nI nR

d

→ N (0,1)

700⋅0.53+ 700⋅0.47 0.53+0.47 1 = = =0.5 → 700+ 700 1+1 2

116

√ √

η ^ I (1−η^ I ) η ^ (1− η ^ R) + R nI nR

√ √

η^ I (1− η ^ I ) η^ R (1−η^ R ) + nI nR η^ p (1− η^ p) η ^ (1− η ^ p) + p nI nR

Solved Exercises and Problems of Statistical Inference

= 0.9981983

~ → T 0 ( i , r )=2.25⋅0.9981983=2.24 .

The same decisions are made with T 0 and T~0 because of the little effect of using η^ p in this exercise (see the value of the quotient of square roots above); in other situations, both ways may lead to paradoxical results. As regards the calculations of the type II error, both the mathematical trick of multiplying and dividing by the same quantity and the mathematical trick of adding and subtracting the same quantity should be applied now. For section (a): β(θ1 ) = P (Type II error )= P ( Accept H 0 | H 1 true) = P ( T~0 ( I , R)∉ Rc | H 1) = P

=P

=P

(√ (√

(√

^ I −η ^ R)−0−θ 1+ θ1 (η η ^ I (1− η^ I ) η ^ (1− η ^ R) + R nI nR (η ^ I−η ^ R)−θ1 ^ I (1− η ^ I) η ^ ( 1− η ^ R) η + R nI nR

(

= P T 1 ( I , R)≥−2.330−

(η ^ I− η ^ R)−θ0 ^ p ( 1−η^ p ) η^ p (1−η^ p) η + nI nR

√ √

≥−2.325⋅

+

η^ p (1− η ^ p) η ^ (1− η ^ p) + p nI nR η^ I (1− η ^ I ) η^ R (1−η^ R ) + nI nR θ1



|)

≥−2.325 H 1

^ I (1− η ^ I) η ^ (1−η^ R ) η + R nI nR

|)

≥−2.325⋅1.002 H 1

θ1



|) H1

0.53(1−0.53) 0.47(1−0.47) + 700 700

)

For the particular value θ1 = –0.1,

(

β(−0.1) = P T 1( I , R)≥−2.330−



)

−0.1 = P ( T 1( I , R)≥1.41 )=0.079 . 0.53(1−0.53) 0.47(1−0.47) + 700 700

Similarly for section (b). My notes:

[HT-p] Based on Λ Exercise 1ht-Λ A random quantity X follows a Poisson distribution. Let X = (X1,...,Xn) be a simple random sample. By applying the results involving Neyman-Pearson's lemma and the likelihood ratio, study the critical region (estimator that arises and form) for the following pairs of hypotheses.

117

Solved Exercises and Problems of Statistical Inference

{

{

H 0: λ = λ0 H 1: λ = λ1

H 0 : λ = λ0 H 1 : λ = λ1 > λ0

{

H 0 : λ = λ0 H 1 : λ = λ1 < λ0

{

H 0 : λ ≤ λ0 H 1 : λ = λ 1> λ0

{

H 0 : λ ≥ λ0 H 1 : λ = λ 1< λ 0

Discussion: This is a theoretical exercise where no assumption should be evaluated. First of all, Neyman-Pearson's lemma will be applied. We expect the maximum-likelihood estimator of the parameter—calculated in a previous exercise—and the “usual” critical region form to appear. If the critical region does not depend on any particular value θ1, the uniformly most powerful test will have been found.

Poisson distribution: X ~ Pois(λ) For the Poisson distribution,

Identification of the variable:

{

Hypothesis test

X ~ Pois(λ)

H 0: λ = λ0 H 1: λ = λ1

Likelihood function and likelihood ratio: n

L( X ; λ)=

λ

∑ j=1 X j

n

∏ j=1 X j !

e−n λ

L ( X ; λ 0) λ0 ∑ Λ ( X ; λ 0 , λ1 ) = = L( X ; λ 1) λ1

( )

and

n j=1

X

j

e−n(λ −λ ) 0

1

Rejection region: Rc = { Λ < k } =

{(

{( ∑

j

=

n j=1

λ0 ∑ λ1

n

X

}

( ) } λ λ ¯ ⋅log X )⋅log ( ) < log (k )+ n( λ −λ ) = n X ( λ } { λ ) < log (k )+n (λ −λ )} )

j=1

j

−n(λ 0−λ 1)

e

λ 0 then log

{ {

} }

log(k )+n (λ 0−λ 1) λ0 ̄ < > 0 and hence Rc = X λ1 λ n log 0 λ1

( )

( )

log(k )+n (λ 0−λ 1) λ0 ̄ > < 0 and hence Rc = X λ1 λ n log 0 λ1

( )

( )

̄ =λ̂ ML (calculated in a previous exercise) and regions of the form This suggests the estimator X Rc = {Λ< k } = ⋯= { λ̂ ML c }= ⋯= {T 0 > a } Hypothesis tests

{

H 0 : λ = λ0 H 1 : λ = λ 1> λ 0

{

H 0 : λ = λ0 H 1 : λ = λ 1< λ 0

In applying the methodologies, and given α, the same critical value c or a will be obtained for any λ1 since it only depends upon λ0 through λ^ ML or T0: 118

Solved Exercises and Problems of Statistical Inference

or

α=P (Type I error)= P (T 0 a)

This implies that the uniformly most powerful test has been found.

{

Hypothesis tests

{

H 0 : λ ≤ λ0 H 1 : λ = λ 1> λ 0

H 0 : λ ≥ λ0 H 1 : λ = λ 1< λ 0

A uniformly most powerful test for H 0 : λ = λ0 is also uniformly most powerful for H 0 : λ ≤ λ0 .

Exponential distribution: For the exponential distribution,

Identification of the variable: Hypothesis test

{

X ~ Exp(λ)

H 0: λ = λ0 H 1: λ = λ1

Likelihood function and likelihood ratio: n

n −λ ∑ j=1 X

L( X ; λ) = λ e

and

j

L ( X ; λ 0) λ0 n −(λ −λ )∑ Λ ( X ; λ 0 , λ1 ) = = e L( X ; λ 1) λ1

( )

0

1

n j=1

X

j

Rejection region: Rc = { Λ < k } =

{(

λ0 n −(λ −λ )∑ e λ1

)

n

0

1

j=1

X

{

j

}{

< k = n log

( λλ )−(λ −λ ) ∑

n

0 1

0

1

j=1

X j < log(k )

( )} {

}

( )}

n λ λ ¯ < log( k )−n log 0 = (λ1−λ 0) ∑ j=1 X j < log(k )−n log λ 0 = (λ 1−λ 0) n X λ1 1

Now it is necessary that λ 1≠λ 0 and





{

̄ > if λ 1< λ 0 then (λ 1−λ 0 )< 0 and Rc = X

{

λ log (k )−n log λ 0 1

}{

( )= 1
λ 0 then (λ 1−λ 0 )> 0 and Rc = X

λ0

}{

̄ X

(λ ) = 1 > 1

n(λ 1−λ0 )

̄ X

λ0 λ1

}

λ0 λ1

}

n (λ 1−λ 0) log (k )−n log

( )

n (λ 1−λ 0) log(k )−n log

( )

1 ̂ =λ ML (calculated in a previous exercise) and regions of the form X̄ Rc = {Λ< k } = ⋯= { λ̂ ML c }= ⋯= {T 0 > a }

This suggests the estimator

Hypothesis tests

{

H 0 : λ = λ0 H 1 : λ = λ 1> λ 0

{

H 0 : λ = λ0 H 1 : λ = λ 1< λ 0

In applying the methodologies, and given α, the same critical value c or a will be obtained for any λ1 since it 119

Solved Exercises and Problems of Statistical Inference

only depends upon λ0 through λ^ ML or T0:

α=P (Type I error)= P (T 0 a)

This implies that the uniformly most powerful test has been found.

{

Hypothesis tests

{

H 0 : λ ≤ λ0 H 1 : λ = λ 1>λ 0

H 0 : λ ≥ λ0 H 1 : λ = λ 1 0 and Rc = X η1( 1−η0)

)

η0 (1−η1) ̄ > a }

{

Hypothesis tests

{

H 0 : η = η0 H 1 : η = η1 >η0

H 0 : η= η0 H 1 : η= η1η0

H 0 : η≥ η0 H 1 : η= η1μ 0

{

n log(k ) σ2 + (μ 20−μ21 ) 2 ̄ > then (μ 0−μ 1)a } Hypothesis tests

{

H 0 : μ = μ0 H 1 : μ = μ1 >μ 0

{

H 0 : μ = μ0 H 1 : μ = μ 1μ 0

{

H 0 : μ ≥ μ0 H 1 : μ = μ 1a) → a=r α=6.359 → Rc = {T 0 ( X SA , X FO , X NY ) > 6.359 }

> qf(0.99, 2, 15) [1] 6.358873

Decision: Finally, it is necessary to check if this region “suggested by H0” is compatible with the value that the data provide for the statistic. If they are not compatible because the value seems extreme when the hypotheses is true, we will trust the data and reject the hypothesis H0. Since

T 0 ( x SA , x FO , x NY )=6.97 > 6.359 → T 0 ( x)∈Rc → H0 is rejected.

The second methodology is based on the calculation of the p-value: pV =P (( X SA , X FO , X NY ) more rejecting than (x SA , x FO , x NY ) ∣ H 0 true) =P (T 0 ( X SA , X FO , X NY )>T 0 ( x SA , x FO , x NY ))=P (T 0 >6.97)=0.0072 → pV =0.007243< 0.01=α → H0 is rejected.

> 1-pf(6.97, 2, 15) [1] 0.007235116

Conclusion: As suggested by the sample means, the population means of the three magazines are not equal with a confidence of 0.99, measured in a 0-to-1 scale. Pairwise comparisons could be applied to identify the differences.

Code to apply the analysis “semimanually” We have not done the calculations by hand but using the programming language R. The code is: # To enter the three samples SA = c(15.75, 11.55, 11.16, 9.92, 9.23, 8.20) FO = c(12.63, 11.46, 10.77, 9.93, 9.87, 9.42) NY = c(9.27, 8.28, 8.15, 6.37, 6.37, 5.66) # To join the samples in a unique vector Data = c(SA, FO, NY) # To calculate the sample mean of the three groups and the total sample mean mean(SA) ; mean(FO) ; mean(NY) ; mean(Data) # To calculate the measures and the statistic (for large datasets, the previous means should have been saved) SSG = 6*((mean(SA) - mean(Data))^2) + 6*((mean(FO) - mean(Data))^2) + 6*((mean(NY) - mean(Data))^2) MSG = SSG/(3-1) SSW = sum((SA - mean(SA))^2) + sum((FO - mean(FO))^2) + sum((NY - mean(NY))^2) MSW = SSW/(18-3) T0 = MSG/MSW # To find the quantile 'a' that determines the critical region a = qf(0.99, 2, 15) # To calculate the p-value pValue = 1 - pf(T0, 2, 15)

(In the console, write the name of a quantity to print its value.)

Code to apply the analysis with R Statistical software programs have many built-in functions to apply the most basic methods. Now we use R to obtain the analysis of variance table. As regards the syntaxis, it is based on the linear regression framework, 124

Solved Exercises and Problems of Statistical Inference

X p , j = μ p + ϵ p , j , where this linear dependence of X on the factor effect μp is denoted by Data ~ Group (see the call to the function aov below). ## After running the first block of lines of the previous code: # To create a vector with the membership labels Group = factor(c(rep("SA",length(SA)), rep("FO",length(FO)), rep("NY",length(NY)))) # To apply a one-factor analyis of variance objectAV = aov(Data ~ Group) # To print the table with the results summary(objectAV)

The ANOVA table is Df Group 2 Residuals 15 --Signif. codes:

Sum Sq 48.53 52.22 0

‘***’

Mean Sq 24.264 3.481 0.001

‘**’

F value 6.97 0.01

‘*’

Pr(>F) 0.00723 ** 0.05

‘.’

0.1

‘ ’

1

(Compare these quantities with those obtained in the previous calculations.) An equivalent way of applying the analysis of variance with R consists in substituting the lines # To apply a one-factor analyis of variance objectAV = aov(Data ~ Group) # To print the table with the results summary(objectAV)

by the lines # To fit a linear regression model Model = lm(Data ~ Group) # To apply and print the analysis of variance anova(Model)

Code to check the assumptions By using a computer it is also easy to evaluate the fulfillment of the assumptions. # To enter the three samples SA = c(15.75, 11.55, 11.16, 9.92, 9.23, 8.20) FO = c(12.63, 11.46, 10.77, 9.93, 9.87, 9.42) NY = c(9.27, 8.28, 8.15, 6.37, 6.37, 5.66) # To join the samples in a unique vector Data = c(SA, FO, NY) # To create a vector with the membership labels Group = factor(c(rep("SA",length(SA)), rep("FO",length(FO)), rep("NY",length(NY)))) # To test the normality of the sample SA by applying two different hypothesis tests shapiro.test(SA) ks.test(SA, "pnorm", mean=mean(SA), sd=sd(SA)) # To test the normality of the sample FO by applying two different hypothesis tests shapiro.test(FO) ks.test(FO, "pnorm", mean=mean(FO), sd=sd(FO)) # To test the normality of the sample NY by applying two different hypothesis tests shapiro.test(NY) ks.test(NY, "pnorm", mean=mean(NY), sd=sd(NY)) # To test the equality of the variances bartlett.test(Data ~ Group)

My notes:

125

Solved Exercises and Problems of Statistical Inference

[HT] Nonparametric Remark 14ht: Nonparametric methods involve questions not based on parameters, and therefore it is not usually necessary to evaluate some kinds of supposition that were present in the parametric hypothesis tests.

Exercise 1ht-np Occupational Hazards. The following table is based on data from the U.S. Department of Labor, Bureau of Labor Statistics. Taxi Guards Drivers

Police

Cashiers

Homicide

82

107

70

59

Cause of death other than homicide

92

9

29

42

490 A) Use the data in the table, coming from a simple random sample, to test the claim that occupation is independent of whether the cause of death was homicide. Use a significance α = 0.05 and apply a nonparametric chi-square test. B) Does any particular occupation appear to be most prone to homicides? If so, which one? (Based on an exercise of Essentials of Statistics, Mario F. Triola, Pearson)

LINGUISTIC NOTE (From: Longman Dictionary of Common Errors. Turton, N.D., and J.B.Heaton. Longman.) job. Your job is what you do to earn your living: 'You'll never get a job if you don't have any qualifications.' 'She'd like to change her job but can't find anything better.' Your job is also the particular type of work that you do: 'John's new job sounds really interesting.' 'I know she works for the BBC but I'm not sure what job she does.' A job may be full-time or part-time (NOT half-time or half-day): 'All she could get was a part-time job at a petrol station.' do (for a living). When you want to know about the type of work that someone does, the usual questions are What do you do? What does she do for a living? etc 'What does your father do?' - 'He's a police inspector.' occupation. Occupation and job have similar meanings. However, occupation is far less common than job and is used mainly in formal and official styles: 'Please give brief details of your employment history and present occupation.' 'People in manual occupations seem to suffer less from stress.' post/position. The particular job that you have in a company or organization is your post or position: 'She's been appointed to the post of deputy principal.' 'He's applied for the position of sales manager.' Post and position are used mainly in formal styles and ofter refer to jobs which have a lot of responsability. career. Your career is your working life, or the series of jobs that you have during your working life: 'The scandal brought his career in politics to a sudden end.' 'Later on in his career, he became first secretary at the British Embassy in Washington.' Your career is also the particular kind of work for which you are trained and that you intend to do for a long time: 'I wanted to find out more about careers in publishing.' trade. A trade is a type of work in which you do or make things with your hands: 'Most of the men had worked in skilled trades such as carpentry or printing.' 'My grandfather was a bricklayer by trade.' profession. A profession is a type of work such as medicine, teaching, or law which requires a high level of training or education: 'Until recently, medicine has been a male-dominated profession.' 'She entered the teaching profession in 1987.' LINGUISTIC NOTE (From: The Careful Writer: A Modern Guide to English Usage. Bernstein, T.M. Atheneum) occupations. The words people use affectionately, humorously, or disparagingly to describe their own occupations are their own affair. They may say, “I'm in show business” (or, more likely, “show biz”), or “I'm in the advertising racket,” or “I'm in the oil game,” or “I'm in the garment line.” But outsiders should use more caution, more discretion, and more precision. For instance, it is improper to write, “Mr. Danaher has been in the law business in Washington.” Law is a profession. Similarly, to say someone is “in the teaching game” would undoubtedly give offense to teachers. Unless there is some special reason to be slangy or colloquial, the advisable thing to do is to accord every occupation the dignity it deserves.

126

Solved Exercises and Problems of Statistical Inference

Discussion: In this exercise, it is clear from the statement that we need to test the independence of two variables. A particular sample (x1,...,x490) were grouped and we are given the absolute frequencies in the empirical table. By looking at the table, the cashier occupation appears to be most prone to homicides.

Statistic: Since we have to apply a test of independence, from a table of statistics (e.g. in [T]) we select (N lk − e^lk )2 d 2 T 0 ( X )=∑l =1 ∑ k=1 → χ( L−1)(K −1) e^lk L

K

for L and K classes, respectively.

Hypotheses: The null hypothesis supposes that the two variables are independent, H 0 : X , Y independent

H 1 : X , Y dependent

and

or, probabilistically, and

H 0 : f ( x , y)= f X ( x)⋅ f Y ( y )

H 1 : f ( x , y )≠ f X ( x )⋅ f Y ( y )

This implies that the probability at any cell is the product of the marginal probabilities of its file and column. Note that two underlying probability distributions are supposed for X and Y, although we do not care about them, and we will directly estimate the probabilities from the empirical table. As

by substituting in the expression of the statistic, 318⋅174 2 172⋅101 2 82− 42− 490 490 T 0 ( x)= +⋯+ =65.52 318⋅174 172⋅101 490 490

(

)

(

)

This value, calculated under H0 and using the data, is necessary both to determine the critical region and to calculate the p-value. On the other hand, for any chi-square tests T0 is a nonnegative measure of the dissimilarity between the two tables; therefore, a value close to zero means that the two tables are similar, while the critical region is always of the form:

Decision: There are L= 2 and K = 4 classes, respectively, so d

T 0 ( X ) → χ(2L−1)(K −1) ≡ χ2(2−1 )(4−1) ≡ χ32 For the first methodology, to calculate a the definition of type I error is applied with α = 0.05:

α=P (Type I error)= P ( Reject H 0 ∣ H 0 true)= P(T ( X )∈ Rc ∣ H 0)≈ P (T 0 ( X )>a) 127

Solved Exercises and Problems of Statistical Inference

→ a=r α=7.81 → Rc = {T 0 ( X )>7.81 } The decision is:

T 0 ( x) = 65.52 ∈ Rc → H0 is rejected.

If we apply the methodology based on the p-value, pV = P ( X more rejecting than x ∣ H 0 true)=P (T 0 ( X )> T 0 ( x)) = P (T 0 ( X )>65.52) = 3.885781⋅10 →

−14

pV 65.52) < P(T 0 ( X )>11.3)=0.01 →

pV 0.25

Solved Exercises and Problems of Statistical Inference

For this alternative hypothesis, the critical region takes the form ̂ c }= Rc ={ η>

{√

η−η ̂ 0 η0 (1−η0) n

>

c−η0



η0 (1−η0 ) n

}

={T 0 >a }

Decision: To determine Rc, the quantile is calculated from the type I error with α = 0.1 at η0 = 0.25: α (0.25)=P (Type I error)= P( Reject H 0 ∣ H 0 true)= P(T 0 >a) → a=r 0.1=l 0.9=1.28 → Rc = {T 0 ( X )>1.28 }. Now, the decision is: T 0 ( x)=0.843 < 1.28 → T 0 ( x)∉Rc → H0 is not rejected. p-value

pV = P ( X more rejecting than x ∣ H 0 true)=P (T 0 ( X )>T 0 ( x)) = P (T 0 ( X )> 0.843)=0.200 → pV = 0.200 > 0.1=α → H0 is not rejected. Type II error: To calculate β, we have to work under H1. Since the critical region has been expressed in terms of T0, and we must use T1, we could apply the mathematical trick of adding and subtracting the same quantity. ̂ c } has not been calculated yet; now, since we Nevertheless, this way is useful when the value c in Rc ={ η> ̂ 0.3} it is easier to directly standardize with η1: have been said that Rc ={ η> ^ β(η) = P (Type II error ) = P ( Accept H 0 | H 1 true)= P (T 0 ( X )∉ Rc | H 1 )= P ( η≤0.3 | H 1) =P

(√

̂ −η1 η η1 (1−η1) n



0.3−η1



η1 (1−η1 ) n

∣) (

H 1 = P T 1≤

0.3−η1



η1 (1−η1 ) n

)

For the particular value η1 = 0.35,

(

β(0.35) = P T 1≤



0.3−0.35 = P ( T 1 ≤−1.15 ) = 0.125 0.35(1−0.35) 120

)

> pnorm(-1.15,0,1) [1] 0.125

By using a computer, many more values η1 ≠ 0.35 can be considered to plot the power function ϕ(η) = P (Reject H 0) =

{

α(η) if p∈Θ0 1−β(η) if p ∈Θ1

# Sample and inference n = 120 alpha = 0.1 theta0 = 0.25 # Value under the null hypothesis H0 c = 0.3 theta1 = seq(from=0.25,to=1,0.01) paramSpace = sort(unique(c(theta1,theta0))) PowerFunction = 1 - pnorm((c-paramSpace)/sqrt(paramSpace*(1-paramSpace)/n),0,1) plot(paramSpace, PowerFunction, xlab='Theta', ylab='Probability of rejecting theta0', main='Power Function', type='l')

This code generates the power function: 149

Solved Exercises and Problems of Statistical Inference

(b) Confidence interval Statistic: From a table of statistics (e.g. in [T]), the same statistic is selected T ( X ; η)=



d ̂ η−η → N (0,1) ?(1−?) n

where the symbol ? is substituted by the best information available. In testing hypotheses we were also studying the unknown quantity η, although it was provisionally supposed to be known under the hypotheses; for confidence intervals, we are not working under any hypothesis and η must be estimated in the denominator: T ( X ; η)=



d ̂ −η η → N (0,1) η(1− ̂ η) ̂ n

The interval is obtained with the same calculations as in previous exercises involving a Bernoulli population,

[

^ −r α/ 2 I 1−α = η





η(1− ^ η) ^ η(1− ^ η) ^ ^ r α/ 2 , η+ n n

]

where r α / 2 is the value of the standard normal distribution such that P( Z>r α/2 )=α / 2. By using •

n = 120.



Sample proportion:



90% → 1–α = 0.9 → α = 0.1 → α/2 = 0.05 →

^ η=

34 =0.283 . 120 r 0.05=l 0.95 =1.645 .

the particular interval (for these data) appears

[

I 0.9= 0.283−1.645





]

0.283 (1−0.283) 0.283 (1−0.283) , 0.283+1.645 =[ 0.215 , 0.351] 120 120

Thinking about the interval as an acceptance region, since η0=0.25 ∈ I the hypothesis that η may still be 0.25 is not rejected.

Conclusion: With confidence 90%, the proportion of births by mothers of over 30 years of age seems to be 0.25 at most. The same decision is still made by considering the confidence interval that would correspond to 150

Solved Exercises and Problems of Statistical Inference

a two-sided (nondirectional) test with the same confidence, that is, by allowing the new proportion to be different because it had severely increased or decreased. (Remember: statistical results depend on: the assumptions, the methods, the certainty and the data.) My notes:

Exercise 4pe-ci-ht A random quantity X is supposed to follow a distribution whose probability function is, for θ > 0,

{

θx

f (x ; θ) =

θ−1

0

if 0≤x ≤1 otherwise

A) Apply the method of the moments to find an estimator of the parameter θ. B) Apply the maximum likelihood method to find an estimator of the parameter θ. C) Use the estimators obtained to build others for the mean μ and the variance σ2. D) Let X = (X1,...,Xn) be a simple random sample. By applying the results involving Neyman-Pearson's lemma and the likelihood ratio, study the critical region for the following pairs of hypotheses.

{

H 0 : θ = θ0 H 1 : θ = θ1

{

H 0 : θ = θ0 H 1 : θ = θ1 >θ 0

{

{

H 0 : θ = θ0 H 1 : θ = θ 1θ 0

{

H 0 : θ ≥ θ0 H 1 : θ = θ1 64.625).

Search for a known distribution: Since we do not know the sampling distribution of S2, we cannot calculate this probability directly. Instead, just after reading 'sample quasivariance' we should think about the following theoretical result ( n−1)S 2 ( 25−1) S 2 2 T= ∼ χn −1 , or, in this case, T = ∼ χ225−1 , 2 2 55 cm σ

Rewriting the event: The event has to be rewritten by completing some terms until (the dimensionless statistic) T appears. Additionally, when the table of the χ 2 distribution gives lower-tail probabilities P(X ≤ x), it is necessary to consider the complementary event: 2

(

2

)

(25−1) S ( 25−1) 64.625 cm P (S > 64.625)=P > =P ( T > 28.2 )=1− P ( T ≤ 28.2 )=1−0.75=0.25 . 2 55 cm 55 cm2 2

In these calculations, one property of the transformations has been applied: multiplying or dividing by a positive quantity does not modify an inequality.

Conclusion: The probability of the event is 0.25. This means that S2 will sometimes take a value bigger than 64.625cm2, when evaluated at specific data x coming from the population distribution. My notes:

Exercise 2ae Let X be a random variable with probability function θ−1

f ( x ; θ) =

154

θx , x ∈[0,3] 3θ

Solved Exercises and Problems of Statistical Inference

such that E(X) = 3θ/(θ+1). Supposed a simple random sample X = (X1,...,Xn), apply the method of the moments to find an estimator θ̂ M of the parameter θ.

Discussion: This statement is mathematical. Although it is given, the expectation of X could be calculated as follows 3

[ ]

θ x θ−1 x θ +1 3θ+1 3θ θ θ μ1 (θ )=E ( X )=∫−∞ x f ( x ; θ)dx=∫0 x θ dx= θ = = 3 3 θ+1 0 3θ θ+ 1 θ+1 3

+∞

Method of the moments Population and sample centered moments: The first-order moments are μ1 (θ )=

3θ θ +1

and

m1 (x 1 , x 2 ,... , x n )=

1 n x = ¯x n ∑ j =1 j

System of equations: Since the parameter θ appears in the first-order moment of X, the first equation is sufficient to apply the method: 3θ 1 n x μ1 (θ )=m 1 (x 1 , x 2 ,... , x n ) → = ∑ j =1 x j= ¯x → 3 θ=θ ¯x + ¯x → θ(3−¯x )=¯x → θ= ¯ θ +1 n 3−¯x The estimator:

¯ X θ^ M = ¯ 3− X

My notes:

Exercise 3ae A poll of 1000 individuals, being a simple random sample, over the age of 65 years was taken to determine the percent of the population in this age group who had an Internet connection. It was found that 387 of the 1000 had one. Find a 95% confidence interval for η. (Taken from an exercise of Statistics, Spiegel and Stephens, Mc-Graw Hill)

Discussion: Asymptotic results can be applied for this large sample of a Bernoulli population. The cutoff age value determines the population of the statistical analysis, but it plays no other role. Both η and η^ are dimensionless.

Identification of the variable: Having the connection or not is a dichotomic situation; then X ≡ Connected (an individual)?

X ~ Bern(η)

(1) Pivot: We take into account that:
• There is one Bernoulli population
• The sample size is big, n = 1000, so an asymptotic approximation can be applied

A statistic is selected from a table (e.g. in [T]):

T(X; η) = (η̂ − η)/√(η̂(1−η̂)/n) →d N(0, 1)

(2) Event rewriting:

1−α = P(l_{α/2} ≤ T(X; η) ≤ r_{α/2}) = P( −r_{α/2} ≤ (η̂ − η)/√(η̂(1−η̂)/n) ≤ +r_{α/2} )
 = P( −r_{α/2} √(η̂(1−η̂)/n) ≤ η̂ − η ≤ +r_{α/2} √(η̂(1−η̂)/n) )
 = P( η̂ + r_{α/2} √(η̂(1−η̂)/n) ≥ η ≥ η̂ − r_{α/2} √(η̂(1−η̂)/n) )

(3) The interval: Then,

I_{1−α} = [ η̂ − r_{α/2} √(η̂(1−η̂)/n) , η̂ + r_{α/2} √(η̂(1−η̂)/n) ]

where r_{α/2} is the value of the standard normal distribution verifying P(Z > r_{α/2}) = α/2.

Substitution: We need to calculate the quantities involved in the previous formula:
• n = 1000
• Theoretical (simple random) sample: X1,...,X1000 s.r.s. (each value is 1 or 0)
  Empirical sample: x1,...,x1000 → Σ_{j=1}^{1000} x_j = 387 → η̂ = 387/1000 = 0.387
• 95% → 1−α = 0.95 → α = 0.05 → α/2 = 0.025 → r_{α/2} = 1.96

Finally,

I_{0.95} = [ 0.387 − 1.96 √(0.387(1−0.387)/1000) , 0.387 + 1.96 √(0.387(1−0.387)/1000) ] = [0.357, 0.417]
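By using R, the endpoints can be reproduced with qnorm (a minimal sketch with the data of the statement):

n = 1000
p = 387/n
r = qnorm(1 - 0.05/2)
p + c(-1, 1) * r * sqrt(p * (1 - p) / n)   # [0.357, 0.417]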

Conclusion: The unknown proportion of individuals over the age of 65 years with Internet connection is inside the range [0.357, 0.417] with a probability of 0.95, and outside the interval with a probability of 0.05. Perhaps a 0-to-100 scale facilitates the interpretation: the percent of individuals is in [35.7%, 41.7%] with 95% confidence. Proportions and probabilities are always dimensionless quantities, though they may be expressed in percent.

My notes:

Exercise 4ae A company is interested in studying its clients' behaviour. For this purpose, the mean time between consecutive demands of service is modelized by a random variable whose density function is:

f(x; θ) = (1/θ) e^(−(x−2)/θ),  x ≥ 2,  (θ > 0)

The estimator provided by the method of the moments is θ̂_M = X̄ − 2.

1st Is it an unbiased estimator of the parameter? Why?
2nd Calculate its mean square error. Is it a consistent estimator of the parameter? Why?
Note: E(X) = θ + 2 and Var(X) = θ²

Discussion: The two sections are based on the calculation of the mean and the variance of the estimator given in the statement. Then, the formulas of the bias and the mean square error must be used. Finally, the limit of the mean square error is studied.

Mean and variance of θ̂_M:

E(θ̂_M) = E(X̄ − 2) = E(X̄) − E(2) = E(X) − 2 = θ + 2 − 2 = θ
Var(θ̂_M) = Var(X̄ − 2) = Var(X̄) − 0 = σ²/n = θ²/n

Unbiasedness: The estimator is unbiased, as the expression of the mean shows. Alternatively, we calculate the bias:

b(θ̂_M) = E(θ̂_M) − θ = θ − θ = 0

Mean square error and consistency: The mean square error is

MSE(θ̂_M) = b(θ̂_M)² + Var(θ̂_M) = 0² + θ²/n = θ²/n

The population variance θ² does not depend on the sample, particularly on the sample size n. Then,

lim_{n→∞} MSE(θ̂_M) = lim_{n→∞} θ²/n = 0

Note: In general, the population variance can be finite or infinite (for some “strange” probability distributions we do not consider in this subject). If the variance is infinite, σ² = ∞, neither Var(θ̂_M) nor MSE(θ̂_M) exists, in the sense that they are infinite; in this particular exercise it is finite, θ² < ∞. In the former case, the mean square error would not exist and the consistency (in probability) could not be studied in this way. In the latter case, the mean square error exists and tends to zero (consistency in the mean-square sense), which is sufficient for the estimator of θ to be consistent (in probability).

Conclusion: The calculations of the mean and the variance are quite easy. They show that the estimator is unbiased and, if the variance is finite, consistent.
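A small simulation in R supports these conclusions; it is a minimal sketch in which the value of θ and the sample size are hypothetical, chosen only for illustration (X is generated as 2 plus an exponential variable with mean θ):

set.seed(1)
theta = 1.5                    # hypothetical true value
n = 50
est = replicate(10000, mean(2 + rexp(n, rate = 1/theta)) - 2)
mean(est)                      # close to theta (unbiasedness)
var(est)                       # close to theta^2 / n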

Advanced Theory: If E(X) had not been given in the statement, it could have been calculated by applying integration by parts (since polynomials and exponentials are functions “of different type”):

E(X) = ∫_{−∞}^{+∞} x f(x; θ) dx = ∫_2^{+∞} x (1/θ) e^(−(x−2)/θ) dx = [−x e^(−(x−2)/θ)]_2^∞ − ∫_2^∞ −e^(−(x−2)/θ) dx = [−(x+θ) e^(−(x−2)/θ)]_2^∞ = 2 + θ

The formula ∫ u(x)·v′(x) dx = u(x)·v(x) − ∫ u′(x)·v(x) dx of integration by parts has been used with
• u = x → u′ = 1
• v′ = (1/θ) e^(−(x−2)/θ) → v = ∫ (1/θ) e^(−(x−2)/θ) dx = −e^(−(x−2)/θ)

On the other hand, e^x changes faster than x^k for any k. To calculate E(X²):

E(X²) = ∫_{−∞}^{+∞} x² f(x; θ) dx = ∫_2^∞ x² (1/θ) e^(−(x−2)/θ) dx = [−x² e^(−(x−2)/θ)]_2^∞ + 2 ∫_2^∞ x e^(−(x−2)/θ) dx = (2² − 0) + 2θ ∫_2^∞ x (1/θ) e^(−(x−2)/θ) dx = 4 + 2θμ = 4 + 2θ(2+θ) = 2θ² + 4θ + 4

Again, integration by parts has been applied, now with
• u = x² → u′ = 2x
• v′ = (1/θ) e^(−(x−2)/θ) → v = −e^(−(x−2)/θ)

Again, e^x changes faster than x^k for any k. Finally, the variance is

σ² = E(X²) − E(X)² = 2θ² + 4θ + 4 − (θ+2)² = 2θ² + 4θ + 4 − θ² − 4θ − 4 = θ²

Regarding the original probability distribution: (i) the expression reminds us of the exponential distribution; (ii) the term x−2 suggests a translation; and (iii) the variance θ² is the same as the variance of the exponential distribution. After translating all possible values x, the mean is also translated but the variance is not. Thus, the distribution of the statement is a translation of the exponential distribution, which can equivalently be parameterized with θ or with λ = θ^(−1).

My notes:

Exercise 5ae Is There Intelligent Life on Other Planets? In a 1997 Marist Institute survey of 935 randomly selected Americans, 60% of the sample answered “yes” to the question “Do you think there is intelligent life on other planets?” (http://maristpoll.marist.edu/tag/mipo/). Let's use this sample estimate to calculate a 90% confidence interval for the proportion of all Americans who believe there is intelligent life on other planets. What are the margin of error and the length of the interval? (From Mind on Statistics. Utts, J.M., and R.F. Heckard. Thomson)

LINGUISTIC NOTE (From: Common Errors in English Usage. Brians, P. William, James & Co.) American. Many Canadians and Latin Americans are understandably irritated when U.S. citizens refer to themselves simply as “Americans.” Canadians (and only Canadians) use the term “North American” to include themselves in a two-member group with their neighbor to the south, though geographers usually include Mexico in North America. When addressing an international audience composed largely of people from the Americas, it is wise to consider their sensitivities. However, it is pointless to try to ban this usage in all contexts. Outside of the Americas, “American” is universally understood to refer to things relating to the U.S. There is no good substitute. Brazilians, Argentineans, and Canadians all have unique terms to refer to themselves. None of them refer routinely to themselves as “Americans” outside of contexts like the “Organization of American States.” Frank Lloyd Wright promoted “Usonian,” but it never caught on. For better or worse, “American” is standard English for “citizen or resident of the United States of America.”


In modern English, Americans generally refers to residents of the United States; among native English speakers this usage is almost universal, with any other use of the term requiring specification. [1] However, this default use has been the source of controversy, [2][3] particularly among Latin Americans, who feel that using the term solely for the United States misappropriates it. They argue instead that "American" should denote persons or things from anywhere in North, Central or South America, not just the United States, which is only a part of North America.

Discussion: There are several complementary pieces of information in the statement that help us to identify the distribution of the population variable X (Bernoulli distribution) and select the proper statistic T: (a) the meaning of the question—for each item there are two possible values: “yes” or “no”; (b) the value 60% suggests that this is a proportion expressed in percent; (c) the words Let's use this sample estimate and confidence interval for the proportion. Thus, we must construct a confidence interval for the proportion η (a percent is a proportion expressed in a 0-to-100 scale) of one Bernoulli population. The sample information available consists of two data: the sample size n = 935 and the sample proportion η̂ = 0.6. The relation between these quantities is the following:

η̂ = (1/n) Σ_{j=1}^n X_j = #1's / n  ( = #Yeses / n )

Although it is not necessary, we could calculate the number of ones:

#1's = 935 · 0.6 = 561 → η̂ = 561/935 = 0.6

Now, if we had not realized that 0.6 was the sample proportion, we would do η̂ = 561/935 = 0.6.

Identification of the variable: X ≡ Answered with “yes” (one American)?   X ~ Bern(η)

Confidence interval

For this kind of population and amount of data, we use the statistic:

T(X; η) = (η̂ − η)/√(?(1−?)/n) →d N(0, 1)

where ? is substituted by η or η̂. For confidence intervals η is unknown and no value is supposed, and hence it is estimated through the sample proportion. By applying the method of the pivot:

1−α = P(l_{α/2} ≤ T(X; η) ≤ r_{α/2}) = P( −r_{α/2} ≤ (η̂ − η)/√(η̂(1−η̂)/n) ≤ +r_{α/2} )
 = P( −r_{α/2} √(η̂(1−η̂)/n) ≤ η̂ − η ≤ +r_{α/2} √(η̂(1−η̂)/n) )
 = P( η̂ + r_{α/2} √(η̂(1−η̂)/n) ≥ η ≥ η̂ − r_{α/2} √(η̂(1−η̂)/n) )

Then, the interval is

I_{1−α} = [ η̂ − r_{α/2} √(η̂(1−η̂)/n) , η̂ + r_{α/2} √(η̂(1−η̂)/n) ]

Substitution: We calculate the quantities in the formula:
• n = 935
• η̂ = 0.6
• 90% → 1−α = 0.90 → α = 0.10 → α/2 = 0.05 → r_{α/2} = r_{0.05} = l_{0.95} = 1.645

So

I_{0.90} = [ 0.6 − 1.645 √(0.6(1−0.6)/935) , 0.6 + 1.645 √(0.6(1−0.6)/935) ] = [0.574, 0.626]

Margin of error and length

To calculate the previous endpoints we had calculated the margin of error, which is

E = r_{α/2} √(η̂(1−η̂)/n) = 1.645 √(0.6(1−0.6)/935) = 0.0264

The length is twice the margin of error: L = 2·E = 2·0.0264 = 0.0527. In general, even if T follows an asymmetric distribution and we do not talk about margin of error, the length can always be calculated through the difference between the upper and the lower endpoints: L = 0.626 − 0.574 = 0.052.
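By using R (a minimal sketch with the data of the statement):

n = 935
p = 0.60
r = qnorm(1 - 0.10/2)
interval = p + c(-1, 1) * r * sqrt(p * (1 - p) / n)
interval                       # [0.574, 0.626]
diff(interval)                 # the length, twice the margin of error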

Conclusion: Since the population proportion is in the interval (0,1) by definition, the values obtained seem reasonable. Both endpoints are over 0.5, which means that most US citizens think there is intelligent life on other planets. With a confidence of 0.90, measured in a 0-to-1 scale, the value of η will be in the interval obtained. As regards the methodology applied, on average it provides a right interval 90% of the times. Nonetheless, frequently we do not know the real η, and therefore we will never know whether the method has failed or not.

My notes:

Exercise 6ae It is desired to know the proportion η of female students at university. To that end, a simple random sample of n students is to be gathered. Obtain the estimators η^ M and η^ ML for that proportion, by applying the method of the moments and the maximum likelihood method.

Discussion: This statement is mathematical, really. Although it is given in the statement, the expectation of X could be calculated as follows:

μ₁(η) = E(X) = Σ_Ω x f(x; η) = Σ_{x=0}^1 x ηˣ(1−η)^(1−x) = 0·1·(1−η) + 1·η·1 = η

Method of the moments

Population and sample moments: The probability distribution has one parameter. The first-order moments are

μ₁(η) = E(X) = η  and  m₁(x₁, x₂,..., xₙ) = (1/n) Σ_{j=1}^n x_j = x̄

System of equations: Since the parameter η appears in the first-order moment of X, the first equation is sufficient to apply the method:

μ₁(η) = m₁(x₁, x₂,..., xₙ) → η = (1/n) Σ_{j=1}^n x_j = x̄

The estimator:

η̂_M = X̄

Maximum likelihood method

Likelihood function: For this distribution the mass function is f(x; η) = ηˣ(1−η)^(1−x). Then

L(x₁, x₂,..., xₙ; η) = Π_{j=1}^n f(x_j; η) = η^(x₁)(1−η)^(1−x₁) ⋯ η^(xₙ)(1−η)^(1−xₙ) = η^(Σ_{j=1}^n x_j) (1−η)^(n − Σ_{j=1}^n x_j)

Optimization problem: The logarithm function is applied to facilitate the calculations,

log L(x₁, x₂,..., xₙ; η) = (Σ_{j=1}^n x_j) log(η) + (n − Σ_{j=1}^n x_j) log(1−η)

To find the local or relative extreme values, the necessary condition is

0 = (d/dη) log L = (Σ_{j=1}^n x_j)/η − (n − Σ_{j=1}^n x_j)/(1−η)

→ (n − Σ_{j=1}^n x_j)/(1−η) = (Σ_{j=1}^n x_j)/η → ηn − ηΣ_{j=1}^n x_j = Σ_{j=1}^n x_j − ηΣ_{j=1}^n x_j → ηn = Σ_{j=1}^n x_j → η₀ = (Σ_{j=1}^n x_j)/n = x̄

To verify that the only candidate is a local or relative maximum, the sufficient condition is

(d²/dη²) log L = −(Σ_{j=1}^n x_j)/η² − (n − Σ_{j=1}^n x_j)/(1−η)² < 0

since 0 < η < 1 and both numerators are nonnegative. Thus η₀ = x̄ is a maximum, and the estimator is

η̂_ML = X̄
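The maximization can also be checked numerically in R; the following lines are a minimal sketch in which the sample is hypothetical, generated only for illustration:

set.seed(1)
x = rbinom(100, 1, 0.35)       # hypothetical Bernoulli sample
eta = seq(0.01, 0.99, by = 0.001)
logL = sum(x) * log(eta) + (length(x) - sum(x)) * log(1 - eta)
eta[which.max(logL)]           # close to mean(x), the sample proportion
mean(x)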

> 1 - pchisq(3.75, 10-1)
[1] 0.9270832

By using a computer, many other values σ₁² ≠ 2 can be considered so as to numerically determine the power of the test curve 1−β(σ₁²) and to plot the power function

φ(σ²) = P(Reject H0) = α(σ²) if σ² ∈ Θ0, and 1−β(σ²) if σ² ∈ Θ1

# Sample and inference
n = 10
alpha = 0.05
theta0 = 2.25 # Value under the null hypothesis H0
q = qchisq(alpha, n-1)
theta1 = seq(from=0, to=2.25, 0.01)
paramSpace = sort(unique(c(theta1, theta0)))
PowerFunction = pchisq(q*theta0/paramSpace, n-1)
plot(paramSpace, PowerFunction, xlab='Theta', ylab='Probability of rejecting theta0', main='Power Function', type='l')

Conclusion: With a confidence of 0.95, measured in a 0-to-1 scale, the real value of σ² will be smaller than 2.25 mm², that is, the quality of the product will be appropriate. On average, the method we are applying provides a right decision 95% of the times; however, since frequently we do not know the true value of σ², we never know whether the decision is true or not.

My notes:

Exercise 9ae If 132 of 200 male voters and 90 of 159 female voters favor a certain candidate running for governor of Illinois, find a 99% confidence interval for the difference between the actual proportions of male and female voters who favor the candidate. (From: Mathematical Statistics with Applications. Miller, I., and M. Miller. Pearson.)

Discussion: There are two independent Bernoulli populations whose proportions must be compared (the populations would not be independent if, for example, the males and females had been selected from the same couples or families). The value 1 has been used to count the number of voters who favor the candidate. The method of the pivot will be used.

Identification of the variables: Favoring or not is a dichotomic situation,

M ≡ Favoring the candidate (a male voter)?   M ~ Bern(η_M)
F ≡ Favoring the candidate (a female voter)?   F ~ Bern(η_F)

(1) Pivot: We take into account that:
• There are two independent Bernoulli populations
• Both sample sizes are large, so an asymptotic approximation can be applied

From a table of statistics (e.g. in [T]), the following pivot is selected:

T(M, F; η_M, η_F) = [ (η̂_M − η̂_F) − (η_M − η_F) ] / √( η̂_M(1−η̂_M)/n_M + η̂_F(1−η̂_F)/n_F ) →d N(0, 1)

(2) Event rewriting:

1−α = P(l_{α/2} ≤ T(M, F; η_M, η_F) ≤ r_{α/2}) ≈ P( −r_{α/2} ≤ [ (η̂_M − η̂_F) − (η_M − η_F) ] / √(η̂_M(1−η̂_M)/n_M + η̂_F(1−η̂_F)/n_F) ≤ +r_{α/2} )
 = P( −r_{α/2} √(·) ≤ (η̂_M − η̂_F) − (η_M − η_F) ≤ +r_{α/2} √(·) )
 = P( (η̂_M − η̂_F) + r_{α/2} √(·) ≥ η_M − η_F ≥ (η̂_M − η̂_F) − r_{α/2} √(·) )

where √(·) denotes √( η̂_M(1−η̂_M)/n_M + η̂_F(1−η̂_F)/n_F ).

(3) The interval:

I_{1−α} = [ (η̂_M − η̂_F) − r_{α/2} √( η̂_M(1−η̂_M)/n_M + η̂_F(1−η̂_F)/n_F ) , (η̂_M − η̂_F) + r_{α/2} √( η̂_M(1−η̂_M)/n_M + η̂_F(1−η̂_F)/n_F ) ]

where r_{α/2} is the value of the standard normal distribution such that P(Z > r_{α/2}) = α/2.

Substitution: We need to calculate the quantities involved in the previous formula:
• n_M = 200 and n_F = 159
• Theoretical (simple random) sample: M1,...,M200 s.r.s. (each value is 1 or 0)
  Empirical sample: m1,...,m200 → Σ_{j=1}^{200} m_j = 132 → η̂_M = 132/200 = 0.66
• Theoretical (simple random) sample: F1,...,F159 s.r.s. (each value is 1 or 0)
  Empirical sample: f1,...,f159 → Σ_{j=1}^{159} f_j = 90 → η̂_F = 90/159 = 0.56
• 99% → 1−α = 0.99 → α = 0.01 → α/2 = 0.005 → r_{α/2} = 2.576

Then,

I_{0.99} = (0.66 − 0.56) ∓ 2.576 √( 0.66(1−0.66)/200 + 0.56(1−0.56)/159 ) = [−0.03906, 0.2270]
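By using R (a minimal sketch with the data of the statement):

nM = 200; pM = 132/nM
nF = 159; pF = 90/nF
r = qnorm(1 - 0.01/2)
(pM - pF) + c(-1, 1) * r * sqrt(pM * (1 - pM) / nM + pF * (1 - pF) / nF)   # [-0.039, 0.227]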

Conclusion: The case η_M = η_F cannot formally be excluded when the decision is made with 99% confidence. Since each proportion lies in (0,1), any “reasonable” estimate of the difference should provide values in (−1,1) or close to it; because of the natural uncertainty of the sampling process (randomness and variability), in this case the smallest endpoint of the interval was −0.03906, which can be interpreted as being 0. When an interval of high confidence is far from 0, the case η_M = η_F can clearly be rejected. Finally, it is important to notice that a confidence interval can be used to make decisions about hypotheses on the parameter values.

My notes:

Exercise 10ae For two Bernoulli populations with the same parameter, prove that the pooled sample proportion is an unbiased estimator of the population proportion. For two normal populations, prove that the pooled sample variance is an unbiased estimator of the population variance.

Discussion: It is necessary to calculate the expectation of the pooled sample proportion by using its expression and the basic properties of the mean. Alternatively, the most general pooled sample variance can be used. For Bernoulli populations, the mean and the variance can be written as μ = η and σ² = η(1−η).

Mean of η̂_p: This estimator can be used when η_X = η = η_Y. On the other hand, E(η̂) = E(X̄) = η. Then

E(η̂_p) = E[ (n_X η̂_X + n_Y η̂_Y)/(n_X + n_Y) ] = [ n_X E(η̂_X) + n_Y E(η̂_Y) ]/(n_X + n_Y) = (n_X + n_Y)η/(n_X + n_Y) = η

Then, the bias is b(η̂_p) = E(η̂_p) − η = η − η = 0.
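A quick simulation in R illustrates this unbiasedness; it is a minimal sketch in which the common proportion and the sample sizes are hypothetical:

set.seed(1)
eta = 0.4; nX = 30; nY = 50    # hypothetical values
pooled = replicate(10000,
  (nX * mean(rbinom(nX, 1, eta)) + nY * mean(rbinom(nY, 1, eta))) / (nX + nY))
mean(pooled)                   # close to eta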

Mean of S²_p: This estimator can be used when σ²_X = σ² = σ²_Y. On the other hand, E(S²) = σ². Then

E(S²_p) = E[ ((n_X−1)S²_X + (n_Y−1)S²_Y)/(n_X + n_Y − 2) ] = [ (n_X−1)E(S²_X) + (n_Y−1)E(S²_Y) ]/(n_X + n_Y − 2) = (n_X − 1 + n_Y − 1)σ²/(n_X + n_Y − 2) = σ²

Then, the bias is b(S²_p) = E(S²_p) − σ² = σ² − σ² = 0.

My notes:


Exercise 11ae A research worker wants to determine the average time it takes a mechanic to rotate the tires of a car, and she wants to be able to assert with 95% confidence that the mean of her sample is off by at most 0.50 minute. If she can presume from past experience that σ = 1.6 minutes (min), how large a sample will she have to take? (From Probability and Statistics for Engineers. Johnson, R. Pearson Prentice Hall.)

Discussion: In calculating the minimum sample size, the only case we consider (in our subject) is that of one normal population with known standard deviation. Thus, we can suppose that this is the distribution of X.

Identification of the variable: X ≡ Time (of one rotation)   X ~ N(μ, σ = 1.6 min)

Sample information: Theoretical (simple random) sample: X1,..., Xn s.r.s. (the time measurement of n rotations will be considered)

Margin of error: We need the expression of the margin of error. If we do not remember it, we can apply the method of the pivot to take the expression from the formula of the interval:

I_{1−α} = [ X̄ − r_{α/2} √(σ²/n) , X̄ + r_{α/2} √(σ²/n) ]

If we remembered the expression, we could use it directly. Either way, the margin of error (for one normal population with known variance) is:

E = r_{α/2} √(σ²/n)



Sample size

Method based on the confidence interval: We want the margin of error E to be smaller than or equal to the given E_g:

E_g ≥ E = r_{α/2} √(σ²/n) → E_g² ≥ r_{α/2}² σ²/n → n ≥ ( r_{α/2} σ / E_g )² = ( 1.96 · 1.6 min / 0.50 min )² = 6.272² = 39.3 → n ≥ 40

since r_{α/2} = r_{0.025} = l_{0.975} = 1.96. (The inequality changes neither when multiplying or dividing by positive quantities nor when squaring.)
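By using R (a minimal sketch with the data of the statement):

r = qnorm(1 - 0.05/2)
ceiling((r * 1.6 / 0.50)^2)    # minimum sample size: 40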

Conclusion: At least n = 40 data are necessary to guarantee that the margin of error is 0.50 min at most. Any number of data larger than n would guarantee—and go beyond—the precision desired. (This margin can be thought of as “the maximum error in probability”, in the sense that the distance or error |θ − θ̂| will be smaller than E with a probability of 1−α = 0.95, but larger with a probability of α = 0.05.)

My notes:

Exercise 12ae To estimate the average tree height of a forest, a simple random sample with 20 elements is considered, yielding x̄ = 14.70 u and S = 6.34 u, where u denotes a unit of length and S² is the sample quasivariance. If the population variable height is supposed to follow a normal distribution, find a 95 percent confidence interval. What is the margin of error?

Discussion: In this exercise, the supposition that the normal distribution reasonably explains the variable height should be evaluated by using proper statistical techniques. To build the interval and find the margin of error, the method of the pivotal quantity will be applied.

(1) Pivot: From the information in the statement, we know that:
• The variable follows a normal distribution
• The population variance σ² is unknown, so it must be estimated
• The sample size is n = 20 (asymptotic results cannot be considered)

To apply this method, we need a statistic with known distribution, easy to manage and involving μ. From a table of statistics (e.g. in [T]), we select

T(X; μ) = (X̄ − μ)/√(S²/n) ~ t_{n−1}

where X = (X1, X2,..., Xn) is a simple random sample, S² is the sample quasivariance and t_κ denotes the t distribution with κ degrees of freedom.

(2) Event rewriting: The interval is built as follows.

1−α = P(l_{α/2} ≤ T(X; μ) ≤ r_{α/2}) = P( −r_{α/2} ≤ (X̄ − μ)/√(S²/n) ≤ +r_{α/2} )
 = P( −r_{α/2} √(S²/n) ≤ X̄ − μ ≤ +r_{α/2} √(S²/n) )
 = P( X̄ + r_{α/2} √(S²/n) ≥ μ ≥ X̄ − r_{α/2} √(S²/n) )

(3) The interval:

I_{1−α} = [ X̄ − r_{α/2} √(S²/n) , X̄ + r_{α/2} √(S²/n) ] = X̄ ∓ r_{α/2} √(S²/n)

Note: We have simplified the notation, but it is important to notice that the quantities r_{α/2} and S depend on the sample size n.

To use this general formula with the specific data we have, the quantiles of the t distribution with κ = n−1 = 20−1 = 19 degrees of freedom are necessary: 95% → 0.95 = 1−α → α = 0.05. In the table of the t distribution, we must search for the quantile provided for the probability p = 1−α/2 = 0.975 in a lower-tail probability table, or p = α/2 = 0.025 in an upper-tail probability table; if a two-tailed table is used, the quantile given for p = 1−α = 0.950 must be used. Whichever the table used, the quantile is 2.093. Finally,

I_{0.95} = x̄ ∓ r_{0.025} √(s²/n) = 14.70 u ∓ 2.093 · 6.34 u/√20 = 14.70 u ∓ 2.97 u = [11.73 u, 17.67 u]

By applying the definition of the margin of error,

E = r_{α/2} √(S²/n) = 2.093 · 6.34 u/√20 = 2.97 u
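By using R (a minimal sketch with the data of the statement):

n = 20; xbar = 14.70; S = 6.34
r = qt(1 - 0.05/2, n - 1)
xbar + c(-1, 1) * r * S / sqrt(n)   # [11.73, 17.67]
r * S / sqrt(n)                     # margin of error, 2.97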

Conclusion: With 95% confidence we can say that the mean tree height is in the interval obtained. The margin of error, which is expressed in the same unit of measure as the data, can be thought of as the maximum distance—when the interval contains the true value—between the real unknown mean and the middle point of the interval, that is, “the maximum error in probability”.

My notes:

Appendixes

[Ap] Probability Theory

Remark 1pt: The probability is a measure in a 0-to-1 scale of the chances with which an event involving a random quantity occurs; alternatively, it can be interpreted as the proportion of times it occurs when many values of the random quantity are considered repeatedly and independently. For example, an event of confidence 1−α = 0.95 can be considered in two equivalent ways: (i) that a measure of its occurring, within [0,1], takes the value 0.95; or (ii) that when the experiment is independently repeated many times, the event will occur more or less 95% of the times. Once values for random quantities are determined, the event will have occurred or not, but no probability is involved any more.

Some Reminders

● Markov's Inequality. Chebyshev's Inequality. For any (real) random variable X, any (real) function h(x) taking nonnegative values, and any (real) positive a > 0,

E(h(X)) = ∫_Ω h(x) dP ≥ ∫_{h(X) ≥ a} h(x) dP ≥ a P(h(X) ≥ a), so that P(h(X) ≥ a) ≤ E(h(X))/a

In particular, taking h(x) = (x − μ)² and a = k² yields Chebyshev's inequality: P(|X − μ| ≥ k) ≤ σ²/k².

Exercise 1pt Calculate the following probabilities and quantiles:

(a) X ~ P(λ = 2.7),  P(1 ≤ X < 3.5) and P(X ≥ 3.5)
(e) X ~ N(μ = −1, σ² = 4),  P(X > −4.4)
(f) X ~ χ²₁₆,  the value a such that P(X ≤ a) = 0.025
(g) X ~ t₂₇,  the value a such that P(X > a) = 0.1
(h) X ~ F₁₀,₈,  P(X > 5.81)
(i) X ~ F₁₅,₆,  the value a such that P(X > a) = 0.01
(j) X ~ t₁₂,  P({X ≤ 1.356} ∪ {X > 3.055})

Discussion: Several distributions, discrete and continuous, are involved in this exercise. Different ways can be considered to find the answers: the probability function f(x), the probability tables or a statistical software program. Sometimes events need to be rewritten or decomposed. For discrete distributions, tables can contain either individual {X=x} or cumulative {X≤x} (or {X>x}) probabilities; for continuous distributions, only cumulative probabilities.

(a) The parameter value is λ = 2.7, and for the Poisson distribution the possible values are always 0, 1, 2,... If the table provides cumulative probabilities of the form P(X ≤ x),

P(1 ≤ X < 3.5) = P(X ≤ 3) − P(X ≤ 0) = 0.7141 − 0.0672 = 0.6469, and P(X ≥ 3.5) = 1 − P(X ≤ 3) = 1 − 0.7141 = 0.2859

(e) Here the parameter values are μ = −1 and σ² = 4, and the value of a normally distributed random variable can always be any real number. Because of the standardization, a table with probabilities and quantiles for the standard normal distribution suffices. In using this table, we must pay attention to the form of the events whose probabilities are provided:

P(X > −4.4) = P( (X−μ)/√σ² > (−4.4−(−1))/√4 ) = P(Z > −1.7) = P(Z < 1.7) = 0.9554

If we tried to compute the probability directly,

P(X > −4.4) = ∫_{−4.4}^{+∞} f(x) dx = ∫_{−4.4}^{+∞} (1/√(2πσ²)) e^(−(x−μ)²/(2σ²)) dx = ?

we are not able to find an antiderivative of f(x)... because it does not exist in elementary terms. Then, we may remember that the antiderivative of e^(−x²) does not exist in elementary terms and that the definite integral of f(x) can be solved exactly only for some limits of integration, but it can always be solved numerically. On the other hand, by using the statistical software program R, whose function pnorm gives cumulative probabilities for events of the form {X ≤ x}:

> 1 - pnorm(-4.4, -1, sqrt(4))
[1] 0.9554345

To plot the probability function

values = seq(-10, +10, length=100)
probabilities = dnorm(values, -1, 2)
plot(values, probabilities, type="l", lwd=2, ylim=c(0,1), xlab="Value", ylab="Probability", main="N(-1, sd=2)")

(f) The parameter value is κ = 16. The set of possible values is always composed of all positive real numbers. Most tables of the chi-square distribution provide the probability of events of the form P(X > x). In this case, it is necessary to consider the complementary event before looking for the quantile:

P(X ≤ a) = 0.025 → P(X > a) = 1 − 0.025 = 0.975 → a = 6.91

We do not use the density function, as it is too complex. By using the statistical software program R, whose function qchisq gives quantiles for events of the form {X ≤ x}:

> qchisq(0.025, 16)
[1] 6.907664

To plot the probability function

values = seq(0, +40, length=100)
probabilities = dchisq(values, 16)
plot(values, probabilities, type="l", lwd=2, ylim=c(0,1), xlab="Value", ylab="Probability", main="Chi-Sq(16)")

(g) Now the parameter value is κ = 27. A variable enjoying the t distribution can take any real value. Most tables of this distribution provide the probability of events of the form P(X > x). In this case, it is not necessary to rewrite the event: P(X > a) = 0.1 → a = 1.314. The density function is too complex to be used. The statistical software program R allows doing (the function qt provides quantiles for events of the form {X ≤ x}):

> qt(1-0.1, 27)
[1] 1.313703

To plot the probability function

values = seq(-5, +5, length=100)
probabilities = dt(values, 27)
plot(values, probabilities, type="l", lwd=2, ylim=c(0,1), xlab="Value", ylab="Probability", main="t(27)")

(h) The parameter values for this F distribution are κ₁ = 10 and κ₂ = 8. The possible values are always all positive real numbers. Again, most tables of this distribution provide the probability for events of the form {X > x}, so: P(X > 5.81) = 0.01. The density function is also complex. Finally, by using the computer,

> 1 - pf(5.81, 10, 8)
[1] 0.01002326

To plot the probability function

values = seq(0, 10, length=100)
probabilities = df(values, 10, 8)
plot(values, probabilities, type="l", lwd=2, ylim=c(0,1), xlab="Value", ylab="Probability", main="F(10,8)")

(i) Now, the parameter values are κ₁ = 15 and κ₂ = 6. Then: P(X > a) = 0.01 → a = 7.56. The density function is also complex. By again using the computer,

> qf(1-0.01, 15, 6)
[1] 7.558994

To plot the probability function

values = seq(0, 10, length=100)
probabilities = df(values, 15, 6)
plot(values, probabilities, type="l", lwd=2, ylim=c(0,1), xlab="Value", ylab="Probability", main="F(15, 6)")

(j) Since the parameter value is κ = 12, after decomposing the event into two disjoint tails:

P({X ≤ 1.356} ∪ {X > 3.055}) = P({X ≤ 1.356}) + P({X > 3.055}) = 1 − P({X > 1.356}) + P({X > 3.055}) = 1 − 0.1 + 0.005 = 0.905

The density function is also complex. Finally,

> pt(1.356, 12) + 1 - pt(3.055, 12)
[1] 0.9049621

To plot the probability function

values = seq(-10, +10, length=100)
probabilities = dt(values, 12)
plot(values, probabilities, type="l", lwd=2, ylim=c(0,1), xlab="Value", ylab="Probability", main="t(12)")

My notes:

Exercise 2pt Weekly maintenance costs (measured in dollars, $) for a certain factory, recorded over a long period of time and adjusted for inflation, tend to have an approximately normal distribution with an average of $420 and a standard deviation of $30. If $450 is budgeted for next week, what is an approximate probability that this budgeted figure will be exceeded? (Taken from Mathematical Statistics with Applications. W. Mendenhall, D.D. Wackerly and R.L. Scheaffer. Duxbury Press)

Discussion: We need to extract the mathematical information from the statement. There is a quantity, the weekly maintenance costs, say C, that can be assumed to follow the distribution

C ~ N(μ = $420, σ = $30), or, in terms of the variance, C ~ N(μ = $420, σ² = 900 $²)

(In practice, this supposition should be evaluated.) We are asked for the probability P(C > 450). Since C does not follow a standard normal distribution, we standardize both sides of the inequality, by using μ = E(C) = $420 and σ² = Var(C) = 900 $², to be able to use the table of the standard normal distribution:

P(C > 450) = P( (C−μ)/√σ² > ($450 − $420)/$30 ) = P(T > 30/30) = P(T > 1) = 1 − P(T ≤ 1) = 1 − 0.8413 = 0.1587
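By using R, whose function pnorm gives cumulative probabilities for the normal distribution:

1 - pnorm(450, mean = 420, sd = 30)   # 0.1586553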

My notes:

Exercise 3pt (*) Find the first two raw (or crude) moments of a random variable X when it enjoys:

(1) The Bernoulli distribution
(2) The binomial distribution
(3) The geometric distribution
(4) The Poisson distribution
(5) The exponential distribution
(6) The normal distribution

Use the following concepts to do the calculations in several ways: (i) their definition; (ii) the probability generating function; (iii) the moment generating function; (iv) the characteristic function; or (v) others. Then, find the mean and the variance of X.

Discussion: Different methods can be applied to calculate the first two moments. We have practiced as many of them as possible, both to learn as much as possible and to compare their difficulty; besides, some of them are more powerful than others. Some of these calculations are advanced. To work with characteristic functions, the definitions and rules of the analysis for complex functions of a real variable must be considered, and even some calculations may be easier if we work with the theory for complex functions of a complex variable. Most of these definitions and rules are “natural generalizations” of those of real analysis, but we must be careful not to apply them without the necessary justification.

(1) The Bernoulli distribution

By applying the definitions:

E(X) = Σ_{x=0}^1 x ηˣ(1−η)^(1−x) = 0·1·(1−η) + 1·η·1 = η
E(X²) = Σ_{x=0}^1 x² ηˣ(1−η)^(1−x) = 0²·(1−η) + 1²·η = η

By using the probability generating function:

G(t) = E(t^X) = Σ_{x=0}^1 tˣ ηˣ(1−η)^(1−x) = t⁰·1·(1−η) + t¹·η·1 = 1 − η + ηt

This function exists for any t. Now, the usual definitions and rules of the mathematical analysis for real functions of a real variable imply that

E(X) = G⁽¹⁾(1) = [η]_{t=1} = η
E(X²) = G⁽²⁾(1) + E(X) = [0]_{t=1} + η = η

By using the moment generating function:

M(t) = E(e^(tX)) = Σ_{x=0}^1 e^(tx) ηˣ(1−η)^(1−x) = e^(t·0)·1·(1−η) + e^(t·1)·η·1 = 1 − η + ηe^t

This function exists for any real t. Because of the mathematical real analysis,

E(X) = M⁽¹⁾(0) = [ηe^t]_{t=0} = η
E(X²) = M⁽²⁾(0) = [ηe^t]_{t=0} = η

By using the characteristic function:

φ(t) = E(e^(itX)) = Σ_{x=0}^1 e^(itx) ηˣ(1−η)^(1−x) = e^(it·0)·1·(1−η) + e^(it·1)·η·1 = 1 − η + ηe^(it)

This complex function exists for any real t. Complex analysis is considered to do

E(X) = φ⁽¹⁾(0)/i = [ηe^(it) i]_{t=0}/i = ηi/i = η
E(X²) = φ⁽²⁾(0)/i² = [ηe^(it) i²]_{t=0}/i² = ηi²/i² = η

Mean and variance:

μ = E(X) = η
σ² = Var(X) = E(X²) − E(X)² = η − η² = η(1−η)
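These values can be checked numerically in R; the following lines are a minimal sketch with a hypothetical value of η:

set.seed(1)
eta = 0.3                      # hypothetical value
x = rbinom(100000, 1, eta)
c(mean(x), mean(x^2))          # both close to eta
var(x)                         # close to eta * (1 - eta)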

(2) The binomial distribution

By applying the definitions (C(κ, x) denotes the binomial coefficient):

E(X) = Σ_{x=0}^κ x C(κ, x) ηˣ(1−η)^(κ−x) = ?
E(X²) = Σ_{x=0}^κ x² C(κ, x) ηˣ(1−η)^(κ−x) = ?

A possible way consists in writing X as the sum of κ independent Bernoulli variables: X = Σ_{j=1}^κ Y_j. Then

E(X) = E(Σ_{j=1}^κ Y_j) = Σ_{j=1}^κ E(Y_j) = κη, while E(X²) = E([Σ_{j=1}^κ Y_j]²) = ?

This way can also be used to calculate the variance easily, though not the second moment directly:

σ² = Var(X) = Var(Σ_{j=1}^κ Y_j) = Σ_{j=1}^κ Var(Y_j) = κη(1−η)

By using the probability generating function:

G(t) = E(t^X) = Σ_{x=0}^κ tˣ C(κ, x) ηˣ(1−η)^(κ−x) = (1−η)^κ Σ_{x=0}^κ C(κ, x) (ηt/(1−η))ˣ = [(1−η)(1 + ηt/(1−η))]^κ = (1 − η + ηt)^κ

where the binomial theorem (see the appendixes of Mathematics) has been applied. Alternatively, this function can also be calculated by looking at X as a sum of Bernoulli variables Y_j and applying the property for probability generating functions of a sum of independent random variables: G(t) = [G_Y(t)]^κ = (1 − η + ηt)^κ. This function exists for any t. Then

E(X) = G⁽¹⁾(1) = [κ(1 − η + ηt)^(κ−1) η]_{t=1} = κ·1^(κ−1)·η = κη
E(X²) = G⁽²⁾(1) + E(X) = [κ(κ−1)(1 − η + ηt)^(κ−2) η²]_{t=1} + κη = κ(κ−1)η² + κη = κη(κη − η + 1)

By using the moment generating function:

M(t) = E(e^(tX)) = Σ_{x=0}^κ e^(tx) C(κ, x) ηˣ(1−η)^(κ−x) = (1 − η + ηe^t)^κ, or M(t) = [M_Y(t)]^κ = (1 − η + ηe^t)^κ

This function exists for any real t. Because of the mathematical real analysis,

E(X) = M⁽¹⁾(0) = [κ(1 − η + ηe^t)^(κ−1) ηe^t]_{t=0} = κη
E(X²) = M⁽²⁾(0) = [κ(κ−1)(1 − η + ηe^t)^(κ−2) (ηe^t)² + κ(1 − η + ηe^t)^(κ−1) ηe^t]_{t=0} = κ(κ−1)η² + κη = κη(κη − η + 1)

By using the characteristic function:

φ(t) = E(e^(itX)) = Σ_{x=0}^κ e^(itx) C(κ, x) ηˣ(1−η)^(κ−x) = (1 − η + ηe^(it))^κ, or φ(t) = [φ_Y(t)]^κ = (1 − η + ηe^(it))^κ

This complex function exists for any real t. Again, complex analysis is considered in doing

E(X) = φ⁽¹⁾(0)/i = [κ(1 − η + ηe^(it))^(κ−1) ηe^(it) i]_{t=0}/i = κηi/i = κη
E(X²) = φ⁽²⁾(0)/i² = [κ(κ−1)(1 − η + ηe^(it))^(κ−2) (ηe^(it) i)² + κ(1 − η + ηe^(it))^(κ−1) ηe^(it) i²]_{t=0}/i² = [κ(κ−1)η²i² + κηi²]/i² = κη(κη − η + 1)

Mean and variance:

μ = E(X) = κη
σ² = Var(X) = E(X²) − E(X)² = κη(κη − η + 1) − (κη)² = κη(1−η)
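Again, a minimal numerical sketch in R, with hypothetical parameter values:

set.seed(1)
kappa = 8; eta = 0.3           # hypothetical values
x = rbinom(100000, kappa, eta)
mean(x)                        # close to kappa * eta
mean(x^2)                      # close to kappa*eta*(kappa*eta - eta + 1)
var(x)                         # close to kappa * eta * (1 - eta)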

(3) The geometric distribution

By applying the definitions:

E(X) = Σ_{x=1}^{+∞} x η(1−η)^(x−1) = ?
E(X²) = Σ_{x=1}^{+∞} x² η(1−η)^(x−1) = ?

As an example, I include a way to calculate E(X) that I found. To prove that any moment of order r is finite or, equivalently, that the series is (absolutely) convergent, we apply the ratio test for nonnegative series:

lim_{x→∞} a_{x+1}/a_x = lim_{x→∞} |(x+1)^r η(1−η)ˣ| / |x^r η(1−η)^(x−1)| = lim_{x→∞} ((x+1)/x)^r |1−η| = |1−η| < 1

Mathematically, the radius of convergence is 1 (that is, −1 < 1−η < +1, i.e. 0 < η < 2); probabilistically, the meaning of the parameter η (in the geometric distribution, it is a probability between 0 and 1) implies that the series is convergent for any η. Either way, Σ_{x=1}^{+∞} x^r η(1−η)^(x−1) < ∞. Once the convergence has been proved, the rules of “the usual arithmetic for finite quantities” can be applied. The convergence of the series is crucial for the following calculations:

E(X) = η Σ_{x=1}^{+∞} x(1−η)^(x−1) = η [1 + 2(1−η) + 3(1−η)² + ⋯]
 = η [ Σ_{x=0}^{+∞} (1−η)ˣ + (1−η) Σ_{x=0}^{+∞} (1−η)ˣ + (1−η)² Σ_{x=0}^{+∞} (1−η)ˣ + ⋯ ]
 = η [ Σ_{x=0}^{+∞} (1−η)ˣ ]² = η ( 1/(1−(1−η)) )² = η (1/η)² = 1/η

where the formula of the geometric series (see the appendixes of Mathematics) has been used. Alternatively, μ can be calculated by applying the formula available in the literature for arithmetico-geometric series.

By using the probability generating function:

G(t) = E(t^X) = Σ_{x=1}^{+∞} tˣ η(1−η)^(x−1) = ηt Σ_{x=0}^{+∞} [t(1−η)]ˣ = ηt / (1 − (1−η)t)

Given η, this function exists for t such that |t(1−η)| < 1 (otherwise, the series does not converge), as the ratio test shows. The definitions and rules of the mathematical analysis for real functions of a real variable give

E(X) = G⁽¹⁾(1) = [ η/(1−(1−η)t)² ]_{t=1} = η/η² = 1/η
E(X²) = G⁽²⁾(1) + E(X) = [ 2η(1−η)/(1−(1−η)t)³ ]_{t=1} + 1/η = 2(1−η)/η² + 1/η = (2−η)/η²

By using the moment generating function:

M(t) = E(e^(tX)) = Σ_{x=1}^{+∞} e^(tx) η(1−η)^(x−1) = ηe^t Σ_{x=0}^{+∞} [e^t(1−η)]ˣ = ηe^t / (1 − (1−η)e^t)

This function exists for any real t such that |e^t(1−η)| < 1 (otherwise, the series does not converge). Because of the mathematical real analysis,

E(X) = M⁽¹⁾(0) = [ ηe^t/(1−(1−η)e^t)² ]_{t=0} = η/η² = 1/η
E(X²) = M⁽²⁾(0) = [ ηe^t(1+(1−η)e^t)/(1−(1−η)e^t)³ ]_{t=0} = η(2−η)/η³ = (2−η)/η²

By using the characteristic function:

φ(t) = E(e^(itX)) = Σ_{x=1}^{+∞} e^(itx) η(1−η)^(x−1) = ηe^(it) Σ_{x=0}^{+∞} [e^(it)(1−η)]ˣ = ηe^(it) / (1 − (1−η)e^(it))

This complex function exists for any real t such that |e^(it)(1−η)| < 1, where |z| denotes the modulus of a complex number z (otherwise, the series does not converge). Once more, complex analysis allows us to do

E(X) = φ⁽¹⁾(0)/i = [ ηe^(it) i/(1−(1−η)e^(it))² ]_{t=0}/i = (1/i)(ηi/η²) = 1/η
E(X²) = φ⁽²⁾(0)/i² = [ ηe^(it) i²(1+(1−η)e^(it))/(1−(1−η)e^(it))³ ]_{t=0}/i² = (1/i²) ηi²(2−η)/η³ = (2−η)/η²

Mean and variance:

μ = E(X) = 1/η
σ² = Var(X) = E(X²) − E(X)² = (2−η)/η² − (1/η)² = (1−η)/η²
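A minimal numerical sketch in R, with a hypothetical value of η; note that rgeom counts the failures before the first success, so 1 must be added to match the support {1, 2,...} used here:

set.seed(1)
eta = 0.25                     # hypothetical value
x = rgeom(100000, eta) + 1     # shift: R's geometric starts at 0
mean(x)                        # close to 1/eta
var(x)                         # close to (1 - eta)/eta^2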

Advanced theory:

Additional way 1: In Cálculo de probabilidades I, by Vélez, R., and V. Hernández, UNED, the first four moments are calculated as follows (I write the calculations for the first two moments, with the notation we are using):

E(X) = η Σ_{x=1}^{+∞} x(1−η)^(x−1) = η d/d(1−η) [ Σ_{x=1}^{+∞} (1−η)ˣ ] = η d/d(1−η) [ (1−η)/(1−(1−η)) ] = η / [1−(1−η)]² = η/η² = 1/η

E(X²) = η Σ_{x=1}^{+∞} (x+1)x(1−η)^(x−1) − η Σ_{x=1}^{+∞} x(1−η)^(x−1) = η d²/d(1−η)² [ Σ_{x=1}^{+∞} (1−η)^(x+1) ] − E(X)
 = η d²/d(1−η)² [ (1−η)²/(1−(1−η)) ] − E(X) = η · 2/η³ − 1/η = 2/η² − 1/η = (2−η)/η²

(We have already justified the convergence of the series involved.)

Additional way 2: In trying to find a way based on calculating the main part of the series by using an ordinary differential equation, as I had previously done for the Poisson distribution (in the next section), I found the following way, essentially the same as the additional way above. A series can be differentiated and integrated term by term inside the circle of convergence (the radius of convergence was one, which includes all possible values of η). The expression of the mean suggests the following definition of g(η):

E(X) = Σ_{x=1}^{+∞} x η(1−η)^(x−1) = η g(η),  with g(η) = Σ_{x=1}^{+∞} x(1−η)^(x−1)

and it follows, since g is a well-behaved function of η, that

G(η) = ∫ g(η) dη = Σ_{x=1}^{+∞} ∫ x(1−η)^(x−1) dη = −Σ_{x=1}^{+∞} (1−η)ˣ + c = −(1−η)/(1−(1−η)) + c = (η−1)/η + c

I spent some time searching for a differential equation... and I found this integral one. Now, by solving it,

g(η) = G′(η) = [η − (η−1)]/η² + 0 = 1/η²

(This is a general method to calculate some infinite series.) Finally, the mean is

E(X) = η g(η) = η · 1/η² = 1/η

For the second moment, we define

E(X²) = Σ_{x=1}^{+∞} x² η(1−η)^(x−1) = η g(η),  with g(η) = Σ_{x=1}^{+∞} x²(1−η)^(x−1)

and it follows that

G(η) = ∫ g(η) dη = −Σ_{x=1}^{+∞} x(1−η)ˣ + c = −((1−η)/η) Σ_{x=1}^{+∞} x η(1−η)^(x−1) + c = c − (1−η)/η²

Now, by solving this trivial integral equation,

g(η) = G′(η) = 0 + [η² − (η−1)·2η]/η⁴ = (2η − η²·... )/η⁴ = (2 − η)/η³

Finally, the second moment is

E(X²) = η g(η) = η(2−η)/η³ = (2−η)/η²

Remark: Working with the whole series of μ(η) or σ²(η), as functions of η, is more difficult than working with the previous functions g(η), since the variable η would appear twice instead of once (I spent some time until I realized it).

(4) The Poisson distribution

By applying the definitions:

E(X) = Σ_{x=0}^{+∞} x (λˣ/x!) e^(−λ) = ?
E(X²) = Σ_{x=0}^{+∞} x² (λˣ/x!) e^(−λ) = ?

To prove that any moment of order r is finite or, equivalently, that the series is convergent, we apply the ratio test for nonnegative series:

lim_{x→∞} a_{x+1}/a_x = lim_{x→∞} [ (x+1)^r λ^(x+1)/(x+1)! ] / [ x^r λˣ/x! ] = lim_{x→∞} ((x+1)/x)^r · |λ|/(x+1) = 0 < 1

This implies that Σ_{x=0}^{+∞} x^r (λˣ/x!) e^(−λ) < ∞. Once the (absolute) convergence has been proved, the rules of “the usual arithmetic for finite quantities” could be applied. Nevertheless, working with factorial numbers in series makes it easy to prove the convergence but difficult to find the value.

By using the probability generating function:

G(t) = E(t^X) = Σ_{x=0}^{+∞} tˣ (λˣ/x!) e^(−λ) = e^(−λ) Σ_{x=0}^{+∞} (tλ)ˣ/x! = e^(−λ) e^(tλ) = e^(λ(t−1))

This function exists for any t, as the ratio test shows. Now, the definitions and rules of the mathematical analysis for real functions of a real variable give

E(X) = G⁽¹⁾(1) = [e^(λ(t−1)) λ]_{t=1} = λ
E(X²) = G⁽²⁾(1) + E(X) = [e^(λ(t−1)) λ²]_{t=1} + λ = λ² + λ

By using the moment generating function:

M(t) = E(e^(tX)) = Σ_{x=0}^{+∞} e^(tx) (λˣ/x!) e^(−λ) = e^(−λ) Σ_{x=0}^{+∞} (e^t λ)ˣ/x! = e^(−λ) e^(e^t λ) = e^(λ(e^t − 1))

This function exists for any real t, as the ratio test shows. Because of the mathematical real analysis,

E(X) = M⁽¹⁾(0) = [e^(λ(e^t−1)) λe^t]_{t=0} = λ
E(X²) = M⁽²⁾(0) = [e^(λ(e^t−1)) (λe^t)² + e^(λ(e^t−1)) λe^t]_{t=0} = [e^(λ(e^t−1)) λe^t(λe^t + 1)]_{t=0} = λ(λ+1) = λ² + λ

By using the characteristic function:

φ(t) = E(e^(itX)) = Σ_{x=0}^{+∞} e^(itx) (λˣ/x!) e^(−λ) = e^(−λ) Σ_{x=0}^{+∞} (e^(it) λ)ˣ/x! = e^(−λ) e^(e^(it) λ) = e^(λ(e^(it) − 1))

This function exists for any real t, as the ratio test shows. The definitions and rules of the analysis for complex functions of one real variable are applied:

E(X) = φ⁽¹⁾(0)/i = [e^(λ(e^(it)−1)) λe^(it) i]_{t=0}/i = λi/i = λ
E(X²) = φ⁽²⁾(0)/i² = [e^(λ(e^(it)−1)) (λe^(it) i)² + e^(λ(e^(it)−1)) λe^(it) i²]_{t=0}/i² = [e^(λ(e^(it)−1)) λe^(it) i²(λe^(it) + 1)]_{t=0}/i² = λ(λ+1) = λ² + λ

Mean and variance:

μ = E(X) = λ
σ² = Var(X) = E(X²) − E(X)² = λ² + λ − λ² = λ

Advanced theory:

Additional way 1: In finding ways, I found the following one. A series can be differentiated and integrated term by term inside its circle of convergence (the limit calculated at the beginning was the same for any λ, so the radius of convergence is infinite when the series is looked at as a function of λ). The expression of the mean suggests the following definition of g(λ):

E(X) = Σ_{x=0}^{+∞} x (λˣ/x!) e^(−λ) = e^(−λ) g(λ),  with g(λ) = Σ_{x=0}^{+∞} x λˣ/x!

and it follows, since g is a well-behaved function of λ, that

g′(λ) = Σ_{x=1}^{+∞} x λ^(x−1)/(x−1)! = Σ_{x=1}^{+∞} (1 + x − 1) λ^(x−1)/(x−1)! = Σ_{x=1}^{+∞} λ^(x−1)/(x−1)! + Σ_{x=1}^{+∞} (x−1) λ^(x−1)/(x−1)! = e^λ + g(λ)

Now, we solve the first-order ordinary differential equation g′(λ) − g(λ) = e^λ.

Homogeneous equation: g′(λ) − g(λ) = 0 → dg/g = dλ → log(g) = λ + k → g_h(λ) = e^(λ+k) = c e^λ

Particular solution: We apply, for example, the method of variation of parameters or constants. Substituting g(λ) = c(λ)e^λ and g′(λ) = c′(λ)e^λ + c(λ)e^λ in the equation,

c′(λ)e^λ + c(λ)e^λ − c(λ)e^λ = e^λ → c′(λ) = 1 → c(λ) = λ → g_p(λ) = λe^λ

General solution: g(λ) = g_h(λ) + g_p(λ) = ce^λ + λe^λ = (c+λ)e^λ

Any g(λ) given by the previous expression verifies the differential equation, so an additional condition is necessary to determine the value of c. The initial definition implies that g(0) = 0, so c = 0. Finally, the mean is

E(X) = e^(−λ) g(λ) = e^(−λ) λe^λ = λ

(The same can be done to calculate some infinite series.) For the second moment, we define

E(X²) = Σ_{x=0}^{+∞} x² (λˣ/x!) e^(−λ) = e^(−λ) g(λ),  with g(λ) = Σ_{x=0}^{+∞} x² λˣ/x!

and it follows, since g is a well-behaved function of λ, that

g′(λ) = Σ_{x=1}^{+∞} x² λ^(x−1)/(x−1)! = Σ_{x=1}^{+∞} [1 + (x−1)² + 2(x−1)] λ^(x−1)/(x−1)! = e^λ + g(λ) + 2e^λ λ

(The expression of the expectation of X has been used in the last term.) Thus, the function we are looking for verifies the first-order ordinary differential equation g′(λ) − g(λ) = e^λ(1 + 2λ).

Homogeneous equation: This equation is the same, so g_h(λ) = c e^λ.

Particular solution: By applying the same method,

c′(λ)e^λ + c(λ)e^λ − c(λ)e^λ = e^λ(1 + 2λ) → c′(λ) = 1 + 2λ → c(λ) = λ + λ² → g_p(λ) = (λ + λ²)e^λ

General solution: g(λ) = g_h(λ) + g_p(λ) = (c + λ + λ²)e^λ

The definition above implies that g(0) = 0, so c = 0. Finally, the second moment is

E(X²) = e^(−λ) g(λ) = e^(−λ)(λ + λ²)e^λ = λ + λ²

Remark: Working with the whole series of μ(λ) or σ²(λ) as functions of λ is more difficult than working with the previous functions g(λ), since the variable λ would appear twice instead of once.

Additional way 2: Another way consists in using a relation involving the Stirling polynomials (see, e.g., § 2.69 of Análisis combinatorio: problemas y ejercicios. Ríbnikov et al. Mir):

Σ_{j=0}^{+∞} jⁿ xʲ/j! = eˣ Pₙ(x),  with P₀(x) = 1, P₁(x) = x, P₂(x) = x(1+x), ..., P_{n+1}(x) = x Σ_{j=0}^n C(n, j) P_j(x)

In this case,

E(X) = e^(−λ) Σ_{x=0}^{+∞} x λˣ/x! = e^(−λ) e^λ P₁(λ) = λ
E(X²) = e^(−λ) Σ_{x=0}^{+∞} x² λˣ/x! = e^(−λ) e^λ P₂(λ) = λ² + λ
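A minimal numerical sketch in R, with a hypothetical value of λ:

set.seed(1)
lambda = 2.7                   # hypothetical value
x = rpois(100000, lambda)
c(mean(x), var(x))             # both close to lambda
mean(x^2)                      # close to lambda^2 + lambda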

(5) The exponential distribution

By applying the definitions:

E(X) = ∫_0^{+∞} x λe^(−λx) dx = [−xe^(−λx)]_0^{+∞} − ∫_0^{+∞} −e^(−λx) dx = [−(x + 1/λ)e^(−λx)]_0^{+∞} = 1/λ − 0 = 1/λ

where the formula ∫ u(x)·v′(x) dx = u(x)·v(x) − ∫ u′(x)·v(x) dx of integration by parts has been applied with x and λe^(−λx) as initial functions (since these two functions are of “different type”):
• u = x → u′ = 1
• v′ = λe^(−λx) → v = ∫ λe^(−λx) dx = −e^(−λx)

For the second-order moment,

E(X²) = ∫_0^{+∞} x² λe^(−λx) dx = [−x²e^(−λx)]_0^{+∞} − ∫_0^{+∞} −2xe^(−λx) dx = 0 + 2λ^(−1) ∫_0^{+∞} x λe^(−λx) dx = 2λ^(−1)μ = 2/λ²

where integration by parts has been applied again, with
• u = x² → u′ = 2x
• v′ = λe^(−λx) → v = −e^(−λx)

That the function e^x changes faster than x^k, for any k, has been used too in calculating both integrals. On the other hand, for the exponential distribution λ > 0, so the previous integrals always converge.

By using the moment generating function:

M(t) = E(e^(tX)) = ∫_0^{+∞} e^(tx) λe^(−λx) dx = λ ∫_0^{+∞} e^(x(t−λ)) dx = λ [e^(x(t−λ))/(t−λ)]_0^∞ = λ/(λ−t)

This function exists for real t such that t − λ < 0 (otherwise, the integral does not converge). Because of the mathematical real analysis,

E(X) = M⁽¹⁾(0) = [λ/(λ−t)²]_{t=0} = λ/λ² = 1/λ
E(X²) = M⁽²⁾(0) = [2λ/(λ−t)³]_{t=0} = 2λ/λ³ = 2/λ²

By using the characteristic function:

φ(t) = E(e^(itX)) = ∫_0^{+∞} e^(itx) λe^(−λx) dx = λ lim_{M→∞} ∫_{z=γ, 0≤γ≤M} e^(z(it−λ)) dz = λ lim_{M→∞} [e^(z(it−λ))/(it−λ)]_{z=γ, 0≤γ≤M}
 = (λ/(it−λ)) lim_{M→∞} [e^(−Mλ) e^(iMt) − 1] = λ/(λ−it)

This function exists for any real t such that it − λ ≠ 0 (dividing by zero is not allowed). In the previous calculation, that the complex integrand is differentiable has been used to calculate the (line) complex integral by using an antiderivative and the equivalent of Barrow's rule. Now, the definitions and rules of the analysis for complex functions of a real variable must be considered to do

E(X) = φ⁽¹⁾(0)/i = (1/i)[ −λ(−i)/(λ−it)² ]_{t=0} = (1/i)(λi/λ²) = 1/λ
E(X²) = φ⁽²⁾(0)/i² = (1/i²)[ 2λi²/(λ−it)³ ]_{t=0} = 2λ/λ³ = 2/λ²

Mean and variance:

μ = E(X) = 1/λ
σ² = Var(X) = E(X²) − E(X)² = 2/λ² − (1/λ)² = 1/λ²
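A minimal numerical sketch in R, with a hypothetical value of λ:

set.seed(1)
lambda = 0.5                   # hypothetical rate
x = rexp(100000, rate = lambda)
mean(x)                        # close to 1/lambda
mean(x^2)                      # close to 2/lambda^2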

Solved Exercises and Problems of Statistical Inference

(6) The normal distribution By applying the definitions 2

− 1 E ( X )=∫−∞ x e √2 π σ2 +∞

(x−μ) 2 2σ

t

t

2

− 1 2σ dx=∫−∞ (t+μ) e dt 2 2 π σ √ +∞

2

2

t

2

− − +∞ 1 1 2σ 2σ =∫−∞ t e dt + μ e dt = 0+μ⋅1 = μ ∫ 2 −∞ 2 √2 π σ √2 π σ +∞

2

2

Where the change

t=x−μ → x=t +μ → dx=dt has been applied. In the second line, the first integral is zero because the integrand is and odd function and range of integration is symmetric, while the second integral is one because f(x) is a density function. 2

− 1 E ( X 2)=∫−∞ x 2 e √ 2 π σ2 +∞

( x−μ) 2 2σ

t

t

2

t

2

− − +∞ 1 1 2 2 2σ 2σ dx=∫−∞ (t +μ )2 e dt= (t +μ + 2μ t) e dt ∫−∞ 2 2 √2 π σ √2 π σ +∞

2

2

t

2

2

t

2

− − − +∞ +∞ 1 1 1 2 2σ 2σ 2σ =∫−∞ t e dt + μ e dt + 2μ t e dt ∫ ∫ 2 −∞ 2 −∞ 2 √2 π σ √2 π σ √2 π σ +∞

2

2

t

2

2

2

− +∞ 1 1 2 2σ = t e dt + μ 2⋅1 + 2 μ⋅0= σ 2 √ 2 π σ 2+μ 2 = σ 2 +μ 2 ∫ 2 −∞ 2 √2 π σ √2 π σ 2

where the first integral has been calculated as follows. 2

+∞



∫−∞ t 2 e

t 2 σ2

[

2



+∞

dt=∫−∞ t⋅t e

t 2 σ2

2



dt= −t σ 2 e

+∞

t 2σ 2

+∞

]

2

+∞

−∞



+σ 2∫−∞ e

t 2 σ2

+∞

dt=(0−0)+ σ2∫−∞

t 2 2σ

2

( ) dt e √ −

2

=σ 2 √ 2 σ2 ∫−∞ e −u du=σ 2 √ 2 π σ 2 Firstly, we have applied integration by parts u=t



u '=1

→ 2

2

t − 2 2σ

v ' =t e





v=∫ t e



t 2 2σ

2

2

dt=−σ e



t 2 2σ

(Again, the function ex changes faster than xk, for any k.) Then, we have applied the change t 2 2 =u → t=u √ 2 σ → dt=du √ 2 σ 2 √2 σ +∞ −x and the well-known result ∫−∞ e dx=√ π (see the appendix of Mathematics). On the other hand, these integrals converge for any real t. 2

By using the moment generating function 2

− 1 M (t)=E(e )=∫−∞ e e √ 2 π σ2 since +∞

tX

tx

2

+∞

∫−∞ e

xt



e

(x−μ ) 2 2σ



+∞



=∫−∞ e

2

+∞ tx − 1 dx= e ∫ √ 2 π σ2 −∞

1 [ −2 σ2 t x+x 2+μ 2−2μ x ] 2 2σ

+∞

dx=∫−∞ e

(x−μ) 2 2σ

=e

188



+∞

dx=∫−∞ e

1 {(x−[σ 2t +μ])2−[σ 2t +μ]2+μ 2 } 2 2σ

1 2 2 − 2 (μ−[ σ t +μ])(μ+[σ t +μ]) 2σ

( x−μ) 2 2σ

+∞



dx=e 2

dx=e

1 t (2μ+ σ2 t) 2

1 { x 2+−2 x [σ2 t +μ]+μ 2} 2 2σ

1 {μ2−[σ 2t +μ]2} 2 2σ

+∞



dx

( e −

x−[σ2 t +μ]

√ 2σ 2

−∞ 1 2 2 − 2 [−σ t] [2μ +σ ln (t )] 2σ

∫−∞ e−u √ 2 σ2 du=e

Solved Exercises and Problems of Statistical Inference

2

) dx

√ 2 π σ2

1

=e 2

2

t [2 μ+σ ln(t )]

√2 π σ2

where we have applied the change x−[σ2 t+μ ] =u → √2 σ 2

x=u √ 2 σ2 +[σ2 t+μ ] →

dx=du √ 2 σ 2

The integrand suggested completing the square in the exponent. This way is indicated Probability and Random Processes, by Grimmett and Stirzaker (Oxford University Press) for the standard normal distribution. We have used this idea for the general normal distribution. This function exists for any real t. Now, because of the mathematical real analysis,

[

1

E(X )=M (1) ( 0)= e 2

[

2

t (2 μ+σ t)

1

E(X 2 )=M (2) (0)= e 2

]

[

1

]

t (2μ+ σ t) 1 (2μ +σ 2 2 t) = e 2 (μ +σ2 t) t =0=μ 2 t =0

2

t (2 μ+σ t )

2

[(μ+σ 2 t )2+ σ2 ] ]t =0=μ 2 +σ 2

By using the characteristic function

φ(t) = E(e^{itX}) = ∫_{−∞}^{+∞} e^{itx} (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)} dx = (1/√(2πσ²)) ∫_{−∞}^{+∞} e^{itx−(x−μ)²/(2σ²)} dx = ⋯ = e^{(1/2)it(2μ+σ²it)}

This function exists for any real t. In this case, using the previous calculations with it in place of t leads to the correct result, but that route is not valid as it stands: in complex analysis we can also make a square appear in the exponent, as well as move coefficients outside of the integral (these operations are not trivial generalizations of the analogous ones in real analysis, and it is necessary to take into account the definitions, properties and results of complex analysis), but the integrals must be solved in the proper way. (For this section, I have consulted Variable compleja y aplicaciones, Churchill, R.V., and J.W. Brown, McGraw-Hill, 5th ed., and Teoría de las funciones analíticas, Markushevich, A., Mir, 1st ed., 2nd reprint.) When the following limit exists, the integral can be solved as follows:

∫_{−∞}^{+∞} e^{itx−(x−μ)²/(2σ²)} dx = lim_{M→∞} ∫_{−M}^{+M} e^{itx−(x−μ)²/(2σ²)} dx

Now, by completing a square in the exponent, as for the previous generating functions,

∫_{−M}^{+M} e^{itx−(x−μ)²/(2σ²)} dx = ⋯ = e^{(1/2)it[2μ+σ²it]} ∫_{−M}^{+M} e^{−(1/(2σ²))(x−μ−iσ²t)²} dx

Because of the rules of complex analysis, these calculations are similar—though based on new definitions and properties—to those of the previous sections. What is quite different is the way of solving the integral. Now we cannot find an antiderivative of the integrand—as we did for the exponential distribution—and therefore we must think of calculating the integral by considering a contour containing the points {x−μ−iσ²t, −M ≤ x ≤ +M}. The integral of a complex function is null along any closed contour within the domain in which the function is differentiable. We consider the contour C(γ) = C_I(γ) ∪ C_II(γ) ∪ C_III(γ) ∪ C_IV(γ), with

C_I(γ) = {z = γ − μ − itσ², −M ≤ γ ≤ +M}
C_II(γ) = {z = M − μ + i(γ − tσ²), 0 ≤ γ ≤ tσ²}
C_III(γ) = {z = −(γ − μ), −M ≤ γ ≤ +M}
C_IV(γ) = {z = −M − μ − iγ, 0 ≤ γ ≤ tσ²}

Then 0 = ∫_C f(z)dz = ∫_{C_I} f(z)dz + ∫_{C_II} f(z)dz + ∫_{C_III} f(z)dz + ∫_{C_IV} f(z)dz, so for f(z) = e^{−z²/(2σ²)},

∫_{−M}^{+M} e^{−(1/(2σ²))(x−μ−iσ²t)²} dx = −∫_{C_II} e^{−z²/(2σ²)} dz − ∫_{C_III} e^{−z²/(2σ²)} dz − ∫_{C_IV} e^{−z²/(2σ²)} dz
= −∫_0^{tσ²} e^{−(1/(2σ²))[(M−μ)² − (γ−tσ²)² + i2(M−μ)(γ−tσ²)]} i dγ + ∫_{−M}^{+M} e^{−(1/(2σ²))(γ−μ)²} dγ + ∫_0^{tσ²} e^{−(1/(2σ²))[(M+μ)² − γ² + i2(M+μ)γ]} i dγ

We are interested in the limit as M increases. For the first integral,

|∫_0^{tσ²} e^{−(1/(2σ²))(M−μ)²} e^{(1/(2σ²))(γ−tσ²)²} e^{−(1/(2σ²))i2(M−μ)(γ−tσ²)} dγ| ≤ e^{−(1/(2σ²))(M−μ)²} ∫_0^{tσ²} e^{(1/(2σ²))(γ−tσ²)²} dγ → 0 as M → ∞

since |e^{ic}| = |cos(c) + i sin(c)| = 1 for all c ∈ ℝ, and the last integral is finite (the integrand is a continuous function and the interval of integration is compact) and does not depend on M. For the second integral,

∫_{−M}^{+M} e^{−(1/(2σ²))(γ−μ)²} dγ = ∫_{(−M−μ)/√(2σ²)}^{(+M−μ)/√(2σ²)} e^{−u²} √(2σ²) du → √(2σ²) ∫_{−∞}^{+∞} e^{−u²} du = √(2πσ²) as M → ∞

where the change (γ−μ)/√(2σ²) = u → γ = u√(2σ²) + μ → dγ = du√(2σ²) has been applied. Finally, for the third integral,

|∫_0^{tσ²} e^{−(1/(2σ²))(M+μ)²} e^{(1/(2σ²))γ²} e^{−(1/(2σ²))i2(M+μ)γ} dγ| ≤ e^{−(1/(2σ²))(M+μ)²} ∫_0^{tσ²} e^{(1/(2σ²))γ²} dγ → 0 as M → ∞

Again, the last integral is finite and does not depend on M. In short,

φ(t) = (1/√(2πσ²)) ∫_{−∞}^{+∞} e^{itx−(x−μ)²/(2σ²)} dx = (1/√(2πσ²)) lim_{M→∞} ∫_{−M}^{+M} e^{itx−(x−μ)²/(2σ²)} dx
= (1/√(2πσ²)) e^{(1/2)it[2μ+σ²it]} lim_{M→∞} ∫_{−M}^{+M} e^{−(1/(2σ²))(x−μ−iσ²t)²} dx = (1/√(2πσ²)) e^{(1/2)it[2μ+σ²it]} √(2πσ²) = e^{(1/2)it[2μ+σ²it]}

This function exists for any real t. (The reader can notice that the correct way is slightly longer.) Now,

E(X) = φ⁽¹⁾(0)/i = [e^{(1/2)it[2μ+σ²it]} i(μ+iσ²t)]_{t=0}/i = iμ/i = μ

E(X²) = φ⁽²⁾(0)/i² = [e^{(1/2)it[2μ+σ²it]} (i²(μ+iσ²t)² + i(iσ²))]_{t=0}/i² = (i²μ² + i²σ²)/i² = μ² + σ²

Mean and variance

μ = E(X) = μ        σ² = Var(X) = E(X²) − E(X)² = σ² + μ² − μ² = σ²
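The closed form can be checked numerically in R by integrating the real and imaginary parts of e^{itx} separately; μ = 1, σ = 2 and t = 0.7 are illustrative values.

mu = 1; sigma = 2; t = 0.7                     # illustrative values
re = integrate(function(x) cos(t*x) * dnorm(x, mu, sigma), -Inf, +Inf)$value
im = integrate(function(x) sin(t*x) * dnorm(x, mu, sigma), -Inf, +Inf)$value
complex(real = re, imaginary = im)             # numerical phi(t)
exp(complex(real = -sigma^2*t^2/2, imaginary = mu*t))   # closed form e^{i mu t - sigma^2 t^2 / 2}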

Conclusion: To calculate the moments of a probability distribution, different methods can be considered, some of them considerably more difficult than others. The characteristic function is a complex function of a real variable, which requires theoretical justifications from complex analysis that we must be aware of.

My notes:

[Ap] Mathematics

Remark 1m: The exponential function e^x changes faster than any monomial x^k, for any k.

Remark 2m: In complex analysis, there are frequently definitions and properties analogous to those of real analysis. Nevertheless, one must take care before applying them.

Remark 3m: Theoretically, quantities like proportions (sometimes expressed in per cent), rates, statistics, etc., are dimensionless. To interpret a numerical quantity, it is necessary to know the framework in which it is being used. For example, 0.98% and 0.98%² are different: the second must be interpreted so that √(0.98%²) = 0.99%. Thus, to track how such quantities are transformed, the use of a symbol may be useful.

Remark 4m: When working with expressions—equations, inequations, sums, limits, integrals, etc.—special attention must be paid when 0 or ∞ appears. For example, even if two limits (series, integrals, etc.) do not exist, their sum (difference, quotient, product, etc.) may exist:

lim_{n→∞} n³ = ∞ and lim_{n→∞} n⁴ = ∞, but lim_{n→∞} n³/n⁴ = 0

or

∫_1^{+∞} (1/x) dx does not exist, while ∫_1^{+∞} (1/x)·(1/x) dx does.

On the other hand, many paradoxes (e.g. Zeno's) are based on some wrong step—here, cancelling the factor 0 or ∞:

0 = 0 ↔ 0·2 = 0·3 ↔ 2 = 3        and        ∞ = ∞ ↔ ∞·2 = ∞·3 ↔ 2 = 3
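Remark 4m can be seen numerically in R: the numerator and the denominator both blow up, while the quotient vanishes.

n = 10^(1:7)
cbind(n, n3 = n^3, n4 = n^4, ratio = n^3/n^4)   # n^3 and n^4 diverge; n^3/n^4 tends to 0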

Readers of advanced sections may want to check some theoretical details related to the following items (the very basic theory is not itemized).

Some Reminders

Real Analysis. For real functions of one or several real variables.

● Binomial Theorem. (x+y)ⁿ = ∑_{j=0}^{n} (n choose j) xʲ yⁿ⁻ʲ or, equivalently, (x+y)ⁿ = ∑_{j=0}^{n} (n choose j) xⁿ⁻ʲ yʲ

● Limits: infinitesimal and infinite quantities.

● Integration: methods (integration by substitution, integration by parts, etc.), Fubini's theorem, line integral.

● Series: convergence, criteria of convergence, radius of convergence, differentiability and integrability, Taylor series, representation of the exponential function, power series. Concretely, when the criterion of the quotient is applied to study the convergence, the radius of convergence is defined through:

lim_{m→∞} a_{m+1}/a_m = lim_{m→∞} |c_{m+1} x^{m+1}|/|c_m x^m| = |x| · lim_{m→∞} |c_{m+1}|/|c_m| < 1   ⟺   |x| < lim_{m→∞} |c_m|/|c_{m+1}| = r

(Similarly for the criterion of the square root.)

● Geometric Series. For 0 < r < 1, ∑_{k=0}^{∞} r^k = 1/(1−r).

Necessity (for lim_{n→∞} 1/(n+c) = 0): if not, that is, if ∃M > 0 such that n < M < ∞, then 1/(n+c) > 1/(M+c) > 0 and the limit could not be zero.

(3) lim_{n→∞} (an+b)/(cn+d), where a, b, c and d are constants

Way 0: (This limit includes the previous one.) The quotient is an indeterminate form. Intuitively, the numerator increases like an and the denominator like cn (the terms b and d are negligible for huge n). Then, the limit of the quotient tends to a/c.

Way 1: Formally, we divide the numerator and the denominator (all their terms) by n:

lim_{n→∞} (an+b)/(cn+d) = lim_{n→∞} [(an+b)n⁻¹]/[(cn+d)n⁻¹] = lim_{n→∞} (a + b/n)/(c + d/n) = a/c

Way 2: By using infinites,

lim_{n→∞} (an+b)/(cn+d) = lim_{n→∞} (an)/(cn) = a/c

Necessity: if lim_{n→∞} (an+b)/(cn+d) = a/c, then n → ∞. If not, that is, if ∃M > 0 such that n < M < ∞,

|(an+b)/(cn+d) − a/c| = |(bc−ad)/(c(cn+d))| > |bc−ad| / (|c|(|c|M+|d|)) > 0

and the limit could not be a/c... unless the original quotient was always equal to this value. Notice that when the previous numerator is zero,

ad = bc ⟺ a/c = λ = b/d ⟺ a = λc and b = λd ⟺ (an+b)/(cn+d) = λ(cn+d)/(cn+d) = λ = a/c ⟺ (an+b)/(cn+d) − a/c = 0

that is, in this case the quotient is really a constant. In the initial statement, the condition |a b; c d| ≠ 0 could have been added for the polynomials an+b and cn+d to be independent.

(4) lim_{n→∞} (an^{k₁} + b(n))/(cn^{k₂} + d(n)), where a and c are constants and b(n) and d(n) are polynomials whose degrees are smaller than k₁ and k₂, respectively

Way 0: (This limit includes the two previous ones.) The quotient is an indeterminate form. Intuitively, the numerator increases like an^{k₁} and the denominator like cn^{k₂}, while b(n) and d(n) are negligible. Thus,

lim_{n→∞} (an^{k₁}+b(n))/(cn^{k₂}+d(n)) = 0 if k₁ < k₂;  a/c if k₁ = k₂;  +∞ if k₁ > k₂ and a/c > 0;  −∞ if k₁ > k₂ and a/c < 0

Way 1: Formally, we divide the numerator and the denominator (all their terms) by the power of n with the highest degree among all the terms in the quotient (if there were products, we should imagine how the monomials would be).

(5) lim_{n→∞} (a/n + b/n²)/(c/n³), where a, b and c are constants

Way 0: The quotient is an indeterminate form. Intuitively, the numerator decreases like a/n (the slowest) and the denominator like c/n³, so the denominator is smaller and smaller with respect to the numerator and, as a consequence, the limit is −∞ or +∞ depending on whether a/c is negative or positive, respectively.

Way 1: Formally, it is always possible to multiply or divide the numerator and the denominator (all their monomials, if they are summations, or any element, if they are products) by the power of n with the appropriate exponent. Then we can do

lim_{n→∞} (a/n + b/n²)/(c/n³) = lim_{n→∞} (an² + bn)/c = −∞ if a/c < 0;  +∞ if a/c > 0

Way 2: By using infinitesimals,

lim_{n→∞} (a/n + b/n²)/(c/n³) = lim_{n→∞} (a/n)/(c/n³) = lim_{n→∞} an²/c = −∞ if a/c < 0;  +∞ if a/c > 0

Conclusion: We have studied the limits proposed. Some of them were almost trivial, while others involved indeterminate forms like 0/0 or ∞/∞. All the cases were quotients of polynomials, so the limits of the former form have been transformed into limits of the latter form. To solve these cases, the technique of multiplying and dividing by the same quantity has sufficed (there are other techniques, e.g. L'Hôpital's rule).
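The behaviour of limit (3) can be illustrated in R by evaluating the quotient along a growing sequence; the constants a, b, c and d below are illustrative.

a = 2; b = -5; c = 3; d = 7                   # illustrative constants
n = 10^(1:7)
cbind(n, quotient = (a*n + b)/(c*n + d))      # approaches a/c = 2/3
a/c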

Additional examples

lim_{n→∞} 1/(n−1) = 0, or lim_{n→∞} 1/(n−1) = lim_{n→∞} 1/n = 0
lim_{n→∞} (2/n² − 1/n) = 0
lim_{n→∞} (2n − n²) = lim_{n→∞} [n(2−n)] = −∞, or lim_{n→∞} (2n − n²) = lim_{n→∞} (−n²) = −∞
lim_{n→∞} (n−1)/n = lim_{n→∞} (1 − 1/n) = 1, or lim_{n→∞} (n−1)/n = lim_{n→∞} n/n = 1
lim_{n→∞} n/(n−2) = lim_{n→∞} 1/(1 − 2/n) = 1, or lim_{n→∞} n/(n−2) = lim_{n→∞} n/n = 1
lim_{n→∞} (n−1)/(n−3) = lim_{n→∞} (1 − 1/n)/(1 − 3/n) = 1, or lim_{n→∞} (n−1)/(n−3) = lim_{n→∞} n/n = 1

My notes:

Exercise 3m (*) Study the following limits of sequences of two variables, as n_X → ∞ and n_Y → ∞ (a, b and c are constants):

(1) lim (n_X + n_Y) and lim (n_X − n_Y)
(2) lim (n_X · n_Y) and lim n_X/n_Y
(3) lim (n_X n_Y)/n_X and lim n_X/(n_X n_Y)
(4) lim (n_X+a)(n_Y+b)/(n_X+c) and lim (n_X+a)/[(n_X+b)(n_Y+c)]
(5) lim [1/(n_X n_Y)]/(1/n_X) and lim (1/n_X)/[1/(n_X n_Y)]
(6) lim (n_X+n_Y)/n_X and lim (1/n_X + 1/n_Y)/(1/n_X)
(7) lim (n_X+n_Y)/(n_X n_Y) and lim n_X/(n_X+n_Y)
(8) lim [1/((n_X+a)(n_Y+b))]/[1/(n_X+c)] and lim [1/(n_X+a)]/[1/((n_X+b)(n_Y+c))]
(9) lim (1/n_X + 1/n_Y) and lim (n_X n_Y)/(n_X+n_Y)
(10) lim (n_X−n_Y)/(n_X n_Y) and lim (n_X n_Y)/(n_X−n_Y)

Discussion: We have to study several limits of two-variable sequences. Firstly, we try to substitute the value to which the variables tend into the expression of the quantity in the limit. If we are lucky, the value is found and the formal calculations are done later; if not, techniques to solve the indeterminate forms must be applied. These limits may be quite more difficult than those for one variable, since we need to prove that the value does not depend on the particular way in which the sample sizes tend to infinity (if the limit exists or is infinite), or find two ways such that different values are obtained (the limit does not exist).

(1) lim (n_X+n_Y) and lim (n_X−n_Y)

Way 0: Intuitively, the first limit is infinite while the second does not exist, since it depends on which variable increases faster.

Way 1: For the first limit to be infinite, it is necessary and sufficient that one variable tends to infinity, say n_X:

lim (n_X+n_Y) > lim_{n_X→∞} n_X = ∞

For the necessity, if both n_X < M and n_Y < M for some finite M, the sum is bounded by 2M and could not tend to infinity.

For the quantity Q(n_X, n_Y) = (n_X+n_Y)/(n_X n_Y), first, when one sample size increases in a unit the sequence decreases:

(n_X+n_Y)/(n_X n_Y) > (n_X+1+n_Y)/((n_X+1) n_Y)

Since the expression of the sequence is symmetric, the same inequality is true when n_Y increases in a unit. Finally, the case when both sizes increase in a unit can always be decomposed into two of the previous steps, while Q depends only on the position, not on the way of arriving at it; thus, the sequence decreases in this case too. Second, Q can take values in a discrete set that can sequentially be constructed and ordered to form a sequence that is strictly decreasing and bounded, say Q(k). (The set ℕ×ℕ is countable.) The symmetry implies that the increase of Q can take only two values—not three—when either sample size or both increase in a unit. In short, Q(k) converges, though we need not build it. Third, any path s(k) such that the sample sizes are nondecreasing and tend to infinity can be written in terms of one-unit rightward and upward steps, with an infinite amount of either type. For each path s(k), the quantity

Q_s(k) = (n_X(k)+n_Y(k))/(n_X(k) n_Y(k))

can be seen as a subsequence of Q(k). Finally, the limit of Q is unique, and the case n_X = n = n_Y indicates that it is zero:

lim_{k→∞} (n(k)+n(k))/(n(k)·n(k)) = lim_{k→∞} 2/n(k) = 0

For the necessity for both sample sizes to tend to infinity, let us suppose, without loss of generality, that n_X ≤ M < ∞. There would be a subsequence that cannot tend to zero:

lim_{k→∞} (n_X(k)+n_Y(k))/(n_X(k) n_Y(k)) ≥ lim_{k→∞} (n_X(k)+n_Y(k))/(M n_Y(k)) ≥ 1/M > 0

whatever the behaviour of n_X(k). The previous nondecreasing s(k) are the only paths of interest in Statistics.

(10) lim (n_X−n_Y)/(n_X n_Y) and lim (n_X n_Y)/(n_X−n_Y)

Way 0: Intuitively, the limit of the difference does not exist, since it takes different values depending on the path; but the difference—or the summation, in the previous section—is so much smaller than the product that the first limit seems zero while the second seems infinite. Formally, we can do calculations as for the previous limit, for example

lim (n_X−n_Y)/(n_X n_Y) = lim n_X/(n_X n_Y) − lim n_Y/(n_X n_Y) = lim 1/n_Y − lim 1/n_X = 0 − 0 = 0

or, alternatively, use the bound

|(n_X−n_Y)/(n_X n_Y)| ≤ (n_X+n_Y)/(n_X n_Y) → 0

Conclusion: We have studied the limits proposed. Some of them were almost trivial, while others involved indeterminate forms like 0/0 or ∞/∞. Most cases were quotients of polynomials, so the limits of the former form have been transformed into limits of the latter form; to solve them, the technique of multiplying and dividing by the same quantity has sufficed (there are other techniques, e.g. L'Hôpital's rule). Other techniques have been applied too.
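The path dependence can be illustrated in R by following two routes to infinity; the paths below are illustrative choices.

k  = 10^(1:6)
d1 = k - k              # n_X - n_Y along the path n_X = n_Y = k: constantly 0
d2 = 2*k - k            # n_X - n_Y along the path n_X = 2k, n_Y = k: diverges
q  = (k + k)/(k * k)    # (n_X + n_Y)/(n_X n_Y) along n_X = n_Y = k: tends to 0
cbind(k, d1, d2, q)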

Additional Examples: Several limits have been solved throughout the exercises—look for 'limit' in the final index.

My notes:

Exercise 4m (*) For two positive integers n_X and n_Y, find the (discrete) frontier and the two regions determined by the equality

2(n_X+n_Y) = (n_X−n_Y)²

Discussion: Both sides of the expression are symmetric with respect to the variables, meaning that they are the same if the two variables are switched. This implies that the frontier we are looking for is symmetric with respect to the bisector line. The square suggests a parabolic curve, while

2(n_X+n_Y) = (n_X−n_Y)²   ⟺   2(1+n_X n_Y) = (n_X−1)² + (n_Y−1)²

suggests a sort of transformation of a conic curve. Intuitively, in the region around the bisector line the difference of the variables is small and therefore the right-hand side of the original equality is smaller than the left-hand side; obviously, the other region is at the other side of the (discrete) frontier.

Purely computational approach: In a previous exercise we wrote some "force-based" lines for the computer to plot the points of the frontier. Here we use the same code to plot the inner region (see the figures below).

N = 100
vectorNx = vector(mode="numeric", length=0)
vectorNy = vector(mode="numeric", length=0)
for (nx in 1:N) {
  for (ny in 1:N) {
    if (2*(nx+ny) >= (nx-ny)^2) {
      vectorNx = c(vectorNx, nx); vectorNy = c(vectorNy, ny)
    }
  }
}
plot(vectorNx, vectorNy, xlim = c(0,N+1), ylim = c(0,N+1), xlab='nx', ylab='ny', main=paste('Regions'), type='p')

Algebraical-computational approach: Before using the computer, we can do some algebraical work:

n_X² + n_Y² − 2n_X n_Y = 2n_X + 2n_Y   ⟺   n_Y² − 2(n_X+1)n_Y + n_X(n_X−2) = 0

n_Y = [2(n_X+1) ± √(4(n_X+1)² − 4n_X(n_X−2))]/2 = (n_X+1) ± √(n_X² + 2n_X + 1 − n_X² + 2n_X) = (n_X+1) ± √(4n_X+1)

The following code plots the two branches of the frontier (see the figures above).

N = 100
vectorNx = seq(1,N)
vectorNyPos = (vectorNx+1)+sqrt(4*vectorNx+1)
vectorNyNeg = (vectorNx+1)-sqrt(4*vectorNx+1)
integerSolutions = (vectorNyPos/round(vectorNyPos) == 1)
yL = c(0, max(vectorNyPos[integerSolutions], vectorNyNeg[integerSolutions]))
plot(vectorNx[integerSolutions], vectorNyPos[integerSolutions], xlim = c(0,N+1), ylim = yL, xlab='nx', ylab='ny', main=paste('Frontier'), type='p')
points(vectorNx[integerSolutions], vectorNyNeg[integerSolutions])

Algebraical, analytical and geometrical approach: The change of variables

C₁(n_X, n_Y) = (n_X−n_Y, n_X+n_Y) = (u, v)

is a linear transformation. The new frontier can be written as the parabolic curve v = u²/2. The computer allows plotting this frontier in the U-V plane.

Algebraical, analytical and geometrical approach: The change of variables C1 ( n X , nY )=(n X −nY , n X + nY )=(u , v ) 1 2 is a linear transformation. The new frontier can be written as the parabolic curve v = u . The computer 2 allows plottin this frontier in the U-V plane. N = 50 vectorU = seq(-50, +50) vectorV = 0.5*vectorU^2 plot(vectorU, vectorV, xlim = c(-N-1,+N+1), ylim = c(0,max(vectorV)), xlab='u', ylab='v', main=paste('Frontier'), type='p')

How should the change of variables be interpreted? If we write

(u, v)ᵗ = [[1, −1], [1, 1]] (n_X, n_Y)ᵗ

the previous matrix reminds us of a rotation in the plane (although movements have orthonormal matrices and the previous one is only orthogonal). Let us have a look at how a triangle—a rigid polygon—is transformed:

P₁ = (1, 2) → C₁(P₁) = (−1, 3)        P₂ = (1, 1) → C₁(P₂) = (0, 2)        P₃ = (2, 1) → C₁(P₃) = (1, 3)

To confirm that C₁ is a rotation plus a dilatation (homothetic transformation), or vice versa, we consider the distances between points, the linearity, and a rotation of the axes. First, if

A = (a₁, a₂) → Ã = (a₁−a₂, a₁+a₂)        B = (b₁, b₂) → B̃ = (b₁−b₂, b₁+b₂)

then

d(Ã, B̃) = √([(b₁−b₂)−(a₁−a₂)]² + [(b₁+b₂)−(a₁+a₂)]²) = √([(b₁−a₁)−(b₂−a₂)]² + [(b₁−a₁)+(b₂−a₂)]²) = √(2(b₁−a₁)² + 2(b₂−a₂)²) = √2 · √((b₁−a₁)² + (b₂−a₂)²) = √2 · d(A, B)

[

]

( ) ( )( )

which is the expression of a rotation in the plain (see the literature on Linear Algebra). Second, the linearity implies that both C1 and C2 transform lines into lines. The expression ⃗ AB=0⃗A +λ AB=(a 1 , a 2)+λ (b 1 −a1 , b2−a2 )=(λ b1 +( 1−λ) a 1 , λ b2 +(1−λ)a2 ) determines the line containing A and B if λ ∈ℝ and the segment from A to B if λ ∈[0,1] . It is transformed as follows C1 ( λ b1 +(1−λ) a1 , λ b2 +(1−λ) a2 )=(λ b 1+(1−λ) a1−λ b2−(1−λ)a2 , λ b 1+(1−λ)a1 +λ b 2+(1−λ) a2 ) =(λ (b1−b2 )+(1−λ)(a 1−a2 ), λ(b 1+ b2 )+(1−λ)(a 1+ a2))=λ( b1−b 2 , b 1+b 2)+(1−λ)(a1−a2 , a1 +a2 ) =λ C1 (b 1 , b 2)+(1−λ)C 1(a1 , a2) (similarly for C2). This expression determines the line containing C1(A) and C1(B) if λ ∈ℝ and the segment from C1(A) to C1(B) if λ ∈[0,1]. Third, as regards the rotation of axes, the following figure and formulas are general e⃗1 = cos α ~ e⃗1 + sin α ~ e⃗2 e⃗2 =−sin α ~ e⃗1 + cos α ~ e⃗2

{

(Rotation sinistrorsum)

e⃗1 = cos α ~ e⃗1 − sin α ~ e⃗2 e⃗2 = sin α ~ e⃗1 + cos α ~ e⃗2

{

(Rotation dextrorsum) 208

Solved Exercises and Problems of Statistical Inference

When the axes are rotated in one direction, it can be thought that the points are rotated in the opposite. Now, C2 can be written as a 45º dextrorsum rotation of the axes e⃗1 = cos π ~ e⃗ − sin π ~ e⃗ 4 1 4 2 e⃗2 = sin π ~ e⃗ + cos π ~ e⃗ 4 1 4 2

{

1 1 π −sin π ~ − cos ~ ⃗ ⃗ e⃗1 4 4 e1 = √ 2 √2 e 1 = 1 1 −1 = 1 1 ~ e⃗2 e⃗2 e⃗2 √ 2 1 1 sin π cos π ~ 4 4 √2 √ 2

()(

)( ) (

)( )

~ e⃗1 ~ e⃗2

( )( )

Any point P=(x , y ) is transformed through 1 1 −1 x 1 x− y = =u . 1 1 y x + y v 2 2 √ √

( )( ) ( ) ( )

The matrix M = −1 t M =M . Then,

1 1 −1 √2 1 1

( )

is orthogonal, which means that M⋅M t=I =M t⋅M and implies that ~ 1 1 1 e⃗1 e⃗ = ~1 . √2 −1 1 e⃗2 e⃗2

(

)( ) ( )

Conclusion: We have applied different approaches to study the frontier and the two regions determined by the given equality. Fortunately, nowadays the computer allows us to do this work even without a deeper theoretical study—change of variables, transformation, et cetera.

My notes:


References

Remark 1r: When an exercise is based on one from a book, the reference has been included below the statement; some statements may have been taken from official exams. I have written the entire solutions. The slides mentioned in the prologue contain references on theory. For some specific theoretical details, literature is referred to in the proper sections of this document.

[1] The R Project for Statistical Computing, http://www.r-project.org/
[2] Wikipedia, http://en.wikipedia.org/

My notes:


Tables of Statistics

Basic Measures

μ = E(X) = ∑_Ω x_i·f(x_i)   (discrete)        μ = E(X) = ∫_Ω x·f(x) dx   (continuous)

σ² = Var(X) = E([X−μ]²) = ⋯ = E(X²) − μ²

Basic Estimators

X̄ = (1/n) ∑_{i=1}^{n} X_i
S² = (1/(n−1)) ∑_{i=1}^{n} (X_i − X̄)²
s² = (1/n) ∑_{i=1}^{n} (X_i − X̄)² = ⋯ = (1/n) ∑_{i=1}^{n} X_i² − X̄²,  with  n s² = (n−1) S²
S_p² = (n_X s_X² + n_Y s_Y²)/(n_X+n_Y−2) = ((n_X−1)S_X² + (n_Y−1)S_Y²)/(n_X+n_Y−2)
V² = (1/n) ∑_{i=1}^{n} (X_i − μ)²
η̂ = (1/n) ∑_{i=1}^{n} X_i   (1 population)        η̂_p = (n_X η̂_X + n_Y η̂_Y)/(n_X+n_Y)   (2 populations)

1 population:
Parameter                        Estimator
μ                                X̄
σ² (μ known)                     V²
σ² (μ unknown)                   s² or S²
η                                η̂

2 populations:
Parameter                        Estimator
μ_X − μ_Y                        X̄ − Ȳ
σ_X²/σ_Y² (μ_X, μ_Y known)       V_X²/V_Y²
σ_X²/σ_Y² (μ_X, μ_Y unknown)     s_X²/s_Y² or S_X²/S_Y²
η_X − η_Y                        η̂_X − η̂_Y
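A short R illustration of these estimators on simulated samples; the sizes and parameters below are illustrative.

set.seed(1)
x = rnorm(10, mean = 5, sd = 2); y = rnorm(15, mean = 5, sd = 2)
nX = length(x); nY = length(y)
mean(x)                                           # sample mean
var(x)                                            # sample quasivariance S^2 (divides by n-1)
(nX-1)/nX * var(x)                                # sample variance s^2 (divides by n)
((nX-1)*var(x) + (nY-1)*var(y)) / (nX + nY - 2)   # pooled sample quasivariance S_p^2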

Basic Statistics

1 normal population, any n:
Parameter           Statistic
μ (σ² known)        T(X; μ) = (X̄−μ)/√(σ²/n) ~ N(0,1);   X̄ ~ N(μ, σ²/n);   ∑_{i=1}^{n} X_i ~ N(nμ, nσ²)
μ (σ² unknown)      T(X; μ) = (X̄−μ)/√(S²/n) ~ t_{n−1}
σ² (μ known)        T(X; σ) = nV²/σ² ~ χ²_n
σ² (μ unknown)      T(X; σ) = ns²/σ² = (n−1)S²/σ² ~ χ²_{n−1}

μ unknown

2 independent normal populations, any nX and nY Parameters μX–μY

Statistic T ( X , Y ; μ X ,μ Y )=

σX2, σY2 known

̄ −Ȳ )−(μ X −μY ) (X



σ 2X σ 2Y + n X nY

(

̄ −Ȳ ) ∼ N μ X −μ Y , (X

μX–μY T ( X , Y ; μ X ,μ Y )=

σX2, σY2 unknown

σX2/σY2

̄ −Ȳ )−(μ X −μY ) (X



T ( X , Y ; σ X , σ Y )=

μX, μY known

212

(

2

2

S X SY + n X nY

2 1 nX V X n X σ 2X 2 1 nY V Y n Y σY2



σ 2X σY2 + n X nY

) where k is the closest integer to

∼ tk

V 2X =

∼ N ( 0,1)

σ 2X V Y2 σY2

)

=

V 2X σY2 V Y2 σ 2X

Solved Exercises and Problems of Statistical Inference

∼ Fn

X

, nY

σX2/σY2

T ( X , Y ; σ X , σ Y )=

μX, μY unknown

(

( n X −1)S 2X 1 (n X −1) σ 2X 2

(nY −1)S Y 1 (nY −1) σ 2Y

S 2X =

σ 2X 2

SY σY2

)

=

S 2X σ2Y 2

2

SY σ X

∼ Fn

X

−1 ,nY −1

1 population, large n:
Parameter   Statistic
μ           T(X; μ) = (X̄−μ)/√(?/n) →d N(0,1), where ? is substituted by σ², S² or s²;   X̄ →d N(μ, ?/n);   ∑_{i=1}^{n} X_i →d N(nμ, n·?)
η           T(X; η) = (η̂−η)/√(?(1−?)/n) →d N(0,1), where ? is substituted by η or η̂;   η̂ →d N(η, ?(1−?)/n)

2 independent populations, large n_X and n_Y:
Parameters   Statistic
μ_X−μ_Y      T(X,Y; μ_X,μ_Y) = ((X̄−Ȳ)−(μ_X−μ_Y))/√(?_X/n_X + ?_Y/n_Y) →d N(0,1), where for each population ? is substituted by σ², S² or s²;   X̄−Ȳ →d N(μ_X−μ_Y, ?_X/n_X + ?_Y/n_Y)
η_X−η_Y      T(X,Y; η_X,η_Y) = ((η̂_X−η̂_Y)−(η_X−η_Y))/√(?_X(1−?_X)/n_X + ?_Y(1−?_Y)/n_Y) →d N(0,1), where for each population ? is substituted by η or η̂

Remark 1T: For normal populations, the rules that govern the addition and subtraction imply that

X̄ ~ N(μ_X, σ_X²/n_X),  Ȳ ~ N(μ_Y, σ_Y²/n_Y),  and hence  X̄ ∓ Ȳ ~ N(μ_X ∓ μ_Y, σ_X²/n_X + σ_Y²/n_Y).

The tables include results combining these rules with a standardization or studentization. We are usually interested in comparing the means of the two populations, for which the difference is considered; nevertheless, the addition can also be considered, with

((X̄ ∓ Ȳ) − (μ_X ∓ μ_Y))/√(σ_X²/n_X + σ_Y²/n_Y) ~ N(0,1).

On the other hand, since the quality of estimators—e.g. measured through the mean square error—increases with the sample size, when the parameters of two populations are supposed to be equal the samples should be merged to estimate the parameter jointly (especially for small n_X and n_Y). Then, under the hypothesis σ_X = σ_Y, the pooled sample quasivariance should be used through the statistic:

T(X,Y; μ_X,μ_Y) = ((X̄−Ȳ)−(μ_X−μ_Y))/√(S_p²/n_X + S_p²/n_Y) ~ t_{n_X+n_Y−2}

Remark 2T: For any populations with finite mean and variance, one version of the Central Limit Theorem implies that

X̄ →d N(μ_X, σ_X²/n_X),  Ȳ →d N(μ_Y, σ_Y²/n_Y),  and hence  X̄ ∓ Ȳ →d N(μ_X ∓ μ_Y, σ_X²/n_X + σ_Y²/n_Y),

where the rules that govern the convergence (in distribution) of the addition—and subtraction—of sequences of random variables (see a text on Probability Theory) and the rules that govern the addition and subtraction of normally distributed variables are applied. We are usually interested in comparing the means of the two populations, for which the difference is considered; nevertheless, the addition can also be considered, with

((X̄ ∓ Ȳ) − (μ_X ∓ μ_Y))/√(?_X/n_X + ?_Y/n_Y) →d N(0,1)

and, for a Bernoulli population,

((η̂_X ∓ η̂_Y) − (η_X ∓ η_Y))/√(?_X(1−?_X)/n_X + ?_Y(1−?_Y)/n_Y) →d N(0,1).

Besides, variances can be estimated when they are unknown. By applying theorems in section 2.2 of Approximation Theorems of Mathematical Statistics, by R.J. Serfling, John Wiley & Sons, and sections 7.2 and 7.3 of Probability and Random Processes, by G. Grimmett and D. Stirzaker, Oxford University Press,

(X̄−μ)/√(S²/n) = [1/√(S²/σ²)] · (X̄−μ)/√(σ²/n) →d 1·N(0,1) = N(0,1)

and

(η̂−η)/√(η̂(1−η̂)/n) = [1/√(η̂(1−η̂)/(η(1−η)))] · (η̂−η)/√(η(1−η)/n) →d N(0,1),

and also t_{n−1} →d N(0,1). Similarly for two populations. From the first convergence it is deduced that

(η̂−η)/√(η(1−η)/n) →d N(0,1).

On the other hand, when the parameters of two populations are supposed to be equal, the samples should be merged to estimate the parameter jointly (especially for medium n_X and n_Y). Then, under the hypothesis σ_X = σ_Y, the pooled sample quasivariance should be used—although in some cases its effect is negligible—through the statistic:

T(X,Y; μ_X,μ_Y) = ((X̄−Ȳ)−(μ_X−μ_Y))/√(S_p²/n_X + S_p²/n_Y) →d N(0,1)

For a Bernoulli population, under the hypothesis η_X = η_Y, the pooled sample proportion should be used—although in some cases the effect is negligible—in the denominator of the statistic:

T(X,Y; η_X,η_Y) = ((η̂_X−η̂_Y)−(η_X−η_Y))/√(η̂_p(1−η̂_p)/n_X + η̂_p(1−η̂_p)/n_Y) →d N(0,1).

Remark 3T: In the last tables, the best information available should be used in place of the symbol ?.

Remark 4T: The Bernoulli population is a particular case for which μ = η and σ² = η·(1−η), so X̄ = η̂. When the variance σ² is directly estimated without estimating η, σ̂² is used in place of the product ?(1−?).

Remark 5T: Once an interval for the variance is obtained, P(a₁ < σ² < a₂), since the positive square root is a strictly increasing function (and therefore it preserves the order between two values), an interval for the standard deviation is given by P(√a₁ < σ < √a₂). (Notice that, for a reasonable initial interval, 0 < a₁.) Similarly for the quotient of two variances σ_X²/σ_Y².


Statistics Based on Λ

1 population, any n:
Parameter            Statistic
θ (1 dimension)      Λ = L(X; θ₀)/L(X; θ₁)
θ (r dimensions)     Λ = L(X; θ̂₀)/L(X; θ̂);   asymptotically, −2 ln(Λ) →d χ²_r

Analysis of Variance (ANOVA)

P independent normal populations, one-factor fixed-effects:

Between-group measures:   SSG = ∑_{p=1}^{P} n_p (X̄_p − X̄)²,   MSG = SSG/(P−1)
Within-group measures:    SSW = ∑_{p=1}^{P} SS_p, where SS_p = ∑_{i=1}^{n_p} (X_{p,i} − X̄_p)²,   MSW = SSW/(n−P)
Total measures:           SST = ∑_{p=1}^{P} ∑_{i=1}^{n_p} (X_{p,i} − X̄)² = SSW + SSG
Statistic:                T₀ = MSG/MSW ~ F_{P−1, n−P}
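A minimal R sketch of this decomposition on simulated groups; the group means and sizes are illustrative, and the built-in aov reproduces the same F statistic.

set.seed(3)
g = factor(rep(1:3, each = 10))                # P = 3 groups, n = 30
x = rnorm(30, mean = c(5, 5, 6)[g], sd = 1)
mns = tapply(x, g, mean); ns = tapply(x, g, length)
SSG = sum(ns * (mns - mean(x))^2)              # between-group sum of squares
SSW = sum((x - mns[g])^2)                      # within-group sum of squares
(SSG/(3-1)) / (SSW/(30-3))                     # T0 = MSG/MSW
summary(aov(x ~ g))                            # same F value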

Nonparametric Hypothesis Tests

Chi-Square Tests

Goodness-of-fit — Data: X₁, ..., X_n; K classes; 1 model F₀. Null hypothesis H₀: the sample comes from the model F₀. Statistic (N_i is the observed and ê_i the expected absolute frequency):

T₀(X) = ∑_{i=1}^{K} (N_i − ê_i)²/ê_i →d χ²_{K−(1+s)} = χ²_{K−1−s}

where s parameters are estimated and ê_i = n p̂_i = n P_{θ̂}(i-th class) or, if no parameter is estimated, s = 0 and e_i = n p_i = n P_θ(i-th class).

Homogeneity — Data: L samples X_{l1}, ..., X_{l n_l}; K classes. H₀: the samples come from the same model. Statistic:

T₀(X) = ∑_{i=1}^{L} ∑_{j=1}^{K} (N_{ij} − ê_{ij})²/ê_{ij} →d χ²_{KL−(L+K−1)} = χ²_{(K−1)(L−1)}

where ê_{ij} = n_i p̂_{ij} = n_i p̂_j = n_i N_{·j}/n.

Independence — Data: (X₁,Y₁), ..., (X_n,Y_n); K·L classes; 2 variables. H₀: the bivariate sample comes from two independent models. Statistic:

T₀(X,Y) = ∑_{i=1}^{L} ∑_{j=1}^{K} (N_{ij} − ê_{ij})²/ê_{ij} →d χ²_{KL−(L−1+K−1+1)} = χ²_{(K−1)(L−1)}

where ê_{ij} = n p̂_{ij} = n p̂_i p̂_j = n (N_{i·}/n)(N_{·j}/n).

Remark 6T: Although for different theoretical reasons, for the practical estimation of e_{ij} the same mnemonic rule can be used in both homogeneity and independence tests: for each position, multiply the absolute frequencies of the row and the column and divide by the total number of elements n.
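In R, chisq.test implements these statistics; the frequencies below are illustrative.

chisq.test(x = c(18, 25, 57), p = c(0.2, 0.3, 0.5))   # goodness of fit, s = 0
tab = matrix(c(20, 30, 25, 25), nrow = 2, byrow = TRUE)
chisq.test(tab, correct = FALSE)   # independence/homogeneity: e_ij = row total * column total / n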

Kolmogorov-Smirnov Tests

Goodness-of-fit — Data: X₁, ..., X_n (1 sample, 1 model F₀). H₀: the sample comes from the model F₀. Statistic:

T₀(X) = max_x |F_n(x) − F₀(x)|

where F₀(x) = P(X ≤ x) and F_n(x) = (1/n)·Number{X_i ≤ x}.

Homogeneity — Data: X₁, ..., X_{n_X} and Y₁, ..., Y_{n_Y} (2 samples). H₀: the samples come from the same model. Statistic:

T₀(X,Y) = max_t |F_{n_X}(t) − F_{n_Y}(t)|

where F_{n_X}(t) = (1/n_X)·Number{X_i ≤ t} and F_{n_Y}(t) = (1/n_Y)·Number{Y_i ≤ t}.
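The corresponding R calls, on illustrative simulated samples.

set.seed(4)
x = rnorm(50); y = rnorm(60, mean = 0.5)
ks.test(x, "pnorm", 0, 1)   # goodness of fit to N(0,1): max |F_n(x) - F_0(x)|
ks.test(x, y)               # two-sample homogeneity: max |F_nX(t) - F_nY(t)|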

Other Tests

Runs Test (of Randomness) — Data: X₁, ..., X_n with one dichotomous property; N_yes elements with it and N_no = n − N_yes elements without it. H₀: the sample is simple and random (it has been selected by applying simple random sampling). Statistic: let R be the number of runs; then

T₀(X) = R,   if N_yes < 20 and N_no < 20, using the specific table

or, for N_yes ≥ 20 and N_no ≥ 20,

T̃₀(X) = (T₀(X) − μ)/√σ² →d N(0,1),   with   μ = 2n₁n₂/(n₁+n₂) + 1   and   σ² = 2n₁n₂(2n₁n₂−n₁−n₂)/[(n₁+n₂)²(n₁+n₂−1)]

and using the table of the standard normal distribution.


Signs Test (of Position) — Data: X₁, ..., X_n; 1 model F₀; one position measure Q (e.g. the median). H₀: the population measure Q takes the value q₀. Statistic:

T₀(X) = Number{X_i − q₀ > 0},   if n < 20, using the specific table or the table of the Binomial(n,p), where p depends on Q (e.g. 1/2 for the median)

or, for n ≥ 20,

T̃₀(X) = (T₀(X) − μ)/√σ² →d N(0,1),   with   μ = np   and   σ² = np(1−p)

and using the table of the standard normal distribution.

Wilcoxon Signed-Rank Test (of Position) — Data: X₁, ..., X_n; 1 model F₀; one position measure Q (e.g. the median). H₀: the population measure Q takes the value q₀. Statistic:

T₀(X) = ∑_{{X_i−q₀>0}} R_i,   if n < 20, where R_i are the positions in the increasing sequence of |X_i − q₀|, using the specific table

or, for n ≥ 20,

T̃₀(X) = (T₀(X) − μ)/√σ² →d N(0,1),   with   μ = n(n+1)/4   and   σ² = n(n+1)(2n+1)/24

and using the table of the standard normal distribution.

Remark 7s: In the statistics, the parameter of interest is the unknown for confidence intervals, while it is supposed to be known for hypothesis tests.

Remark 8s: Usually the estimators involved in the statistic T (like s, S, ...) and the quantiles (like a, ...) also depend on the sample size n, although the notation is simplified.

Remark 9s: For big sample sizes, when the Central Limit Theorem can be applied to T or its standardization, quantiles or probabilities that are not tabulated can be approximated: given a, p is directly calculated, and given p, a is calculated from the quantile z of the standard normal distribution:

p = P(T ≤ a) = P(Z ≤ (a−E(T))/√Var(T))        z = (a−E(T))/√Var(T)        a = E(T) + z√Var(T)

This is used in the asymptotic approximations proposed in the tests of the last table.

Remark 10s: To consider the approximations, sample sizes bigger than 20 have been proposed in the last table, although it is possible to find other cutoff values in the literature (like 8, 10 or 30); in practice, there is no severe change at any value.

Remark 11s: The goodness-of-fit chi-square test can also be used to test position measures, by considering two classes with probabilities (p, 1−p).

Remark 12s: To test the symmetry of a distribution, the position tests can be used.

Remark 13s: Although different types of test can be applied to evaluate the same hypotheses H₀ and H₁ with the same α (type I error), their quality is usually different, and β (type II error) should be taken into account. A global comparison can be done by using their power functions.
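R sketches of the two position tests for the median, with simulated data and the hypothetical value q0 = 0.

set.seed(5)
x = rnorm(25, mean = 0.3); q0 = 0
sum(x - q0 > 0)                                       # signs statistic T0
binom.test(sum(x - q0 > 0), n = length(x), p = 1/2)   # signs test via Binomial(n, 1/2)
wilcox.test(x, mu = q0)                               # Wilcoxon signed-rank test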

My notes:


Probability Tables

Standard Normal: p = P(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) e^{−z²/2} dz, for z ∈ (−∞, +∞) = ℝ. (Table taken from: Kokoska, S., and C. Nevison. Statistical Tables and Formulae. Springer-Verlag, 1989.)

t: p = P(X > x) = ∫_x^{+∞} f(x) dx, for x ∈ (−∞, +∞). (Table taken from: Newbold, P., W. Carlson and B. Thorne. Statistics for Business and Economics. Pearson-Prentice Hall.)

χ²: p = P(X > x) = ∫_x^{+∞} f(x) dx, for x ∈ [0, +∞). (Table taken from: Newbold, P., W. Carlson and B. Thorne. Statistics for Business and Economics. Pearson-Prentice Hall.)

F: p = P(X > x) = ∫_x^{+∞} f(x) dx, for x ∈ [0, +∞). (Table taken from: Newbold, P., W. Carlson and B. Thorne. Statistics for Business and Economics. Pearson-Prentice Hall.)
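Alternatively to the printed tables, the same probabilities and quantiles can be obtained in R through the p- and q- functions; the arguments below are illustrative.

qnorm(0.975)                             # z such that P(Z <= z) = 0.975
pt(2.086, df = 20, lower.tail = FALSE)   # upper-tail probability of the t distribution
qchisq(0.95, df = 10)                    # chi-square quantile with upper tail 0.05
qf(0.95, df1 = 5, df2 = 10)              # F quantile with upper tail 0.05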

My notes:


Index

(These references include only the most important concepts involved in each exercise.)

algebra, 4m
analysis: complex, 3pt; real, 1m, 2m, 3m, 4m
analysis of variance, 1ht-av
ANOVA → analysis of variance
asymptoticness, 3pe-p, 1ci-m, 2ci-m, 3ci-m, 4ci-m, 2ci (see also 'consistency')
basic estimators, 12pe-p, 13pe-p (see also 'sample mean', 'sample variance', 'sample quasivariance', 'sample proportion')
bind, 1m
bound, 5pe-p, 1m (see also Cramér-Rao's lower bound)
Bernoulli distribution, 1pe-m, 3pe-p, 12pe-p, 14pe-p, 3ci-m, 4ci-m, 6ht-T, 1ht-Λ, 1ht, 3pe-ci-ht, 3pt (see also 'binomial distribution')
binomial distribution, 1pe-m, 1pt, 3pt
characteristic function, 3pt
Chebyshev's inequality, 1ci-s, 1ci, 2ci, 3ci, 4ci
chi-square distribution, 7pe-p, 1pt
chi-square tests: goodness-of-fit, 2ht-np, 3ht-np, 1ht; homogeneity, 3ht-np; independence, 1ht-np, 3ht-np
cook → statistical cook
critical region, 1ht-T, 2ht-T, 3ht-T, 4ht-T, 5ht-T, 6ht-T, 1ht-Λ, 1ht-av, 1ht-np, 2ht-np, 3ht-np, 1ht, 3pe-ci-ht, 4pe-ci-ht
critical values → critical region
completion, 2pe-p, 4pe-p, 5pe-p; standardization, 1pe-p, 3pe-p, 4pe-p, 2pt
complex analysis, 3pt
confidence intervals, 1ci-m, 2ci-m, 3ci-m, 4ci-m, 1ci-s, 1ci, 2ci, 3ci, 4ci, 1pe-ci-ht, 2pe-ci-ht, 3pe-ci-ht
consistency, 6pe-p, 7pe-p, 9pe-p, 10pe-p, 12pe-p, 13pe-p, 14pe-p, 1pe, 2pe, 3pe
convergence → rate of convergence
coordinates: rectangular, 4m; polar, 1m, 3m
Cramér-Rao's lower bound, 9pe-p
density function → probability function (see the continuous probability distributions)
differential equation, 3pt
efficiency, 9pe-p, 10pe-p, 3pe (see also 'relative efficiency')
exponential distribution, 3pe, 1ht-Λ, 3pt; two-parameter (or translated), 6pe-m
exponential function, 1m
factorization theorem, 11pe-p, 3pe
F distribution, 1pt
frontier, 4m
Fubini's theorem, 1m
generating functions → probability generating function, moment generating function, characteristic function
geometric distribution, 2pe-m, 11pe-p, 3pt
geometry, 4m
goodness-of-fit → chi-square tests
homogeneity → chi-square tests
hypothesis tests, 1ht-T, 2ht-T, 3ht-T, 4ht-T, 5ht-T, 6ht-T, 1ht-Λ, 1ht-av, 1ht-np, 2ht-np, 3ht-np, 1ht, 3pe-ci-ht, 4pe-ci-ht
independence → chi-square tests
indeterminate form, 2m, 3m
inference theory, 1it-spd
integral equation, 3pt
integral: improper, 3pt, 1m; multiple, 1m
integration: directly, 5pe-m, 7pe-m, 3pt; by parts, 6pe-m, 3pt; by substitution, 3pt, 1m
joint distribution, 1it-spd
likelihood function, 11pe-p, 3pe
likelihood ratio tests, 1ht-Λ, 4pe-ci-ht
limits, 2m, 3m
linear algebra, 4m
margin of error, 1ci-m, 2ci-m, 1ci-s, 1ci, 2ci, 3ci, 4ci
mass function → probability function (see the discrete probability distributions)
maximum likelihood method, 1pe-m, 2pe-m, 3pe-m, 4pe-m, 5pe-m, 6pe-m, 7pe-m, 1pe, 2pe, 3pe, 4pe-ci-ht
mean square error, 6pe-p, 7pe-p, 8pe-p, 9pe-p, 12pe-p, 13pe-p, 14pe-p, 1pe, 2pe
method of the moments, 1pe-m, 2pe-m, 3pe-m, 4pe-m, 5pe-m, 6pe-m, 7pe-m, 1pe, 2pe, 3pe, 4pe-ci-ht
method of the pivot, 1ci-m, 2ci-m, 3ci-m, 4ci-m, 1ci-s, 1ci, 2ci, 3ci, 4ci, 1pe-ci-ht, 2pe-ci-ht, 3pe-ci-ht
minimum sample size, 1ci-s, 1ci, 2ci, 3ci, 4ci
moment generating function, 3pt
moment (see 'population moment' and 'sample moment')
movement, 4m
Neyman-Pearson's lemma, 1ht-Λ, 4pe-ci-ht
normal distribution, 4pe-m, 1pe-p, 2pe-p, 4pe-p, 5pe-p, 14pe-p, 1ci-m, 2ci-m, 1ci-s, 1ci, 2ci, 3ci, 4ci, 1ht-T, 2ht-T, 3ht-T, 4ht-T, 5ht-T, 1ht-Λ, 1ht-av, 1pe-ci-ht, 2pe-ci-ht, 1pt, 2pt, 3pt
normality, 12pe-p, 13pe-p
point estimations, 1pe-m, 2pe-m, 3pe-m, 4pe-m, 5pe-m, 6pe-m, 1pe-p, 2pe-p, 3pe-p, 4pe-p, 5pe-p, 6pe-p, 7pe-p, 8pe-p, 9pe-p, 10pe-p, 11pe-p, 12pe-p, 13pe-p, 14pe-p, 1pe, 2pe, 3pe, 1pe-ci-ht, 2pe-ci-ht, 4pe-ci-ht
Poisson distribution, 3pe-m, 1ht-Λ, 1pt, 3pt
polar coordinates, 1m, 3m, 12pe-p
pooled sample proportion → sample proportion
pooled sample variance → sample variance
population mean, 12pe-p, 1ht-T
population moment, raw or crude, 3pt
population proportion, 12pe-p, 6ht-T, 3pe-ci-ht
population standard deviation → population variance
population variance, 12pe-p, 13pe-p, 2ht-T, 3ht-T, 4ht-T, 5ht-T
position signs test, 1ht
power function, 1ht-T, 2ht-T, 3ht-T, 4ht-T, 5ht-T, 6ht-T, 1ht, 3pe-ci-ht
probability, 1pe-p, 2pe-p, 3pe-p, 4pe-p, 5pe-p, 2pe-ci-ht, 1pt, 2pt, 3pt
probability function, 1it-spd, 10pe-p, 1pt
probability generating function, 3pt
probability tables, 1pt
plug-in principle, 1pe-m, 2pe-m, 3pe-m, 5pe-m, 6pe-m, 7pe-m, 3pe, 4pe-ci-ht
p-value, 1ht-T, 2ht-T, 3ht-T, 4ht-T, 5ht-T, 6ht-T, 1ht-av, 1ht-np, 2ht-np, 3ht-np, 1ht, 3pe-ci-ht
quantile, 4pe-p, 1pt
Rayleigh distribution, 2pe
rate of convergence, 6pe-p, 12pe-p, 13pe-p, 14pe-p
relative efficiency, 8pe-p (see also 'efficiency')
rotation, 4m
sample mean, 1it-spd, 1pe-p, 4pe-p, 9pe-p, 10pe-p, 3pe, 2pt; trimmed, 6pe-p
sample moment (see 'method of the moments')
sample proportion, 3pe-p; pooled, 14pe-p, 4ci-m
sample quasivariance, 2pe-p, 4pe-p
sample variance: pooled, 14pe-p, 1pe-ci-ht, 2pe-ci-ht
sample size, minimum → minimum sample size
sample standard deviation → sample variance
sampling distribution, 1it-spd
sequence, 2m, 3m, 12pe-p, 13pe-p, 14pe-p (see 'rate of convergence')
series, 3pt
statistical cook, 4ht-T
standard power function density, 4pe-ci-ht
sufficiency, 11pe-p, 3pe
table of frequencies, 1ht-np, 2ht-np, 3ht-np, 1ht
t distribution, 1pe-ci-ht, 1pt
total sum, 5pe-p, 2pt
type I error, 1ht-T, 2ht-T, 3ht-T, 4ht-T, 5ht-T, 6ht-T, 1ht-av, 1ht-np, 2ht-np, 3ht-np, 1ht, 3pe-ci-ht
type II error, 1ht-T, 2ht-T, 3ht-T, 4ht-T, 5ht-T, 6ht-T, 1ht-av, 1ht, 3pe-ci-ht
unbiasedness, 10pe-p (see also 'consistency')
uniform distribution: continuous, 5pe-m, 10pe-p, 1pt; discrete, 1pt

My notes:
