Concentration Inequalities from Likelihood Ratio Method

arXiv:1409.6276v1 [math.ST] 1 Sep 2014

Xinjia Chen∗

September 2014

Abstract

We explore the applications of our previously established likelihood-ratio method for deriving concentration inequalities for a wide variety of univariate and multivariate distributions. New concentration inequalities for various distributions are developed without the idea of minimizing moment generating functions.

Contents

1 Introduction
2 Likelihood Ratio Method
   2.1 General Principle
   2.2 Construction of Parameterized Distributions
       2.2.1 Weight Function
       2.2.2 Parameter Restriction
3 Concentration Inequalities for Univariate Distributions
   3.1 Beta Distribution
   3.2 Beta Negative Binomial Distribution
   3.3 Beta-Prime Distribution
   3.4 Borel Distribution
   3.5 Consul Distribution
   3.6 Geeta Distribution
   3.7 Gumbel Distribution
   3.8 Inverse Gamma Distribution
   3.9 Inverse Gaussian Distribution
   3.10 Lagrangian Logarithmic Distribution
   3.11 Lagrangian Negative Binomial Distribution
   3.12 Laplace Distribution
∗ The author is affiliated with the Department of Electrical Engineering and Computer Science at Louisiana State University, Baton Rouge, LA 70803, USA, and the Department of Electrical Engineering, Southern University and A&M College, Baton Rouge, LA 70813, USA; Email: [email protected]


   3.13 Logarithmic Distribution
   3.14 Lognormal Distribution
   3.15 Nakagami Distribution
   3.16 Pareto Distribution
   3.17 Power-Law Distribution
   3.18 Stirling Distribution
   3.19 Snedecor’s F-Distribution
   3.20 Student’s t-Distribution
   3.21 Truncated Exponential Distribution
   3.22 Uniform Distribution
   3.23 Weibull Distribution
4 Concentration Inequalities for Multivariate Distributions
   4.1 Dirichlet-Compound Multinomial Distribution
   4.2 Inverse Matrix Gamma Distribution
   4.3 Multivariate Normal Distribution
   4.4 Multivariate Pareto Distribution
5 Conclusion
A Proofs of Univariate Inequalities
   A.1 Proof of Theorem 2
   A.2 Proof of Theorem 3
   A.3 Proof of Theorem 4
   A.4 Proof of Theorem 5
   A.5 Proof of Theorem 6
   A.6 Proof of Theorem 7
   A.7 Proof of Theorem 8
   A.8 Proof of Theorem 9
   A.9 Proof of Theorem 10
   A.10 Proof of Theorem 11
   A.11 Proof of Theorem 12
   A.12 Proof of Theorem 13
   A.13 Proof of Theorem 14
   A.14 Proof of Theorem 15
   A.15 Proof of Theorem 16
   A.16 Proof of Theorem 17
   A.17 Proof of Theorem 18
   A.18 Proof of Theorem 19
   A.19 Proof of Theorem 20
   A.20 Proof of Theorem 21
   A.21 Proof of Theorem 22
   A.22 Proof of Theorem 23
   A.23 Proof of Theorem 24
   A.24 Proof of Theorem 25
B Proofs of Multivariate Inequalities
   B.1 Proof of Theorem 26
   B.2 Proof of Theorem 27
   B.3 Proof of Theorem 28
   B.4 Proof of Theorem 29

1 Introduction

Bounds for probabilities of random events play important roles in many areas of engineering and science. Formally, let E be an event defined in a probability space (Ω, Pr, F), where Ω is the sample space, Pr denotes the probability measure, and F is the σ-algebra. A frequent problem is to obtain simple bounds, as tight as possible, for Pr{E}. In general, the event E can be expressed in terms of a matrix-valued random variable X; in particular, X can be a random vector or scalar. Clearly, the event E can be represented as {X ∈ E}, where E is a certain set of deterministic matrices.

In probability theory, a conventional approach for deriving inequalities for Pr{E} is to bound the indicator function I_{X ∈ E} by a family of random variables having finite expectation and to minimize the expectation. The central idea of this approach is to seek a family of bounding functions w(X, ϑ) of X, parameterized by ϑ ∈ Θ, such that

I_{X ∈ E} ≤ w(X, ϑ) for all ϑ ∈ Θ.   (1)

Here, the meaning of inequality (1) is that I_{X(ω) ∈ E} ≤ w(X(ω), ϑ) holds for every ω ∈ Ω. As a consequence of the monotonicity of the mathematical expectation E[·],

Pr{E} = E[I_{X ∈ E}] ≤ E[w(X, ϑ)] for all ϑ ∈ Θ.   (2)

Minimizing the upper bound in (2) with respect to ϑ ∈ Θ yields

Pr{E} ≤ inf_{ϑ ∈ Θ} E[w(X, ϑ)].   (3)

Classical concentration inequalities such as the Chebyshev inequality and Chernoff bounds [2] can be derived by this approach with various bounding functions w(X, ϑ), where X is a scalar random variable. We call this technique of deriving probabilistic inequalities the mathematical expectation (ME) method, in view of the crucial role played by the mathematical expectation of the bounding functions. For the ME method to succeed, the mathematical expectation E[w(X, ϑ)] of the family of bounding functions w(X, ϑ), ϑ ∈ Θ, must be convenient to evaluate and minimize.

The ME method is a very general approach. However, it has two drawbacks. First, in some situations, the mathematical expectation E[w(X, ϑ)] may be intractable. Second, the ME method may not fully exploit the information of the underlying distribution, since the mathematical expectation is only a summary quantity of the distribution.

Recently, we have proposed in [3, 4, 5] a more general approach for deriving probabilistic inequalities, aiming at overcoming the drawbacks of the ME method. Let f(·) denote the probability density function (pdf) or probability mass function (pmf) of X. The primary idea of the proposed approach is to seek a family of pdfs or pmfs g(·, ϑ), parameterized by ϑ ∈ Θ, and a deterministic function Λ(ϑ) of ϑ ∈ Θ such that for all ϑ ∈ Θ, the indicator function I_{X ∈ E} is bounded from above by the product of Λ(ϑ) and the likelihood ratio g(X, ϑ)/f(X). Then, the probability Pr{X ∈ E} is bounded from above by the infimum of Λ(ϑ) with respect to ϑ ∈ Θ. Due to the central role played by the likelihood ratio, this technique of deriving probabilistic inequalities is referred to as the likelihood ratio (LR) method. It has been demonstrated in [4] that the ME method is actually a special technique of the LR method.
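For illustration (this example is ours, not from the paper), the following minimal Python sketch carries out the minimization in (3) numerically for the Chernoff bounding function w(X, ϑ) = e^{ϑ(X−z)} and a standard normal X; the function name and the grid are hypothetical choices.

```python
# A minimal numerical sketch of the ME method (3) with the Chernoff
# bounding function w(X, theta) = exp(theta (X - z)), for X standard
# normal, so that E[w(X, theta)] = exp(theta^2/2 - theta z).  The grid
# search stands in for the analytic minimization.
import numpy as np

def chernoff_bound(z, thetas=np.linspace(1e-3, 10.0, 10_000)):
    # Bound (3): Pr{X >= z} <= inf_theta exp(theta^2/2 - theta z).
    return np.min(np.exp(thetas**2 / 2 - thetas * z))

z = 2.0
print(chernoff_bound(z))   # ~0.1353
print(np.exp(-z**2 / 2))   # analytic infimum, attained at theta = z
```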

In this paper, we shall apply the LR method to investigate the concentration phenomenon of random variables. Our goal is to derive simple and tight concentration inequalities for various distributions. The remainder of the paper is organized as follows. In Section 2, we introduce the fundamentals of the LR method. In Section 3, we apply the LR method to the development of concentration inequalities for univariate distributions. In Section 4, we apply the LR method to establish concentration inequalities for multivariate distributions. Section 5 concludes the paper. Most proofs are given in the appendices.

Throughout this paper, we shall use the following notations. Let I_E denote the indicator function such that I_E = 1 if E is true and I_E = 0 otherwise. We use the notation (t choose k) to denote a generalized combinatoric number in the sense that

(t choose k) = [∏_{ℓ=1}^{k} (t − ℓ + 1)]/k! = Γ(t + 1)/[Γ(k + 1) Γ(t − k + 1)],   (t choose 0) = 1,

where t is a real number and k is a non-negative integer. We use X̄n to denote the average of random variables X1, · · · , Xn, that is, X̄n = (∑_{i=1}^{n} Xi)/n. The notation ⊤ denotes the transpose of a matrix. The trace of a matrix is denoted by tr. We use pdf and pmf to represent probability density function and probability mass function, respectively. The other notations will be made clear as we proceed.

2 Likelihood Ratio Method

In this section, we shall introduce the LR method for deriving probabilistic inequalities.

2.1 General Principle

Let E be an event which can be expressed in terms of a matrix-valued random variable X, where X is defined on the sample space Ω and σ-algebra F such that the true probability measure is one of two measures Pr and Pϑ. Here, the measure Pr is determined by the pdf or pmf f(·). The measure Pϑ is determined by the pdf or pmf g(·, ϑ), which is parameterized by ϑ ∈ Θ. The subscript in Pϑ is used to indicate the dependence on the parameter ϑ. Clearly, there exists a set, E, of deterministic matrices of the same size as X such that E = {X ∈ E}. The LR method for obtaining an upper bound for the probability Pr{E} is based on the following general result.

Theorem 1 Assume that there exists a function Λ(ϑ) of ϑ ∈ Θ such that

f(X) I_{X ∈ E} ≤ Λ(ϑ) g(X, ϑ) for all ϑ ∈ Θ.   (4)

Then,

Pr{E} ≤ inf_{ϑ ∈ Θ} Λ(ϑ) Pϑ{E} ≤ inf_{ϑ ∈ Θ} Λ(ϑ).   (5)

In particular, if the infimum of Λ(ϑ) is attained at ϑ∗ ∈ Θ, then

Pr{E} ≤ Pϑ∗{E} Λ(ϑ∗).   (6)

The meaning of the inequality in (4) is that f(X(ω)) I_{X(ω) ∈ E} ≤ Λ(ϑ) g(X(ω), ϑ) for every ω ∈ Ω. The function Λ(ϑ) in (4) is referred to as the likelihood-ratio bounding function. Theorem 1 asserts that the probability of the event E is no greater than the infimum of the likelihood-ratio bounding function.


2.2 Construction of Parameterized Distributions

In the sequel, we shall introduce two approaches for constructing parameterized distributions g(·, ϑ), which are essential for the application of the LR method.

2.2.1 Weight Function

A natural approach to construct a parameterized distribution g(·, ϑ) is to modify the pdf or pmf f(·) by multiplying it with a parameterized function and performing a normalization. Specifically, let w(·, ϑ) be a non-negative function with parameter ϑ ∈ Θ such that E[w(X, ϑ)] < ∞ for all ϑ ∈ Θ, where the expectation is taken under the probability measure Pr determined by f(·). Define a family of distributions as

g(X, ϑ) = w(X, ϑ) f(X) / E[w(X, ϑ)]

for ϑ ∈ Θ and X in the range of X. In view of its role in the modification of f(·) into g(·, ϑ), the function w(·, ϑ) is called a weight function. Note that

f(X) w(X, ϑ) = E[w(X, ϑ)] g(X, ϑ) for all ϑ ∈ Θ.   (7)

For simplicity, we choose the weight function such that condition (1) is satisfied. Combining (1) and (7) yields

f(X) I_{X ∈ E} ≤ f(X) w(X, ϑ) = E[w(X, ϑ)] g(X, ϑ) for all ϑ ∈ Θ.

Thus, the likelihood-ratio bounding function can be taken as Λ(ϑ) = E[w(X, ϑ)] for ϑ ∈ Θ. It follows from Theorem 1 that

Pr{E} ≤ inf_{ϑ ∈ Θ} Λ(ϑ) Pϑ{E} ≤ inf_{ϑ ∈ Θ} Λ(ϑ).

Thus, we have demonstrated that the ME method is actually a special technique of the LR method. By constructing a family of parameterized distributions and making use of the LR method, we have obtained the following result.

Theorem 2 Let X be a random variable with moment generating function φ(·). Let X1, · · · , Xn be i.i.d. samples of X. Let C_BE be the absolute constant in the Berry-Esseen inequality. Then,

Pr{X̄n ≥ z} ≤ (1/2 + ∆) [e^{−zτ} φ(τ)]^n,

where

∆ = min{ 1/2, (C_BE/√n) [ (φ(τ)[φ′′′′(τ) − 4zφ′′′(τ)] + 3[φ′′(τ)]²) / [φ′′(τ) − z²φ(τ)]² − 3 ]^{3/4} },

with τ satisfying φ′(τ)/φ(τ) = z.

See Appendix A.1 for a proof. Note that ∆ → 0 as n → ∞. So, for large n, the above bound improves on the classical Chernoff bound by roughly a factor of 1/2.


2.2.2 Parameter Restriction

In many situations, the pdf or pmf f(·) of X comes from a family of distributions parameterized by θ. If so, then the parameterized distributions g(·, ϑ) can be taken as the members of this family with the parameter ϑ restricted to a subset of the parameter space. By appropriately choosing this subset, the deterministic function Λ(ϑ) may be readily obtained.

As an illustrative example, consider the normal distribution. A random variable X is said to have a normal distribution with mean µ and variance σ² if it possesses a probability density function

f_X(x) = (1/(√(2π)σ)) exp(−(x − µ)²/(2σ²)).

Let X1, · · · , Xn be i.i.d. samples of the random variable X. The following well-known inequalities hold true:

Pr{X̄n ≤ z} ≤ (1/2) exp(−n(z − µ)²/(2σ²)) for z ≤ µ,   (8)
Pr{X̄n ≥ z} ≤ (1/2) exp(−n(z − µ)²/(2σ²)) for z ≥ µ.   (9)

It should be noted that the factor 1/2 in these inequalities cannot be obtained by using conventional techniques of Chernoff bounds. By virtue of the LR method, we can provide an easy proof of inequalities (8) and (9). We proceed as follows. Let X = [X1, · · · , Xn] and x = [x1, · · · , xn]. The joint probability density function of X is

f_X(x) = (√(2π)σ)^{−n} exp(−∑_{i=1}^{n} (xi − µ)²/(2σ²)).

To apply the LR method to show (8), we construct a family of probability density functions

g_X(x, ϑ) = (√(2π)σ)^{−n} exp(−∑_{i=1}^{n} (xi − ϑ)²/(2σ²))

for ϑ ∈ (−∞, z] with z ≤ µ. It can be checked that

f_X(x)/g_X(x, ϑ) = [exp(−(2(ϑ − µ)x̄n + µ² − ϑ²)/(2σ²))]^n.

For any ϑ ∈ (−∞, z], we have ϑ ≤ z ≤ µ and thus

f_X(x)/g_X(x, ϑ) ≤ [exp(−(2(ϑ − µ)z + µ² − ϑ²)/(2σ²))]^n for x̄n ≤ z.

This implies that f_X(X) I_{X̄n ≤ z} ≤ Λ(ϑ) g_X(X, ϑ) for all ϑ ∈ (−∞, z], where

Λ(ϑ) = [exp(−(2(ϑ − µ)z + µ² − ϑ²)/(2σ²))]^n.

By differentiation, it can be readily shown that the infimum of Λ(ϑ) with respect to ϑ ∈ (−∞, z] is equal to exp(−n(z − µ)²/(2σ²)), which is attained at ϑ = z. By symmetry, it can be shown that P_z{X̄n ≤ z} = 1/2. Using these facts and invoking (6) of Theorem 1, we have Pr{X̄n ≤ z} ≤ P_z{X̄n ≤ z} Λ(z) for z ≤ µ. This implies that inequality (8) holds. In a similar manner, we can show inequality (9).
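As a quick sanity check of inequality (9) (this check is ours, not part of the paper), a short Monte Carlo experiment with the assumed values µ = 0 and σ = 1 confirms that the empirical tail stays below the bound.

```python
# Monte Carlo sanity check of (9) with mu = 0, sigma = 1: the empirical
# tail of the sample mean should stay below (1/2) exp(-n z^2 / 2).
import numpy as np

rng = np.random.default_rng(0)
n, z, trials = 20, 0.5, 200_000
means = rng.standard_normal((trials, n)).mean(axis=1)
print(np.mean(means >= z))            # ~0.013 empirically
print(0.5 * np.exp(-n * z**2 / 2))    # bound ~0.041
```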

3 Concentration Inequalities for Univariate Distributions

In this section, we shall apply the LR method to derive bounds for tail probabilities for univariate distributions. Such bounds are referred to as concentration inequalities.

3.1 Beta Distribution

A random variable X is said to have a beta distribution if it possesses a probability density function

f(x) = x^{α−1}(1 − x)^{β−1}/B(α, β),   0 < x < 1,   α > 0,   β > 0,

where B(α, β) = Γ(α)Γ(β)/Γ(α + β). Let X1, · · · , Xn be i.i.d. samples of the random variable X. Making use of the LR method, we have shown the following results.

Theorem 3 Let z ∈ (0, 1) and µ = E[X] = α/(α + β). Define α̂ = βz/(1 − z) and β̂ = α(1 − z)/z. Then,

Pr{X̄n ≤ z} ≤ [ (B(α̂, β)/B(α, β)) z^{α−α̂} ]^n for 0 < z ≤ µ,   (10)
Pr{X̄n ≥ z} ≤ [ (B(α, β̂)/B(α, β)) (1 − z)^{β−β̂} ]^n for µ ≤ z < 1.   (11)

Specially, if β = 1, then

Pr{X̄n ≤ z} ≤ [ e α z^α ln(1/z) ]^n for 0 < z < exp(−1/α).   (12)

See Appendix A.2 for a proof.
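As a numeric illustration of (10) (ours, with hypothetical parameter values α = 2, β = 3, so that µ = 0.4, and z = 0.2), the sketch below compares the bound with a simulated tail probability; SciPy is assumed to be available for the beta function.

```python
# Numeric sanity check of (10) with illustrative values.
import numpy as np
from scipy.special import beta as B

a, b, n, z = 2.0, 3.0, 10, 0.2
a_hat = b * z / (1 - z)                      # tilted first parameter
bound = (B(a_hat, b) / B(a, b) * z**(a - a_hat))**n

rng = np.random.default_rng(1)
means = rng.beta(a, b, size=(200_000, n)).mean(axis=1)
print(np.mean(means <= z), bound)            # empirical tail vs. LR bound
```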

3.2 Beta Negative Binomial Distribution

A random variable X is said to have a beta negative binomial distribution if it possesses a probability mass function

f(x) = Pr{X = x} = (n + x − 1 choose x) Γ(α + n)Γ(β + x)Γ(α + β) / [Γ(α + β + n + x)Γ(α)Γ(β)],   x = 0, 1, 2, · · ·

where α > 1, β > 0 and n > 1. By virtue of the LR method, we have obtained the following results.

Theorem 4 Let z be a nonnegative integer no greater than nβ/(α − 1). Then,

Pr{X ≤ z} ≤ [Γ((αz − z)/n) Γ(β + z) Γ(α + (αz − z)/n + n + z)] / [Γ(β) Γ((αz − z)/n + z) Γ(α + β + n + z)].

See Appendix A.3 for a proof.

3.3 Beta-Prime Distribution

A random variable X is said to have a beta-prime distribution if it possesses a probability density function

f(x) = x^{α−1}(1 + x)^{−α−β}/B(α, β),   x > 0,   α > 0,   β > 0.

Let X1, · · · , Xn be i.i.d. samples of the random variable X. Making use of the LR method, we have obtained the following results.

Theorem 5 Assume that β > 1 and 0 < z ≤ α/(β − 1). Then,

Pr{X̄n ≤ z} ≤ [ (B(βz − z, β)/B(α, β)) (z/(1 + z))^{α+z−βz} ]^n,   (13)
Pr{X̄n ≤ z} ≤ [ (B(α, 1 + α/z)/B(α, β)) (1 + z)^{1+α/z−β} ]^n.   (14)

See Appendix A.4 for a proof.

3.4 Borel Distribution

A random variable X is said to possess a Borel distribution if it has a probability mass function

f(x) = Pr{X = x} = (θx)^{x−1} e^{−θx}/x!,   x = 1, 2, · · · ,

where 0 < θ < 1. Let X1, · · · , Xn be i.i.d. samples of the random variable X. Making use of the LR method, we have obtained the following result.

Theorem 6

Pr{X̄n ≤ z} ≤ [ (eθz/(z − 1))^{z−1} e^{−θz} ]^n for 1 < z ≤ 1/(1 − θ).   (15)

See Appendix A.5 for a proof.
3.8 Inverse Gamma Distribution

A random variable X is said to have an inverse gamma distribution if it possesses a probability density function

f(x) = (β^α/Γ(α)) x^{−α−1} exp(−β/x),   x > 0,   α > 0,   β > 0.

Let X1, · · · , Xn be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained the following results.

Theorem 10

Pr{X̄n ≤ z} ≤ [ (Γ(β/z + 1)/Γ(α)) (z/β)^{β/z−α+1} ]^n for 0 < z ≤ β/(α − 1),   (19)
Pr{X̄n ≤ z} ≤ [ (β/(αz))^α exp((αz − β)/z) ]^n for 0 < z ≤ β/α.   (20)

See Appendix A.9 for a proof.

3.9 Inverse Gaussian Distribution

A random variable X is said to have an inverse Gaussian distribution if it possesses a probability density function

f(x) = (λ/(2πx³))^{1/2} exp(−λ(x − θ)²/(2θ²x)),   x > 0,

where λ > 0 and θ > 0. Let X1, · · · , Xn be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained the following result.

Theorem 11

Pr{X̄n ≤ z} ≤ [ exp(λ/θ − λ/(2z) − λz/(2θ²)) ]^n for 0 < z ≤ θ.   (21)

See Appendix A.10 for a proof.
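A short Monte Carlo check of (21) (ours, with hypothetical values θ = 1 and λ = 2) is sketched below; NumPy's wald sampler uses the same (mean, scale) parameterization as the density above.

```python
# Monte Carlo sanity check of (21) with illustrative values.
import numpy as np

rng = np.random.default_rng(2)
theta, lam, n, z = 1.0, 2.0, 10, 0.6
means = rng.wald(theta, lam, size=(200_000, n)).mean(axis=1)
bound = np.exp(n * (lam / theta - lam / (2 * z) - lam * z / (2 * theta**2)))
print(np.mean(means <= z), bound)
```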

3.10 Lagrangian Logarithmic Distribution

A random variable X is said to have a Lagrangian logarithmic distribution if it possesses a probability mass function

f(x) = Pr{X = x} = −θ^x (1 − θ)^{x(β−1)} Γ(βx) / [Γ(x + 1) Γ(βx − x + 1) ln(1 − θ)],   x = 1, 2, · · ·

where 0 < θ ≤ θβ < 1. Let X1, · · · , Xn be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained the following result.

Theorem 12

Pr{X̄n ≤ z} ≤ [ (θ/ϑ)^z ((1 − θ)/(1 − ϑ))^{z(β−1)} (ln(1 − ϑ)/ln(1 − θ)) ]^n for 0 < z ≤ θ/((βθ − 1) ln(1 − θ)),   (22)

where ϑ satisfies the equation z = ϑ/((βϑ − 1) ln(1 − ϑ)).

See Appendix A.11 for a proof.

Lagrangian Negative Binomial Distribution

A random variable X is said to have a Lagrangian logarithmic distribution if it possesses a probability mass function   αx + β x β θ (1 − θ)β+αx−x , x = 0, 1, 2, · · · f (x, θ) = Pr{X = x} = αx + β x where 0 < θ < 1, θ < αθ < 1 and β > 0. Let X1 , · · · , Xn be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained the following result. Theorem 13

where ϑ =

z β+αz .

"   β+αz−z #n z θ 1−θ Pr{X n ≤ z} ≤ ϑ 1−ϑ

See Appendix A.12 for a proof. 10

for 0 ≤ z ≤

βθ , 1 − αθ

(23)

3.12

Laplace Distribution

A random variable X is said to have a Lagrangian logarithmic distribution if it possesses a probability density function   |x − α| 1 , −∞ < x < ∞, exp − f (x) = 2β β where −∞ < α < ∞ and β > 0. Let X1 , · · · , Xn be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained the following results. Theorem 14 

n  z−α z−α Pr{X n ≥ z} ≤ exp 1 − β β n   α−z α−z exp 1 − Pr{X n ≤ z} ≤ β β

for z ≥ α + β,

(24)

for z ≤ α − β.

(25)

See Appendix A.13 for a proof.

3.13

Logarithmic Distribution

A random variable X is said to have a logarithmic distribution if it possesses a probability mass function f (x) =

qx , −x ln p

x = 1, 2, · · ·

where p ∈ (0, 1) and q = 1 − p. Let X1 , · · · , Xn be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained the following result. Theorem 15



ln(1 − q)  q z Pr{X n ≤ z} ≤ ln(1 − ϑ) ϑ

where ϑ ∈ (0, q] is the unique number such that z =

n

for 0 < z ≤

ϑ (1−ϑ) ln

1 1−ϑ

q 1 , (1 − q) ln 1−q

(26)

.

See Appendix A.14 for a proof.
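Since ϑ in Theorem 15 is defined implicitly, the following sketch (ours, with hypothetical values q = 0.6, n = 10, z = 1.2) computes it by bisection; the map h below is increasing on (0, 1), so the root is unique.

```python
# Computing the implicit parameter of Theorem 15 by bisection, then
# evaluating the bound (26); illustrative values only.
import numpy as np

def h(t):
    return t / ((1 - t) * np.log(1 / (1 - t)))

def solve_theta(z, q, iters=200):
    lo, hi = 1e-12, q
    for _ in range(iters):          # plain bisection
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h(mid) < z else (lo, mid)
    return (lo + hi) / 2

q, n, z = 0.6, 10, 1.2
theta = solve_theta(z, q)
bound = (np.log(1 - theta) / np.log(1 - q) * (q / theta)**z)**n
print(theta, bound)                 # theta ~0.30, bound well below 1
```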

3.14 Lognormal Distribution

A random variable X is said to have a lognormal distribution if it possesses a probability density function

f(x) = (1/(x√(2π)σ)) exp(−(ln x − µ)²/(2σ²)),   x > 0,   −∞ < µ < ∞,   σ > 0.

Let X1, · · · , Xn be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained the following result.

Theorem 16

Pr{X̄n ≤ z} ≤ exp(−(n/2) ((µ − ln z)/σ)²) for 0 < z ≤ e^µ.   (27)

See Appendix A.15 for a proof.
3.15

Nakagami Distribution

A random variable X is said to have a Nakagami distribution if it possesses a probability density function  2 x 2 x2m−1 exp − 2 , x>0 f (x) = 2m Γ(m) σ σ where m ≥ 12 and σ > 0. Let X1 , · · · , Xn be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained the following results. Theorem 17

  2(m−ϑ) )n  ( Γ(ϑ + 12 ) Γ(ϑ) Γ(ϑ + 21 ) Pr X n ≤ for 0 < ϑ ≤ m, σ ≤ Γ(ϑ) Γ(m) Γ(ϑ)  2 m  n √ z z2 exp m − for z ≥ mσ. Pr{X n ≥ z} ≤ mσ 2 σ2

(28) (29)

See Appendix A.16 for a proof.

3.16

Pareto Distribution

A random variable X is said to have a Pareto distribution if it possesses a probability density function θ  a θ+1 f (x) = , x > a > 0, θ > 1. a x Let X1 , · · · , Xn be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained the following result. Theorem 18

"

Pr{X n ≤ ρµ} ≤ eθ where µ = E[X] =



θ−1 ρθ



ln



ρθ θ−1

#n

1 for 1 − < ρ ≤ θ

    1 1 exp , 1− θ θ

(30)

θa θ−1 .

See Appendix A.17 for a proof.

3.17

Power-Law Distribution

A random variable X is said to have a power-law distribution if it possesses a probability density function f (x) =

x−α , C(α)

C(α) =

  1−β 1−α

where β > 1, α ∈ R and

α−1

ln β

1 ≤ x ≤ β, for α 6= 1, for α = 1

Let X1 , · · · , Xn be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained the following result. Theorem 19 Let θ ≥ α > 1 and z =

θ−1 β θ−1 −β θ−2 β θ−1 −1 .

 Pr X n ≤ z ≤

See Appendix A.18 for a proof.



Then, α − 1 1 − β 1−θ θ−α z θ − 1 1 − β 1−α

12

n

.

(31)

3.18

Stirling Distribution

A random variable is said to have a Stirling distribution if it possesses a probability mass function Pr{X = x} =

m!|s(x, m)|θx , x![− ln(1 − θ)]m

0 < θ < 1,

x = m, m + 1, · · · ,

where s(x, m) is the Stirling number of the first kind, with arguments x and m. Let X1 , · · · , Xn be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained the following result. Theorem 20  nm  nz  ln(1 − ϑ) θ Pr X n ≤ z ≤ ln(1 − θ) ϑ

where ϑ ∈ (0, θ] is the unique number such that z =

for z ≤

mθ , (θ − 1) ln(1 − θ)

(32)

mϑ (ϑ−1) ln(1−ϑ) .

See Appendix A.19 for a proof.

3.19

Snedecor’s F-Distribution

If random variable X has a probability density function of the form f (x) =

m m/2 (m−2)/2 x Γ( n+m 2 )( n ) , n m m Γ( 2 )Γ( 2 )(1 + n x)(n+m)/2

for

0 < x < ∞,

then the random variable X is said to possess an F -distribution with m and n degrees of freedom. Making use of the LR method, we have obtained the following results. Theorem 21 Pr{X ≥ z} ≤ z m/2 Pr{X ≤ z} ≤ z

m/2





n+m n + mz n+m n + mz

(n+m)/2 (n+m)/2

for z ≥ 1

(33)

for 0 < z ≤ 1.

(34)

See Appendix A.20 for a proof.

3.20

Student’s t-Distribution

If random variable X has a probability density function of the form f (x) = √

Γ( n+1 2 ) nπΓ( n2 )(1

+

x2 (n+1)/2 n )

,

for

− ∞ < x < ∞,

then the random variable X is said to possess a Student’s t-distribution with n degrees of freedom. By virtue of the LR method, we have obtained the following results. Theorem 22 Pr{|X| ≥ z} ≤ z Pr{|X| ≤ z} ≤ z





n+1 n + z2 n+1 n + z2

See Appendix A.21 for a proof. 13

(n+1)/2 (n+1)/2

for z ≥ 1,

(35)

for 0 < z ≤ 1.

(36)

3.21

Truncated Exponential Distribution

A random variable X is said to have a truncated exponential distribution if it possesses a probability density function θeθx f (x) = θ , θ 6= 0, 0 < x < 1. e −1 Let X1 , · · · , Xn be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained the following results. Theorem 23 

θ eϑ − 1 (θ−ϑ)z e Pr{X n ≤ z} ≤ ϑ eθ − 1

n

for 0 < z ≤ 1 +



1 1 1 − and z 6= , −1 θ 2

where ϑ ∈ (−∞, θ], ϑ 6= 0 satisfies equation z = 1 + eϑ1−1 − ϑ1 . Moreover,   θ/2 n  θe 1 ≤ for θ > 0. Pr X n ≤ 2 eθ − 1

(37)

(38)

See Appendix A.22 for a proof.

3.22

Uniform Distribution

Let X be a random variable uniformly distributed over interval [0, 1]. Let X1 , · · · , Xn be i.i.d. samples of the random variable X. By virtue of the LR method, we have obtained the following results. Theorem 24 Pr{X n ≥ z} ≤



eϑ − 1 ϑeϑz

n

2 !  1 ≤ exp −6n z − 2

where ϑ is a negative number such that z = 1 +

1 , 2

(39)

for 0 < z
z >

1 eϑ −1

− ϑ1 .

See Appendix A.23 for a proof.
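The following sketch (ours, with hypothetical values z = 0.7 and n = 20) evaluates the implicit bound in (39), its closed-form relaxation exp(−6n(z − 1/2)²), and, for comparison, Hoeffding's exp(−2n(z − 1/2)²); the comparison itself is ours, not the paper's.

```python
# Numeric illustration of Theorem 24 versus Hoeffding's inequality.
import numpy as np

def g(t):                      # z as a function of the tilting parameter
    return 1 + 1 / np.expm1(t) - 1 / t

z, n = 0.7, 20
lo, hi = 1e-9, 50.0
for _ in range(200):           # bisection; g is increasing for t > 0
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if g(mid) < z else (lo, mid)
theta = (lo + hi) / 2
lr = (np.expm1(theta) / (theta * np.exp(theta * z)))**n
print(lr, np.exp(-6 * n * (z - 0.5)**2), np.exp(-2 * n * (z - 0.5)**2))
```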

3.23 Weibull Distribution

A random variable X is said to have a Weibull distribution if it possesses a probability density function

f(x) = αβ x^{β−1} exp(−αx^β),   x > 0,   α > 0,   β > 0.

Let X1, · · · , Xn be i.i.d. samples of the random variable X. By virtue of the LR method, we have obtained the following results.

Theorem 25

Pr{X̄n ≤ z} ≤ [ αz^β exp(1 − αz^β) ]^n for αz^β ≤ 1 and β < 1,   (41)
Pr{X̄n ≥ z} ≤ [ αz^β exp(1 − αz^β) ]^n for αz^β ≥ 1 and β > 1.   (42)

See Appendix A.24 for a proof.

4 Concentration Inequalities for Multivariate Distributions

In this section, we shall apply the LR method to derive concentration inequalities for the joint distributions of multiple random variables.

4.1 Dirichlet-Compound Multinomial Distribution

Random variables X1, · · · , Xk are said to have a Dirichlet-compound multinomial distribution if they possess a probability mass function

f(x) = (n choose x) [Γ(∑_{ℓ=0}^{k} αℓ)/Γ(n + ∑_{ℓ=0}^{k} αℓ)] ∏_{ℓ=0}^{k} [Γ(xℓ + αℓ)/Γ(αℓ)],

where x = [x0, x1, · · · , xk]⊤, (n choose x) = n!/∏_{ℓ=0}^{k} xℓ! and ∑_{ℓ=0}^{k} xℓ = n, with xℓ ≥ 0 and αℓ > 0 for ℓ = 0, 1, · · · , k. Based on the LR method, we have obtained the following result.

Theorem 26 Assume that 0 < zℓ ≤ nαℓ/∑_{i=0}^{k} αi for ℓ = 1, · · · , k. Then,

Pr{Xℓ ≤ zℓ, ℓ = 1, · · · , k} ≤ [Γ(∑_{ℓ=0}^{k} αℓ) Γ(n + ∑_{ℓ=0}^{k} θℓ) / (Γ(∑_{ℓ=0}^{k} θℓ) Γ(n + ∑_{ℓ=0}^{k} αℓ))] ∏_{ℓ=1}^{k} [Γ(zℓ + αℓ) Γ(θℓ) / (Γ(zℓ + θℓ) Γ(αℓ))],   (43)

where θ0 = α0 and

θℓ = α0 zℓ / (n − ∑_{i=1}^{k} zi),   ℓ = 1, · · · , k.

See Appendix B.1 for a proof.

4.2 Inverse Matrix Gamma Distribution

A positive-definite random matrix X is said to have an inverse matrix gamma distribution [9] if it possesses a probability density function

f(x) = [|Ψ|^α / (β^{pα} Γ_p(α))] |x|^{−α−(p+1)/2} exp(−(1/β) tr(Ψ x^{−1})),

where β > 0 is the scale parameter and Ψ is a positive-definite real matrix of size p × p. Here x is a positive-definite matrix of size p × p, and Γ_p(·) is the multivariate gamma function. The inverse matrix gamma distribution reduces to the Wishart distribution with β = 2, α = n/2. Let ≼ denote the relationship of two matrices A and B of the same size such that A ≼ B implies that B − A is positive definite. By virtue of the LR method, we have obtained the following result.

Theorem 27

Pr{X ≼ ρΥ} ≤ (1/ρ^{pα}) exp(−(p/2)(1/ρ − 1)(2α − p − 1)) for 0 < ρ < 1,   (44)

where Υ = E[X] = 2Ψ/(β(2α − p − 1)) is the expectation of X.

See Appendix B.2 for a proof.

4.3 Multivariate Normal Distribution

A random vector X is said to have a multivariate normal distribution if it possesses a probability density function

f(x) = (2π)^{−k/2} |Σ|^{−1/2} exp(−(1/2)(x − µ)⊤Σ^{−1}(x − µ)),

where k is the dimension of X, x is a vector of k elements, µ is the expectation of X, and Σ is the covariance matrix of X. Let X1, · · · , Xn be i.i.d. samples of X. Define X̄n = (∑_{i=1}^{n} Xi)/n. Let ≽ denote the relationship of two vectors A = [a1, · · · , ak] and B = [b1, · · · , bk] such that A ≽ B implies aℓ ≥ bℓ, ℓ = 1, · · · , k. By virtue of the LR method, we have obtained the following result.

Theorem 28

Pr{X̄n ≽ z} ≤ exp(n [µ⊤Σ^{−1}z − (1/2)(z⊤Σ^{−1}z + µ⊤Σ^{−1}µ)])   (45)

provided that Σ^{−1}z ≽ Σ^{−1}µ.

See Appendix B.3 for a proof.
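The bound (45) is straightforward to evaluate numerically; the sketch below (ours, with hypothetical µ, Σ and z chosen so that the componentwise condition holds) does so for a bivariate example.

```python
# Evaluating bound (45) for an illustrative bivariate normal.
import numpy as np

mu = np.zeros(2)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
z = np.array([0.5, 0.5])
n = 10
Sinv = np.linalg.inv(Sigma)
assert np.all(Sinv @ z >= Sinv @ mu)          # condition of Theorem 28
expo = mu @ Sinv @ z - 0.5 * (z @ Sinv @ z + mu @ Sinv @ mu)
print(np.exp(n * expo))                        # ~0.146 here
```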

4.4 Multivariate Pareto Distribution

Random variables X1, · · · , Xk are said to have a multivariate Pareto distribution if they possess a probability density function

f(x1, · · · , xk) = [∏_{i=1}^{k} (α + i − 1)/βi] (1 − k + ∑_{i=1}^{k} xi/βi)^{−(α+k)},   xi > βi > 0,   α > 0.

Let X = [X1, · · · , Xk]⊤ and z = [z1, · · · , zk]⊤. Let X1, · · · , Xn be i.i.d. samples of the random vector X. Define X̄n = (∑_{i=1}^{n} Xi)/n. Let the notation "≼" denote the relationship of two vectors A = [a1, · · · , ak]⊤ and B = [b1, · · · , bk]⊤ such that A ≼ B means aℓ ≤ bℓ, ℓ = 1, · · · , k. By virtue of the LR method, we have the following results.

Theorem 29 Let zℓ > βℓ, ℓ = 1, · · · , k. The following statements hold true.

(I): The inequality

Pr{X̄n ≼ z} ≤ [ ∏_{i=1}^{k} ((α + i − 1)/(θ + i − 1)) (1 − k + ∑_{i=1}^{k} zi/βi)^{θ−α} ]^n   (46)

holds for any θ > α.

(II): The inequality (46) holds for θ such that

∑_{ℓ=0}^{k−1} 1/(θ + ℓ) = ln(1 − k + ∑_{i=1}^{k} zi/βi)   (47)

provided that

∑_{ℓ=0}^{k−1} 1/(α + ℓ) > ln(1 − k + ∑_{i=1}^{k} zi/βi).   (48)

(III): The inequality (46) holds for

θ = 1 + 1/((1/k)∑_{i=1}^{k} zi/βi − 1)   (49)

provided that α > 1 and (1/k)∑_{i=1}^{k} zi/βi < α/(α − 1).

See Appendix B.4 for a proof.


0. As a consequence of this By differentiation, it can be shown that ln 1+x fact, we have α−ϑ  n(α−ϑ) n  Y xi xn ≤ ∀ϑ ∈ (0, α], 1 + xi 1 + xn i=1

where xn =

Pn

i=1

xi

n

. Since

x 1+x

is an increasing function of x > 0, it follows that

n(α−ϑ) α−ϑ  n  Y xi z ≤ 1 + xi 1+z i=1

∀ϑ ∈ (0, α]

provided that 0 ≤ xn ≤ z.

Therefore, we have established that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) where

"

B(ϑ, β) Λ(ϑ) = B(α, β) Invoking Theorem 1, we have



 Pr X n ≤ z ≤

As a consequence of β > 1 and 0 < z ≤

α β−1 ,

z 1+z

∀ϑ ∈ (0, α],

α−ϑ #n

.

inf Λ(ϑ).

ϑ∈(0,α]

we have 0 < z(β − 1) ≤ α. Hence,

"  α+z−βz #n  z B(βz − z, β) . Pr X n ≤ z ≤ Λ(βz − z) = B(α, β) 1+z

This proves (13). To apply the LR method to show (14), we construct a family of probability density functions Qn [xα−1 (1 + xi )−α−ϑ ] , ϑ ∈ [β, ∞). gX (x, ϑ) = i=1 i [B(α, ϑ)]n It can be seen that

n n  fX (x) B(α, ϑ) Y (1 + xi )ϑ−β , = gX (x, ϑ) B(α, β) i=1 21

ϑ ∈ [β, ∞).

By differentiation, it can be shown that ln(1 + x) is a concave function of x > 0. As a consequence of this fact, we have n Y (1 + xi )ϑ−β ≤ (1 + xn )n(ϑ−β) ≤ (1 + z)n(ϑ−β) , ϑ ∈ [β, ∞) i=1

provided that 0 ≤ xn ≤ z. Hence, we have that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ [β, ∞), where n  B(α, ϑ) ϑ−β (1 + z) . Λ(ϑ) = B(α, β)  α , we Making use of Theorem 1, we have Pr X n ≤ z ≤ inf ϑ≥β Λ(ϑ). As a consequence of 0 < z ≤ β−1 α have 1 + z ≥ β. Hence, n   B(α, 1 + αz ) α −β 1+ α z Pr X n ≤ z ≤ Λ 1 + = (1 + z) z B(α, β) 



for 0 < z ≤

α . β−1

This proves (14). The proof of the theorem is thus completed.

A.5

Proof of Theorem 6

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability mass function of X is fX (x) =

n Y (θxi )xi −1 e−θxi . xi ! i=1

To apply the LR method to show (15), we construct a family of probability mass functions gX (x, ϑ) =

n Y (ϑxi )xi −1 e−ϑxi

i=1

xi !

,

ϑ ∈ (0, θ].

It can be seen that fX (x) = gX (x, ϑ) where xn =

Pn

i=1

n

xi

"  #n   n x −1 θ n ϑ exp ((ln θ − θ − ln ϑ + ϑ) xn ) , exp ((ϑ − θ)xn ) = ϑ θ

. Noting that ln x − x is increasing with respect to x ∈ (0, 1), we have that ln θ − θ − ln ϑ + ϑ ≥ 0

as a consequence of 0 < ϑ ≤ θ. It follows that n     n ϑ ϑ exp ((ln θ − θ − ln ϑ + ϑ) xn ) ≤ exp ((ln θ − θ − ln ϑ + ϑ) z) θ θ provided that xn ≤ z. Hence, fX (x) ≤ Λ(ϑ) ∀ϑ ∈ (0, θ] provided that xn ≤ z, gX (x, ϑ) where Λ(ϑ) =

n   ϑ exp ((ln θ − θ − ln ϑ + ϑ) z) . θ

22

∀ϑ ∈ (0, θ]

Hence, we have that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ (0, θ]. By virtue of Theorem 1,  we have Pr X n ≤ z ≤ inf ϑ∈(0,θ] Λ(ϑ). By differentiation, it can be shown that the infimum of Λ(ϑ) with respective to ϑ ∈ (0, θ] is attained at ϑ = 1 − z1 . Therefore, #n   " z−1 1 eθz −θz = e Pr{X n ≤ z} ≤ Λ 1 − z 1−z

for 1 < z
≤ 1/(1 − θ). This completes the proof of the theorem.

A.9 Proof of Theorem 10

To show (20), we construct the family g_X(x, ϑ) with the scale parameter β replaced by ϑ ∈ (0, β]. Noting that ∑_{i=1}^{n} 1/x_i ≥ n/x̄n and that ϑ − β ≤ 0, we have that

f_X(x)/g_X(x, ϑ) ≤ [ (β/ϑ)^α exp((ϑ − β)/x̄n) ]^n.

It follows that

fX (x) ≤ Λ(ϑ) gX (x, ϑ)

where Λ(ϑ) =

∀ϑ ∈ (0, β] provided that xn ≤ z,  α n  β ϑ−β exp . ϑ z

This implies that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ (0, β]. By virtue of Theorem 1, we  have Pr X n ≤ z ≤ inf ϑ∈(0,β] Λ(ϑ). By differentiation, it can be shown that the infimum of Λ(ϑ) with β respect to ϑ ∈ (0, β] is attained at ϑ = αz as long as 0 < z ≤ α . Therefore,  Pr X n ≤ z ≤ Λ (αz) =



β αz



exp



αz − β z

n

for 0 < z ≤

β . α

This proves inequality (20). The proof of the theorem is thus completed.

A.10

Proof of Theorem 11

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability density function of X is fX (x) =

 1/2  n  Y λ λ(xi − θ)2 . exp − 2πx3i 2θ2 xi i=1

To apply the LR method to show (21), we construct a family of probability density functions  1/2  n  Y λ λ(xi − ϑ)2 gX (x, ϑ) = , exp − 2πx3i 2ϑ2 xi i=1 It can be verified that

where xn =

Pn

i=1

n

xi

fX (x) = gX (x, ϑ)

ϑ ∈ (0, θ].

 n    λ λ λ λ , xn − exp − + θ ϑ 2ϑ2 2θ2

. It follows that fX (x) ≤ Λ(ϑ) ∀ϑ ∈ (0, θ] provided that xn ≤ z, gX (x, ϑ)

where Λ(ϑ) =

   n  λ λ λ λ exp z − − + . θ ϑ 2ϑ2 2θ2

This implies that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ (0, θ]. By virtue of Theorem 1, we  have Pr X n ≤ z ≤ inf ϑ∈(0,θ] Λ(ϑ). By differentiation, it can be shown that the infimum of Λ(ϑ) with respect to ϑ ∈ (0, θ] is attained at ϑ = z as long as 0 < z ≤ θ. Therefore, n   λ λz λ for 0 < z ≤ θ. − − 2 Pr{X n ≤ z} ≤ Λ(z) = exp θ 2z 2θ This completes the proof of the theorem.

27

A.11

Proof of Theorem 12

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability mass function of X is fX (x) =

n Y

−θxi (1 − θ)xi (β−1) Γ(βxi ) . Γ(xi + 1)Γ(βxi − xi + 1) ln(1 − θ) i=1

To apply the LR method to show (22), we construct a family of probability mass functions gX (x, ϑ) =

n Y

−ϑxi (1 − ϑ)xi (β−1) Γ(βxi ) , Γ(xi + 1)Γ(βxi − xi + 1) ln(1 − ϑ) i=1

It can be seen that fX (x) = gX (x, ϑ) where xn =

Pn

i=1

n

xi

ϑ ∈ (0, θ].

#n "   (β−1)xn x ln(1 − ϑ) θ n 1−θ , ϑ 1−ϑ ln(1 − θ)

. Define function h(x) = ln x + (β − 1) ln(1 − x)

for x ∈ (0, 1). Then, we can write

n  ln(1 − ϑ) fX (x) . = exp ([h(θ) − h(ϑ)]xn ) gX (x, ϑ) ln(1 − θ)

Note that the first derivative of h(x) is h′ (x) = which is positive for x ∈ (0, β1 ). Hence,

1 − βx , x(1 − x)

fX (x) ≤ Λ(ϑ) ∀ϑ ∈ (0, θ] provided that xn ≤ z, gX (x, ϑ) where



ln(1 − ϑ) Λ(ϑ) = exp ([h(θ) − h(ϑ)]z) ln(1 − θ)

n

#n "   z(β−1) z 1−θ ln(1 − ϑ) θ = . ϑ 1−ϑ ln(1 − θ)

This implies that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ (0, θ]. By virtue of Theorem 1,  we have Pr X n ≤ z ≤ inf ϑ∈(0,θ] Λ(ϑ). By differentiation, it can be shown that, as long as 0 < z ≤ ϑ θ (βθ−1) ln(1−θ) , the infimum of Λ(ϑ) with respect to ϑ ∈ (0, θ] is attained at ϑ such that z = (βϑ−1) ln(1−ϑ) . Such number ϑ is unique because the first derivative of (βϑ−1)ϑln(1−ϑ) with respective to ϑ ∈ (0, β1 ) is equal to   1 − βϑ 1 , − ln(1 − ϑ) − ϑ [(1 − βϑ) ln(1 − ϑ)]2 1−ϑ which is no less than

(β − 1)ϑ2 > 0. (1 − ϑ)[(1 − βϑ) ln(1 − ϑ)]2

This completes the proof of the theorem.

28

A.12

Proof of Theorem 13

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability mass function of X is   n Y β αxi + β xi fX (x) = θ (1 − θ)β+αxi −xi . αx + β x i i i=1 To apply the LR method to show (23), we construct a family of probability mass functions   n Y αxi + β xi β ϑ (1 − ϑ)β+αxi −xi , ϑ ∈ (0, θ]. gX (x, ϑ) = αx + β x i i i=1 It can be seen that fX (x) = gX (x, ϑ) where xn =

Pn

i=1

n

xi

"   β+(α−1)xn #n x θ n 1−θ , ϑ 1−ϑ

. Define function h(x) = ln x + (α − 1) ln(1 − x)

for x ∈ (0, 1). Then, we can write

"  β #n 1−θ fX (x) . = exp ([h(θ) − h(ϑ)]xn ) gX (x, ϑ) 1−ϑ

Note that the first derivative of h(x) is

1 − αx , x(1 − x)

h′ (x) = which is positive for x ∈ (0, α1 ). Hence,

fX (x) ≤ Λ(ϑ) ∀ϑ ∈ (0, θ] provided that xn ≤ z, gX (x, ϑ) where

"

Λ(ϑ) = exp ([h(θ) − h(ϑ)]z)



1−θ 1−ϑ

β #n

"   β+(α−1)z #n z 1−θ θ = . ϑ 1−ϑ

This implies that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ (0, θ]. By virtue of Theorem 1, we  βθ , have Pr X n ≤ z ≤ inf ϑ∈(0,θ] Λ(ϑ). By differentiation, it can be shown that, as long as 0 < z ≤ 1−αθ z the infimum of Λ(ϑ) with respect to ϑ ∈ (0, θ] is attained at ϑ = β+αz . This completes the proof of the theorem.

A.13

Proof of Theorem 14

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability density function of X is   n Y 1 |xi − α| . fX (x) = exp − 2β β i=1 To apply the LR method to show (24), we construct a family of probability density functions   n Y 1 |xi − α| gX (x, ϑ) = , ϑ ∈ [β, ∞). exp − 2ϑ ϑ i=1 29

It can be seen that for ϑ ∈ [β, ∞), fX (x) gX (x, ϑ)

= ≤ =

where xn =

Pn

i=1

n

xi

. Since

1 ϑ



1 β

# "  n  n 1 X 1 ϑ |xi − α| exp − β ϑ β i=1 # "  n  n 1 X 1 ϑ (xi − α) exp − β ϑ β i=1  n    n ϑ 1 1 exp (xn − α) − , β ϑ β

≤ 0 for ϑ ∈ [β, ∞), it follows that

fX (x) ≤ Λ(ϑ) ∀ϑ ∈ [β, ∞) provided that xn ≥ z, gX (x, ϑ) where Λ(ϑ) =



ϑ exp β



1 1 − ϑ β



n (z − α) .

This implies that fX (X ) I{X n ≥z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ [β, ∞). By virtue of Theorem 1, we  have Pr X n ≥ z ≤ inf ϑ∈[β,∞) Λ(ϑ). By differentiation, it can be shown that, as long as z ≥ α + β, the infimum of Λ(ϑ) with respect to ϑ ∈ [β, ∞) is attained at ϑ = z − α. This proves (24). To show (25), note that for ϑ ∈ [β, ∞), # "  n  n 1 1 X ϑ fX (x) |xi − α| exp = − gX (x, ϑ) β ϑ β i=1 # "  n  n 1 X ϑ 1 (α − xi ) ≤ exp − β ϑ β i=1 n  n    ϑ 1 1 . = exp (α − xn ) − β ϑ β Since

1 ϑ



1 β

≤ 0 for ϑ ∈ [β, ∞), it follows that fX (x) ≤ Λ(ϑ) ∀ϑ ∈ [β, ∞) provided that xn ≤ z, gX (x, ϑ)

where Λ(ϑ) =



ϑ exp β



1 1 − ϑ β



n (α − z) .

This implies that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ [β, ∞). By virtue of Theorem 1, we  have Pr X n ≤ z ≤ inf ϑ∈[β,∞) Λ(ϑ). By differentiation, it can be shown that, as long as z ≤ α − β, the infimum of Λ(ϑ) with respect to ϑ ∈ [β, ∞) is attained at ϑ = α − z. This proves (25). The proof of the theorem is thus completed.

A.14

Proof of Theorem 15

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability mass function of X is fX (x) =

n Y

q xi 1 . x ln 1−q i=1 i 30

To apply the LR method to show (26), we construct a family of probability mass functions gX (x, ϑ) =

n Y

ϑxi 1 , x ln 1−ϑ i=1 i

ϑ ∈ (0, q].

Clearly,

where xn =

Pn

i=1

 n fX (x) ln(1 − q)  q xn = , gX (x, ϑ) ln(1 − ϑ) ϑ xi

n

. Hence, fX (x) ≤ Λ(ϑ) ∀ϑ ∈ (0, q] provided that xn ≤ z, gX (x, ϑ)

where



ln(1 − q)  q z Λ(ϑ) = ln(1 − ϑ) ϑ

n

.

This implies that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ (0, q]. By virtue of Theorem 1, we  have Pr X n ≤ z ≤ inf ϑ∈(0,q] Λ(ϑ). By differentiation, it can be shown that, as long as z ≤ (1−q)qln 1 , 1−q

the infimum of Λ(ϑ) with respect to ϑ ∈ (0, q] is attained at ϑ ∈ (0, q] such that z =

number ϑ is unique because the function the theorem is thus completed.

A.15

ϑ (1−ϑ) ln

1 1−ϑ

ϑ (1−ϑ) ln

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability density function of X is "  2 # n Y 1 1 µ − ln xi √ . fX (x) = exp − 2 σ x 2πσ i=1 i To apply the LR method to show (27), we construct a family of probability density functions "  2 # n Y 1 1 ϑ − ln xi √ gX (x, ϑ) = , ϑ ∈ (0, µ]. exp − 2 σ x 2πσ i=1 i It can be seen that " # n  fX (x) ϑ−µ X µ+ϑ . = exp − ln xi gX (x, ϑ) σ 2 i=1 2 ϑ−µ σ2



. Such

is increasing with respect to ϑ ∈ (0, 1). The proof of

Proof of Theorem 16

It can be readily shown that for ϑ ∈ (0, µ],

1 1−ϑ

 µ+ϑ − ln x 2

is a concave function of x > 0. Hence, ( "  2 #)n fX (x) ϑ−µ µ+ϑ ≤ exp − ln xn gX (x, ϑ) σ2 2

31

for ϑ ∈ (0, µ],

where xn =

Pn

i=1

xi

n

. It follows that fX (x) ≤ Λ(ϑ) ∀ϑ ∈ (0, µ] provided that xn ≤ z, gX (x, ϑ)

where Λ(ϑ) =

(

"

ϑ−µ exp σ2



µ+ϑ − ln z 2

2 #)n

.

This implies that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ (0, µ]. By virtue of Theorem 1, we  have Pr X n ≤ z ≤ inf ϑ∈(0,µ] Λ(ϑ). By differentiation, it can be shown that, as long as 0 < z ≤ eµ , the infimum of Λ(ϑ) with respect to ϑ ∈ (0, µ] is attained at ϑ = ln z. Therefore, "  2 # n µ − ln z for 0 < z ≤ eµ . Pr{X n ≤ z} ≤ Λ(ln z) = exp − 2 σ The proof of the theorem is thus completed.

A.16

Proof of Theorem 17

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability density function of X is  2 n Y 2 xi2m−1 x fX (x) = exp − i2 . 2m Γ(m) σ σ i=1 To apply the LR method to show (28), we construct a family of probability density functions  2 n Y 2 x2ϑ−1 x i exp − i2 , ϑ ∈ (0, m]. gX (x, ϑ) = 2ϑ Γ(ϑ) σ σ i=1 Clearly, for ϑ ∈ (0, m], n  fX (x) Γ(ϑ) 2(ϑ−m) = σ gX (x, ϑ) Γ(m) where xn =

Pn

i=1

xi

n

n Y

xi

i=1

!2(m−ϑ)



Γ(ϑ) 2(ϑ−m) ≤ σ Γ(m)

n

2n(m−ϑ)

(xn )

,

. It follows that fX (x) ≤ Λ(ϑ) ∀ϑ ∈ (0, m] provided that xn ≤ z, gX (x, ϑ)

where Λ(ϑ) =



Γ(ϑ)  z 2(m−ϑ) Γ(m) σ

n

.

This implies that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ (0, m]. By virtue of Theorem 1, we  Γ(ϑ+ 1 ) have Pr X n ≤ z ≤ inf ϑ∈(0,µ] Λ(ϑ). Letting z = Γ(ϑ)2 σ leads to (28). To apply the LR method to show (29), we construct a family of probability density functions  2 n Y 2 xi2m−1 x gX (x, ϑ) = exp − i2 , ϑ ∈ [σ, ∞). 2m Γ(m) ϑ ϑ i=1 It can be seen that fX (x) = gX (x, ϑ)

# "  n  2mn 1 X 2 ϑ 1 xi . − 2 exp σ ϑ2 σ i=1 32

Observing that for ϑ ∈ [σ, ∞), fX (x) ≤ gX (x, ϑ) It follows that

1 ϑ2



1 σ2



x2 is a concave function of x > 0, we have that

(     )n 2m 1 1 ϑ exp − 2 (xn )2 , σ ϑ2 σ

∀ϑ ∈ [σ, ∞).

fX (x) ≤ Λ(ϑ) ∀ϑ ∈ [σ, ∞) provided that xn ≥ z, gX (x, ϑ)

where

"  #n  2 2m ϑ z2 z Λ(ϑ) = exp − 2 . σ ϑ2 σ

This implies that fX (X ) I{X n ≥z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ [σ, ∞). By virtue of Theorem 1, we  √ have Pr X n ≥ z ≤ inf ϑ∈[σ,∞) Λ(ϑ). By differentiation, it can be shown that, as long as z ≥ mσ, the infimum of Λ(ϑ) with respect to ϑ ∈ [σ, ∞) is attained at ϑ = √zm . Therefore, Pr{X n ≥ z} ≤ Λ



z √ m



=



z2 mσ 2

m

n  z2 exp m − 2 σ

for z ≥

√ mσ.

This establishes (29) and completes the proof of the theorem.

A.17

Proof of Theorem 18

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability density function of X is fX (x) =

 θ+1 n Y θ a . a xi i=1

To apply the LR method to show (30), we construct a family of probability density functions gX (x, ϑ) = Clearly, fX (x) = gX (x, ϑ) where xn =

Pn

i=1

n

xi

 ϑ+1 n Y ϑ a , a xi i=1

 n θ ϑ

n Y

i=1

xi

!ϑ−θ

ϑ ∈ [θ, ∞). "

θ ≤ ϑ



xn a

ϑ−θ #n

,

. It follows that fX (x) ≤ Λ(ϑ) gX (x, ϑ)

where

∀ϑ ∈ [θ, ∞) provided that xn ≤ z, 

θ  z ϑ−θ Λ(ϑ) = ϑ a

n

.

This implies that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ [θ, ∞). By virtue of Theorem 1, we  have Pr X n ≤ z ≤ inf ϑ∈[θ,∞) Λ(ϑ). Hence, for γ > 1, Pr{X n ≤ γa} ≤ inf

ϑ≥θ



θ ϑ−θ γ ϑ

where w(ϑ) = − ln ϑ + (ϑ − θ) ln γ. 33

n

= θn inf exp[n w(ϑ)], ϑ≥θ

Now consider the minimization of w(ϑ) subject to ϑ ≥ θ. Note that the first and second derivatives of w(ϑ) are w′ (ϑ) = − ϑ1 + ln γ and w′′ (ϑ) = ϑ12 , respectively. Hence, the minimum is achieved at ϑ∗ = ln1γ provided that 1 < γ ≤ e1/θ . Accordingly, w(ϑ∗ ) = 1 + ln ln γ − θ ln γ and n  eθ ln γ for 1 < γ ≤ e1/θ . Pr{X n ≤ γa} ≤ γθ θa . Letting γ = ρµ Note that the mean of X is µ = θ−1 a yields "  θ  #n   1 1 1 ρθ θ−1 Pr{X n ≤ ρµ} ≤ eθ exp( ). ln for 1 − < ρ ≤ 1 − ρθ θ−1 θ θ θ

This establishes (30) and completes the proof of the theorem.

A.18

Proof of Theorem 19

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability density function of X is fX (x) =

n Y xi−α . C(α) i=1

To apply the LR method to show (31), we construct a probability density functions gX (x, ϑ) = Clearly,

n  fX (x) C(ϑ) = gX (x, ϑ) C(α)

provided that xn ≤ z, where xn =

Pn

i=1

xi

i=1

xi

n

n Y

n Y x−ϑ i , C(ϑ) i=1

!ϑ−α

ϑ ∈ [α, ∞). 

C(ϑ) ≤ C(α)

n h in ϑ−α (xn ) ≤ Λ(ϑ)

and 

C(ϑ) ϑ−α Λ(ϑ) = z C(α)

n

.

This implies that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ [α, ∞). By virtue of Theorem 1, we  have Pr X n ≤ z ≤ Λ(ϑ). This completes the proof of the theorem.

A.19

Proof of Theorem 20

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability mass function of X is fX (x) =

n Y m!|s(xi , m)|θxi . x ![− ln(1 − θ)]m i=1 i

To apply the LR method to show (32), we construct a family of probability mass functions gX (x, ϑ) = Clearly,

n Y m!|s(xi , m)|ϑxi , x ![− ln(1 − ϑ)]m i=1 i

ϑ ∈ (0, θ].

nm " xn #n  θ ln(1 − ϑ) fX (x) = , gX (x, ϑ) ln(1 − θ) ϑ 34

where xn =

Pn

i=1

n

xi

. It follows that fX (x) ≤ Λ(ϑ) ∀ϑ ∈ (0, θ] provided that xn ≤ z, gX (x, ϑ)

where



ln(1 − ϑ) Λ(ϑ) = ln(1 − θ)

nm  z n θ . ϑ

This implies that f_X(X) I_{X̄n ≤ z} ≤ Λ(ϑ) g_X(X, ϑ) holds for any ϑ ∈ (0, θ]. By virtue of Theorem 1, we have Pr{X̄n ≤ z} ≤ inf_{ϑ ∈ (0,θ]} Λ(ϑ). By differentiation, it can be shown that, as long as z ≤ mθ/((θ − 1) ln(1 − θ)), the infimum of Λ(ϑ) with respect to ϑ ∈ (0, θ] is attained at a number ϑ such that z = mϑ/((ϑ − 1) ln(1 − ϑ)). Such a number ϑ is unique because ϑ/((ϑ − 1) ln(1 − ϑ)) is an increasing function of ϑ ∈ (0, 1). This completes the proof of the theorem.

A.20

Proof of Theorem 21

To apply the LR method, we introduce a family of probability density functions g(x, ϑ) = Clearly, f (x) f (x) = 1 x = ϑm/2 g(x, ϑ) ϑf(ϑ) To show inequality (33), note that



f (x) g(x,ϑ)

1 x , f ϑ ϑ

n + mx ϑ n + mx

ϑ > 0.

(n+m)/2

 = ϑm/2 1 +

1 ϑ

−1 n 1 + mx

(n+m)/2

.

is decreasing with respect to x > 0 for ϑ ≥ 1. Hence,

f (x) ≤ Λ(ϑ) ∀ϑ ∈ [1, ∞) provided that x ≥ z, g(x, ϑ) where

 Λ(ϑ) = ϑm/2 1 +

1 ϑ

−1 n 1 + mz

(n+m)/2

.

(54)

This implies that f (X) I{X≥z} ≤ Λ(ϑ) g(X, ϑ) holds for any ϑ ∈ [1, ∞). By virtue of Theorem 1, we have Pr {X ≥ z} ≤

inf

ϑ∈[1,∞)

Λ(ϑ) = Λ(z) = z m/2

To show inequality (34), note that

f (x) g(x,ϑ)



n+m n + mz

(n+m)/2

for z ≥ 1.

is increasing with respect to x > 0 for 0 < ϑ ≤ 1. Hence,

f (x) ≤ Λ(ϑ) ∀ϑ ∈ (0, 1] provided that x ≤ z, g(x, ϑ) where Λ(ϑ) is defined by (54). This implies that f (X) I{X≤z} ≤ Λ(ϑ) g(X, ϑ) holds for any ϑ ∈ (0, 1]. By virtue of Theorem 1, we have Pr {X ≤ z} ≤ inf Λ(ϑ) = Λ(z) = z

m/2

ϑ∈(0,1]



n+m n + mz

(n+m)/2

This proves inequality (34) and completes the proof of the theorem.

35

for 0 < z ≤ 1.

A.21

Proof of Theorem 22

To apply the LR method, we introduce a family of probability density functions g(x, ϑ) = Clearly,

1 x f , ϑ ϑ

ϑ > 0.

(n+1)/2   n + ( ϑx )2 f (x) f (x) = 1 x =ϑ =ϑ 1+ g(x, ϑ) n + x2 ϑf(ϑ)

To show inequality (35), note that

f (x) g(x,ϑ)

1 ϑ2 n x2

−1 +1

(n+1)/2

.

is decreasing with respect to |x| for ϑ ≥ 1. Hence,

f (x) ≤ Λ(ϑ) ∀ϑ ∈ [1, ∞) provided that |x| ≥ z, g(x, ϑ) where

 Λ(ϑ) = ϑ 1 +

1 ϑ2 n z2

−1 +1

(n+1)/2

.

(55)

This implies that f (X) I{|X|≥z} ≤ Λ(ϑ) g(X, ϑ) holds for any ϑ ∈ [1, ∞). By virtue of Theorem 1, we have Pr {X ≥ z} ≤

inf

ϑ∈[1,∞)

This proves inequality (35). To show inequality (36), note that

Λ(ϑ) = Λ(z) = z

f (x) g(x,ϑ)



n+1 n + z2

(n+1)/2

for z ≥ 1.

is increasing with respect to |x| for ϑ ∈ (0, 1]. Hence,

f (x) ≤ Λ(ϑ) ∀ϑ ∈ (0, 1] provided that |x| ≤ z, g(x, ϑ) where Λ(ϑ) is defined by (55). This implies that f (X) I{|X|≤z} ≤ Λ(ϑ) g(X, ϑ) holds for any ϑ ∈ (0, 1]. By virtue of Theorem 1, we have Pr {X ≤ z} ≤ inf Λ(ϑ) = Λ(z) = z ϑ∈(0,1]



n+1 n + z2

(n+1)/2

for 0 < z ≤ 1.

This proves inequality (36) and completes the proof of the theorem.

A.22

Proof of Theorem 23

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability density function of X is fX (x) =

n Y

i=1



θ eθxi . −1

To apply the LR method to show (37), we construct a family of probability density functions gX (x, ϑ) =

n Y

i=1

Clearly, fX (x) = gX (x, ϑ)



ϑ eϑxi , −1



θ eϑ − 1 ϑ eθ − 1

n

36

ϑ ∈ (−∞, θ],

ϑ 6= 0.

exp [n(θ − ϑ)xn ] ,

where xn =

Pn

i=1

xi

n

. It follows that fX (x) ≤ Λ(ϑ) gX (x, ϑ)

∀ϑ ∈ (−∞, θ], ϑ 6= 0 provided that xn ≤ z,

where Λ(ϑ) =



θ eϑ − 1 ϑ eθ − 1

n

exp [n(θ − ϑ)z] .

This implies that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ (−∞, θ], ϑ 6= 0. By virtue of  Theorem 1, we have Pr X n ≤ z ≤ inf ϑ∈(−∞,θ] Λ(ϑ). By differentiation, it can be shown that, as long as 0 < z ≤ 1 + eθ1−1 − 1θ and z 6= 21 , the infimum of Λ(ϑ) with respective to ϑ ∈ (−∞, θ], ϑ 6= 0 is attained at a number ϑ ∈ (−∞, θ], ϑ 6= 0 such that z = 1 + eϑ1−1 − ϑ1 . Such a number is unique because   1 1 1+ ϑ lim =0 − ϑ→−∞ e −1 ϑ and 1 + eϑ1−1 − ϑ1 is increasing with respect to ϑ 6= 0. To show such monotonicity, note that the first derivative of 1 + eϑ1−1 − ϑ1 with respective to ϑ is equal to    ϑ/2 eϑ/2 1 e − e−ϑ/2 − ϑ , + ϑ eϑ − 1 ϑ(eϑ/2 − e−ϑ/2 ) where eϑ/2 − e−ϑ/2 − ϑ is a function of ϑ with its first derivative assuming value 0 at ϑ = 0, and its second derivative equal to 41 (eϑ/2 − e−ϑ/2 ). This establishes inequality (37). To show (38), it suffices to note that as z → 12 , the root of equation z = 1 + eϑ1−1 − ϑ1 with respect to ϑ tends to 0. This completes the proof of the theorem.

A.23

Proof of Theorem 24

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability density function of X is fX (x) = 1. To apply the LR method to show (39), we construct a family of probability density functions gX (x, ϑ) = Clearly,

where xn =

Pn

i=1

n

xi

n Y

ϑ eϑxi , ϑ−1 e i=1

ϑ > 0.

n  ϑ fX (x) e −1 = exp(−ϑ xn ) , gX (x, ϑ) ϑ . It follows that fX (x) ≤ Λ(ϑ) ∀ϑ > 0 provided that xn ≥ z, gX (x, ϑ)

where Λ(ϑ) =



eϑ − 1 ϑeϑz

n

.

This implies that fX (X ) I{X n ≥z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ > 0. By virtue of Theorem 1, we have  Pr X n ≥ z ≤ inf ϑ>0 Λ(ϑ). By differentiation, it can be shown that, as long as 1 > z ≥ 12 , the infimum of Λ(ϑ) with respective to ϑ > 0 is attained at a positive number ϑ∗ such that z = 1 + eϑ∗1−1 − ϑ1∗ . Such a number is unique because   1 1 1 = − lim 1 + ϑ ϑ↓0 e −1 ϑ 2 37

and 1 +

1 eϑ −1



1 ϑ

is increasing with respect to ϑ > 0. Therefore, we have shown that Pr{X n ≥ z} ≤ Λ(ϑ∗ ) for

1 < z < 1. 2

On the other hand, it can be shown that ∗

Λ(ϑ ) =



inf e

s>0

To establish an upper bound on Λ(ϑ∗ ), we can use  2 s sX E[e ] < exp + 24

−zs

E[e

sX

]

n

the following inequality due to Chen [6, Appendix H],  s , ∀s ∈ (−∞, ∞). 2

By differentiation, it can be shown that ∗



Pr{X ≥ z} ≤ Λ(ϑ ) ≤ inf exp s>0



s s2 + − zs 24 2

n

2 !  1 , ≤ exp −6n z − 2

1>z>

1 . 2

This establishes (39). By a similar argument, we can show (40). This completes the proof of the theorem.
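The moment generating function inequality from [6] invoked above is easy to probe numerically; for X uniform on [0, 1] one has E[e^{sX}] = (e^s − 1)/s, and the grid check below (ours, for illustration only) confirms it stays below exp(s²/24 + s/2).

```python
# Grid check of the MGF inequality E[exp(sX)] < exp(s^2/24 + s/2)
# for X uniform on [0, 1].
import numpy as np

s = np.linspace(-30, 30, 6001)
s = s[s != 0]                   # the MGF equals 1 at s = 0 by continuity
print(np.all(np.expm1(s) / s < np.exp(s**2 / 24 + s / 2)))   # True
```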

A.24

Proof of Theorem 25

Let X = [X1 , · · · , Xn ] and x = [x1 , · · · , xn ]. The joint probability density function of X is fX (x) =

n Y

i=1

  αβxiβ−1 exp −αxβi .

To apply the LR method, we construct a family of probability density functions gX (x, ϑ) =

n Y

i=1

Clearly,

  ϑβxiβ−1 exp −ϑxβi ,

ϑ ∈ (0, ∞).

# " n  α n X fX (x) β xi . exp (ϑ − α) = gX (x, ϑ) ϑ i=1

To show inequality (41) under the condition that αz β ≤ 1 and 0 < β ≤ 1, we restrict ϑ to be no less than α. As a consequence of 0 < β ≤ 1 and ϑ ≥ α, we have that (ϑ − α)xβ is a concave function of x > 0. By virtue of such concavity, we have

where xn =

Pn

i=1

n

xi

nα on  fX (x) , ≤ exp (ϑ − α)(xn )β gX (x, ϑ) ϑ

∀ϑ ∈ [α, ∞),

. It follows that fX (x) ≤ Λ(ϑ) ∀ϑ ∈ [α, ∞) provided that xn ≤ z, gX (x, ϑ)

where Λ(ϑ) =

nα ϑ

on  . exp (ϑ − α)z β 38

(56)

This implies that fX (X ) I{X n ≤z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ [α, ∞). By virtue of Theorem 1, we  have Pr X n ≤ z ≤ inf ϑ∈[α,∞) Λ(ϑ). By differentiation, it can be shown that, as long as αz β ≤ 1, the infimum of Λ(ϑ) with respect to ϑ ∈ [α, ∞) is attained at ϑ = z −β . Therefore,  n for αz β ≤ 1 and β < 1. Pr{X n ≤ z} ≤ Λ(z −β ) = αz β exp(1 − αz β )

This proves inequality (41). To show inequality (42) under the condition that αz β ≥ 1 and β > 1, we restrict ϑ to be a positive number less than α. As a consequence of β > 1 and 0 < ϑ < α, we have that (ϑ − α)xβ is a concave function of x > 0. By virtue of such concavity, we have nα  on fX (x) ≤ exp (ϑ − α)(xn )β , ∀ϑ ∈ (0, α). gX (x, ϑ) ϑ It follows that

fX (x) ≤ Λ(ϑ) gX (x, ϑ)

∀ϑ ∈ (0, α) provided that xn ≥ z,

where Λ(ϑ) is defined by (56). This implies that fX (X ) I{X n ≥z} ≤ Λ(ϑ) gX (X , ϑ) holds for any ϑ ∈ (0, α).  By virtue of Theorem 1, we have Pr X n ≥ z ≤ inf ϑ∈(0,α) Λ(ϑ). By differentiation, it can be shown that, as long as αz β ≥ 1, the infimum of Λ(ϑ) with respect to ϑ ∈ (0, α) is attained at ϑ = z −β . Therefore,  n for αz β ≥ 1 and β > 1. Pr{X n ≥ z} ≤ Λ(z −β ) = αz β exp(1 − αz β ) This proves inequality (42). The proof of the theorem is thus completed.

B B.1

Proofs of Multivariate Inequalities Proof of Theorem 26

To apply the LR method to show (43), we introduce a family of probability mass functions P   k Γ( kℓ=0 ϑℓ ) Y Γ(xℓ + ϑℓ ) n , g(x, ϑ) = Pk Γ(ϑℓ ) x Γ(n + ℓ=0 ϑℓ ) ℓ=0

with ϑ0 = α0 and 0 < ϑℓ ≤ αℓ ,

ℓ = 1, · · · , k

where ϑ = [ϑ0 , ϑ1 , · · · , ϑk ]⊤ . Clearly,

Pk Pk k Γ( αℓ ) Γ(n + ℓ=0 ϑℓ ) Y Γ(xℓ + αℓ ) Γ(ϑℓ ) f (x) = Pℓ=0 . P g(x, ϑ) Γ( kℓ=0 ϑℓ ) Γ(n + kℓ=0 αℓ ) ℓ=1 Γ(xℓ + ϑℓ ) Γ(αℓ )

For simplicity of notations, define

L(x, ϑ) =

f (x) . g(x, ϑ)

Let y = [y0 , y1 , · · · , yk ]⊤ be a vector such that yi = xi + 1 for some i ∈ {1, · · · , k} and that yℓ = xℓ for all ℓ ∈ {1, · · · , k} except ℓ = i. Then, xi + αi L(y, ϑ) ≥ 1. = L(x, ϑ) xi + ϑi Making use of this observation and by an inductive argument, we have that for z = [z0 , z1 , · · · , zk ]⊤ such that xℓ ≤ zℓ for ℓ = 1, · · · , k, it must be true that L(z, ϑ) ≥ 1. L(x, ϑ) 39

It follows that

where

f (x) ≤ Λ(ϑ) g(x, ϑ)

∀ϑ ∈ Θ provided that xℓ ≤ zℓ , ℓ = 1, · · · , k,

Pk Pk k Γ( ℓ=0 αℓ ) Γ(n + ℓ=0 ϑℓ ) Y Γ(zℓ + αℓ ) Γ(ϑℓ ) , Λ(ϑ) = Pk Pk Γ( ℓ=0 ϑℓ ) Γ(n + ℓ=0 αℓ ) ℓ=1 Γ(zℓ + ϑℓ ) Γ(αℓ )

Θ is the set of vectors ϑ = [ϑ0 , ϑ1 , · · · , ϑk ]⊤ such that ϑ0 = α0 and 0 < ϑℓ ≤ αℓ , ℓ = 1, · · · , k. This implies that f (X ) I{X 4z} ≤ Λ(ϑ) g(X , ϑ) ∀ϑ ∈ Θ, where X = [X0 , X1 , · · · , Xk ]⊤ and X 4 z means Xℓ ≤ zℓ , ℓ = 1, · · · , k. By virtue of Theorem 1, we have Pr {Xℓ ≤ zℓ , ℓ = 1, · · · , k} = Pr{X 4 z} ≤ inf ϑ∈Θ Λ(ϑ). ℓ for ℓ = 1, · · · , k, we have that As a consequence of the assumption that 0 < zℓ ≤ Pnα k α i=0

θℓ =

n−

α0 zℓ Pk

i=1 zi



n−

i

ℓ α0 Pnα k i=0 αi Pk nαℓ

ℓ=1

Pk

i=0

= αℓ αi

for ℓ = 1, · · · , k. Define θ = [θ0 , θ1 , · · · , θk ]⊤ . Then, θ ∈ Θ and

Pr {Xℓ ≤ zℓ , ℓ = 1, · · · , k} ≤ inf Λ(ϑ) ≤ Λ(θ). ϑ∈Θ

This completes the proof of the theorem.

B.2

Proof of Theorem 27

To apply the LR method to show inequality (44), we introduce a family of probability density functions   |ϑ|α 1 g(x, ϑ) = pα |x|−α−(p+1)/2 exp − tr(ϑx−1 ) , β Γp (α) β where ϑ is a positive-definite real matrix of size p × p such that ϑ 4 Ψ. Note that   1 |Ψ|α f (x) −1 exp − = tr([Ψ − ϑ]x ) . g(x, ϑ) |ϑ|α β For positive definite matrices x and z such that x 4 z, we have tr([Ψ − ϑ]x−1 ) ≥ tr([Ψ − ϑ]z −1 ) as a consequence of ϑ 4 Ψ. If follows that f (x) ≤ Λ(ϑ) g(x, ϑ) for ϑ 4 Ψ and x 4 z, where Λ(ϑ) =

  1 |Ψ|α −1 exp − tr([Ψ − ϑ]z ) . |ϑ|α β

Hence, f (X) I{X4z} ≤ Λ(ϑ) g(X, ϑ) 40

provided that ϑ 4 Ψ.

By virtue of Theorem 1, we have Pr{X 4 z} ≤ inf ϑ4Ψ Λ(ϑ). In particular, taking z = ρE[X] = and ϑ = β2 (2α − p − 1)z, we have Pr {X 4 ρΥ} = ≤ = =

2ρ Ψ β 2α−p−1

Pr {X 4 z}   1 β |Ψ|α −1 exp − tr([Ψ − (2α − p − 1)z]z ) β 2 | β2 (2α − p − 1)z|α   p α exp( 2 (2α − p − 1))|Ψ| 1 −1 exp − tr(Ψz ) β [ β2 (2α − p − 1)]pα |z|α     p 1 1 exp − − 1 (2α − p − 1) . ρpα 2 ρ

This completes the proof of the theorem.

B.3

Proof of Theorem 28

For simplicity of notations, let X = [X 1 , · · · , X n ]. Let X = [x1 , · · · , xn ], where x1 , · · · , xn are vectors of dimension k. Since X 1 , · · · , X n are identical and independent, the joint probability density of X is n  n  Y 1 ⊤ −1 −k/2 −1/2 (2π) |Σ| exp − (xi − µ) Σ (xi − µ) . fX (X ) = 2 i=1 To apply the LR method to show (45), we introduce a family of probability density functions n  n  Y 1 ⊤ −1 −k/2 −1/2 (2π) |Σ| exp − (xi − ϑ) Σ (xi − ϑ) , gX (X , ϑ) = 2 i=1 where ϑ is a vector of dimension k such that Σ−1 ϑ < Σ−1 µ. It can be checked that fX (X ) gX (X , ϑ)

= =

  1 exp (µ⊤ − ϑ⊤ )Σ−1 xi + [ϑ⊤ Σ−1 ϑ − µ⊤ Σ−1 µ] 2 i=1   n 1 exp (µ⊤ − ϑ⊤ )Σ−1 xn + [ϑ⊤ Σ−1 ϑ − µ⊤ Σ−1 µ] , 2 n Y

where xn = As a consequence of Σ−1 ϑ < Σ−1 µ, we have that

Pn

i=1

n

xi

.

(µ⊤ − ϑ⊤ )Σ−1 u ≤ (µ⊤ − ϑ⊤ )Σ−1 v for arbitrary vectors u and v such that v < u. This implies that for ϑ such that Σ−1 ϑ < Σ−1 µ, fX (X ) ≤ Λ(ϑ) gX (X , ϑ)

provided that xn < z,

41

where

n  1 ⊤ −1 ⊤ −1 ⊤ −1 ⊤ Λ(ϑ) = exp (µ − ϑ )Σ z + [ϑ Σ ϑ − µ Σ µ] . 2 

It follows that f_X(X) I_{X̄n ≽ z} ≤ Λ(ϑ) g_X(X, ϑ) for ϑ such that Σ^{−1}ϑ ≽ Σ^{−1}µ. Taking ϑ = z, which satisfies Σ^{−1}z ≽ Σ^{−1}µ by assumption, and invoking Theorem 1 yields (45). This completes the proof of the theorem.

B.4 Proof of Theorem 29

To apply the LR method to show (46), we construct a family of probability density functions with the parameter α replaced by ϑ > α, which yields the likelihood-ratio bounding function

Λ(ϑ) = [ ∏_{i=1}^{k} ((α + i − 1)/(ϑ + i − 1)) (1 − k + ∑_{i=1}^{k} zi/βi)^{ϑ−α} ]^n for any ϑ > α.

By virtue of Theorem 1, we have Pr{Xn  z} ≤ Λ(θ) for any θ > α. This proves the first statement. Note that if (48) holds, then θ > α for θ satisfying (47). Moreover, by differentiation, it can be shown that Pr{Xn  z} ≤ inf ϑ>α Λ(ϑ) = Λ(θ). This proves statement (II). For θ satisfying (49), it must be true that θ > α as a consequence of the assumption that α > 1 and Pk zi 1 α i=1 βi < α−1 . This proves statement (III). The proof of the theorem is thus completed. k

References

[1] A. C. Berry, “The accuracy of the Gaussian approximation to the sum of independent variates,” Trans. Amer. Math. Soc., vol. 49, no. 1, pp. 122–139, 1941.

[2] H. Chernoff, “A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations,” Ann. Math. Statist., vol. 23, pp. 493–507, 1952.

[3] X. Chen, “New probabilistic inequalities from monotone likelihood ratio property,” arXiv:1010.3682v1 [math.PR], October 2010.

[4] X. Chen, “A likelihood ratio approach for probabilistic inequalities,” arXiv:1308.4123 [math.PR], August 2013.

[5] X. Chen, “Probabilistic inequalities with applications to machine learning,” Proceedings of SPIE, Baltimore, USA, May 2014.

[6] X. Chen, “New optional stopping theorems and maximal inequalities on stochastic processes,” arXiv:1207.3733 [math.PR], July 2012.

[7] P. C. Consul, Generalized Poisson Distributions, Dekker, 1989.

[8] C. G. Esseen, “On the Liapunoff limit of error in the theory of probability,” Ark. Mat. Astron. Fys., vol. A28, no. 9, pp. 1–19, 1942.

[9] A. K. Gupta and D. K. Nagar, Matrix Variate Distributions, Chapman and Hall/CRC, 1999.
