Universality of Covariance Matrices

AMS 2000 subject classifications: 15B52, 82B44.
Keywords and phrases: Covariance matrix, Marcenko-Pastur law, Universality, Dyson Brownian motion.

arXiv:1110.2501v1 [math.PR] 11 Oct 2011

By Natesh S. Pillai, Jun Yin

In this paper we prove the universality of covariance matrices of the form H_{N×N} = X†X, where X = [x_ij] is an M × N rectangular matrix with independent real valued entries satisfying E x_ij = 0 and E x_ij² = 1/M, and N, M → ∞. Furthermore it is assumed that these entries have sub-exponential tails. We will study the asymptotics in the regime N/M = d_N ∈ (0, ∞), lim_{N→∞} d_N ≠ 1. Our main result states that the Stieltjes transform of the empirical eigenvalue distribution of H is given by the Marcenko-Pastur law uniformly up to the edges of the spectrum with an error of order (Nη)^{-1}, where η is the imaginary part of the spectral parameter in the Stieltjes transform. From this strong local Marcenko-Pastur law, we derive the following results.

1. The rigidity of eigenvalues: if γ_j = γ_{j,N} denotes the classical location of the j-th eigenvalue under the Marcenko-Pastur law, ordered increasingly, then the j-th eigenvalue λ_j of H is close to γ_j in the sense that there are positive constants C, c such that

  P( ∃ j : |λ_j − γ_j| > (log N)^{C log log N} [min( min(N, M) − j, j )]^{-1/3} N^{-2/3} ) ≤ C exp( −(log N)^{c log log N} )

for N large enough.
2. The delocalization of the eigenvectors of the matrix XX†, uniformly both at the edge and in the bulk.
3. Bulk universality: the n-point correlation functions of the eigenvalues of the sample covariance matrix X†X coincide with those of the Wishart ensemble as N goes to infinity.
4. Universality of the eigenvalues of the sample covariance matrix X†X at both edges of the spectrum.

Furthermore, the first two results are applicable even in the case in which the entries of the column vectors of X are not independent but satisfy a certain large deviation principle. All our results hold for both real and complex valued entries.

1. Introduction. Covariance matrices are fundamental objects in modern multivariate statistics, where the advance of technology has led to high dimensional data. They have manifold applications in various applied fields; see [2, 13, 14, 15] for an extensive account of statistical applications, [12, 16] for applications in economics and [17] in population genetics, to name a few. Except in special cases (under specific assumptions on the distributions of the entries of the covariance matrix, such as Gaussian), the exact asymptotic distribution of the eigenvalues is not known. In this context, akin to the central limit theorem, the phenomenon


of universality helps us to obtain the asymptotic distribution of the eigenvalues without restrictive assumptions on the distribution of the entries. Borrowing a physical analogy, the key observation is that the eigenvalue gap distribution for a large complicated system is universal in the sense that it depends only on the symmetry class of the physical system but not on other detailed structures. The covariance matrix formed by i.i.d. standard Gaussian entries is the well studied Wishart matrix, for which one has closed form expressions for many objects of interest, including the joint distribution of the eigenvalues. Furthermore, the empirical spectrum of the Wishart matrix converges to the Marcenko-Pastur law. In this paper we prove the universality of covariance matrices (both in the bulk and at the edges) under the assumption that the matrix entries are independent, have mean 0 and variance 1, and have sub-exponential tail decay. This implies that, asymptotically, the distributions of the local eigenvalue statistics of covariance matrices of the above kind are identical to those of the Wishart matrix. Over the past two decades, great progress has been made in proving universality for matrices with i.i.d. entries (standard Wigner ensembles); see [9] and the references therein. However, results on universality for covariance matrices have been obtained only recently [1, 8, 18, 19, 20, 21]. Moreover, these results are obtained under strong assumptions; for example, in the "four moment theorem" of [23, 24], universality results are proved under the assumption that the first four moments of the matrix elements are equal to those of the standard Gaussian. In [8] the authors prove bulk universality of covariance matrices under the assumption that the distribution of the matrix elements has a smooth density. These results, although quite interesting, exclude many important cases, including the Bernoulli ensembles.
On the other hand, we do not require any smoothness of the distribution of the matrix entries and only need the first two moments to be identical to those of the standard Gaussian. Furthermore, some of our results are applicable even in situations where the entries in the same column are not independent but satisfy a certain large deviation bound, as explained below. Of course, we do require an exponential tail decay condition for the matrix entries. In future work, all of our results will be proved with the tail condition replaced by a uniform bound on the p-th moment of the matrix elements (say p = 5 or 7), by the methods in [5]. The approach we take in this paper to prove universality is the one developed in the recent series of papers [4, 5, 6, 7, 8, 9, 10, 11]. The first step is to derive a strong local Marcenko-Pastur law, a precise estimate of the local eigenvalue density, which is our key technical tool for proving universality. En route to this, we also obtain precise bounds on the matrix elements of the corresponding Green function. For proving bulk universality of eigenvalues, the next step is to embed the covariance matrix into a stochastic flow of matrices, so that the eigenvalues evolve according to a distinguished coupled system of stochastic differential equations, called the Dyson Brownian motion [3]. The central idea in the papers mentioned


above is to estimate the time to local equilibrium for the Dyson Brownian motion with the introduction of a new stochastic flow, the local relaxation flow, which locally behaves like a Dyson Brownian motion but decays to global equilibrium faster. This approach [6, 8] entirely eliminates the use of explicit formulas and provides a unified proof of universality. For proving edge universality of eigenvalues, we apply a "moment comparison" method based on the Green function, similar to the "four moment theorem" of [20, 21]. This idea was recently used in [11] for proving edge universality of Wigner matrices.

More precisely, let X = (x_ij) be an M × N matrix with independent centered real valued entries of variance M^{-1}:

  x_ij = M^{-1/2} q_ij,   E q_ij = 0,   E q_ij² = 1.   (1.1)

Furthermore, the entries q_ij have sub-exponential decay, i.e., there exists a constant ϑ > 0 such that for u > 1,

  P(|q_ij| > u) ≤ ϑ^{-1} exp(−u^ϑ).   (1.2)

Notice that all our constants may depend on ϑ, but we will not denote this dependence. Define the Green function of X†X by

  G_ij(z) := ( (X†X − z)^{-1} )_ij,   z = E + iη,  E ∈ ℝ,  η > 0.   (1.3)

The Stieltjes transform of the empirical eigenvalue distribution of X†X is given by

  m(z) := (1/N) Σ_j G_jj(z) = (1/N) Tr (X†X − z)^{-1}.   (1.4)

We will be working in the regime

  d = d_N := N/M,   lim_{N→∞} d ≠ 1.

Define

  λ_± := (1 ± √d)².   (1.5)

The Marchenko-Pastur (henceforth abbreviated by MP) law is given by

  ρ_W(x) = (1/(2πd)) √( [(λ_+ − x)(x − λ_−)]_+ / x² ).   (1.6)
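As a quick numerical sanity check of the density (1.6), one can verify that for d < 1 the MP law carries total mass 1 on its support. The following is a minimal sketch; the value d = 0.5 and the grid size are illustrative choices, not from the paper.

```python
import numpy as np

# Illustrative parameter; d < 1 so the continuous part carries full mass 1.
d = 0.5
lam_minus = (1 - np.sqrt(d)) ** 2
lam_plus = (1 + np.sqrt(d)) ** 2

def rho_W(x):
    """Marchenko-Pastur density (1.6) on its support [lam_minus, lam_plus]."""
    return np.sqrt(np.maximum((lam_plus - x) * (x - lam_minus), 0.0)) / (2 * np.pi * d * x)

# Midpoint-rule total mass; for d > 1 the continuous part would carry mass
# 1/d instead, with the remainder sitting in an atom at 0.
n = 200000
h = (lam_plus - lam_minus) / n
xs = lam_minus + (np.arange(n) + 0.5) * h
mass = float(rho_W(xs).sum() * h)
print(round(mass, 4))  # → 1.0
```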


We define m_W(z), z ∈ ℂ, as the Stieltjes transform of ρ_W, i.e.,

  m_W(z) = ∫_ℝ ρ_W(x)/(x − z) dx.   (1.7)

The function m_W depends on d and has the closed form

  m_W(z) = ( 1 − d − z + i√((z − λ_−)(λ_+ − z)) ) / (2dz),   (1.8)

where √ denotes the square root on the complex plane whose branch cut is the negative real line. One can check that m_W(z) is the unique solution of

  m_W(z) + 1/( z − (1 − d) + z d m_W(z) ) = 0   (1.9)

with ℑ m_W(z) > 0 when ℑ z > 0. Define the normalized empirical counting function by

  n(E) := (1/N) #{λ_j ≥ E}.   (1.10)

Let

  n_W(E) := ∫_E^{+∞} ρ_W(x) dx
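The closed form (1.8) and the self-consistent equation (1.9) can be checked against each other numerically. A minimal sketch, with arbitrary test values of d and z in the upper half plane; `cmath.sqrt` is the principal branch, whose cut is the negative real axis, matching the convention stated after (1.8).

```python
import cmath

d = 0.5
z = 1.0 + 0.5j  # illustrative spectral parameter with Im z > 0
lam_minus = (1 - d ** 0.5) ** 2
lam_plus = (1 + d ** 0.5) ** 2

# Closed form (1.8).
m_W = (1 - d - z + 1j * cmath.sqrt((z - lam_minus) * (lam_plus - z))) / (2 * d * z)

# m_W should solve the self-consistent equation (1.9) and have Im m_W > 0.
residual = m_W + 1 / (z - (1 - d) + z * d * m_W)
print(abs(residual) < 1e-12, m_W.imag > 0)  # → True True
```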

so that 1 − n_W(·) is the distribution function of the MP law. By the singular value decomposition of X, there exist orthonormal bases {u_1, u_2, ..., u_M} of ℂ^M and {v_1, ..., v_N} of ℝ^N such that

  X = Σ_{α=1}^{M} √λ_α u_α v_α† = Σ_{α=1}^{N} √λ_α u_α v_α†,   (1.11)

where λ_1 ≥ λ_2 ≥ ... ≥ λ_{max{M,N}} ≥ 0, λ_α = 0 for min{N, M} + 1 ≤ α ≤ max{N, M}, and v_α = 0 if α > N, u_α = 0 if α > M. We also define the classical locations of the eigenvalues with respect to ρ_W as follows:

  ∫_{γ_j}^{λ_+} ρ_W(x) dx = ∫_{γ_j}^{+∞} ρ_W(x) dx = j/N.   (1.12)

Define the parameter

  ϕ := (log N)^{log log N}.   (1.13)


For ζ > 0, define the set

  S(ζ) := { z ∈ ℂ : 1_{d>1}(λ_−/5) ≤ E ≤ 5λ_+,  ϕ^ζ N^{-1} ≤ η ≤ 10(1 + d) }.   (1.14)

Note that m_W ∼ O(1) in S(0).

Definition 1.1 (High probability events). Let ζ > 0. We say that an event Ω holds with ζ-high probability if there exists a constant C > 0 such that

  P(Ω^c) ≤ N^C exp(−ϕ^ζ)   (1.15)

for large enough N.

Our goal is to estimate the following quantities:

  Λ_d := max_k |G_kk − m_W|,   Λ_o := max_{k≠ℓ} |G_kℓ|,   Λ := |m − m_W|,   (1.16)

where the subscripts refer to "diagonal" and "off-diagonal" matrix elements. All these quantities depend on the spectral parameter z and on N, but for simplicity we suppress this in the notation. The following is the main result of this paper:

Theorem 1.2 (Strong local Marchenko-Pastur law). Let X = [x_ij] with the entries x_ij satisfying (1.1) and (1.2). For any ζ > 0 there exists a constant C_ζ such that the following events hold with ζ-high probability.

(i) The Stieltjes transform of the empirical eigenvalue distribution of H satisfies

  ∩_{z ∈ S(C_ζ)} { Λ(z) ≤ ϕ^{C_ζ} (1/(Nη)) }.   (1.17)

(ii) The individual matrix elements of the Green function satisfy

  ∩_{z ∈ S(C_ζ)} { Λ_o(z) + Λ_d(z) ≤ ϕ^{C_ζ} ( √(ℑ m_W(z)/(Nη)) + 1/(Nη) ) }.   (1.18)

(iii) The smallest non-zero and the largest eigenvalue of X†X satisfy

  λ_− − N^{-2/3} ϕ^{C_ζ} ≤ min_{j ≤ min{M,N}} λ_j ≤ max_j λ_j ≤ λ_+ + N^{-2/3} ϕ^{C_ζ}.   (1.19)

(iv) Delocalization of the eigenvectors of X†X:

  max_{α : λ_α ≠ 0} ‖v_α‖_∞ ≤ ϕ^{C_ζ} N^{-1/2}.   (1.20)
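The delocalization bound (1.20) is easy to probe by simulation. A minimal Monte Carlo sketch for sign (Bernoulli) entries, which are covered by (1.1)-(1.2); the sizes and the constant 5 in the test threshold are illustrative choices, not from the paper.

```python
import numpy as np

# Illustrative sizes; entries are +-1/sqrt(M), so E x_ij = 0, E x_ij^2 = 1/M.
rng = np.random.default_rng(0)
N, M = 300, 600
X = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)

# Columns of `vecs` are the eigenvectors v_alpha of X†X.
_, vecs = np.linalg.eigh(X.T @ X)
max_entry = np.abs(vecs).max()  # worst-case sup-norm over all eigenvectors

# (1.20) predicts a sup-norm of order N^{-1/2} up to a polylog factor.
print(max_entry < 5 * np.sqrt(np.log(N) / N))  # → True
```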


The main theorem above is then used to derive the following results:

Theorem 1.3 (Rigidity of the eigenvalues of the covariance matrix). Recall γ_j in (1.12). Let X = [x_ij] with the entries x_ij satisfying (1.1) and (1.2). For any 1 ≤ j ≤ N, let

  ĵ := min{ min{N, M} − j, j }.

For any ζ > 0 there exists a constant C_ζ such that

  |λ_j − γ_j| ≤ ϕ^{C_ζ} N^{-2/3} ĵ^{-1/3}   (1.21)

and

  |n(E) − n_W(E)| ≤ ϕ^{C_ζ} N^{-1}   (1.22)

hold with ζ-high probability for any 1 ≤ j ≤ N.

The above two results are stated under the assumption that the matrix entries are independent. The independence assumption (of the elements in each column vector of X) required in Theorems 1.2 and 1.3 may be replaced with the following large deviation criteria. Let us first recall the following large deviation lemma for independent random variables (see [9], Appendix B, for a proof).

Lemma 1.4 (Large deviation lemma). Suppose the a_i are independent, mean 0 complex random variables with E|a_i|² = σ² and sub-exponential decay as in (1.2). Then there exists a constant ρ ≡ ρ(ϑ) > 1 such that, for any ζ > 0 and for any A_i ∈ ℂ and B_ij ∈ ℂ, the bounds

  | Σ_{i=1}^{M} a_i A_i | ≤ (log M)^{ρζ} σ ‖A‖,   (1.23)

  | Σ_{i=1}^{M} ā_i B_ii a_i − Σ_{i=1}^{M} σ² B_ii | ≤ (log M)^{ρζ} σ² ( Σ_{i=1}^{M} |B_ii|² )^{1/2},   (1.24)

  | Σ_{i≠j} ā_i B_ij a_j | ≤ (log M)^{ρζ} σ² ( Σ_{i≠j} |B_ij|² )^{1/2}   (1.25)

hold with ζ-high probability.

Next we extend Theorems 1.2 and 1.3 by relaxing the independence assumption:


Theorem 1.5. Let X = (x_ij) be a random matrix with the entries satisfying (1.1), and assume that the column vectors of the matrix X are mutually independent. Furthermore, suppose that for any fixed j ≤ N, the random variables a_i = x_ij, 1 ≤ i ≤ M, satisfy the large deviation bounds (1.23), (1.24) and (1.25) for any A_i ∈ ℂ, B_ij ∈ ℂ and any ζ > 0. Then the conclusions of Theorems 1.2 and 1.3 hold for the random matrix X.

Thus the above theorem extends the universality results to a large class of matrix ensembles. For instance, let h_ij be a sequence of i.i.d. random variables from a symmetric distribution and set

  x_ij = h_ij / √( Σ_{i=1}^{M} h_ij² ),   1 ≤ i ≤ M, 1 ≤ j ≤ N.   (1.26)

Thus the entries of the column vector (x_1j, x_2j, ..., x_Mj) are not independent, but exchangeable. Clearly E(x_ij) = 0 and E(x_ij²) = 1/M. The random variables x_ij given by (1.26) are called self-normalized sums and arise in various statistical applications.

Proof of Theorem 1.5: In the proofs of Theorems 1.2 and 1.3 we only use the large deviation properties of a_i = x_ij, rather than independence and sub-exponential decay. Therefore, the proofs of Theorems 1.2 and 1.3 already suffice for Theorem 1.5.

Theorem 1.6 (Universality of eigenvalues in the bulk). Let X^v = [x^v_ij] with independent entries satisfying (1.1) and (1.2), and likewise X^w. Let E ∈ [λ_− + c, λ_+ − c] for some c > 0. Then for any ε > 0, N^{-1+ε} < b < c/2, any integer n ≥ 1 and any compactly supported continuous test function O : ℝ^n → ℝ, we have

  lim_{N→∞} ∫_{E−b}^{E+b} (dE′/2b) ∫_{ℝ^n} O(α_1, ..., α_n) ( p^{(n)}_{v,N} − p^{(n)}_{w,N} )( E′ + α_1/(Nρ_W(E)), ..., E′ + α_n/(Nρ_W(E)) ) Π_i dα_i/ρ_W(E) = 0,   (1.27)

where p^{(n)}_{v,N} and p^{(n)}_{w,N} are the n-point correlation functions of the eigenvalues of (X^v)†X^v and (X^w)†X^w, respectively.

Theorem 1.7 (Universality of extreme eigenvalues). Let X^v = [x^v_ij] with independent entries satisfying (1.1) and (1.2), and likewise X^w. Then there are ε > 0 and δ > 0 such that for any real number s (which may depend on N) we have

  P^v( N^{2/3}(λ_N − λ_+) ≤ s − N^{-ε} ) − N^{-δ} ≤ P^w( N^{2/3}(λ_N − λ_+) ≤ s ) ≤ P^v( N^{2/3}(λ_N − λ_+) ≤ s + N^{-ε} ) + N^{-δ}
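The self-normalized construction (1.26) is easy to instantiate. A minimal sketch: within each column the entries are dependent, yet each column of X has Euclidean norm exactly 1, so E x_ij = 0 by symmetry and E x_ij² = 1/M by exchangeability. The sizes below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
M, n_cols = 400, 50
h = rng.standard_normal((M, n_cols))              # i.i.d. symmetric h_ij
X = h / np.linalg.norm(h, axis=0, keepdims=True)  # x_ij = h_ij / sqrt(sum_i h_ij^2)

# Every column norm is exactly 1, forcing sum_i x_ij^2 = 1 deterministically.
col_norms = np.linalg.norm(X, axis=0)
print(np.allclose(col_norms, 1.0))  # → True
```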


  (1.28)

for N ≥ N_0 sufficiently large, where N_0 is independent of s. An analogous result holds for the smallest eigenvalue λ_1.

Theorem 1.7 can be extended to finite correlation functions of extreme eigenvalues. For example, we have the following extension of (1.28):

  P^v( N^{2/3}(λ_N − λ_+) ≤ s_1 − N^{-ε}, ..., N^{2/3}(λ_{N−k} − λ_+) ≤ s_{k+1} − N^{-ε} ) − N^{-δ}
   ≤ P^w( N^{2/3}(λ_N − λ_+) ≤ s_1, ..., N^{2/3}(λ_{N−k} − λ_+) ≤ s_{k+1} )   (1.29)
   ≤ P^v( N^{2/3}(λ_N − λ_+) ≤ s_1 + N^{-ε}, ..., N^{2/3}(λ_{N−k} − λ_+) ≤ s_{k+1} + N^{-ε} ) + N^{-δ}

for all fixed k and N sufficiently large. The proof of (1.29) is similar to that of (1.28) and we will not provide details, except for stating the general form of the Green function comparison theorem (Theorem 6.4) needed in this case. We remark that edge universality is usually formulated in terms of joint distributions of edge eigenvalues in the form (1.29) with fixed parameters s_1, s_2, etc. Our result holds uniformly in these parameters, i.e., they may depend on N. However, the interesting regime is |s_j| ≤ ϕ^{O(1)}; otherwise the rigidity estimate (1.21) gives a stronger control than (1.29).

The rest of the paper is organized as follows. In Sections 2-4 we establish the strong version of the Marcenko-Pastur law, rigidity and delocalization of eigenvalues. In Sections 6 and 7 we prove the bulk and edge universality results, respectively.

2. A priori bound for the strong local Marcenko-Pastur law. We first prove a weaker form of Theorem 1.2; in Section 4 we will use this a priori bound to obtain the stronger form as claimed in Theorem 1.2.

Theorem 2.1. Let X = [x_ij] with the entries x_ij satisfying (1.1) and (1.2). For any ζ > 0 there exists a constant C_ζ such that the following event holds with ζ-high probability:

  ∩_{z ∈ S(C_ζ)} { Λ_d(z) + Λ_o(z) ≤ ϕ^{C_ζ} (Nη)^{-1/4} }.   (2.1)

Before proceeding, let us introduce some notation. Define

  H := X†X,   G(z) := (H − z)^{-1} = (X†X − z)^{-1},   𝒢(z) := (XX† − z)^{-1},   m(z) := (1/N) Tr G(z).   (2.2)


We know that the non-zero eigenvalues of XX† and X†X are identical, and that XX† has M − N more (or N − M fewer) zero eigenvalues. We then have the identity

  Tr G(z) − Tr 𝒢(z) = (M − N)/z.   (2.3)

We shall often need to consider minors of X, which are the content of the following definition.

Definition 2.2 (Minors). Let T ⊂ {1, ..., N}. Then we define X^{(T)} as the M × (N − |T|) minor of X obtained by removing all columns of X indexed by i ∈ T. Note that we keep the names of the indices of X when defining X^{(T)}:

  (X^{(T)})_ij := 1(j ∉ T) X_ij.

The quantities G^{(T)}(z), 𝒢^{(T)}(z), λ_α^{(T)}, u_α^{(T)}, v_α^{(T)}, etc. are defined in the obvious way using X^{(T)}. Furthermore, we abbreviate (i) = ({i}) as well as (iT) = ({i} ∪ T). We also set

  m^{(T)}(z) := (1/N) Σ_{i ∉ T} G_ii^{(T)}(z).   (2.4)

We denote by x_i the i-th column of X, an M × 1 vector.

2.1. Preliminary lemmas. We start with the following elementary lemma, whose proof is standard.

Lemma 2.3. For any rectangular matrix M partitioned into blocks A, B and D as

  M = [ A   B
        B†  D ],

we have the block inverse identity

  M^{-1} = [ G^{-1}            −G^{-1} B D^{-1}
             −D^{-1} B† G^{-1}  D^{-1} + D^{-1} B† G^{-1} B D^{-1} ],   G := A − B D^{-1} B†.

Lemma 2.4. For any z not in the spectrum of X†X,

  X (X†X − z)^{-1} X† = I + z (XX† − z)^{-1}.

Proof. Indeed, from the singular value decomposition (1.11) we have

  X (X†X − z)^{-1} X† = Σ_α ( λ_α/(λ_α − z) ) u_α u_α† = Σ_α ( 1 + z/(λ_α − z) ) u_α u_α† = I + z (XX† − z)^{-1},

and the lemma is proved.
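Both Lemma 2.4 and the trace identity (2.3) are finite-dimensional algebraic identities, so they can be verified directly on a random instance. A minimal sketch; the sizes and the spectral parameter are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 7, 4
X = rng.standard_normal((M, N)) / np.sqrt(M)
z = 0.3 + 0.7j  # illustrative spectral parameter off the real axis

G = np.linalg.inv(X.T @ X - z * np.eye(N))     # G(z)  = (X†X − z)^{-1}
calG = np.linalg.inv(X @ X.T - z * np.eye(M))  # 𝒢(z) = (XX† − z)^{-1}

# Lemma 2.4: X (X†X − z)^{-1} X† = I + z (XX† − z)^{-1}.
lhs = X @ G @ X.T
rhs = np.eye(M) + z * calG
print(np.allclose(lhs, rhs))  # → True

# Identity (2.3): Tr G − Tr 𝒢 = (M − N)/z.
print(np.isclose(np.trace(G) - np.trace(calG), (M - N) / z))  # → True
```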


The next lemma collects the main identities of the resolvent matrix elements G_ij^{(T)} and 𝒢_ij^{(T)}(z).

Lemma 2.5 (Resolvent identities).

  G_ii(z) = 1 / ( −z − z ⟨x_i, 𝒢^{(i)}(z) x_i⟩ ),  i.e.,  ⟨x_i, 𝒢^{(i)}(z) x_i⟩ = −1/(z G_ii(z)) − 1,   (2.5)

  G_ij(z) = z G_ii(z) G_jj^{(i)}(z) ⟨x_i, 𝒢^{(ij)}(z) x_j⟩,   i ≠ j,   (2.6)

  G_ij(z) = G_ij^{(k)}(z) + G_ik(z) G_kj(z) / G_kk(z),   i, j ≠ k.   (2.7)

Proof. First we show (2.5) with i = 1. Let a = x_1 and B = X^{(1)}. We have X = ( a  B ), so that

  X†X − z = [ a†a − z   a†B
              B†a       B†B − z ].

By Lemmas 2.3 and 2.4,

  G_11(z) = 1 / ( a†a − z − a†B (B†B − z)^{-1} B†a ) = 1 / ( a†a − z − a† (1 + z (BB† − z)^{-1}) a ) = 1 / ( −z − z a† (BB† − z)^{-1} a ).   (2.8)

On the other hand, we have

  ⟨x_1, 𝒢^{(1)}(z) x_1⟩ = a† (BB† − z)^{-1} a,

which together with (2.8) implies (2.5).

Next we prove (2.6). From Lemma 3.2 of [11] and H = X†X, we have the identity

  G_ij = −G_ii G_jj^{(i)} ( h_ij − Σ_{k,l ≠ i,j} h_ik G_kl^{(ij)} h_lj ),   (2.9)

i.e.,

  G_ij(z) = −G_ii(z) G_jj^{(i)}(z) ( h_ij − Z_ij ),   Z_ij := x_i† X^{(ij)} G^{(ij)} X^{(ij)†} x_j = x_i† ( I + z 𝒢^{(ij)} ) x_j,   (2.10)

where the last equality follows from an application of Lemma 2.4. Since h_ij = x_i† x_j, we get h_ij − Z_ij = −z ⟨x_i, 𝒢^{(ij)} x_j⟩, and hence (2.6) follows from (2.10). Finally, (2.7) is proved in Lemma 3.2 of [11].
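The Schur-type identity (2.7) can likewise be verified numerically: remove a column of X and compare the resolvent of the minor with G_ij − G_ik G_kj / G_kk. A minimal sketch with arbitrary test sizes and indices.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 8, 5
X = rng.standard_normal((M, N)) / np.sqrt(M)
z = 0.2 + 0.5j  # illustrative spectral parameter

G = np.linalg.inv(X.T @ X - z * np.eye(N))

k = 4                  # remove the last column; surviving indices are 0..3
Xk = X[:, :N - 1]      # X^{(k)}
Gk = np.linalg.inv(Xk.T @ Xk - z * np.eye(N - 1))  # G^{(k)}

# Identity (2.7): G^{(k)}_ij = G_ij − G_ik G_kj / G_kk for i, j != k.
i, j = 0, 2
print(np.isclose(Gk[i, j], G[i, j] - G[i, k] * G[k, j] / G[k, k]))  # → True
```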


Set

  κ := min( |λ_+ − E|, |E − λ_−| ).   (2.11)

Lemma 2.6 (Properties of m_W). For z ∈ S(0) (see (1.14)) we have the following bounds:

  |m_W(z)| ∼ 1,   |1 − m_W²(z)| ∼ √(κ + η),   (2.12)

  ℑ m_W(z) ∼ η/√(κ + η)   if κ ≥ η and E ∉ [λ_−, λ_+],
  ℑ m_W(z) ∼ √(κ + η)    if κ ≤ η or E ∈ [λ_−, λ_+],   (2.13)

where A ∼ B denotes C^{-1} B ≤ A ≤ C B for some constant C. Furthermore,

  ℑ m_W ≥ O( (Nη)^{-1} )   and   ∂_η ( ℑ m_W / η ) ≤ 0.   (2.14)

For z ∈ S(0), define the event

  B(z) := { Λ_o(z) + Λ_d(z) ≥ (log N)^{-1} }.   (2.15)

Lemma 2.7 (Rough bounds on Λ_o^{(T)} and Λ_d^{(T)}). Fix T ⊂ {1, 2, ..., N}. For z ∈ S(0), there exists a constant C = C_T such that the following estimates hold in B^c:

  max_{k ∉ T} |G_kk^{(T)} − G_kk| ≤ C Λ_o²,   (2.16)

  1/C ≤ |G_kk^{(T)}| ≤ C,   (2.17)

  Λ_o^{(T)} ≤ C Λ_o.   (2.18)

Proof. For T = ∅, (2.16) and (2.18) follow from the definitions, and (2.17) follows from the definition of B(z) and m_W ∼ 1 in (2.12). For nonempty T, one can prove the lemma using an induction on |T|. For example, for |T| = 1, using (2.7) we can show that

  |G_kk^{(T)}(z) − G_kk(z)| ≤ C Λ_o²,   (2.19)

which implies the bound (2.16). A similar argument yields (2.17) and (2.18).


On the other hand, in the case η = O(1), a result similar to (2.17) holds without the assumption of B^c.

Lemma 2.8 (Rough bound on G_kk for large η). For any z ∈ S(0) and η = O(1), we have the bound |G_ii(z)| ≤ C for some C > 0.

Proof. By definition,

  |G_ii| = | Σ_α v_α(i) v̄_α(i) / (λ_α − z) | ≤ (1/η) Σ_α |v_α(i)|² ≤ 1/η ≤ C,

where we used |λ_α − z| ≥ ℑ z = η.

Define the quantity

  Ψ := √( (ℑ m_W + Λ) / (Nη) )   (2.20)

and

  Z_i := z ⟨x_i, 𝒢^{(i)} x_i⟩ − (z/M) Tr 𝒢^{(i)}.   (2.21)

Remark 2.9. Note that if |m_W| ≤ O(1) and Λ ≤ O(1), then

  Ψ ≤ O( (Nη)^{-1/2} ).   (2.22)

We now identify the "bad sets" (improbable events) and show that they indeed have small probability. Define, for fixed z, the events

  Ω_o(z, K) := { Λ_o(z) ≥ K Ψ(z) },   Ω_d(z, K) := { max_i |G_ii(z) − m(z)| ≥ K Ψ(z) }.   (2.23)

Lemma 2.10. Let Ω(z, K)^c be the good set, where

  Ω(z, K) := Ω_d(z, K) ∪ Ω_o(z, K)   (2.24)


and

  Γ(z, K) := Ω(z, K)^c ∪ B(z).

For any ζ > 0 there exists a constant C_ζ such that

  ∩_{z ∈ S(C_ζ)} Γ(z, ϕ^{C_ζ})   (2.25)

holds with ζ-high probability.

Proof. We only need to prove that there exists a uniform constant C_ζ such that for any z ∈ S(C_ζ) the event

  Γ(z, ϕ^{C_ζ})   (2.26)

holds with ζ-high probability. It is clear that (2.25) follows from (2.26) and the fact that

  |∂_z G_ij| ≤ N^C,   η ≥ N^{-1}.   (2.27)

Note Γ(z, K) = (Ω_o^c ∪ B) ∩ (Ω_d^c ∪ B). First we prove that Ω_o^c ∪ B holds with ζ-high probability. Using Lemma 1.4, Equation (2.6) and the fact that |𝒢|² = 𝒢*𝒢, we infer that there exists a constant C_ζ such that, with ζ-high probability,

  Λ_o ≤ C|z| max_{i≠j} |⟨x_i, 𝒢^{(ij)} x_j⟩| ≤ ϕ^{C_ζ} (|z|/N) ( Σ_{kl} |𝒢_kl^{(ij)}|² )^{1/2} = ϕ^{C_ζ} (|z|/N) ( Tr |𝒢^{(ij)}|² )^{1/2} ≤ ϕ^{C_ζ} |z| √( ℑ Tr 𝒢^{(ij)} / (N²η) )   in B^c,   (2.28)

where in the last step we used the identity (1/η) ℑ Tr 𝒢^{(ij)} = Tr |𝒢^{(ij)}|². Using the identity

  Tr G^{(T)}(z) − Tr 𝒢^{(T)}(z) = (M − N + |T|)/z,

Equation (2.16) and |ℑ(z^{-1})| = η|z|^{-2}, we have that, with ζ-high probability,

  Λ_o ≤ ϕ^{C_ζ} ( √( (ℑ m_W + Λ + Λ_o²) / (Nη) ) + 1/N )   in B^c.   (2.29)

For the above choice of C_ζ, for z ∈ S(3C_ζ) we have that, with ζ-high probability,

  Λ_o ≤ ϕ^{C_ζ} ( √( (ℑ m_W + Λ) / (Nη) ) + 1/N ) + o(Λ_o)   in B^c.   (2.30)

Together with (2.14), we conclude that Ω_o^c ∪ B holds with ζ-high probability.

A similar argument using Lemma 1.4 gives that

  |Z_i| = |z| | ⟨x_i, 𝒢^{(i)} x_i⟩ − (1/M) Tr 𝒢^{(i)} | ≤ ϕ^{C_ζ} Ψ   in B^c   (2.31)

holds with ζ-high probability. Notice that max_i |G_ii − m| ≤ max_{i≠j} |G_ii − G_jj|. From (2.5) we obtain

  |G_ii − G_jj| = | 1/( −z − z⟨x_i, 𝒢^{(i)}(z) x_i⟩ ) − 1/( −z − z⟨x_j, 𝒢^{(j)}(z) x_j⟩ ) |
   ≤ |G_ii G_jj| ( |Z_i − Z_j| + (|z|/M) | Tr 𝒢^{(i)} − Tr 𝒢^{(j)} | )
   ≤ C ( ϕ^{C_ζ} Ψ + Λ_o² + N^{-1} )   in B^c,

with ζ-high probability, where the last inequality follows from (2.31), (2.3), (2.16) and (2.17). The lemma now follows from (2.30) and (2.22).

On the other hand, in the case η = O(1), a similar result holds without the assumption of B^c.

Lemma 2.11. Let Ω_o(z) and Ω_d(z) be as in (2.23). For any ζ > 0, there exists a constant C_ζ such that the event

  ∩_{z ∈ S(0), η ≥ 1} ( Ω_d(z, ϕ^{C_ζ}) ∪ Ω_o(z, ϕ^{C_ζ}) )^c ∩ { max_i |Z_i| ≤ ϕ^{C_ζ} Ψ }   (2.32)

holds with ζ-high probability.

Proof. By (2.27), we only need to prove (2.32) for fixed z. First we note that in this case ℑ m_W ∼ 1 and Λ = O(1), and therefore Ψ ∼ N^{-1/2}. It follows from (2.28) and Lemma 2.8 that

  Λ_o ≤ ϕ^{C_ζ} √( ℑ Tr 𝒢^{(ij)} / N² ) ≤ ϕ^{C_ζ} N^{-1/2} ≤ ϕ^{C_ζ} Ψ.   (2.33)


The Z_i part can be proved as in (2.31), using Lemma 2.8. The Ω_d part can also be proved as in the proof above, where for Tr 𝒢^{(i)} − Tr 𝒢^{(j)} we use

  Tr 𝒢^{(i)} − Tr 𝒢^{(j)} = Tr G^{(i)} − Tr G^{(j)} = O(η^{-1}),

which follows from the interlacing theorem for the eigenvalues, i.e.,

  |m − m^{(i)}| ≤ (Nη)^{-1}.   (2.34)

2.2. Self-consistent equations. In the last subsection we obtained bounds on Λ_o and max_i (G_ii − m) in terms of m_W, η and Λ in B^c. In this subsection we give the desired bound for Λ and show that B^c holds with ζ-high probability. First we bound Λ in the case η = O(1).

Lemma 2.12. For any ζ > 0, there exists a constant C_ζ such that

  ∩_{z ∈ S(0), η = 10(1+d)} { Λ(z) ≤ ϕ^{C_ζ} N^{-1/4} }   (2.35)

holds with ζ-high probability.

Proof. Recall (2.33). By definition and (2.5),

  m(z) = (1/N) Σ_i G_ii(z) = (1/N) Σ_i 1/( −z − (z/M) Tr 𝒢^{(i)} − Z_i ).

Using (2.29) and (2.34), we obtain

  | (z/M) Tr 𝒢^{(i)} − z d m(z) + 1 − d | ≤ C N^{-1}.

Together with |Z_i| ≤ ϕ^{C_ζ} Ψ (see (2.32)), we have

  m(z) = (1/N) Σ_i 1/( 1 − z − d − z d m(z) + Y_i ),   max_i |Y_i| ≤ ϕ^{C_ζ} Ψ.

Since |m| ≤ η^{-1}, we have |1 − z − d − z d m(z)| ≥ O(1), and then

  m(z) = 1/( 1 − z − d − z d m(z) ) + O(ϕ^{C_ζ} Ψ),   (2.36)

which implies (2.35).
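The stability behind (2.36) can be illustrated numerically: for large η the map m ↦ 1/(1 − z − d − z d m) is a contraction, and iterating it produces the Stieltjes transform with ℑ m > 0. A minimal sketch with illustrative parameters in the regime η = 10(1 + d) of Lemma 2.12.

```python
d = 0.5
z = 1.0 + 10 * (1 + d) * 1j  # eta = 10(1 + d), the large-eta regime

# Fixed-point iteration of m = 1/(1 - z - d - z*d*m), started from 0.
m = 0.0 + 0.0j
for _ in range(200):
    m = 1 / (1 - z - d - z * d * m)

residual = abs(m - 1 / (1 - z - d - z * d * m))
print(residual < 1e-12, m.imag > 0)  # → True True
```

For this z the derivative of the map has modulus well below 1, so the iteration converges to the unique solution with positive imaginary part.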


Now combining (2.35) with (2.32), we have proved that for any ζ > 0 there exists a constant C_ζ such that, in the case η = 10(1 + d), (2.1) holds with ζ-high probability. This immediately implies that

  ∩_{z ∈ S(0), η = 10(1+d)} B^c(z)   (2.37)

holds with ζ-high probability.

Now we prove (2.1) for general η. For a function u(z), define its "deviance" to be

  D(u)(z) := ( u^{-1}(z) + z d u(z) ) − ( m_W^{-1}(z) + z d m_W(z) ).   (2.38)

Clearly, D(m_W) = 0. Recall Z_i from (2.21) and define

  [Z] := (1/N) Σ_{i=1}^{N} Z_i.   (2.39)

Recall the set B(z) from (2.15) and Γ(z) from Lemma 2.10.

Lemma 2.13. Let 1 ≤ K ≤ (log N)^{-1} (Nη)^{1/2}. On the set Γ(z, K) (see (2.24)),

  |D(m)| ≤ O([Z]) + O(K²Ψ²) + ∞·1_{B(z)}.

Proof. Using (2.5), (2.16), (2.29) and the definition of m_W, we have on the set Γ(z, K)

  G_ii(z)^{-1} = m_W(z)^{-1} + z d [ m_W(z) − m(z) ] + O(K²Ψ²) + O(Z_i) + O(N^{-1})   in B^c ∩ Ω^c.

Then

  G_ii^{-1} − m^{-1} = D(m) + O(K²Ψ²) + O(Z_i) + O(N^{-1})   in B^c ∩ Ω^c,

and summation over i yields

  (1/N) Σ_{i=1}^{N} ( G_ii^{-1} − m^{-1} ) = D(m) + O(K²Ψ²) + O([Z]) + O(N^{-1})   in B^c ∩ Ω^c.   (2.40)

It follows from the assumptions K ≪ (Nη)^{1/2} and Ψ ≤ O((Nη)^{-1/2}) that G_ii − m = o(1). Expanding the left hand side and using the fact that Σ_i (G_ii − m) = 0,

  Σ_{i=1}^{N} ( G_ii^{-1} − m^{-1} ) = −Σ_{i=1}^{N} (G_ii − m)/(G_ii m) = (1/m³) Σ_{i=1}^{N} (G_ii − m)² + O( Σ_{i=1}^{N} (G_ii − m)³ / m⁴ )   in B^c ∩ Ω^c.


Together with (2.17) and (2.23), it follows that

  | (1/N) Σ_{i=1}^{N} ( G_ii^{-1} − m^{-1} ) | ≤ C (KΨ)² (1 + KΨ)   in B^c ∩ Ω^c.   (2.41)

Now the lemma follows from (2.40), (2.41) and the assumption KΨ ≪ 1.

The two solutions m_1, m_2 of the equation D(m) = δ(z), for a given δ(·), are

  m_{1,2} = ( δ(z) + 1 − d − z ± i√( (z − λ_{−,δ})(λ_{+,δ} − z) ) ) / (2dz),   (2.42)

  λ_{±,δ} = 1 + d ± 2√( d − δ(z) ) − δ(z),   |λ_{±,δ} − λ_±| = O(δ).

Lemma 2.14. Let K, L > 0 be such that ϕ^L ≥ K² (log N)⁴, where L and K may depend on N. On any subset A of

  ∩_{z ∈ S(L)} Γ(z, K) ∩ ∩_{z ∈ S(L), η = 10(1+d)} B^c(z),   (2.43)

suppose we have the bound

  |D(m)(z)| ≤ δ(z) + ∞·1_{B(z)}   ∀ z ∈ S(L),

where δ : ℂ → ℝ_+ is a continuous function, decreasing in ℑ z, with |δ(z)| ≤ (log N)^{-8}. Then for some uniform C > 0,

  |m(z) − m_W(z)| = Λ ≤ C (log N) δ(z)/√(κ + η + δ)   ∀ z ∈ S(L)   (2.44)

holds in A, and

  A ⊂ ∩_{z ∈ S(L)} B^c.   (2.45)

Note: the difficulty in the proof is that the bound D(m) ≤ δ(z) holds only outside the set B, while we need to prove (2.45).

Proof. Let us first fix E and define the set

  I_E := { η : Λ_o(E + iη̂) + Λ_d(E + iη̂) ≤ 1/log N  for all η̂ ≥ η with E + iη̂ ∈ S(L) }.


We first prove (2.44) for all z = E + iη with η ∈ I_E. Define

  η_1 := sup_{η ∈ I_E} { η : δ(E + iη) ≥ (log N)^{-1} (κ + η) }.

Since δ is a continuous decreasing function of η by assumption, δ(E + iη) ≤ (log N)^{-1}(κ + η_1) for η ≥ η_1. Let m_1 and m_2 be the two solutions of the equation D(m) = δ(z) as given in (2.42). (Note that since we are in B^c by assumption, we do have D(m) ≤ δ(z).) Then it can be easily verified that

  |m_1 − m_2| ≥ C √(κ + η),        η ≥ η_1,   (2.46)
  |m_1 − m_2| ≤ C (log N) √δ(z),   η ≤ η_1.

The difficulty here is that we do not know which of the two solutions m_1, m_2 equals m. However, for η = O(1) we claim that m = m_1. By assumption, |m − m_W| = Λ ≤ Λ_d ≪ 1. Also, a direct calculation using (2.42) gives

  |m_1 − m_W| = C δ(z)/√(κ + η) ≪ 1/log N.   (2.47)

Since |m_1 − m_2| ≥ C√(κ + η) for η = O(1) (see (2.46)), it immediately follows that m = m_1 for η = O(1). Furthermore, since the functions m_1, m_2 and m are continuous and m_1 ≠ m_2, it follows that m = m_1 for η ≥ η_1. Thus, for η ≥ η_1,

  |m(z) − m_W(z)| = |m_1(z) − m_W(z)| ≤ C δ(z)/√(κ + η) ≤ C δ(z)/√(κ + η + δ),

where in the last step we used δ ≤ κ + η. For η ≤ η_1, we take advantage of the fact that the difference |m_1 − m_2| is of the same order as in (2.47). Indeed, for η ≤ η_1, if m = m_2 (say), then

  |m − m_W| ≤ |m_2 − m_1| + |m_1 − m_W| ≤ (log N) √δ(z) ≤ C (log N) δ(z)/√(κ + η + δ),

verifying (2.44) for η ∈ I_E.

Now we prove that I_E equals the desired region [ϕ^L N^{-1}, 5], i.e., (2.45). We argue by contradiction. If not, let η_0 = inf I_E; by continuity, we have

  Λ_o(z_0) + Λ_d(z_0) = (log N)^{-1},   z_0 = E + iη_0,   (2.48)

and thus Λ(z_0) ≤ Λ_d(z_0) ≤ (log N)^{-1}. On the other hand, by the above result, namely that (2.44) holds for η ∈ I_E, we have

  Λ(z_0) ≤ (log N)^{-3}.   (2.49)


By definition,

  { Λ_o(z_0) + Λ_d(z_0) = (log N)^{-1} } ∩ Γ(z_0) ⊂ ( Ω_o(z_0) ∪ Ω_d(z_0) )^c,

and therefore

  Λ_o(z_0) + max_k |G_kk(z_0) − m(z_0)| ≤ C K Ψ(z_0).

With the assumption ϕ^L ≥ K²(log N)⁴, we have

  Ψ(z_0) ≤ √( (ℑ m_W + Λ(z_0)) / (Nη) ) ≪ K^{-1} (log N)^{-2},

which immediately implies that Λ_o(z_0) + max_k |G_kk(z_0) − m(z_0)| ≪ (log N)^{-1}. Using this estimate and (2.49), we deduce that

  Λ_o(z_0) + Λ_d(z_0) ≤ Λ_o(z_0) + max_k |G_kk(z_0) − m(z_0)| + Λ ≪ (log N)^{-1},

which contradicts (2.48) and concludes the proof of the lemma.

Proof of Theorem 2.1. Now we complete the proof of Theorem 2.1. It follows from (2.31), Lemma 2.10 and Lemma 2.13 that for any ζ > 0 there exist C_ζ, D_ζ and C̃_ζ such that

  |D(m)(z)| ≤ ϕ^{C̃_ζ} Ψ + ∞·1_{B(z)}   ∀ z ∈ S(C_ζ)

holds on

  ∩_{z ∈ S(C_ζ)} Γ(z, ϕ^{D_ζ}),   (2.50)

which holds with ζ-high probability. Choosing a larger C_ζ and applying Lemma 2.14 with A being the set in (2.43), together with (2.14), we obtain that for some C_ζ

  Λ(z) ≤ ϕ^{C_ζ} Ψ^{1/2},   ∀ z ∈ S(C_ζ),   (2.51)

holds on (2.43). Using (2.50) and (2.35), we obtain that for any ζ > 0 there exists C_ζ such that (2.51) holds with ζ-high probability. Furthermore, (2.45) implies that

  ∩_{z ∈ S(C_ζ)} B^c(z)

holds with ζ-high probability. Together with (2.25) and (2.51), we obtain (2.1), which completes the proof of Theorem 2.1.


3. Strong bound on [Z]. For proving Theorems 1.2 and 1.3, the key input is the following lemma, which gives a much stronger bound on [Z]. The following is the main result of this section.

Lemma 3.1. Let K, L > 0 be such that ϕ^L ≥ K² (log N)⁴. Suppose that for some event

  Ξ ⊂ ∩_{z ∈ S(L)} ( Γ(z, K) ∩ B^c(z) )

we have

  Λ(z) ≤ Λ̃(z)   ∀ z ∈ S(L),

with some deterministic number Λ̃(z), and P(Ξ^c) ≤ e^{−p(log N)²}, where p depends on N and 1 ≪ p ≪ (log N · K)^{-1} ϕ^{L/2}. Then there exists a subset Ξ′ of Ξ such that

  P(Ξ ∖ Ξ′) ≤ e^{−p}   (3.1)

and, for any z ∈ S(L),

  |z[Z]| ≤ C p⁵ K² Ψ̃²,   Ψ̃ := √( (ℑ m_W + Λ̃) / (Nη) ),   in Ξ′.   (3.2)

Note: in the applications of this lemma, p depends on N and K = O(ϕ^{O(1)}).

First, we introduce the abstract Z lemma, which is similar to Theorem 5.6 of [4]. See also [11] for a similar lemma for generalized Wigner matrices.

Theorem 3.2 (Abstract decoupling lemma). Let I be a finite set, which may depend on N, and let I_i ⊂ I, 1 ≤ i ≤ N. Let Z_1, ..., Z_N be random variables which depend on the independent random variables {x_α, α ∈ I}. Let E_i denote the expectation with respect to {x_α, α ∈ I_i}. Define the commuting projection operators

  P_i := E_i,   Q_i := 1 − E_i,   P_i² = P_i,   Q_i² = Q_i,   [Q_i, P_j] = [P_i, P_j] = [Q_i, Q_j] = 0,

and for A ⊂ {1, 2, ..., N}

  Q_A := Π_{i ∈ A} Q_i,   P_A := Π_{i ∈ A} P_i.

We use the notation

  [QZ] := (1/N) Σ_{i=1}^{N} Q_i Z_i.

Let Ξ be an event and p an even integer. Suppose the following assumptions hold with some constants C_0, c_0 > 0.


(i) (Bound on QA Zi in Ξ). There exist deterministic positive numbers X < 1 and Y such that for any set A ⊂ {1, 2, . . . , N} with i ∈ A and |A| 6 p, QA Zi in Ξ can be written as the sum of two new random variables 1(Ξ)(QA Zi ) = Zi,A + 1(Ξ)QA 1(Ξc )Zei,A

and

|Zi,A | 6 Y C0 X|A| (ii) (Rough bound on Zi ).

|A|

(3.3)

|A|

|Zei,A | 6 Y C0 N C0

,

(3.4)

max |Zi | 6 Y N C0 .

(3.5)

i

(iii) (Ξ has high probability). 3/2 p

P[Ξc ] 6 e−c0 (log N )

.

(3.6)

Then, under the assumptions (i)–(iii), we have

$$\mathbb{E}\, \mathbf{1}(\Xi)\, [QZ]^p \le (Cp)^{4p} \left(X^2 + N^{-1}\right)^p Y^p \tag{3.7}$$

for some $C > 0$ and any sufficiently large $N$.

Before we give the proof, we introduce a trivial but useful identity:

$$\prod_{i=1}^n (x_i + y_i) = \prod_{i=1}^n x_i + \sum_{s=1}^{n} \left(\prod_{i=1}^{s-1} x_i\right) y_s \left(\prod_{i=s+1}^{n} (x_i + y_i)\right), \tag{3.8}$$

with the convention that $\prod_{i \in \emptyset} = 1$. It implies that

$$\left|\prod_{i=1}^n (x_i + y_i) - \prod_{i=1}^n x_i\right| \le n \max_i |y_i| \left(\max_i |x_i + y_i| + \max_i |x_i|\right)^{n-1}.$$

For any $1 \le k \le n$, it follows from $\prod_{i=1}^n (x_i + y_i) = (x_k + y_k) \prod_{i \ne k} (x_i + y_i)$ and (3.8) that

$$\prod_{i=1}^n (x_i + y_i) = (x_k + y_k)\left[\prod_{i \ne k} x_i + \sum_{s \ne k} \left(\prod_{i < s,\, i \ne k} x_i\right) y_s \left(\prod_{i > s,\, i \ne k} (x_i + y_i)\right)\right]. \tag{3.9}$$
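The telescoping identity (3.8) is elementary; as a quick numerical sanity check (an illustration using NumPy, not part of the paper), it can be verified for arbitrary complex numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Left-hand side of (3.8): the full product of the factors (x_i + y_i).
lhs = np.prod(x + y)

# Right-hand side: prod(x_i) plus the telescoping sum; the s-th term keeps
# x_i for i < s, picks y_s, and keeps the full factors (x_i + y_i) for i > s.
rhs = np.prod(x)
for s in range(n):
    rhs += np.prod(x[:s]) * y[s] * np.prod(x[s + 1:] + y[s + 1:])

assert abs(lhs - rhs) < 1e-8
```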


Proof of Theorem 3.2. First, by definition, we have

$$\mathbb{E}\, \mathbf{1}(\Xi)[QZ]^p = \frac{1}{N^p} \sum_{j_1, \ldots, j_p} \mathbb{E}\, \mathbf{1}(\Xi) \prod_{\alpha=1}^p Q_{j_\alpha} Z_{j_\alpha}. \tag{3.10}$$

For fixed $j_1, \ldots, j_p$, let $T_\alpha = Q_{j_\alpha} Z_{j_\alpha}$. Applying (3.9) with $k = 1$, $x_\alpha = P_{j_1} T_\alpha$ and $y_\alpha = Q_{j_1} T_\alpha$ (note that $x_\alpha + y_\alpha = T_\alpha$), we obtain

$$\prod_{\alpha=1}^p T_\alpha = T_1 \left[\prod_{\alpha \ne 1} P_{j_1} T_\alpha + \sum_{s \ne 1} \left(\prod_{\alpha < s,\, \alpha \ne 1} P_{j_1} T_\alpha\right) (Q_{j_1} T_s) \left(\prod_{\alpha > s,\, \alpha \ne 1} T_\alpha\right)\right].$$

Iterating this expansion in $j_2, \ldots, j_p$, every resulting term takes the form $\prod_\alpha P_{A_\alpha} Q_{B_\alpha} T_\alpha$ with suitable index sets $A_\alpha, B_\alpha \subset \{j_1, \ldots, j_p\}$, and one checks that it suffices to prove the bound (3.13) on such terms under the condition

$$\sum_\alpha |B_\alpha \cup \{j_\alpha\}| \ge 2t, \qquad t := |\{j_1, \ldots, j_p\}|. \tag{3.15}$$


Now it only remains to prove (3.13) under the condition (3.15). First, we write

$$\mathbb{E}\, \mathbf{1}(\Xi) \prod_\alpha P_{A_\alpha} Q_{B_\alpha} T_\alpha = \mathbb{E}\, \mathbf{1}(\Xi) \prod_{\alpha=1}^p \left(P_{A_\alpha} Q_{\widetilde B_\alpha} Z_{j_\alpha}\right), \qquad \widetilde B_\alpha := B_\alpha \cup \{j_\alpha\}.$$

Using (3.8) with $x = P\, \mathbf{1}(\Xi)\, QZ$ and $y = P\, \mathbf{1}(\Xi^c)\, QZ$ (so that $x + y = PQZ$), we have

$$\mathbb{E}\, \mathbf{1}(\Xi) \prod_{\alpha=1}^p (P_{A_\alpha} Q_{\widetilde B_\alpha} Z_{j_\alpha}) = \sum_{s=1}^{p+1} \mathbb{E}\, \mathbf{1}(\Xi) \left(\prod_{i=1}^{s-1} P_{A_i} \mathbf{1}(\Xi) Q_{\widetilde B_i} Z_{j_i}\right) \left(P_{A_s} \mathbf{1}(\Xi^c) Q_{\widetilde B_s} Z_{j_s}\right) \left(\prod_{i=s+1}^{p} P_{A_i} Q_{\widetilde B_i} Z_{j_i}\right). \tag{3.16}$$

First, for $s \le p$, one can use the following formula: for any $f$ and $h$,

$$\mathbb{E}\,\big|h\, (P\, \mathbf{1}(\Xi^c)\, Q f)\big| \le \|h\|_\infty \|\mathbf{1}(\Xi^c) Q f\|_2 \le \sqrt{\mathbb{P}(\Xi^c)}\, \|f\|_\infty \|h\|_\infty.$$

Let

$$h = \left(\prod_{i=1}^{s-1} P_{A_i} \mathbf{1}(\Xi) Q_{\widetilde B_i} Z_{j_i}\right) \left(\prod_{i=s+1}^{p} P_{A_i} Q_{\widetilde B_i} Z_{j_i}\right), \qquad f = Z_{j_s}, \qquad P = P_{A_s}, \qquad Q = Q_{\widetilde B_s}.$$

By (3.5) and $p \ge 1$, we have

$$|h| \le Y^{p-1} N^{Cp}, \qquad |f| \le Y N^C.$$

Then, with (3.6), the sum $\sum_{s=1}^{p}$ on the r.h.s. of (3.16) is bounded above by $Y^p N^{Cp} \exp[-c(\log N)^{3/2} p]$, which can be neglected in proving (3.13). It then only remains to bound the r.h.s. of (3.16) in the case $s = p+1$, i.e., to prove

$$\left|\mathbb{E}\, \mathbf{1}(\Xi) \prod_{\alpha=1}^p \left(P_{A_\alpha} \mathbf{1}(\Xi)\, Q_{\widetilde B_\alpha} Z_{j_\alpha}\right)\right| \le (Cp)^{2p}\, Y^p X^{2t}, \qquad t := |\{j_1, \ldots, j_p\}|, \tag{3.17}$$

under the assumption (3.15). Using (3.3) and (3.8), with $x = P\, \mathbf{1}(\Xi)\, Z$ and $y = P\, \mathbf{1}(\Xi)\, Q\, \mathbf{1}(\Xi^c)\, \widetilde Z$, we can write the l.h.s. of (3.17) as

$$\sum_{s=1}^{p+1} \mathbb{E}\, \mathbf{1}(\Xi) \left(\prod_{i=1}^{s-1} P_{A_i} \mathbf{1}(\Xi)\, Z_{j_i, \widetilde B_i}\right) \left(P_{A_s} \mathbf{1}(\Xi)\, Q_{\widetilde B_s}\, \mathbf{1}(\Xi^c)\, \widetilde Z_{j_s, \widetilde B_s}\right) \left(\prod_{i=s+1}^{p} P_{A_i} \mathbf{1}(\Xi)\, Q_{\widetilde B_i} Z_{j_i}\right). \tag{3.18}$$

Now we repeat the argument used for (3.16). For $s \le p$, one can use the following formula: for any $f$ and $h$,

$$\mathbb{E}\,\big|h\,(P\, \mathbf{1}(\Xi)\, Q\, \mathbf{1}(\Xi^c)\, f)\big| \le \|h\|_\infty \|\mathbf{1}(\Xi^c) f\|_2 \le \sqrt{\mathbb{P}(\Xi^c)}\, \|f\|_\infty \|h\|_\infty.$$

Let

$$h = \mathbf{1}(\Xi) \left(\prod_{i=1}^{s-1} P_{A_i} \mathbf{1}(\Xi)\, Z_{j_i, \widetilde B_i}\right) \left(\prod_{i=s+1}^{p} P_{A_i} \mathbf{1}(\Xi)\, Q_{\widetilde B_i} Z_{j_i}\right), \qquad f = \widetilde Z_{j_s, \widetilde B_s}, \qquad P = P_{A_s}, \qquad Q = Q_{\widetilde B_s}.$$

With the assumption (3.4), the sum $\sum_{s=1}^{p}$ on the r.h.s. of (3.18) is bounded above by $Y^p N^{Cp} \exp[-c(\log N)^{3/2} p]$, which can be neglected in proving (3.17). For the main term, i.e., $s = p+1$ on the r.h.s. of (3.18), using (3.4) and (3.15) we have

$$\left|\mathbb{E}\, \mathbf{1}(\Xi) \prod_{\alpha=1}^p \left(P_{A_\alpha} \mathbf{1}(\Xi)\, Z_{j_\alpha, \widetilde B_\alpha}\right)\right| \le (CY)^p (C_0 X p)^{2t} \le (C Y p^2)^p X^{2t},$$

which completes the proof of Theorem 3.2.
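The projection formalism underlying Theorem 3.2 can be illustrated concretely. With $E_i$ the partial average over $x_i$, the operators $P_i = E_i$ and $Q_i = 1 - E_i$ are commuting projections and $\mathbb{E}\,[Q_i Z] = 0$. A toy numerical check (not from the paper; the grids below stand in for the laws of two independent variables):

```python
import numpy as np

rng = np.random.default_rng(3)
# Z[a, b] = Z(x1 = a-th grid value, x2 = b-th grid value); expectations over
# x1 become averages over axis 0.
Z = rng.standard_normal((50, 60))

def P1(A):  # E_1: average out x1
    return np.broadcast_to(A.mean(axis=0, keepdims=True), A.shape)

def Q1(A):  # 1 - E_1
    return A - P1(A)

# P_1 and Q_1 are projections, P_1 Q_1 = 0, and the full average of Q_1 Z is 0.
assert np.allclose(P1(P1(Z)), P1(Z))
assert np.allclose(Q1(Q1(Z)), Q1(Z))
assert np.allclose(P1(Q1(Z)), 0.0)
assert abs(Q1(Z).mean()) < 1e-12
```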

We note that, with (2.5) and (2.21), we can write $zZ_i$ as

$$z Z_i = Q_i \left[\frac{-1}{G_{ii}}\right], \qquad Q_i := 1 - P_i, \qquad P_i := \mathbb{E}_{x_i}. \tag{3.19}$$

Lemma 3.3. Let $Z_i = (G_{ii})^{-1}$, with $P_i$ and $Q_i$ defined as in (3.19). We assume that $\eta = \Im\, z \ge N^{-C}$ for some $C > 0$. Suppose there exist $p$ and $\Xi$ such that $\mathbb{P}(\Xi^c) \le e^{-p(\log N)^{3/2}}$ and, in $\Xi$,

$$\max_i |Q_i Z_i| \le C Y X, \qquad \frac{\Lambda_o(z)}{\min_i |G_{ii}(z)|} \le C X \ll 1, \qquad \min_i |G_{ii}(z)| \ge Y^{-1}, \qquad p \le \frac{C}{(\log N)\, X}, \tag{3.20}$$

where $X$ and $Y$ are deterministic numbers. Then there exists $\Xi' \subset \Xi$ with $\mathbb{P}((\Xi')^c) \le e^{-p}$ such that, in $\Xi'$,

$$\left|\frac{1}{N} \sum_i Q_i Z_i\right| \le C p^5 \left(X^2 + N^{-1}\right) Y. \tag{3.21}$$


Proof. We are going to apply Theorem 3.2; then (3.21) follows from (3.7) and Markov's inequality. One can easily verify (3.5) and (3.6). It only remains to show that for $i \in A \subset \{1, 2, \ldots, N\}$ with $|A| \le p$ there exist $Z_{i,A}$ and $\widetilde Z_{i,A}$ such that

$$\mathbf{1}(\Xi)(Q_A Z_i) = Z_{i,A} + \mathbf{1}(\Xi)\, Q_A\, \mathbf{1}(\Xi^c)\, \widetilde Z_{i,A}, \qquad |Z_{i,A}| \le Y C^{|A|} X^{|A|}, \qquad |\widetilde Z_{i,A}| \le Y C^{|A|} N^{C}. \tag{3.22}$$

This holds in the case $A = \{i\}$ by assumption, so we may assume $|A| \ge 2$. As in Lemma 5.1 of [4], for a quantity $\mathcal A = \mathcal A(H) = \mathcal A(X^\dagger X)$ defined in terms of $X^\dagger X$, we define

$$(\mathcal A)_{S,U} := \sum_{S \setminus U \subset V \subset S} (-1)^{|V|}\, \mathcal A^{(V)}, \qquad \mathcal A^{(V)} := \mathcal A\big((X^{(V)})^\dagger X^{(V)}\big);$$

then

$$\mathcal A = \sum_{U \subset S} (\mathcal A)_{S,U}.$$

By definition, $(\mathcal A)_{S,U}$ is independent of the $j$-th column of $X$ if $j \in S \setminus U$. Therefore,

$$Q_S \mathcal A = Q_S (\mathcal A)_{S,S}.$$

In our case,

$$Q_{A \setminus \{i\}} Z_i = Q_{A \setminus \{i\}} \left(\frac{1}{G_{ii}}\right)_{A \setminus \{i\},\, A \setminus \{i\}}.$$

Then we choose

$$Z_{i,A} := \mathbf{1}(\Xi)\, Q_A\, \mathbf{1}(\Xi) \left(\frac{1}{G_{ii}}\right)_{A \setminus \{i\},\, A \setminus \{i\}}, \qquad \widetilde Z_{i,A} := \left(\frac{1}{G_{ii}}\right)_{A \setminus \{i\},\, A \setminus \{i\}}.$$

The bound for $\widetilde Z_{i,A}$ in (3.22) follows easily from the definition. For the bound on $Z_{i,A}$, it only remains to prove that for $2 \le |A| \le p$,

$$\mathbf{1}(\Xi) \left|\left(\frac{1}{G_{ii}}\right)_{A \setminus \{i\},\, A \setminus \{i\}}\right| \le Y C^{|A|} X^{|A|}. \tag{3.23}$$

To prove this, we first show that for $|T| \le p$,

$$\max_{i,j \notin T} |G^{(T)}_{ij}| \le C \max_{i,j} |G_{ij}|, \qquad \min_{i \notin T} |G^{(T)}_{ii}| \ge c \min_i |G_{ii}|. \tag{3.24}$$


We start with $|T| = 1$, i.e., $T = \{k\}$. First, using (2.7) and the assumptions of this lemma, we have

$$(G_{ii})^{-1} = \frac{-G_{ij} G_{ji}}{G_{ii}\, G_{jj}\, G^{(j)}_{ii}} + (G^{(j)}_{ii})^{-1} = \big(1 + O(X^2)\big)\, (G^{(j)}_{ii})^{-1}$$

and

$$|G^{(k)}_{ij}| = \left|G_{ij} - \frac{G_{ik} G_{kj}}{G_{kk}}\right| \le \Lambda_o\, (1 + O(X)).$$

It follows that

$$\max_{i,j \ne k} |G^{(k)}_{ij}| \le (1 + O(X)) \max_{i,j} |G_{ij}|, \qquad \min_{i \ne k} |G^{(k)}_{ii}| \ge (1 - O(X)) \min_i |G_{ii}|.$$

Then, by induction on $|T|$ and the assumption $Xp \ll 1$, we obtain the desired result (3.24).

Now we return to prove (3.23), starting with the case $|A| = 2$. If $i \ne j$, with (2.7), (3.24) and (3.20), we have

$$\left(\frac{1}{G_{ii}}\right)_{j,j} = (G_{ii})^{-1} - (G^{(j)}_{ii})^{-1} = \frac{-G_{ij} G_{ji}}{G_{ii}\, G_{jj}\, G^{(j)}_{ii}} \le O(Y X^2).$$

The general case was proved in Lemma 5.11 of [4], which gives

$$\left|\left(\frac{1}{G_{ii}}\right)_{A \setminus \{i\},\, A \setminus \{i\}}\right| \le (C|A|)^{|A|}\, \frac{\Big(\max_{i,j \notin T,\, T \subset A \setminus \{i\}} |G^{(T)}_{ij}|\Big)^{|A|}}{\Big(\min_{j \notin T,\, T \subset A \setminus \{i\}} |G^{(T)}_{jj}|\Big)^{|A|+1}}.$$

Together with (3.24) and (3.20), we obtain (3.23) and complete the proof. Lastly, we point out that the definition of $G^{(V)}_{ij}$ ($i, j \notin V$) in [4] differs from the one used in this paper, though the two are equivalent. The one used in this paper is

$$G^{(V)} = \left((X^{(V)})^\dagger (X^{(V)}) - z\right)^{-1},$$

whereas in [4] it is defined as

$$G^{(V)} = \left(H^{(V)} - z\right)^{-1},$$

where $H^{(V)}$ is the minor of $H$ obtained by removing all rows and columns of $H$ indexed by $i \in V$. One sees that if $H = X^\dagger X$ then $H^{(V)} = (X^{(V)})^\dagger (X^{(V)})$.


Proof of Lemma 3.1. This is a special case of Lemma 3.3, with $X = K \widetilde\Psi$ and $Y = C$ for some large $C$. First, $\max_i |Q_i Z_i| \le C Y X$ was proved in (2.31). By the assumptions, $\Xi \subset \bigcap_{z \in \bar S(L)} (\Gamma(z, K) \cap B^c(z))$; hence, in $\Xi$,

$$\Lambda_o,\ \Lambda_d \le K \Psi \le K \widetilde\Psi = X \le C K (N\eta)^{-1/2} \ll 1.$$

Then we have

$$\frac{\Lambda_o(z)}{\min_i |G_{ii}(z)|} \le C X \ll 1, \qquad \min_i |G_{ii}(z)| \ge Y^{-1}.$$

Furthermore, (3.1) and $\eta \ge N^{-1} \varphi^{L}$ (since $z \in \bar S(L)$) imply $p \le C ((\log N) X)^{-1}$, which completes the proof.

4. Strong Marcenko–Pastur law and rigidity of eigenvalues.

4.1. Proof of Theorem 1.2. First, we assume $\zeta \ge 1$. With Lemma 2.10 and Lemma 2.1, for any $\zeta > 0$ there exists $C_\zeta$ such that

$$\Xi_1 \subset \bigcap_{z \in \bar S(C_\zeta)} \big(B^c(z) \cap \Gamma(z, C_\zeta)\big) \tag{4.1}$$

holds with $(\zeta+2)$-high probability. Then, with Lemma 2.13, for $z \in \bar S(3C_\zeta)$ we have

$$|D(m)(z)| \le \varphi^{2C_\zeta}\, \Psi^2 + O([Z]) \qquad \text{in } \Xi_1.$$

Let $\Lambda_1 = 1$, so that $\Lambda \le \Lambda_1$ in $\Xi_1$. Therefore, we can apply Lemma 3.1 with

$$p = p_1 = -\log[1 - \mathbb{P}(\Xi_1)] / (\log N)^2.$$

Without loss of generality we may assume that $\mathbb{P}(\Xi_1)$ is not too close to 1; otherwise, we can replace $\Xi_1$ by a subset of itself. Then, with Definition 1.1, we have

$$p_1 = C \varphi^{\zeta+2} / (\log N)^2.$$

We assume that $C_\zeta \ge 6\zeta$, so that (3.1) holds; then (3.2) gives that for $z \in \bar S(3C_\zeta)$, for some $\Xi_2 \subset \Xi_1$ with

$$\mathbb{P}(\Xi_1 \setminus \Xi_2) \le e^{-p_1},$$


and

$$[Z] \le \varphi^{2C_\zeta + 11\zeta}\, \Psi_1^2, \qquad \Psi_1 := \sqrt{\frac{\Im\, m_W + \Lambda_1}{N\eta}}, \qquad \text{in } \Xi_2.$$

Since in $\Xi_2 \subset \Xi_1$ we have $\Lambda \le \Lambda_1$ by assumption, it follows that $\Psi \le \Psi_1$ in $\Xi_2$, i.e.,

$$|D(m)(z)| \le \varphi^{2C_\zeta + 11\zeta}\, \frac{\Im\, m_W + \Lambda_1}{N\eta} \qquad \text{in } \Xi_2. \tag{4.2}$$

Then, applying Lemma 2.14, (2.45) shows that for $z \in \bar S(3C_\zeta)$,

$$\Lambda(z) \le \Lambda_2(z) := \varphi^{C_\zeta + 6\zeta}\, \Lambda_1^{1/2} (N\eta)^{-1/2} \qquad \text{in } \Xi_2.$$

Repeating this process, by choosing

$$p_2 = -\log[1 - \mathbb{P}(\Xi_2)] / (\log N)^2 = C \varphi^{\zeta+2} / (\log N)^4,$$

we find that there exists $\Xi_3 \subset \Xi_2$ with $\mathbb{P}(\Xi_2 \setminus \Xi_3) \le e^{-p_2}$ such that for $z \in \bar S(3C_\zeta)$,

$$\Lambda(z) \le \Lambda_3(z) := \varphi^{C_\zeta + 6\zeta}\, \Lambda_2^{1/2} (N\eta)^{-1/2} \le \varphi^{2C_\zeta + 12\zeta} (N\eta)^{-3/4} \qquad \text{in } \Xi_3.$$

Now we iterate this process $K$ times, $K := \log\log N / \log 1.9$. For $k \le K$, there exists some $\Xi_k \subset \Xi_{k-1}$ with $\mathbb{P}(\Xi_{k-1} \setminus \Xi_k) \le e^{-p_{k-1}}$, where

$$p_{k-1} = -\log[1 - \mathbb{P}(\Xi_{k-1})] / (\log N)^2 = C \varphi^{\zeta+2} / (\log N)^{2(k-1)} \ge \varphi^{\zeta},$$

and for $z \in \bar S(3C_\zeta)$,

$$\Lambda(z) \le \Lambda_{k+1}(z) := \varphi^{C_\zeta + 6\zeta}\, \Lambda_k^{1/2} (N\eta)^{-1/2} \le \varphi^{2C_\zeta + 12\zeta} (N\eta)^{-1 + (1/2)^k} \qquad \text{in } \Xi_{k+1}.$$

(Note that $N^{(1/2)^K} \le \varphi$.)
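The bootstrap above is simply a contraction on the exponent of $(N\eta)$: writing $\Lambda_k \sim (N\eta)^{e_k}$ (ignoring the $\varphi$ factors), the update $\Lambda_{k+1} \sim \Lambda_k^{1/2}(N\eta)^{-1/2}$ reads $e_{k+1} = e_k/2 - 1/2$, with fixed point $-1$. A two-line check of this recursion (an illustration, not part of the proof):

```python
# Exponent recursion behind the bootstrap: Lambda_1 = 1 means e_1 = 0, and
# each step maps e -> e/2 - 1/2, so e_k = -1 + (1/2)**(k-1).
e = 0.0
exponents = [e]
for _ in range(10):
    e = e / 2 - 0.5
    exponents.append(e)

for k, ek in enumerate(exponents, start=1):
    assert abs(ek - (-1 + 0.5 ** (k - 1))) < 1e-12
```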


Then, for $k = K$ and $z \in \bar S(3C_\zeta)$,

$$\Lambda(z) \le \Lambda_{K+1}(z) \le \varphi^{2C_\zeta + 12\zeta} (N\eta)^{-1 + (1/2)^K} \le \varphi^{2C_\zeta + 12\zeta + 1} (N\eta)^{-1} \tag{4.3}$$

holds with $\zeta$-high probability, which completes the proof of (1.17). Furthermore, since $\Xi_{K+1} \subset \Xi_1$, with (4.1) we obtain (1.18).

Now suppose (1.19) holds; we first prove (1.20). Using (1.18), we have

$$\max_{\lambda_-/5 \le E \le 5\lambda_+} \Im\, G_{ii}\big(E + i \varphi^{C_\zeta} N^{-1}\big) \le C. \tag{4.4}$$

By definition,

$$\Im\, G_{ii} = \sum_\alpha \frac{|v_\alpha(i)|^2\, \eta}{(\lambda_\alpha - E)^2 + \eta^2}.$$

Then, choosing $E = \lambda_\alpha$ and $\eta = \varphi^{C_\zeta} N^{-1}$, with (4.4) we obtain, for any $\alpha$,

$$|v_\alpha(i)|^2 \le C\eta = C\varphi^{C_\zeta} N^{-1},$$

which implies (1.20). Here (1.19) guarantees that $\lambda_-/5 \le E \le 5\lambda_+$. It only remains to prove (1.19). The proof proceeds via the following four steps.

Step one of the proof of (1.19): First we prove

$$\lambda_- - N^{-2/3} \varphi^{C_\zeta} \le \min\{\lambda_j : \mathbf{1}_{d>1}\, \lambda_-/5 \le \lambda_j \le 5\lambda_+\} \le \max\{\lambda_j : \mathbf{1}_{d>1}\, \lambda_-/5 \le \lambda_j \le 5\lambda_+\} \le \lambda_+ + N^{-2/3} \varphi^{C_\zeta}. \tag{4.5}$$

By repeating the iteration one more time, i.e., replacing $\Lambda_1$ in (4.2) with $\Lambda_{K+1}$ from (4.3), we have

$$|D(m)(z)| \le \varphi^{C_\zeta}\, \frac{\Im\, m_W + \frac{1}{N\eta}}{N\eta}$$

for some large $C_\zeta$. With (2.44) again, we obtain that for some $D_\zeta \ge 1$,

$$\Lambda(z) \le \varphi^{D_\zeta}\, \frac{\delta}{\sqrt{\kappa + \eta + \delta}}, \qquad \delta := \frac{\Im\, m_W}{N\eta} + \frac{1}{(N\eta)^2}. \tag{4.6}$$

For any $E \ge \lambda_+ + N^{-2/3} \varphi^{4D_\zeta}$, we choose $z = E + i\eta$ with

$$\eta := \varphi^{-D_\zeta} N^{-1/2} \kappa^{1/4}, \qquad \kappa := E - \lambda_+.$$


Then it is easy to check, with $\kappa \ge N^{-2/3} \varphi^{4D_\zeta}$, that

$$\sqrt{\kappa} \gg \varphi\, \eta, \qquad \frac{\kappa}{N\eta^2} \gg 1, \qquad N\eta\sqrt{\kappa} \gg \varphi. \tag{4.7}$$

With (2.13) and $\kappa \ge \eta$, we have

$$\Im\, m_W(z) \le C\, \frac{\eta}{\sqrt{\kappa}}, \tag{4.8}$$

which implies

$$\delta \le \frac{C}{N\sqrt{\kappa}} + (N\eta)^{-2}.$$

Therefore $\kappa \ge \delta$. Together with (4.6) and (4.7), we have

$$\Lambda(z) \le C\varphi^{D_\zeta} \left(\frac{\eta}{\kappa}\, \frac{1}{N\eta} + \frac{1}{\sqrt{\kappa}\, N\eta}\, \frac{1}{N\eta}\right) \ll \frac{1}{N\eta}.$$

Combining (4.8) and the last inequality of (4.7), we have

$$\Im\, m_W(z) \ll \frac{1}{N\eta}.$$

Therefore, we obtain

$$\Im\, m(z) \ll \frac{1}{N\eta}.$$

Note that if $\Im\, m(z) < (2N\eta)^{-1}$ (with $z = E + i\eta$), then the number of eigenvalues in $[E - \eta, E + \eta]$ is zero, which follows from

$$\Im\, m = \frac{1}{N} \sum_\alpha \frac{\eta}{(\lambda_\alpha - E)^2 + \eta^2} \ge \sum_{\alpha : |\lambda_\alpha - E| \le \eta} \frac{1}{2N\eta}. \tag{4.9}$$

Since this holds for any $E \ge \lambda_+ + N^{-2/3} \varphi^{4D_\zeta}$, we have proved that for any $\zeta > 0$ there exists some $D_\zeta > 0$ such that

$$\max\{\lambda_j : \lambda_j \le 5\lambda_+\} \le \lambda_+ + N^{-2/3} \varphi^{4D_\zeta}$$

holds with $\zeta$-high probability. The analogous bound for the smallest eigenvalue can be proved similarly.
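The counting bound (4.9) is elementary: every eigenvalue within $\eta$ of $E$ contributes at least $1/(2N\eta)$ to $\Im\, m$. A quick numerical check (an illustration with a mock spectrum, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.sort(rng.uniform(0.0, 4.0, size=500))  # a mock point spectrum
N = lam.size
E, eta = 2.0, 0.01

# Imaginary part of the empirical Stieltjes transform m(E + i*eta).
im_m = np.mean(eta / ((lam - E) ** 2 + eta ** 2))

# Each eigenvalue in [E - eta, E + eta] contributes at least
# eta / (2 * eta**2) = 1/(2*eta) to the sum defining N * im_m.
count = np.sum(np.abs(lam - E) <= eta)

assert im_m >= count / (2 * N * eta)
```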


Step two of the proof of (1.19): Recall $n(E)$ from (1.9) and $n_W(E)$ from (1.10). We prove that

$$\big|(n(E_1) - n(E_2)) - (n_W(E_1) - n_W(E_2))\big| \le \frac{C (\log N)\, \varphi^{C_\zeta}}{N}, \qquad E_1, E_2 \in [\mathbf{1}_{d>1}\, \lambda_-/4,\ 4\lambda_+], \tag{4.10}$$

which implies that

$$\#\{j : \lambda_j \notin [\mathbf{1}_{d>1}\, \lambda_-/5,\ 5\lambda_+]\} \le \varphi^{C_\zeta}. \tag{4.11}$$

We note that although only (4.11) is needed for (1.19), the stronger estimate (4.10) is very useful for Theorem 1.3, so we prove it here. The proof is similar to that of Theorem 2.2 in [11]. We now translate the information on the Stieltjes transform obtained in Theorem 1.2 into the estimate (4.10) on the locations of the eigenvalues.

Lemma 4.1. Let $\varrho^\Delta$ be a signed measure on the real line. For any $E_1, E_2 \in [A_1, A_2]$ with $|A_{1,2}| \le O(1)$ and $\eta = N^{-1}$, define $f(\lambda) = f_{E_1, E_2, \eta}(\lambda)$ to be a characteristic function of $[E_1, E_2]$ smoothed on scale $\eta$, i.e., $f \equiv 1$ on $[E_1 + \eta, E_2 - \eta]$, $f \equiv 0$ on $\mathbb{R} \setminus [E_1, E_2]$, and $|f'| \le C\eta^{-1}$, $|f''| \le C\eta^{-2}$. Let $m^\Delta$ be the Stieltjes transform of $\varrho^\Delta$. Suppose that for some positive number $U$ (which may depend on $N$) we have

$$|m^\Delta(x + iy)| \le \frac{C U}{N y} \qquad \text{for } y < 1,\ x \in [A_1, A_2], \tag{4.12}$$

with some constant $C$. Then

$$\left|\int_{\mathbb{R}} f_{E_1, E_2, \eta}(\lambda)\, \varrho^\Delta(\lambda)\, d\lambda\right| \le \frac{C U\, |\log \eta|}{N}. \tag{4.13}$$

Proof of Lemma 4.1. For simplicity, we drop the superscript $\Delta$ in the proof. We use the Helffer–Sjöstrand functional calculus. Let $\chi(y)$ be a smooth cutoff function with support in $[-1, 1]$, with $\chi(y) = 1$ for $|y| \le 1/2$ and with bounded derivatives. Then

$$f(\lambda) = \frac{1}{2\pi} \int_{\mathbb{R}^2} \frac{i y f''(x) \chi(y) + i\big(f(x) + i y f'(x)\big) \chi'(y)}{\lambda - x - iy}\, dx\, dy.$$

Since $f$ and $\chi$ are real,

$$\left|\int f(\lambda)\, \varrho(\lambda)\, d\lambda\right| \le C \int_{\mathbb{R}^2} \big(|f(x)| + |y|\, |f'(x)|\big)\, |\chi'(y)|\, |m(x+iy)|\, dx\, dy + C \left|\int_{|y| \le \eta} \int_{\mathbb{R}} y f''(x) \chi(y)\, \Im\, m(x+iy)\, dx\, dy\right| + C \left|\int_{|y| > \eta} \int_{\mathbb{R}} y f''(x) \chi(y)\, \Im\, m(x+iy)\, dx\, dy\right|. \tag{4.14}$$

Using (4.12), the first term is bounded by

$$\int_{\mathbb{R}^2} \big(|f(x)| + |y|\, |f'(x)|\big)\, |\chi'(y)|\, |m(x+iy)|\, dx\, dy \le \frac{CU}{N}. \tag{4.15}$$

For the second term on the r.h.s. of (4.14), we use that (4.12) implies, for any $0 < y < 1$,

$$y\, |\Im\, m(x + iy)| \le \frac{CU}{N}. \tag{4.16}$$

With $|f''| \le C\eta^{-2}$ and

$$\operatorname{supp} f' \subset \{|x - E_1| \le \eta\} \cup \{|x - E_2| \le \eta\}, \tag{4.17}$$

we find that the second term on the r.h.s. of (4.14) is bounded by $CU/N$. Finally, we integrate the third term in (4.14) by parts, first in $x$, then in $y$, and bound its absolute value by

$$C \int_{\mathbb{R}} \eta\, |f'(x)|\, |\Re\, m(x+i\eta)|\, dx + C \int_{\mathbb{R}^2} \big|y f'(x) \chi'(y)\, \Re\, m(x+iy)\big|\, dx\, dy + \frac{C}{\eta} \int_{\eta \le y \le 1} \int_{\operatorname{supp} f'} |\Re\, m(x+iy)|\, dx\, dy. \tag{4.18}$$

Using (4.12) and (4.17) in the first term, (4.15) in the second and (4.12) in the third, we have

$$(4.18) \le \frac{CU}{N} + \frac{CU}{\eta} \int_{\operatorname{supp} f'} dx \int_{\eta \le y \le 1} \frac{dy}{yN} \le \frac{CU\, |\log \eta|}{N},$$

which proves (4.13).

We will apply this lemma with $[A_1, A_2] = [\mathbf{1}_{d>1}\, \lambda_-/4,\ 4\lambda_+]$, choosing the signed measure to be the difference of the empirical eigenvalue distribution and the Marcenko–Pastur law,

$$\varrho^\Delta(d\lambda) = \varrho(d\lambda) - \varrho_W(\lambda)\, d\lambda, \qquad \varrho(d\lambda) := \frac{1}{N} \sum_i \delta(\lambda_i - \lambda)\, d\lambda.$$

Now we prove that (4.10) holds. By Theorem 1.2, the assumptions of Lemma 4.1 hold for the difference $m^\Delta = m - m_W$ with $U = \varphi^{C_\zeta}$ if $y \ge y_0 := \varphi^{C_\zeta}/N$. For $y \le y_0$, set $z = x + iy$, $z_0 = x + iy_0$, and estimate

$$|m(z) - m_W(z)| \le |m(z_0) - m_W(z_0)| + \int_y^{y_0} \left|\partial_\eta \big(m(x+i\eta) - m_W(x+i\eta)\big)\right| d\eta. \tag{4.19}$$


Note that

$$|\partial_\eta m(x + i\eta)| = \left|\frac{1}{N} \sum_j \partial_\eta G_{jj}(x + i\eta)\right| \le \frac{1}{N} \sum_{j,k} |G_{jk}(x + i\eta)|^2 = \frac{1}{N\eta} \sum_j \Im\, G_{jj}(x + i\eta) = \frac{1}{\eta}\, \Im\, m(x + i\eta),$$

and similarly

$$|\partial_\eta m_W(x + i\eta)| = \left|\int \frac{\varrho_W(s)}{(s - x - i\eta)^2}\, ds\right| \le \int \frac{\varrho_W(s)}{|s - x - i\eta|^2}\, ds = \frac{1}{\eta}\, \Im\, m_W(x + i\eta).$$

Now we use the fact that the functions $y \mapsto y\, \Im\, m(x + iy)$ and $y \mapsto y\, \Im\, m_W(x + iy)$ are monotone increasing for any $y > 0$, since both are Stieltjes transforms of positive measures. Therefore the integral in (4.19) can be bounded by ($z_0 = x + iy_0$)

$$\int_y^{y_0} \frac{\Im\, m(x + i\eta) + \Im\, m_W(x + i\eta)}{\eta}\, d\eta \le y_0 \big(\Im\, m(z_0) + \Im\, m_W(z_0)\big) \int_y^{y_0} \frac{d\eta}{\eta^2}. \tag{4.20}$$

By definition, $\Im\, m_W(x + iy_0) \le |m_W(x + iy_0)| \le C$. By the choice of $y_0$ and Theorem 1.2, we have

$$\Im\, m(x + iy_0) \le \Im\, m_W(x + iy_0) + \frac{\varphi^{C_\zeta}}{N y_0} \le C \tag{4.21}$$

with very high probability. Together with (4.20) and (4.19), this proves that (4.12) holds for $y \le y_0$ as well, if $U$ is increased to $U = C\varphi^{C_\zeta}$. The application of Lemma 4.1 then shows that for any $\eta \ge 1/N$,

$$\left|\int_{\mathbb{R}} f_{E_1, E_2, \eta}(\lambda)\, \varrho(\lambda)\, d\lambda - \int_{\mathbb{R}} f_{E_1, E_2, \eta}(\lambda)\, \varrho_W(\lambda)\, d\lambda\right| \le \frac{C (\log N)\, \varphi^{C_\zeta}}{N}. \tag{4.22}$$

With the fact that $y \mapsto y\, \Im\, m(x + iy)$ is monotone increasing for any $y > 0$, (4.21) implies a crude upper bound on the empirical density. Indeed, for any interval $I := [x - \eta, x + \eta]$ with $\eta = 1/N$, we have

$$n(x + \eta) - n(x - \eta) \le C\eta\, \Im\, m(x + i\eta) \le C y_0\, \Im\, m(x + iy_0) \le \frac{C\varphi^{C_\zeta}}{N}. \tag{4.23}$$

Together with (4.22), we have proved (4.10).
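The monotonicity fact used above is elementary: $y\, \Im\, m(x+iy) = \frac{1}{N}\sum_\alpha \frac{y^2}{(\lambda_\alpha - x)^2 + y^2}$, and each summand is increasing in $y$. A quick numerical check (illustration only, with an arbitrary point spectrum):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = rng.standard_normal(200)  # arbitrary point spectrum
x = 0.3

def y_im_m(y):
    # y * Im m(x + i*y) for the empirical Stieltjes transform.
    return np.mean(y ** 2 / ((lam - x) ** 2 + y ** 2))

ys = np.linspace(0.01, 2.0, 50)
vals = np.array([y_im_m(y) for y in ys])
assert np.all(np.diff(vals) > 0)  # monotone increasing in y
```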


Step three of the proof of (1.19): Now we prove that $\lambda_j \le 5\lambda_+$ holds with $\zeta$-high probability. Since 5 plays no special role in the previous arguments, it only remains to prove that for some large $K$ the bound $\lambda_j \le K\lambda_+$ holds with $\zeta$-high probability. Let

$$z = E + i\eta, \qquad E \ge K\lambda_+, \qquad \eta = E N^{-2/3}. \tag{4.24}$$

By (4.10) with $E_1 = \lambda_-$ and $E_2 = K\lambda_+$, we have proved that there are at most $\varphi^{O(1)}$ eigenvalues larger than $K\lambda_+$. Then, by definition,

$$\Im\, m(z) \le \frac{C\eta}{E^2} + \frac{\varphi^{C_\zeta}}{N\eta}, \qquad |\Re\, m(z)| \le C E^{-1} + \frac{\varphi^{C_\zeta}}{N\eta} \le O(E^{-1}), \qquad E \ge K\lambda_+, \tag{4.25}$$

and the same bounds hold for $m^{(T)}$ with $|T| = O(1)$. Now, using Lemma 1.4, as in (2.28) and (2.31), we have

$$|Z_i| \le |E| \left(E^{-1} N^{-1/2} + \frac{\varphi^{C_\zeta}}{N\eta}\right), \qquad \big|\langle x_i, \mathcal G^{(i,j)} x_j \rangle\big| \le E^{-1} N^{-1/2} + \frac{\varphi^{C_\zeta}}{N\eta}. \tag{4.26}$$

First we estimate $G_{ii}$: with (2.5),

$$|G_{ii}| = \left|\frac{1}{-1 - z - d - z d\, m^{(i)}(z) + Z_i}\right| \le C E^{-1},$$

and $G_{ij}$ with (2.6):

$$|G_{ij}| \le E^{-1} \left(\frac{\varphi^{C_\zeta}}{N\eta} + E^{-1} N^{-1/2}\right), \tag{4.27}$$

where we used (4.25), (4.26), the fact that $E$ is large enough, and $\eta = E N^{-2/3}$. Furthermore,

$$\big|m^{(i)} - m\big| \le \frac{1}{N} \left|\sum_j \frac{G_{ji} G_{ij}}{G_{ii}}\right| \le E^{-1} \left(\frac{\varphi^{C_\zeta}}{N\eta} + E^{-1} N^{-1/2}\right)^2,$$

where we used (2.6):

$$\left|\frac{G_{ji}}{G_{ii}}\right| = \left|z G^{(i)}_{jj}\, \langle x_i, \mathcal G^{(i,j)} x_j \rangle\right| \le E^{-1} N^{-1/2} + \frac{\varphi^{C_\zeta}}{N\eta}. \tag{4.28}$$


Using these bounds,

$$G_{ii} = \frac{1}{1 - z - d - z d m} + E^{-1}\, O(m^{(i)} - m) + E^{-2}\, O(Z_i).$$

Summing over the $G_{ii}$,

$$m = \frac{1}{1 - z - d - z d m} + O(E^{-2}) \left(\frac{\varphi^{C_\zeta}}{N\eta} + E^{-1} N^{-1/2}\right)^2 + O(E^{-2})\, [Z]. \tag{4.29}$$

Since the real part of $1 - z - d - z d m$ is much larger than its imaginary part,

$$\Im\, \frac{1}{1 - z - d - z d m} \le C E^{-2} \eta + \frac{1}{2}\, \Im\, m. \tag{4.30}$$

Together with (4.29),

$$\Im\, m \le C E^{-2}\eta + E^{-1}\left(\frac{\varphi^{C_\zeta}}{N\eta} + E^{-1}N^{-1/2}\right)^2 \le \left(\frac{N\eta^2}{E^2} + \frac{\eta N^{1/2}}{E^2} + \frac{\varphi^{C_\zeta}}{E}\right)\frac{1}{N\eta}. \tag{4.31}$$

If $E \ge N^\varepsilon$ for some $\varepsilon > 0$, we have

$$\Im\, m \ll \frac{1}{N\eta}. \tag{4.32}$$

With (4.9), this implies that with $\zeta$-high probability there are no eigenvalues in $[E - \eta, E + \eta]$; i.e., there are no eigenvalues larger than $N^\varepsilon$. It only remains to prove (4.32) for $K\lambda_+ \le E \le N^\varepsilon$. Since we have just proved that $\max_j \lambda_j \le N^\varepsilon$, we have

$$|G_{ii}| \ge N^{-2\varepsilon}.$$

Therefore, applying (3.21) and (3.19) with the choices

$$X = N^\varepsilon \left(N^{-1/2} + \frac{\varphi^{C_\zeta}}{N\eta}\right), \qquad Y = N^{2\varepsilon}, \qquad p = N^\varepsilon,$$

and using (4.24), (4.27), (4.26), (4.28), we have

$$O([Z]) \le N^{C\varepsilon} \left(N^{-1/2} + \frac{\varphi^{C_\zeta}}{N\eta}\right)^2.$$

Inserting this in (4.29), with (4.25) and (4.30), we obtain (4.32). Again with (4.9), this implies that with $\zeta$-high probability there are no eigenvalues in $[K\lambda_+, N^\varepsilon]$.
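The self-consistent equation $m = (1 - z - d - zdm)^{-1}$ used around (4.29) is the Marcenko–Pastur fixed-point equation, equivalently $zd\,m^2 + (z + d - 1)m + 1 = 0$. It can be checked against a direct simulation (an illustration with loose tolerance, not part of the proof; the normalization $\mathbb{E}\, x_{ij}^2 = 1/M$ and $d = N/M < 1$ are as in the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 1000, 2000                                # d = N/M = 1/2
d = N / M
X = rng.standard_normal((M, N)) / np.sqrt(M)    # E x_ij^2 = 1/M
lam = np.linalg.eigvalsh(X.T @ X)

z = 1.5 + 0.1j   # a bulk point: [lambda_-, lambda_+] ~ [0.086, 2.914]
m_emp = np.mean(1.0 / (lam - z))

# Solve z*d*m^2 + (z + d - 1)*m + 1 = 0; the physical branch has Im m > 0.
roots = np.roots([z * d, z + d - 1, 1.0])
m_mp = roots[np.argmax(roots.imag)]

assert m_mp.imag > 0
assert abs(m_emp - m_mp) < 0.05
```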


Step four of the proof of (1.19): Finally, we prove the last component of (1.19): in the case $d > 1$, i.e., $N > M$, we have $\lambda_M \ge \lambda_-/5$. Since 5 plays no special role in the previous arguments, it only remains to prove that for some large $K$ the following bound holds with $\zeta$-high probability:

$$\lambda_M \ge \lambda_-/K. \tag{4.33}$$

Let

$$z = E + i\eta, \qquad 0 \le E \le \lambda_-/K, \qquad \eta = N^{-1/2 - \varepsilon} \tag{4.34}$$

for some small enough $\varepsilon > 0$. Since we have proved that among the $\lambda_i$, $i \le M$, there are at most $\varphi^{O(1)}$ eigenvalues less than $\lambda_-/K$, we have for some $C, c > 0$ (here the term $\frac{\varphi^{C_\zeta}}{N\eta}$ is contributed by these $\varphi^{O(1)}$ eigenvalues)

$$\frac{1}{N}\, \Im\, \operatorname{Tr} G(z) \le C\eta + \frac{\varphi^{C_\zeta}}{N\eta}, \qquad c - \frac{\varphi^{C_\zeta}}{N\eta} \le \frac{1}{N}\, \Re\, \operatorname{Tr} G(z) \le C + \frac{\varphi^{C_\zeta}}{N\eta}, \tag{4.35}$$

and the same bounds hold for $\mathcal G^{(T)}$ with $|T| = O(1)$. Then, using Lemma 1.4,

$$|Z_i| \le |z| \left(N^{-1/2} + \frac{\varphi^{C_\zeta}}{N\eta}\right) \le |z| N^{-1/2 + 2\varepsilon}, \qquad \big|\langle x_i, \mathcal G^{(i,j)} x_j \rangle\big| \le N^{-1/2} + \frac{\varphi^{C_\zeta}}{N\eta} \le N^{-1/2 + 2\varepsilon}. \tag{4.36}$$

First we estimate $G_{ii}$: with (2.5), we obtain

$$G_{ii} = \left(-z - z d\, \frac{1}{N}\, \operatorname{Tr} \mathcal G^{(i)}(z) + Z_i\right)^{-1}. \tag{4.37}$$

Then, using (4.35), we have

$$c|z|^{-1} \le |G_{ii}| \le C|z|^{-1}. \tag{4.38}$$

Similarly, with (2.6), we have

$$|G_{ij}| \le |z|^{-1} N^{-1/2 + C\varepsilon}. \tag{4.39}$$

As in (2.3), we have

$$\operatorname{Tr} \mathcal G^{(i)}(z) - \operatorname{Tr} G^{(i)}(z) = \frac{M - N + 1}{z} = \operatorname{Tr} \mathcal G(z) - \operatorname{Tr} G(z) + \frac{1}{z}.$$


Together with (4.37),

$$G_{ii} = \left(-z - z d\, \frac{1}{N}\, \operatorname{Tr} G(z) - z d \left(m^{(i)} - m - \frac{1}{Nz}\right) + Z_i\right)^{-1}.$$

Using the fact that $c|z| \le \big|{-z - zd\, \frac{1}{N} \operatorname{Tr} G(z)}\big| \le C|z|$, (4.36), and $|m^{(i)} - m| \le (N\eta)^{-1}$, we can sum up the $G_{ii}$, with a Taylor expansion, and obtain

$$m = \frac{1}{1 - z - d - z d m(z)} + \delta, \qquad \delta := |z|^{-1}\, O\!\left(\frac{1}{N}\sum_i \big(m^{(i)} - m\big) + (Nz)^{-1}\right) + |z|^{-2}\, O([Z]) + |z|^{-1} N^{-1 + C\varepsilon}. \tag{4.40}$$

Similarly, by estimating $G_{ii} - G_{jj}$, we have

$$|G_{ii} - m| \le |z|^{-1} N^{-1/2 + C\varepsilon}.$$

First we estimate $m^{(i)} - m$ in (4.40):

$$m^{(i)} - m = \frac{-1}{N} \sum_j \frac{G_{ji} G_{ij}}{G_{ii}} = \frac{-1}{N}\, \frac{[G^2]_{ii}}{G_{ii}} = \frac{-1}{N m}\, [G^2]_{ii} + O(|z| N^{-3/2 + C\varepsilon}).$$

Averaging $m^{(i)} - m$, we obtain

$$\frac{1}{N} \sum_i \big(m^{(i)} - m\big) = \frac{-1}{N^2 m}\, \operatorname{Tr}[G^2] + O(|z| N^{-5/2 + C\varepsilon}). \tag{4.41}$$

Since we have proved that there are at most $\varphi^{O(1)}$ non-zero eigenvalues less than $\lambda_-/K$, we have

$$\operatorname{Tr}[G^2] = \sum_\alpha \frac{1}{(\lambda_\alpha - z)^2} = \frac{N - M}{z^2} + O(\varphi^{C_\zeta})\, \eta^{-2} + O(N). \tag{4.42}$$

These three terms come from the zero eigenvalues, the small eigenvalues, and the normal eigenvalues (i.e., those around $[\lambda_-, \lambda_+]$), respectively. We denote these three parts of the spectrum by $T_0$, $T_s$ and $T_n$. Similarly, we have

$$N m = \operatorname{Tr}[G] = \frac{N - M}{-z} + O(\varphi^{C_\zeta})\, \eta^{-1} + O(N) = \frac{N - M}{-z} \left(1 + O\!\left(\frac{z}{N\eta}\right) + O(z)\right). \tag{4.43}$$


(Note: here $z \le O(1)$ is small enough.) Moreover,

$$(G^2)_{ii} = \sum_\alpha \frac{|u_\alpha(i)|^2}{(\lambda_\alpha - z)^2} \le \frac{C}{|z|^2} \sum_{\alpha \in T_0} |u_\alpha(i)|^2 + \frac{C}{\eta^2} \sum_{\alpha \in T_s} |u_\alpha(i)|^2 + C \sum_{\alpha \in T_n} |u_\alpha(i)|^2.$$

The last bound implies

$$\sum_i (G^2)_{ii} \le \frac{C N}{|z|^2} + O(\varphi^{C_\zeta})\, \eta^{-2} + O(N).$$

Together with (4.41), we have

$$\frac{1}{N} \sum_i \big(m^{(i)} - m\big) = \frac{-1}{N^2 m}\, \operatorname{Tr}[G^2] + O(|z|^{-1} N^{-3/2 + C\varepsilon}). \tag{4.44}$$

Using (4.42) and (4.43), we have

$$\frac{-1}{N m}\, \operatorname{Tr}[G^2] = \frac{1}{z} + O(z N^{2\varepsilon}) + O\!\left(\frac{1}{N\eta}\right) + O(1). \tag{4.45}$$

Now, combining (4.44) and (4.45) with (4.40), we obtain

$$\delta \le |z|^{-1}\, O\!\left(|z|^{-1} N^{-3/2 + C\varepsilon} + N^{-1 + C\varepsilon}\right) + |z|^{-2}\, O([Z]). \tag{4.46}$$

Now we apply Lemma 3.3 to estimate $[Z]$, choosing $X = N^{-1/2 + C\varepsilon}$, $Y = C|z|$ and $p = N^\varepsilon$; using (4.36), (4.38) and (4.39), we have

$$|z|^{-2}\, |[Z]| \le |z|^{-1} N^{-1 + C\varepsilon}.$$

Together with (4.46), we have obtained

$$\delta \le |z|^{-1}\, O\!\left(|z|^{-1} N^{-3/2 + C\varepsilon} + N^{-1 + C\varepsilon}\right).$$

We note that if $m = \frac{1}{1 - z - d - zdm(z)} + \delta$, then

$$m - m_W = \frac{1}{1 - z - d - z d m(z)} - \frac{1}{1 - z - d - z d m_W(z)} + \delta,$$

which implies that

$$\left(1 - \frac{zd}{\big(1 - z - d - zdm(z)\big)\big(1 - z - d - zdm_W(z)\big)}\right)(m - m_W) = \delta. \tag{4.47}$$


As above, we have $c|z| \le |1 - z - d - z d m(z)|,\ |1 - z - d - z d m_W(z)| \le C|z|$ for small enough $z \le O(1)$. Therefore,

$$|m - m_W| \le |z\delta|.$$

Using (4.47), we have

$$|m - m_W| \le O\!\left(|z|^{-1} N^{-3/2 + C\varepsilon} + N^{-1 + C\varepsilon}\right) \ll (N\eta)^{-1}.$$

Furthermore, it is easy to prove that

$$\Im\left(m_W - \frac{1 - d^{-1}}{-z}\right) = O(\eta) \ll (N\eta)^{-1}.$$

Together with $\operatorname{Tr} \mathcal G = \operatorname{Tr} G - z^{-1}(N - M)$, we have

$$\Im\, \operatorname{Tr} \mathcal G(z) \ll \frac{1}{\eta}.$$

As in (4.9), this yields $\lambda_\alpha \notin [E - \eta, E + \eta]$ for all $E \in [0, \lambda_-/K]$ with large enough $K = O(1)$, which completes the proof of (4.33) and of Theorem 1.2.

4.2. Proof of Theorem 1.3. First, we prove (1.22). Recall (4.10) and the fact that there are no eigenvalues in $(0, \lambda_-/4] \cup [4\lambda_+, +\infty)$. We have that

$$\max_{E \in \mathbb{R}} \big|n(E) - n_W(E)\big| \le \frac{C (\log N)\, \varphi^{C_\zeta}}{N} \tag{4.48}$$

holds with $\zeta$-high probability. The supremum over $E$ follows from a standard argument for extremely small events, and we omit the details.

Now we turn to the proof of (1.21). The proof is very similar to the one for generalized Wigner matrices in [11]; to keep the paper self-contained, we repeat the argument here. By symmetry, we may assume that $1 \le j \le N/2$; let $E = \gamma_j$ and $E' = \lambda_j$. Setting $t_N = (\log N)\varphi^{C_\zeta}$ for brevity, from (4.48) we have

$$n_W(E) = n(E') = n_W(E') + O(t_N/N). \tag{4.49}$$

Clearly $E \ge \lambda_C := (\lambda_+ + 3\lambda_-)/4$, and using (4.48), $E' \ge \lambda_C$ also holds with overwhelming probability. First, using (1.19) and

$$n_W(x) \sim (\lambda_+ - x)^{3/2} \qquad \text{for } \lambda_C \le x \le \lambda_+, \tag{4.50}$$


i.e.,

$$n_W(E) = n_W(\gamma_j) = \frac{j}{N} \sim (\lambda_+ - E)^{3/2},$$

we know that (1.21) holds (with a possibly increased power) if

$$E, E' \ge \lambda_+ - t_N N^{-2/3}.$$

Hence, we can assume that one of $E$ and $E'$ lies in the interval $[\lambda_C, \lambda_+ - t_N N^{-2/3}]$. With (4.50), this assumption implies that at least one of $n_W(E)$ and $n_W(E')$ is larger than $t_N^{3/2}/N$. Inserting this information into (4.49), we obtain that both $n_W(E)$ and $n_W(E')$ are positive and

$$n_W(E) = n_W(E') \big(1 + O(t_N^{-1/2})\big);$$

in particular, $\lambda_+ - E \sim \lambda_+ - E'$. Using that $n_W'(x) \sim (\lambda_+ - x)^{1/2}$ for $\lambda_C \le x \le \lambda_+$, we obtain that $n_W'(E) \sim n_W'(E')$, and in fact $n_W'(E)$ is comparable with $n_W'(E'')$ for any $E''$ between $E$ and $E'$. Then, by Taylor expansion, we have

$$|n_W(E') - n_W(E)| \sim |n_W'(E)|\, |E' - E|. \tag{4.51}$$

Since $n_W'(E) = \varrho_W(E) \sim \sqrt{\kappa}$ and $n_W(E) \sim \kappa^{3/2}$, and moreover $n_W(E) = j/N$ since $E = \gamma_j$, we obtain from (4.49) and (4.51) that

$$|E' - E| \le \frac{C |n_W(E') - n_W(E)|}{n_W'(E)} \le \frac{C t_N}{N\, n_W'(E)} \le \frac{C t_N}{N (n_W(E))^{1/3}} \le \frac{C t_N}{N^{2/3}\, j^{1/3}},$$

which proves (1.21), again after increasing the power.

5. Universality of eigenvalues in the bulk. In this section we prove Theorem 1.6. As mentioned in the introduction, our arguments are valid for both real and complex valued entries. First, we consider a flow of random matrices $X_t$ satisfying the matrix-valued stochastic differential equation

$$dX_t = \frac{1}{\sqrt{M}}\, d\beta_t - \frac{1}{2} X_t\, dt, \tag{5.1}$$

where $\beta_t$ is a real matrix-valued process whose elements are standard real-valued independent Brownian motions, and the initial condition $X_0 = X = [x_{ij}]$ satisfies (1.1) and (1.2). For any fixed $t > 0$, the distribution of $X_t$ coincides with that of

$$X_t \stackrel{d}{=} e^{-t/2} X_0 + (1 - e^{-t})^{1/2}\, V, \tag{5.2}$$
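The identity in law (5.2) reflects that the Ornstein–Uhlenbeck flow (5.1) preserves the first two moments of the entries: $e^{-t} \cdot \frac{1}{M} + (1 - e^{-t}) \cdot \frac{1}{M} = \frac{1}{M}$. A quick Monte Carlo check (an illustration only; the seed makes it deterministic):

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, t = 400, 200, 0.7
X0 = rng.standard_normal((M, N)) / np.sqrt(M)   # Var(x_ij) = 1/M
V = rng.standard_normal((M, N)) / np.sqrt(M)    # independent Gaussian matrix
Xt = np.exp(-t / 2) * X0 + np.sqrt(1 - np.exp(-t)) * V

# Mean stays 0 and the entry variance stays 1/M along the flow.
assert abs(Xt.mean()) < 1e-3
assert abs(M * Xt.var() - 1.0) < 0.05
```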


where $V$ is a real matrix with Gaussian entries of mean 0 and variance $1/M$. The singular values of the matrix $X_t$ also satisfy a system of coupled SDEs, called the Dyson Brownian motion (with a drift in our case). More precisely, let $\beta \ge 1$ and

$$\mu = \mu_N(dw) = \frac{e^{-H_W(w)}}{Z_\beta}\, dw, \qquad H_W(w) = \beta \left[\frac{N}{2d} \sum_{i=1}^N w_i^2 - \sum_{1 \le i < j \le N} \log |w_j^2 - w_i^2| - \left(N\Big(\frac{1}{d} - 1\Big) + 1 - \beta^{-1}\right) \sum_{i=1}^N \log |w_i|\right]. \tag{5.3}$$

For any $\beta \ge 1$, we define the $n$-point correlation functions (marginals) of the probability measure $f_t\, d\mu$ by

$$p^{(n)}_{t,N}(w_1, w_2, \ldots, w_n) = \int_{\mathbb{R}^{N-n}} f_t(w)\, \mu(w)\, dw_{n+1} \cdots dw_N. \tag{5.6}$$

With a slight abuse of notation, we will sometimes also use $\mu$ to denote the density of the measure $\mu$ with respect to the Lebesgue measure. The correlation functions of the equilibrium measure are denoted by

$$p^{(n)}_{\mu,N}(w_1, w_2, \ldots, w_n) = \int_{\mathbb{R}^{N-n}} \mu(w)\, dw_{n+1} \cdots dw_N. \tag{5.7}$$

Now we are ready to state the strong local ergodicity of the Dyson Brownian motion, which asserts that the correlation functions $p^{(n)}_{t,N}$ of the Dyson Brownian motion and those of the equilibrium measure $p^{(n)}_{\mu,N}$ are close:


Theorem 5.1. Let $X = [x_{ij}]$ with the entries $x_{ij}$ satisfying (1.1) and (1.2). Let $E \in [\lambda_- + c, \lambda_+ - c]$ for some $c > 0$. Then for any $\varepsilon' > 0$, $\delta > 0$, $0 < b = b_N < c/2$, any integer $n \ge 1$ and any compactly supported continuous test function $O : \mathbb{R}^n \to \mathbb{R}$, we have

$$\sup_{t \ge N^{-1+\delta+\varepsilon'}} \left| \int_{\mathbb{R}^n} d\alpha_1 \cdots d\alpha_n\, O(\alpha_1, \ldots, \alpha_n) \int_{E-b}^{E+b} \frac{dE'}{2b}\, \frac{1}{\varrho_W(E)^n} \left(p^{(n)}_{t,N} - p^{(n)}_{\mu,N}\right) \left(E' + \frac{\alpha_1}{N\varrho_W(E)}, \ldots, E' + \frac{\alpha_n}{N\varrho_W(E)}\right) \right| \le C_n N^{2\varepsilon'} \left(b^{-1} N^{-1+\varepsilon'} + b^{-1/2} N^{-\delta/2}\right), \tag{5.8}$$

where $p^{(n)}_{t,N}$ and $p^{(n)}_{\mu,N}$, defined in (5.6)–(5.7), are the correlation functions of the eigenvalues of the Dyson Brownian motion flow (5.2) and those of the equilibrium measure, respectively, and $C_n$ is a constant.

Remark 5.2. Notice that if we choose $\delta = 1 - 2\varepsilon'$, and thus $t = N^{-\varepsilon'}$, then we can set $b \sim N^{-1+8\varepsilon'}$ so that the right-hand side of (5.8) vanishes as $N \to \infty$. From the MP law we know that the spacing of the eigenvalues in the bulk is of order $N^{-1}$, and thus Theorem 5.1 yields universality with essentially no averaging in $E$.

Proof of Theorem 5.1. The proof follows from the main result in [8] (Theorem 2.1), which states that the local ergodicity of the Dyson Brownian motion ((5.8)) holds for $t \ge N^{-2a+\delta}$ for any $\delta > 0$, provided that there exists an $a > 0$ such that

$$\sup_{t \ge N^{-2a}} \frac{1}{N} \sum_{j=1}^N \mathbb{E}\, \big(\lambda_j(t) - \gamma_j\big)^2 \le C N^{-1-2a} \tag{5.9}$$

holds with a constant $C$ uniformly in $N$. Here $\sqrt{\lambda_j(t)}$ is the $j$-th singular value of the matrix $X_t$ given in (5.2). The condition (5.9) is a simple consequence of (1.21) as long as $a < 1/2$. Strictly speaking, there are four assumptions in the hypothesis of Theorem 2.1 in [8]. Assumptions I and II of Theorem 2.1 in [8] are automatically satisfied in the setting where the Dyson Brownian motion is generated by flows on the covariance matrix ensembles. Assumption IV of [8] states that the local density of the singular values of $X_t$ on scales larger than $N^{-1+c}$ is bounded above by a constant. As in [8], this follows from the large deviation estimate (1.17), since a bound on $\Im\, m(z)$, $z = E + i\eta$, can easily be used to prove an upper bound on the local density of eigenvalues in a window of size $\eta$ around $E$. As usual, the additional condition in [8] on the entropy, $S_\mu(f_{t_0}) \le C N^m$ for some constant $m$ with $t_0 = N^{-2a}$, holds due to the regularization property of the Ornstein–Uhlenbeck process. Thus, for a given $0 < \varepsilon' < 1$, choosing $a = 1/2 - \varepsilon'/2$ and $A = \varepsilon'$ in the second part of Theorem 2.1 in [8] and using (1.21), we obtain (5.9) and the proof is finished.


For any $\varepsilon > 0$, applying Theorem 5.1 with $\delta = 1 - 2\varepsilon$, $\varepsilon' = \varepsilon$ and $b = N^{-1+8\varepsilon}$, we obtain universality for all ensembles with the matrix elements distributed according to $M^{-1/2} \xi_t$ with

$$\xi_t = e^{-t/2} \xi_0 + (1 - e^{-t})^{1/2} \xi_G, \tag{5.10}$$

where the matrix $\xi_G$ has independent Gaussian entries with mean 0 and variance 1, $t \sim N^{-\varepsilon}$, and the initial condition $\xi_0$ has entries satisfying our conditions (1.1) and (1.2). In other words, for $t \sim N^{-\varepsilon}$ the random matrices $\xi_t$ distributed according to (5.10) have the same correlation functions as the matrix with Gaussian entries, averaged over an interval of length $O(N^{-1+8\varepsilon})$. Thus, in order to prove Theorem 1.6, it remains to find a random matrix $\widetilde\xi_t$ of the form (5.10) (with time $t = N^{-\varepsilon}$) whose eigenvalue correlation functions well approximate those of the spectrum of the given matrix $X$ satisfying (1.1) and (1.2). The requirements on the entries of the matrix $\widetilde\xi_t$ are just mean zero, variance one and subexponential decay; however, it turns out that for any fixed $X$ and $\varepsilon$, one may find a $\widetilde\xi_0$ such that $\widetilde\xi_t$ satisfies (5.10), with $t \sim N^{-\varepsilon}$, and the entries $(\widetilde\xi_t)_{ij}$ have mean 0, variance 1 and the same third moment as those of the initial condition $X$. Moreover, $\widetilde\xi_t$ can be chosen in such a way that its entries have fourth moment very close to those of $X$. More precisely, Lemma 3.4 in [10] yields that for any given matrix $X$ satisfying (1.1) and (1.2) and $t \sim N^{-\varepsilon}$, there exists a matrix $\widetilde\xi_t$ of the form (5.10) such that, for $1 \le k \le 3$,

$$\mathbb{E}\, \big(\sqrt{M}\, x_{ij}\big)^k = \mathbb{E}\, \big(\widetilde\xi_t\big)^k_{ij}, \qquad \left|\mathbb{E}\, \big(\sqrt{M}\, x_{ij}\big)^4 - \mathbb{E}\, \big(\widetilde\xi_t\big)^4_{ij}\right| \le C t \sim N^{-\varepsilon}.$$

Now, to finish the proof of Theorem 1.6, it only remains to show that the correlation functions of the eigenvalues of two matrix ensembles at a fixed energy (i.e., for a fixed value of $E = \Re(z)$) are identical up to the scale $1/N$, provided that the first four moments of the matrix elements of the two ensembles are almost identical. To achieve this, as shown for Wigner matrices in [9] (see Sections 8.6–8.13 of [9]), it is enough to show that the corresponding Green functions of the two ensembles are close. This is the content of the following theorem, which we call, following [9], the Green function comparison theorem. Let $X^v = [x^v_{ij}]$, with the entries $x^v_{ij}$ satisfying (1.1) and (1.2), and let $G^v(z) = (X^{v\dagger} X^v - z)^{-1} = (H^v - z)^{-1}$ be the Green function corresponding to $X^v$. Define the matrix $X^w$ and the Green function $G^w(z)$ analogously.

Theorem 5.3. Assume that the first three moments of $x^v_{ij}$ and $x^w_{ij}$ are identical, i.e.,

$$\mathbb{E}\, (x^v_{ij})^u = \mathbb{E}\, (x^w_{ij})^u, \qquad 0 \le u \le 3,$$

and that the difference between the fourth moments of $x^v_{ij}$ and $x^w_{ij}$ is much less than 1, say

$$\left|\mathbb{E}\, \big(\sqrt{M}\, x^v_{ij}\big)^4 - \mathbb{E}\, \big(\sqrt{M}\, x^w_{ij}\big)^4\right| \le N^{-\delta}, \tag{5.11}$$


for some given $\delta > 0$. Let $\varepsilon > 0$ be arbitrary and choose an $\eta$ with $N^{-1-\varepsilon} \le \eta \le N^{-1}$. For any sequence of positive integers $k_1, \ldots, k_n$, set complex parameters

$$z^m_j = E^m_j \pm i\eta, \qquad j = 1, \ldots, k_m, \quad m = 1, \ldots, n,$$

with an arbitrary choice of the $\pm$ signs and $\lambda_- + \kappa \le |E^m_j| \le \lambda_+ - \kappa$ for some $\kappa > 0$. Let $F(x_1, \ldots, x_n)$ be a function such that for any multi-index $\alpha = (\alpha_1, \ldots, \alpha_n)$ with $1 \le |\alpha| \le 5$ and any $\varepsilon' > 0$ sufficiently small, we have

$$\max \left\{ |\partial^\alpha F(x_1, \ldots, x_n)| : \max_j |x_j| \le N^{\varepsilon'} \right\} \le N^{C_0 \varepsilon'}, \tag{5.12}$$

$$\max \left\{ |\partial^\alpha F(x_1, \ldots, x_n)| : \max_j |x_j| \le N^{2} \right\} \le N^{C_0}, \tag{5.13}$$

for some constant $C_0$. Then there is a constant $C_1$, depending on $\alpha$, $\sum_i k_i$ and $C_0$, such that for any $\eta$ with $N^{-1-\varepsilon} \le \eta \le N^{-1}$ and any choice of the signs in the imaginary parts of the $z^m_j$,

$$\left| \mathbb{E}\, F\left(\frac{1}{N} \operatorname{Tr} \prod_{j=1}^{k_1} G^v(z^1_j),\ \ldots,\ \frac{1}{N} \operatorname{Tr} \prod_{j=1}^{k_n} G^v(z^n_j)\right) - \mathbb{E}\, F\big(G^v \to G^w\big) \right| \le C_1 N^{-1/2 + C_1 \varepsilon} + C_1 N^{-\delta + C_1 \varepsilon}, \tag{5.14}$$

where in the second term the arguments of F are changed from the Green functions of H v to H w and all other parameters remain unchanged. Once again we note the equivalence of (5.8) and (6.14) as discussed in [9] (Sections 8.68.13). The only difference is that in [9], the equivalence is proved for Wigner matrices, but the arguments are easily adapted for covariance matrices. Thus to complete the proof of Theorem 1.6, all that remains to be done is the proof of Theorem 5.3 which we give below. Proof of Theorem 5.3. The proof is very similar to Lemma 2.3 of [9]. The only differences are a few simple linear algebraic identities. Therefore, we will only prove the simple case of k = 1 and n = 1. Fix a bijective ordering map on the index set of the independent matrix elements, n o φ : {(i, j) : 1 6 i 6 M, 1 6 j 6 N} → 1, . . . , MN , and define the family of random matrices Xγ , 0 6 γ 6 MN, [Xγ ]ij = [X v ]ij

φ(i, j) > γ,

46

= [X w ]ij

φ(i, j) 6 γ .

In particular we have X0 = X v and XM N = X w . Denote Hγ , Gγ and Gγ as Hγ = Xγ† Xγ ,

Gγ = (Hγ − z)−1 ,

Gγ = (Xγ Xγ† − z)−1 .

First, using the delocalization result (1.20) and the rigidity of eigenvalues (1.21), it is easy to have the following estimate on the matrix elements of the resolvent: [Gγ (z)] + [Gγ (z)] 6 N Cε max max max max (5.15) kl kl −1−ε γ

kl

η>N

κ>c

holds with ζ-high probability for any ζ = O(1). For instance, for γ = 0, we have the P vα † vα identity G0 (z) = N α=1 λα −z where λα , vα are the eigenvalues and eigenvectors of H0 . By the delocalisation result (1.20) we obtain N 1 ϕCζ X . |G0 (z)| 6 N α=1 |λα − z|

We write the above sum as X α

XX X 1 1 1 |Ik | = 6 , |λα − z| |λα − z| |λα − E| + η k α∈I k

(5.16)

k

where Ik is the set that N −1 2K−1 6 (λα − E) 6 N −1 2K . By the rigidity of eigenvalues we obtain that |IK | 6 C2K with ζ-high probability. Substituting this bound in (5.16) yields the estimate (5.15). For 1 6 i 6 N, it is easy to check that (i) Gkl

(Gxi )k (x†i G)l = Gkl + , 1 − hxi , G(z) xi i

Gkl =

(i) Gkl

(G (i) xi )k (x†i G (i) )l − . 1 + hxi , G (i) (z) xi i

(5.17)

With (2.5), we have −1 , hxi , G(z) xi i = 1 + z Gii z Gii G (i) xi = −z Gii G (i) xi . Gxi = 1 + hxi , G (i) (z)xi i

hxi , G (i) (z) xi i = −1 +

(5.18) (5.19)

47

Furthermore, with (2.6), we have
\[
\langle x_i, \mathcal{G}^{(i)} x_j \rangle = \langle x_i, \mathcal{G}^{(ij)} x_j \rangle - \frac{\langle x_i, \mathcal{G}^{(ij)} x_j \rangle\, \langle x_j, \mathcal{G}^{(ij)} x_j \rangle}{1 + \langle x_j, \mathcal{G}^{(ij)} x_j \rangle}
= \frac{\langle x_i, \mathcal{G}^{(ij)} x_j \rangle}{1 + \langle x_j, \mathcal{G}^{(ij)} x_j \rangle}
= -z\, G^{(i)}_{jj}\, \langle x_i, \mathcal{G}^{(ij)} x_j \rangle \tag{5.20}
\]
and similarly $\langle x_i, \mathcal{G} x_j \rangle = -z\, G_{ii}\, \langle x_i, \mathcal{G}^{(i)} x_j \rangle$, which implies that
\[
\langle x_i, \mathcal{G}^{(i)} x_j \rangle = -\frac{G_{ij}}{G_{ii}}, \qquad \langle x_i, \mathcal{G} x_j \rangle = z\, G_{ij}. \tag{5.21}
\]
Let $\mathbf{x}_i$ be the $i$-th row of $X$ (recall that $x_i$ is the $i$-th column of $X$). By symmetry, the above identities also hold if one switches $G$ and $\mathcal{G}$, and $x_i$ and $\mathbf{x}_i$. Combining the above identities with (5.15), we obtain
\[
\max_{\gamma}\ \max_{kl}\ \max_{\eta \ge N^{-1-\varepsilon}}\ \max_{|\kappa| \ge c} \Big( \big|[\mathcal{G}_\gamma(z)]_{kl}\big| + \big|[X_\gamma^\dagger \mathcal{G}_\gamma(z)]_{kl}\big| + \big|[\mathcal{G}_\gamma X_\gamma(z)]_{kl}\big| + \big|[X_\gamma^\dagger \mathcal{G}_\gamma X_\gamma(z)]_{kl}\big| \Big) \le N^{C\varepsilon}. \tag{5.22}
\]
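The identities (5.18)-(5.21) are exact linear-algebra facts for any fixed matrix, so they can be checked numerically on a small example; the sizes, the spectral parameter, and the Gaussian entries below are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 7, 5
X = rng.standard_normal((M, N)) / np.sqrt(M)
z = 0.8 + 0.3j

H = X.conj().T @ X                                      # N x N
G = np.linalg.inv(H - z * np.eye(N))                    # G = (X^†X - z)^{-1}
calG = np.linalg.inv(X @ X.conj().T - z * np.eye(M))    # 𝒢 = (XX^† - z)^{-1}

# X^† 𝒢 X = I + z G: the diagonal gives <x_i, 𝒢 x_i> = 1 + z G_ii as in
# (5.18), and the off-diagonal gives <x_i, 𝒢 x_j> = z G_ij as in (5.21).
lhs = X.conj().T @ calG @ X
rhs = np.eye(N) + z * G

# (5.18)-(5.19) with the i-th column removed: 𝒢^{(i)} is the resolvent of
# X X^† built from X without column i.
i = 2
Xi = np.delete(X, i, axis=1)
calGi = np.linalg.inv(Xi @ Xi.conj().T - z * np.eye(M))
s518 = X[:, i] @ calGi @ X[:, i]          # <x_i, 𝒢^{(i)} x_i>
rhs518 = -1.0 - 1.0 / (z * G[i, i])
lhs519 = calG @ X[:, i]                   # 𝒢 x_i
rhs519 = -z * G[i, i] * (calGi @ X[:, i])
```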

Consider the telescopic sum of differences of expectations
\[
\mathbb{E}\, F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{\mathbf v}-z}\Big) - \mathbb{E}\, F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{\mathbf w}-z}\Big)
= \sum_{\gamma=1}^{MN} \Bigg[ \mathbb{E}\, F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_{\gamma-1}-z}\Big) - \mathbb{E}\, F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_{\gamma}-z}\Big) \Bigg]. \tag{5.23}
\]
Let $E^{(ij)}$ denote the matrix whose matrix elements are zero everywhere except at the $(i,j)$ position, where it is $1$, i.e., $E^{(ij)}_{k\ell} = \delta_{ik}\delta_{j\ell}$. Fix a $\gamma \ge 1$ and let $(i,j)$ be determined by $\phi(i,j) = \gamma$. We will compare $H_{\gamma-1}$ with $H_\gamma$. Note that $X_{\gamma-1}$ and $X_\gamma$ differ only in the $(i,j)$ matrix element, and they can be written as
\[
X_{\gamma-1} = Q + V, \quad V := x^{\mathbf v}_{ij} E^{(ij)}, \qquad X_\gamma = Q + W, \quad W := x^{\mathbf w}_{ij} E^{(ij)},
\]
with a matrix $Q$ that has a zero matrix element at the $(i,j)$ position. Define the Green functions
\[
R = \frac{1}{Q^\dagger Q - z}, \qquad S = \frac{1}{H_{\gamma-1} - z}, \qquad T = \frac{1}{H_\gamma - z}.
\]

The following lemma is at the heart of the Green function comparison method first established in [9] (and then also used in [10, 11, 5]); it states that the difference of smooth functionals of the Green functions of two matrices which differ in a single entry can be bounded in terms of the first four moments of that entry.

Lemma 5.4. Let $m_k$ be the $k$-th moment of $\sqrt{M}\, x^{\mathbf v}_{ij}$. Then
\[
\mathbb{E}\Bigg[ F\Big(\frac{1}{N}\operatorname{Tr} S\Big) - F\Big(\frac{1}{N}\operatorname{Tr} R\Big) \Bigg] = A(Q, m_1, m_2, m_3) + O\big(N^{-5/2+C\varepsilon}\big) + \widetilde{A}(Q)\, m_4 \tag{5.24}
\]
for a functional $A(Q, m_1, m_2, m_3)$ which depends only on the distribution of $Q$ and on $m_1, m_2, m_3$. The constant $\widetilde{A}(Q)$ depends only on the distribution of $Q$ and satisfies the bound $|\widetilde{A}(Q)| \le N^{-2+C\varepsilon}$.

Before giving the proof of Lemma 5.4, let us use it to conclude the foregoing argument in the proof of Theorem 5.3. Note that $X_\gamma$ and $Q$ also differ in a single matrix element, and therefore applying Lemma 5.4 again yields
\[
\mathbb{E}\Bigg[ F\Big(\frac{1}{N}\operatorname{Tr} T\Big) - F\Big(\frac{1}{N}\operatorname{Tr} R\Big) \Bigg] = A(Q, m_1, m_2, m_3) + O\big(N^{-5/2+C\varepsilon}\big) + \widetilde{A}(Q)\, m'_4, \tag{5.25}
\]
where $m'_4$ is the fourth moment of $\sqrt{M}\, x^{\mathbf w}_{ij}$ (by hypothesis, the first three moments of $x^{\mathbf w}_{ij}$ are identical to those of $x^{\mathbf v}_{ij}$). Since $|m_4 - m'_4| \le N^{-\delta}$ by hypothesis, we have
\[
\Bigg| \mathbb{E}\, F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_\gamma - z}\Big) - \mathbb{E}\, F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_{\gamma-1} - z}\Big) \Bigg| \le C N^{-5/2+C\varepsilon} + C N^{-2-\delta+C\varepsilon}.
\]
After summing up in (5.23) we have thus proved that
\[
\Bigg| \mathbb{E}\, F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{\mathbf v} - z}\Big) - \mathbb{E}\, F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H^{\mathbf w} - z}\Big) \Bigg| \le C N^{-1/2+C\varepsilon} + C N^{-\delta+C\varepsilon},
\]
obtaining precisely what we set out to show in (5.14). The proof is easily generalized to functions of several variables. Thus, to conclude the proof of Theorem 5.3, we just need to give the proof of Lemma 5.4.

Proof of Lemma 5.4. We first claim that the estimate (5.15) holds for the Green function $R$ as well. To see this, we have, from the resolvent expansion,
\[
R = S + S(V^\dagger X + X^\dagger V + V^\dagger V)S + \ldots + \big[S(V^\dagger X + X^\dagger V + V^\dagger V)\big]^{9} S + \big[S(V^\dagger X + X^\dagger V + V^\dagger V)\big]^{10} R.
\]


Since the matrix $V$ has at most one non-zero entry, when computing the $(k,\ell)$ matrix element of this matrix identity, each term is a finite sum involving matrix elements of $S$, $XS$, $SX^\dagger$, $XSX^\dagger$ or $R$ (the latter only for the last term) and $x^{\mathbf v}_{ij}$. Using the bound (5.22) for the $S$ matrix elements, the subexponential decay of $x^{\mathbf v}_{ij}$ and the trivial bound $|R_{ij}| \le \eta^{-1}$, we obtain that the estimate (5.15) holds for $R$ as well. Similarly, by expanding $XR$, $RX^\dagger$ and $XRX^\dagger$, we can obtain (5.22) for $XR$, $RX^\dagger$, $XRX^\dagger$, $QR$, $RQ^\dagger$ and $QRQ^\dagger$.

Now we prove (5.24). By the resolvent expansion,
\[
S = R - R(V^\dagger Q + Q^\dagger V + V^\dagger V)R + \ldots - \big[R(V^\dagger Q + Q^\dagger V + V^\dagger V)\big]^{9} R + O(N^{-4}) \tag{5.26}
\]
holds with extremely high probability. Thus we may write
\[
\frac{1}{N}\operatorname{Tr} S = \frac{1}{N}\operatorname{Tr} R + \sum_{k \le 20} y_k + O(N^{-4}),
\]
where $y_k$ is the sum of those terms in (5.26) which contain exactly $k$ factors of $V$. Recall that $m_k$ is the $k$-th moment of $\sqrt{M}\, x_{ij}$, which is of order one if $k = O(1)$. The terms $y_k$ satisfy the bounds (with $K = (k_1, k_2, \ldots, k_n)$ and $|K| := \sum_i k_i$)
\[
|y_k| \le N^{C\varepsilon} N^{-k/2}, \qquad \mathbb{E}^{\mathbf v}\, y_{k_1} y_{k_2} \cdots y_{k_n} = N^{-|K|/2}\, m_{|K|}\, z_K(Q), \qquad |z_K(Q)| \le N^{C\varepsilon}, \tag{5.27}
\]
where $z_K(Q)$ depends only on the distribution of $Q$, and the last inequality holds with $\zeta$-high probability. Here $\mathbb{E}^{\mathbf v}$ is the expectation with respect to the distribution of the entries of the matrix $X^{\mathbf v}$. Then we have
\[
\mathbb{E}\, F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_{\gamma-1}-z}\Big) = \mathbb{E}\Bigg[ \sum_{n=0}^{4} \frac{1}{n!}\, F^{(n)}\Big(\frac{1}{N}\operatorname{Tr} R\Big) \Big( \sum_{k \le 20} y_k \Big)^{n} \Bigg] + O\big(N^{-5/2+C\varepsilon}\big). \tag{5.28}
\]
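The truncated expansion (5.26) is a finite Neumann series with an exact remainder term, which can be made concrete numerically; the toy sizes, the position of the swapped entry, and the perturbation magnitude below are our illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 6, 4
Q = rng.standard_normal((M, N)) / np.sqrt(M)
Q[1, 2] = 0.0                        # Q has a zero at the swapped position (i, j)
V = np.zeros((M, N))
V[1, 2] = 0.3 / np.sqrt(M)           # one-entry perturbation

z = 1.2 + 0.4j
R = np.linalg.inv(Q.conj().T @ Q - z * np.eye(N))
Xg = Q + V
S = np.linalg.inv(Xg.conj().T @ Xg - z * np.eye(N))

# Delta = H_{gamma-1} - Q^†Q = V^†Q + Q^†V + V^†V has rank at most 2.
Delta = V.conj().T @ Q + Q.conj().T @ V + V.conj().T @ V

# Exact finite Neumann expansion: S = sum_{k<=n} (-R Delta)^k R + (-R Delta)^{n+1} S.
n = 9
term = R.copy()
expansion = R.copy()
for _ in range(n):
    term = -R @ Delta @ term         # builds (-R Delta)^k R
    expansion = expansion + term
remainder = np.linalg.matrix_power(-R @ Delta, n + 1) @ S
```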

and with (5.27),
\[
\mathbb{E}\, F\Big(\frac{1}{N}\operatorname{Tr}\frac{1}{H_{\gamma-1}-z}\Big) = \mathbb{E}\Bigg[\sum_{n=0}^{4}\frac{1}{n!}\, F^{(n)}\Big(\frac{1}{N}\operatorname{Tr}R\Big)\sum_{k_1,\ldots,k_n} N^{-|K|/2}\, m_{|K|}\, z_K(Q)\Bigg] + O\big(N^{-5/2+C\varepsilon}\big) \tag{5.29}
\]
\[
= B + O\big(N^{-5/2+C\varepsilon}\big) + A(Q, m_1, m_2, m_3) + \widetilde{A}(Q)\, m_4,
\]
where $A(Q, m_1, m_2, m_3)$ depends only on the distribution of $Q$ and on $m_1, m_2, m_3$, and
\[
B = \mathbb{E}\Bigg[\sum_{n=0}^{4}\frac{1}{n!}\, F^{(n)}\Big(\frac{1}{N}\operatorname{Tr}R\Big)\sum_{\substack{k_1,\ldots,k_n:\ |K|\ge 5,\ k_i\le 20}} N^{-|K|/2}\, m_{|K|}\, z_K(Q)\Bigg],
\]
\[
\widetilde{A} = \mathbb{E}\Bigg[\sum_{n=0}^{4}\frac{1}{n!}\, F^{(n)}\Big(\frac{1}{N}\operatorname{Tr}R\Big)\sum_{k_1,\ldots,k_n:\ |K|=4} N^{-2}\, z_K(Q)\Bigg].
\]
Now it only remains to prove
\[
|B| \le O\big(N^{-5/2+C\varepsilon}\big), \qquad |\widetilde{A}| \le O\big(N^{-2+C\varepsilon}\big).
\]
Using the estimate (5.22) for $R$ and the derivative bounds (5.12) for the typical values of $\frac{1}{N}\operatorname{Tr}R$, we see that the $F^{(n)}\big(\frac{1}{N}\operatorname{Tr}R\big)$ ($n \le 4$) are bounded by $N^{C\varepsilon}$ with $\zeta$-high probability, where $C$ is an explicit constant. Similarly, $z_K$ ($k_i \le 20$) is also bounded by $N^{C\varepsilon}$ for some $C > 0$ with $\zeta$-high probability. Now define $\Xi_g$ as the good set on which these quantities are bounded by $N^{C\varepsilon}$. Furthermore, using (5.13) and the definition of $z_K$, we know that $F^{(n)}$ and $z_K$ are bounded by $N^{C}\big(1 + \big|\frac{1}{N}\operatorname{Tr}R\big|\big)^{C}$ for some $C > 0$ on $\Xi_g^{c}$. Since $\Xi_g^{c}$ has a very small probability, by (5.22) we have
\[
\widetilde{A} = \mathbb{E}\, \mathbf{1}_{\Xi_g} \Bigg[\sum_{n=0}^{4}\frac{1}{n!}\, F^{(n)}\Big(\frac{1}{N}\operatorname{Tr}R\Big)\sum_{k_1,\ldots,k_n:\ |K|=4} N^{-2}\, z_K(Q)\Bigg] + O\big(N^{-5/2+C\varepsilon}\big). \tag{5.30}
\]

Then, with the bounds on $F^{(n)}$ and $z_K$ on $\Xi_g$, we obtain $|\widetilde{A}| \le O(N^{-2+C\varepsilon})$. Similarly, with $m_{|K|} \le O(1)$, we have $|B| \le O(N^{-5/2+C\varepsilon})$. This completes the proof of Lemma 5.4 and thereby also finishes the proof of Theorem 5.3.

6. Universality of eigenvalues at the edge. In this section we are going to prove the edge universality stated in Theorem 1.7. The proof is based on Theorem 2.4 of [11], which is an analogous result for Wigner matrices. Here we consider the largest eigenvalue $\lambda_1$, but the same argument applies to the lowest non-zero eigenvalue as well. For the rest of this section, let us fix a constant $\zeta > 0$. For any $E_1 \le E_2$, let
\[
\mathcal{N}(E_1, E_2) := \#\{j : E_1 \le \lambda_j \le E_2\}
\]
denote the number of eigenvalues of the covariance matrix $\frac{1}{N} X^\dagger X$ in $[E_1, E_2]$, where $X$ is a random matrix whose entries satisfy (1.1) and (1.2). By Theorems 1.2 and 1.3 (rigidity of eigenvalues), there exists a positive constant $C_\zeta$ such that
\[
|\lambda_1 - \lambda_+| \le \varphi^{C_\zeta} N^{-2/3}, \tag{6.1}
\]
\[
\mathcal{N}\big(\lambda_+ - 2\varphi^{C_\zeta} N^{-2/3},\ \lambda_+ + 2\varphi^{C_\zeta} N^{-2/3}\big) \le \varphi^{2C_\zeta} \tag{6.2}
\]


holds with $\zeta$-high probability. Using these estimates, we can assume that $s$ in (1.28) satisfies
\[
-\varphi^{C_\zeta} \le s \le \varphi^{C_\zeta}. \tag{6.3}
\]
Set
\[
E_\zeta := \lambda_+ + 2\varphi^{C_\zeta} N^{-2/3} \tag{6.4}
\]
and for any $E \le E_\zeta$ define $\chi_E := \mathbf{1}_{[E, E_\zeta]}$ to be the characteristic function of the interval $[E, E_\zeta]$. For any $\eta > 0$ we define
\[
\theta_\eta(x) := \frac{\eta}{\pi(x^2 + \eta^2)} = \frac{1}{\pi}\,\Im\, \frac{1}{x - \mathrm{i}\eta} \tag{6.5}
\]
to be an approximate delta function on scale $\eta$. In the following elementary lemma we compare the sharp counting function $\mathcal{N}(E, E_\zeta) = \operatorname{Tr}\chi_E(H)$ with its approximation smoothed on scale $\eta$.

Lemma 6.1. For any $\varepsilon > 0$, set $\ell_1 := N^{-2/3-3\varepsilon}$ and $\eta := N^{-2/3-9\varepsilon}$. Then there exist constants $C, c$ such that for any $E$ satisfying
\[
|E - \lambda_+| \le \frac{3}{2}\,\varphi^{C_\zeta} N^{-2/3} \tag{6.6}
\]
with the $C_\zeta$ in (6.1)-(6.4), the bound
\[
\big|\operatorname{Tr}\chi_E(H) - \operatorname{Tr}\chi_E * \theta_\eta(H)\big| \le C\big(N^{-2\varepsilon} + \mathcal{N}(E - \ell_1, E + \ell_1)\big) \tag{6.7}
\]
holds with $\zeta$-high probability.
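As a numerical illustration of this smoothing, the convolution $\operatorname{Tr}\chi_E * \theta_\eta(H)$ has a closed form in arctangents and is close to the sharp eigenvalue count once $\eta$ is much smaller than the distance from $E$ to the spectrum; the "eigenvalues" below are a deterministic toy sequence of our own choosing, not the matrix model of the paper:

```python
import numpy as np

lam = np.linspace(0.5, 3.5, 50)      # toy eigenvalues, none within 0.03 of E
E, E_zeta, eta = 2.0, 4.0, 1e-6
n_eig = lam.size

# Sharp counting function N(E, E_zeta) = Tr chi_E(H).
sharp = np.sum((lam >= E) & (lam <= E_zeta))

# Smoothed counting function Tr (chi_E * theta_eta)(H): convolving the
# indicator with theta_eta(x) = eta / (pi (x^2 + eta^2)) gives arctangents.
smoothed = np.sum((np.arctan((lam - E) / eta)
                   - np.arctan((lam - E_zeta) / eta)) / np.pi)

# Identity behind (6.5): Tr theta_eta(H - E) = (n/pi) Im m(E + i eta).
m = np.mean(1.0 / (lam - (E + 1j * eta)))
trace_theta = np.sum(eta / (np.pi * ((lam - E) ** 2 + eta ** 2)))
```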



Proof of Lemma 6.1. From equations (6.13) and (6.17) of [11] we obtain
\[
\big|\operatorname{Tr}\chi_E(H) - \operatorname{Tr}\chi_E * \theta_\eta(H)\big| \le C\big(\mathcal{N}(E-\ell_1, E+\ell_1) + N^{-5\varepsilon}\big)
+ C\, N\eta\,(E_\zeta - E) \int_{\mathbb{R}} \frac{\Im\, m(E - y + \mathrm{i}\ell_1)}{y^2 + \ell_1^2}\, \mathrm{d}y. \tag{6.8}
\]
Using the rigidity of eigenvalues (1.21), one can prove that
\[
\int_{E - y \ge 5\lambda_+} \frac{\Im\, m(E-y+\mathrm{i}\ell_1)}{y^2 + \ell_1^2}\, \mathrm{d}y + \int_{E - y \le \lambda_-/5} \frac{\Im\, m(E-y+\mathrm{i}\ell_1)}{y^2 + \ell_1^2}\, \mathrm{d}y = O(1)
\]
with $\zeta$-high probability. On the interval $\lambda_-/5 \le E - y \le 5\lambda_+$ we use (1.17), i.e.,
\[
\Im\, m(E-y+\mathrm{i}\ell_1) \le \Im\, m_W(E-y+\mathrm{i}\ell_1) + \frac{\varphi^{C_\zeta}}{N\ell_1},
\]
and the elementary estimate $\Im\, m_W(E-y+\mathrm{i}\ell_1) \le C\sqrt{\ell_1 + |E - y - \lambda_+|}$. Using the definitions of $\ell_1$ and $\eta$, it can be shown that (see Equation (6.18) of [11])
\[
N\eta\,(E_\zeta - E) \int_{\mathbb{R}} \frac{\Im\, m(E-y+\mathrm{i}\ell_1)}{y^2 + \ell_1^2}\, \mathrm{d}y \le N^{-2\varepsilon}.
\]
Now the Lemma follows from (6.8).

Let $q : \mathbb{R} \to \mathbb{R}_+$ be a smooth cutoff function such that
\[
q(x) = 1 \ \text{ if } |x| \le 1/9, \qquad q(x) = 0 \ \text{ if } |x| \ge 2/9,
\]

and we assume that $q(x)$ is decreasing for $x > 0$. Then we have the following corollary to Lemma 6.1.

Corollary 6.2. Let $\ell_1$ be as in Lemma 6.1 and set $\ell := \frac{1}{2}\ell_1 N^{2\varepsilon} = \frac{1}{2}N^{-2/3-\varepsilon}$. Then for all $E$ such that
\[
|E - \lambda_+| \le \varphi^{C_\zeta} N^{-2/3} \tag{6.9}
\]
with the $C_\zeta$ in (6.1)-(6.4), the inequality
\[
\operatorname{Tr}\chi_{E+\ell} * \theta_\eta(H) - N^{-\varepsilon} \le \mathcal{N}(E, \infty) \le \operatorname{Tr}\chi_{E-\ell} * \theta_\eta(H) + N^{-\varepsilon} \tag{6.10}
\]
holds with $\zeta$-high probability. Furthermore, we have
\[
\mathbb{E}\, q\big(\operatorname{Tr}\chi_{E-\ell} * \theta_\eta(H)\big) \le \mathbb{P}\big(\mathcal{N}(E,\infty) = 0\big) \le \mathbb{E}\, q\big(\operatorname{Tr}\chi_{E+\ell} * \theta_\eta(H)\big) + C\mathrm{e}^{-\varphi^{C_\zeta}} \tag{6.11}
\]

for sufficiently large $N$, independently of $E$.

Proof. For any $E$ satisfying (6.9) we have $E_\zeta - E \gg \ell$, and thus $|E - \ell - \lambda_+|\, N^{2/3} \le \frac{3}{2}\varphi^{C_\zeta}$ (see (6.6)); therefore (6.7) holds with $E$ replaced by any $y \in [E-\ell, E]$ as well. We thus obtain
\[
\operatorname{Tr}\chi_E(H) \le \ell^{-1} \int_{E-\ell}^{E} \operatorname{Tr}\chi_y(H)\, \mathrm{d}y
\le \ell^{-1} \int_{E-\ell}^{E} \operatorname{Tr}\chi_y * \theta_\eta(H)\, \mathrm{d}y + C\ell^{-1} \int_{E-\ell}^{E} \big( N^{-2\varepsilon} + \mathcal{N}(y - \ell_1, y + \ell_1) \big)\, \mathrm{d}y
\]
\[
\le \operatorname{Tr}\chi_{E-\ell} * \theta_\eta(H) + C N^{-2\varepsilon} + C\, \frac{\ell_1}{\ell}\, \mathcal{N}(E - 2\ell, E + \ell)
\]
with $\zeta$-high probability. From (1.22), (6.9), $\ell_1/\ell = 2N^{-2\varepsilon}$ and $\ell \le N^{-2/3}$, we gather that
\[
\frac{\ell_1}{\ell}\, \mathcal{N}(E - 2\ell, E + \ell) \le N^{1-2\varepsilon} \int_{E-2\ell}^{E+\ell} \varrho_W(x)\, \mathrm{d}x + N^{-2\varepsilon} (\log N)^{L_1} \le \frac{1}{2} N^{-\varepsilon}
\]
holds with $\zeta$-high probability, where we estimated the explicit integral using that the integration domain lies in a $C N^{-2/3}\varphi^{C_\zeta}$-vicinity of the edge at $\lambda_+$. We have thus proved
\[
\mathcal{N}(E, E_\zeta) = \operatorname{Tr}\chi_E(H) \le \operatorname{Tr}\chi_{E-\ell} * \theta_\eta(H) + N^{-\varepsilon}.
\]
Using (6.1) we see that one can replace $\mathcal{N}(E, E_\zeta)$ by $\mathcal{N}(E, \infty)$ with a change of probability of at most $C\mathrm{e}^{-\varphi^{C_\zeta}}$. This proves the upper bound of (6.10); the lower bound can be proved similarly. On the event that (6.10) holds, the condition $\mathcal{N}(E,\infty) = 0$ implies that $\operatorname{Tr}\chi_{E+\ell} * \theta_\eta(H) \le 1/9$. Thus we have
\[
\mathbb{P}\big(\mathcal{N}(E,\infty) = 0\big) \le \mathbb{P}\big(\operatorname{Tr}\chi_{E+\ell} * \theta_\eta(H) \le 1/9\big) + C\mathrm{e}^{-\varphi^{C_\zeta}}. \tag{6.12}
\]
Together with the Markov inequality, this proves the upper bound in (6.11). For the lower bound, we use
\[
\mathbb{E}\, q\big(\operatorname{Tr}\chi_{E-\ell} * \theta_\eta(H)\big) \le \mathbb{P}\big(\operatorname{Tr}\chi_{E-\ell} * \theta_\eta(H) \le 2/9\big) \le \mathbb{P}\big(\mathcal{N}(E,\infty) \le 2/9 + N^{-\varepsilon}\big) = \mathbb{P}\big(\mathcal{N}(E,\infty) = 0\big),
\]
where we used the upper bound from (6.10) and the fact that $\mathcal{N}$ is an integer. This completes the proof of Corollary 6.2.

6.1. Green Function Comparison Theorem. Recall the matrices $X^{\mathbf v} = [x^{\mathbf v}_{ij}]$, $X^{\mathbf w} = [x^{\mathbf w}_{ij}]$, $H^{\mathbf v} = (X^{\mathbf v})^\dagger X^{\mathbf v}$, $H^{\mathbf w} = (X^{\mathbf w})^\dagger X^{\mathbf w}$ and their respective Green functions $G^{\mathbf v}$, $G^{\mathbf w}$ from Section 5. Define $m^{\mathbf v} = \frac{1}{N}\operatorname{Tr} G^{\mathbf v}(z)$ and $m^{\mathbf w} = \frac{1}{N}\operatorname{Tr} G^{\mathbf w}(z)$. Also notice from (6.5) that $\operatorname{Tr}\theta_\eta(H - E) = \frac{N}{\pi}\,\Im\, m(E + \mathrm{i}\eta)$. Corollary 6.2 bounds the probability of $\mathcal{N}(E,\infty) = 0$ in terms of the expectations of two functionals of Green functions. In this subsection we show that the difference between the expectations of these functionals w.r.t. the two ensembles $X^{\mathbf v}$ and $X^{\mathbf w}$ is negligible, assuming that their second moments match. The precise statement is the following Green function comparison theorem at the edges. All statements are formulated for the upper spectral edge $\lambda_+$, but identical arguments hold for the lower spectral edge $\lambda_-$ as well.

Theorem 6.3 (Green function comparison theorem on the edge). Let $F : \mathbb{R} \to \mathbb{R}$ be a function whose derivatives satisfy
\[
\max_x \big|F^{(\alpha)}(x)\big|\, (|x|+1)^{-C_1} \le C_1, \qquad \alpha = 1, 2, 3, 4, \tag{6.13}
\]
with some constant $C_1 > 0$. Then there exists $\varepsilon_0 > 0$ depending only on $C_1$ such that for any $\varepsilon < \varepsilon_0$ and for any real numbers $E$, $E_1$ and $E_2$ satisfying
\[
|E - \lambda_+| \le N^{-2/3+\varepsilon}, \qquad |E_1 - \lambda_+| \le N^{-2/3+\varepsilon}, \qquad |E_2 - \lambda_+| \le N^{-2/3+\varepsilon},
\]
and $\eta = N^{-2/3-\varepsilon}$, we have
\[
\Big| \mathbb{E}^{\mathbf v} F\big(N\eta\, \Im\, m^{\mathbf v}(z)\big) - \mathbb{E}^{\mathbf w} F\big(N\eta\, \Im\, m^{\mathbf w}(z)\big) \Big| \le C N^{-1/6+C\varepsilon}, \qquad z = E + \mathrm{i}\eta, \tag{6.14}
\]
and
\[
\Bigg| \mathbb{E}^{\mathbf v} F\Big( N \int_{E_1}^{E_2} \Im\, m^{\mathbf v}(y + \mathrm{i}\eta)\, \mathrm{d}y \Big) - \mathbb{E}^{\mathbf w} F\Big( N \int_{E_1}^{E_2} \Im\, m^{\mathbf w}(y + \mathrm{i}\eta)\, \mathrm{d}y \Big) \Bigg| \le C N^{-1/6+C\varepsilon} \tag{6.15}
\]
for some constant $C$ and large enough $N$.

Theorem 6.3 holds in much greater generality. We state the following extension, which can be used to prove (1.29), the generalization of Theorem 1.7. The class of functions $F$ in the following theorem can be enlarged to allow some polynomially increasing functions, similarly to (6.13), but for the application to proving (1.29) the following form is sufficient. The proof of Theorem 6.4 is similar to that of Theorem 6.3 and will be omitted.

Theorem 6.4. Suppose that the assumptions of Theorem 1.7 hold. Fix any $k \in \mathbb{N}_+$ and let $F : \mathbb{R}^k \to \mathbb{R}$ be a bounded smooth function with bounded derivatives. Then for any sufficiently small $\varepsilon$ there exists a $\delta > 0$ such that for any sequence of real numbers $E_k < \ldots < E_1 < E_0$ with $|E_j - \lambda_+| \le N^{-2/3+\varepsilon}$, $j = 0, 1, \ldots, k$, we have
\[
\Bigg| \mathbb{E}^{\mathbf v} F\Big( N \int_{E_1}^{E_0} \Im\, m(y+\mathrm{i}\eta)\, \mathrm{d}y,\ \ldots,\ N \int_{E_k}^{E_0} \Im\, m(y+\mathrm{i}\eta)\, \mathrm{d}y \Big) - \mathbb{E}^{\mathbf w} F\big( m \to m^{\mathbf w} \big) \Bigg| \le N^{-\delta}, \tag{6.16}
\]
where in the second term the arguments of $F$ are changed from $m^{\mathbf v}$ to $m^{\mathbf w}$ and all other parameters remain unchanged.

Assuming that Theorem 6.3 holds, we now prove Theorem 1.7.

Proof of Theorem 1.7. As we discussed in (6.1) and (6.2), we can assume that (6.3) holds for the parameter $s$. We define $E := \lambda_+ + sN^{-2/3}$, which satisfies (6.9), and define $E_\zeta$ as in (6.4) with the $C_\zeta$ such that (6.1) and (6.2) hold. By the left side of (6.11), for any sufficiently small $\varepsilon > 0$ we have
\[
\mathbb{E}^{\mathbf w} q\big(\operatorname{Tr}\chi_{E-\ell} * \theta_\eta(H)\big) \le \mathbb{P}^{\mathbf w}\big(\mathcal{N}(E,\infty) = 0\big)
\]
with the choice
\[
\ell := \frac{1}{2} N^{-2/3-\varepsilon}, \qquad \eta := N^{-2/3-9\varepsilon}.
\]
By definition,
\[
\operatorname{Tr}\chi_{E-\ell} * \theta_\eta(H) = \frac{N}{\pi} \int_{E-\ell}^{E_\zeta} \Im\, m(y + \mathrm{i}\eta)\, \mathrm{d}y.
\]
The bound (6.15), applied with $E_1 = E - \ell$ and $E_2 = E_\zeta$, shows that there exists $\delta > 0$, for sufficiently small $\varepsilon > 0$, such that
\[
\mathbb{E}^{\mathbf v} q\big(\operatorname{Tr}\chi_{E-\ell} * \theta_\eta(H)\big) \le \mathbb{E}^{\mathbf w} q\big(\operatorname{Tr}\chi_{E-\ell} * \theta_\eta(H)\big) + N^{-\delta} \tag{6.17}
\]
(note that $9\varepsilon$ plays the role of the $\varepsilon$ in the Green function comparison theorem). Then, applying the right side of (6.11) in Corollary 6.2 to the l.h.s. of (6.17), we have
\[
\mathbb{P}^{\mathbf v}\big(\mathcal{N}(E - 2\ell, \infty) = 0\big) \le \mathbb{E}^{\mathbf v} q\big(\operatorname{Tr}\chi_{E-\ell} * \theta_\eta(H)\big) + C\exp\big(-c\,\varphi^{O(1)}\big).
\]
Combining these inequalities, we have
\[
\mathbb{P}^{\mathbf v}\big(\mathcal{N}(E - 2\ell, \infty) = 0\big) \le \mathbb{P}^{\mathbf w}\big(\mathcal{N}(E, \infty) = 0\big) + 2N^{-\delta} \tag{6.18}
\]
for sufficiently small $\varepsilon > 0$ and sufficiently large $N$. Recalling that $E = \lambda_+ + sN^{-2/3}$, this proves the first inequality of (1.28); switching the roles of $\mathbf v$ and $\mathbf w$ yields the second inequality of (1.28) as well. This completes the proof of Theorem 1.7.

Proof of Theorem 6.3. The proof is similar to that of Theorem 5.3. We need to compare the matrices $H^{\mathbf v}$ and $H^{\mathbf w}$. Instead of replacing the matrix elements one by one ($NM$ times) and comparing the successive differences, here we estimate the successive differences of matrices which differ by a whole column. Indeed, for $0 \le \gamma \le N$, denote by $X_\gamma$ the random matrix whose $j$-th column agrees with that of $X^{\mathbf w}$ if $j \le \gamma$ and with that of $X^{\mathbf v}$ otherwise; in particular, $X_0 = X^{\mathbf v}$ and $X_N = X^{\mathbf w}$. As before, we define $H_\gamma = X_\gamma^\dagger X_\gamma$. We will compare $H_{\gamma-1}$ with $H_\gamma$ using the following lemma. For simplicity, we denote $\widetilde{m}^{(i)}(z) = m^{(i)}(z) - (Nz)^{-1}$.
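The column-by-column interpolation can be sketched as follows; the toy sizes and the Gaussian entries for both ensembles are illustrative assumptions of ours (in the theorem only the low moments of the two ensembles need to match):

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 6, 4
Xv = rng.standard_normal((M, N)) / np.sqrt(M)
Xw = rng.standard_normal((M, N)) / np.sqrt(M)

def X_gamma(gamma):
    """Columns 1..gamma taken from X^w, the remaining columns from X^v."""
    X = Xv.copy()
    X[:, :gamma] = Xw[:, :gamma]
    return X

# Successive matrices differ in exactly one column, so H_{gamma-1} = X^†X
# and H_gamma differ in one row and one column.
cols_changed = np.any(X_gamma(2) != X_gamma(1), axis=0).sum()
```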


Lemma 6.5. For any random matrix $X$ whose entries satisfy (1.1) and (1.2), if $|E - \lambda_+| \le N^{-2/3+\varepsilon}$ and $N^{-2/3} \gg \eta \ge N^{-2/3-\varepsilon}$ for some $\varepsilon > 0$, then we have
\[
\mathbb{E}\, F\big(N\eta\, \Im\, m(z)\big) - \mathbb{E}\, F\big(N\eta\, \Im\, \widetilde{m}^{(i)}(z)\big) = A\big(X^{(i)}, m_1, m_2\big) + O\big(N^{-7/6+C\varepsilon}\big), \tag{6.19}
\]
where the functional $A(X^{(i)}, m_1, m_2)$ depends only on the distribution of $X^{(i)}$ and on the first two moments $m_1$, $m_2$ of $\sqrt{M}\, x_{ij} = \sqrt{M}\,(X)_{ij}$.

Notice that Lemma 6.5 implies that
\[
\mathbb{E}\, F\Big(\eta\, \Im \operatorname{Tr}\frac{1}{H_{\gamma-1} - z}\Big) - \mathbb{E}\, F\Big(\eta\, \Im \operatorname{Tr}\frac{1}{H_{\gamma} - z}\Big) = A\big(X_\gamma^{(\gamma)}, m_1, m_2\big) + O\big(N^{-7/6+C\varepsilon}\big), \tag{6.20}
\]
where $A(X_\gamma^{(\gamma)}, m_1, m_2)$ depends only on the distribution of $X_\gamma^{(\gamma)}$ and on $m_1, m_2$. As in Theorem 5.3, the proof of Theorem 6.3 can now be completed via the telescoping argument. Thus, to finish the proof of Theorem 6.3, all that needs to be shown is Lemma 6.5, which is proven below.

Proof of Lemma 6.5. Without loss of generality, we assume that $i = 1$ and that $\varepsilon$ is small enough. First, we claim the following bounds on $G^{(1)}$ and $\mathcal{G}^{(1)}$: for any $\zeta > 0$,
\[
\big|[\mathcal{G}^{(1)}]_{ij}\big| \le N^{C\varepsilon}, \qquad \big\langle x_1, (\mathcal{G}^{(1)})^2\, x_1 \big\rangle \le N^{1/3+C\varepsilon}, \tag{6.21}
\]
\[
\big|[(\mathcal{G}^{(1)})^2]_{ij}\big| \le N^{1/3+C\varepsilon} \tag{6.22}
\]

with $\zeta$-high probability, for some $C > 0$, where $i$ could be equal to $j$. We postpone the proof of these bounds to the end. For Lemma 6.5, using (2.5) and (2.7), we have
\[
\operatorname{Tr} G - \operatorname{Tr} G^{(1)} + z^{-1}
= G_{11} + z^{-1} + \frac{x_1^\dagger X^{(1)}\, (G^{(1)})^2\, X^{(1)\dagger} x_1}{-z - z\,\langle x_1, \mathcal{G}^{(1)}(z)\, x_1\rangle}
= z\, G_{11}\, \big\langle x_1, (\mathcal{G}^{(1)})^2(z)\, x_1 \big\rangle. \tag{6.23}
\]
Define the quantity $B$ by
\[
B := -z\, m_W \Big( \big\langle x_1, \mathcal{G}^{(1)}(z)\, x_1 \big\rangle - \Big( -1 - \frac{1}{z\, m_W(z)} \Big) \Big). \tag{6.24}
\]
By (2.5),
\[
B = -z\, m_W \Big( -\frac{1}{z\, G_{11}(z)} + \frac{1}{z\, m_W(z)} \Big) = \frac{m_W - G_{11}}{G_{11}}.
\]
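The identity (6.23) is exact for any fixed matrix, so it can be verified numerically; the sizes, seed, and entry distribution below are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 7, 5
X = rng.standard_normal((M, N)) / np.sqrt(M)
z = 1.1 + 0.2j

G = np.linalg.inv(X.conj().T @ X - z * np.eye(N))
X1 = X[:, 1:]                                     # X^{(1)}: first column removed
G1 = np.linalg.inv(X1.conj().T @ X1 - z * np.eye(N - 1))
calG1 = np.linalg.inv(X1 @ X1.conj().T - z * np.eye(M))
x1 = X[:, 0]

# (6.23): Tr G - Tr G^{(1)} + 1/z = z G_11 <x_1, (𝒢^{(1)})^2 x_1>.
lhs = np.trace(G) - np.trace(G1) + 1.0 / z
rhs = z * G[0, 0] * (x1 @ calG1 @ calG1 @ x1)
```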


From (1.18), we obtain that
\[
|B| \le N^{-1/3+2\varepsilon} \ll 1 \tag{6.25}
\]
with $\zeta$-high probability for any $\zeta > 0$. Therefore, we have the identity
\[
G_{11} = \frac{m_W}{B + 1} = m_W \sum_{k \ge 0} (-B)^k. \tag{6.26}
\]
Define $y$ via the l.h.s. of (6.23),
\[
y := \eta\, \big( \operatorname{Tr} G - \operatorname{Tr} G^{(1)} + z^{-1} \big),
\]
so that, using (6.23) and (6.26), we obtain
\[
y = \eta z\, G_{11} \big\langle x_1, (\mathcal{G}^{(1)})^2 x_1 \big\rangle = \sum_{k=1}^{\infty} y_k, \qquad y_k := \eta z\, m_W\, (-B)^{k-1} \big\langle x_1, (\mathcal{G}^{(1)})^2 x_1 \big\rangle.
\]
Since $z$ and $m_W$ are $O(1)$, together with (6.21) and (6.25),
\[
|y_k| \le O\big(N^{-k/3+C\varepsilon}\big) \quad \text{and} \quad |y| \le O\big(N^{-1/3+C\varepsilon}\big) \tag{6.27}
\]
holds with $\zeta$-high probability. Consequently, the expansion
\[
F\big(N\eta\, \Im\, m(z)\big) - F\big(N\eta\, \Im\, \widetilde{m}^{(1)}(z)\big) = \sum_{k=1}^{3} \frac{1}{k!}\, F^{(k)}\big(N\eta\, \Im\, \widetilde{m}^{(1)}(z)\big)\, (\Im\, y)^k + O\big(N^{-4/3+C\varepsilon}\big) \tag{6.28}
\]
holds with $\zeta$-high probability. First, using (6.27), we obtain that
\[
F^{(3)}\big(N\eta\, \Im\, \widetilde{m}^{(1)}(z)\big)\, (\Im\, y)^3 = F^{(3)}\big(N\eta\, \Im\, \widetilde{m}^{(1)}(z)\big)\, (\Im\, y_1)^3 + O\big(N^{-4/3+C\varepsilon}\big) \tag{6.29}
\]

holds with $\zeta$-high probability. Moreover, we have
\[
\mathbb{E}_1 (\Im\, y_1)^3 = \mathbb{E}_1 \Big( \Im\, \eta z\, m_W \big\langle x_1, (\mathcal{G}^{(1)})^2 x_1 \big\rangle \Big)^3
= (\eta z\, m_W)^3 \sum_{k_1, \ldots, k_6} \mathbb{E}_1 \Big( \prod_{i=1}^{6} x_{1 k_i} \Big) \prod_{i=1}^{3} \big[(\mathcal{G}^{(1)})^2\big]_{k_{2i-1}, k_{2i}},
\]
where $\mathbb{E}_1$ is the expectation with respect to the first column of $X$. Recall that $m_k$ is the $k$-th moment of $\sqrt{M}\, x_{ij}$. We know that if there is a $k_i$ not equal to any other $k_j$, then
\[
\mathbb{E}_1 \Big( \prod_{i=1}^{6} x_{1 k_i} \Big) = 0 = m_1,
\]

and if each $k_i$ appears exactly twice, then
\[
\mathbb{E}_1 \Big( \prod_{i=1}^{6} x_{1 k_i} \Big) = M^{-3}\, m_2^3.
\]
Therefore, we have
\[
\mathbb{E}_1 (\Im\, y_1)^3 = \widetilde{A}_3\big(X^{(1)}, m_1, m_2\big) + (\eta z\, m_W)^3 \sum_{(1),(2)} \mathbb{E}_1 \Big( \prod_{i=1}^{6} x_{1 k_i} \Big) \big[(\mathcal{G}^{(1)})^2\big]_{k_1 k_2} \big[(\mathcal{G}^{(1)})^2\big]_{k_3 k_4} \big[(\mathcal{G}^{(1)})^2\big]_{k_5 k_6},
\]
where the sum ranges over the $k_i$'s such that (1) no $k_i$ appears exactly once and (2) at least one $k_i$ appears at least three times, and where the functional $\widetilde{A}_3(X^{(1)}, m_1, m_2)$ depends only on $X^{(1)}$, $m_1$ and $m_2$. Under these conditions, there are at most two distinct elements in the set $\{k_1, k_2, \ldots, k_6\}$, i.e., $\sum_{(1),(2)} 1 \le C N^2$. Then, using (6.22) and the bounds on the $m_k$'s, we have
\[
\mathbb{E}_1 (\Im\, y_1)^3 = \widetilde{A}_3\big(X^{(1)}, m_1, m_2\big) + O\big(N^{-2+C\varepsilon}\big). \tag{6.30}
\]
By definition, it is easy to prove that $|N\eta\, \Im\, \widetilde{m}^{(1)}| \le N^{C\varepsilon}$ with $\zeta$-high probability. Then, using (6.29) and the fact that $\widetilde{m}^{(1)}$ depends only on $X^{(1)}$, we have
\[
\mathbb{E}\, F^{(3)}\big(N\eta\, \Im\, \widetilde{m}^{(1)}(z)\big)\, (\Im\, y)^3 = A_3\big(X^{(1)}, m_1, m_2\big) + O\big(N^{-4/3+C\varepsilon}\big), \tag{6.31}
\]
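The moment computation above only uses that the entries are independent and centered. As a toy exact check (the $\pm 1/\sqrt{M}$ entries and $M = 4$ are our illustrative choice, one admissible instance of (1.1)-(1.2)), expectations can be enumerated over all sign patterns:

```python
import itertools
import numpy as np

# Entries x_k = s_k / sqrt(M) with iid signs s_k = ±1, so E x_k = 0 and
# E x_k^2 = 1/M; expectations are computed exactly by enumeration.
M = 4

def moment(indices):
    """E[ prod_i x_{indices[i]} ], computed exactly over all sign patterns."""
    total = 0.0
    for signs in itertools.product([-1.0, 1.0], repeat=M):
        total += np.prod([signs[k] / np.sqrt(M) for k in indices])
    return total / 2 ** M

# An index that appears an odd number of times kills the expectation,
lone = moment([0, 1, 2])
# while fully paired indices contribute powers of E x^2 = 1/M.
paired = moment([0, 0, 1, 1, 2, 2])
```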

where $A_3(X^{(1)}, m_1, m_2)$ depends only on the distribution of $X^{(1)}$ and on $m_1$ and $m_2$.

Now we estimate the term with $F^{(2)}$ in (6.28). As in (6.29), we have
\[
F^{(2)}\big(N\eta\,\Im\,\widetilde{m}^{(1)}(z)\big)\, (\Im\,y)^2 = F^{(2)}\big(N\eta\,\Im\,\widetilde{m}^{(1)}(z)\big)\, \big[(\Im\,y_1)^2 + 2(\Im\,y_1)(\Im\,y_2)\big] + O\big(N^{-4/3+C\varepsilon}\big). \tag{6.32}
\]
By definition,
\[
(\Im\,y_1)^2 + 2(\Im\,y_1)(\Im\,y_2) = C_1(z)\,\eta^2\, \big\langle x_1, \mathcal{G}^{(1)} x_1\big\rangle\, \big\langle x_1, (\mathcal{G}^{(1)})^2 x_1\big\rangle^2 + C_2(z)\,\eta^2\, \big\langle x_1, (\mathcal{G}^{(1)})^2 x_1\big\rangle^2,
\]
where $C_1(z), C_2(z) = O(1)$ are constants depending on $z$ and $m_W(z)$. Using the bounds on $\mathcal{G}^{(1)}$ in (6.22), as in (6.30), we have
\[
\mathbb{E}_1\big[(\Im\,y_1)^2 + 2(\Im\,y_1)(\Im\,y_2)\big] = \widetilde{A}_2\big(X^{(1)}, m_1, m_2\big) + O\big(N^{-5/3+C\varepsilon}\big),
\]
where $\widetilde{A}_2(X^{(1)}, m_1, m_2)$ depends only on $X^{(1)}$, $m_1$ and $m_2$. Then, with (6.32), as in (6.31),
\[
\mathbb{E}\, F^{(2)}\big(N\eta\,\Im\,\widetilde{m}^{(1)}(z)\big)\, (\Im\,y)^2 = A_2\big(X^{(1)}, m_1, m_2\big) + O\big(N^{-4/3+C\varepsilon}\big) \tag{6.33}
\]
for some functional $A_2$ which depends only on the distribution of $X^{(1)}$, $m_1$ and $m_2$.

Now we estimate the term with $F^{(1)}$ in (6.28). As in (6.29), we have
\[
F^{(1)}\big(N\eta\,\Im\,\widetilde{m}^{(1)}(z)\big)\, (\Im\,y) = F^{(1)}\big(N\eta\,\Im\,\widetilde{m}^{(1)}(z)\big)\, \big[\Im\,y_1 + \Im\,y_2 + \Im\,y_3\big] + O\big(N^{-4/3+C\varepsilon}\big). \tag{6.34}
\]
A similar argument as for (6.33) and (6.31) yields
\[
\mathbb{E}\, F^{(1)}\big(N\eta\,\Im\,\widetilde{m}^{(1)}(z)\big)\, (\Im\,y) = A_1\big(X^{(1)}, m_1, m_2\big) + O\big(N^{-4/3+C\varepsilon}\big). \tag{6.35}
\]

Inserting (6.35), (6.33) and (6.31) into (6.28), we obtain (6.19), completing the proof of Lemma 6.5 and consequently finishing the proof of Theorem 6.3.

At last, we prove (6.21) and (6.22). For (6.21), using the large deviation lemma (Lemma 1.4), we obtain that for any $\zeta > 0$,
\[
\big\langle x_1, (\mathcal{G}^{(1)})^2 x_1 \big\rangle \le \varphi^{C_\zeta} \Big( N^{-1} \operatorname{Tr} |\mathcal{G}^{(1)}|^4 \Big)^{1/2}
= \varphi^{C_\zeta} \Bigg( \frac{1}{N} \sum_\alpha \frac{1}{|\lambda^{(1)}_\alpha - z|^4} \Bigg)^{1/2}
\le \varphi^{C_\zeta} \Bigg( \frac{1}{N\eta^2} \sum_\alpha \frac{1}{|\lambda^{(1)}_\alpha - z|^2} \Bigg)^{1/2}
= \varphi^{C_\zeta} \Big( \frac{1}{N\eta^3}\, \Im\, m^{(1)}(z) \Big)^{1/2} \tag{6.36}
\]
with $\zeta$-high probability. Then, with (1.17) and (2.34), we obtain (6.21). For (6.22), we note that
\[
\mathcal{G}^{(1)} = \frac{1}{X^{(1)} (X^{(1)})^\dagger - z}.
\]
Comparing with (1.3), we see that $\big(\mathcal{G}^{(1)}, (X^{(1)})^\dagger\big)$ play the role of $(G, X)$. Since $\sqrt{\frac{M}{N-1}}\,(X^{(1)})^\dagger$ is just a normal $(N-1)\times M$ random matrix whose entries have variance $(N-1)^{-1}$, the results in (1.18) also hold for $\mathcal{G}^{(1)}$, with slight changes. One can easily obtain that
\[
\max_{ij} \big|[\mathcal{G}^{(1)}]_{ij}\big| \le C, \qquad \max_{ij} \big|\Im\,[\mathcal{G}^{(1)}]_{ij}\big| \le C N^{-1/3+C\varepsilon}
\]

with $\zeta$-high probability for any $\zeta > 0$, proving (6.22), and we are done.

7. Acknowledgements. The authors would like to thank Horng-Tzer Yau, Antti Knowles, László Erdős and Paul Bourgade for very useful discussions and help. NSP and JY gratefully acknowledge the NSF grants DMS-1107070 and DMS-1001655, respectively.


REFERENCES
[1] Ben Arous, G., Péché, S., Universality of local eigenvalue statistics for some sample covariance matrices. Comm. Pure Appl. Math. 58, No. 10, 1316-1357 (2005).
[2] Dieng, M., Tracy, C.A., Application of random matrix theory to multivariate statistics. Arxiv preprint, http://arxiv.org/abs/math/0603543 (2006).
[3] Dyson, F.J., A Brownian-motion model for the eigenvalues of a random matrix. J. Math. Phys. 3, 1191-1198 (1962).
[4] Erdős, L., Knowles, A., Yau, H.-T., Yin, J., Spectral statistics of Erdős-Rényi graphs I: Local semicircle law. Preprint arXiv:1103.1919.
[5] Erdős, L., Knowles, A., Yau, H.-T., Yin, J., Spectral statistics of Erdős-Rényi graphs II: Eigenvalue spacing and the extreme eigenvalues. Preprint arXiv:1103.3869.
[6] Erdős, L., Schlein, B., Yau, H.-T., Universality of random matrices and local relaxation flow. Preprint arXiv:0907.5605.
[7] Erdős, L., Péché, G., Ramírez, J., Schlein, B., Yau, H.-T., Bulk universality for Wigner matrices. Comm. Pure Appl. Math. 63, No. 7, 895-925 (2010).
[8] Erdős, L., Schlein, B., Yau, H.-T., Yin, J., The local relaxation flow approach to universality of the local statistics for random matrices. Preprint arXiv:0911.3687.
[9] Erdős, L., Yau, H.-T., Yin, J., Bulk universality for generalized Wigner matrices. Preprint arXiv:1001.3453.
[10] Erdős, L., Yau, H.-T., Yin, J., Universality for generalized Wigner matrices with Bernoulli distribution. Preprint arXiv:1003.3813.
[11] Erdős, L., Yau, H.-T., Yin, J., Rigidity of eigenvalues of generalized Wigner matrices. Preprint.
[12] Harding, M., Explaining the single factor bias of arbitrage pricing models in finite samples. Economics Letters 99, 85-88 (2008).
[13] Johnstone, I.M., On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29, No. 2, 295-327 (2001).
[14] Johnstone, I.M., High dimensional statistical inference and random matrices. International Congress of Mathematicians, Vol. 1, 307-333, Eur. Math. Soc., Zürich (2007).
[15] Johnstone, I.M., Multivariate analysis and Jacobi ensembles: largest eigenvalue, Tracy-Widom limits and rates of convergence. Ann. Statist. 36, No. 6, 2638-2716 (2008).
[16] Onatski, A., Testing hypotheses about the number of factors in large factor models. Econometrica, to appear (2009).
[17] Patterson, N., Price, A.L., Reich, D., Population structure and eigenanalysis. PLoS Genetics 2, No. 12, e190 (2006).
[18] Péché, S., Universality results for largest eigenvalues of some sample covariance matrix ensembles. Probab. Theory Related Fields 143, No. 3-4 (2009).
[19] Soshnikov, A., Universality at the edge of the distribution of the largest eigenvalue in certain classes of sample covariance matrices. J. Stat. Phys. 108, 1033-1056 (2002).
[20] Tao, T., Vu, V., Random covariance matrices: universality of local statistics of eigenvalues. Preprint arXiv:0912.0966.
[21] Wang, K., Random covariance matrices: universality of local statistics of eigenvalues up to the edge. Preprint arXiv:1104.4832 (2011).

Department of Statistics, Harvard University
1 Oxford Street, Cambridge, MA 02138, USA
E-mail: [email protected]

Department of Mathematics, University of Wisconsin-Madison
480 Lincoln Dr., Madison, WI 53706, USA
E-mail: [email protected]