CEIS Tor Vergata RESEARCH PAPER SERIES Vol. 15, Issue 2, No. 397 – February 2017


Representation, Estimation and Forecasting of the Multivariate Index-Augmented Autoregressive Model∗ Gianluca Cubadda†

Barbara Guardabascio

Università di Roma "Tor Vergata", Italy

ISTAT, Italy

January 30, 2017

Abstract

We examine the conditions under which each individual series that is generated by a vector autoregressive model can be represented as an autoregressive model that is augmented with the lags of a few linear combinations of all the variables in the system. We call this modelling Multivariate Index-Augmented Autoregression (MIAAR). We show that the parameters of the MIAAR can be estimated by a switching algorithm that increases the Gaussian likelihood at each iteration. Since maximum likelihood estimation may perform poorly when the number of parameters gets larger, we propose a regularized version of our algorithm to handle a medium-large number of time series. We illustrate the usefulness of the MIAAR modelling by both empirical applications and simulations.

JEL: C32

Keywords: Multivariate index autoregressive models, reduced rank regression, dimension reduction, shrinkage estimation, macroeconomic forecasting.

∗ We are grateful to Alain Hecq, Antoni Espasa, Giuseppe Storti, Helmut Lütkepohl, Massimiliano Marcellino, Paulo Rodrigues, and Pietro Coretto for useful comments, as well as to Elisa Scambelloni for research assistance on a previous version of this paper. The usual disclaimers apply.

† Corresponding author. Dipartimento di Economia e Finanza, Università di Roma "Tor Vergata", Via Columbia 2, 00133 Roma, Italy. Email: [email protected]

1 Introduction

Many years after the seminal paper by Sims (1980), the Vector Auto-Regressive model (VAR henceforth) is still widely used for analyzing and forecasting multivariate economic time series. The main reason is that the VAR, notwithstanding its simplicity, is a very general statistical model that is able to reproduce rather complex dynamic interactions of the data. However, it is well known that the price to pay for this generality is the so-called curse of dimensionality, i.e. the VAR parameters proliferate with the number of time series at hand. Given the typical sample sizes of macroeconomic data, this implies that Ordinary Least Squares (OLS) estimation of the VAR is viable for small-scale models only.

Recently, a large body of research has tackled the issue of adapting the VAR modelling to medium and large dimensional systems. In order to reach this goal, two complementary approaches have been pursued so far. On one hand, a line of research focuses on shrinkage estimation, often within a Bayesian framework; see, inter alia, Banbura et al. (2010), Koop (2013), Carriero et al. (2015), and Kock and Callot (2015). On the other hand, a second line of research assumes that a latent low-dimensional structure is present in the large-dimensional VAR system. Contributions along this line include the global VAR by Pesaran et al. (2004), the factor-augmented VAR by Bernanke et al. (2005), the partial least squares approach suggested by Cubadda and Hecq (2011), the infinite-dimensional VAR by Chudik and Pesaran (2011), the Bayesian reduced rank regression methods proposed by Carriero et al. (2011), and the regularized canonical correlation analysis by Bernardini and Cubadda (2015). The present paper wishes to contribute to the latter approach, although we borrow some ideas from the former one as well.
In particular, we start by examining the conditions under which each equation of the VAR can be written as an Auto-Regressive model (AR) augmented with the lags of a few linear combinations of all the variables in the system. We show that this parameter reduction occurs when there is a reduced number of channels through which each variable is influenced by the past of the other variables in the system, which is in line with the common belief that a small number of shocks is responsible for most of aggregate economic fluctuations. We label this approach index-augmented autoregression, given that its multivariate representation is an extension of the multivariate autoregressive index model by Reinsel (1983). We show that optimal estimation of the proposed model can be achieved by a switching algorithm that maximizes the Gaussian likelihood at each step. Since the Maximum Likelihood (ML) approach is not well suited for high-dimensional systems, we propose a regularized version of our estimator that can handle medium-large VARs. Moreover, we examine analogies and differences between our model and two popular approaches in the literature, namely reduced rank regression and dynamic factor models, see e.g. Johansen (2008) and Stock and Watson (2011), respectively, and the references therein. As a by-product of our analysis, we


develop a reduced rank regression model that is augmented with variable-specific AR structures, which, to the best of our knowledge, has not been proposed before. Finally, we evaluate the proposed methods by both empirical applications and simulations.

The paper is organized as follows. Section 2 deals with the theoretical aspects at the level of both model representation and statistical inference. In Section 3 a comprehensive forecasting exercise is performed to evaluate the merits of the proposed modelling over traditional forecasting methods. In Section 4 a Monte Carlo study evaluates the finite sample properties of the various methods in terms of both model specification and forecasting performance. Finally, Section 5 concludes.

2 Theory

In this Section we first present the derivation of the proposed modelling, and then discuss statistical inference in both small and medium-large frameworks. Moreover, we comment on analogies and differences with other popular approaches in multivariate time series analysis.

2.1 Representation

Let y_t ≡ (y_1t, ..., y_Nt)′ denote the N-vector of the time series of interest. We assume that y_t is generated by a stationary vector autoregressive model of order p (VAR(p) hereafter):

    y_t = Φ(L)y_{t−1} + ε_t,    t = 1, 2, ..., T,    (1)

where Φ(L) = Σ_{h=0}^{p−1} Φ_h L^h and the ε_t are i.i.d. innovations with E(ε_t) = 0, E(ε_t ε′_t) = Σ (positive definite) and finite fourth moments. From Equation (1) we see that each series y_it is generated by the following dynamic regression model:

    y_it = φ_ii(L)y_{it−1} + Σ_{k≠i}^{N} φ_ik(L)y_{kt−1} + ε_it,    i = 1, ..., N,    (2)

where φ_ik(L) is the generic element of the polynomial matrix Φ(L), and ε_it is the i-th element of the innovation vector ε_t. In order to reduce the number of parameters of model (1), we further assume that

    Σ_{k≠i}^{N} φ_ik(L) = Σ_{j=1}^{q} β_ij(L) Σ_{k≠i}^{N} ω_kj,    (3)

where β_ij(L) is a scalar polynomial of order at most (p − 1), and ω_kj is a scalar, for i = 1, ..., N and j = 1, ..., q < N.

In order to motivate the above assumption, we remark that the unrestricted VAR foresees (N − 1) linearly independent mechanisms by which past external information is transmitted to each variable. However, since it is commonly believed that a relatively small number of shocks is responsible for most of aggregate economic fluctuations, it is reasonable to assume that there is a reduced number of channels through which each variable is influenced by the past of the other variables in the system. In words, this is exactly what Equation (3) states. Cubadda et al. (2013) relied on a particular case of this assumption, namely q = 1, in order to build business cycle indicators in a medium-dimensional framework.

Notice that Assumption (3) is equivalent to postulating the following structure for each series y_it:

    y_it = δ_i(L)y_{it−1} + Σ_{j=1}^{q} β_ij(L)f_{jt−1} + ε_it,    (4)

where δ_i(L) = [φ_ii(L) − Σ_{j=1}^{q} β_ij(L)ω_ij], and f_jt = Σ_{k=1}^{N} ω_kj y_kt for j = 1, ..., q. Following Reinsel (1983), we refer to the q-dimensional series f_t = [f_1t, ..., f_qt]′ as the index variables and we label (4) as the Index-Augmented Auto-Regressive (IAAR) model for series y_it. Under Assumption (3), model (1) can be rewritten as follows:

    y_t = Σ_{h=1}^{p} D_h y_{t−h} + Σ_{h=1}^{r} β_h ω′ y_{t−h} + ε_t,    (5)

where D_h is an N × N diagonal matrix for h = 1, ..., p, and β_h and ω are N × q matrices for h = 1, ..., r ≤ p. Note that, for greater generality, we allow the number r of index lags to be smaller than the order p of the autoregressive component. In terms of parsimony, Equation (5) requires N(p + qr + q) − q² parameters instead of N²p in Equation (1).¹ Previewing the results of the empirical analysis, we can anticipate that both r and p are typically small in our applications, leading to a large reduction of the number of parameters w.r.t. the VAR.

The Multivariate IAAR (MIAAR) representation makes apparent that the Multivariate Autoregressive Index model (MAI) by Reinsel (1983) is a special case of (5), obtained by imposing D_h = 0 for h = 1, ..., p. Recently, there has been a renewed interest in the MAI. For instance, Carriero et al. (2016) derived classical and Bayesian estimation of large MAIs and applied this modelling to structural analysis, whereas Cubadda et al. (2017) proposed a multivariate realized volatility model that is endowed with a common index structure. We believe that extending the MAI by allowing for variable-specific AR structures is of particular value for forecasting purposes. Our main motivation comes from the observation that parameter reduction in the MAI is essentially obtained by endowing the VAR coefficient matrices with a reduced-rank structure. However, Carriero et al. (2011) and Bernardini and Cubadda (2015) documented that univariate models often outperform medium-large reduced-rank VAR models at short forecasting horizons. These findings suggest there should be room for improvement in forecasting by combining the MAI with individual AR components.

¹ Notice that, since the matrix ω is assumed to be full-rank, we can always normalize it as ω′ = [I_q, ϖ′], where ϖ is an (N − q) × q matrix.
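As a quick arithmetic check of the parsimony claim above, the following sketch compares the parameter counts of Equation (5) and of the unrestricted VAR(p); the dimensions are illustrative (N = 40 and p = 13 match the empirical section, q = 2 is our own choice):

```python
def n_params_miaar(N, p, q, r):
    # N(p + qr + q) - q^2 free parameters in Equation (5),
    # after the normalization omega' = [I_q, varpi']
    return N * (p + q * r + q) - q ** 2

def n_params_var(N, p):
    # N^2 p autoregressive parameters in the unrestricted VAR(p)
    return N ** 2 * p

# illustrative dimensions close to the empirical section
print(n_params_miaar(40, 13, 2, 1))  # 676
print(n_params_var(40, 13))          # 20800
```

Even with two indexes, the MIAAR uses roughly 3% of the parameters of the corresponding VAR(13).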


2.2 Maximum likelihood estimation

With respect to the MAI, optimal estimation of the parameters of Model (5) is more complicated due to the presence of the variable-specific AR structures. Based on Boswijk (1995) and Cubadda et al. (2017), we offer a switching algorithm that has the property of increasing the Gaussian likelihood at each step. In detail, the procedure goes as follows:

1. In view of Equation (5) and given (initial) estimates of both D = [D_1, ..., D_p] and ω, maximize the conditional Gaussian likelihood L(β, Σ|D, ω) by estimating β = [β_1, ..., β_r] and Σ with OLS on the following equation:

    y_t − Σ_{h=1}^{p} D_h y_{t−h} = Σ_{h=1}^{r} β_h ω′ y_{t−h} + ε_t.

2. Premultiply by Σ^{−1/2} and apply the Vec operator to both sides of Equation (5), then use the property Vec(ABC) = (C′ ⊗ A)Vec(B) to get

    Vec(Σ^{−1/2} y_t) = Σ_{h=1}^{p} (y′_{t−h} ⊗ Σ^{−1/2})Vec(D_h) + Σ_{h=1}^{r} (y′_{t−h} ⊗ Σ^{−1/2} β_h)Vec(ω′) + Vec(Σ^{−1/2} ε_t).

Finally, reparametrize the above model as

    Vec(Σ^{−1/2} y_t) = Σ_{h=1}^{p} [(y′_{t−h} ⊗ Σ^{−1/2})M]δ_h + Σ_{h=1}^{r} (y′_{t−h} ⊗ Σ^{−1/2} β_h)Vec(ω′) + Vec(Σ^{−1/2} ε_t),    (6)

where δ_h is an N-vector such that D_h = diag(δ_h), and M is a binary N² × N matrix whose generic element m_ij is such that

    m_ij = 1 if i = 1 + (j − 1)(N + 1), for j = 1, ..., N, and m_ij = 0 otherwise.

Given the previously obtained estimates of β and Σ, maximize L(D, ω|β, Σ) by estimating [δ_1, ..., δ_p] and Vec(ω′) with OLS on Equation (6).

3. Repeat steps 1 and 2 until det(Σ) is numerically minimized.²

With respect to Newton-type optimization methods, the above algorithm has several pros and one con. The pros are that the switching algorithm is computationally simpler, it does not require a normalization condition on the parameters ω, the optimization is explicit at each step, and over-identifying restrictions can easily be imposed on β, D, and ω; the con is that direct maximization over the full parameter space may be faster. To mitigate this disadvantage of the switching algorithm, it is important to choose the initial values of D and ω properly. We suggest starting with the estimates of these parameters obtained from the Dynamic Factor Model (DFM) by Stock and Watson (2002a, 2002b), where ω is estimated by the first q principal components of the series y_t and D is estimated by OLS equation by equation. Apart from computational simplicity, this choice ensures that the MIAAR fits the data better than the DFM even after just one iteration of the switching algorithm.

² A general proof of the convergence of this family of iterative procedures is given by Oberhofer and Kmenta (1974).
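As a concrete illustration of steps 1-2, here is a minimal numpy sketch of the switching algorithm (our own simplified implementation, not the authors' code; the initialization follows the PCA/OLS suggestion above, and all function and variable names are ours):

```python
import numpy as np

def miaar_switching(Y, q, p, r, n_iter=25):
    """Sketch of the switching algorithm for Model (5): alternate OLS for
    (beta, Sigma) given (D, omega) with OLS on the vectorized equation (6)
    for (D, omega) given (beta, Sigma)."""
    T, N = Y.shape
    Yt = Y[p:]                                          # y_t, t = p+1, ..., T
    lags = [Y[p - h:T - h] for h in range(1, p + 1)]    # y_{t-h}
    # selection matrix M: Vec(D_h) = M @ delta_h picks the diagonal of D_h
    M = np.zeros((N * N, N))
    for j in range(N):
        M[j * (N + 1), j] = 1.0
    # initialization: omega from the first q principal components, D = 0
    _, _, Vt = np.linalg.svd(Y - Y.mean(0), full_matrices=False)
    omega = Vt[:q].T                                    # N x q loadings
    delta = [np.zeros(N) for _ in range(p)]
    history = []                                        # log det(Sigma) per iteration
    for _ in range(n_iter):
        # -- step 1: OLS of y_t - sum_h D_h y_{t-h} on the lagged indexes
        dep = Yt - sum(lags[h] * delta[h] for h in range(p))
        F = np.hstack([lags[h] @ omega for h in range(r)])
        B = np.linalg.lstsq(F, dep, rcond=None)[0]
        betas = [B[h * q:(h + 1) * q].T for h in range(r)]   # each N x q
        E = dep - F @ B
        Sigma = E.T @ E / len(Yt)
        history.append(np.linalg.slogdet(Sigma)[1])
        # -- step 2: OLS on the vectorized equation (6), given beta and Sigma
        W = np.linalg.inv(np.linalg.cholesky(Sigma))         # Sigma^{-1/2}
        Xr, yr = [], []
        for t in range(len(Yt)):
            blk = [np.kron(lags[h][t][None, :], W) @ M for h in range(p)]
            blk += [np.kron(lags[h][t][None, :], W @ betas[h]) for h in range(r)]
            Xr.append(np.hstack(blk))
            yr.append(W @ Yt[t])
        theta = np.linalg.lstsq(np.vstack(Xr), np.concatenate(yr), rcond=None)[0]
        delta = [theta[h * N:(h + 1) * N] for h in range(p)]
        omega = theta[p * N:].reshape(N, q)              # recovered from Vec(omega')
    return delta, betas, omega, Sigma, history
```

Tracking log det(Σ) across iterations mirrors the stopping rule in step 3: the sequence should be non-increasing, since each half-step maximizes the Gaussian likelihood over a subset of the parameters.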

2.3 Relationships with reduced-rank models

The MIAAR can be seen as a multivariate model with a partial reduced-rank structure and, in this sense, it is closely related to Reduced Rank Regression (RRR), see e.g. Johansen (2008) and the references therein. The basic RRR model reads

    y_t = α Σ_{h=1}^{p} ψ′_h y_{t−h} + ε_t,

where ψ_h and α are N × q matrices.

The RRR model is widely used because of its interpretation in terms of common features and its ease of estimation, see e.g. Engle and Kozicki (1993). Moreover, it implies that the marginal processes of the series y_t follow parsimonious univariate models, thus solving the so-called autoregressivity paradox (Cubadda et al., 2009). However, univariate models often forecast better than medium-large RRR models in the short run, even if the dimensionality problem is properly addressed in estimation, see Carriero et al. (2011) and Bernardini and Cubadda (2015). These findings advocate extending the basic RRR setup into the following Augmented RRR (ARRR) model:

    y_t = Σ_{h=1}^{p} C_h y_{t−h} + α Σ_{h=1}^{r} ψ′_h y_{t−h} + ε_t,

where C_h is an N × N diagonal matrix for h = 1, ..., p.

The ARRR, which, to the best of our knowledge, has not been proposed before, can be optimally estimated by a variant of the switching algorithm reported in Subsection 2.2. In particular, it is required to iteratively alternate between

    y_t − Σ_{h=1}^{p} C_h y_{t−h} = α Σ_{h=1}^{r} ψ′_h y_{t−h} + ε_t,

keeping [C_1, ..., C_p] and [ψ_1, ..., ψ_r] fixed, and

    Vec(Σ^{−1/2} y_t) = Σ_{h=1}^{p} [(y′_{t−h} ⊗ Σ^{−1/2})M]γ_h + Σ_{h=1}^{r} (y′_{t−h} ⊗ Σ^{−1/2} α)Vec(ψ′_h) + Vec(Σ^{−1/2} ε_t),

where γ_h is an N-vector such that C_h = diag(γ_h), keeping α and Σ fixed.

Notice that the assumption underlying the ARRR is the following:

    Σ_{k≠i}^{N} φ_ik(L) = Σ_{j=1}^{q} α_ij Σ_{k≠i}^{N} ψ_kj(L),    (7)

where ψ_kj(L) is a scalar polynomial of order r − 1 (r ≤ p) and α_ij is a scalar, which implies the following structure for each series y_it:

    y_it = λ_i(L)y_{it−1} + Σ_{j=1}^{q} α_ij ξ_{jt−1} + ε_it,

where λ_i(L) = [φ_ii(L) − Σ_{j=1}^{q} α_ij ψ_ij(L)], and ξ_jt = Σ_{k=1}^{N} ψ_kj(L)y_kt.

Although Assumptions (3) and (7) look similar, the interpretation of the latter is less straightforward, since the weights of the factors [ξ_1t, ..., ξ_qt] are lag-specific. Hence, Assumption (7) cannot be justified on the grounds of the existence of common shocks among the variables. Moreover, it is easy to see that the properties of the RRR model in terms of common features and parsimony of the marginal models are not shared by the ARRR model. However, we stress that the MIAAR and the ARRR boil down to the same model when r = 1, a case that is relevant for the empirical applications of this paper.

2.4 Relationships with dynamic factor models

Remarkably, the mathematical formulation of the IAAR model (4) is basically the same as the single-equation representation implied by the largely popular DFM, see e.g. Stock and Watson (2011) and the references therein. This implies that the two models generate forecasts in a similar manner. For both the IAAR and the DFM, the parameters of the forecasting equations are estimated by projecting the variable of interest onto its own lags and those of a few linear combinations of all the variables in the system.

Notwithstanding this similarity, there are also important differences between the MIAAR and the DFM. First, the DFM is obtained by postulating that the predictors have a factor structure, whereas the MIAAR model is obtained through the restrictions (3) on each of the VAR's equations, which can easily be tested for within a ML framework. As shown by Stock and Watson (2005), the DFM instead imposes several kinds of overidentifying restrictions on the VAR, which are usually rejected by the data. Second, at the estimation level, the indexes are constructed taking into account the comovements between the target variables and the predictors, whereas the factors in the DFM are typically obtained by principal component methods, which look at the internal variability of the predictors only.³ Third, estimation of the DFM requires that not only T but also N increases without bound, which is quite at odds with ample empirical evidence suggesting that forecasting performances deteriorate when N increases beyond a medium dimension, see, inter alia, Watson (2003), Boivin and Ng (2006), and Caggiano et al. (2011).

Provided that q is small and T is large, the MIAAR approach can, in principle, be applied even to medium-large VARs. However, ML estimation may perform poorly when either N or (p + q) approaches T, because inversions of large covariance matrices are then required in step 2. We tackle this issue in the following Subsection.

³ Some supervised procedures for factor extraction have recently been proposed to overcome this limitation of the DFM. In the empirical section, we will use one of them as a competitor of the methods considered here.

2.5 Regularized estimation

Overall, ML estimation of Model (5) requires inverting three covariance matrices. The first one is the sample covariance matrix of the lagged indexes [f_{t−1}, ..., f_{t−r}]′ in step 1 of the switching algorithm. Since we will observe in the empirical application that the identified number of indexes infrequently exceeds 6 with r = 1, we can conclude that no regularization is generally required in this step. Second, the square root of the residual covariance matrix S of the first-step regression needs to be inverted for step 2. Obviously, S is singular when N > T, and its inverse becomes numerically unstable as N approaches T. Third, the OLS estimate of the parameters Θ = [δ′_1, ..., δ′_p, Vec(ω′)′]′ in step 2 requires the inverse of the matrix X′X, where

    X = [Y_{−1} ⊗ S^{−1/2}M, ..., Y_{−p} ⊗ S^{−1/2}M, Σ_{h=1}^{r} (Y_{−h} ⊗ S^{−1/2} β̂_h)],    (8)

X is NT × N(p + q), Y_{−h} = [y_{1−h}, ..., y_{T−h}]′ for h = 1, ..., p, and β̂_h is the estimate of β_h for h = 1, ..., r. Since N is a common factor of both the number of rows and the number of columns of X, numerical instability in inverting X′X arises when (p + q) is large.

When either N or (p + q) is relatively large compared to T, the procedure that we suggest to improve the finite-sample performance of the ML estimator goes as follows. After reaching convergence of the switching algorithm, use a shrinkage estimator of Σ in place of S to construct the X matrix. Then use a ridge regression in place of the OLS in step 2 to obtain estimates of D and ω, and finally use those to estimate β and Σ in turn, as described in step 1. With a slight abuse of terminology, we call this procedure Regularized Maximum Likelihood (RML) estimation of model (5).

In particular, the shrinkage estimator that we use for Σ has the form

    Z = S(1 − α) + αµI_N,

where µ > 0 and α ∈ [0, 1]. In the choice of the optimal α and µ, we follow the approach of Ledoit and Wolf (2003, 2004), which requires minimizing the expected value of the Frobenius distance between the shrinkage estimator Z and the population covariance matrix Σ. The solution to this minimization problem is

    µ∗ = N^{−1} Σ_i σ_ii,
    α∗ = Σ_i Σ_j Var(s_ij) / Σ_i Σ_j [Var(s_ij) + (µ∗ − σ_ij)²],

where s_ij and σ_ij are, respectively, the generic elements of the matrices S and Σ for i, j = 1, ..., N. Extending a previous result by Ledoit and Wolf (2003) for the case in which ε_t is i.i.d. with finite fourth moments, Bernardini and Cubadda (2015) show that α∗ is O(1/T) even when ε_t is a linear process with finite fourth cumulants.⁴ Finally, the proposed shrinkage estimator is

    Z∗ = S(1 − α̂∗) + α̂∗ µ̂∗ I_N,

where

    µ̂∗ = N^{−1} Σ_i s_ii,
    α̂∗ = Σ_i Σ_j L̂RV(v_tij) / Σ_i Σ_j [L̂RV(v_tij) + T(µ̂∗ − s_ij)²],

v_tij = u_it u_jt, and L̂RV(v_tij) is a consistent estimator of the long-run variance of v_tij. Notice that, since Tα̂∗ is a √T-consistent estimator of the optimal shrinkage constant lim_T Tα∗, the difference (S − Z∗) is O_p(1/T); see Ledoit and Wolf (2003) and Bernardini and Cubadda (2015) for further details.
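A compact sketch of the scalar-target shrinkage step may help fix ideas (our own simplified implementation; it ignores serial dependence in the cross-products, as the empirical section later does, so the long-run variances reduce to plain variances of v_tij):

```python
import numpy as np

def shrink_cov(E):
    """Shrink the sample covariance of residuals E (T x N, roughly mean zero)
    toward the scalar target mu*I with a Ledoit-Wolf-type weight alpha."""
    T, N = E.shape
    S = E.T @ E / T
    mu = np.trace(S) / N
    # Var(s_ij) estimated from the cross-products v_tij = e_it * e_jt
    V = np.einsum('ti,tj->tij', E, E)        # T x N x N cross-products
    var_s = V.var(axis=0) / T                # elementwise Var(s_ij) under i.i.d.
    alpha = var_s.sum() / (var_s + (mu * np.eye(N) - S) ** 2).sum()
    alpha = min(1.0, max(0.0, alpha))
    return (1 - alpha) * S + alpha * mu * np.eye(N), alpha
```

The weight shrinks toward zero as T grows, consistent with α∗ being O(1/T), so the regularization vanishes asymptotically.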

In order to regularize the second-step regression, we rely on the following ridge estimator of Θ in place of the OLS one:

    Θ̃ = (X̃′X̃ + λI_{N(p+q)})^{−1} X̃′Ỹ,

where Ỹ = [Vec(Z∗^{−1/2} y_{p+1})′, ..., Vec(Z∗^{−1/2} y_T)′]′, X̃ is obtained from Equation (8) with Z∗ used instead of S, and λ is a positive scalar. Following Hoerl et al. (1975), we fix the shrinkage parameter λ as

    λ = N(p + q)σ̂² / (Θ̂′Θ̂),

where Θ̂ = (X̃′X̃)^{−1}X̃′Ỹ and σ̂² = (Ỹ − X̃Θ̂)′(Ỹ − X̃Θ̂)/[N(T − p − q)].

From the asymptotic properties of the shrinkage and ridge estimators, it easily follows that RML provides √T-consistent estimates of the MIAAR parameters.
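The ridge step with the Hoerl-type plug-in λ can be sketched generically as follows (an illustrative implementation of ours, with the number of regressors K playing the role of N(p + q)):

```python
import numpy as np

def hoerl_ridge(X, y):
    """Ridge estimator with the plug-in lambda = K * sigma2 / (theta_ols' theta_ols),
    where K is the number of regressors, mirroring the RML second step."""
    T, K = X.shape
    theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta_ols
    sigma2 = (resid @ resid) / (T - K)
    lam = K * sigma2 / (theta_ols @ theta_ols)
    theta = np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)
    return theta, lam
```

When the signal is strong relative to the noise, λ is small and the ridge estimate stays close to OLS; λ grows with the residual variance, stabilizing the inversion of X′X exactly when it is needed.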

3 Empirical applications

In this Section we perform an out-of-sample forecasting exercise in order to assess the practical merits of the proposed modelling over traditional forecasting methods. Another issue that we want to tackle is the role of the system dimension in forecasting accuracy. Our interest is motivated by previous contributions showing that more variables are not always beneficial to the forecasting performances of VAR models, even if the dimensionality problem is properly addressed in estimation, see e.g. Banbura et al. (2010) and Koop (2013).

We use a data-set similar to that in Carriero et al. (2011). It consists of 40 US monthly time series comprising real variables, money and prices, and financial indicators. The time span is from 1960.02 to 2015.12, hence the overall sample size is T∗ = 671. All the variables are transformed to achieve stationarity, demeaned, and standardized to unit variance. Tables 1-2 report the list of the series under analysis and the relative transformations.

⁴ Strictly speaking, Bernardini and Cubadda (2015) deal with shrinkage estimation towards a diagonal matrix, but the proof of the stated result for a scalar target is entirely analogous and therefore it is omitted.

Table 1: Data description - #1-20

Variable                                                    Transformation       #
Industrial Production Index                                 (1 − L)ln            1
Employees on Nonfarm Payrolls Private                       (1 − L)ln            2
Effective Fed Funds Interest Rate                           (1 − L)              3
CPI-U: all items                                            (1 − L¹²)(1 − L)ln   4
NAPM New Orders Index                                       no transf.           5
NAPM Production Index                                       no transf.           6
Interest Rate: U.S. Treasury Const Maturities, 1-Yr         (1 − L)              7
Interest Rate: U.S. Treasury Const Maturities, 5-Yr         (1 − L)              8
Interest Rate: U.S. Treasury Const Maturities, 10-Yr        (1 − L)              9
Money Stock: M1                                             (1 − L¹²)(1 − L)ln   10
Total Capacity Utilization                                  (1 − L)              11
Manufacturing and Trade Sales                               (1 − L)ln            12
Composite Help Wanted Index                                 (1 − L)              13
Unemployment Rate                                           (1 − L)              14
Interest Rate: U.S. Treasury Bills, Sec Mkt, 3-Mo           (1 − L)              15
NAPM - Suppliers Deliveries Index (Percent)                 no transf.           16
Commercial Paper Rate                                       (1 − L)              17
Producer Price Index: Finished Goods                        (1 − L¹²)(1 − L)ln   18
Producer Price Index: Intermed Mat. Supplies and Components (1 − L¹²)(1 − L)ln   19
Bond Yield: Moody's AAA Corporate                           (1 − L)              20

In order to assess the role of the system dimension in forecasting, we apply the competing methods to four nested data-sets, each including the series numbered from 1 up to N, for N = 4, 10, 20, 40, in Tables 1-2. In particular, we conduct two different forecasting exercises: the first aims at comparing the overall forecasting performances of the competing methods, whereas the second focuses on forecasting the first four key series in Table 1, namely the Industrial Production Index (IP), Employees on Nonfarm Payrolls (EMP), the Effective Federal Funds Interest Rate (FYFF), and the Consumer Price Index (CPI). In both cases, we use the four data-sets in turn.

Table 2: Data description - #21-40

Variable                                            Transformation       #
Interest Rate: U.S. Treasury Bills, Sec Mkt, 6-Mo   (1 − L)              21
Personal Income Less Transfer Payments              (1 − L)ln            22
Producer Price Index: Finished Consumer Goods       (1 − L¹²)(1 − L)ln   23
Employee hours in nonag. establishments             (1 − L)ln            24
Sales of retail stores                              (1 − L)ln            25
Real consumption                                    (1 − L)ln            26
Commercial and Industrial Loans                     (1 − L¹²)(1 − L)ln   27
NAPM Inventories Index                              no transf.           28
US Effective Exchange Rate                          (1 − L)ln            29
Money supply - M2 real                              (1 − L¹²)(1 − L)ln   30
Money Stock: M2                                     (1 − L¹²)(1 − L)ln   31
Monetary Base                                       (1 − L¹²)(1 − L)ln   32
Depository Inst Reserves, Nonborrowed               (1 − L¹²)(1 − L)ln   33
Producer Price Index: Crude Materials               (1 − L¹²)(1 − L)ln   34
S&P's Common Stock: Dividend Yield                  (1 − L)              35
S&P's Common Stock Price Index                      (1 − L)ln            36
S&P's Common Stock Price-Earnings Ratio             (1 − L)ln            37
Bond Yield: Moody's BAA Corporate                   (1 − L)              38
Personal Income                                     (1 − L)ln            39
Consumer Credit Outstanding, Nonrevolving           (1 − L¹²)(1 − L)ln   40

The forecasting exercises are performed using a rolling window of 10 years, hence T = 120. For five forecasting methods, iterative forecasts of y_{τ+h} are computed using the predictors x_τ = [y′_τ, ..., y′_{τ−p+1}]′ for τ = T + p, ..., T∗ − h and h = 1, 3, 6, 12. The competitors of ML and RML are univariate AR models and two alternative strategies for estimating the parameters of the multivariate model (5).

First, the indexes f_τ are obtained as the first q Principal Components (PC) of the series y_τ, as advocated by Stock and Watson (2002a, 2002b). Second, the indexes are constructed by Partial Least Squares (PLS), having removed from the elements of y_τ the linear influence of their own lags. PLS is a family of statistical procedures aiming at maximizing the covariance between two sets of variables. Cubadda and Guardabascio (2012) and Groen and Kapetanios (2016), inter alia, showed that PLS compares quite well to PC regression in macroeconomic forecasting when the dependent variable is a scalar. However, the approaches used by these authors are not fully appropriate in our case, because the resulting indexes would be different for each variable. Hence, we use a variant of the multivariate PLS method by Cubadda and Hecq (2011), which consists in computing the weights ω as the eigenvectors corresponding to the q largest eigenvalues of the cross-covariance matrix between the lags of y_τ and the residuals of N individual autoregressions of the elements of y_τ. In both the PC and the PLS case, the coefficient matrices β and D are then estimated by OLS.

At the univariate level, we use AR(13) models, which are able to remove the serial correlation from the considered series. The identification of the multivariate model requires specifying the number of indexes q as well as the lag order of both the individual variables, p, and the indexes, r. We fix p = 13, as for the univariate AR models. We then compare the results for the various estimation methods letting r vary from 1 up to 6, and we find no significant gains from using an index lag order longer than 1. Therefore, we use Model (5) with p = 13 and r = 1. Finally, we choose q by means of the information criteria of Akaike (AIC), Schwarz (BIC) and Hannan-Quinn (HQIC), having set the maximum number of indexes equal to 7.⁵

The RML method requires estimating the long-run variances of e_it e_jt (i, j = 1, ..., N; t = 1, ..., T), where e_t = [e_1t, ..., e_Nt]′ are the ML residuals. We opt for ignoring the time series dependence in these cross-products and simply compute their variances. Apart from simplicity, the reason is twofold. First, under correct specification of the model, the residuals e_t are i.i.d. in large samples, so their cross-products should not be significantly autocorrelated. Second, Sancetta (2006) shows that there is no substantial gain from accounting for time series dependence in the cross-products when the variable dimension is large, which is typically the situation where the shrinkage is stronger.

In Tables 3-4 we report both the average of the mean square forecast errors relative to the AR(13) forecasts (ARMSFE) over the N series and the quartiles of the number of indexes selected by each information criterion for each of the multivariate forecasting methods. In order to find the set of multivariate models which forecast equally well, we rely on the model confidence set (MCS) analysis by Hansen et al. (2011). In particular, the test for the null hypothesis of equal predictive ability at the 20% level is implemented using a block bootstrap scheme with 5000 resamples.

The empirical findings indicate that RML delivers the most accurate forecasts in 14 cases out of 16. The two exceptions are N = 10 with h = 12, and N = 40 with h = 1, where ML with q selected by AIC and by HQIC, respectively, is the best performer. The MCS analysis reveals that RML, ML, and PLS belong to the superior set in 15, 3, and 1 of the cases, respectively, whereas AR and PC are never included among the best performers. Interestingly, RML improves over ML even when N = 4, a dimension for which the shrinkage of the parameters in Σ and Θ is small.

⁵ Notice that in computing the information criteria the number of parameters that we use is N[(r + 1)q + p] − q² and not N(rq + p). Hence, the indexes are not treated as known quantities, as is typically done in DFMs.
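The PLS weighting scheme described above can be sketched as follows. This is our own reading of the procedure (names are illustrative): the weights are the leading eigenvectors of CC′, where C is the cross-covariance between the lagged series and the residuals of the individual autoregressions.

```python
import numpy as np

def pls_index_weights(Y, q, p=13):
    """Illustrative sketch: weights omega from the q leading eigenvectors of C C',
    where C is the cross-covariance between lagged y and the residuals of
    individual AR(p) regressions (one possible reading of the text)."""
    T, N = Y.shape
    # residuals of N univariate AR(p) regressions, one series at a time
    U = np.empty((T - p, N))
    for i in range(N):
        Xi = np.column_stack([Y[p - h:T - h, i] for h in range(1, p + 1)])
        b = np.linalg.lstsq(Xi, Y[p:, i], rcond=None)[0]
        U[:, i] = Y[p:, i] - Xi @ b
    # cross-covariance between y_{t-1} and the AR residuals u_t, N x N
    C = Y[p - 1:T - 1].T @ U / (T - p)
    w, V = np.linalg.eigh(C @ C.T)
    omega = V[:, np.argsort(w)[::-1][:q]]    # q leading eigenvectors
    return omega
```

Removing each series' own dynamics first makes the extracted indexes target the cross-series predictability, in contrast with PC weights, which only summarize the internal variability of the predictors.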

Table 3: ARMSFE and quartiles of the estimated q distribution - N = 4, 10

(N, p, r)     Method/criterion   h = 1    h = 3    h = 6    h = 12   [Q1 Q2 Q3]
(4, 13, 0)    AR                 100.00   100.00   100.00   100.00
(4, 13, 1)    ML/BIC              97.13    98.85    98.59    97.75   [1 1 1]
(4, 13, 1)    ML/HQIC             97.13    98.85    98.59    97.75   [1 1 1]
(4, 13, 1)    ML/AIC              97.42    98.92    98.67    97.87   [1 1 1]
(4, 13, 1)    RML/BIC             93.67*   96.47*   96.87*   95.55*  [1 1 1]
(4, 13, 1)    RML/HQIC            93.94    96.54    96.91*   95.53*  [1 1 1]
(4, 13, 1)    RML/AIC             94.80    97.27    97.59    96.03   [1 1 1]
(4, 13, 1)    PC/BIC              96.74    97.43    98.97    98.82   [1 1 1]
(4, 13, 1)    PC/HQIC             97.37    97.87    99.08    99.06   [1 1 1]
(4, 13, 1)    PC/AIC              96.75    97.28    98.89    98.81   [1 1 3]
(4, 13, 1)    PLS/BIC             97.77    97.47    98.65    98.56   [1 1 1]
(4, 13, 1)    PLS/HQIC            98.25    97.78    99.66    99.12   [1 1 1]
(4, 13, 1)    PLS/AIC             98.79    98.26    96.87    99.56   [1 1 2]
(10, 13, 0)   AR                 100.00   100.00   100.00   100.00
(10, 13, 1)   ML/BIC              92.69    97.16    99.34    98.45   [1 1 1]
(10, 13, 1)   ML/HQIC             91.21    96.58    98.91    97.78   [1 1 2]
(10, 13, 1)   ML/AIC              91.05    96.20    95.95*   94.36*  [1 2 4]
(10, 13, 1)   RML/BIC             90.71    95.53    98.46    98.82   [1 1 1]
(10, 13, 1)   RML/HQIC            89.17    95.54    98.29    98.31   [1 1 2]
(10, 13, 1)   RML/AIC             88.38*   94.50*   95.59*   95.95   [1 2 4]
(10, 13, 1)   PC/BIC              94.51    96.67    99.10    98.44   [1 1 1]
(10, 13, 1)   PC/HQIC             94.09    97.97    99.62    99.20   [1 1 1]
(10, 13, 1)   PC/AIC              93.88    98.45    99.25    97.71   [1 1 4]
(10, 13, 1)   PLS/BIC             93.27    96.36    99.53    98.90   [1 1 1]
(10, 13, 1)   PLS/HQIC            92.70    97.12   100.29    99.15   [1 1 1]
(10, 13, 1)   PLS/AIC             92.83    97.62   100.14    98.94   [1 1 2]

Note: N is the number of series, p is the lag order of the individual series, r is the lag order of the indexes, ARMSFE is the mean square forecast error relative to the AR(p) forecasts, h is the forecasting horizon, and Qi indicates the i-th quartile of the number of indexes. The best result for each pair (N, h) is in bold. * indicates models belonging to the superior set at the 20% level.

Table 4: ARMSFE and quartiles of the estimated q distribution - N = 20, 40

(N, p, r)    Method/criterion    h=1       h=3       h=6       h=12      [Q1 Q2 Q3]
(20, 13, 0)  AR                  100.00    100.00    100.00    100.00
(20, 13, 1)  ML/BIC              91.92     95.49     98.77     100.41    [1 1 1]
(20, 13, 1)  ML/HQIC             90.89     95.51     97.89     97.80     [1 1 1]
(20, 13, 1)  ML/AIC              93.99     97.11     95.87     96.80     [4 5 6]
(20, 13, 1)  RML/BIC             89.86     94.24     97.58     100.82    [1 1 1]
(20, 13, 1)  RML/HQIC            88.45*    93.34*    95.64     97.16     [1 1 1]
(20, 13, 1)  RML/AIC             90.15     93.63*    94.16*    95.71*    [3 4 6]
(20, 13, 1)  PC/BIC              93.42     97.09     99.70     98.70     [1 1 1]
(20, 13, 1)  PC/HQIC             92.97     96.95     99.42     99.24     [1 1 1]
(20, 13, 1)  PC/AIC              91.68     97.17     99.93     99.26     [1 1 2]
(20, 13, 1)  PLS/BIC             92.88     97.03     100.32    99.92     [1 1 1]
(20, 13, 1)  PLS/HQIC            91.26     96.05     99.20     98.61     [1 1 1]
(20, 13, 1)  PLS/AIC             91.68     96.87     99.71     98.91     [1 1 2]
(40, 13, 0)  AR                  100.00    100.00    100.00    100.00
(40, 13, 1)  ML/BIC              92.90     95.72     96.91     98.65     [1 1 1]
(40, 13, 1)  ML/HQIC             90.94*    96.32     98.69     98.37     [1 1 2]
(40, 13, 1)  ML/AIC              101.18    100.43    99.37     101.35    [7 7 7]
(40, 13, 1)  RML/BIC             91.29*    94.57*    95.42*    97.25*    [1 1 1]
(40, 13, 1)  RML/HQIC            91.65     95.24*    97.23     96.84*    [1 2 3]
(40, 13, 1)  RML/AIC             99.21     98.13     97.39     98.30*    [6 7 7]
(40, 13, 1)  PC/BIC              95.60     97.77     100.11    100.62    [1 1 1]
(40, 13, 1)  PC/HQIC             94.94     97.12     99.27     99.50     [1 1 1]
(40, 13, 1)  PC/AIC              94.16     96.72     98.45     99.46     [1 1 2]
(40, 13, 1)  PLS/BIC             94.70     97.09     99.42     99.58     [1 1 1]
(40, 13, 1)  PLS/HQIC            93.49     95.95     98.83     97.99*    [1 1 2]
(40, 13, 1)  PLS/AIC             92.82     95.85     98.05     97.17*    [1 2 3]

See the notes of Table 3.



Table 5: RMSFE - IP

Ranking            Horizon   Method   Criterion   N    RMSFE
1st best           h=1       RML      HQIC        20   87.85∗
2nd best-method    h=1       ML       AIC         20   90.21∗
2nd best-N         h=1       RML      AIC         10   91.61
1st best           h=3       RML      HQIC        20   95.05∗
2nd best-method    h=3       ML       HQIC        20   96.20∗
2nd best-N         h=3       RML      AIC         40   96.48∗
1st best           h=6       PC       AIC         10   93.64∗
2nd best-method    h=6       RML      BIC         40   93.90∗
2nd best-N         h=6       RML      BIC         40   93.90∗
1st best           h=12      ML       AIC         10   94.61∗
2nd best-method    h=12      PLS      AIC         40   96.71∗
2nd best-N         h=12      PLS      AIC         40   96.71∗

Note: RMSFE is the mean square forecast error relative to the AR(13) forecasts. ∗ indicates models belonging to the superior set at the 20% level.

but not null. ML is, however, quite often the second-best performer after RML. PC and PLS have similar performances for N = 4, 10, whereas the latter outdoes the former for N = 20, 40. In terms of model selection, no information criterion uniformly dominates the others in the choice of the best forecasting model. With reference to ML and RML, BIC (AIC) is more effective when N = 4, 40 (N = 10, 20), whereas HQIC performs quite homogeneously over all the dimensions. AIC often selects the best specification for PC, whereas the larger the system dimension, the less parsimonious the best selection criterion for PLS. Regarding the choice of the number of indexes, the most accurate forecasts are obtained with an estimated q equal to 1 or 2, except at the two largest forecasting horizons when N = 20, for which the optimal q ranges from 3 to 6. Turning our attention to the four key target series, beyond computing the MSFE relative to the AR univariate forecast (RMSFE) for each of them, we rely on the MCS analysis as well. For the sake of space, in Tables 5-8 we present the results of the best performer together with: (i) the second best forecasts obtained with a different estimation method; (ii) the second best forecasts obtained with a different system dimension N. The empirical findings indicate that RML delivers the most accurate forecasts in 9 cases out of 16. It performs best for IP when h = 1, 3, for EMP in all horizons but h = 12, for FYFF in all horizons but

Table 6: RMSFE - EMP

Ranking            Horizon   Method   Criterion   N    RMSFE
1st best           h=1       RML      HQIC        20   87.85∗
2nd best-method    h=1       ML       AIC         20   90.22∗
2nd best-N         h=1       RML      AIC         10   94.07
1st best           h=3       RML      HQIC        20   95.05∗
2nd best-method    h=3       ML       HQIC        20   96.20∗
2nd best-N         h=3       RML      AIC         40   96.48∗
1st best           h=6       RML      AIC         20   91.25∗
2nd best-method    h=6       ML       AIC         20   94.94
2nd best-N         h=6       ML       AIC         10   95.42
1st best           h=12      ML       AIC         10   91.23∗
2nd best-method    h=12      RML      AIC         20   91.77∗
2nd best-N         h=12      RML      AIC         20   91.77∗

See the notes of Table 5.

Table 7: RMSFE - FYFF

Ranking            Horizon   Method   Criterion   N    RMSFE
1st best           h=1       ML       HQIC        40   64.35∗
2nd best-method    h=1       RML      HQIC        10   65.34∗
2nd best-N         h=1       RML      HQIC        10   65.34∗
1st best           h=3       RML      BIC         20   84.69∗
2nd best-method    h=3       PLS      HQIC        10   87.20∗
2nd best-N         h=3       RML      AIC         10   85.76∗
1st best           h=6       RML      HQIC        20   91.30∗
2nd best-method    h=6       ML       AIC         10   92.33∗
2nd best-N         h=6       RML      AIC         10   91.46∗
1st best           h=12      RML      HQIC        4    94.95∗
2nd best-method    h=12      ML       AIC         10   96.49∗
2nd best-N         h=12      ML       AIC         10   96.49∗

See the notes of Table 5.


Table 8: RMSFE - CPI

Ranking            Horizon   Method   Criterion   N    RMSFE
1st best           h=1       PC       BIC         40   92.74∗
2nd best-method    h=1       ML       HQIC        40   93.40∗
2nd best-N         h=1       RML      BIC         10   93.46∗
1st best           h=3       PLS      HQIC        10   90.72∗
2nd best-method    h=3       PC       HQIC        20   90.92∗
2nd best-N         h=3       PC       HQIC        20   90.92∗
1st best           h=6       PLS      AIC         10   93.49∗
2nd best-method    h=6       RML      AIC         10   93.61∗
2nd best-N         h=6       PC       AIC         20   95.13
1st best           h=12      RML      AIC         10   88.46∗
2nd best-method    h=12      PLS      AIC         10   88.85∗
2nd best-N         h=12      PLS      AIC         40   90.15∗

See the notes of Table 5.

h = 1, and for CPI in the longest horizon. Moreover, RML is the only method that uniquely forms the model superior set in one case. The second best method is ML, which is the best predictor of IP and EMP in the longest horizon, as well as of FYFF in the shortest horizon. PC (PLS) performs best for CPI and IP (FYFF) when h = 6 (h = 3). The most accurate forecasts are obtained when the number of factors q is chosen according to the HQIC in 50% of cases. Again, the univariate AR model never provides the best result. Regarding the role of the system dimension, the best forecasts are obtained with N = 20 (N = 10) [N = 40] {N = 4} in 7 (6) [2] {1} of the cases. Interestingly, when the best result is obtained using either the smallest or the largest dimension, there is always a forecast obtained using an intermediate dimension that belongs to the superior set as well, whereas the reverse is not true. These results corroborate previous findings on the merits of medium dimensional models for macroeconomic forecasting.
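The relative MSFE statistic used throughout Tables 3-8 is simply the ratio of mean squared forecast errors, scaled so that 100 means parity with the benchmark. A minimal sketch (Python, with synthetic forecast errors rather than the paper's series) is:

```python
import numpy as np

def rmsfe(e_model, e_benchmark):
    """MSFE of a model relative to a benchmark, in percent:
    values below 100 mean the model beats the benchmark."""
    e_model = np.asarray(e_model, dtype=float)
    e_benchmark = np.asarray(e_benchmark, dtype=float)
    return 100.0 * np.mean(e_model ** 2) / np.mean(e_benchmark ** 2)

# Toy illustration with made-up forecast errors (not the paper's data).
rng = np.random.default_rng(1)
e_ar = rng.standard_normal(100)            # benchmark AR forecast errors
e_miaar = 0.95 * rng.standard_normal(100)  # competing model's errors
print(rmsfe(e_miaar, e_ar))
```

The model confidence set analysis of Hansen et al. (2011) then tests which of these relative losses are statistically distinguishable.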


4 Monte Carlo analysis

In this Section we perform a small Monte Carlo study to evaluate the finite sample performances of ML, RML, and PC. In particular, the Data Generating Process (DGP) is the following model:

yt = ∑_{h=1}^{13} Dh yt−h + β ω′ yt−1 + εt,

where εt are i.i.d. N(0, Σε), and all the parameters are set equal to their estimates obtained by PLS in combination with the HQIC. Hence, this Monte Carlo analysis is explicitly designed to be unfair with respect to the (R)ML approach.6 In order to obtain those estimates, we use two subsamples of the variables listed in Table 1: a rather volatile period, the Pre-Great Moderation (1960.01-1984.12, PGM), and a more stable one, the Great Moderation (1985.01-2007.12, GM). We generate systems of N = 20 variables with a number of observations of the vector series yt equal to T + 150 for T = 120, 240. The first 50 points are used as a burn-in period, the last 100 observations to compute the h-step ahead forecast errors for h = 1, and the intermediate T observations to estimate the competing models. The various combinations of methods and information criteria are evaluated by means of two statistics. First, the ARMSFE for the forecasting horizon h = 1. Second, the percentage with which the estimated number of indexes q̂ coincides with the true q. In order to assess the significance of the differences between the

two best performers according to each of these statistics, we compute the t-test for the null hypothesis that the differences between the ARMSFEs are centered on 0, and McNemar's test for the null hypothesis that the probabilities of identifying the true number of factors are the same. The results are reported in Tables 9-10.
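The simulation design above can be sketched along the following lines (an illustrative Python sketch with a small system, q = 1, and randomly drawn stable parameters; in the paper N = 20 and Dh, β, ω, and Σε are instead set to the PLS/HQIC estimates):

```python
import numpy as np

# Illustrative simulation of the MIAAR DGP of Section 4 with q = 1.
# Dimensions and parameter values here are made up for the sketch.
rng = np.random.default_rng(42)
N, p, T, burn, hold = 6, 13, 120, 50, 100

delta = rng.uniform(-0.05, 0.05, size=(p, N))  # diagonals of D_1, ..., D_p
beta = 0.1 * rng.standard_normal(N)            # loadings on the lagged index
omega = 0.1 * rng.standard_normal(N)           # index weights

n_obs = T + burn + hold
y = np.zeros((n_obs + p, N))
for t in range(p, n_obs + p):
    own = sum(delta[h] * y[t - 1 - h] for h in range(p))  # D_h y_{t-h} terms
    index = omega @ y[t - 1]                              # f_{t-1} = omega' y_{t-1}
    y[t] = own + beta * index + rng.standard_normal(N)    # Sigma_eps = I here

sample = y[p + burn : p + burn + T]   # T observations for estimation
holdout = y[p + burn + T :]           # last 100 obs for 1-step forecast errors
print(sample.shape, holdout.shape)
```

With the small coefficients drawn here the process is stationary, mimicking the burn-in / estimation / hold-out split described in the text.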

With the sample size T = 120, we observe that RML in combination with AIC is the best performer according to both the evaluation criteria when the DGP is PGM, whereas all the methods identify the true q when BIC is used and RML in combination with BIC is the best forecaster when the DGP is GM. Results are similar when T = 240. When the DGP is PGM, RML in combination with AIC still delivers the most accurate forecasts but the highest percentage of correct model identifications is given by ML in combination with AIC, whereas both BIC and HQIC always identify the true q and RML in combination with either BIC or HQIC forecasts best when the DGP is GM. Looking at the comparison between PLS and PC, we notice that PLS outdoes PC with PGM, while their results are similar with GM.

6 In an alternative Monte Carlo design, the parameters of the simulated MIAAR are those obtained in the empirical application through RML using the AIC. The results, available upon request, are clearly more favourable for the RML method.


Table 9: Simulation results - N = 20, T = 120

Method/Criterion   Statistics            PGM (q = 4)   GM (q = 1)
ML/BIC             % in which q̂ = q      0.00          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                89.71         93.61
ML/HQIC            % in which q̂ = q      1.00          99.60
                   % in which q̂ > q      0.00          0.40
                   ARMSFE                88.88         93.65
ML/AIC             % in which q̂ = q      30.00         75.80
                   % in which q̂ > q      24.20         24.20
                   ARMSFE                89.26         95.08
RML/BIC            % in which q̂ = q      0.00          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                88.30         91.37
RML/HQIC           % in which q̂ = q      0.60          99.80
                   % in which q̂ > q      0.00          0.20
                   ARMSFE                86.88         91.38
RML/AIC            % in which q̂ = q      32.80 *       57.40
                   % in which q̂ > q      22.80         42.60
                   ARMSFE                85.93 ***     93.27
PLS/BIC            % in which q̂ = q      0.00          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                91.27         95.42
PLS/HQIC           % in which q̂ = q      0.00          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                90.35         95.42
PLS/AIC            % in which q̂ = q      15.00         97.20
                   % in which q̂ > q      0.80          2.80
                   ARMSFE                88.57         95.53
PC/BIC             % in which q̂ = q      0.00          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                94.00         95.38
PC/HQIC            % in which q̂ = q      1.20          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                92.69         95.38
PC/AIC             % in which q̂ = q      21.00         100.00
                   % in which q̂ > q      5.80          0.00
                   ARMSFE                90.12         95.38

(P)GM indicates the sample on which the model estimated by PLS/HQIC is used as DGP. * (**) [***] indicate significance at the 10% (5%) [1%] level of the test of equal ARMSFEs, and of McNemar's test on the differences between the proportions in which q̂ = q for the two best performers. See the notes of Table 3 for the meaning of other acronyms.


Table 10: Simulation results - N = 20, T = 240

Method/Criterion   Statistics            PGM (q = 4)   GM (q = 1)
ML/BIC             % in which q̂ = q      0.00          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                88.61         93.20
ML/HQIC            % in which q̂ = q      5.80          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                85.12         93.20
ML/AIC             % in which q̂ = q      73.60 ***     95.60
                   % in which q̂ > q      8.80          4.40
                   ARMSFE                83.91         93.32
RML/BIC            % in which q̂ = q      0.00          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                88.38         92.51 ***
RML/HQIC           % in which q̂ = q      4.00          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                84.24         92.51 ***
RML/AIC            % in which q̂ = q      72.80         100.00
                   % in which q̂ > q      6.40          0.00
                   ARMSFE                82.80 ***     92.63
PLS/BIC            % in which q̂ = q      0.00          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                89.51         94.42
PLS/HQIC           % in which q̂ = q      9.60          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                86.25         94.42
PLS/AIC            % in which q̂ = q      65.40         99.00
                   % in which q̂ > q      4.80          1.00
                   ARMSFE                84.19         94.44
PC/BIC             % in which q̂ = q      0.80          100.00
                   % in which q̂ > q      0.00          0.00
                   ARMSFE                91.49         94.38
PC/HQIC            % in which q̂ = q      30.40         100.00
                   % in which q̂ > q      3.40          0.00
                   ARMSFE                87.56         94.38
PC/AIC             % in which q̂ = q      44.80         99.00
                   % in which q̂ > q      40.00         1.00
                   ARMSFE                84.95         94.38

See the notes of Table 9.


5 Conclusions

In this paper we have proposed a new modelling, called MIAAR, that can be derived by means of reasonable and testable restrictions on the VAR parameter space. The basic assumption is that there is a limited number of channels through which each variable is influenced by the past of the other variables in the system. This is in line with the common belief that only a few shocks are responsible for the bulk of aggregate fluctuations. Provided that the number of linear combinations that convey past external information is small compared to the number of variables, the MIAAR is considerably more parsimonious than the unrestricted VAR. The parameters of the MIAAR can be estimated by a switching algorithm that increases the Gaussian likelihood at each iteration. Since ML estimation is known to perform poorly when the number of parameters is large, we also offer a regularized estimator for medium-large VARs. Using both simulations and empirical examples, we have shown that the MIAAR outperforms well-known methods for macroeconomic forecasting when applied to systems of various dimensions. Moreover, our empirical results confirm that medium dimensional systems are the most useful for macroeconomic forecasting.

References

[1] Banbura, M., Giannone, D., and L. Reichlin (2010), Large Bayesian VARs, Journal of Applied Econometrics, 25, 71-92.
[2] Bernanke, B., Boivin, J., and P.S. Eliasz (2005), Measuring the Effects of Monetary Policy: A Factor-augmented Vector Autoregressive (FAVAR) Approach, The Quarterly Journal of Economics, 120, 387-422.
[3] Bernardini, E. and G. Cubadda (2015), Macroeconomic Forecasting and Structural Analysis through Regularized Reduced-Rank Regression, International Journal of Forecasting, 31, 682-691.
[4] Boswijk, H.P. (1995), Identifiability of cointegrated systems, Tinbergen Institute Working Paper, 78.
[5] Boivin, J. and S. Ng (2006), Are more data always better for factor analysis?, Journal of Econometrics, 132, 169-194.
[6] Caggiano, G., Kapetanios, G., and V. Labhard (2011), Are more data always better for factor analysis? Results for the euro area, the six largest euro area countries and the UK, Journal of Forecasting, 30, 736-752.


[7] Carriero, A., Clark, T., and M. Marcellino (2015), Bayesian VARs: specification choices and forecast accuracy, Journal of Applied Econometrics, 30, 46-73.
[8] Carriero, A., Kapetanios, G., and M. Marcellino (2011), Forecasting Large Datasets with Bayesian Reduced Rank Multivariate Models, Journal of Applied Econometrics, 26, 735-761.
[9] Carriero, A., Kapetanios, G., and M. Marcellino (2016), Structural analysis with Multivariate Autoregressive Index models, Journal of Econometrics, 192, 332-348.
[10] Chudik, A. and M.H. Pesaran (2011), Infinite-dimensional VARs and factor models, Journal of Econometrics, 163, 4-22.
[11] Cubadda, G. and B. Guardabascio (2012), A Medium-N Approach to Macroeconomic Forecasting, Economic Modelling, 29, 1099-1105.
[12] Cubadda, G., Guardabascio, B., and A. Hecq (2013), A General to Specific Approach for Constructing Composite Business Cycle Indicators, Economic Modelling, 33, 367-374.
[13] Cubadda, G., Guardabascio, B., and A. Hecq (2017), A Vector Heterogeneous Autoregressive Index Model for Realized Volatility Measures, International Journal of Forecasting, 33, 337-344.
[14] Cubadda, G. and A. Hecq (2011), Testing for Common Autocorrelation in Data Rich Environments, Journal of Forecasting, 30, 325-335.
[15] Cubadda, G., Hecq, A., and F.C. Palm (2009), Studying Interactions without Multivariate Modeling, Journal of Econometrics, 148, 25-35.
[16] Engle, R.F. and S. Kozicki (1993), Testing for Common Features (with comments), Journal of Business and Economic Statistics, 11, 369-395.
[17] Groen, J. and G. Kapetanios (2016), Revisiting useful approaches to data-rich macroeconomic forecasting, Computational Statistics & Data Analysis, 100, 221-239.
[18] Hansen, P.R., Lunde, A., and J. Nason (2011), The model confidence set, Econometrica, 79, 453-497.
[19] Hoerl, A.E., Kennard, R.W., and K.F. Baldwin (1975), Ridge Regression: Some Simulations, Communications in Statistics - Theory and Methods, 4, 105-123.
[20] Johansen, S. (2008), Reduced Rank Regression, in Durlauf, S.N. and L.E. Blume (eds.), The New Palgrave Dictionary of Economics, 2nd Edition, Palgrave Macmillan.


[21] Kock, A.B. and L. Callot (2015), Oracle inequalities for high dimensional vector autoregressions, Journal of Econometrics, 186, 325-344.
[22] Koop, G.M. (2013), Forecasting with Medium and Large Bayesian VARs, Journal of Applied Econometrics, 28, 177-203.
[23] Ledoit, O. and M. Wolf (2003), Improved estimation of the covariance matrix of stock returns with an application to portfolio selection, Journal of Empirical Finance, 10, 603-621.
[24] Ledoit, O. and M. Wolf (2004), A well-conditioned estimator for large-dimensional covariance matrices, Journal of Multivariate Analysis, 88, 365-411.
[25] Oberhofer, W. and J. Kmenta (1974), A general procedure for obtaining maximum likelihood estimates in generalized regression models, Econometrica, 42, 579-590.
[26] Pesaran, M.H., Schuermann, T., and S.M. Weiner (2004), Modelling regional interdependencies using a global error-correcting macroeconometric model, Journal of Business and Economic Statistics, 22, 129-162.
[27] Reinsel, G. (1983), Some Results on Multivariate Autoregressive Index Models, Biometrika, 70, 145-156.
[28] Sancetta, A. (2008), Sample Covariance Shrinkage for High Dimensional Dependent Data, Journal of Multivariate Analysis, 99, 949-967.
[29] Sims, C. (1980), Macroeconomics and Reality, Econometrica, 48, 1-48.
[30] Stock, J.H. and M.W. Watson (2002a), Forecasting Using Principal Components from a Large Number of Predictors, Journal of the American Statistical Association, 97, 1167-1179.
[31] Stock, J.H. and M.W. Watson (2002b), Macroeconomic Forecasting using Diffusion Indexes, Journal of Business and Economic Statistics, 20, 147-162.
[32] Stock, J.H. and M.W. Watson (2005), Implications of Dynamic Factor Models for VAR Analysis, NBER Working Papers, 11467.
[33] Stock, J.H. and M.W. Watson (2011), Dynamic Factor Models, in Clements, M.P. and D.F. Hendry (eds.), Oxford Handbook of Economic Forecasting, Oxford University Press.
[34] Watson, M.W. (2003), Macroeconomic forecasting using many predictors, in Dewatripont, M., Hansen, L., and S. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress of the Econometric Society, Vol. III, 87-115.



In particular, we start by examining the conditions under which each equation of the VAR can be written as an Auto-Regressive model (AR) augmented with the lags of few linear combinations of all the variables in the system. We show that this parameter reduction occurs when there is a reduced number of channels through which each variable is influenced by the past of other variables in the system, which is in line with the common belief that a small number of shocks is responsible for most of aggregate economic fluctuations. We label this approach as index-augmented autoregression, given that its multivariate representation is an extension of the multivariate autoregressive index model by Reinsel (1983). We show that optimal estimation of the proposed model can be achieved by a switching algorithm that maximizes the Gaussian likelihood at each step. Since the Maximum Likelihood (ML) approach is not well suited for high-dimensional systems, we propose a regularized version of our estimator that can handle medium-large VARs. Moreover, we examine analogies and differences between our model and two largely popular approaches in the literature, namely reduced rank regression and dynamic factor models, see e.g. Johansen (2008) and Stock and Watson (2011), respectively, and the references therein. As a by-product of our analysis, we


develop a reduced rank regression model that is augmented with variable-specific AR structures, which, to the best of our knowledge, has not been proposed before. Finally, we evaluate the proposed methods by both empirical applications and simulations. The paper is organized as follows. Section 2 deals with the theoretical aspects at the level of both model representation and statistical inference. In Section 3 a comprehensive forecasting exercise is performed to evaluate the merits of the proposed modelling over traditional forecasting methods. In Section 4 a Monte Carlo study evaluates the finite sample properties of the various methods in terms of both model specification and forecasting performance. Finally, Section 5 concludes.

2 Theory

In this Section we first present the derivation of the proposed modelling, then we discuss statistical inference in both small and medium-large frameworks. Moreover, we comment on analogies and differences with other popular approaches in multivariate time series analysis.

2.1 Representation

Let yt ≡ (y1t, ..., yNt)′ denote the N-vector of the time series of interest. We assume that yt is generated by a stationary vector autoregressive model of order p (VAR(p) hereafter):

yt = Φ(L)yt−1 + εt,    t = 1, 2, ..., T,    (1)

where Φ(L) = ∑_{h=0}^{p−1} Φh L^h and εt are i.i.d. innovations with E(εt) = 0, E(εt εt′) = Σ (positive definite) and finite fourth moments. From Equation (1) we see that each series yit is generated by the following dynamic regression model:

yit = φii(L)yit−1 + ∑_{k≠i}^{N} φik(L)ykt−1 + εit,    i = 1, ..., N,    (2)

where φik(L) is the generic element of the polynomial matrix Φ(L), and εit is the i-th element of the innovation vector εt. In order to reduce the number of parameters of model (1), we further assume that

∑_{k≠i}^{N} φik(L) = ∑_{j=1}^{q} βij(L) ∑_{k≠i}^{N} ωkj,    (3)

where βij(L) is a scalar polynomial of order at most (p − 1), and ωkj is a scalar for i = 1, ..., N and j = 1, ..., q < N.

In order to motivate the above assumption, we remark that the unrestricted VAR foresees (N − 1) linearly independent mechanisms by which past external information is transmitted to each variable. However, since

it is commonly believed that a relatively small number of shocks is responsible for most of aggregate economic fluctuations, it is reasonable to assume that there is a reduced number of channels through which each variable is influenced by the past of other variables in the system. In words, this is exactly what Equation (3) states. Cubadda et al. (2013) relied on a particular case of this assumption, namely q = 1, in order to build business cycle indicators in a medium-dimensional framework. Notice that Assumption (3) is equivalent to postulating the following structure for each series yit:

yit = δi(L)yit−1 + ∑_{j=1}^{q} βij(L)fjt−1 + εit,    (4)

where δi(L) = φii(L) − [∑_{j=1}^{q} βij(L)ωij], and fjt = ∑_{k=1}^{N} ωkj ykt for j = 1, ..., q. Following Reinsel (1983), we refer to the q-dimensional series ft = [f1t, ..., fqt] as the index variables and we label (4) as the Index-Augmented Auto-Regressive (IAAR) model for series yit. Under Assumption (3), model (1) can be rewritten as follows:

yt = ∑_{h=1}^{p} Dh yt−h + ∑_{h=1}^{r} βh ω′ yt−h + εt,    (5)

where Dh is an N × N diagonal matrix for h = 1, ..., p, and βh and ω are N × q matrices for h = 1, ..., r ≤ p. Note that, for more generality, we allow the number r of the lagged indexes to be smaller than the order p of the autoregressive component. In terms of parsimony, Equation (5) needs N(p + qr + q) − q² parameters instead of N²p in Equation (1).¹ Previewing the results of the empirical analysis, we can anticipate that both r and p are typically small in our applications, leading to a large reduction of the number of parameters w.r.t. the VAR. The Multivariate IAAR (MIAAR) representation makes apparent that the Multivariate Autoregressive Index model (MAI) by Reinsel (1983) is a special case of (5) that is obtained by imposing Dh = 0 for h = 1, ..., p. Recently, there has been a renewed interest in the MAI. For instance, Carriero et al. (2016) derived classical and Bayesian estimation of large MAIs and applied this modelling to structural analysis, whereas Cubadda et al. (2017) proposed a multivariate realized volatility model that is endowed with a common index structure. We believe that extending the MAI by allowing for variable-specific AR structures is of particular value for forecasting purposes. Our main motivation comes from the observation that parameter reduction in the MAI is essentially obtained by endowing the VAR coefficient matrices with a reduced-rank structure. However, Carriero et al. (2011) and Bernardini and Cubadda (2015) documented that univariate models often outperform medium-large reduced-rank VAR models at short forecasting horizons. These findings suggest there should be room for improvement in forecasting by combining the MAI with individual AR components.

¹ Notice that, since the matrix ω is assumed to be full-rank, we can always normalize it as ω′ = [Iq, ϖ′], where ϖ is an (N − q) × q matrix.
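To make the parsimony comparison concrete, the two counts can be checked numerically. The sketch below (Python) uses N = 20, p = 13, q = r = 1 purely as an illustration of the formula, not as a claim about any particular application:

```python
# Parameter counts implied by Equation (1) versus Equation (5).
# The MIAAR count N(p + qr + q) - q^2 already nets out the q^2
# restrictions from the normalization omega' = [I_q, varpi'] of footnote 1.

def var_params(N, p):
    return N ** 2 * p

def miaar_params(N, p, q, r):
    return N * (p + q * r + q) - q ** 2

N, p, q, r = 20, 13, 1, 1        # illustrative dimensions
print(var_params(N, p))          # 5200 coefficients in the unrestricted VAR
print(miaar_params(N, p, q, r))  # 299 coefficients in the MIAAR
```

With these dimensions the MIAAR needs fewer than 6% of the VAR coefficients, which is the sense in which the index restriction tames the curse of dimensionality.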


2.2 Maximum likelihood estimation

With respect to the MAI, optimal estimation of the parameters of Model (5) is more complicated due to the presence of the variable-specific AR structures. Based on Boswijk (1995) and Cubadda et al. (2017), we offer a switching algorithm that has the property of increasing the Gaussian likelihood at each step. In detail, the procedure goes as follows: 1. In view of Equation (5) and given (initial) estimates of both D = [D1, ..., Dp] and ω, maximize the conditional Gaussian likelihood L(β, Σ|D, ω) by estimating β = [β1, ..., βr] and Σ with OLS on the following equation:

yt − ∑_{h=1}^{p} Dh yt−h = ∑_{h=1}^{r} βh ω′ yt−h + εt.

2. Premultiply by Σ^{−1/2} and apply the Vec operator to both sides of Equation (5), then use the property Vec(ABC) = (C′ ⊗ A)Vec(B) to get

Vec(Σ^{−1/2} yt) = ∑_{h=1}^{p} (y′t−h ⊗ Σ^{−1/2})Vec(Dh) + ∑_{h=1}^{r} (y′t−h ⊗ Σ^{−1/2} βh)Vec(ω′) + Vec(Σ^{−1/2} εt).

Finally, reparametrize the above model as

Vec(Σ^{−1/2} yt) = ∑_{h=1}^{p} [(y′t−h ⊗ Σ^{−1/2})M]δh + ∑_{h=1}^{r} (y′t−h ⊗ Σ^{−1/2} βh)Vec(ω′) + Vec(Σ^{−1/2} εt),    (6)

where δh is an N-vector such that Dh = diag(δh), and M is a binary N² × N matrix whose generic element mij is such that

mij = 1 if i = 1 + (j − 1)(N + 1), j = 1, ..., N; mij = 0 otherwise.

Given the previously obtained estimates of β and Σ, maximize L(D, ω|β, Σ) by estimating [δ1, ..., δp] and Vec(ω′) with OLS on Equation (6).

3. Repeat steps 1 and 2 until $\det(\Sigma)$ is numerically minimized.²

With respect to Newton-type optimization methods, the above algorithm has several pros and one con. The pros are that the switching algorithm is computationally simpler, it does not require a normalization condition on the parameters $\omega$, the optimization is explicit at each step, and over-identifying restrictions can easily be imposed on $\beta$, $D$, and $\omega$; the con is that direct maximization over the full parameter space may be faster. To mitigate this disadvantage of the switching algorithm, it is important to choose the initial values for $D$ and $\omega$ properly. We suggest starting with the estimates of these parameters obtained from the Dynamic Factor Model (DFM) by Stock and Watson (2002a, 2002b), where $\omega$ is estimated by the first $q$ principal components of the series $y_t$ and $D$ is estimated by OLS equation by equation. Apart from computational simplicity, this choice ensures that the MIAAR fits the data better than the DFM even after just one iteration of the switching algorithm.

² A general proof of the convergence of this family of iterative procedures is given by Oberhofer and Kmenta (1974).
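The steps above can be sketched in code. The following is a minimal, unoptimized illustration of the switching algorithm, not the authors' implementation: the selection matrix $M$, the principal-component initialization, and the OLS/GLS updates follow the text, while the fixed iteration count, the zero initialization of $D$, and all names are our own choices.

```python
import numpy as np

def selection_matrix(N):
    # Binary N^2 x N matrix M of the text: Vec(diag(delta)) = M @ delta,
    # with unit entries at (column-major) rows 1 + (j-1)(N+1), j = 1..N.
    M = np.zeros((N * N, N))
    for j in range(N):
        M[j * (N + 1), j] = 1.0
    return M

def miaar_switch(Y, p=2, r=1, q=1, n_iter=10):
    """Sketch of the switching algorithm; Y is T x N.
    Returns ([delta_1..delta_p], omega (N x q), beta (N x q*r))."""
    T, N = Y.shape
    M = selection_matrix(N)
    m = max(p, r)
    y = Y[m:]                                        # (T-m) x N regressands
    Xp = [Y[m - h:T - h] for h in range(1, p + 1)]   # lags entering D_h
    Xr = [Y[m - h:T - h] for h in range(1, r + 1)]   # lags entering the indexes

    # initial values: omega from the first q principal components, D = 0
    _, V = np.linalg.eigh(np.cov(Y.T))
    omega = V[:, -q:]
    delta = [np.zeros(N) for _ in range(p)]

    for _ in range(n_iter):
        # Step 1: given D and omega, estimate beta and Sigma by OLS
        Z = y - sum(Xp[h] * delta[h] for h in range(p))
        F = np.hstack([Xr[h] @ omega for h in range(r)])   # lagged indexes
        beta = np.linalg.lstsq(F, Z, rcond=None)[0].T      # N x (q*r)
        E = Z - F @ beta.T
        Sigma = E.T @ E / len(E)
        S = np.linalg.inv(np.linalg.cholesky(Sigma))       # triangular Sigma^{-1/2}

        # Step 2: given beta and Sigma, estimate delta_1..delta_p and omega
        # by OLS on the whitened vec-regression (6)
        rows, rhs = [], []
        for t in range(len(y)):
            blocks = [np.kron(Xp[h][t][None, :], S) @ M for h in range(p)]
            G = sum(np.kron(Xr[h][t][None, :], S @ beta[:, h * q:(h + 1) * q])
                    for h in range(r))
            rows.append(np.hstack(blocks + [G]))
            rhs.append(S @ y[t])
        theta = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)[0]
        delta = [theta[h * N:(h + 1) * N] for h in range(p)]
        omega = theta[p * N:].reshape(N, q)                # from Vec(omega')
    return delta, omega, beta
```

Each pass performs one explicit OLS in step 1 and one in step 2, which is what makes the Gaussian likelihood non-decreasing across iterations.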

2.3 Relationships with reduced-rank models

The MIAAR can be seen as a multivariate model with a partial reduced-rank structure and, in this sense, it is closely related to Reduced Rank Regression (RRR); see e.g. Johansen (2008) and the references therein. The basic RRR model reads

$$y_t = \alpha \sum_{h=1}^{p} \psi_h' y_{t-h} + \varepsilon_t,$$

where $\psi_h$ and $\alpha$ are $N \times q$ matrices.

The RRR model is widely used because of its interpretation in terms of common features and its ease of estimation; see e.g. Engle and Kozicki (1993). Moreover, it implies that the marginal processes of the series $y_t$ follow parsimonious univariate models, thus solving the so-called autoregressivity paradox (Cubadda et al., 2009). However, univariate models often forecast better than medium-large RRR models in the short run even if the dimensionality problem is properly addressed in estimation; see Carriero et al. (2011) and Bernardini and Cubadda (2015). These findings advocate extending the basic RRR setup into the following Augmented RRR (ARRR) model

$$y_t = \sum_{h=1}^{p} C_h y_{t-h} + \alpha \sum_{h=1}^{r} \psi_h' y_{t-h} + \varepsilon_t,$$

where $C_h$ is an $N \times N$ diagonal matrix for $h = 1, \ldots, p$.

The ARRR, which, to the best of our knowledge, has not been proposed before, can be optimally estimated by a variant of the switching algorithm reported in Subsection 2.2. In particular, it is required to iteratively alternate between

$$y_t - \sum_{h=1}^{p} C_h y_{t-h} = \alpha \sum_{h=1}^{r} \psi_h' y_{t-h} + \varepsilon_t,$$

keeping $[C_1, \ldots, C_p]$ and $[\psi_1, \ldots, \psi_r]$ fixed, and

$$\mathrm{Vec}(\Sigma^{-1/2} y_t) = \sum_{h=1}^{p} \left[(y_{t-h}' \otimes \Sigma^{-1/2})M\right]\gamma_h + \sum_{h=1}^{r} (y_{t-h}' \otimes \Sigma^{-1/2}\alpha)\mathrm{Vec}(\psi_h') + \mathrm{Vec}(\Sigma^{-1/2}\varepsilon_t),$$

where $\gamma_h$ is an $N$-vector such that $C_h = \mathrm{diag}(\gamma_h)$, keeping $\alpha$ and $\Sigma$ fixed. Notice that the assumption underlying the ARRR is the following

$$\sum_{k \neq i}^{N} \phi_{ik}(L) = \sum_{j=1}^{q} \alpha_{ij} \sum_{k \neq i}^{N} \psi_{kj}(L), \qquad (7)$$

where $\psi_{kj}(L)$ is a scalar polynomial of order $r-1$ ($r \le p$) and $\alpha_{ij}$ is a scalar, which implies the following structure for each series $y_{it}$:

$$y_{it} = \lambda_i(L) y_{it-1} + \sum_{j=1}^{q} \alpha_{ij} \xi_{jt-1} + \varepsilon_{it},$$

where $\lambda_i(L) = \left[\phi_{ii}(L) - \sum_{j=1}^{q} \alpha_{ij}\psi_{ij}(L)\right]$ and $\xi_{jt} = \sum_{k=1}^{N} \psi_{kj}(L) y_{kt}$.

Although Assumptions (3) and (7) look similar, the interpretation of the latter is more cumbersome since the weights of the factors $[\xi_{1t}, \ldots, \xi_{qt}]$ are lag-specific. Hence, Assumption (7) cannot be justified on the grounds of the existence of common shocks among the variables. Moreover, it is easy to see that the properties of the RRR model in terms of common features and parsimony of the marginal models are not shared by the ARRR model. However, we stress that the MIAAR and the ARRR boil down to the same model when $r = 1$, a case that is relevant for the empirical applications of this paper.

2.4 Relationships with dynamic factor models

Remarkably, the mathematical formulation of the IAAR model (4) is basically the same as the single-equation representation implied by the largely popular DFM; see e.g. Stock and Watson (2011) and the references therein. This implies that the two models generate forecasts in a similar manner. For both the IAAR and the DFM, the parameters of the forecasting equations are estimated by projecting the variable of interest onto its own lags and those of a few linear combinations of all the variables in the system. Notwithstanding this similarity, there are also important differences between the MIAAR and the DFM. First, the DFM is obtained by postulating that the predictors have a factor structure, whereas the MIAAR model is obtained through restrictions (3) on each of the VAR's equations, which can easily be tested for within a ML framework. As shown by Stock and Watson (2005), the DFM instead imposes several kinds of overidentifying restrictions on the VAR, which are usually rejected by the data. Second, at the estimation level, the indexes are constructed taking into account the comovements between the target variables and the predictors, whereas factors in the DFM are typically obtained by principal component methods, which look at the internal variability of the predictors only.³ Third, estimation of the DFM requires that not only $T$ but also $N$ increases without bound, which is quite at odds with ample empirical evidence suggesting that forecasting performance deteriorates when $N$ increases beyond a medium dimension; see, inter alia, Watson (2003), Boivin and Ng (2006), and Caggiano et al. (2011). Provided that $q$ is small and $T$ is large, the MIAAR approach can, in principle, be applied even to medium-large VARs. However, ML estimation may perform poorly when either $N$ or $(p + q)$ approaches $T$ because inversions of large covariance matrices are then required in step 2. We tackle this issue in the following Subsection.

³ Some supervised procedures for factor extraction have recently been proposed to overcome this limitation of the DFM. In the empirical section, we will use one of them as a competitor of the methods considered here.

2.5 Regularized estimation

Overall, ML estimation of Model (5) requires inverting three covariance matrices. The first one is the sample covariance matrix of the lagged indexes $[f_{t-1}, \ldots, f_{t-r}]'$ in step 1 of the switching algorithm. Since we will observe in the empirical application that the identified number of indexes infrequently exceeds 6 with $r = 1$, we can conclude that no regularization is generally required in this step. Second, the square root of the residual covariance matrix $S$ of the first-step regression needs to be inverted for step 2. Obviously, $S$ is singular when $N > T$ and its inverse becomes numerically unstable as $N$ approaches $T$. Third, the OLS estimate of the parameters $\Theta = [\delta_1', \ldots, \delta_p', \mathrm{Vec}(\omega')']'$ in step 2 requires the inverse of the matrix $X'X$, where

$$\underbrace{X}_{NT \times N(p+q)} = \left[Y_{-1} \otimes S^{-1/2}M, \; \ldots, \; Y_{-p} \otimes S^{-1/2}M, \; \sum_{h=1}^{r}\left(Y_{-h} \otimes S^{-1/2}\widehat{\beta}_h\right)\right], \qquad (8)$$

$Y_{-h} = [y_{1-h}, \ldots, y_{T-h}]'$ for $h = 1, \ldots, p$, and $\widehat{\beta}_h$ is the estimate of $\beta_h$ for $h = 1, \ldots, r$. Since $N$ is a common factor of both the number of rows and columns of $X$, numerical instability in inverting $X'X$ arises when $(p + q)$ is large.

When either $N$ or $(p+q)$ is relatively large compared to $T$, the procedure that we suggest to improve the finite-sample performance of the ML estimator goes as follows. After reaching convergence of the switching algorithm, use a shrinkage estimator of $\Sigma$ in place of $S$ to construct the $X$ matrix. Then use a ridge regression in place of OLS in step 2 to obtain estimates of $D$ and $\omega$, and finally use those to estimate $\beta$ and $\Sigma$ in turn as described in step 1. With a slight abuse of terminology, we call this procedure Regularized Maximum Likelihood (RML) estimation of Model (5). In particular, the shrinkage estimator that we use for $\Sigma$ has the form

$$Z = S(1 - \alpha) + \alpha\mu I_m,$$

where $\mu > 0$ and $\alpha \in [0, 1]$. In the choice of the optimal $\alpha$ and $\mu$, we follow the approach of Ledoit and Wolf (2003, 2004), which requires minimizing the expected value of the Frobenius distance between the shrinkage estimator $Z$ and the population covariance matrix $\Sigma$. The solution to this minimization problem is

$$\mu^* = \frac{\sum_i \sigma_{ii}}{m}, \qquad \alpha^* = \frac{\sum_i \sum_j \mathrm{Var}(s_{ij})}{\sum_i \sum_j \left[\mathrm{Var}(s_{ij}) + (\mu^* - \sigma_{ij})^2\right]},$$

where $s_{ij}$ and $\sigma_{ij}$ are, respectively, the generic elements of the matrices $S$ and $\Sigma$ for $i, j = 1, \ldots, N$. Extending a previous result by Ledoit and Wolf (2003) for the case that $\varepsilon_t$ is i.i.d. with finite fourth moments, Bernardini and Cubadda (2015) show that $\alpha^*$ is $O(1/T)$ even when $\varepsilon_t$ is a linear process with finite fourth cumulants.⁴ Finally, the proposed shrinkage estimator is $Z^* = S(1 - \widehat{\alpha}^*) + \widehat{\alpha}^*\widehat{\mu}^* I_m$, where

$$\widehat{\alpha}^* = \frac{\sum_i \sum_j \widehat{LRV}(v_{tij})}{\sum_i \sum_j \left[\widehat{LRV}(v_{tij}) + T(\widehat{\mu}^* - s_{ij})^2\right]},$$

$\widehat{\mu}^* = m^{-1}\sum_i s_{ii}$, $v_{tij} = u_{it}u_{jt}$, and $\widehat{LRV}(v_{tij})$ is a consistent estimator of the long-run variance of $v_{tij}$. Notice that, since $T\widehat{\alpha}^*$ is a $\sqrt{T}$-consistent estimator of the optimal shrinkage constant $\lim_T T\alpha^*$, the difference $(S - Z^*)$ is $O_p(1/T)$; see Ledoit and Wolf (2003) and Bernardini and Cubadda (2015) for further details.
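The scalar-target shrinkage step can be sketched compactly. This is not the authors' code: following the simplification used in the empirical Section, the long-run variances are replaced by simple sample variances of the cross-products, and the squared-bias term uses the target $\mu I$ (so $\mu$ on the diagonal and zero off it), which is the Ledoit-Wolf scalar-target convention.

```python
import numpy as np

def scalar_shrinkage(U):
    """U: T x N matrix of (zero-mean) residuals; returns (Z*, alpha_hat)."""
    T, N = U.shape
    S = U.T @ U / T                      # sample covariance
    mu = np.trace(S) / N                 # mu* = sum_i s_ii / N
    V = U[:, :, None] * U[:, None, :]    # cross-products v_tij = u_it * u_jt
    var_s = V.var(axis=0) / T            # Var(s_ij), no long-run correction
    num = var_s.sum()
    den = (var_s + (mu * np.eye(N) - S) ** 2).sum()
    a = num / den                        # shrinkage intensity, in [0, 1]
    return (1 - a) * S + a * mu * np.eye(N), a
```

Because the squared-bias terms in the denominator are non-negative, the estimated intensity is automatically bounded between 0 and 1.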

In order to regularize the second-step regression, we rely on the following ridge estimator of $\Theta$ in place of the OLS one:

$$\widetilde{\Theta} = (\widetilde{X}'\widetilde{X} + \lambda I_{N(p+q)})^{-1}\widetilde{X}'\widetilde{Y},$$

where $\widetilde{Y} = \left[\mathrm{Vec}(Z^{*-1/2}y_{p+1}), \ldots, \mathrm{Vec}(Z^{*-1/2}y_T)\right]'$, $\widetilde{X}$ is obtained from Equation (8) when $Z^*$ is used instead of $S$, and $\lambda$ is a positive scalar. Following Hoerl et al. (1975), we fix the shrinkage parameter $\lambda$ as

$$\lambda = \frac{N(p+q)\,\widehat{\sigma}^2}{\widehat{\Theta}'\widehat{\Theta}},$$

where $\widehat{\Theta} = (\widetilde{X}'\widetilde{X})^{-1}\widetilde{X}'\widetilde{Y}$ and $\widehat{\sigma}^2 = (\widetilde{Y} - \widetilde{X}\widehat{\Theta})'(\widetilde{Y} - \widetilde{X}\widehat{\Theta})/[N(T - p - q)]$.
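The ridge step with the Hoerl-Kennard-Baldwin style plug-in for $\lambda$ can be sketched on a generic regression. Two simplifications are ours: the example uses an arbitrary design $(X, y)$ rather than the MIAAR regression of Equation (8), and the degrees-of-freedom divisor $n - k$ stands in for the paper's $N(T - p - q)$.

```python
import numpy as np

def ridge_hkb(X, y):
    """Ridge estimate with lambda = k * sigma2_hat / (theta_ols' theta_ols)."""
    n, k = X.shape
    theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta_ols
    sigma2 = resid @ resid / (n - k)            # residual variance estimate
    lam = k * sigma2 / (theta_ols @ theta_ols)  # plug-in shrinkage parameter
    theta = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
    return theta, lam
```

When the OLS fit is tight (small $\widehat{\sigma}^2$ relative to the coefficient norm), the plug-in $\lambda$ is small and the ridge estimate stays close to OLS.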

From the asymptotic properties of the shrinkage and ridge estimators, it easily follows that RML provides $\sqrt{T}$-consistent estimates of the MIAAR parameters.

3 Empirical applications

In this Section we perform an out-of-sample forecasting exercise in order to assess the practical merits of the proposed modelling over traditional forecasting methods. Another issue that we want to tackle is the role of the system dimension in forecasting accuracy. Our interest is motivated by previous contributions showing that more variables are not always beneficial to the forecasting performance of VAR models even if the dimensionality problem is properly addressed in estimation; see e.g. Banbura et al. (2010) and Koop (2013). We use a data-set similar to that in Carriero et al. (2011). It consists of 40 US monthly time series comprising real variables, money and prices, and financial indicators. The time span is from 1960.02 to 2015.12, hence

⁴ Strictly speaking, Bernardini and Cubadda (2015) deal with shrinkage estimation towards a diagonal matrix, but the proof of the stated result for a scalar target is entirely analogous and therefore it is omitted.

the overall sample size is T* = 671. All the variables are transformed to achieve stationarity, demeaned and standardized to unit variance. Tables 1-2 report the list of the series under analysis and the corresponding transformations.

Table 1: Data description - #1-20

#    Variable                                                      Transformation
1    Industrial Production Index                                   (1-L)ln
2    Employees on Nonfarm Payrolls Private                         (1-L)ln
3    Effective Fed Funds Interest Rate                             (1-L)
4    CPI-U: all items                                              (1-L^12)(1-L)ln
5    NAPM New Orders Index                                         no transf.
6    NAPM Production Index                                         no transf.
7    Interest Rate: U.S. Treasury Const Maturities, 1-Yr           (1-L)
8    Interest Rate: U.S. Treasury Const Maturities, 5-Yr           (1-L)
9    Interest Rate: U.S. Treasury Const Maturities, 10-Yr          (1-L)
10   Money Stock: M1                                               (1-L^12)(1-L)ln
11   Total Capacity Utilization                                    (1-L)
12   Manufacturing and Trade Sales                                 (1-L)ln
13   Composite Help Wanted Index                                   (1-L)
14   Unemployment Rate                                             (1-L)
15   Interest Rate: U.S. Treasury Bills, Sec Mkt, 3-Mo             (1-L)
16   NAPM - Suppliers Deliveries Index (Percent)                   no transf.
17   Commercial Paper Rate                                         (1-L)
18   Producer Price Index: Finished Goods                          (1-L^12)(1-L)ln
19   Producer Price Index: Intermed Mat. Supplies and Components   (1-L^12)(1-L)ln
20   Bond Yield: Moody's AAA Corporate                             (1-L)

In order to assess the role of the system dimension in forecasting, we apply the competing methods to four nested data-sets, each of them including the series ranging from 1 up to N for N = 4, 10, 20, 40 in Tables 1-2. In particular, we conduct two different forecasting exercises: the first aims at comparing the overall forecasting performances of the competing methods, whereas the second focuses on forecasting the first four key series in Table 1, namely the Industrial Production Index (IP), Employees on Nonfarm Payrolls (EMP), the Effective Federal Funds Interest Rate (FYFF), and the Consumer Price Index (CPI). In both cases, we use in

Table 2: Data description - #21-40

#    Variable                                             Transformation
21   Interest Rate: U.S. Treasury Bills, Sec Mkt, 6-Mo    (1-L)
22   Personal Income Less Transfer Payments               (1-L)ln
23   Producer Price Index: Finished Consumer Goods        (1-L^12)(1-L)ln
24   Employee Hours in Nonag. Establishments              (1-L)ln
25   Sales of Retail Stores                               (1-L)ln
26   Real Consumption                                     (1-L)ln
27   Commercial and Industrial Loans                      (1-L^12)(1-L)ln
28   NAPM Inventories Index                               no transf.
29   US Effective Exchange Rate                           (1-L)ln
30   Money Supply - M2 Real                               (1-L^12)(1-L)ln
31   Money Stock: M2                                      (1-L^12)(1-L)ln
32   Monetary Base                                        (1-L^12)(1-L)ln
33   Depository Inst Reserves, Nonborrowed                (1-L^12)(1-L)ln
34   Producer Price Index: Crude Materials                (1-L^12)(1-L)ln
35   SP's Common Stock: Dividend Yield                    (1-L)
36   SP's Common Stock Price Index                        (1-L)ln
37   SP's Common Stock Price-Earnings Ratio               (1-L)ln
38   Bond Yield: Moody's BAA Corporate                    (1-L)
39   Personal Income                                      (1-L)ln
40   Consumer Credit Outstanding, Nonrevolving            (1-L^12)(1-L)ln

turn the four data-sets. The forecasting exercises are performed using a rolling window of 10 years, hence T = 120. For the five forecasting methods, iterated forecasts of $y_{\tau+h}$ are computed using the predictors $x_\tau = [y_\tau', \ldots, y_{\tau-p+1}']'$ for $\tau = T + p, \ldots, T^* - h$ and $h = 1, 3, 6, 12$. The competitors of ML and RML are univariate AR models and two alternative strategies for estimating the parameters of the multivariate model (5). First, the indexes $f_\tau$ are obtained as the first $q$ Principal Components (PC) of the series $y_\tau$, as advocated by Stock and Watson (2002a, 2002b). Second, the indexes are constructed by Partial Least Squares (PLS), having removed from the elements of $y_\tau$ the linear influence of their own lags. PLS is a family of statistical procedures aiming at maximizing the covariance between two sets of variables. Cubadda and Guardabascio


(2012) and Groen and Kapetanios (2016), inter alia, showed that PLS compares quite well to PC regression in macroeconomic forecasting when the dependent variable is a scalar. However, the approaches used by these authors are not fully appropriate in our case because the resulting indexes would be different for each variable. Hence, we use a variant of the multivariate PLS method by Cubadda and Hecq (2011), which consists in computing the weights $\omega$ as the eigenvectors corresponding to the $q$ largest eigenvalues of the cross-covariance matrix between the lags of $y_\tau$ and the residuals of $N$ individual autoregressions of the elements of $y_\tau$. In both the PC and the PLS case, the coefficient matrices $\beta$ and $D$ are then estimated by OLS.

At the univariate level, we use AR(13) models, which are able to remove serial correlation from the considered series. The identification of the multivariate model requires specifying the number of indexes $q$ as well as the lag order of both the individual variables, $p$, and the indexes, $r$. We fix $p = 13$ as for the univariate AR models. We then compare the results for the various estimation methods letting $r$ vary from 1 up to 6, and we find no significant gains from using an index lag order longer than 1. Therefore, we use Model (5) with $p = 13$ and $r = 1$. Finally, we choose $q$ by means of the information criteria of Akaike (AIC), Schwarz (BIC) and Hannan-Quinn (HQIC), having set the maximum number of indexes equal to 7.⁵

The RML method requires estimating the long-run variances of $e_{it}e_{jt}$ ($i, j = 1, \ldots, N$, $t = 1, \ldots, T$), where $e_t = [e_{1t}, \ldots, e_{Nt}]'$ are the ML residuals. We opt for ignoring the time-series dependence in these cross-products and simply compute their variances. Apart from simplicity, the reason is twofold. First, under correct specification of the model, the residuals $e_t$ are i.i.d. in large samples, so their cross-products should not be significantly autocorrelated.
Second, Sancetta (2006) shows that there is no substantial gain from accounting for time-series dependence in the cross-products when the variable dimension is large, which is typically the situation where the shrinkage is stronger.

In Tables 3-4 we report both the average of the mean square forecast errors relative to the AR(13) forecasts (ARMSFE) over the N series and the quartiles of the number of indexes that are selected by each information criterion for each of the multivariate forecasting methods. In order to find the set of multivariate models which forecast equally well, we rely on the model confidence set (MCS) analysis by Hansen et al. (2011). In particular, the test for the null hypothesis of equal predictive ability at the 20% level is implemented using a block bootstrap scheme with 5000 resamples. The empirical findings indicate that RML delivers the most accurate forecasts in 14 cases out of 16. The two exceptions are: N = 10 (N = 40) for h = 12 (1), where ML with q selected by AIC (HQIC) is the best performer. The MCS analysis reveals that RML (ML) [PLS] belongs to the superior set in 15 (3) [1] of the cases, whereas AR and PC are never included among the best performers. Interestingly, RML improves over ML even when N = 4, a dimension for which the shrinkage of the parameters in Σ and Θ is small

⁵ Notice that in computing the information criteria the number of parameters that we use is $N[(r+1)q + p] - q^2$ and not $N(rq + p)$. Hence, the indexes are not treated as known quantities, as is typically done in DFMs.

Table 3: ARMSFE and quartiles of the estimated q distribution - N = 4, 10

(N, p, r)     Method/criterion   h=1      h=3      h=6      h=12     [Q1 Q2 Q3]
(4, 13, 0)    AR                 100.00   100.00   100.00   100.00
(4, 13, 1)    ML/BIC             97.13    98.85    98.59    97.75    [1 1 1]
(4, 13, 1)    ML/HQIC            97.13    98.85    98.59    97.75    [1 1 1]
(4, 13, 1)    ML/AIC             97.42    98.92    98.67    97.87    [1 1 1]
(4, 13, 1)    RML/BIC            93.67*   96.47*   96.87*   95.55*   [1 1 1]
(4, 13, 1)    RML/HQIC           93.94    96.54    96.91*   95.53*   [1 1 1]
(4, 13, 1)    RML/AIC            94.80    97.27    97.59    96.03    [1 1 1]
(4, 13, 1)    PC/BIC             96.74    97.43    98.97    98.82    [1 1 1]
(4, 13, 1)    PC/HQIC            97.37    97.87    99.08    99.06    [1 1 1]
(4, 13, 1)    PC/AIC             96.75    97.28    98.89    98.81    [1 1 3]
(4, 13, 1)    PLS/BIC            97.77    97.47    98.65    98.56    [1 1 1]
(4, 13, 1)    PLS/HQIC           98.25    97.78    99.66    99.12    [1 1 1]
(4, 13, 1)    PLS/AIC            98.79    98.26    96.87    99.56    [1 1 2]
(10, 13, 0)   AR                 100.00   100.00   100.00   100.00
(10, 13, 1)   ML/BIC             92.69    97.16    99.34    98.45    [1 1 1]
(10, 13, 1)   ML/HQIC            91.21    96.58    98.91    97.78    [1 1 2]
(10, 13, 1)   ML/AIC             91.05    96.20    95.95*   94.36*   [1 2 4]
(10, 13, 1)   RML/BIC            90.71    95.53    98.46    98.82    [1 1 1]
(10, 13, 1)   RML/HQIC           89.17    95.54    98.29    98.31    [1 1 2]
(10, 13, 1)   RML/AIC            88.38*   94.50*   95.59*   95.95    [1 2 4]
(10, 13, 1)   PC/BIC             94.51    96.67    99.10    98.44    [1 1 1]
(10, 13, 1)   PC/HQIC            94.09    97.97    99.62    99.20    [1 1 1]
(10, 13, 1)   PC/AIC             93.88    98.45    99.25    97.71    [1 1 4]
(10, 13, 1)   PLS/BIC            93.27    96.36    99.53    98.90    [1 1 1]
(10, 13, 1)   PLS/HQIC           92.70    97.12    100.29   99.15    [1 1 1]
(10, 13, 1)   PLS/AIC            92.83    97.62    100.14   98.94    [1 1 2]

Note: N is the number of series, p is the lag order of the individual series, r is the lag order of the indexes, ARMSFE is the mean square forecast error relative to the AR(p) forecasts, h is the forecasting horizon, Qi indicates the i-th quartile of the number of indexes. The best result for each pair (N, h) is in bold. * indicates models belonging to the superior set at the 20% level.


Table 4: ARMSFE and quartiles of the estimated q distribution - N = 20, 40

(N, p, r)     Method/criterion   h=1      h=3      h=6      h=12      [Q1 Q2 Q3]
(20, 13, 0)   AR                 100.00   100.00   100.00   100.00
(20, 13, 1)   ML/BIC             91.92    95.49    98.77    100.41    [1 1 1]
(20, 13, 1)   ML/HQIC            90.89    95.51    97.89    97.80     [1 1 1]
(20, 13, 1)   ML/AIC             93.99    97.11    95.87    96.80     [4 5 6]
(20, 13, 1)   RML/BIC            89.86    94.24    97.58    100.82    [1 1 1]
(20, 13, 1)   RML/HQIC           88.45*   93.34*   95.64    97.16     [1 1 1]
(20, 13, 1)   RML/AIC            90.15    93.63*   94.16*   95.71*    [3 4 6]
(20, 13, 1)   PC/BIC             93.42    97.09    99.70    98.70     [1 1 1]
(20, 13, 1)   PC/HQIC            92.97    96.95    99.42    99.24     [1 1 1]
(20, 13, 1)   PC/AIC             91.68    97.17    99.93    99.26     [1 1 2]
(20, 13, 1)   PLS/BIC            92.88    97.03    100.32   99.92     [1 1 1]
(20, 13, 1)   PLS/HQIC           91.26    96.05    99.20    98.61     [1 1 1]
(20, 13, 1)   PLS/AIC            91.68    96.87    99.71    98.91     [1 1 2]
(40, 13, 0)   AR                 100.00   100.00   100.00   100.00
(40, 13, 1)   ML/BIC             92.90    95.72    96.91    98.65     [1 1 1]
(40, 13, 1)   ML/HQIC            90.94*   96.32    98.69    98.37     [1 1 2]
(40, 13, 1)   ML/AIC             101.18   100.43   99.37    101.35    [7 7 7]
(40, 13, 1)   RML/BIC            91.29*   94.57*   95.42*   97.25*    [1 1 1]
(40, 13, 1)   RML/HQIC           91.65    95.24*   97.23    96.84*    [1 2 3]
(40, 13, 1)   RML/AIC            99.21    98.13    97.39    98.30*    [6 7 7]
(40, 13, 1)   PC/BIC             95.60    97.77    100.11   100.62    [1 1 1]
(40, 13, 1)   PC/HQIC            94.94    97.12    99.27    99.50     [1 1 1]
(40, 13, 1)   PC/AIC             94.16    96.72    98.45    99.46     [1 1 2]
(40, 13, 1)   PLS/BIC            94.70    97.09    99.42    99.58     [1 1 1]
(40, 13, 1)   PLS/HQIC           93.49    95.95    98.83    97.99*    [1 1 2]
(40, 13, 1)   PLS/AIC            92.82    95.85    98.05    97.17*    [1 2 3]

See the notes of Table 3.

Table 5: RMSFE - IP

Ranking           Horizon   Method   Criterion   N    RMSFE
1st best          h=1       RML      HQIC        20   87.85*
2nd best-method   h=1       ML       AIC         20   90.21*
2nd best-N        h=1       RML      AIC         10   91.61
1st best          h=3       RML      HQIC        20   95.05*
2nd best-method   h=3       ML       HQIC        20   96.20*
2nd best-N        h=3       RML      AIC         40   96.48*
1st best          h=6       PC       AIC         10   93.64*
2nd best-method   h=6       RML      BIC         40   93.90*
2nd best-N        h=6       RML      BIC         40   93.90*
1st best          h=12      ML       AIC         10   94.61*
2nd best-method   h=12      PLS      AIC         40   96.71*
2nd best-N        h=12      PLS      AIC         40   96.71*

Note: RMSFE is the mean square forecast error relative to the AR(13) forecasts. * indicates models belonging to the superior set at the 20% level.

but not null. ML is, however, quite often the second-best performer after RML. PC and PLS have similar performances for N = 4, 10, whereas the latter outdoes the former for N = 20, 40.

In terms of model selection, no information criterion uniformly dominates the others in the choice of the best forecasting model. With reference to ML and RML, BIC (AIC) is more effective when N = 4, 40 (N = 10, 20), whereas HQIC performs quite homogeneously over all the dimensions. AIC often selects the best specification for PC, whereas the larger the system dimension, the less parsimonious the best selection criterion for PLS. Regarding the choice of the number of indexes, the most accurate forecasts are obtained with an estimated q equal to 1 or 2, except for the two longest forecasting horizons when N = 20, for which the optimal q ranges from 3 to 6.

Turning our attention to the four key target series, beyond computing the MSFE relative to the AR univariate forecast (RMSFE) for each of them, we rely on the MCS analysis as well. For the sake of space, in Tables 5-8 we present the results of the best performer together with: (i) the second-best forecasts obtained with a different estimation method; (ii) the second-best forecasts obtained with a different system dimension N. The empirical findings indicate that RML delivers the most accurate forecasts in 9 cases out of 16. It performs best for IP when h = 1, 3, for EMP in all horizons but h = 12, for FYFF in all horizons but

Table 6: RMSFE - EMP

Ranking           Horizon   Method   Criterion   N    RMSFE
1st best          h=1       RML      HQIC        20   87.85*
2nd best-method   h=1       ML       AIC         20   90.22*
2nd best-N        h=1       RML      AIC         10   94.07
1st best          h=3       RML      HQIC        20   95.05*
2nd best-method   h=3       ML       HQIC        20   96.20*
2nd best-N        h=3       RML      AIC         40   96.48*
1st best          h=6       RML      AIC         20   91.25*
2nd best-method   h=6       ML       AIC         20   94.94
2nd best-N        h=6       ML       AIC         10   95.42
1st best          h=12      ML       AIC         10   91.23*
2nd best-method   h=12      RML      AIC         20   91.77*
2nd best-N        h=12      RML      AIC         20   91.77*

See the notes of Table 5.

Table 7: RMSFE - FYFF

Ranking           Horizon   Method   Criterion   N    RMSFE
1st best          h=1       ML       HQIC        40   64.35*
2nd best-method   h=1       RML      HQIC        10   65.34*
2nd best-N        h=1       RML      HQIC        10   65.34*
1st best          h=3       RML      BIC         20   84.69*
2nd best-method   h=3       PLS      HQIC        10   87.20*
2nd best-N        h=3       RML      AIC         10   85.76*
1st best          h=6       RML      HQIC        20   91.30*
2nd best-method   h=6       ML       AIC         10   92.33*
2nd best-N        h=6       RML      AIC         10   91.46*
1st best          h=12      RML      HQIC        4    94.95*
2nd best-method   h=12      ML       AIC         10   96.49*
2nd best-N        h=12      ML       AIC         10   96.49*

See the notes of Table 5.

Table 8: RMSFE - CPI

Ranking           Horizon   Method   Criterion   N    RMSFE
1st best          h=1       PC       BIC         40   92.74*
2nd best-method   h=1       ML       HQIC        40   93.40*
2nd best-N        h=1       RML      BIC         10   93.46*
1st best          h=3       PLS      HQIC        10   90.72*
2nd best-method   h=3       PC       HQIC        20   90.92*
2nd best-N        h=3       PC       HQIC        20   90.92*
1st best          h=6       PLS      AIC         10   93.49*
2nd best-method   h=6       RML      AIC         10   93.61*
2nd best-N        h=6       PC       AIC         20   95.13
1st best          h=12      RML      AIC         10   88.46*
2nd best-method   h=12      PLS      AIC         10   88.85*
2nd best-N        h=12      PLS      AIC         40   90.15*

See the notes of Table 5.

h = 1, and for CPI at the longest horizon. Moreover, RML is the only method that uniquely forms the model superior set in one case. The second-best method is ML, which is the best predictor of IP and EMP at the longest horizon, as well as of FYFF at the shortest one. PC (PLS) performs best for CPI and IP (FYFF) when h = 6 (h = 3). The most accurate forecasts are obtained when the number of factors q is chosen according to the HQIC in 50% of the cases. Again, the univariate AR model never provides the best result.

Regarding the role of the system dimension, the best forecasts are obtained with N = 20 (N = 10) [N = 40] {N = 4} in 7 (6) [2] {1} of the cases. Interestingly, when the best result is obtained using either the smallest or the largest dimension, there is always a forecast obtained using an intermediate dimension that belongs to the superior set as well, whereas the reverse is not true. These results corroborate previous findings on the merits of medium-dimensional models for macroeconomic forecasting.


4 Monte Carlo analysis

In this Section we perform a small Monte Carlo study to evaluate the finite-sample performances of ML, RML, and PC. In particular, the Data Generating Process (DGP) is the following model

$$y_t = \sum_{h=1}^{13} D_h y_{t-h} + \beta\omega' y_{t-1} + \varepsilon_t,$$

where $\varepsilon_t$ is i.i.d. $N(0, \Sigma_\varepsilon)$, and all the parameters are set equal to their estimates obtained by PLS in combination with the HQIC. Hence, this Monte Carlo analysis is explicitly designed to be unfair to the (R)ML approach.⁶ In order to obtain those estimates, we use two subsamples of the variables listed in Table 1: a rather volatile period, the Pre-Great Moderation (1960.01-1984.12, PGM), and a more stable one, the Great Moderation (1985.01-2007.12, GM). We generate systems of N = 20 variables with a number of observations of the vector series $y_t$ equal to T + 150 for T = 120, 240. The first 50 points are used as a burn-in period, the last 100 observations to compute the h-step-ahead forecast errors for h = 1, and the intermediate T observations to estimate the competing models.

The various combinations of methods and information criteria are evaluated by means of two statistics: first, the ARMSFE for the forecasting horizon h = 1; second, the percentage with which the estimated number of indexes $\widehat{q}$ coincides with the true q. In order to assess the significance of the differences between the two best performers according to each of these statistics, we compute the t-test for the null hypothesis that the differences between the ARMSFEs are centered on 0, and McNemar's test for the null hypothesis that the probabilities of identifying the true number of factors are the same. The results are reported in Tables 9-10.
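The simulation design above can be sketched as follows. The parameter values here are arbitrary placeholders (the paper instead plugs in the PLS/HQIC estimates from the PGM and GM subsamples), $\Sigma_\varepsilon = I$ for simplicity, and the burn-in of 50 observations follows the text.

```python
import numpy as np

def simulate_miaar(T, N=20, q=1, p=13, burn=50, seed=0):
    """Simulate y_t = sum_h D_h y_{t-h} + beta omega' y_{t-1} + eps_t."""
    rng = np.random.default_rng(seed)
    delta = [0.3 / p * np.ones(N) for _ in range(p)]   # placeholder diagonal AR coefs
    beta = 0.2 * rng.standard_normal((N, q))           # placeholder index loadings
    omega = rng.standard_normal((N, q)) / np.sqrt(N)   # placeholder index weights
    y = np.zeros((T + burn + p, N))
    for t in range(p, T + burn + p):
        mean = sum(delta[h] * y[t - 1 - h] for h in range(p))  # diagonal AR part
        mean += beta @ (omega.T @ y[t - 1])                    # common-index part
        y[t] = mean + rng.standard_normal(N)                   # Sigma_eps = I
    return y[p + burn:]

Y = simulate_miaar(T=270)   # T + 150 observations, as in the design with T = 120
print(Y.shape)              # -> (270, 20)
```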

With the sample size T = 120, we observe that RML in combination with AIC is the best performer according to both evaluation criteria when the DGP is PGM, whereas all the methods identify the true q when BIC is used and RML in combination with BIC is the best forecaster when the DGP is GM. Results are similar when T = 240. When the DGP is PGM, RML in combination with AIC still delivers the most accurate forecasts but the highest percentage of correct model identifications is given by ML in combination with AIC, whereas both BIC and HQIC always identify the true q and RML in combination with either BIC or HQIC forecasts best when the DGP is GM. Looking at the comparison between PLS and PC, we notice that PLS outdoes PC under PGM, while their results are similar under GM.

⁶ In an alternative Monte Carlo design, the parameters of the simulated MIAAR are those obtained in the empirical application through RML using the AIC. The results, available upon request, are clearly more favourable for the RML method.

Table 9: Simulation results - N = 20, T = 120

Method/Criterion   Statistics           PGM (q = 4)   GM (q = 1)
ML/BIC             % in which q̂ = q     0.00          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               89.71         93.61
ML/HQIC            % in which q̂ = q     1.00          99.60
                   % in which q̂ > q     0.00          0.40
                   ARMSFE               88.88         93.65
ML/AIC             % in which q̂ = q     30.00         75.80
                   % in which q̂ > q     24.20         24.20
                   ARMSFE               89.26         95.08
RML/BIC            % in which q̂ = q     0.00          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               88.30         91.37
RML/HQIC           % in which q̂ = q     0.60          99.80
                   % in which q̂ > q     0.00          0.20
                   ARMSFE               86.88         91.38
RML/AIC            % in which q̂ = q     32.80*        57.40
                   % in which q̂ > q     22.80         42.60
                   ARMSFE               85.93***      93.27
PLS/BIC            % in which q̂ = q     0.00          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               91.27         95.42
PLS/HQIC           % in which q̂ = q     0.00          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               90.35         95.42
PLS/AIC            % in which q̂ = q     15.00         97.20
                   % in which q̂ > q     0.80          2.80
                   ARMSFE               88.57         95.53
PC/BIC             % in which q̂ = q     0.00          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               94.00         95.38
PC/HQIC            % in which q̂ = q     1.20          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               92.69         95.38
PC/AIC             % in which q̂ = q     21.00         100.00
                   % in which q̂ > q     5.80          0.00
                   ARMSFE               90.12         95.38

Note: (P)GM indicates the sample on which the model estimated by PLS/HQIC is used as DGP. * (**) [***] indicate significance at the 10% (5%) [1%] level of the test of equal ARMSFEs and of McNemar's test on the differences between the proportions in which q̂ = q for the two best performers. See the notes of Table 3 for the meaning of the other acronyms.


Table 10: Simulation results - N = 20, T = 240

Method/Criterion   Statistics           PGM (q = 4)   GM (q = 1)
ML/BIC             % in which q̂ = q     0.00          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               88.61         93.20
ML/HQIC            % in which q̂ = q     5.80          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               85.12         93.20
ML/AIC             % in which q̂ = q     73.60***      100.00
                   % in which q̂ > q     8.80          0.00
                   ARMSFE               83.91         93.32
RML/BIC            % in which q̂ = q     0.00          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               88.38         92.51***
RML/HQIC           % in which q̂ = q     4.00          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               84.24         92.51***
RML/AIC            % in which q̂ = q     72.80         95.60
                   % in which q̂ > q     6.40          4.40
                   ARMSFE               82.80***      92.63
PLS/BIC            % in which q̂ = q     0.00          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               89.51         94.42
PLS/HQIC           % in which q̂ = q     9.60          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               86.25         94.42
PLS/AIC            % in which q̂ = q     65.40         99.00
                   % in which q̂ > q     4.80          1.00
                   ARMSFE               84.19         94.44
PC/BIC             % in which q̂ = q     0.80          100.00
                   % in which q̂ > q     0.00          0.00
                   ARMSFE               91.49         94.38
PC/HQIC            % in which q̂ = q     30.40         100.00
                   % in which q̂ > q     3.40          0.00
                   ARMSFE               87.56         94.38
PC/AIC             % in which q̂ = q     44.80         99.00
                   % in which q̂ > q     40.00         1.00
                   ARMSFE               84.95         94.38

See the notes of Table 9.


5 Conclusions

In this paper we have proposed a new modelling, called MIAAR, that can be derived by means of reasonable and testable restrictions on the VAR parameter space. The basic assumption is that there is a limited number of channels through which each variable is influenced by the past of the other variables in the system. This is in line with the common belief that only a few shocks are responsible for the bulk of aggregate fluctuations. Provided that the number of linear combinations that convey past external information is small compared to the number of variables, the MIAAR is considerably more parsimonious than the unrestricted VAR.

The parameters of the MIAAR can be estimated by a switching algorithm that increases the Gaussian likelihood at each iteration. Since ML estimation is known to perform poorly when the number of parameters is large, we also offer a regularized estimator for medium-large VARs.

Using both simulations and empirical examples, we have shown that the MIAAR outperforms well-known methods for macroeconomic forecasting when applied to systems of various dimensions. Moreover, our empirical results confirm that medium-dimensional systems are the most useful for macroeconomic forecasting.
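To fix ideas on the switching algorithm, the sketch below implements it for a first-order MIAAR with one lag of the indexes and, for simplicity, a spherical error covariance; the paper's estimator works with the full covariance matrix and general lag lengths, and all names here are ours. Each of the two OLS steps weakly reduces the overall residual sum of squares, so the criterion is monotone across iterations.

```python
import numpy as np

def fit_miaar(Y, q, n_iter=50, seed=0):
    """Switching estimation of a first-order MIAAR sketch:
        y_t = D y_{t-1} + A f_{t-1} + e_t,   f_t = omega' y_t,
    with D diagonal (own lags), A the N x q index loadings, and omega the
    N x q index weights. Alternates two OLS steps, each of which weakly
    lowers the residual sum of squares, i.e. raises the Gaussian likelihood
    under the spherical-covariance simplification used here."""
    T, N = Y.shape
    X, Z = Y[:-1], Y[1:]                       # regressors y_{t-1}, targets y_t
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((N, q))        # random starting weights
    D, A = np.zeros(N), np.zeros((N, q))
    ssr_path = []
    for _ in range(n_iter):
        # Step 1: given omega, per-equation OLS of y_{i,t} on (y_{i,t-1}, f_{t-1})
        F = X @ omega                          # lagged indexes, (T-1) x q
        for i in range(N):
            R = np.column_stack([X[:, i], F])
            coef, *_ = np.linalg.lstsq(R, Z[:, i], rcond=None)
            D[i], A[i] = coef[0], coef[1:]
        # Step 2: given (D, A), OLS for omega from the stacked system
        #   y_t - D y_{t-1} = (y_{t-1}' kron A) vec(omega')
        Uresid = Z - X * D                     # residual after the own-lag part
        G = np.vstack([np.kron(X[t], A) for t in range(len(X))])
        w, *_ = np.linalg.lstsq(G, Uresid.ravel(), rcond=None)
        omega = w.reshape(N, q)
        resid = Z - X * D - (X @ omega) @ A.T
        ssr_path.append(float(np.sum(resid ** 2)))
    return D, A, omega, ssr_path
```

With a spherical covariance the SSR is a monotone transform of the Gaussian likelihood, so the monotone decline of `ssr_path` mirrors the likelihood-climbing property stated above; identification of omega (e.g. normalizing its leading block) is ignored in this sketch.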

References

[1] Banbura, M., Giannone, D., and L. Reichlin (2010), Large Bayesian VARs, Journal of Applied Econometrics, 25, 71-92.
[2] Bernanke, B., Boivin, J., and P.S. Eliasz (2005), Measuring the Effects of Monetary Policy: A Factor-Augmented Vector Autoregressive (FAVAR) Approach, The Quarterly Journal of Economics, 120, 387-422.
[3] Bernardini, E. and G. Cubadda (2015), Macroeconomic Forecasting and Structural Analysis through Regularized Reduced-Rank Regression, International Journal of Forecasting, 31, 682-691.
[4] Boswijk, H.P. (1995), Identifiability of Cointegrated Systems, Tinbergen Institute Working Paper, 78.
[5] Boivin, J. and S. Ng (2006), Are More Data Always Better for Factor Analysis?, Journal of Econometrics, 132, 169-194.
[6] Caggiano, G., Kapetanios, G., and V. Labhard (2011), Are More Data Always Better for Factor Analysis? Results for the Euro Area, the Six Largest Euro Area Countries and the UK, Journal of Forecasting, 30, 736-752.
[7] Carriero, A., Clark, T., and M. Marcellino (2015), Bayesian VARs: Specification Choices and Forecast Accuracy, Journal of Applied Econometrics, 30, 46-73.
[8] Carriero, A., Kapetanios, G., and M. Marcellino (2011), Forecasting Large Datasets with Bayesian Reduced Rank Multivariate Models, Journal of Applied Econometrics, 26, 735-761.
[9] Carriero, A., Kapetanios, G., and M. Marcellino (2016), Structural Analysis with Multivariate Autoregressive Index Models, Journal of Econometrics, 192, 332-348.
[10] Chudik, A. and M.H. Pesaran (2011), Infinite-Dimensional VARs and Factor Models, Journal of Econometrics, 163, 4-22.
[11] Cubadda, G. and B. Guardabascio (2012), A Medium-N Approach to Macroeconomic Forecasting, Economic Modelling, 29, 1099-1105.
[12] Cubadda, G., Guardabascio, B., and A. Hecq (2013), A General to Specific Approach for Constructing Composite Business Cycle Indicators, Economic Modelling, 33, 367-374.
[13] Cubadda, G., Guardabascio, B., and A. Hecq (2017), A Vector Heterogeneous Autoregressive Index Model for Realized Volatility Measures, International Journal of Forecasting, 33, 337-344.
[14] Cubadda, G. and A. Hecq (2011), Testing for Common Autocorrelation in Data Rich Environments, Journal of Forecasting, 30, 325-335.
[15] Cubadda, G., Hecq, A., and F.C. Palm (2009), Studying Interactions without Multivariate Modeling, Journal of Econometrics, 148, 25-35.
[16] Engle, R.F. and S. Kozicki (1993), Testing for Common Features (with comments), Journal of Business and Economic Statistics, 11, 369-395.
[17] Groen, J. and G. Kapetanios (2016), Revisiting Useful Approaches to Data-Rich Macroeconomic Forecasting, Computational Statistics & Data Analysis, 100, 221-239.
[18] Hansen, P.R., Lunde, A., and J. Nason (2011), The Model Confidence Set, Econometrica, 79, 453-497.
[19] Hoerl, A.E., Kennard, R.W., and K.F. Baldwin (1975), Ridge Regression: Some Simulations, Communications in Statistics - Theory and Methods, 4, 105-123.
[20] Johansen, S. (2008), Reduced Rank Regression, in Durlauf, S.N. and L.E. Blume (eds.), The New Palgrave Dictionary of Economics, 2nd Edition, Palgrave Macmillan.
[21] Kock, A.B. and L. Callot (2015), Oracle Inequalities for High Dimensional Vector Autoregressions, Journal of Econometrics, 186, 325-344.
[22] Koop, G.M. (2013), Forecasting with Medium and Large Bayesian VARs, Journal of Applied Econometrics, 28, 177-203.
[23] Ledoit, O. and M. Wolf (2003), Improved Estimation of the Covariance Matrix of Stock Returns with an Application to Portfolio Selection, Journal of Empirical Finance, 10, 603-621.
[24] Ledoit, O. and M. Wolf (2004), A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices, Journal of Multivariate Analysis, 88, 365-411.
[25] Oberhofer, W. and J. Kmenta (1974), A General Procedure for Obtaining Maximum Likelihood Estimates in Generalized Regression Models, Econometrica, 42, 579-590.
[26] Pesaran, M.H., Schuermann, T., and S.M. Weiner (2004), Modelling Regional Interdependencies Using a Global Error-Correcting Macroeconometric Model, Journal of Business and Economic Statistics, 22, 129-162.
[27] Reinsel, G. (1983), Some Results on Multivariate Autoregressive Index Models, Biometrika, 70, 145-156.
[28] Sancetta, A. (2008), Sample Covariance Shrinkage for High Dimensional Dependent Data, Journal of Multivariate Analysis, 99, 949-967.
[29] Sims, C. (1980), Macroeconomics and Reality, Econometrica, 48, 1-48.
[30] Stock, J.H. and M.W. Watson (2002a), Forecasting Using Principal Components from a Large Number of Predictors, Journal of the American Statistical Association, 97, 1167-1179.
[31] Stock, J.H. and M.W. Watson (2002b), Macroeconomic Forecasting Using Diffusion Indexes, Journal of Business and Economic Statistics, 20, 147-162.
[32] Stock, J.H. and M.W. Watson (2005), Implications of Dynamic Factor Models for VAR Analysis, NBER Working Papers, 11467.
[33] Stock, J.H. and M.W. Watson (2011), Dynamic Factor Models, in Clements, M.P. and D.F. Hendry (eds.), Oxford Handbook of Economic Forecasting, Oxford University Press.
[34] Watson, M.W. (2003), Macroeconomic Forecasting Using Many Predictors, in Dewatripont, M., Hansen, L., and S. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress of the Econometric Society, Vol. III, 87-115.

