Restricted Maximum Likelihood Estimation 1

Running head: Restricted Maximum Likelihood Estimation

Implementing Restricted Maximum Likelihood (REML) Estimation in Structural Equation Models

Mike W.-L. Cheung National University of Singapore

This is a post-print version of the following paper: Cheung, M. W.-L. (2013). Implementing restricted maximum likelihood estimation in structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 20(1), 157–167. http://doi.org/10.1080/10705511.2013.742404

Restricted Maximum Likelihood Estimation 2 Abstract Structural equation modeling (SEM) is now a generic modeling framework for many multivariate techniques applied in the social and behavioral sciences. Many statistical models can be considered either as special cases of SEM or as part of the latent variable modeling framework. One popular extension is the use of SEM to conduct linear mixed-effects modeling (LMM) such as cross-sectional multilevel modeling and latent growth modeling. It is well known that LMM can be formulated as structural equation models. However, one main difference between the implementations in SEM and LMM is that maximum likelihood (ML) estimation is usually used in SEM while restricted (or residual) maximum likelihood (REML) estimation is the default method in most LMM packages. This paper shows how REML estimation can be implemented in SEM. Two empirical examples on latent growth model and meta-analysis are used to illustrate the procedures implemented in OpenMx. Issues related to implementing REML in SEM will be discussed.

Key words: structural equation modeling, multilevel modeling, meta-analysis, restricted maximum likelihood estimation

Restricted Maximum Likelihood Estimation 3 Implementing Restricted Maximum Likelihood (REML) Estimation in Structural Equation Models

Structural equation modeling (SEM) is now a generic modeling framework for many multivariate techniques used in the social and behavioral sciences. Many statistical models can be considered either as special cases of SEM or as part of the latent variable modeling framework. To name a few, for example, MANOVA (Raykov, 2001), item response theory (Takane & Deleeuw, 1987), categorical data analysis (B. O. Muthén, 1984), mixture modeling (Lubke & B. O. Muthén, 2005), and meta-analysis (Cheung, 2008, 2010, in press). Most popular SEM packages, e.g., LISREL (Jöreskog & Sörbom, 1996), EQS (Bentler, 2004) and Mplus (B. O. Muthén & L. K. Muthén, 2010) provide several estimation methods, such as maximum likelihood (ML), generalized least squares, and weighted least squares. Arguably, ML estimation is the most popular estimation method in SEM. One reason of its popularity is that ML estimation is the natural choice when SEM is integrated with other techniques such as mixture modeling and handing missing data. One popular extension of SEM is to conduct linear mixed-effects modeling (LMM). LMM, also known as multilevel models or hierarchical linear models, is yet another popular technique to model data spanning more than one levels. Popular applications are to model nested structure in cross-sectional and longitudinal data. It is well known that LMM can be formulated as structural equation models (Bauer, 2003; Bollen & Curran, 2006; Chou, Bentler, & Pentz, 1998; Curran, 2003; Mehta & Neale, 2005; Mehta, & West, 2000; Rovine & Molenaar, 2000). Nowadays, most SEM packages have implemented LMM into the general SEM framework. The

Restricted Maximum Likelihood Estimation 4 main advantage of integrating LMM into the SEM framework is not only that researchers can use their favorite SEM packages to conduct LMM. The key benefit of the integration is to get the best of both worlds – multilevel structural equation modeling (MLSEM). By using MLSEM, models involving latent variables can be analyzed for data spanning more than one levels. Although the mathematical models of LMM are the same for those implemented in SEM and in conventional LMM packages such as HLM (Raudenbush & Bryk, 2002), MLwiN (Goldstein, 2011) and nlme (Pinheiro & Bates, 2000), there are clear differences in these two implementations. One of them is that predictors are usually considered as random variables in the SEM framework while they are treated as fixed design matrix in the LMM framework. Therefore, means and variance-covariance matrix of the predictors have to be estimated in SEM while they are not estimated in the LMM framework. Another main difference is on the estimation method. ML estimation is the usual choice in SEM while both ML and restricted (or residual) maximum likelihood (REML) estimation can be used in the LMM approach. It is well known in the LMM literature that variance components based on the ML estimation are negatively biased. REML estimation was proposed to minimize the bias. However, REML is not without its own limitations. Since the fixed-effect parameters are removed before estimating the variance components, there is no fixed-effect estimate in REML method. Ad hoc calculations are required to compute the fixed-effect estimates. In order words, likelihood ratio test cannot be applied to compare models involving the fixed-effect parameters. “As to the question 'ML' or 'REML?',” Searle, Casella, and McCulloch (1992) succinctly summarized that “there is probably no hard and fast answer.” Thus, it was not the intention of this paper to argue whether ML or REML is preferable. Although there are pros and

Restricted Maximum Likelihood Estimation 5 cons of using REML, REML is always the default estimation in conventional LMM packages. Many LMM users prefer REML when analyzing nested data. Since SEM is becoming more and more popular as an integrated framework for data analysis, some LMM users may want to use it to conduct LMM. As noted by Bauer (2003, p. 163), one of the current limitations of the SEM approach is that “the restricted maximum likelihood estimator commonly used with multilevel models currently has no counterpart in SEM.” The lack of REML estimation can be an obstacle for some users, especially those who want to replicate their findings in conventional LMM packages. The main purpose of this paper is to illustrate how REML estimation can be implemented in SEM. OpenMx (Boker et al., 2011), an open source SEM package in R (R Development Core Team, 2011), is used to demonstrate the procedures. Readers from the LMM tradition may find the results more comparable to those obtained from conventional LMM packages. Moreover, this paper also attempts to shed light on how new estimation method can be implemented in SEM. The remaining sections of the paper are organized as follows. The next section contains a brief review of how REML estimation can be obtained via analyzing transformed data and loglikelihood function. Two empirical examples on latent growth model and meta-analysis are then presented to demonstrate the procedures. The final section discusses issues of implementing REML in SEM. Estimating Parameters with REML Method There are two equivalent model specifications on LMM. One is based on a single equation that specifies both fixed effects and random effects (Laird & Ware, 1982) while the other approach is to represent the models in two levels (Raudenbush & Bryk, 2002). In this paper

Restricted Maximum Likelihood Estimation 6 we use the model advocated by Laird and Ware (1982). This model specification uses the long format which is different from the SEM specification that uses the wide format. As we will discuss later, this model representation makes it easier to implement the REML estimation. Under this model representation, data of the dependent variable in the level 1 unit are stacked into a single column vector yi . yi represents the ith group of participants in cross-sectional data whereas it represents repeated measures of the ith subject in longitudinal data. This format is generally known as the long format. The model for yi of the ith unit is y i = X iβ + Z i ui + e i

(1)

where X i is the design matrix of the fixed effects, β is the parameter vector of the fixed effects, Z i is the design matrix of the random effects, ui is the random effects for the ith unit and ei is the residuals (Pinheiro & Bates, 2000). The random effects and the residuals are normally distributed with u i N 0, G and e i N 0, Ri . It should be noted that X i is the design matrix combining both level-1 and level-2 predictors. Moreover, the between group variance components G is the same for all units while Ri may vary across units. The log-likelihood function without the constant term on yi is logl ML β,G, Ri ; y i =

-1 logVi + ( y i X iβ)T Vi 1 ( y i X iβ) , 2

(2)

where Vi = Z i GZ iT + Ri . This log-likelihood function is the same as that in SEM by setting the model implied mean vectors and variance covariance matrix as μi θ = X iβ and Σ i θ = Z iGZ iT + Ri , respectively. Because of this similar between SEM and LMM, SEM may be

Restricted Maximum Likelihood Estimation 7 used to conduct latent growth models (Bollen & Curran, 2006) and cross-sectional multilevel analysis (Mehta & Neale, 2005). The estimates are based on the ML estimation because both fixed-effects and variance components are estimated simultaneously. One problem of the ML estimation is that the estimated variance components are negatively biased. Take the ML estimate of variance in normally distributed data as an example. If we knew the population mean , an unbiased estimate on variance is xi μ / n where n is 2

the sample size. Since we seldom know the population mean in practice, the ML estimate of the 2

variance is xi x / n where x is the sample mean. It is always true that

x

i

x / n xi μ / n . Thus, the ML estimate of variance is negatively biased. A well2

2

2

known unbiased estimate on variance is xi x / n 1 that adjusts the uncertainty in estimating x . The situation is more complicated in LMM because it cannot be corrected by a simple scalar adjustment. The basic idea behind the REML estimation is to remove the fixed-effects parameters before estimating the variance components. A contrast matrix is chosen in such a way that the fixed-effects parameters are not estimated. Since the fixed-effects parameters are not part of the model, the estimated variance components will not be biased by treating the fixed-effects estimates as known. Let us consider the model of stacking all yi into a single column vector, y = Xβ + Zu + e

(3)

where X and e are stacked over all the units, and Z = Diag Z1, Z 2, ..., Z k and u = Diag u1, u 2, ... , u k are block diagonal matrices. Instead of analyzing y with ML estimation,

Restricted Maximum Likelihood Estimation 8 we may analyze its residuals ~ y . The model on ~ y is ~ y = AXβ + AZu + e

(4)

where A = I X X T X X T with p arbitrary rows removed and I is an identity matrix and p is 1

the number of columns of X (Harville, 1977; Patterson & Thompson, 1971). Several characteristics bear explaining. After the calculations, the contrast A (without deleting the p arbitrary rows) is a k by k matrix where k is the length of ~ y . Since the rank of this matrix is (kp), it is not of full rank. Thus, p redundant rows have to be deleted. Harville (1977) showed that these p rows can be arbitrarily selected without affecting the results. The common practice is to delete the last p rows. Therefore, A becomes a (k-p) by k matrix after deleting the last p rows. It is of importance to note that ~ y is now a column vector with (k-p) rows. In practice, the transpose of it is used in the analysis. That is, the final data is a row vector with (k-p) columns of variables. After the transformation, the expected value of ~ y is E ~ y = =

I X X

T

X X T Xβ 1

X β X X T X X T X β 1

(5)

= 0 The population means of ~ y is always zero regardless of what β is. Effectively, the variance components can be estimated without estimating β. The expected variance-covariance matrix of ~ y is Cov ~ y = AVAT

(6)

Restricted Maximum Likelihood Estimation 9 where V = Diag V1,V2, ...,Vi . Before the transformation, the between unit of y is independent. However, the between units become systematically related after the transformation. In the context of SEM, the variance components can be estimated by fitting a model with μ θ = 0 and Σ θ = AZGZ T + R AT .

Alternatively, the log-likelihood function of the model may be used to directly estimate the variance components with REML. The log-likelihood function without the constant term on ~ y is

logl REML G, R; y =

-1 logV + ( y Xα )T V 1 ( y Xα ) + X TV 1 X 2

(7)

where α = X TV 1 X X T V 1y . Although α is in the log-likelihood function, it is not a function 1

of β . Thus, the log-likelihood function does not include the fixed-effects parameters. Once the variance components have been estimated, we may computer the fixed-effects by

βˆ ( X TVˆ 1 X )1 X TVˆ 1y .

(8)

In the context of SEM, we may fit μ θ = Xβ by fixing Σ θ = ZGˆ Z T + Rˆ obtained from the REML estimation. Empirical Examples Two examples on the latent growth model and mixed-effects meta-analysis were used to illustrate the procedures of the REML estimation. Since latent growth model is fitted the same way as cross-sectional multilevel model under the LMM framework, results can be readily applicable to cross-sectional data. OpenMx (Boker et al., 2011) was used to conduct the analysis. R code for the analyses based on the transformed data and on the log-likelihood is listed in the

Restricted Maximum Likelihood Estimation 10 Appendix. Latent growth model. A classic example from Potthoff and Roy (1964) was used to illustrate fitting a linear growth model. The data set consists of measurements of the distance from the pituitary gland to the pterygomaxillary fissure taken every two years from 8 years of age until 14 years of age on a sample of 27 children. This data set has been used in many LMM papers and textbooks (see Pinheiro & Bates, 2000 for the details). Eight was subtracted from years of age to improve the numerical stability. Thus, the new intercept is located at the age of 8 though it is not estimated under the REML estimation. The model for one subject is y1 y 2 y3 y4

1 1 1 1

0 1 2 0 1 4 1 1 6 1

0 e 2 u 0 e . 4 u1 e 6 e

(9)

where β0 and β1 are the average intercept and average slope, respectively. To be consistent with conventional assumptions in LMM, the level-1 residuals are assumed homogeneous.

u Nevertheless, this assumption can be easily relaxed. The REML estimates of var( 0 ) is u1 3.559 0.089 0.051 while the ML estimate is

3.383 0.095 0.046 . The ML estimate of the variance

component is slightly smaller than that with REML estimation. Meta-analysis. The second example draws from a classic data set in meta-analysis. Colditz et al. (1994) collected effect sizes from 13 clinical trials examining the effectiveness of the Bacillus Calmette-Guerin (BCG) vaccine for preventing tuberculosis. A total of 1264 articles or abstracts were identified for meta-analysis on the effectiveness of the BCG. Thirteen studies

Restricted Maximum Likelihood Estimation 11 were used in the illustration. The same data set has been used in many papers illustrating metaanalytic techniques (e.g., Berkey et al., 1995; Viechtbauer & Cheung, 2010). The effect size is the relative risk that indicates the ratio of the risk of having tuberculosis in the BCG vaccine group compared to a group without BCG vaccine. Publication year was used as a potential study characteristic to predict the effect size. The earliest publication year of the data was 1948 that was subtracted from the data to improve numerical stability. This analysis is also known as the mixed-effects meta-analysis or meta-regression. The model is

y1 1 0 1 y 1 1 0 2 0 1 0 y13 1 28

0 0 u0 0 1 0 0 u0 0 1 0 0

0 e1 0 e2 , u0 e13

(10)

where β0 and β1 are the average intercept and the slope, respectively. Mixed-effects meta-analysis can be considered as a special case of LMM (Raudenbush & Bryk, 2002) and SEM (Cheung, 2008, in press) by treating the sampling variance var(e1) to var(e13) as fixed and known. The estimated amount of heterogeneity based on the REML is 0.267. In contrast, the ML estimate of the amount of heterogeneity is 0.212 which is slightly smaller than that with REML. Discussion This paper shows how REML estimation, a popular estimation method in LMM, can be implemented in SEM. Instead of analyzing the raw data, variance components based on the REML estimation can be obtained by analyzing the transformed data in which the fixed-effects parameters have been removed. Since the transformed data do not include the fixed-effects parameters, REML can efficiently adjust the negatively bias of the ML estimates. The following

Restricted Maximum Likelihood Estimation 12 sections discuss some possible extensions and issues related to the implementation of REML estimation in SEM. Computation Efficiency and Numerical Stability As demonstrated above, there are two equivalent approaches to implementing the REML estimation in SEM. One approach is to analyze the transformed data with Equations 4 to 6. The other approach is to fit the log-likelihood function in Equation 7 directly. Analyzing the transformed data seems to be more appealing to many SEM users because only the implied mean and implied variance-covariance matrix are required. There is one main limitation, however. The input data is a (k-p) columns of variables with one row (or subject in SEM). Mehta and Neale (2005) has also noted that this approach was not efficient when “subjects” are treated as “variables” in handling nested data. The situation becomes worse in implementing REML estimation. It is because the implied variance-covariance matrix is a matrix with many constraints as shown in Equation 6. Thus, numerical stability may be an issue. On the other hand, the approach based on the log-likelihood function is more stable. The main limitation is that most SEM packages cannot implement arbitrary fitting functions. Future research needs to address how to obtain REML estimates more efficiently. Possible Extensions One main characteristic of the model in Equation 3 is that it specifies the fixed-effects design matrix and the random-effects design matrix separately. A transformation matrix based on the fixed effects is used to remove the fixed-effects parameters. This means that the REML estimation can be readily extended to other LMM such as multivariate LMM, multivariate metaanalysis and cross-classified LMM (Goldstein, 2011). As long as these models can be specified

Restricted Maximum Likelihood Estimation 13 via Equation 3, the transformation matrix A can be calculated to remove the fixed-effects parameters. For example, Cheung (2011) uses a SEM approach to conduct univariate and multivariate meta-analysis. REML using Equation 7 may be used to obtain the variance components in meta-analysis. Generalization to Models with Latent Variables The current REML implementation is based on the model in Equations 3 and 4 that specify the fixed effects and the random effects separately. This approach does not work for models that cannot be parameterized as Equation 3. Take a nonlinear growth model with estimated factor loadings as an example (Bollen & Curran, 2006). Suppose there are 4 measures, the model with estimated factor loadings is y1 y 2 y3 y4

1 0 1 0 e 1 1 1 1 u e 0 0 . 1 3 1 1 3 u1 e 1 4 1 4 e

(11)

This model captures the non-linear trajectories in development by estimating the trajectories from data empirically. Since λ3 and λ4 are included in both design matrices, it is not possible to find a transformation matrix A without estimating λ3 and λ4 . Further research should address how REML estimation can be implemented in general MLSEM. To summarize, the addition of REML estimation in SEM makes the SEM approach more appealing to the LMM users. They may find the results more comparable to conventional LMM packages. This approach is especially useful when the number of subjects is small.

Restricted Maximum Likelihood Estimation 14 References Bauer, D. (2003). Estimating multilevel linear models as structural equation models. Journal of Educational and Behavioral Statistics, 28(2), 135-167. doi:10.3102/10769986028002135 Bentler, P. M. (2004). EQS 6 [Computer software]. Encino, CA: Multivariate Software. Berkey, C. S., Hoaglin, D. C., Mosteller, F., & Colditz, G. A. (1995). A random-effects regression model for meta-analysis. Statistics in Medicine, 14(4), 395-411. doi:10.1002/sim.4780140406 Boker, S., Neale, M., Maes, H., Wilde, M., Spiegel, M., Brick, T., Spies, J., et al. (2011). OpenMx: An Open Source Extended Structural Equation Modeling Framework. Psychometrika, 76(2), 306-317. doi:10.1007/s11336-010-9200-6 Bollen, K. A., & Curran, P. (2006). Latent Curve Models: A Structural Equation Perspective. Hoboken, N.J.: Wiley-Interscience. Cheung, M. W. L. (2008). A model for integrating fixed-, random-, and mixed-effects metaanalyses into structural equation modeling. Psychological Methods, 13(3), 182-202. doi:10.1037/a0013163 Cheung, M. W. L. (2010). Fixed-effects meta-analyses as multiple-group structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 17(3), 481-509. doi:10.1080/10705511.2010.489367 Cheung, M. W. L. (2011). metaSEM: Meta-analysis: A Structural Equation Modeling Approach. R package version 0.7-0. Retrieved November 11, 2011, from http://courses.nus.edu.sg/course/psycwlm/Internet/metaSEM/ Cheung, M. W. L. (in press). Multivariate meta-analysis as structural equation models. Structural

Restricted Maximum Likelihood Estimation 15 Equation Modeling: A Multidisciplinary Journal. Chou, C.-P., Bentler, P. M., & Pentz, M. A. (1998). Comparisons of two statistical approaches to study growth curves: The multilevel model and the latent curve analysis. Structural Equation Modeling: A Multidisciplinary Journal, 5(3), 247. doi:10.1080/10705519809540104 Colditz, G. A., Brewer, T. F., Berkey, C. S., Wilson, M. E., Burdick, E., Fineberg, H. V., & Mosteller, F. (1994). Efficacy of BCG Vaccine in the Prevention of Tuberculosis. JAMA: The Journal of the American Medical Association, 271(9), 698 -702. doi:10.1001/jama.1994.03510330076038 Curran, P. (2003). Have multilevel models been structural equation models all along? Multivariate Behavioral Research, 38(4), 529-568. doi:10.1207/s15327906mbr3804_5 Goldstein, H. (2011). Multilevel statistical models (4th ed.). Hoboken, N.J.: Wiley. Harville, D. A. (1977). Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems. Journal of the American Statistical Association, 72(358), 320338. doi:10.2307/2286796 Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8: A user’s reference guide. Chicago, IL: Scientific Software International, Inc. Laird, N. M., & Ware, J. H. (1982). Random-Effects Models for Longitudinal Data. Biometrics, 38(4), 963-974. doi:10.2307/2529876 Lubke, G. H., & Muthén, B. (2005). Investigating Population Heterogeneity With Factor Mixture Models. Psychological Methods, 10(1), 21-39. doi:10.1037/1082-989X.10.1.21 Mehta, P. D., & Neale, M. C. (2005). People are variables too: multilevel structural equations

Restricted Maximum Likelihood Estimation 16 modeling. Psychological Methods, 10(3), 259-284. doi:10.1037/1082-989X.10.3.259 Mehta, P., & West, S. (2000). Putting the individual back into individual growth curves. Psychological Methods, 5(1), 23-43. doi:10.1037//1082-989X.5.1.23 Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49(1), 115-132. doi:10.1007/BF02294210 Muthén, B. O., & Muthén, L. K. (2010). Mplus user’s guide (6th ed.). Los Angeles, CA: Muthén & Muthén. Patterson, H. D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58(3), 545 -554. doi:10.1093/biomet/58.3.545 Pinheiro, J. C., & Bates, D. M. (2000). Mixed Effects Models in S and S-Plus. New York: Springer. Potthoff, R. F., & Roy, S. N. (1964). A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika, 51(3-4), 313 -326. doi:10.1093/biomet/51.3-4.313 Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: applications and data analysis methods. Thousand Oaks: Sage Publications. Raykov, T. (2001). Testing Multivariable Covariance Structure and Means Hypotheses via Structural Equation Modeling. Structural Equation Modeling: A Multidisciplinary Journal, 8(2), 224. doi:10.1207/S15328007SEM0802_4 R Development Cor Team. (2011). R: A Language and Environment for Statistical Computing. Vienna, Austria. Retrieved from http://www.R-project.org/

Restricted Maximum Likelihood Estimation 17 Rovine, M., & Molenaar, P. (2000). A structural modeling approach to a multilevel random coefficients model. Multivariate Behavioral Research, 35(1), 51-88. doi:10.1207/S15327906MBR3501_3 Searle, S. R., Casella, G., & McCulloch, C. E. (1992). Variance Components. New York: WileyInterscience. Takane, Y., & Deleeuw, J. (1987). On the relationship between item response theory and factoranalysis of discretized variables. Psychometrika, 52(3), 393-408. doi:10.1007/BF02294363 Viechtbauer, W., & Cheung, M. W. L. (2010). Outlier and influence diagnostics for metaanalysis. Research Synthesis Methods, 1(2), 112-125. doi:10.1002/jrsm.11

Restricted Maximum Likelihood Estimation 18 Author Note Mike W.-L. Cheung, Department of Psychology, National University of Singapore. This research was supported by the Academic Research Fund Tier 1 (R581-000-111-112) from the Ministry of Education, Singapore. Correspondence concerning this article should be addressed to Mike W.-L. Cheung, Department of Psychology, Faculty of Arts and Social Sciences, National University of Singapore, Block AS4, Level 2, 9 Arts Link, Singapore 117570. Tel: (65) 6516-3702; Fax: (65) 6773-1843; E-mail: [email protected]

Restricted Maximum Likelihood Estimation 19 Endnote 1

The data sets, R code, and output are available at

http://courses.nus.edu.sg/course/psycwlm/internet/remlSEM.zip.

Restricted Maximum Likelihood Estimation 20 Appendix Obtaining Variance Components with REML in R: Illustration with Linear Growth Model and Mixed-effects Meta-analysis library(OpenMx)

# Library to conduct SEM

#### Example on linear growth model data(Orthodont, package="nlme") # Sample data Orthodont # Show the data y

Running head: Restricted Maximum Likelihood Estimation

Implementing Restricted Maximum Likelihood (REML) Estimation in Structural Equation Models

Mike W.-L. Cheung National University of Singapore

This is a post-print version of the following paper: Cheung, M. W.-L. (2013). Implementing restricted maximum likelihood estimation in structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 20(1), 157–167. http://doi.org/10.1080/10705511.2013.742404

Restricted Maximum Likelihood Estimation 2 Abstract Structural equation modeling (SEM) is now a generic modeling framework for many multivariate techniques applied in the social and behavioral sciences. Many statistical models can be considered either as special cases of SEM or as part of the latent variable modeling framework. One popular extension is the use of SEM to conduct linear mixed-effects modeling (LMM) such as cross-sectional multilevel modeling and latent growth modeling. It is well known that LMM can be formulated as structural equation models. However, one main difference between the implementations in SEM and LMM is that maximum likelihood (ML) estimation is usually used in SEM while restricted (or residual) maximum likelihood (REML) estimation is the default method in most LMM packages. This paper shows how REML estimation can be implemented in SEM. Two empirical examples on latent growth model and meta-analysis are used to illustrate the procedures implemented in OpenMx. Issues related to implementing REML in SEM will be discussed.

Key words: structural equation modeling, multilevel modeling, meta-analysis, restricted maximum likelihood estimation

Restricted Maximum Likelihood Estimation 3 Implementing Restricted Maximum Likelihood (REML) Estimation in Structural Equation Models

Structural equation modeling (SEM) is now a generic modeling framework for many multivariate techniques used in the social and behavioral sciences. Many statistical models can be considered either as special cases of SEM or as part of the latent variable modeling framework. To name a few, for example, MANOVA (Raykov, 2001), item response theory (Takane & Deleeuw, 1987), categorical data analysis (B. O. Muthén, 1984), mixture modeling (Lubke & B. O. Muthén, 2005), and meta-analysis (Cheung, 2008, 2010, in press). Most popular SEM packages, e.g., LISREL (Jöreskog & Sörbom, 1996), EQS (Bentler, 2004) and Mplus (B. O. Muthén & L. K. Muthén, 2010) provide several estimation methods, such as maximum likelihood (ML), generalized least squares, and weighted least squares. Arguably, ML estimation is the most popular estimation method in SEM. One reason of its popularity is that ML estimation is the natural choice when SEM is integrated with other techniques such as mixture modeling and handing missing data. One popular extension of SEM is to conduct linear mixed-effects modeling (LMM). LMM, also known as multilevel models or hierarchical linear models, is yet another popular technique to model data spanning more than one levels. Popular applications are to model nested structure in cross-sectional and longitudinal data. It is well known that LMM can be formulated as structural equation models (Bauer, 2003; Bollen & Curran, 2006; Chou, Bentler, & Pentz, 1998; Curran, 2003; Mehta & Neale, 2005; Mehta, & West, 2000; Rovine & Molenaar, 2000). Nowadays, most SEM packages have implemented LMM into the general SEM framework. The

Restricted Maximum Likelihood Estimation 4 main advantage of integrating LMM into the SEM framework is not only that researchers can use their favorite SEM packages to conduct LMM. The key benefit of the integration is to get the best of both worlds – multilevel structural equation modeling (MLSEM). By using MLSEM, models involving latent variables can be analyzed for data spanning more than one levels. Although the mathematical models of LMM are the same for those implemented in SEM and in conventional LMM packages such as HLM (Raudenbush & Bryk, 2002), MLwiN (Goldstein, 2011) and nlme (Pinheiro & Bates, 2000), there are clear differences in these two implementations. One of them is that predictors are usually considered as random variables in the SEM framework while they are treated as fixed design matrix in the LMM framework. Therefore, means and variance-covariance matrix of the predictors have to be estimated in SEM while they are not estimated in the LMM framework. Another main difference is on the estimation method. ML estimation is the usual choice in SEM while both ML and restricted (or residual) maximum likelihood (REML) estimation can be used in the LMM approach. It is well known in the LMM literature that variance components based on the ML estimation are negatively biased. REML estimation was proposed to minimize the bias. However, REML is not without its own limitations. Since the fixed-effect parameters are removed before estimating the variance components, there is no fixed-effect estimate in REML method. Ad hoc calculations are required to compute the fixed-effect estimates. In order words, likelihood ratio test cannot be applied to compare models involving the fixed-effect parameters. “As to the question 'ML' or 'REML?',” Searle, Casella, and McCulloch (1992) succinctly summarized that “there is probably no hard and fast answer.” Thus, it was not the intention of this paper to argue whether ML or REML is preferable. Although there are pros and

Restricted Maximum Likelihood Estimation 5 cons of using REML, REML is always the default estimation in conventional LMM packages. Many LMM users prefer REML when analyzing nested data. Since SEM is becoming more and more popular as an integrated framework for data analysis, some LMM users may want to use it to conduct LMM. As noted by Bauer (2003, p. 163), one of the current limitations of the SEM approach is that “the restricted maximum likelihood estimator commonly used with multilevel models currently has no counterpart in SEM.” The lack of REML estimation can be an obstacle for some users, especially those who want to replicate their findings in conventional LMM packages. The main purpose of this paper is to illustrate how REML estimation can be implemented in SEM. OpenMx (Boker et al., 2011), an open source SEM package in R (R Development Core Team, 2011), is used to demonstrate the procedures. Readers from the LMM tradition may find the results more comparable to those obtained from conventional LMM packages. Moreover, this paper also attempts to shed light on how new estimation method can be implemented in SEM. The remaining sections of the paper are organized as follows. The next section contains a brief review of how REML estimation can be obtained via analyzing transformed data and loglikelihood function. Two empirical examples on latent growth model and meta-analysis are then presented to demonstrate the procedures. The final section discusses issues of implementing REML in SEM. Estimating Parameters with REML Method There are two equivalent model specifications on LMM. One is based on a single equation that specifies both fixed effects and random effects (Laird & Ware, 1982) while the other approach is to represent the models in two levels (Raudenbush & Bryk, 2002). In this paper

Restricted Maximum Likelihood Estimation 6 we use the model advocated by Laird and Ware (1982). This model specification uses the long format which is different from the SEM specification that uses the wide format. As we will discuss later, this model representation makes it easier to implement the REML estimation. Under this model representation, data of the dependent variable in the level 1 unit are stacked into a single column vector yi . yi represents the ith group of participants in cross-sectional data whereas it represents repeated measures of the ith subject in longitudinal data. This format is generally known as the long format. The model for yi of the ith unit is y i = X iβ + Z i ui + e i

(1)

where X i is the design matrix of the fixed effects, β is the parameter vector of the fixed effects, Z i is the design matrix of the random effects, ui is the random effects for the ith unit and ei is the residuals (Pinheiro & Bates, 2000). The random effects and the residuals are normally distributed with u i N 0, G and e i N 0, Ri . It should be noted that X i is the design matrix combining both level-1 and level-2 predictors. Moreover, the between group variance components G is the same for all units while Ri may vary across units. The log-likelihood function without the constant term on yi is logl ML β,G, Ri ; y i =

-1 logVi + ( y i X iβ)T Vi 1 ( y i X iβ) , 2

(2)

where Vi = Z i GZ iT + Ri . This log-likelihood function is the same as that in SEM by setting the model implied mean vectors and variance covariance matrix as μi θ = X iβ and Σ i θ = Z iGZ iT + Ri , respectively. Because of this similar between SEM and LMM, SEM may be

Restricted Maximum Likelihood Estimation 7 used to conduct latent growth models (Bollen & Curran, 2006) and cross-sectional multilevel analysis (Mehta & Neale, 2005). The estimates are based on the ML estimation because both fixed-effects and variance components are estimated simultaneously. One problem of the ML estimation is that the estimated variance components are negatively biased. Take the ML estimate of variance in normally distributed data as an example. If we knew the population mean , an unbiased estimate on variance is xi μ / n where n is 2

the sample size. Since we seldom know the population mean in practice, the ML estimate of the 2

variance is xi x / n where x is the sample mean. It is always true that

x

i

x / n xi μ / n . Thus, the ML estimate of variance is negatively biased. A well2

2

2

known unbiased estimate on variance is xi x / n 1 that adjusts the uncertainty in estimating x . The situation is more complicated in LMM because it cannot be corrected by a simple scalar adjustment. The basic idea behind the REML estimation is to remove the fixed-effects parameters before estimating the variance components. A contrast matrix is chosen in such a way that the fixed-effects parameters are not estimated. Since the fixed-effects parameters are not part of the model, the estimated variance components will not be biased by treating the fixed-effects estimates as known. Let us consider the model of stacking all yi into a single column vector, y = Xβ + Zu + e

(3)

where X and e are stacked over all the units, and Z = Diag Z1, Z 2, ..., Z k and u = Diag u1, u 2, ... , u k are block diagonal matrices. Instead of analyzing y with ML estimation,

Restricted Maximum Likelihood Estimation 8 we may analyze its residuals ~ y . The model on ~ y is ~ y = AXβ + AZu + e

(4)

where A = I X X T X X T with p arbitrary rows removed and I is an identity matrix and p is 1

the number of columns of X (Harville, 1977; Patterson & Thompson, 1971). Several characteristics bear explaining. After the calculations, the contrast A (without deleting the p arbitrary rows) is a k by k matrix where k is the length of ~ y . Since the rank of this matrix is (kp), it is not of full rank. Thus, p redundant rows have to be deleted. Harville (1977) showed that these p rows can be arbitrarily selected without affecting the results. The common practice is to delete the last p rows. Therefore, A becomes a (k-p) by k matrix after deleting the last p rows. It is of importance to note that ~ y is now a column vector with (k-p) rows. In practice, the transpose of it is used in the analysis. That is, the final data is a row vector with (k-p) columns of variables. After the transformation, the expected value of ~ y is E ~ y = =

I X X

T

X X T Xβ 1

X β X X T X X T X β 1

(5)

= 0 The population means of ~ y is always zero regardless of what β is. Effectively, the variance components can be estimated without estimating β. The expected variance-covariance matrix of ~ y is Cov ~ y = AVAT

(6)

Restricted Maximum Likelihood Estimation 9 where V = Diag V1,V2, ...,Vi . Before the transformation, the between unit of y is independent. However, the between units become systematically related after the transformation. In the context of SEM, the variance components can be estimated by fitting a model with μ θ = 0 and Σ θ = AZGZ T + R AT .

Alternatively, the log-likelihood function of the model may be used to directly estimate the variance components with REML. The log-likelihood function without the constant term on ~ y is

logl REML G, R; y =

-1 logV + ( y Xα )T V 1 ( y Xα ) + X TV 1 X 2

(7)

where α = X TV 1 X X T V 1y . Although α is in the log-likelihood function, it is not a function 1

of β . Thus, the log-likelihood function does not include the fixed-effects parameters. Once the variance components have been estimated, we may computer the fixed-effects by

βˆ ( X TVˆ 1 X )1 X TVˆ 1y .

(8)

In the context of SEM, we may fit μ θ = Xβ by fixing Σ θ = ZGˆ Z T + Rˆ obtained from the REML estimation. Empirical Examples Two examples on the latent growth model and mixed-effects meta-analysis were used to illustrate the procedures of the REML estimation. Since latent growth model is fitted the same way as cross-sectional multilevel model under the LMM framework, results can be readily applicable to cross-sectional data. OpenMx (Boker et al., 2011) was used to conduct the analysis. R code for the analyses based on the transformed data and on the log-likelihood is listed in the

Restricted Maximum Likelihood Estimation 10 Appendix. Latent growth model. A classic example from Potthoff and Roy (1964) was used to illustrate fitting a linear growth model. The data set consists of measurements of the distance from the pituitary gland to the pterygomaxillary fissure taken every two years from 8 years of age until 14 years of age on a sample of 27 children. This data set has been used in many LMM papers and textbooks (see Pinheiro & Bates, 2000 for the details). Eight was subtracted from years of age to improve the numerical stability. Thus, the new intercept is located at the age of 8 though it is not estimated under the REML estimation. The model for one subject is y1 y 2 y3 y4

1 1 1 1

0 1 2 0 1 4 1 1 6 1

0 e 2 u 0 e . 4 u1 e 6 e

(9)

where β0 and β1 are the average intercept and average slope, respectively. To be consistent with conventional assumptions in LMM, the level-1 residuals are assumed homogeneous.

u Nevertheless, this assumption can be easily relaxed. The REML estimates of var( 0 ) is u1 3.559 0.089 0.051 while the ML estimate is

3.383 0.095 0.046 . The ML estimate of the variance

component is slightly smaller than that with REML estimation. Meta-analysis. The second example draws from a classic data set in meta-analysis. Colditz et al. (1994) collected effect sizes from 13 clinical trials examining the effectiveness of the Bacillus Calmette-Guerin (BCG) vaccine for preventing tuberculosis. A total of 1264 articles or abstracts were identified for meta-analysis on the effectiveness of the BCG. Thirteen studies

Restricted Maximum Likelihood Estimation 11 were used in the illustration. The same data set has been used in many papers illustrating metaanalytic techniques (e.g., Berkey et al., 1995; Viechtbauer & Cheung, 2010). The effect size is the relative risk that indicates the ratio of the risk of having tuberculosis in the BCG vaccine group compared to a group without BCG vaccine. Publication year was used as a potential study characteristic to predict the effect size. The earliest publication year of the data was 1948 that was subtracted from the data to improve numerical stability. This analysis is also known as the mixed-effects meta-analysis or meta-regression. The model is

y1 1 0 1 y 1 1 0 2 0 1 0 y13 1 28

0 0 u0 0 1 0 0 u0 0 1 0 0

0 e1 0 e2 , u0 e13

(10)

where β0 and β1 are the average intercept and the slope, respectively. Mixed-effects meta-analysis can be considered as a special case of LMM (Raudenbush & Bryk, 2002) and SEM (Cheung, 2008, in press) by treating the sampling variance var(e1) to var(e13) as fixed and known. The estimated amount of heterogeneity based on the REML is 0.267. In contrast, the ML estimate of the amount of heterogeneity is 0.212 which is slightly smaller than that with REML. Discussion This paper shows how REML estimation, a popular estimation method in LMM, can be implemented in SEM. Instead of analyzing the raw data, variance components based on the REML estimation can be obtained by analyzing the transformed data in which the fixed-effects parameters have been removed. Since the transformed data do not include the fixed-effects parameters, REML can efficiently adjust the negatively bias of the ML estimates. The following

Restricted Maximum Likelihood Estimation 12 sections discuss some possible extensions and issues related to the implementation of REML estimation in SEM. Computation Efficiency and Numerical Stability As demonstrated above, there are two equivalent approaches to implementing the REML estimation in SEM. One approach is to analyze the transformed data with Equations 4 to 6. The other approach is to fit the log-likelihood function in Equation 7 directly. Analyzing the transformed data seems to be more appealing to many SEM users because only the implied mean and implied variance-covariance matrix are required. There is one main limitation, however. The input data is a (k-p) columns of variables with one row (or subject in SEM). Mehta and Neale (2005) has also noted that this approach was not efficient when “subjects” are treated as “variables” in handling nested data. The situation becomes worse in implementing REML estimation. It is because the implied variance-covariance matrix is a matrix with many constraints as shown in Equation 6. Thus, numerical stability may be an issue. On the other hand, the approach based on the log-likelihood function is more stable. The main limitation is that most SEM packages cannot implement arbitrary fitting functions. Future research needs to address how to obtain REML estimates more efficiently. Possible Extensions One main characteristic of the model in Equation 3 is that it specifies the fixed-effects design matrix and the random-effects design matrix separately. A transformation matrix based on the fixed effects is used to remove the fixed-effects parameters. This means that the REML estimation can be readily extended to other LMM such as multivariate LMM, multivariate metaanalysis and cross-classified LMM (Goldstein, 2011). As long as these models can be specified

Restricted Maximum Likelihood Estimation 13 via Equation 3, the transformation matrix A can be calculated to remove the fixed-effects parameters. For example, Cheung (2011) uses a SEM approach to conduct univariate and multivariate meta-analysis. REML using Equation 7 may be used to obtain the variance components in meta-analysis. Generalization to Models with Latent Variables The current REML implementation is based on the model in Equations 3 and 4 that specify the fixed effects and the random effects separately. This approach does not work for models that cannot be parameterized as Equation 3. Take a nonlinear growth model with estimated factor loadings as an example (Bollen & Curran, 2006). Suppose there are 4 measures, the model with estimated factor loadings is y1 y 2 y3 y4

1 0 1 0 e 1 1 1 1 u e 0 0 . 1 3 1 1 3 u1 e 1 4 1 4 e

(11)

This model captures the non-linear trajectories in development by estimating the trajectories from data empirically. Since λ3 and λ4 are included in both design matrices, it is not possible to find a transformation matrix A without estimating λ3 and λ4 . Further research should address how REML estimation can be implemented in general MLSEM. To summarize, the addition of REML estimation in SEM makes the SEM approach more appealing to the LMM users. They may find the results more comparable to conventional LMM packages. This approach is especially useful when the number of subjects is small.

Restricted Maximum Likelihood Estimation 14 References Bauer, D. (2003). Estimating multilevel linear models as structural equation models. Journal of Educational and Behavioral Statistics, 28(2), 135-167. doi:10.3102/10769986028002135 Bentler, P. M. (2004). EQS 6 [Computer software]. Encino, CA: Multivariate Software. Berkey, C. S., Hoaglin, D. C., Mosteller, F., & Colditz, G. A. (1995). A random-effects regression model for meta-analysis. Statistics in Medicine, 14(4), 395-411. doi:10.1002/sim.4780140406 Boker, S., Neale, M., Maes, H., Wilde, M., Spiegel, M., Brick, T., Spies, J., et al. (2011). OpenMx: An Open Source Extended Structural Equation Modeling Framework. Psychometrika, 76(2), 306-317. doi:10.1007/s11336-010-9200-6 Bollen, K. A., & Curran, P. (2006). Latent Curve Models: A Structural Equation Perspective. Hoboken, N.J.: Wiley-Interscience. Cheung, M. W. L. (2008). A model for integrating fixed-, random-, and mixed-effects metaanalyses into structural equation modeling. Psychological Methods, 13(3), 182-202. doi:10.1037/a0013163 Cheung, M. W. L. (2010). Fixed-effects meta-analyses as multiple-group structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 17(3), 481-509. doi:10.1080/10705511.2010.489367 Cheung, M. W. L. (2011). metaSEM: Meta-analysis: A Structural Equation Modeling Approach. R package version 0.7-0. Retrieved November 11, 2011, from http://courses.nus.edu.sg/course/psycwlm/Internet/metaSEM/ Cheung, M. W. L. (in press). Multivariate meta-analysis as structural equation models. Structural

Restricted Maximum Likelihood Estimation 15 Equation Modeling: A Multidisciplinary Journal. Chou, C.-P., Bentler, P. M., & Pentz, M. A. (1998). Comparisons of two statistical approaches to study growth curves: The multilevel model and the latent curve analysis. Structural Equation Modeling: A Multidisciplinary Journal, 5(3), 247. doi:10.1080/10705519809540104 Colditz, G. A., Brewer, T. F., Berkey, C. S., Wilson, M. E., Burdick, E., Fineberg, H. V., & Mosteller, F. (1994). Efficacy of BCG Vaccine in the Prevention of Tuberculosis. JAMA: The Journal of the American Medical Association, 271(9), 698 -702. doi:10.1001/jama.1994.03510330076038 Curran, P. (2003). Have multilevel models been structural equation models all along? Multivariate Behavioral Research, 38(4), 529-568. doi:10.1207/s15327906mbr3804_5 Goldstein, H. (2011). Multilevel statistical models (4th ed.). Hoboken, N.J.: Wiley. Harville, D. A. (1977). Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems. Journal of the American Statistical Association, 72(358), 320338. doi:10.2307/2286796 Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8: A user’s reference guide. Chicago, IL: Scientific Software International, Inc. Laird, N. M., & Ware, J. H. (1982). Random-Effects Models for Longitudinal Data. Biometrics, 38(4), 963-974. doi:10.2307/2529876 Lubke, G. H., & Muthén, B. (2005). Investigating Population Heterogeneity With Factor Mixture Models. Psychological Methods, 10(1), 21-39. doi:10.1037/1082-989X.10.1.21 Mehta, P. D., & Neale, M. C. (2005). People are variables too: multilevel structural equations

Restricted Maximum Likelihood Estimation 16 modeling. Psychological Methods, 10(3), 259-284. doi:10.1037/1082-989X.10.3.259 Mehta, P., & West, S. (2000). Putting the individual back into individual growth curves. Psychological Methods, 5(1), 23-43. doi:10.1037//1082-989X.5.1.23 Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49(1), 115-132. doi:10.1007/BF02294210 Muthén, B. O., & Muthén, L. K. (2010). Mplus user’s guide (6th ed.). Los Angeles, CA: Muthén & Muthén. Patterson, H. D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58(3), 545 -554. doi:10.1093/biomet/58.3.545 Pinheiro, J. C., & Bates, D. M. (2000). Mixed Effects Models in S and S-Plus. New York: Springer. Potthoff, R. F., & Roy, S. N. (1964). A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika, 51(3-4), 313 -326. doi:10.1093/biomet/51.3-4.313 Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: applications and data analysis methods. Thousand Oaks: Sage Publications. Raykov, T. (2001). Testing Multivariable Covariance Structure and Means Hypotheses via Structural Equation Modeling. Structural Equation Modeling: A Multidisciplinary Journal, 8(2), 224. doi:10.1207/S15328007SEM0802_4 R Development Cor Team. (2011). R: A Language and Environment for Statistical Computing. Vienna, Austria. Retrieved from http://www.R-project.org/

Restricted Maximum Likelihood Estimation 17 Rovine, M., & Molenaar, P. (2000). A structural modeling approach to a multilevel random coefficients model. Multivariate Behavioral Research, 35(1), 51-88. doi:10.1207/S15327906MBR3501_3 Searle, S. R., Casella, G., & McCulloch, C. E. (1992). Variance Components. New York: WileyInterscience. Takane, Y., & Deleeuw, J. (1987). On the relationship between item response theory and factoranalysis of discretized variables. Psychometrika, 52(3), 393-408. doi:10.1007/BF02294363 Viechtbauer, W., & Cheung, M. W. L. (2010). Outlier and influence diagnostics for metaanalysis. Research Synthesis Methods, 1(2), 112-125. doi:10.1002/jrsm.11

Restricted Maximum Likelihood Estimation 18 Author Note Mike W.-L. Cheung, Department of Psychology, National University of Singapore. This research was supported by the Academic Research Fund Tier 1 (R581-000-111-112) from the Ministry of Education, Singapore. Correspondence concerning this article should be addressed to Mike W.-L. Cheung, Department of Psychology, Faculty of Arts and Social Sciences, National University of Singapore, Block AS4, Level 2, 9 Arts Link, Singapore 117570. Tel: (65) 6516-3702; Fax: (65) 6773-1843; E-mail: [email protected]

Restricted Maximum Likelihood Estimation 19 Endnote 1

The data sets, R code, and output are available at

http://courses.nus.edu.sg/course/psycwlm/internet/remlSEM.zip.

Restricted Maximum Likelihood Estimation 20 Appendix Obtaining Variance Components with REML in R: Illustration with Linear Growth Model and Mixed-effects Meta-analysis library(OpenMx)

# Library to conduct SEM

#### Example on linear growth model data(Orthodont, package="nlme") # Sample data Orthodont # Show the data y