SPSS Syntax for Applying Rules for Combining Multivariate Estimates in Multiple Imputation

Joost R. van Ginkel
Leiden University
April 6, 2010

1 Files in this zip-file

• MI-mul2manual.pdf: this file.
• incomplete2_imp.sav: A multiple-imputation dataset in SPSS format containing the responses of 300 'respondents' to 41 items, denoted V1, ..., V41. The original incomplete dataset is a simulated dataset from a simulation study by Van Ginkel, Van der Ark, & Sijtsma (2007). The data file contains the original incomplete dataset, plus five completed versions of the incomplete dataset. The different versions are indicated by an additional variable imputation, which contains the dataset number (0 indicating the original incomplete dataset). The five completed versions were created using multiple Two-Way imputation for separate scales (Van Ginkel, Van der Ark, & Sijtsma, 2007). Variables V1 to V40 have ordered answer categories ranging from 0 to 4; variable V41 is a dichotomous variable with values 1 and 2. Five percent of the scores are missing. Missing values are indicated by a comma.
• ExampleMixed.sav: An example SPSS data file containing the results of five regression analyses resulting from five completed datasets.
• ExampleMultinomial.sav: An example SPSS data file containing the results of five multinomial logistic regression analyses resulting from five completed datasets.
• ExampleLogistic.sav: An example SPSS data file containing the results of five binary logistic regression analyses resulting from five completed datasets.
• ExampleOrdinal.sav: An example SPSS data file containing the results of five ordinal regression analyses resulting from five completed datasets.
• MI-mul2.sps: SPSS syntax file. This is a read-only file.
• runMI-mul2.sps: SPSS syntax file. This file may be modified to suit your needs (see below).

2 About the SPSS Syntax

2.1 The Purpose

The file MI-mul2.sps is the second version of the syntax file MI-mul.sps. The SPSS syntax allows a researcher to draw inferences for multivariate estimates from a dataset with missing values, where the missing values are estimated multiple times using multiple imputation (Rubin, 1987).

When drawing inferences from multiple imputation, a distinction must be made between univariate estimates (for example, a regression coefficient of a continuous or dichotomous variable) and multivariate estimates (for example, the multiple regression coefficients of one categorical variable with more than two categories). To test whether such a categorical variable has a significant effect, one overall test must be carried out that tests whether all coefficients of this variable differ significantly from 0. Thus, different rules for multiple imputation apply to multivariate estimates than to univariate estimates. The SPSS file MI-mul2.sps performs the calculations for combining the results of multivariate estimates. It should be noted that SPSS has a built-in procedure for combining the results of a multiple-imputation dataset. However, this procedure is limited to univariate estimates.

The most important differences from the old file MI-mul.sps are that the new version is more user-friendly and that it has been programmed to be more compatible with the built-in procedures of SPSS. The old file MI-mul.sps can still be downloaded from the same link as MI-mul2.sps.

2.2 The Method

For multivariate estimates, the parameters from the statistical analysis are combined using the following procedure. Suppose $Q$ is a $k \times 1$ vector of parameters (for example, a set of regression coefficients of one categorical variable), and we have $t = 1, \ldots, m$ completed versions of an incomplete dataset. Let $\hat{Q}^{(t)}$ be the estimate of $Q$ in completed dataset $t$, and let $U^{(t)}$ be the covariance matrix associated with $\hat{Q}^{(t)}$. For the $m$ completed datasets, the overall estimate of vector $Q$ is the mean of the $m$ vectors:

$$\bar{Q} = \frac{1}{m} \sum_{t=1}^{m} \hat{Q}^{(t)}. \tag{1}$$

The covariance matrix associated with $\bar{Q}$ has two parts: a between-imputation part and a within-imputation part. The within-imputation covariance matrix is computed as the mean of the $m$ covariance estimates:

$$\bar{U} = \frac{1}{m} \sum_{t=1}^{m} U^{(t)}. \tag{2}$$

The between-imputation covariance matrix is computed as

$$B = \frac{1}{m-1} \sum_{t=1}^{m} (\hat{Q}^{(t)} - \bar{Q})(\hat{Q}^{(t)} - \bar{Q})^{\top}. \tag{3}$$

Define the relative increase in variance $r_1$ due to the missing data, averaged across the components of $Q$, as

$$r_1 = (1 + m^{-1})\,\mathrm{tr}(B \bar{U}^{-1})/k. \tag{4}$$

The total variance is computed as

$$T = (1 + r_1)\bar{U}. \tag{5}$$

The estimate $\bar{Q}$ is tested against the parameter value $Q_0$ using a statistic that is computed as follows:

$$D_1 = (\bar{Q} - Q_0)^{\top} T^{-1} (\bar{Q} - Q_0)/k. \tag{6}$$

The p-value for testing $Q = Q_0$ is

$$p = P(F_{k,\nu_1} \geq D_1). \tag{7}$$

Define $t = k(m - 1)$ (here $t$ no longer denotes the dataset index). The number of degrees of freedom is

$$\nu_1 = 4 + (t - 4)\left[1 + (1 - 2t^{-1})\,r_1^{-1}\right]^2 \tag{8}$$

for $t > 4$, and

$$\nu_1 = t(1 + k^{-1})(1 + r_1^{-1})^2/2 \tag{9}$$

for $t \leq 4$.

The idea behind this procedure is that the overall test for estimate $\bar{Q}$ is corrected for the extra uncertainty caused by the missing data. For more information about multiple imputation, we refer to Schafer (1997) and Rubin (1987).
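The combination rules in Equations 1 to 6, 8 and 9 can be traced step by step in a few lines of code. The following is a minimal Python sketch, not the MI-mul2.sps implementation: the function names pool_multivariate and mat_inverse are hypothetical, the estimates and covariance matrices are passed as plain lists, and the p-value of Equation 7 is left out because it requires an F distribution function that the Python standard library does not provide.

```python
def mat_inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination (hypothetical helper)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pool_multivariate(estimates, covariances, q0=None):
    """Combine m estimate vectors (length k) and their k x k covariance matrices."""
    m, k = len(estimates), len(estimates[0])
    q0 = [0.0] * k if q0 is None else q0
    # Equation 1: mean of the m estimate vectors
    q_bar = [sum(q[i] for q in estimates) / m for i in range(k)]
    # Equation 2: within-imputation covariance matrix
    u_bar = [[sum(u[i][j] for u in covariances) / m for j in range(k)]
             for i in range(k)]
    # Equation 3: between-imputation covariance matrix (outer products)
    b = [[sum((q[i] - q_bar[i]) * (q[j] - q_bar[j]) for q in estimates) / (m - 1)
          for j in range(k)] for i in range(k)]
    u_inv = mat_inverse(u_bar)
    # Equation 4: relative increase in variance, using tr(B * U_bar^-1)
    trace = sum(b[i][j] * u_inv[j][i] for i in range(k) for j in range(k))
    r1 = (1 + 1 / m) * trace / k
    # Equations 5 and 6: T = (1 + r1) * U_bar, so T^-1 = U_bar^-1 / (1 + r1)
    d = [q_bar[i] - q0[i] for i in range(k)]
    quad = sum(d[i] * u_inv[i][j] * d[j] for i in range(k) for j in range(k))
    d1 = quad / ((1 + r1) * k)
    # Equations 8 and 9: denominator degrees of freedom
    t = k * (m - 1)
    if r1 == 0:
        nu1 = float("inf")  # no between-imputation variance at all
    elif t > 4:
        nu1 = 4 + (t - 4) * (1 + (1 - 2 / t) / r1) ** 2
    else:
        nu1 = t * (1 + 1 / k) * (1 + 1 / r1) ** 2 / 2
    return {"q_bar": q_bar, "r1": r1, "d1": d1, "nu1": nu1}


# Made-up numbers for illustration only: m = 3 datasets, k = 2 coefficients.
demo_estimates = [[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]]
demo_covariances = [[[2.0, 0.0], [0.0, 2.0]] for _ in range(3)]
result = pool_multivariate(demo_estimates, demo_covariances)
print(result)
```

For these illustrative inputs the rules give $r_1 = 1/3$, $D_1 = 1.5$ and, because $t = k(m-1) = 4$, Equation 9 applies and yields $\nu_1 = 48$.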

2.3 Disclaimer and Bugs

It should be emphasized that this SPSS syntax is distributed without any warranty on the part of the author. Although the SPSS syntax has been tested thoroughly, one can never fully exclude the possibility of errors. The author appreciates suggestions and reports of detected errors (please enclose SPSS data file). All correspondence can be sent to Joost R. van Ginkel, Leiden University, Faculty of Social and Behavioural Sciences, PO Box 9555, 2300 RB Leiden, The Netherlands [email protected]

3 Using the SPSS Syntax

3.1 Preparing Your SPSS File with Parameter Estimates

In order to apply the rules for multiple imputation, we must create a data file with the parameter estimates resulting from the statistical analyses performed on the m completed datasets. The preparation of this file is illustrated with an example. Suppose we want to test the effect of variables V41 and V17 and their interaction on the sumscore of items V1 to V10. Variable V41 is a dichotomous background variable and variable V17 is an item with five answer categories, which is chosen only for illustrative purposes. If the data were complete, we could test the significance of these effects by means of, for example, analysis of variance or regression analysis. However, because the data are incomplete and we want to handle the missing data using multiple imputation, we must perform the analyses in several steps.

Before performing statistical analyses, we must impute the data five times (see file incomplete2_imp.sav). Thus, we get five plausible complete versions of the incomplete dataset. We compute the sumscore of items V1 to V10 in the data file incomplete2_imp.sav. This can be done either using the menu (task bar: Transform, Compute) or using the syntax:

    COMPUTE score1 = sum(V1 to V10) .
    EXECUTE .

Now that we have computed the sumscore for each completed dataset, we can perform statistical analyses for each of the completed datasets separately, and combine the results into one overall analysis. To this end, we perform the following steps:

• First, we use the SPSS 'Split File' option (task bar: Data, Split File). We choose the variable imputation as grouping variable, and click OK. See Figures 1 and 2. If we now perform statistical analyses, they will be carried out for each completed dataset separately.

Figure 1: Choosing the Split File Option in SPSS.

Figure 2: Choosing Variable imputation as Grouping Variable.

• Because no explicit rules are available for combining the results of analysis of variance in multiple imputation, the analyses must be performed by means of regression analysis. The procedure mixed models in SPSS makes it possible to perform regression analysis with interaction terms and categorical independent variables with more than two answer categories. To obtain a data file that contains the results from the five regression analyses, we may use the 'OMS' option in SPSS (task bar: Utilities, OMS Control Panel). See Figure 3. To write the results of the statistical analyses to a new data file, we must select Tables in the Output Types menu. Once this option has been selected, we must specify the type of analysis we want to perform. In this example, we must select Mixed in the Command Identifiers menu. Next, we have to specify the types of estimates from the output that we want to write to a data file. As already noted in section 2.2, we need a set of parameter estimates and a covariance matrix for combining the results of statistical analyses. To this end, we first select Covariance Matrix in the Table Subtypes for Selected Commands menu (see Figure 4). Choosing this option will write the covariance matrices of the parameter estimates to a data file. Second, we scroll down to Parameter Estimates in the Table Subtypes for Selected Commands menu. We click on Parameter Estimates while holding the Ctrl-key. In this way, both the covariance matrix and the parameter estimates will be selected. Third, we choose the option that will write the results to a data file (Output Destinations menu, File). By clicking Options and choosing SPSS Data File in the Format menu, the destination file will be an SPSS data file. Fourth, we specify a name for the data file (here, ExampleMixed.sav). See Figure 5. Finally, we click Add and click OK twice.

Figure 3: Choosing the OMS Option in SPSS.

• We are now ready to perform the regression analysis using mixed models (task bar: Analyze, Mixed Models, Linear). The results of the five analyses have to be combined into one overall analysis using the rules for multiple imputation. To this end, the parameter estimates $\hat{Q}^{(t)}$ and their covariance matrices have to be displayed in the output. How this is done is shown in Figure 6.

• We return to the OMS option (task bar: Utilities, OMS Control Panel). We click on the request that was just created, we click End and click OK twice. The results of the five regression analyses are now written to a new dataset. See Figure 7. Note that Figure 7 only shows a part of the dataset. This part becomes visible when scrolling to the right in the SPSS datasheet.
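The logic of the first steps above (one sumscore per respondent, then one analysis per value of the grouping variable) can be pictured outside SPSS. The following is a minimal Python sketch under stated assumptions: the helper names sumscore and split_by_imputation are hypothetical, only three items are summed instead of V1 to V10, a missing score is represented by None and skipped in the sum, and the original incomplete dataset (number 0) is set aside before the per-dataset analyses, which is a choice of this sketch rather than something the steps above prescribe.

```python
from collections import defaultdict

ITEMS = ["V1", "V2", "V3"]  # the manual sums V1 to V10; three items keep the demo short


def sumscore(row, items=ITEMS):
    """Sum the valid item scores of one respondent; None marks a missing score."""
    valid = [row[name] for name in items if row[name] is not None]
    return sum(valid) if valid else None


def split_by_imputation(rows, key="imputation"):
    """Group respondent records by dataset number (0 = original incomplete data)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Made-up records: the same respondent in the original and two completed datasets.
rows = [
    {"imputation": 0, "V1": 3, "V2": None, "V3": 1},
    {"imputation": 1, "V1": 3, "V2": 2, "V3": 1},
    {"imputation": 2, "V1": 3, "V2": 4, "V3": 1},
]
for row in rows:
    row["score1"] = sumscore(row)

groups = split_by_imputation(rows)
# Assumption of this sketch: only the completed datasets are analysed.
completed = {number: data for number, data in groups.items() if number != 0}
for number, data in sorted(completed.items()):
    print(number, [row["score1"] for row in data])
```

Each entry of completed would then be passed to the same analysis, which is what the Split File option arranges inside SPSS.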

Figure 4: Choosing the OMS Option in SPSS.

Figure 5: Choosing Options Within OMS for Mixed Models.


Figure 6: Selecting the Parameters and the Covariance Matrix in Mixed Models.

3.2 Using the Syntax for a Standard Linear Model

After saving the results, we open the syntax file runMI-mul2.sps (task bar: File, Open, Syntax). The SPSS syntax file looks like this:

    INCLUDE '{path}MI-mul2.sps'.
    RULESMIMUL FILE = '{path + filename}'
      /ESTIMATE = {estimates}
      /COV = {covariance matrices}
      /LEVELSIND = {number of levels independent variables}
      /M = {number of completed datasets}.

The file may be modified to suit your needs. In the first two lines, the paths of the file MI-mul2.sps and of the file with the estimates have to be specified. For example, if MI-mul2.sps is located in C:\Program Files\SPSS\MI-mul2.sps and the file with the estimates has been saved to C:\DataFiles\ExampleMixed.sav, this should be specified as

    INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
    RULESMIMUL FILE = 'C:\DataFiles\ExampleMixed.sav'

Figure 7: Dataset with Parameter Estimates and Covariance Matrices.

In the next two lines, the variables in the data file ExampleMixed.sav that represent the estimates and their variances and covariances are specified. In our example, variable Estimate contains the estimates and variables Intercept to V412V174 contain the variances and covariances of the five completed datasets. This is specified as

      /ESTIMATE = Estimate
      /COV = Intercept to V412V174

The line /LEVELSIND is needed to specify the number of levels of each independent variable, plus the number of levels of the interactions. In our example, the independent variables are V17 and V41. The former has five answer categories and the latter has two. The interaction of V17 × V41 has 5 × 2 = 10 levels. In the syntax file runMI-mul2.sps this is specified as

      /LEVELSIND = 2,5,10

Note that the levels must be specified in the same order as the order of the effects in the file ExampleMixed.sav. Finally, the last line specifies the number of completed datasets (5). Thus, the final syntax looks like this:

    INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
    RULESMIMUL FILE = 'C:\DataFiles\ExampleMixed.sav'
      /ESTIMATE = Estimate
      /COV = Intercept to V412V174
      /LEVELSIND = 2,5,10
      /M = 5.

Run the syntax file (task bar: Run, All) and the combined results of the five separate analyses will appear in the output.

3.2.1 Adjusted number of degrees of freedom

In our example, the F-tests of the five separate analyses all had 290 degrees of freedom in the denominator (N = 300). The combined results of the five analyses, however, have a much larger number of degrees of freedom in the denominator. This is because the calculation of the number of degrees of freedom in Equations 8 and 9 is based on the assumption that the sample size is sufficiently large for the asymptotic normal approximation (Schafer, 1997, p. 108). However, for smaller samples, an adjusted number of degrees of freedom may be needed that is smaller than the number of degrees of freedom of the five separate analyses.

If we want to use an adjusted number of degrees of freedom in the denominator, we should specify this in the syntax. In the file ExampleMixed.sav, the variable df contains the number of degrees of freedom of each effect. By changing the syntax into

    INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
    RULESMIMUL FILE = 'C:\DataFiles\ExampleMixed.sav'
      /ESTIMATE = Estimate
      /COV = Intercept to V412V174
      /LEVELSIND = 2,5,10
      /DF = df
      /M = 5.

the analyses are performed with an adjusted number of degrees of freedom smaller than 290. This loss of degrees of freedom compared to 290 represents the extra uncertainty caused by the missing data. The adjusted number of degrees of freedom is approximated with an extremely complex formula, which we shall not give here. The interested reader is referred to Reiter (2007). Finally, it may be noted that by default the number of degrees of freedom is approximated using Equations 8 and 9, but this can also be specified explicitly by means of:

    INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
    RULESMIMUL FILE = 'C:\DataFiles\ExampleMixed.sav'
      /ESTIMATE = Estimate
      /COV = Intercept to V412V174
      /LEVELSIND = 2,5,10
      /DF = 0
      /M = 5.

where the zero indicates that the standard approximation is used.
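To see why the standard approximation can produce far more than 290 denominator degrees of freedom, it helps to evaluate Equations 8 and 9 for a few values. The following Python sketch does only that; the function name df_standard and the values of the relative increase in variance (0.05 and 0.5) are hypothetical choices for illustration, not results from ExampleMixed.sav, and the adjusted formula of Reiter (2007) is not implemented.

```python
def df_standard(k, m, r1):
    """Denominator degrees of freedom from Equations 8 and 9.

    k  : number of parameters tested jointly
    m  : number of completed datasets
    r1 : relative increase in variance due to missing data (must be > 0)
    """
    t = k * (m - 1)
    if t > 4:
        return 4 + (t - 4) * (1 + (1 - 2 / t) / r1) ** 2  # Equation 8
    return t * (1 + 1 / k) * (1 + 1 / r1) ** 2 / 2        # Equation 9


# Four jointly tested coefficients and five completed datasets, so t = 16.
for r1 in (0.05, 0.5):  # hypothetical values of the relative increase in variance
    print(r1, round(df_standard(k=4, m=5, r1=r1), 2))
```

With a small relative increase in variance (0.05) the approximation gives about 4111 degrees of freedom, and with a large one (0.5) about 94.75. Nothing in Equations 8 and 9 refers to the sample size, which is why the result can exceed the 290 degrees of freedom of the separate analyses.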

3.3 Multinomial Logistic Regression

Suppose we would like to apply a multinomial logistic regression to the dataset incomplete2_imp.sav with score1 and V41, and the interaction of score1 × V41 as the independent variables, and V17 as the dependent variable. By using the OMS option, an SPSS data file can be obtained that contains the results of these analyses. This can be achieved by choosing Nominal Regression in the Command Identifiers menu, and Asymptotic Covariance Matrix and Parameter Estimates in the Table Subtypes for Selected Commands menu (see Figure 5). Next, a full factorial multinomial logistic regression is carried out (task bar: Analyze, Regression, Multinomial Logistic) for the five completed datasets, and the OMS request is ended. File ExampleMultinomial.sav contains the results of these analyses.

If we would like to combine the results of these analyses into one final result, we proceed as follows. The variable that contains the parameter estimates is B, and the variables that contain the covariance matrices are @0_Intercept to @3_V412score1. Thus, lines 1 to 4 become

    INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
    RULESMIMUL FILE = 'C:\DataFiles\ExampleMultinomial.sav'
      /ESTIMATE = B
      /COV = @0_Intercept to @3_V412score1

Furthermore, a continuous variable (here, variable score1) always has one level. Thus, line 5 becomes

      /LEVELSIND = 1,2,2

(two levels for both variable V41 and the interaction score1 × V41). However, in a multinomial logistic regression, the dependent variable also has a number of levels, which have to be specified. An additional line is added to the syntax for this purpose. In this line, the number of levels of the dependent variable is specified, minus the reference category. Thus, the final syntax looks like this:

    INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
    RULESMIMUL FILE = 'C:\DataFiles\ExampleMultinomial.sav'
      /ESTIMATE = B
      /COV = @0_Intercept to @3_V412score1
      /LEVELSIND = 1,2,2
      /LEVELSDEP = 4
      /M = 5.

Furthermore, it is important to note that in multinomial logistic regression, degrees of freedom must not be specified. The example data file does have a variable that contains degrees of freedom, but those are the degrees of freedom of the effects, whereas in multiple imputation it is the error degrees of freedom that are adjusted. Standard statistical tests for multinomial logistic regression do not have error degrees of freedom.
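The values on the /LEVELSIND and /LEVELSDEP lines follow a simple counting rule: a continuous variable counts as one level, an interaction has the product of the levels of its components, and the dependent variable contributes its number of categories minus the reference category. The Python sketch below merely reproduces that arithmetic for the examples in the text; the helper names levels_ind and levels_dep are hypothetical, and the order of the effects is assumed to have been copied by hand from the results file.

```python
from math import prod


def levels_ind(effects, n_levels):
    """Levels per effect: the product of the levels of the variables in the effect.

    effects  : list of tuples of variable names, in the order of the results file
    n_levels : dict mapping each variable to its number of levels (1 if continuous)
    """
    return [prod(n_levels[name] for name in effect) for effect in effects]


def levels_dep(n_categories):
    """Levels of the dependent variable minus the reference category."""
    return n_categories - 1


# Linear model of Section 3.2: V41 (2 levels), V17 (5 levels), V41 x V17.
linear = levels_ind([("V41",), ("V17",), ("V41", "V17")], {"V41": 2, "V17": 5})

# Multinomial model above: score1 (continuous), V41 (2 levels), score1 x V41,
# with dependent variable V17 (5 categories).
multinomial = levels_ind(
    [("score1",), ("V41",), ("score1", "V41")], {"score1": 1, "V41": 2}
)

print("/LEVELSIND = " + ",".join(str(n) for n in linear))
print("/LEVELSIND = " + ",".join(str(n) for n in multinomial))
print("/LEVELSDEP = " + str(levels_dep(5)))
```

The printed lines match the values used in the syntax examples (2,5,10 for the linear model; 1,2,2 and 4 for the multinomial model).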

3.4 Binary Logistic Regression

For binary logistic regression we cannot use the standard option in SPSS (task bar: Analyze, Regression, Binary Logistic) because this option does not provide an asymptotic covariance matrix of the parameter estimates, which we need for combining the results. Instead, we use multinomial logistic regression for this purpose too. By using a binary outcome variable and specifying the first answer category as the reference category, the resulting analysis is equivalent to a binary logistic regression. Thus, we can proceed in the same way as in the multinomial logistic regression example above.

File ExampleLogistic.sav contains the results of five logistic regression analyses applied to the five completed datasets. The independent variables of these analyses are score1, V17, and the interaction of score1 × V17. The dependent variable is V41. The syntax that combines the analyses into one result looks like this:

    INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
    RULESMIMUL FILE = 'C:\DataFiles\ExampleLogistic.sav'
      /ESTIMATE = B
      /COV = @0_Intercept to @3_V412score1
      /LEVELSIND = 1,5,5
      /LEVELSDEP = 1
      /M = 5.

It may be noted that the penultimate line (/LEVELSDEP = 1) can also be omitted because the default number of levels of the dependent variable (minus the reference category) is 1.

3.5 Ordinal Regression

Finally, we will show how the results of ordinal regression are combined. Again we carry out an analysis with score1, V41, and the interaction of score1 × V41 as the independent variables, and V17 as the dependent variable, only now we assume that variable V17 is ordinal rather than nominal (which makes sense because variable V17 is supposed to be a rating-scale item). The OMS options for this analysis are PLUM in the Command Identifiers menu, and Asymptotic Covariance Matrix and Parameter Estimates in the Table Subtypes for Selected Commands menu (see Figure 5). Next, we carry out a full factorial ordinal regression (task bar: Analyze, Regression, Ordinal) for the five completed datasets, and we end the OMS request. File ExampleOrdinal.sav contains the results of these analyses.

One property of an ordinal regression is that it contains multiple intercepts: one for each level of the dependent variable, minus the reference category. The number of intercepts is specified by adding an extra line to the syntax, namely /LEVELSINT = 4. The complete syntax file for combining the results of the ordinal regressions is given by:

    INCLUDE 'C:\Program Files\SPSS\MI-mul2.sps'.
    RULESMIMUL FILE = 'C:\DataFiles\ExampleOrdinal.sav'
      /ESTIMATE = Estimate
      /COV = V170 to V412score1
      /LEVELSINT = 4
      /LEVELSIND = 1,2,2
      /M = 5.

References

Reiter, J. P. (2007). Small-sample degrees of freedom for multi-component significance tests with multiple imputation for missing data. Biometrika, 94, 502-508.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.

Van Ginkel, J. R., Van der Ark, L. A., & Sijtsma, K. (2007). Multiple imputation for item scores when test data are factorially complex. British Journal of Mathematical and Statistical Psychology, 60, 315-337.